csq module

CSQ

class CSQ(dict=None, **kwargs)

Bases: collections.UserDict

Consequence of a variant. Access each CSQ field like a dict.

This class holds the annotation records of a Variant object. A list of CSQ objects, one per feature, is stored in Variant.parsed_csq.

Examples

>>> csq = variant.parsed_csq[0]; csq
CSQ(SYMBOL='FANCM', HGVSc='ENST00000267430.5:c.5101N>T', Consequence='stop_gained', …)
>>> list(csq.keys())[:5]
['Allele', 'Consequence', 'IMPACT', 'SYMBOL', 'Gene']
>>> list(csq.values())[:5]
['T', 'stop_gained', 'HIGH', 'FANCM', 'ENSG00000187790']
>>> csq['HGVSc']
'ENST00000267430.5:c.5101N>T'
data
REQUIRED_FIELDS: Set[str] = {'Allele', 'Amino_acids', 'BIOTYPE', 'CDS_position', 'Codons', 'Consequence', 'Existing_variation', 'Feature', 'Feature_type', 'Gene', 'HGVSc', 'HGVSp', 'Protein_position', 'STRAND', 'SYMBOL', 'cDNA_position'}

Required CSQ fields. Creating a new CSQ object raises a ValueError if any of these fields is missing.
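The required-field check can be pictured with a minimal sketch. MiniCSQ below is a hypothetical stand-in written for illustration only, not the library's implementation; the exact error message and the point at which the real class validates are assumptions.

```python
from collections import UserDict

# Copied from REQUIRED_FIELDS above.
REQUIRED_FIELDS = {
    'Allele', 'Amino_acids', 'BIOTYPE', 'CDS_position', 'Codons',
    'Consequence', 'Existing_variation', 'Feature', 'Feature_type', 'Gene',
    'HGVSc', 'HGVSp', 'Protein_position', 'STRAND', 'SYMBOL', 'cDNA_position',
}


class MiniCSQ(UserDict):
    """Hypothetical stand-in for CSQ (illustration only)."""

    def __init__(self, dict=None, **kwargs):
        super().__init__(dict, **kwargs)
        # Assumed behaviour: validate once, right after the fields are stored.
        missing = REQUIRED_FIELDS - set(self.data)
        if missing:
            raise ValueError(f"Missing required CSQ fields: {sorted(missing)}")


# Usage: a record with every required field is accepted,
# while a record lacking 'HGVSc' is rejected.
complete = {name: '' for name in REQUIRED_FIELDS}
record = MiniCSQ(complete)

incomplete = {k: v for k, v in complete.items() if k != 'HGVSc'}
try:
    MiniCSQ(incomplete)
except ValueError as err:
    print(err)
```

Extra, non-required fields pass through untouched in this sketch, since only the absence of a required key triggers the error.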

property consequence_types: List[str]

The individual consequence types of this CSQ, returned as a list (the Consequence field split on &).

rank_consequence_type()

Rank the severity of the consequence type (CSQ column Consequence).

A more severe consequence type has a smaller rank, with 0 being the most severe. The ranking follows the order of ALL_CONSEQUENCE_TYPES. When the CSQ has multiple consequence types separated by &, the smallest rank among them is returned. When the consequence type is unknown, the largest possible rank + 1 is returned.

Return type

int
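The ranking rule can be sketched as a standalone function. rank_consequence is a hypothetical helper written for illustration, not the library's code. It assumes that a mix of known and unknown types takes the smallest rank overall, so a known type wins over an unknown one.

```python
from typing import List

# Copied from ALL_CONSEQUENCE_TYPES; ordered from most to least severe.
ALL_CONSEQUENCE_TYPES: List[str] = [
    'transcript_ablation', 'splice_acceptor_variant', 'splice_donor_variant',
    'stop_gained', 'frameshift_variant', 'stop_lost', 'start_lost',
    'transcript_amplification', 'inframe_insertion', 'inframe_deletion',
    'missense_variant', 'protein_altering_variant', 'splice_region_variant',
    'incomplete_terminal_codon_variant', 'start_retained_variant',
    'stop_retained_variant', 'synonymous_variant', 'coding_sequence_variant',
    'mature_miRNA_variant', '5_prime_UTR_variant', '3_prime_UTR_variant',
    'non_coding_transcript_exon_variant', 'intron_variant',
    'NMD_transcript_variant', 'non_coding_transcript_variant',
    'upstream_gene_variant', 'downstream_gene_variant', 'TFBS_ablation',
    'TFBS_amplification', 'TF_binding_site_variant',
    'regulatory_region_ablation', 'regulatory_region_amplification',
    'feature_elongation', 'regulatory_region_variant', 'feature_truncation',
    'intergenic_variant',
]


def rank_consequence(consequence: str) -> int:
    """Return the smallest rank among the &-separated consequence types."""
    # Unknown types get the largest possible rank + 1, i.e. the list length.
    unknown_rank = len(ALL_CONSEQUENCE_TYPES)
    ranks = []
    for csq_type in consequence.split('&'):
        if csq_type in ALL_CONSEQUENCE_TYPES:
            ranks.append(ALL_CONSEQUENCE_TYPES.index(csq_type))
        else:
            ranks.append(unknown_rank)
    return min(ranks)


print(rank_consequence('stop_gained'))                          # 3
print(rank_consequence('splice_region_variant&intron_variant')) # 12
print(rank_consequence('not_a_real_type'))                      # 36
```

Because smaller means more severe, such a rank can serve directly as a sort key when picking the most severe annotation among several.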

is_truncation_type()

Whether the consequence type is a truncation.

See ALL_TRUNCATION_TYPES for the full list of consequence types.

Return type

bool

is_inframe_type()

Whether the consequence type is inframe.

See ALL_INFRAME_TYPES for the full list of consequence types.

Return type

bool

Constants and helpers

ALL_CONSEQUENCE_TYPES: List[str] = ['transcript_ablation', 'splice_acceptor_variant', 'splice_donor_variant', 'stop_gained', 'frameshift_variant', 'stop_lost', 'start_lost', 'transcript_amplification', 'inframe_insertion', 'inframe_deletion', 'missense_variant', 'protein_altering_variant', 'splice_region_variant', 'incomplete_terminal_codon_variant', 'start_retained_variant', 'stop_retained_variant', 'synonymous_variant', 'coding_sequence_variant', 'mature_miRNA_variant', '5_prime_UTR_variant', '3_prime_UTR_variant', 'non_coding_transcript_exon_variant', 'intron_variant', 'NMD_transcript_variant', 'non_coding_transcript_variant', 'upstream_gene_variant', 'downstream_gene_variant', 'TFBS_ablation', 'TFBS_amplification', 'TF_binding_site_variant', 'regulatory_region_ablation', 'regulatory_region_amplification', 'feature_elongation', 'regulatory_region_variant', 'feature_truncation', 'intergenic_variant']

All the possible consequence types fetched from Ensembl v99 (January 2020).

The consequence types are ordered by severity, from most to least severe.

ALL_TRUNCATION_TYPES: List[str] = ['transcript_ablation', 'splice_acceptor_variant', 'splice_donor_variant', 'stop_gained', 'frameshift_variant', 'start_lost']

All consequence types considered as a truncation.

ALL_INFRAME_TYPES: List[str] = ['inframe_insertion', 'inframe_deletion', 'stop_lost']

All consequence types considered to be inframe.
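The two membership checks can be sketched with these lists. The functions is_truncation and is_inframe are hypothetical helpers for illustration. How the library treats a CSQ with several &-separated types is not stated here, so the sketch assumes that a single matching type is enough.

```python
from typing import List

# Copied from the constants above.
ALL_TRUNCATION_TYPES: List[str] = [
    'transcript_ablation', 'splice_acceptor_variant', 'splice_donor_variant',
    'stop_gained', 'frameshift_variant', 'start_lost',
]
ALL_INFRAME_TYPES: List[str] = [
    'inframe_insertion', 'inframe_deletion', 'stop_lost',
]


def _has_type_in(consequence: str, allowed: List[str]) -> bool:
    # Assumption: one matching type among the &-separated types suffices.
    return any(t in allowed for t in consequence.split('&'))


def is_truncation(consequence: str) -> bool:
    """Whether any consequence type is listed in ALL_TRUNCATION_TYPES."""
    return _has_type_in(consequence, ALL_TRUNCATION_TYPES)


def is_inframe(consequence: str) -> bool:
    """Whether any consequence type is listed in ALL_INFRAME_TYPES."""
    return _has_type_in(consequence, ALL_INFRAME_TYPES)


print(is_truncation('stop_gained'))   # True
print(is_inframe('stop_lost'))        # True
print(is_inframe('missense_variant')) # False
```

Note that stop_lost is grouped with the inframe types rather than the truncation types, so the two checks never overlap for a single consequence type.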