variant module

Variant

class Variant(chrom, start_pos, end_pos, ref_allele, alt_allele, id=None, filter=None, info=NOTHING, parsed_csq=None)[source]

Bases: object

Biallelic variant.

For normal usage, consider using read_and_parse_vcf() to construct the objects from a VEP-annotated VCF.

Examples

>>> variant = Variant('13', 32340300, 32340301, 'GT', 'G', id='rs80359550')
>>> variant
Variant(13:32340300GT>G info: )
>>> variant.is_snp()
False
>>> variant.is_sv()
False
>>> variant.is_indel()
True
>>> variant.is_deletion()
True

Annotate it with the online VEP, then read the annotated VCF:

>>> v = next(Variant.read_and_parse_vcf('rs80359550.vcf'))
>>> v
Variant(13:32340300GT>G info: CSQ[4 parsed])

Parameters
  • chrom (str) –

  • start_pos (int) –

  • end_pos (int) –

  • ref_allele (str) –

  • alt_allele (str) –

  • id (Optional[str]) –

  • filter (Optional[List[str]]) –

  • info (Dict[str, Any]) –

  • parsed_csq (Optional[List[charger.csq.CSQ]]) –

Return type

None

chrom

Chromosome.

start_pos

Start position (1-based closed). Same as POS in the VCF record.

end_pos

End position (1-based closed).

ref_allele

Reference allele sequence.

alt_allele

Alternative allele sequence (currently only one alternative allele is allowed).

id

ID in the VCF record. None when the original value is "." (missing).

filter

FILTER in the VCF record. None when the original value is PASS.

info

INFO in the VCF record.

parsed_csq

All parsed CSQ annotations of the variant as a list of CSQ objects. Use read_and_parse_vcf() to automatically parse CSQ while reading an annotated VCF.

is_snp()[source]

True if the variant is a SNP.

Return type

bool

is_sv()[source]

True if the variant is an SV.

Return type

bool

is_indel()[source]

True if the variant is an INDEL.

Return type

bool

is_deletion()[source]

True if the variant is a deletion.

Return type

bool
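The four predicates above can be illustrated with a minimal sketch that classifies a biallelic variant from its allele lengths alone. The function name `classify` is hypothetical and the logic is an assumption, not the library's implementation; in particular it ignores structural variants and symbolic alleles.

```python
from typing import Dict


def classify(ref_allele: str, alt_allele: str) -> Dict[str, bool]:
    """Classify a biallelic variant by allele length (illustrative only)."""
    ref_len, alt_len = len(ref_allele), len(alt_allele)
    return {
        # A SNP replaces exactly one base with one other base.
        "snp": ref_len == 1 and alt_len == 1,
        # An indel changes the sequence length.
        "indel": ref_len != alt_len,
        # A deletion is an indel whose alternative allele is shorter.
        "deletion": ref_len > alt_len,
    }


# The GT>G variant from the class example is an indel and a deletion.
result = classify("GT", "G")
```

Under these assumptions the result agrees with the doctest output shown in the class example.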

get_most_severe_csq()[source]

Get the most severe CSQ based on the consequence type.

If multiple CSQs have the same consequence type, the canonical CSQ determined by VEP will be selected.

Return type

charger.csq.CSQ
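The selection rule described above (rank by consequence type, then prefer the canonical annotation on ties) could be sketched as follows. This is not the library's code: the plain dicts stand in for CSQ objects, `most_severe` is a hypothetical name, and the severity list is an abbreviated subset of Ensembl's ordering chosen for illustration.

```python
from typing import Dict, List

# Abbreviated ranking, most severe first (subset of Ensembl's ordering).
SEVERITY = [
    "transcript_ablation",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "stop_lost",
    "start_lost",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
]
RANK = {name: i for i, name in enumerate(SEVERITY)}


def most_severe(csqs: List[Dict[str, str]]) -> Dict[str, str]:
    def sort_key(csq: Dict[str, str]):
        # VEP joins multiple consequence terms with "&"; use the worst one.
        # Terms missing from the abbreviated list rank last.
        terms = csq["Consequence"].split("&")
        rank = min(RANK.get(term, len(SEVERITY)) for term in terms)
        # On equal rank, prefer the annotation VEP flags as canonical.
        not_canonical = csq.get("CANONICAL") != "YES"
        return (rank, not_canonical)

    return min(csqs, key=sort_key)


annotations = [
    {"Consequence": "intron_variant", "CANONICAL": "YES"},
    {"Consequence": "stop_gained", "CANONICAL": ""},
    {"Consequence": "stop_gained&splice_region_variant", "CANONICAL": "YES"},
]
picked = most_severe(annotations)
```

Here the two stop_gained annotations tie on severity, so the canonical one is picked.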

classmethod from_cyvcf2(variant)[source]

Create one Variant object based on the given cyvcf2.Variant VCF record.

Parameters

variant (cyvcf2.cyvcf2.Variant) –

Return type

charger.variant.V

classmethod get_vep_version(vcf_raw_headers)[source]

Extract the VEP version from the raw headers of the given VCF.

Parameters

vcf_raw_headers (List[str]) –

Return type

str
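As a rough illustration of what such an extraction might involve, the sketch below scans raw header lines for the `##VEP=` line that VEP typically writes. The function name `find_vep_version`, the regular expression, and the choice to return the `v`-prefixed token are assumptions; the library's actual parsing and return format may differ.

```python
import re
from typing import List, Optional


def find_vep_version(raw_headers: List[str]) -> Optional[str]:
    """Return the version token from a ##VEP header line, or None if absent."""
    for line in raw_headers:
        # Matches both ##VEP="v104" ... and the unquoted ##VEP=v84 ... forms.
        match = re.match(r'##VEP="?(v\d+)"?', line)
        if match:
            return match.group(1)
    return None


headers = [
    "##fileformat=VCFv4.2",
    '##VEP="v104" time="2021-06-01 10:00:00"',
]
version = find_vep_version(headers)
```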

classmethod get_vep_csq_fields(vcf_raw_headers)[source]

Extract the names of the CSQ fields that VEP wrote to the given VCF.

Parameters

vcf_raw_headers (List[str]) –

Return type

List[str]
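VEP normally declares the CSQ layout in the Description of the CSQ INFO header, after the text "Format:", with field names separated by `|`. A minimal sketch of reading that declaration follows; `find_csq_fields` is a hypothetical helper, not the method documented here.

```python
import re
from typing import List


def find_csq_fields(raw_headers: List[str]) -> List[str]:
    """Return the CSQ field names declared in the header, or [] if none."""
    for line in raw_headers:
        if line.startswith("##INFO=<ID=CSQ,"):
            # The field list runs from "Format: " to the closing quote.
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).split("|")
    return []


headers = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
    'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">',
]
fields = find_csq_fields(headers)
```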

classmethod read_vcf(path)[source]

Read VCF records from path.

This function walks through each variant record in the given VCF using cyvcf2.VCF, and yields the record as a Variant object.

See also read_and_parse_vcf() to read and parse the VCF.

Parameters

path (pathlib.Path) – Path to the VCF.

Returns

A generator that yields one Variant per VCF record.

Return type

Generator[charger.variant.V, None, None]
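To make the record-by-record behaviour concrete, here is a pure-Python sketch of a generator over VCF text lines. It does not use cyvcf2 and is not how the library reads files; `iter_records` and the dict layout are hypothetical. It applies the conventions stated in the attribute descriptions (ID "." becomes None, FILTER PASS becomes None) and assumes end_pos is start_pos plus the reference length minus one, which is consistent with the class example.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Lazily yield one dict per VCF data line (illustrative only)."""
    for line in lines:
        # Skip meta-information, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alt, _qual, flt = columns[:7]
        start = int(pos)
        yield {
            "chrom": chrom,
            "start_pos": start,
            # Assumed: 1-based closed interval spanning the reference allele.
            "end_pos": start + len(ref) - 1,
            "ref_allele": ref,
            "alt_allele": alt,
            "id": None if vid == "." else vid,
            # FILTER values are semicolon-separated in the VCF format.
            "filter": None if flt == "PASS" else flt.split(";"),
        }


vcf_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "13\t32340300\trs80359550\tGT\tG\t.\tPASS\t.\n",
    "14\t45658326\t.\tC\tT\t.\tq10;s50\t.\n",
]
records = list(iter_records(vcf_lines))
```

Because the function is a generator, nothing is parsed until a record is requested, so a large file can be processed one record at a time.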

classmethod read_and_parse_vcf(path)[source]

Read and parse VCF records, together with their VEP-annotated CSQ, from path.

This function walks through each variant record in the given VCF using cyvcf2.VCF, and yields the record as a Variant object. The parsed CSQ will be stored in the generated Variant.parsed_csq.

Parameters

path (pathlib.Path) – Path to the VCF.

Returns

A generator that yields one Variant per VCF record, with its CSQ parsed.

Return type

Generator[charger.variant.V, None, None]
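The CSQ parsing step can be pictured with a short sketch. In VEP output the CSQ INFO value holds one entry per annotation, separated by commas, and each entry holds `|`-separated values in the order declared in the header. The helper `parse_csq` below is hypothetical and returns plain dicts rather than CSQ objects.

```python
from typing import Dict, List


def parse_csq(csq_value: str, fields: List[str]) -> List[Dict[str, str]]:
    """Split a raw CSQ INFO value into one dict per annotation."""
    return [
        # Pair each declared field name with the value at the same position.
        dict(zip(fields, entry.split("|")))
        for entry in csq_value.split(",")
    ]


fields = ["Allele", "Consequence", "SYMBOL"]
raw = "T|stop_gained|FANCM,T|intron_variant|FANCM"
parsed = parse_csq(raw, fields)
```

Each resulting dict can then be indexed by field name, much like `variant.parsed_csq[0]['Allele']` in the examples for this method.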

Examples

Read an annotated VCF:

>>> vcf_reader = Variant.read_and_parse_vcf('my.vcf')
>>> variant = next(vcf_reader)
>>> variant
Variant(14:45658326C>T info: CSQ[5 parsed])
>>> variant.parsed_csq[0]
CSQ(SYMBOL='FANCM', HGVSc='ENST00000267430.5:c.5101N>T', Consequence='stop_gained', …)

Iterate over all the variant records in the VCF:

>>> for variant in vcf_reader:
...     print(variant.chrom, variant.parsed_csq[0]['Allele'])

GeneInheritanceMode

class GeneInheritanceMode(value)[source]

Bases: enum.Flag

All possible inheritance modes of a gene. Because this is a flag enum, several modes can be combined in one value.

Used by CharGerConfig.inheritance_gene_table.

AUTO_DOMINANT = 1

The gene is autosomal dominant.

AUTO_RECESSIVE = 2

The gene is autosomal recessive.

X_LINKED_DOMINANT = 4

The gene is X-linked dominant.

X_LINKED_RECESSIVE = 8

The gene is X-linked recessive.

Y_LINKED = 16

The gene is Y-linked.

classmethod parse(value)[source]

Parse the inheritance modes from the given string. Multiple modes are comma separated.

>>> m = GeneInheritanceMode.parse("autosomal dominant, autosomal recessive")
>>> m
<GeneInheritanceMode.AUTO_RECESSIVE|AUTO_DOMINANT: 3>
>>> bool(m & GeneInheritanceMode.AUTO_RECESSIVE)
True
>>> bool(m & GeneInheritanceMode.Y_LINKED)
False
>>> GeneInheritanceMode.parse("unknown") is None
True

Parameters

value (str) –

Return type

Optional[charger.variant.GeneInheritanceMode]
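The comma-separated parsing can be sketched with a standalone enum.Flag. The class `InheritanceMode`, the function `parse_modes`, and the label table are hypothetical; only the two autosomal labels appear in the example above, and the remaining label spellings are assumptions.

```python
import enum
from typing import Optional


class InheritanceMode(enum.Flag):
    AUTO_DOMINANT = 1
    AUTO_RECESSIVE = 2
    X_LINKED_DOMINANT = 4
    X_LINKED_RECESSIVE = 8
    Y_LINKED = 16


LABELS = {
    "autosomal dominant": InheritanceMode.AUTO_DOMINANT,
    "autosomal recessive": InheritanceMode.AUTO_RECESSIVE,
    # Assumed spellings for the remaining modes.
    "x-linked dominant": InheritanceMode.X_LINKED_DOMINANT,
    "x-linked recessive": InheritanceMode.X_LINKED_RECESSIVE,
    "y-linked": InheritanceMode.Y_LINKED,
}


def parse_modes(value: str) -> Optional[InheritanceMode]:
    """Combine every recognised label with bitwise OR; None if none match."""
    modes: Optional[InheritanceMode] = None
    for label in value.split(","):
        mode = LABELS.get(label.strip().lower())
        if mode is None:
            continue  # Unrecognised labels are skipped in this sketch.
        modes = mode if modes is None else modes | mode
    return modes


combined = parse_modes("autosomal dominant, autosomal recessive")
```

Membership is then tested with `&`, as in the doctest above.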

ClinicalSignificance

class ClinicalSignificance(value)[source]

Bases: enum.Enum

All possible clinical significance types of a variant.

PATHOGENIC = 'Pathogenic'
LIKELY_PATHOGENIC = 'Likely Pathogenic'
LIKELY_BENIGN = 'Likely Benign'
BENIGN = 'Benign'
UNCERTAIN = 'Uncertain Significance'

classmethod parse_clinvar_record(record)[source]

Determine the pathogenicity of a ClinVar record.

Parameters

record (Dict[str, str]) –

Return type

charger.variant.ClinicalSignificance

Helpers

limit_seq_display(seq, limit=5)[source]

Limit the display of the sequence.

Examples:

>>> limit_seq_display('ATATCCG')
'ATATC…'
>>> limit_seq_display('ATA')
'ATA'
>>> limit_seq_display('ATA', limit=1)
'A…'

Parameters
  • seq (str) –

  • limit (int) –

Return type

str
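The behaviour shown in the examples can be reproduced with a few lines. This is a sketch that matches the documented outputs, not necessarily the library's implementation; the name `limit_seq_display_sketch` is hypothetical.

```python
def limit_seq_display_sketch(seq: str, limit: int = 5) -> str:
    """Truncate seq to at most `limit` characters, marking the cut with an ellipsis."""
    if len(seq) <= limit:
        # Short sequences are returned unchanged.
        return seq
    # Keep the first `limit` bases and append a single ellipsis character.
    return seq[:limit] + "…"


shortened = limit_seq_display_sketch("ATATCCG")
```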