vardb.queries package
The queries package contains standalone command-line routines for making production queries to the database. So far, these include:
- Germline CNV query: identifies CNVs that overlap with specified genes
- Experimental records: queries the database to obtain the effects and annotations for each variant in a test library, as well as all other libraries/patients in vcall in which those variants have been seen
Submodules
vardb.queries.CNV_overlap module
Performs a query on the controlfreec table, which contains germline CNVs. The query identifies CNVs that overlap with the genes specified in GENE_FILE.
INPUTS
usage: python CNV_overlap.pyc [-h] -g GENE_FILE -p PROJECTS_FILE -ov
VARIANTS_OUTPUT_FILE -ol LIBRARIES_OUTPUT_FILE
[-db DATABASE] [-th THRESHOLD]
optional arguments:
-h, --help show this help message and exit
-db DATABASE, --database DATABASE
the database to query
-th THRESHOLD, --thresh THRESHOLD
threshold: only variants with % occurrence in database
< threshold will be reported; if option is not
selected, all variants are reported (i.e. thresh = 0)
required named arguments:
-g GENE_FILE, --genes GENE_FILE
file containing the gene coordinates
-p PROJECTS_FILE, --projects_file PROJECTS_FILE
file containing the projects to query
-ov VARIANTS_OUTPUT_FILE, --variants_output VARIANTS_OUTPUT_FILE
file name to store variants returned in the query
-ol LIBRARIES_OUTPUT_FILE, --libraries_output LIBRARIES_OUTPUT_FILE
file containing the libraries detected in the query
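The usage above can be turned into an argument list for `subprocess` when the query needs to be launched from another Python script. This is only a minimal sketch: the helper name `build_cnv_overlap_command` and all file names are hypothetical, and only the flags shown in the usage message are used.

```python
import shlex


def build_cnv_overlap_command(genes, projects, variants_out, libraries_out,
                              database=None, threshold=None,
                              script="CNV_overlap.pyc"):
    """Assemble the argv list for the CNV overlap query (hypothetical helper)."""
    cmd = ["python", script,
           "-g", genes,            # file containing the gene coordinates
           "-p", projects,         # file containing the projects to query
           "-ov", variants_out,    # where the returned variants are stored
           "-ol", libraries_out]   # where the detected libraries are stored
    if database is not None:
        cmd += ["-db", database]
    if threshold is not None:
        cmd += ["-th", str(threshold)]
    return cmd


# Placeholder file names, chosen for illustration only.
cmd = build_cnv_overlap_command("genes.txt", "projects.txt",
                                "variants.tsv", "libraries.txt", threshold=5)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in an environment where the script and database are available.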
OUTPUTS
- VARIANTS_OUTPUT_FILE
- tsv file with the following columns:
- gene name, chromosome, cnv start position, cnv end position, copy number, fraction of libraries where the exact cnv was found, library name, path to data, project, overlap type [partial_overlap|full_overlap], analysis date
- variants are in descending order of analysis date (most recent on top)
- LIBRARIES_OUTPUT_FILE
- contains a list of all of the libraries that were queried
- one library per line, in alphabetical/numerical order
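For downstream processing, the variants file can be read with the standard `csv` module. The sketch below assumes the file has no header row and that the columns appear in the order listed above; the column keys in `CNV_COLUMNS` are labels invented here, and the sample row is made up purely to show the shape of the data.

```python
import csv
import io

# Hypothetical keys for the columns described above, in the documented order.
CNV_COLUMNS = [
    "gene", "chromosome", "cnv_start", "cnv_end", "copy_number",
    "library_fraction", "library", "data_path", "project",
    "overlap_type", "analysis_date",
]


def read_cnv_variants(handle):
    """Read a VARIANTS_OUTPUT_FILE-style tsv into a list of dicts."""
    reader = csv.DictReader(handle, fieldnames=CNV_COLUMNS, delimiter="\t")
    rows = []
    for row in reader:
        # Positions are converted to integers for later comparisons.
        row["cnv_start"] = int(row["cnv_start"])
        row["cnv_end"] = int(row["cnv_end"])
        rows.append(row)
    return rows


# Made-up example row, not real query output.
sample = io.StringIO(
    "GENE1\tchr1\t100\t500\t3\t0.25\tLIB001\t/path/to/data\tPROJ1\t"
    "full_overlap\t2015-01-01\n"
)
rows = read_cnv_variants(sample)
full = [r for r in rows if r["overlap_type"] == "full_overlap"]
```

If the real file does start with a header line, dropping the `fieldnames` argument lets `DictReader` take the keys from that line instead.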
vardb.queries.CNV_overlap.main()
Gets command line arguments and runs query.
vardb.queries.CNV_overlap.query(*args, **kwargs)
Performs a query on the controlfreec table which contains germline cnvs. The query identifies CNVs that overlap with genes specified in the gene_file.
Parameters: - gene_file – list of genes of interest
- projects_list – list of projects to include
- variants_output_path – output path of overlapping variants
- libraries_output_path – output path of all libraries with an overlap
- database – database to query
- threshold – a threshold on the MAXIMUM number of variants found
Modifies: creates/overwrites output files
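The overlap test behind the two overlap types can be illustrated with plain interval arithmetic. This is a sketch under stated assumptions, not the query's actual logic: it assumes inclusive coordinates on the same chromosome, and it assumes that full_overlap means the CNV spans the whole gene while partial_overlap means the CNV covers only part of it. The function name `classify_overlap` is hypothetical.

```python
def classify_overlap(gene_start, gene_end, cnv_start, cnv_end):
    """Classify how a CNV interval overlaps a gene interval.

    Assumes inclusive coordinates on the same chromosome. Returns None
    when the two intervals do not overlap at all.
    """
    if cnv_end < gene_start or cnv_start > gene_end:
        return None  # CNV lies entirely before or after the gene
    if cnv_start <= gene_start and cnv_end >= gene_end:
        return "full_overlap"  # assumed meaning: CNV spans the whole gene
    return "partial_overlap"


full_case = classify_overlap(100, 200, 50, 300)
partial_case = classify_overlap(100, 200, 150, 300)
edge_case = classify_overlap(100, 200, 200, 250)
no_overlap = classify_overlap(100, 200, 300, 400)
```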
vardb.queries.experimental_records module
Queries the database to obtain
- various effects and annotations for each variant in a test library
- all libraries/patients in vcall where the above variants have been seen
INPUTS
usage: python experimental_records.py [-h] -l LIBRARY
(-p PROJECTS [PROJECTS ...] | -pf PROJECTS_FILE | -a)
[-db DATABASE] [-e] -d OUTPUT_DIRECTORY
[--log_level {debug,info,warning,error}]
optional arguments:
-h, --help show this help message and exit
-l LIBRARY, --library LIBRARY
library name
-p PROJECTS [PROJECTS ...], --projects PROJECTS [PROJECTS ...]
projects to include in the search
-pf PROJECTS_FILE, --projects_file PROJECTS_FILE
file containing the projects to query
-a, --all search all projects
-db DATABASE, --database DATABASE
the database in which to load
-e, --exclude exclude, rather than include projects
-d OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
file name for storing the output
--log_level {debug,info,warning,error}, -log {debug,info,warning,error}
the level of logging
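The usage line shows that exactly one of -p, -pf, or -a must be given. An interface with that shape can be sketched with `argparse` and a required mutually exclusive group. This is a reconstruction from the usage message only, not the module's actual parser, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Sketch of a parser matching the usage message above (reconstruction)."""
    parser = argparse.ArgumentParser(prog="experimental_records.py")
    parser.add_argument("-l", "--library", required=True, help="library name")

    # Exactly one of the three project selectors must be supplied.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-p", "--projects", nargs="+",
                       help="projects to include in the search")
    group.add_argument("-pf", "--projects_file",
                       help="file containing the projects to query")
    group.add_argument("-a", "--all", action="store_true",
                       help="search all projects")

    parser.add_argument("-db", "--database")
    parser.add_argument("-e", "--exclude", action="store_true",
                        help="exclude, rather than include projects")
    parser.add_argument("-d", "--output_directory", required=True)
    parser.add_argument("--log_level", "-log",
                        choices=["debug", "info", "warning", "error"])
    return parser


# Placeholder library and project names.
args = build_parser().parse_args(
    ["-l", "LIB001", "-p", "PROJ1", "PROJ2", "-d", "out"]
)
```

With this layout, combining two selectors (for example -p together with -a) makes `argparse` exit with a usage error, which matches the `( ... | ... | ... )` notation in the usage line.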
OUTPUTS
Two tsv files containing the variant information, one for coding and one for non-coding variants:
- OUTPUT_DIRECTORY/output.[LIBRARY].coding.exprecords.tsv
- OUTPUT_DIRECTORY/output.[LIBRARY].non_coding.exprecords.tsv
- Both files have one row per variant in the test library, and the following columns:
- variant_id
- chromosome
- position
- ref
- alt
- var_obs (the number of reads for the variant)
- total_obs (total number of reads at this position)
- heterozygosity (currently ‘Unknown’)
- aligner
- variant_caller
- consequence_type (from SnpEff)
- impact (from SnpEff)
- variation_name (dbSNP and/or COSMIC)
- darned_annotation (ch:pos:strand:ref>alt)
- diseased_patient_count (number of unique diseased patients with the variant)
- normal_patient_count (number of unique normal patients with the variant)
- aa_change
- dna_change
- other_snpeff (other effects)
- gene_id
- ensembl_id (transcript_id)
- diseases (a list of the pathology alias for matching diseased libraries)
- variant_type (SNV, INS, DEL, INDEL)
The null string is ‘.’
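When loading these files, the null string needs to be mapped to a real missing value. A minimal sketch follows, assuming the file begins with a header row carrying the column names listed above (the source does not say whether it does). The function name `read_exprecords` is hypothetical and the sample content is invented for illustration.

```python
import csv
import io

NULL_STRING = "."  # null marker documented above


def read_exprecords(handle):
    """Read an exprecords tsv, turning the '.' null string into None."""
    reader = csv.DictReader(handle, delimiter="\t")  # assumes a header row
    return [
        {key: (None if value == NULL_STRING else value)
         for key, value in row.items()}
        for row in reader
    ]


# Made-up two-line example with only a few of the documented columns.
sample = io.StringIO(
    "variant_id\tchromosome\tposition\tref\talt\tvariation_name\n"
    "1\tchr1\t12345\tA\tG\t.\n"
)
records = read_exprecords(sample)
```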
vardb.queries.experimental_records.main()
Gets command line arguments and runs query.
vardb.queries.experimental_records.query(library, output_directory, database, projects=None, projects_file=None, all=True, exclude=False)
Queries for unpaired (vcall) libraries which have variants that match those in a comparison library.
Parameters: - library – library name of comparison library
- output_directory – location where output files will be created
- database – database to query
- projects | projects_file | all [True] – a list of projects to either include (default) or exclude (set exclude flag to true) | a file with a list of projects as described above | include all projects in database
- exclude – [False] exclude listed projects in query, instead of including them
Returns: 0 if execution is successful, -1 otherwise
Writes: two files - [library].coding.exprecords.tsv for non-synonymous coding variants, [library].non_coding.exprecords.tsv
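The interaction between the project list and the exclude flag can be illustrated with a small helper. This sketch only mirrors the parameter descriptions above and is not taken from the module: `select_projects` and its arguments are hypothetical, and it assumes that every project is searched when no list is supplied.

```python
def select_projects(available, projects=None, exclude=False):
    """Return the projects to search, given an optional include/exclude list.

    available -- all project names known to the database (assumed input)
    projects  -- names listed by the caller, or None to search everything
    exclude   -- when True, the listed projects are removed instead of kept
    """
    if not projects:
        return sorted(available)  # assumed behaviour of the "all" option
    listed = set(projects)
    if exclude:
        return sorted(p for p in available if p not in listed)
    return sorted(p for p in available if p in listed)


# Placeholder project names.
available = ["PROJ1", "PROJ2", "PROJ3"]
included = select_projects(available, projects=["PROJ1", "PROJ3"])
excluded = select_projects(available, projects=["PROJ1"], exclude=True)
everything = select_projects(available)
```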
vardb.queries.experimental_records_external module
Queries the database to obtain
- Various effects and annotations for each variant in a test library found in PROJECTS_FILE. The test library is loaded to a temporary table just for this query and leaves no permanent record on the database.
- All libraries/patients in vcall where the above variants have been seen
INPUTS
usage: python experimental_records_external.py [-h] [-l LIBRARY_FILE]
(-p PROJECTS [PROJECTS ...] | -pf PROJECTS_FILE | -a)
[-db DATABASE] [-e] -d
OUTPUT_DIRECTORY
[--log_level {debug,info,warning,error}]
optional arguments:
-h, --help show this help message and exit
-l LIBRARY_FILE, --library_file LIBRARY_FILE
file name of library
-p PROJECTS [PROJECTS ...], --projects PROJECTS [PROJECTS ...]
projects to include in the search
-pf PROJECTS_FILE, --projects_file PROJECTS_FILE
file containing the projects to query
-a, --all search all projects
-db DATABASE, --database DATABASE
the database in which to load
-e, --exclude exclude, rather than include projects
-d OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
file name for storing the output
--log_level {debug,info,warning,error}, -log {debug,info,warning,error}
the level of logging
OUTPUTS
Two tsv files containing the variant information for synonymous and non-synonymous variants:
- OUTPUT_DIRECTORY/output.[library_name].synonymous.exprecords.tsv
- OUTPUT_DIRECTORY/output.[library_name].non_synonymous.exprecords.tsv
- Both files have one row per variant in the test library, and the following columns:
- variant_id
- chromosome
- position
- ref
- alt
- var_obs (the number of reads for the variant)
- total_obs (total number of reads at this position)
- heterozygosity (currently ‘Unknown’)
- aligner
- variant_caller
- consequence_type (from SnpEff)
- variation_name (dbSNP and/or COSMIC)
- darned_annotation (ch:pos:strand:ref>alt)
- diseased_library_count (number of unique diseased libraries with the variant)
- normal_library_count (number of unique normal libraries with the variant)
- aa_change
- other_snpeff (other effects)
- flanking (flanking amino acids - currently ‘Unknown’)
- gene_id
- ensembl_id (gene_id)
- diseased_patient_count (number of diseased patients with the variant)
- normal_patient_count (number of normal patients with the variant)
- diseased_libraries (a list of all the diseased libraries where the variants were found)
- normal_libraries (a list of all the normal libraries where the variants were found)
The null string is ‘.’
vardb.queries.experimental_records_external.main()
vardb.queries.experimental_records_external.query(library_file, output_directory, database, projects=None, projects_file=None, all=True, exclude=False)
Queries for unpaired (vcall) libraries which have variants that match those in a comparison library.
Parameters: - library_file – full path to comparison library vcf file
- output_directory – location where output files will be created
- database – database to query
- projects | projects_file | all [True] – a list of projects to either include (default) or exclude (set exclude flag to true) | a file with a list of projects as described above | include all projects in database
- exclude – [False] exclude listed projects in query, instead of including them
Returns: 0 if execution is successful, -1 otherwise
Writes: two files - [library].coding.exprecords.tsv for non-synonymous coding variants, [library].non_coding.exprecords.tsv