vardb.metadata_wrangling package

This package retrieves and parses metadata from internal projects to produce vardb .loader files, which are then used to load the data to vardb.

Each MetadataCollector is responsible for assembling the metadata for a single pipeline. It makes calls to other databases through their connections class, and/or scrapes the filesystem for the data and metadata for the pipeline.
The main function in this package is make_loader, which collects the metadata, and optionally compares to a previous version to find only new and changed data for loading.

Subpackages

vardb.metadata_wrangling.oasis package

Submodules

vardb.metadata_wrangling.configuration module

class vardb.metadata_wrangling.configuration.Config(config=None)

Bases: object

evaluate(key, *args, **kwargs)

Evaluates the function parameter stored under key with the given arguments, and returns the result.

Parameters:
	key – function key
	args – function positional arguments
	kwargs – function keyword arguments
Returns:	the return value of the function

get(key)

Gets the parameter associated with key in the configuration.

Parameters:	key – the configuration key to look up
Returns:	value of Config key

keys()

Returns the keys defined in the configuration.

Returns:	keys

set(key, val)

Sets a key in the configuration dictionary.

Parameters:
	key – the key to set
	val – the value to store under key

update(config)

Updates the Config object with new data.

Parameters:	config – a dictionary of key value pairs
Raises:	ValueError if a function parameter is not a function recognized in locals

validate(required_keys)

Makes sure that all of the required keys are defined.

Parameters:	required_keys – the keys that must be defined in the configuration
Raises:	ValueError if some keys are not defined
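
The methods above can be illustrated with a minimal, dict-backed sketch. SketchConfig is a hypothetical stand-in written for this example and is not the vardb implementation; in particular, it assumes that a "function parameter" is simply a callable stored under a key, and the keys and values used are illustrative only.

```python
class SketchConfig(object):
    """Hypothetical stand-in for Config; not the vardb implementation."""

    def __init__(self, config=None):
        self._values = {}
        if config is not None:
            self.update(config)

    def update(self, config):
        # Merge a dictionary of key/value pairs into the configuration.
        self._values.update(config)

    def get(self, key):
        return self._values[key]

    def set(self, key, val):
        self._values[key] = val

    def keys(self):
        return list(self._values.keys())

    def evaluate(self, key, *args, **kwargs):
        # Assumption: a "function parameter" is a callable stored under key.
        func = self._values[key]
        if not callable(func):
            raise ValueError("parameter %r is not a function" % key)
        return func(*args, **kwargs)

    def validate(self, required_keys):
        # Collect every missing key so the error names all of them at once.
        missing = [k for k in required_keys if k not in self._values]
        if missing:
            raise ValueError("missing required keys: %s" % ", ".join(missing))


# Illustrative keys and values only.
config = SketchConfig({"project": "POG", "library_name": str.upper})
config.set("data_type", "vcall")
config.validate(["project", "data_type"])
upper_name = config.evaluate("library_name", "a12345")
```

Calling validate with a key that was never set raises ValueError, which mirrors the behaviour documented for Config.validate.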

vardb.metadata_wrangling.get_bam_cnvs module

Locates all bam_CNVs-bam file pairs for the controlfreec pipeline. This is necessary because controlfreec is not currently tracked in a database. The bam_CNVs and bam files are used to look up metadata in BioApps and LIMS.

exception vardb.metadata_wrangling.get_bam_cnvs.GetBamCNVsException

Bases: exceptions.Exception

vardb.metadata_wrangling.get_bam_cnvs.get_bam_cnvs(bam_cnv_pattern)

Locates the bam_cnvs and bam file pairs under a particular search pattern.

Returns:	A pandas dataframe containing the BioApps lookup path (originating merged bam file path), library name, output data path, pipeline, and pipeline version for a given pair of bam_cnvs and bam files.
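
A rough sketch of the pairing step, under stated assumptions: each "x.bam_CNVs" file is assumed to sit beside its source "x.bam", and only a subset of the documented columns is produced. The function name sketch_get_bam_cnvs and the column names are hypothetical; the real function also derives the library name and pipeline version, which are omitted here.

```python
import glob
import os
import tempfile

import pandas as pd


def sketch_get_bam_cnvs(bam_cnv_pattern):
    """Pair each *_CNVs file matching the pattern with its bam file."""
    columns = ["bam_path", "output_data_path", "pipeline"]
    records = []
    for cnv_path in sorted(glob.glob(bam_cnv_pattern)):
        if not cnv_path.endswith("_CNVs"):
            continue
        # Assumption: "x.bam_CNVs" sits in the same directory as "x.bam".
        bam_path = cnv_path[: -len("_CNVs")]
        if os.path.exists(bam_path):
            records.append(
                {
                    "bam_path": bam_path,
                    "output_data_path": cnv_path,
                    "pipeline": "controlfreec",
                }
            )
    return pd.DataFrame(records, columns=columns)


with tempfile.TemporaryDirectory() as workdir:
    # b.bam_CNVs has no matching bam file, so it is skipped.
    for name in ("a.bam", "a.bam_CNVs", "b.bam_CNVs"):
        open(os.path.join(workdir, name), "w").close()
    pairs = sketch_get_bam_cnvs(os.path.join(workdir, "*_CNVs"))
```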

vardb.metadata_wrangling.helpers module

vardb.metadata_wrangling.helpers.get_patient_identifier(df)

vardb.metadata_wrangling.helpers.get_pog_controlfreec_library_name(df)

vardb.metadata_wrangling.helpers.get_pog_gene_model(df)

vardb.metadata_wrangling.helpers.get_pog_id(df)

vardb.metadata_wrangling.loader_maker module

Creates loader files to be used by vardb.variant_file_loaders.load_files to load data and metadata to vardb.

vardb.metadata_wrangling.loader_maker.make_loader(output_directory, project, query, previous_metadata_file=None, debug=False)

Creates a loader file, which includes all records matching a project and analysis query that need to be loaded to vardb. All metadata associated with the project and query is obtained, and then compared to the same results from a previous day (from previous_metadata_file). The rows that need to be loaded include all new/changed/deleted rows in the new metadata as compared to the previous metadata. If previous_metadata_file is not specified, all rows in the current metadata are added to the loader file.

Parameters:
	output_directory – destination for new metadata and loader files
	project – project
	query – analysis to query for (e.g. vcall)
	previous_metadata_file – path to metadata file created on a previous day
	debug – True if you want to suppress errors for debugging purposes
Returns:	path to loader file (None if no modified records were found)
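
The control flow described above (load everything when there is no previous file, otherwise load only rows that differ, and return None when nothing differs) can be sketched with the standard library alone. This is an illustration under assumptions, not the vardb code: sketch_make_loader, the "example.loader" file name, and the tab-delimited loader layout are all hypothetical, the current rows are passed in directly instead of being collected from a project and query, and deleted rows are not handled.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Read a tab-delimited metadata file into a list of row dictionaries."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def row_key(row):
    # Order-independent, hashable representation of a row.
    return tuple(sorted(row.items()))


def sketch_make_loader(output_directory, current_rows, previous_metadata_file=None):
    """Write rows absent from the previous metadata; return the path or None."""
    if previous_metadata_file is None:
        rows_to_load = list(current_rows)
    else:
        previous = {row_key(r) for r in read_rows(previous_metadata_file)}
        rows_to_load = [r for r in current_rows if row_key(r) not in previous]
    if not rows_to_load:
        return None
    # Hypothetical file name and layout for the loader file.
    loader_path = os.path.join(output_directory, "example.loader")
    with open(loader_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=sorted(rows_to_load[0]), delimiter="\t"
        )
        writer.writeheader()
        writer.writerows(rows_to_load)
    return loader_path


with tempfile.TemporaryDirectory() as workdir:
    rows = [{"library": "A001", "path": "/data/a.vcf"}]
    first = sketch_make_loader(workdir, rows)
    # Reuse the first output as the "previous" metadata for illustration.
    second = sketch_make_loader(workdir, rows, previous_metadata_file=first)
```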

vardb.metadata_wrangling.locate_metadata_changes module

Locates changes between two (variant data) metadata DataFrames, including row changes, deletions, and additions.

vardb.metadata_wrangling.locate_metadata_changes.locate_metadata_changes(old_metadata, new_metadata)

Finds new/changed data to be loaded to the database.

Parameters:
	old_metadata – all metadata found for a project and pipeline at the time of the last load to vardb
	new_metadata – all new metadata for the same project and pipeline
Returns:	A dataframe with just the new and changed metadata, or None if no changes occurred. This is used to make a loader file.
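
One common way to find new and changed rows between two DataFrames is a left merge with pandas' indicator column. The sketch below uses that technique as an illustration; it is not necessarily how locate_metadata_changes is implemented. The function name sketch_locate_changes and the column names are hypothetical, both frames are assumed to share the same columns, and deleted rows are not reported.

```python
import pandas as pd


def sketch_locate_changes(old_metadata, new_metadata):
    """Return rows of new_metadata that are absent from old_metadata, or None."""
    # With no "on" argument, merge joins on all shared columns, so a row
    # matches only when every value is identical.
    merged = new_metadata.merge(
        old_metadata.drop_duplicates(), how="left", indicator=True
    )
    changed = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
    return None if changed.empty else changed.reset_index(drop=True)


old = pd.DataFrame(
    {"library": ["A001", "A002"], "path": ["/data/a.vcf", "/data/b.vcf"]}
)
new = pd.DataFrame(
    {
        "library": ["A001", "A002", "A003"],
        "path": ["/data/a.vcf", "/data/b_v2.vcf", "/data/c.vcf"],
    }
)
changes = sketch_locate_changes(old, new)      # rows for A002 and A003
unchanged = sketch_locate_changes(old, old)    # None
```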

vardb.metadata_wrangling.metadata_collector module

MetadataCollector is a base class with common functionality for assembling, cleaning and extracting information from various database sources. A MetadataCollector subclass must be defined for each new data type. Any information that cannot be obtained from databases directly can be specified by the Config object.

class vardb.metadata_wrangling.metadata_collector.ControlFreeCCollector(config, debug=False)

Bases: vardb.metadata_wrangling.metadata_collector.MetadataCollector

Collects metadata associated with the controlfreec pipeline.

data_type = 'controlfreec'

class vardb.metadata_wrangling.metadata_collector.ExpressionCollector(config, debug=False)

Bases: vardb.metadata_wrangling.metadata_collector.MetadataCollector

Collects metadata associated with the gene coverage pipeline

collect_metadata()

The main function of the metadata collector. This collects metadata for all analyses matching the specified project and data type as defined in the config. Data is obtained by querying BioApps, LIMS and optionally the file system. Validates metadata and returns a Metadata object.

Returns:	a Metadata object with the collected metadata
Modifies:	self.metadata

data_type = 'expression'

class vardb.metadata_wrangling.metadata_collector.Metadata(df=None, path=None, debug=False)

Bases: object

Metadata is a class for storing metadata information for loading to vardb. It takes either a dataframe or a path. It loads the data, validates it, and adds default values.

difference(old_metadata)

Finds the difference between this metadata and another Metadata object.

Parameters:	old_metadata – a Metadata object to compare to
Returns:

k = 'production'

output_to_tsv(output_path)

Writes the given DataFrame to a tab-delimited file in the specified load file directory.

Parameters:	output_path – full path to destination file

class vardb.metadata_wrangling.metadata_collector.MetadataCollector(config, debug=False)

Bases: object

Abstract class which defines the common operations needed to collect metadata for loading to vardb.

collect_metadata()

The main function of the metadata collector. This collects metadata for all analyses matching the specified project and data type as defined in the config. Data is obtained by querying BioApps, LIMS and optionally the file system. Validates metadata and returns a Metadata object.

Returns:	a Metadata object with the collected metadata
Modifies:	self.metadata

data_type

classmethod factory(config, data_type, debug=False)

Returns the correct MetadataCollector subclass which corresponds to the data_type requested.

Parameters:
	config – a Config object which contains all required parameters to fully specify the MetadataCollector class
	data_type – the data type that is to be collected
	debug – True if you want to suppress errors for debugging purposes
Returns:	MetadataCollector subclass corresponding to the data type
Raises:	MetadataCollectorException if essential information is missing from config
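
The dispatch on data_type can be sketched as a classmethod that scans the direct subclasses for a matching data_type attribute. All class names below are hypothetical stand-ins, not vardb classes. The sketch assumes the factory returns an instance and that an unknown data type raises the exception; the documented method raises when essential information is missing from config, which is not modelled here.

```python
class CollectorError(Exception):
    """Hypothetical stand-in for MetadataCollectorException."""


class BaseCollector(object):
    """Hypothetical stand-in for MetadataCollector."""

    data_type = None

    def __init__(self, config, debug=False):
        self.config = config
        self.debug = debug

    @classmethod
    def factory(cls, config, data_type, debug=False):
        # Pick the direct subclass whose data_type matches the request.
        for subclass in cls.__subclasses__():
            if subclass.data_type == data_type:
                return subclass(config, debug=debug)
        # Assumption: an unknown data type is reported as an error.
        raise CollectorError("no collector for data type %r" % data_type)


class VCallSketch(BaseCollector):
    data_type = "vcall"


class ExpressionSketch(BaseCollector):
    data_type = "expression"


collector = BaseCollector.factory({"project": "POG"}, "vcall")
```

Keeping the data_type string on each subclass means that adding a collector for a new data type needs no change to the factory itself.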

metadata = None

exception vardb.metadata_wrangling.metadata_collector.MetadataCollectorException

Bases: exceptions.Exception

class vardb.metadata_wrangling.metadata_collector.ReviewedSomaticCNVCollector(config, debug=False)

Bases: vardb.metadata_wrangling.metadata_collector.MetadataCollector, vardb.metadata_wrangling.metadata_collector.TCFilter

Collects metadata associated with the reviewed somatic CNV pipeline

collect_metadata()

The main function of the metadata collector. This collects metadata for all analyses matching the specified project and data type as defined in the config. Data is obtained by querying BioApps, LIMS and optionally the file system. Validates metadata and returns a Metadata object.

Returns:	a Metadata object with the collected metadata
Modifies:	self.metadata

data_type = 'somatic_cnv'

class vardb.metadata_wrangling.metadata_collector.ReviewedSomaticLOHCollector(config, debug=False)

Bases: vardb.metadata_wrangling.metadata_collector.MetadataCollector, vardb.metadata_wrangling.metadata_collector.TCFilter

Collects metadata associated with the somatic LOH pipeline.

collect_metadata()

The main function of the metadata collector. This collects metadata for all analyses matching the specified project and data type as defined in the config. Data is obtained by querying BioApps, LIMS and optionally the file system. Validates metadata and returns a Metadata object.

Returns:	a Metadata object with the collected metadata
Modifies:	self.metadata

data_type = 'somatic_loh'

class vardb.metadata_wrangling.metadata_collector.SomaticSmallVariantCollector(config, debug=False)

Bases: vardb.metadata_wrangling.metadata_collector.MetadataCollector

Collects metadata associated with the strelka and mutationseq pipelines.

data_type = 'small_somatic'

class vardb.metadata_wrangling.metadata_collector.TCFilter

Bases: object

Collection of routines for filtering somatic cnv pipelines by the reviewed tumour content

filter_metadata(bioapps_df, tumour_df)

get_tumour_content(output_data_path)

Retrieves tumour content from path for somatic_cnv pipeline

Returns:	tumour content

class vardb.metadata_wrangling.metadata_collector.VCallCollector(config, debug=False)

Bases: vardb.metadata_wrangling.metadata_collector.MetadataCollector

Collects metadata associated with the vcall pipeline

data_type = 'vcall'

vardb.metadata_wrangling.metadata_collector.throw_exception(msg, debug)

Raises MetadataCollectorException if debug is False; otherwise only logs the error message.

Parameters:
	msg (str) – error message
	debug (bool) – True to log the error without raising the exception; False to raise it
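
A minimal sketch of a debug-gated raise, using the standard logging module. The names sketch_throw_exception and CollectorError, and the logger name, are hypothetical; the sketch assumes the message is logged in both modes, which the description above does not confirm.

```python
import logging

logger = logging.getLogger("metadata_sketch")


class CollectorError(Exception):
    """Hypothetical stand-in for MetadataCollectorException."""


def sketch_throw_exception(msg, debug):
    # Assumption: the message is always logged, whatever the mode.
    logger.error(msg)
    if not debug:
        raise CollectorError(msg)


# In debug mode the error is logged and execution continues.
result = sketch_throw_exception("missing library name", debug=True)
```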
