vardb.metadata_wrangling.oasis package¶
The oasis package is a collection of scripts used for cleaning data from the Oasis database at BCCA, and creating tables for loading to vardb.
Submodules¶
vardb.metadata_wrangling.oasis.demographics module¶
-
vardb.metadata_wrangling.oasis.demographics.
add_biopsy_number_column
(dataframe)¶ Add the biopsy_number column to the Demographics dataframe sorted by the biopsy_date in ascending order
Parameters: dataframe – Demographics dataframe Returns: Demographics dataframe with the biopsy_number column added to it
-
vardb.metadata_wrangling.oasis.demographics.
extract_biopsy_columns
(dataframe_columns)¶ Extract the columns that store Biopsy information from the Clinical dataframe columns
Parameters: dataframe_columns – Columns of the Clinical dataframe Returns: List of columns that store the Biopsy information
-
vardb.metadata_wrangling.oasis.demographics.
extract_data
(dataframe)¶ Extract demographic data from the clinical dataframe
Parameters: dataframe – Clinical dataframe Returns: Demographics dataframe
-
vardb.metadata_wrangling.oasis.demographics.
get_demographics_data
(clinical_dataframe)¶ Work with Demographics data
Parameters: clinical_dataframe – The original clinical dataframe Returns: Validated Demographics dataframe
-
vardb.metadata_wrangling.oasis.demographics.
validate_biopsy_date_less_than_or_equal_pog_report_date
(row)¶ Iterate over each row of the dataframe to validate biopsy_date <= pog_report_date
Parameters: row – Each row of the demographics dataframe Returns: The error code string for that row
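A row-level date check of this kind can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name check_biopsy_before_report, the dict-based row, and the error code string are hypothetical. With pandas, a function like this would typically be applied with DataFrame.apply(..., axis=1).

```python
from datetime import date

# Hypothetical error code; the package's actual codes are not documented here.
ERROR_CODE = "biopsy_date_after_pog_report_date"


def check_biopsy_before_report(row):
    """Return an error code string if biopsy_date > pog_report_date, else ''."""
    biopsy_date = row.get("biopsy_date")
    report_date = row.get("pog_report_date")
    # Skip the comparison when either date is missing (an assumption).
    if biopsy_date is None or report_date is None:
        return ""
    return "" if biopsy_date <= report_date else ERROR_CODE


row = {"biopsy_date": date(2020, 3, 1), "pog_report_date": date(2020, 2, 1)}
print(check_biopsy_before_report(row))
```

The blood_collection_date and consent_date checks below follow the same pattern with a different column on the left-hand side.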
-
vardb.metadata_wrangling.oasis.demographics.
validate_blood_collection_date_less_than_or_equal_pog_report_date
(row)¶ Iterate over each row of the dataframe to validate blood_collection_date <= pog_report_date
Parameters: row – Each row of the demographics dataframe Returns: The error code string for that row
-
vardb.metadata_wrangling.oasis.demographics.
validate_consent_date_less_than_or_equal_pog_report_date
(row)¶ Iterate over each row of the dataframe to validate consent_date <= pog_report_date
Parameters: row – Each row of the demographics dataframe Returns: The error code string for that row
-
vardb.metadata_wrangling.oasis.demographics.
validate_data
(dataframe)¶ Validate Demographics data
Parameters: dataframe – Demographics dataframe Returns: Validated Demographics dataframe
-
vardb.metadata_wrangling.oasis.demographics.
validate_mandatory_columns
(row)¶ - Iterate over each row of the dataframe to validate the mandatory columns in the demographic data
- patient_id sex consent_date consent_age
Parameters: row – Each row of the demographics dataframe Returns: The error code string for that row
-
vardb.metadata_wrangling.oasis.demographics.
validate_pog_report_date_mandatory_columns
(row)¶ Iterate over each row of the dataframe to validate the mandatory columns if pog_report_date exists in the demographic data
blood_collection_date biopsy_date bx_loc_radiated prior_primary_tumour biopsy_site post_pog_activities diag_changed re_bx_prog1
Parameters: row – Each row of the demographics dataframe Returns: The error code string for that row
-
vardb.metadata_wrangling.oasis.demographics.
validate_post_pog_treatment_columns_for_null
(row)¶ - Iterate over each row of the dataframe to validate that when post_pog_activities is not “POG informed treatment not given”,
- then all of “post_pog_treatment_*” should be null
- patient_id post_pog_treatment_deceased post_pog_treatment_sick post_pog_treatment_decision_pt post_pog_treatment_decision_phys post_pog_treatment_na post_pog_treatment_cost post_pog_treatment_travel post_pog_treatment_unknown
Parameters: row – Each row of the demographics dataframe Returns: The error code string for that row
-
vardb.metadata_wrangling.oasis.demographics.
validate_post_pog_treatment_columns_for_ys
(row)¶ Iterate over each row of the dataframe to validate that when post_pog_activities is “POG informed treatment not given”, exactly one of the “post_pog_treatment_*” columns has a “Y” and the rest are null
post_pog_treatment_deceased post_pog_treatment_sick post_pog_treatment_decision_pt post_pog_treatment_decision_phys post_pog_treatment_na post_pog_treatment_cost post_pog_treatment_travel post_pog_treatment_unknown
Parameters: row – Each row of the demographics dataframe Returns: The error code string for that row
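The "exactly one Y, the rest null" rule can be sketched in plain Python. The column names and the activity string come from the description above; the function name, the dict-based row, the use of None for null, and the error code are assumptions made for illustration.

```python
TREATMENT_COLUMNS = [
    "post_pog_treatment_deceased",
    "post_pog_treatment_sick",
    "post_pog_treatment_decision_pt",
    "post_pog_treatment_decision_phys",
    "post_pog_treatment_na",
    "post_pog_treatment_cost",
    "post_pog_treatment_travel",
    "post_pog_treatment_unknown",
]
NOT_GIVEN = "POG informed treatment not given"
ERROR_CODE = "post_pog_treatment_reason_invalid"  # hypothetical error code


def check_exactly_one_reason(row):
    """Return '' if the rule holds or does not apply, else an error code."""
    if row.get("post_pog_activities") != NOT_GIVEN:
        return ""  # rule only applies when treatment was not given
    values = [row.get(column) for column in TREATMENT_COLUMNS]
    yes_count = sum(1 for value in values if value == "Y")
    # Every value that is not "Y" must be null (None in this sketch).
    rest_null = all(value is None or value == "Y" for value in values)
    return "" if yes_count == 1 and rest_null else ERROR_CODE
```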
-
vardb.metadata_wrangling.oasis.demographics.
validate_re_bx_date_for_not_null
(row)¶ Iterate over each row of the dataframe to validate when re_bx_prog1, re_bx_prog2, etc is ‘Y’, then re_bx_date1, re_bx_date2, etc should not be null
Parameters: row – Each row of the demographics dataframe Returns: The error code string for that row
-
vardb.metadata_wrangling.oasis.demographics.
validate_re_bx_date_greater_than_or_equal_biopsy_date
(row)¶ Iterate over each row of the dataframe to validate re_bx_date1, re_bx_date2, etc. >= biopsy_date
Parameters: row – Each row of the demographics dataframe Returns: The error code string for that row
vardb.metadata_wrangling.oasis.diagnosis module¶
-
vardb.metadata_wrangling.oasis.diagnosis.
extract_data
(dataframe)¶ Extract diagnosis data from the clinical dataframe
Parameters: dataframe – Clinical dataframe Returns: Diagnosis dataframe
-
vardb.metadata_wrangling.oasis.diagnosis.
get_diagnosis_data
(clinical_dataframe)¶ Work with Diagnosis data
Parameters: clinical_dataframe – The original clinical dataframe Returns: Validated Diagnosis dataframe
-
vardb.metadata_wrangling.oasis.diagnosis.
reshape_diagnosis_data
(dataframe)¶ Reshapes the Diagnosis dataframe by applying pandas Wide to Long method
Parameters: dataframe – Diagnosis dataframe Returns: Reshaped Diagnosis dataframe
-
vardb.metadata_wrangling.oasis.diagnosis.
validate_data
(dataframe)¶ Validate Diagnosis data
Parameters: dataframe – Diagnosis dataframe Returns: Validated Diagnosis dataframe
-
vardb.metadata_wrangling.oasis.diagnosis.
validate_mandatory_columns
(row)¶ - Iterate over each row of the dataframe to validate the mandatory columns in the diagnosis data
- site_desc tumour_group diagnosis_date age_at_diagnosis
Parameters: row – Each row of the diagnosis dataframe Returns: The error code string for that row
vardb.metadata_wrangling.oasis.drug_map module¶
-
vardb.metadata_wrangling.oasis.drug_map.
drop_comma_separated_drugs_column
(drug_treatment_dataframe)¶ Drop the comma_separated_drugs column from the Drug Treatment dataframe
Parameters: drug_treatment_dataframe – Drug Treatment dataframe
-
vardb.metadata_wrangling.oasis.drug_map.
eliminate_duplicates_and_sort_drug_list
(drug_list)¶ Eliminate duplicate drug names Sort the drug names alphabetically
Parameters: drug_list – The list of drug names Returns: The sorted drug list
-
vardb.metadata_wrangling.oasis.drug_map.
get_longest_length_matched_token
(matched_tokens)¶ Get the token with the longest length that matched
Parameters: matched_tokens – List of matched drug tokens Returns: Matching token
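Selecting the longest matched token is a one-liner with the built-in max. The sketch below is illustrative only; the function name and the handling of an empty list are assumptions.

```python
def longest_token(matched_tokens):
    """Return the longest token, or None when nothing matched."""
    if not matched_tokens:
        return None
    # max() returns the first token of maximal length when there are ties.
    return max(matched_tokens, key=len)
```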
-
vardb.metadata_wrangling.oasis.drug_map.
get_tokens_matching_the_drug_map
(drug_tokens)¶ Get the list of tokens that match to the drug map YAML file
Parameters: drug_tokens – Drug name tokens Returns: List of matching tokens
-
vardb.metadata_wrangling.oasis.drug_map.
insert_original_drug_string_column_before_error_column
(drug_treatment_dataframe)¶ Copy the ‘drug_list’ column and rename it to ‘original_drug_string’ Insert the ‘original_drug_string’ column before the ‘error’ column
Parameters: drug_treatment_dataframe – Drug Treatment dataframe Returns: Drug Treatment dataframe with the ‘original_drug_string’ column
-
vardb.metadata_wrangling.oasis.drug_map.
map_oasis_drug_to_ontology
(row)¶ Map Oasis drug names to Ontology
Parameters: row – Each row of the Drug Treatment Dataframe Returns: List of Ontology-mapped drugs
-
vardb.metadata_wrangling.oasis.drug_map.
map_oasis_drugs_using_drug_map
(drug_treatment_dataframe)¶ Create the Drug Map and update the Drug Treatment table
Parameters: drug_treatment_dataframe – Drug Treatment dataframe
-
vardb.metadata_wrangling.oasis.drug_map.
split_and_strip_drug_names
(drug_names)¶ Split the comma separated drug names into a list Strip out the whitespaces in the beginning and end of the drug name
Parameters: drug_names – Comma-separated drug names from Oasis Returns: List of stripped drug names with empty strings filtered out, e.g. ‘a,, b’ –> [‘a’, ‘b’], or NaN for empty cells
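A minimal sketch of the split-and-strip behaviour described above, assuming the cell arrives either as a string or as a non-string missing value. The function name is hypothetical.

```python
import math


def split_and_strip(drug_names):
    """Split a comma-separated cell into stripped, non-empty names."""
    # Non-string cells (for example NaN) are treated as empty.
    if not isinstance(drug_names, str):
        return math.nan
    names = [name.strip() for name in drug_names.split(",")]
    names = [name for name in names if name]
    return names if names else math.nan
```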
-
vardb.metadata_wrangling.oasis.drug_map.
split_drug_names_on_special_characters
(oasis_drug_name)¶ Split the drug names on special characters (E.g. ‘ ‘, ‘(‘, ‘)’) E.g. a (b) –> [‘a’, ‘b’]
Parameters: oasis_drug_name – Oasis drug name Returns: List of split drug names
-
vardb.metadata_wrangling.oasis.drug_map.
tokenize_drug_name
(oasis_drug_name)¶ Tokenize the Drug names E.g. Input: ‘a b’ Output: [‘a’, ‘b’, ‘ab’]
Parameters: oasis_drug_name – Drug name from Oasis Returns: Drug name tokens
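The two examples given for splitting and tokenizing can be reproduced with the sketch below. It assumes that the special characters are limited to spaces and parentheses and that the only extra token is the concatenation of all pieces; the real functions may split on more characters or generate more combinations. Both function names are hypothetical.

```python
import re


def split_on_special_characters(oasis_drug_name):
    """Split on spaces and parentheses, dropping empty pieces."""
    return [part for part in re.split(r"[ ()]+", oasis_drug_name) if part]


def tokenize(oasis_drug_name):
    """Return the individual pieces plus their concatenation."""
    parts = split_on_special_characters(oasis_drug_name)
    tokens = list(parts)
    if len(parts) > 1:
        tokens.append("".join(parts))
    return tokens
```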
vardb.metadata_wrangling.oasis.error_code module¶
-
vardb.metadata_wrangling.oasis.error_code.
append_error_codes
(old_error_code, new_error_code)¶ - Append the error code strings together
Parameters: - old_error_code – Old error code
- new_error_code – New error code
Returns: Appended error code strings with or without comma(s)
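Appending error codes "with or without comma(s)" can be sketched as joining only the non-empty parts. The comma separator without a trailing space and the function name are assumptions.

```python
def append_codes(old_error_code, new_error_code):
    """Join two error code strings, skipping empty ones."""
    codes = [code for code in (old_error_code, new_error_code) if code]
    return ",".join(codes)
```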
-
vardb.metadata_wrangling.oasis.error_code.
collect_error_codes
(error_reporting_dataframe, demographics_dataframe, diagnosis_dataframe, drug_treatment_dataframe, radiation_dataframe, diagnosis_error_dataframe, drug_treatment_error_dataframe, radiation_error_dataframe)¶ Perform operations to generate and process the error code column in the error and individual tables
Parameters: - error_reporting_dataframe – Clinical dataframe where the errors are reported
- demographics_dataframe – Demographics dataframe
- diagnosis_dataframe – Diagnosis dataframe
- drug_treatment_dataframe – Drug Treatment dataframe
- radiation_dataframe – Radiation dataframe
- diagnosis_error_dataframe – Diagnosis Error dataframe
- drug_treatment_error_dataframe – Drug Treatment Error dataframe
- radiation_error_dataframe – Radiation Error dataframe
Returns: Clinical dataframe where the errors are reported
-
vardb.metadata_wrangling.oasis.error_code.
concatenate_error_codes
(row)¶ Iterate over every row of the error dataframe to concatenate the individual error codes
Parameters: row – Each row of the error code dataframe Returns: Row information
-
vardb.metadata_wrangling.oasis.error_code.
generate_error_dataframe
(dataframe, demographics_dataframe, diagnosis_error_dataframe, drug_treatment_error_dataframe, radiation_error_dataframe)¶ Generate the error code reporting dataframe
Parameters: - dataframe – Copy of the Clinical dataframe
- demographics_dataframe – Demographics error code dataframe
- diagnosis_error_dataframe – Diagnosis error code dataframe
- drug_treatment_error_dataframe – Drug Treatment error dataframe
- radiation_error_dataframe – Radiation error dataframe
Returns: Generated error code dataframe
-
vardb.metadata_wrangling.oasis.error_code.
group_and_aggregate_error_codes
(dataframe, aggregate_column)¶ Group by and aggregate the error codes from individual dataframes
Parameters: - dataframe – Input dataframe
- aggregate_column – The column on which to perform the aggregate operation
Returns: Dataframe grouped by the error codes for each patient id and reset index
-
vardb.metadata_wrangling.oasis.error_code.
rename_error_code_column_to_errors
(demographics_dataframe, diagnosis_dataframe, drug_treatment_dataframe, radiation_dataframe)¶ Rename the error code columns in the individual tables
Parameters: - demographics_dataframe – Demographics dataframe
- diagnosis_dataframe – Diagnosis dataframe
- drug_treatment_dataframe – Drug treatment dataframe
- radiation_dataframe – Radiation dataframe
-
vardb.metadata_wrangling.oasis.error_code.
replace_nan_with_empty_string
(dataframe)¶ Replace string NaNs with the empty string ‘’
Parameters: dataframe – Error dataframe Returns: Error dataframe with string NaNs replaced with ‘’
vardb.metadata_wrangling.oasis.helpers module¶
-
vardb.metadata_wrangling.oasis.helpers.
extract_column_names_from_base_names
(dataframe_columns, base_names, pattern_match)¶ Extracts column names from base names
Parameters: - dataframe_columns – Columns of the dataframe
- base_names – Base names for that dataframe
- pattern_match – Matching pattern for that dataframe column name
Returns: List of column names with patient_id
-
vardb.metadata_wrangling.oasis.helpers.
extract_date_columns
(dataframe)¶ Extract the columns that store dates from the Clinical dataframe
Parameters: dataframe – Clinical dataframe Returns: List of columns that store dates
-
vardb.metadata_wrangling.oasis.helpers.
reshape_dataframe
(dataframe, stub_names, id_variable, sub_observation, separator='', suffix='\\d+')¶ Reshapes a dataframe from wide to long, drops NaN rows and resets the indices
Parameters: - dataframe – The dataframe to be reshaped
- stub_names – Column names in the reshaped dataframe
- id_variable – Column to use as id variable
- sub_observation – Column name that you wish to name your suffix in the long format.
- separator – A character indicating the separation of the variable names in the wide format, to be stripped from the names in the long format.
- suffix – A regular expression capturing the wanted suffixes.
Returns: Reshaped dataframe
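The parameters above line up with those of pandas.wide_to_long (stubnames, i, j, sep, suffix). The sketch below shows one way such a helper could be written. It is an assumption that the package wraps wide_to_long this way, and the function name, the sample columns, and the sample values are invented for illustration.

```python
import pandas as pd


def reshape_wide_to_long(dataframe, stub_names, id_variable, sub_observation,
                         separator="", suffix=r"\d+"):
    """Reshape wide to long, drop rows containing NaN, and reset the index."""
    long_df = pd.wide_to_long(
        dataframe,
        stubnames=stub_names,
        i=id_variable,
        j=sub_observation,
        sep=separator,
        suffix=suffix,
    )
    return long_df.dropna().reset_index()


# Invented sample data: two numbered date columns per patient.
wide = pd.DataFrame({
    "patient_id": ["POG001", "POG002"],
    "re_bx_date1": ["2020-01-01", "2020-03-01"],
    "re_bx_date2": ["2020-02-01", None],
})
long_df = reshape_wide_to_long(wide, ["re_bx_date"], "patient_id", "biopsy_index")
print(long_df)
```

The numbered columns collapse into a single re_bx_date column, with the numeric suffix stored in the column named by sub_observation.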
vardb.metadata_wrangling.oasis.oasis module¶
-
vardb.metadata_wrangling.oasis.oasis.
main
()¶
-
vardb.metadata_wrangling.oasis.oasis.
parse_oasis_data
(oasis_file_path, output_path)¶ Parameters: - oasis_file_path – Input OASIS file path
- output_path – The folder path to store the output
vardb.metadata_wrangling.oasis.output module¶
-
vardb.metadata_wrangling.oasis.output.
dataframe_to_tsv
(dataframe, file_path, date_stamp)¶ Write the dataframe to a TSV file
Parameters: - dataframe – Input dataframe
- file_path – File path where to write it
- date_stamp – YYYYMMDD date format of the file
-
vardb.metadata_wrangling.oasis.output.
filter_non_pediatric_ids
(error_dataframe)¶ Filter out non pediatric ids from the error reporting dataframe
Parameters: error_dataframe – Error reporting dataframe Returns: Filtered dataframe without pediatric ids
-
vardb.metadata_wrangling.oasis.output.
output_to_tsv
(demographics_dataframe, diagnosis_dataframe, drug_treatment_dataframe, radiation_dataframe, error_dataframe, output_path)¶ Write the output tables to TSV files
Parameters: - demographics_dataframe – Demographics dataframe
- diagnosis_dataframe – Diagnosis dataframe
- drug_treatment_dataframe – Drug Treatment dataframe
- radiation_dataframe – Radiation dataframe
- error_dataframe – Clinical dataframe where the errors are reported
- output_path – The folder path to store the output
-
vardb.metadata_wrangling.oasis.output.
write_output_to_tsv
(demographics_dataframe, diagnosis_dataframe, drug_treatment_dataframe, radiation_dataframe, error_dataframe, output_path)¶ Convert the error code lists to strings in the individual tables
Parameters: - demographics_dataframe – Demographics dataframe
- diagnosis_dataframe – Diagnosis dataframe
- drug_treatment_dataframe – Drug treatment dataframe
- radiation_dataframe – Radiation dataframe
- error_dataframe – Error dataframe
- output_path – The folder path to store the output
vardb.metadata_wrangling.oasis.preprocess module¶
-
vardb.metadata_wrangling.oasis.preprocess.
append_patient_id_column
(dataframe)¶ Append a column named ‘patient_id’ that stores the gsc_pog_id value in compact form, e.g. ‘POG 001-GIC’ is stored as ‘POG001’
Parameters: dataframe – Input dataframe Returns: Dataframe with the appended column
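The conversion from ‘POG 001-GIC’ to ‘POG001’ can be sketched with a regular expression. The pattern assumes every identifier starts with "POG" followed by optional whitespace and a run of digits; the full gsc_pog_id format is not documented here, and the function name is hypothetical.

```python
import re


def to_patient_id(gsc_pog_id):
    """Return the compact patient id, or None if the pattern does not match."""
    match = re.match(r"\s*(POG)\s*(\d+)", gsc_pog_id)
    if match is None:
        return None
    return match.group(1) + match.group(2)
```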
-
vardb.metadata_wrangling.oasis.preprocess.
clean_data
(clinical_dataframe)¶ Clean the OASIS data
Parameters: clinical_dataframe – Clinical dataframe Returns: (Clinical dataframe, Error reporting dataframe)
-
vardb.metadata_wrangling.oasis.preprocess.
compare_death_dates_to_all_dates
(row, date_columns)¶ Compare death date to the rest of the dates to validate death_date > all other dates
Parameters: - row – Each row of the Error Reporting dataframe
- date_columns – List of date columns from the Error Reporting Dataframe
Returns: Error code string for that row
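A sketch of the death-date comparison, under the assumptions that the row is dict-like, that missing dates are None, and that pog_report_date is excluded as noted under validate_death_date. The function name and the error code are hypothetical.

```python
from datetime import date

ERROR_CODE = "death_date_not_after_other_dates"  # hypothetical error code


def check_death_date(row, date_columns):
    """Return an error code if death_date is not later than every other date."""
    death_date = row.get("death_date")
    if death_date is None:
        return ""
    for column in date_columns:
        # Do not compare death_date with itself or with pog_report_date.
        if column in ("death_date", "pog_report_date"):
            continue
        other = row.get(column)
        if other is not None and death_date <= other:
            return ERROR_CODE
    return ""
```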
-
vardb.metadata_wrangling.oasis.preprocess.
compare_tumour_group_to_pog_tumour_groups
(row, pog_tumour_group_columns)¶ Compare the tumour group in the gsc_pog_id column to pog_tumour_groups from the treatment data
Parameters: - row – Each row of the Error reporting dataframe
- pog_tumour_group_columns – List of pog_tumour_group columns from the treatment data
Returns: Error reporting dataframe with the error reported
-
vardb.metadata_wrangling.oasis.preprocess.
drop_duplicate_pogs_with_same_biopsy_dates
(dataframe)¶ Drop multiple POG biopsies with the same biopsy date. Drop all rows.
Parameters: dataframe – Returns:
-
vardb.metadata_wrangling.oasis.preprocess.
drop_duplicate_rows
(dataframe)¶ Drop duplicate rows
Parameters: dataframe – Input dataframe Returns: Dataframe without duplicate rows
-
vardb.metadata_wrangling.oasis.preprocess.
drop_empty_pog_id_rows
(dataframe)¶ Drop empty rows with only GSC POG ID
Parameters: dataframe – Input dataframe Returns: Dataframe with the dropped rows
-
vardb.metadata_wrangling.oasis.preprocess.
drop_missing_pog_id_rows
(dataframe)¶ Drop the rows with missing GSC POG IDs
Parameters: dataframe – Input dataframe Returns: New dataframe excluding the erroneous rows
-
vardb.metadata_wrangling.oasis.preprocess.
identify_empty_pog_ids
(row)¶ Iterate over each row of the dataframe to identify empty rows with only GSC POG IDs to label them as ‘Empty’
Parameters: row – Each row of the Clinical dataframe Returns: Error code string for that row
-
vardb.metadata_wrangling.oasis.preprocess.
identify_missing_pog_ids
(row)¶ Iterate over each row of the Clinical dataframe to identify missing gsc_pog_ids
Parameters: row – Each row of the Clinical dataframe Returns: Error code string for that row
-
vardb.metadata_wrangling.oasis.preprocess.
read_data
(oasis_file)¶ Read the input OASIS Excel file
Parameters: oasis_file – Input OASIS file Returns: Clinical Dataframe
-
vardb.metadata_wrangling.oasis.preprocess.
split_gsc_pog_id_and_drop_it
(dataframe)¶ Split the gsc_pog_id column into tumour_group, patient_id and pediatric_id. Drop the gsc_pog_id after that. Move the new columns as the first three columns
Parameters: dataframe – Input dataframe Returns: Dataframe with the new columns and without gsc_pog_id column
-
vardb.metadata_wrangling.oasis.preprocess.
split_strip_and_join_column_data
(column_cell_data)¶ Split the comma separated strings Strip out the whitespace Join them together as a comma-separated string
Parameters: column_cell_data – Data from the column cell Returns: Processed column cell data as a comma-separated string with empty strings filtered out, e.g. ‘a’,,’ b’ –> ‘a’,’b’, or NaN for empty cells
-
vardb.metadata_wrangling.oasis.preprocess.
strip_1_from_date
(dataframe)¶ Strip off a trailing ‘ 1’ from the dates
Parameters: dataframe – Clinical dataframe Returns: Date formatted Clinical dataframe
-
vardb.metadata_wrangling.oasis.preprocess.
strip_whitespace_and_consecutive_commas_from_column_data
(dataframe)¶ Strips whitespaces and consecutive commas from the column data i.e. ‘a’,,’ b’ –> ‘a’,’b’
Parameters: dataframe – Clinical Dataframe Returns: Clinical Dataframe with whitespace stripped from the data and consecutive commas (,,) eliminated
-
vardb.metadata_wrangling.oasis.preprocess.
strip_whitespace_from_beginning_and_end
(dataframe)¶ Strips the whitespace from the data for all the columns
Parameters: dataframe – Input dataframe Returns: Dataframe sans whitespaces from the beginning and end
-
vardb.metadata_wrangling.oasis.preprocess.
uniform_date_format
(dataframe, date_columns)¶ Format all the dates to a uniform pattern YYYY-MM-DD
Parameters: - dataframe – Clinical dataframe
- date_columns – List of dataframe columns that store dates
Returns: Date formatted Clinical dataframe
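Normalizing dates to YYYY-MM-DD can be sketched by trying a list of input formats in turn. The formats listed here are assumptions chosen for illustration; the formats actually present in the OASIS export are not documented in this section. The function name is hypothetical.

```python
from datetime import datetime

# Assumed input formats, tried in order.
ASSUMED_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d/%m/%Y")


def to_iso_date(value):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None
```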
-
vardb.metadata_wrangling.oasis.preprocess.
validate_data
(clinical_dataframe, error_reporting_dataframe)¶ Validate the Clinical dataframe and report errors on the Error Reporting dataframe
Parameters: - clinical_dataframe – Clinical dataframe
- error_reporting_dataframe – Error Reporting dataframe
Returns: Updated Clinical dataframe and errors reported on the Clinical dataframe
-
vardb.metadata_wrangling.oasis.preprocess.
validate_death_date
(dataframe)¶ Validate death_date > all other date columns (except pog_report_date as that can be reported anytime)
Parameters: dataframe – Error Reporting dataframe Returns: Updated Error Reporting dataframe with the appropriate error code
-
vardb.metadata_wrangling.oasis.preprocess.
validate_duplicate_pogs_with_same_biopsy_dates
(dataframe)¶ Identify multiple POG biopsies with the same biopsy date. Flag all rows with the error code.
Parameters: dataframe – Error dataframe Returns: Error Reporting Dataframe updated with multiple POG biopsies with the same biopsy dates identified
-
vardb.metadata_wrangling.oasis.preprocess.
validate_duplicate_rows
(dataframe)¶ Identify and iterate over each row of the duplicate dataframe and label them as ‘Duplicate’
Parameters: dataframe – Error Reporting Dataframe Returns: Error Reporting Dataframe updated with duplicate records identified
-
vardb.metadata_wrangling.oasis.preprocess.
validate_empty_pog_ids
(error_reporting_dataframe)¶ Identify the empty POG ID rows in the Clinical dataframe Report them in the Error Reporting dataframe Drop them from the Clinical dataframe
Parameters: error_reporting_dataframe – Error Reporting dataframe Returns: Updated Error Reporting dataframe
-
vardb.metadata_wrangling.oasis.preprocess.
validate_missing_pog_ids
(error_reporting_dataframe)¶ Identify the missing POG ID rows in the Clinical dataframe Report them in the Error Reporting dataframe Drop them from the Clinical dataframe
Parameters: error_reporting_dataframe – Error Reporting dataframe Returns: Updated Error Reporting dataframe
-
vardb.metadata_wrangling.oasis.preprocess.
validate_tumour_groups
(dataframe)¶ Validate same tumour groups in the gsc_pog_id column and treatment columns
Parameters: dataframe – Error reporting Dataframe Returns: Error reporting dataframe with the error code reported
vardb.metadata_wrangling.oasis.radiation module¶
-
vardb.metadata_wrangling.oasis.radiation.
extract_data
(dataframe)¶ Extract radiation data from the clinical dataframe
Parameters: dataframe – Clinical dataframe Returns: Radiation dataframe
-
vardb.metadata_wrangling.oasis.radiation.
get_radiation_data
(clinical_dataframe)¶ Work with Radiation data
Parameters: clinical_dataframe – The original clinical dataframe Returns: Validated Radiation dataframe
-
vardb.metadata_wrangling.oasis.radiation.
reshape_radiation_data
(dataframe)¶ Reshapes the Radiation dataframe by applying pandas Wide to Long method
Parameters: dataframe – Radiation Dataframe Returns: Reshaped Radiation Dataframe
vardb.metadata_wrangling.oasis.treatment module¶
-
vardb.metadata_wrangling.oasis.treatment.
count_non_null_treatment_type_for_treatment_groups
(pog_informed_group)¶ For each treatment group (based on patient_id), count the number of non-NULL treatment_type entries
Parameters: pog_informed_group – pog_informed group for Treatment groups (based on patient_id) Returns: Count of non NULL treatment_type entries
-
vardb.metadata_wrangling.oasis.treatment.
count_pog_informed_for_treatment_groups
(pog_informed_group)¶ For each treatment group (based on patient_id), count the number of pog_informed entries
Parameters: pog_informed_group – pog_informed group for Treatment groups (based on patient_id) Returns: Count of pog_informed entries
-
vardb.metadata_wrangling.oasis.treatment.
extract_data
(dataframe)¶ Extract drug treatment data from the clinical dataframe
Parameters: dataframe – Clinical dataframe Returns: Drug Treatment dataframe
-
vardb.metadata_wrangling.oasis.treatment.
get_drug_treatment_data
(clinical_dataframe)¶ Work with Drug Treatment data
Parameters: clinical_dataframe – The original clinical dataframe Returns: Validated Drug Treatment dataframe
-
vardb.metadata_wrangling.oasis.treatment.
reshape_treatment_data
(dataframe)¶ Reshapes the Drug Treatment dataframe by applying pandas Wide to Long method
Parameters: dataframe – Drug Treatment Dataframe Returns: Reshaped Drug Treatment Dataframe
-
vardb.metadata_wrangling.oasis.treatment.
validate_best_response_should_not_be_null
(row)¶ Validate best_response should not be null for pog_informed (Y) entries
Parameters: row – Each row of the Drug Treatment dataframe Returns: The error code string for that row
-
vardb.metadata_wrangling.oasis.treatment.
validate_data
(drug_treatment_dataframe, demographics_dataframe)¶ All validations pertaining to the Drug Treatment dataframe
Parameters: - drug_treatment_dataframe – Drug Treatment dataframe
- demographics_dataframe – Demographics dataframe supplied for cross validation with treatment table
Returns: Validated Drug Treatment dataframe
-
vardb.metadata_wrangling.oasis.treatment.
validate_demographics_post_pog_activity_categories
(pog_informed_y_dataframe, demographics_dataframe)¶ Validate that when pog_informed = ‘Y’ for at least one treatment, the Demographics post_pog_activities value is one of: ‘POG informed out of province’, ‘ST/CT therapy at BCCA’, ‘POG informed compassionate access therapy’, ‘POG informed private pay’
Parameters: - pog_informed_y_dataframe – Filtered Drug treatment dataframe with pog_informed = ‘Y’
- demographics_dataframe – Demographics dataframe
Returns: Demographics dataframe with the error code reported
-
vardb.metadata_wrangling.oasis.treatment.
validate_demographics_with_treatment
(demographics_dataframe, drug_treatment_dataframe)¶ Validation performed on a dataframe obtained by merging Demographics with Drug Treatment dataframe and reporting error_codes on the Demographics dataframe
Parameters: - demographics_dataframe – Demographics dataframe
- drug_treatment_dataframe – Drug Treatment dataframe
Returns: Demographics dataframe with the error codes reported
-
vardb.metadata_wrangling.oasis.treatment.
validate_drug_treatment_for_bcca_treatment_type
(pog_informed_y_dataframe, demographics_dataframe)¶ Validate Drug Treatment data for bcca treatment type
Parameters: - pog_informed_y_dataframe – Filtered Drug treatment dataframe with pog_informed = ‘Y’
- demographics_dataframe – Demographics dataframe
Returns: Demographics dataframe with the error code reported
-
vardb.metadata_wrangling.oasis.treatment.
validate_for_bcca_treatment_type_should_not_be_null
(row)¶ Iterate over each row of the merged dataframe to validate when demographics.post_pog_activities is ‘ST/CT therapy at BCCA’ then for at least one entry where drug_treatment.pog_informed = ‘Y’, drug_treatment.treatment_type should not be null
Parameters: row – Each row of the merged Demographics dataframe Returns: The error code string for that row
-
vardb.metadata_wrangling.oasis.treatment.
validate_for_post_pog_activities_bcca_province_compassionate_private
(row)¶ Iterate over each row of the dataframe to validate whether Post POG activities is either ‘POG informed out of province’ or ‘ST/CT therapy at BCCA’
Parameters: row – Each row of the merged Demographics dataframe Returns: The error string for that row
-
vardb.metadata_wrangling.oasis.treatment.
validate_mandatory_columns
(row)¶ - Iterate over each row of the dataframe to validate the mandatory columns in the drug treatment data
- tumour_group pog_tumour_group course_begin_on course_end_on drug_list intent treatment_time pog_informed
Parameters: row – Each row of the drug treatment dataframe Returns: The error code string for that row
-
vardb.metadata_wrangling.oasis.treatment.
validate_progression_documentation_is_not_null
(row)¶ - Iterate over each row of the dataframe to validate when progression_on is present then,
- progression_documentation must also be present
Parameters: row – Each row of the drug treatment dataframe Returns: The error code string for that row
-
vardb.metadata_wrangling.oasis.treatment.
validate_treatment_data
(drug_treatment_dataframe)¶ Validate Drug Treatment data
Parameters: drug_treatment_dataframe – Drug Treatment dataframe Returns: Validated Drug Treatment dataframe
-
vardb.metadata_wrangling.oasis.treatment.
validate_treatment_time_either_pre_or_post_pog_report
(row)¶ Iterate over each row of the dataframe to validate when course_begin_on date is <= demographics.pog_report_date then treatment_time should be ‘Pre POG Report’; otherwise, treatment_time should be ‘Post POG Report’
Parameters: row – Each row of the drug treatment dataframe Returns: The error code string for that row
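The pre/post rule stated above can be sketched as follows. The two treatment_time labels come from the description; the function name, the dict-based merged row, the handling of missing dates, and the error code are assumptions.

```python
from datetime import date

PRE = "Pre POG Report"
POST = "Post POG Report"
ERROR_CODE = "treatment_time_mismatch"  # hypothetical error code


def check_treatment_time(row):
    """Return an error code if treatment_time disagrees with the dates."""
    begin = row.get("course_begin_on")
    report = row.get("pog_report_date")
    # Without both dates the expected label cannot be derived (an assumption).
    if begin is None or report is None:
        return ""
    expected = PRE if begin <= report else POST
    return "" if row.get("treatment_time") == expected else ERROR_CODE
```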
-
vardb.metadata_wrangling.oasis.treatment.
validate_treatment_time_for_pre_or_post_pog_report
(drug_treatment_dataframe, demographics_dataframe)¶ Validate if course_begin_on date is <= demographics.pog_report_date, then treatment_time should be ‘Pre POG Report’ otherwise, treatment_time should be ‘Post POG Report’.
Parameters: - drug_treatment_dataframe – Drug treatment dataframe
- demographics_dataframe – Demographics dataframe
Returns: Drug treatment dataframe with the error code reported
-
vardb.metadata_wrangling.oasis.treatment.
validate_treatment_time_is_post_pog_report
(row)¶ Iterate over each row of the dataframe to validate that when pog_informed = ‘Y’, treatment_time = ‘Post POG Report’
Parameters: row – Each row of the drug treatment dataframe Returns: The error code string for that row
-
vardb.metadata_wrangling.oasis.treatment.
validate_treatment_with_demographics
(drug_treatment_dataframe, demographics_dataframe)¶ Validation performed on a dataframe obtained by merging Drug Treatment with Demographics dataframe and reporting error_codes on the Drug Treatment dataframe
Parameters: - drug_treatment_dataframe – Drug Treatment dataframe
- demographics_dataframe – Demographics dataframe
Returns: Drug Treatment dataframe with the error codes reported