xyalign.utils module
-
xyalign.utils.validate_external_prog(prog_path, prog_name)
Checks whether an external program can be called using the provided path.
Parameters: prog_path : Path used to call the program
prog_name : Name of the program
Returns: int
0
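A check of this kind can be sketched by trying to launch the program and treating a launch failure as "not callable". The helper below is illustrative only: `can_call_program` and its `--version` probe argument are assumptions made for this example, not xyalign's implementation, and it returns a bool rather than the 0 documented above.

```python
import subprocess
import sys


def can_call_program(prog_path, probe_args=("--version",)):
    """Return True if prog_path can be launched, False otherwise.

    Hypothetical helper: the exit status of the program is ignored;
    only a failure to start it (for example, a missing file or
    missing execute permission) counts as "not callable".
    """
    try:
        subprocess.run(
            [prog_path, *probe_args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # FileNotFoundError and PermissionError are both OSError subclasses.
        return False
    return True


# The running Python interpreter is always launchable; a made-up path is not.
python_ok = can_call_program(sys.executable)
missing_ok = can_call_program("/nonexistent/path/to/program")
```

Not every program accepts `--version`, so the probe arguments are left configurable.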
-
xyalign.utils.validate_dir(parent_dir, dir_name)
Checks whether the directory exists and, if not, creates it.
Parameters: parent_dir : Parent directory name
dir_name : Name of the new directory
Returns: bool
whether the directory existed
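The behaviour described above (create the directory if needed, report whether it was already there) can be sketched with the standard library. `ensure_dir` is a hypothetical name used for illustration, and joining the two arguments with `os.path.join` is an assumption about how the path is built.

```python
import os
import tempfile


def ensure_dir(parent_dir, dir_name):
    """Create parent_dir/dir_name if missing; return whether it already existed."""
    full_path = os.path.join(parent_dir, dir_name)
    existed = os.path.isdir(full_path)
    if not existed:
        os.makedirs(full_path)
    return existed


with tempfile.TemporaryDirectory() as tmp:
    first_call = ensure_dir(tmp, "results")   # directory is created here
    second_call = ensure_dir(tmp, "results")  # directory is found here
    created = os.path.isdir(os.path.join(tmp, "results"))
```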
-
xyalign.utils.check_bam_fasta_compatibility(bam_object, fasta_object)
Checks whether bam and fasta sequence names and lengths are equivalent (i.e., whether it is likely that the bam file was generated using the fasta in question).
Parameters: bam_object : BamFile() object
fasta_object : RefFasta() object
Returns: bool
True if sequence names and lengths match. False otherwise.
-
xyalign.utils.check_compatibility_bam_list(bam_obj_list)
Checks whether bam sequence names and lengths are equivalent (i.e., whether it is likely that the bam files were generated using the same reference genome).
Parameters: bam_obj_list : list
List of bam.BamFile() objects
Returns: bool
True if sequence names and lengths match. False otherwise.
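Both compatibility checks come down to comparing sequence names and lengths. The sketch below shows that comparison on plain (name, length) pairs. How the names and lengths are read from BamFile() or RefFasta() objects is not covered here, and `sequences_match` is a hypothetical helper that also assumes sequence order matters.

```python
def sequences_match(*sequence_tables):
    """Return True if every table lists the same (name, length) pairs.

    Each table is an iterable of (name, length) pairs in file order.
    Assumption for this sketch: order is significant, so the same
    sequences in a different order count as a mismatch.
    """
    tables = [[tuple(pair) for pair in table] for table in sequence_tables]
    if len(tables) < 2:
        return True
    return all(table == tables[0] for table in tables[1:])


bam_a = [("chr1", 1000), ("chrX", 500)]
fasta = [("chr1", 1000), ("chrX", 500)]
bam_b = [("chr1", 1000), ("chrX", 400)]  # chrX length differs

pair_ok = sequences_match(bam_a, fasta)
list_ok = sequences_match(bam_a, fasta, bam_b)
```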
-
xyalign.utils.merge_bed_files(output_file, *bed_files)
Merges an arbitrary number of external bed files (given as full paths) into output_file (the full path to the desired output file).
Parameters: output_file : str
Full path to and name of desired output bed file
*bed_files
Variable length argument list of external bed files (include full path)
Returns: str
path to output_file
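To make "merging" concrete, the sketch below pools intervals from several bed files, sorts them, and collapses overlapping or directly adjacent intervals on the same chromosome. This is a pure-Python illustration under stated assumptions: the source does not say how xyalign performs the merge, `merge_bed_sketch` is a hypothetical name, and only the first three bed columns are kept.

```python
import os
import tempfile


def merge_bed_sketch(output_file, *bed_files):
    """Merge intervals from several bed files into output_file; return its path."""
    intervals = []
    for path in bed_files:
        with open(path) as handle:
            for line in handle:
                fields = line.rstrip("\n").split("\t")
                # Skip blank, comment, and header lines.
                if len(fields) < 3 or line.startswith(("#", "track", "browser")):
                    continue
                intervals.append((fields[0], int(fields[1]), int(fields[2])))

    intervals.sort()  # by chromosome name, then start, then end
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])

    with open(output_file, "w") as out:
        for chrom, start, end in merged:
            out.write(f"{chrom}\t{start}\t{end}\n")
    return output_file


with tempfile.TemporaryDirectory() as tmp:
    bed_one = os.path.join(tmp, "one.bed")
    bed_two = os.path.join(tmp, "two.bed")
    with open(bed_one, "w") as handle:
        handle.write("chr1\t0\t100\nchr1\t200\t300\n")
    with open(bed_two, "w") as handle:
        handle.write("chr1\t50\t150\nchr2\t10\t20\n")
    out_path = merge_bed_sketch(os.path.join(tmp, "merged.bed"), bed_one, bed_two)
    with open(out_path) as handle:
        merged_lines = handle.read().splitlines()
```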
-
xyalign.utils.make_region_lists_genome_filters(depthAndMapqDf, mapqCutoff, min_depth, max_depth)
Filters a pandas dataframe for mapq and depth based on thresholds calculated from all values across the entire genome
Parameters: depthAndMapqDf : pandas dataframe
Must have ‘depth’ and ‘mapq’ columns
mapqCutoff : int
The minimum mapq for a window to be considered high quality
min_depth : float
Fraction of mean to set as minimum depth
max_depth : float
Multiple of mean to set as maximum depth
Returns: tuple
(passing dataframe, failing dataframe)
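The parameter descriptions above imply depth thresholds of `min_depth * mean` and `max_depth * mean`, with the mean taken over every window in the dataframe. A minimal pandas sketch of that idea follows. It is not the library's code: `split_by_genome_thresholds` is a hypothetical name, and the choice of strict inequalities for depth and `>=` for mapq is an assumption.

```python
import pandas as pd


def split_by_genome_thresholds(df, mapq_cutoff, min_depth, max_depth):
    """Return (passing, failing) dataframes using one genome-wide mean depth."""
    mean_depth = df["depth"].mean()
    passing_mask = (
        (df["mapq"] >= mapq_cutoff)
        & (df["depth"] > mean_depth * min_depth)
        & (df["depth"] < mean_depth * max_depth)
    )
    return df[passing_mask], df[~passing_mask]


windows = pd.DataFrame(
    {"depth": [10, 20, 30, 100], "mapq": [60, 10, 60, 60]}
)
# Mean depth is 40, so the depth limits are 20 (0.5 x mean) and 80 (2 x mean).
passing, failing = split_by_genome_thresholds(
    windows, mapq_cutoff=30, min_depth=0.5, max_depth=2.0
)
```

Every window lands in exactly one of the two dataframes, which matches the (passing, failing) tuple described above.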
-
xyalign.utils.make_region_lists_chromosome_filters(depthAndMapqDf, mapqCutoff, min_depth, max_depth)
Filters a pandas dataframe for mapq and depth based on thresholds calculated per chromosome
Parameters: depthAndMapqDf : pandas dataframe
Must have ‘depth’ and ‘mapq’ columns
mapqCutoff : int
The minimum mapq for a window to be considered high quality
min_depth : float
Fraction of mean to set as minimum depth
max_depth : float
Multiple of mean to set as maximum depth
Returns: tuple
(passing dataframe, failing dataframe)
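Per-chromosome thresholds differ from genome-wide ones only in which mean each window is compared against. The sketch below uses a pandas `groupby(...).transform("mean")` to attach each chromosome's mean depth to its own windows. It assumes the dataframe carries a chromosome column named `chrom`, which the parameter list above does not state, and `split_by_chromosome_thresholds` is a hypothetical name.

```python
import pandas as pd


def split_by_chromosome_thresholds(df, mapq_cutoff, min_depth, max_depth):
    """Return (passing, failing) dataframes using a mean depth per chromosome."""
    # One mean per chromosome, broadcast back to every row of that chromosome.
    chrom_mean = df.groupby("chrom")["depth"].transform("mean")
    passing_mask = (
        (df["mapq"] >= mapq_cutoff)
        & (df["depth"] > chrom_mean * min_depth)
        & (df["depth"] < chrom_mean * max_depth)
    )
    return df[passing_mask], df[~passing_mask]


windows = pd.DataFrame(
    {
        "chrom": ["chrA"] * 3 + ["chrB"] * 3,
        "depth": [10, 12, 11, 100, 300, 200],
        "mapq": [60] * 6,
    }
)
# chrA has mean depth 11 and chrB has mean depth 200, so each chromosome
# gets its own limits (0.5 x mean to 1.5 x mean).
passing, failing = split_by_chromosome_thresholds(
    windows, mapq_cutoff=30, min_depth=0.5, max_depth=1.5
)
```

In this toy data a single genome-wide mean (105.5) would reject every chrA window, which is the situation per-chromosome thresholds are meant to handle.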
-
xyalign.utils.output_bed(outBed, *regionDfs)
Concatenates and merges dataframes into an output bed file
Parameters: outBed : str
The full path to and name of the output bed file
*regionDfs
Variable length list of dataframes to be included
Returns: int
0
-
xyalign.utils.output_bed_no_merge(outBed, *regionDfs)
Concatenates dataframes into an output bed file without merging. All columns after the first three are preserved as well.
Parameters: outBed : str
The full path to and name of the output bed file
*regionDfs
Variable length list of dataframes to be included
Returns: int
0
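Concatenating dataframes and writing them out unmerged can be sketched with `pandas.concat` and `DataFrame.to_csv`. This is an illustration rather than the library's implementation: `write_bed_no_merge` is a hypothetical name, and writing tab-separated rows without a header or index is an assumption based on the usual bed layout.

```python
import os
import tempfile

import pandas as pd


def write_bed_no_merge(out_bed, *region_dfs):
    """Concatenate dataframes and write every column as a tab-separated bed file."""
    combined = pd.concat(region_dfs, ignore_index=True)
    combined.to_csv(out_bed, sep="\t", header=False, index=False)
    return 0


high_quality = pd.DataFrame(
    {"chrom": ["chr1"], "start": [0], "stop": [100], "mapq": [60]}
)
low_quality = pd.DataFrame(
    {"chrom": ["chr2"], "start": [10], "stop": [20], "mapq": [12]}
)

with tempfile.TemporaryDirectory() as tmp:
    bed_path = os.path.join(tmp, "windows.bed")
    status = write_bed_no_merge(bed_path, high_quality, low_quality)
    with open(bed_path) as handle:
        bed_lines = handle.read().splitlines()
```

The fourth column (`mapq` here) survives in the output, which is the difference from a merged bed that keeps only the interval coordinates.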
-
xyalign.utils.chromosome_wide_plot(chrom, positions, y_value, measure_name, sampleID, output_prefix, MarkerSize, MarkerAlpha, Xlim, Ylim, x_scale=1000000)
Plots values across a chromosome, where the x axis is the position along the chromosome and the y axis is the value of the measure of interest.
Parameters: chrom : str
Name of the chromosome
positions : numpy array
Genomic coordinates
y_value : numpy array
The values of the measure of interest
measure_name : str
The name of the measure of interest (y axis title)
sampleID : str
The name of the sample
output_prefix : str
Full path to and prefix of desired output plot
MarkerSize : float
Size in points^2
MarkerAlpha : float
Transparency (0 to 1)
Xlim : float
Maximum X value
Ylim : float
Maximum Y value
x_scale : int
Divide all x values (including Xlim) by this value. Default is 1000000 (1 Mb)
Returns: int
0
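The role of `x_scale` is easiest to see in a small matplotlib sketch: both the positions and the x-axis maximum are divided by it before plotting. The function below is a hedged illustration, not xyalign's plotting code. `scatter_along_chromosome`, the explicit output filename, the figure size, and the lower axis limits of 0 are all assumptions.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def scatter_along_chromosome(chrom, positions, y_value, measure_name, out_png,
                             marker_size, marker_alpha, xlim, ylim,
                             x_scale=1000000):
    """Scatter y_value against positions / x_scale and save the figure."""
    fig, ax = plt.subplots(figsize=(9, 3))
    ax.scatter(np.asarray(positions) / x_scale, y_value,
               s=marker_size, alpha=marker_alpha, color="black")
    ax.set_xlim(0, xlim / x_scale)  # Xlim is scaled the same way as positions
    ax.set_ylim(0, ylim)
    ax.set_xlabel(f"{chrom} position (units of {x_scale:,} bp)")
    ax.set_ylabel(measure_name)
    fig.savefig(out_png)
    plt.close(fig)
    return 0


positions = np.array([500000, 1500000, 2500000])
depth = np.array([28.0, 31.5, 30.2])

with tempfile.TemporaryDirectory() as tmp:
    png_path = os.path.join(tmp, "chr19_depth.png")
    status = scatter_along_chromosome(
        "chr19", positions, depth, "Depth", png_path,
        marker_size=10, marker_alpha=0.5, xlim=3000000, ylim=50,
    )
    plot_size = os.path.getsize(png_path)
```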
-
xyalign.utils.hist_array(chrom, value_array, measure_name, sampleID, output_prefix)
Plots a histogram of an array of values of interest. Intended for mapq and depth, but generalizable. This is a separate function from variants.hist_read_balance because that function eliminates fixed variants, while this function plots all values.
Parameters: chrom : str
Name of the chromosome
value_array : numpy array
Values to plot (e.g., mapq or depth)
measure_name : str
The name of the measure of interest (y axis title)
sampleID : str
Sample name or id to include in the plot title
output_prefix : str
Desired prefix (including full path) of the output files
Returns: int
0 if plotting successful, 1 otherwise.
-
xyalign.utils.plot_depth_mapq(window_df, output_prefix, sampleID, chrom_length, MarkerSize, MarkerAlpha, x_scale=1000000)
Creates histograms and chromosome-wide plots of depth and mapq from a dataframe of windows.
Parameters: window_df : pandas dataframe
Columns must include chrom, start, depth, and mapq (at least)
output_prefix : str
Path and prefix of output files to create
sampleID : str
Sample ID
chrom_length : int
Length of chromosome
MarkerSize : float
Size in points^2
MarkerAlpha : float
Transparency (0 to 1)
x_scale : int
Divide all x values (including Xlim) by this value for chromosome_wide_plot. Default is 1000000 (1 Mb)
Returns: int
0
-
xyalign.utils.before_after_plot(chrom, positions, values_before, values_after, measure_name, sampleID, output_prefix, MarkerSize, MarkerAlpha, Xlim, YMin='auto', YMax='auto', x_scale=1000000, Color='black')
Plots the difference between before and after values (after minus before) across a chromosome.
Parameters: chrom : str
Name of the chromosome
positions : numpy array
Genomic coordinates
values_before : numpy array
The values of the measure of interest in the “before” condition
values_after : numpy array
The values of the measure of interest in the “after” condition
measure_name : str
The name of the measure of interest (for y-axis title)
sampleID : str
The name of the sample
output_prefix : str
Full path to and prefix of desired output plot
MarkerSize : float
Size in points^2
MarkerAlpha : float
Transparency (0 to 1)
Xlim : float
Maximum X value
YMin : str, int, or float
If “auto”, will allow matplotlib to automatically determine limit. Otherwise, will set the y axis minimum to the value provided (int or float)
YMax : str, int, or float
If “auto”, will allow matplotlib to automatically determine limit. Otherwise, will set the y axis maximum to the value provided (int or float)
x_scale : int
Divide all x values (including Xlim) by this value. Default is 1000000 (1 Mb)
Color : str
Color to use for points. See matplotlib documentation for acceptable options
Returns: int
0 if plotting successful, 1 otherwise
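Two pieces of the description above are simple enough to sketch without any plotting: the plotted quantity (after minus before) and the handling of the "auto" y limits. The helpers below are hypothetical and only illustrate the arithmetic. Mapping "auto" to `None` relies on matplotlib's `set_ylim` leaving a limit unchanged when given `None`, and whether xyalign does it this way is not stated in the source.

```python
import numpy as np


def before_after_difference(values_before, values_after):
    """Return the per-position difference, after minus before."""
    before = np.asarray(values_before, dtype=float)
    after = np.asarray(values_after, dtype=float)
    if before.shape != after.shape:
        raise ValueError("before and after arrays must have the same shape")
    return after - before


def resolve_y_limits(y_min="auto", y_max="auto"):
    """Translate 'auto' into None; convert any other value to float."""
    bottom = None if y_min == "auto" else float(y_min)
    top = None if y_max == "auto" else float(y_max)
    return bottom, top


depth_before = np.array([30, 20, 15])
depth_after = np.array([32, 15, 15])
difference = before_after_difference(depth_before, depth_after)
limits = resolve_y_limits("auto", 10)
```

Positive values in `difference` mean the measure increased in the "after" condition, and negative values mean it decreased.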