xyalign.utils module

xyalign.utils.validate_external_prog(prog_path, prog_name)

Checks whether an external program can be called using the provided path.

Parameters:

prog_path : Path used to call the program

prog_name : Name of the program

Returns:

int

0
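A minimal sketch of this kind of check, assuming it is enough to confirm that the operating system can launch the program. The helper name check_program_callable is hypothetical, and XYalign's own implementation may differ (for example, in how it reports failures).

```python
import subprocess
import sys


def check_program_callable(prog_path, prog_name):
    """Return 0 if prog_path can be launched; raise OSError otherwise.

    Hypothetical helper for illustration, not XYalign's implementation.
    """
    try:
        # Launch the program and discard its output. Only whether the
        # operating system can start it matters here, not its exit status.
        proc = subprocess.Popen(
            [prog_path],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        proc.wait()
    except OSError as exc:
        raise OSError(
            "Could not call {} using path {!r}: {}".format(
                prog_name, prog_path, exc)
        )
    return 0


# The running Python interpreter is a program that is certain to exist.
status = check_program_callable(sys.executable, "python")
```

A path that cannot be executed raises OSError instead of returning a nonzero value, so a caller can stop before starting a long analysis.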

xyalign.utils.validate_dir(parent_dir, dir_name)

Checks if directory exists and if not, creates it.

Parameters:

parent_dir : Parent directory name

dir_name : Name of the new directory

Returns:

bool

whether the directory existed
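The behavior described above can be sketched with the standard library alone. The name ensure_dir is hypothetical, and joining parent_dir and dir_name with os.path.join is an assumption about how the two arguments combine.

```python
import os
import tempfile


def ensure_dir(parent_dir, dir_name):
    """Create parent_dir/dir_name if needed; return True if it already existed."""
    full_path = os.path.join(parent_dir, dir_name)
    existed = os.path.isdir(full_path)
    if not existed:
        os.makedirs(full_path)
    return existed


with tempfile.TemporaryDirectory() as parent:
    first_call = ensure_dir(parent, "logfiles")   # directory is created here
    second_call = ensure_dir(parent, "logfiles")  # directory already exists
    created = os.path.isdir(os.path.join(parent, "logfiles"))
```

The return value reports the state before the call, so the first call returns False even though the directory exists afterwards.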

xyalign.utils.check_bam_fasta_compatibility(bam_object, fasta_object)

Checks to see if bam and fasta sequence names and lengths are equivalent (i.e., if it is likely that the bam file was generated using the fasta in question).

Parameters:

bam_object : BamFile() object

fasta_object : RefFasta() object

Returns:

bool

True if sequence names and lengths match. False otherwise.

xyalign.utils.check_compatibility_bam_list(bam_obj_list)

Checks to see if bam sequence names and lengths are equivalent (i.e., if it is likely that the bam files were generated using the same reference genome).

Parameters:

bam_obj_list : list

List of bam.BamFile() objects

Returns:

bool

True if sequence names and lengths match. False otherwise.
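Both compatibility checks above reduce to comparing sequence names and lengths. The sketch below illustrates the idea on plain (name, length) pairs rather than BamFile() or RefFasta() objects. The function names are hypothetical, and treating sequence order as significant is an assumption.

```python
def same_sequences(seqs_a, seqs_b):
    """True if two collections of (name, length) pairs are identical, in order."""
    return list(seqs_a) == list(seqs_b)


def all_same_sequences(seq_lists):
    """True if every collection in seq_lists matches the first one."""
    seq_lists = [list(seqs) for seqs in seq_lists]
    return all(seqs == seq_lists[0] for seqs in seq_lists[1:])


# Illustrative data: one reference and three sets of bam header sequences.
reference = [("chr1", 1000), ("chrX", 800)]
bam_ok = [("chr1", 1000), ("chrX", 800)]
bam_renamed = [("1", 1000), ("X", 800)]        # same lengths, different names
bam_other_build = [("chr1", 1000), ("chrX", 750)]  # same names, different length

pair_matches = same_sequences(bam_ok, reference)
pair_renamed = same_sequences(bam_renamed, reference)
list_matches = all_same_sequences([bam_ok, reference])
list_mixed = all_same_sequences([bam_ok, bam_other_build])
```

A mismatch in either names or lengths is enough to fail, which is why files aligned to a differently named or differently sized build of the same genome are reported as incompatible.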

xyalign.utils.merge_bed_files(output_file, *bed_files)

Merges an arbitrary number of external bed files (each given as a full path) into output_file (the full path to the desired output file).

Parameters:

output_file : str

Full path to and name of desired output bed file

*bed_files

Variable length argument list of external bed files (include full path)

Returns:

str

path to output_file
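To illustrate what merging bed files involves, here is a pure-Python stand-in that pools the intervals, sorts them, and collapses overlapping or book-ended intervals. It is a sketch under stated assumptions: the name merge_bed_sketch is hypothetical, only the first three columns are kept, and XYalign itself may rely on external tooling to do this.

```python
import os
import tempfile


def merge_bed_sketch(output_file, *bed_files):
    """Merge intervals from several bed files into output_file; return its path."""
    intervals = []
    for bed_file in bed_files:
        with open(bed_file) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) < 3 or line.startswith(("#", "track", "browser")):
                    continue  # skip blank, comment, and header lines
                intervals.append((fields[0], int(fields[1]), int(fields[2])))

    # Sort by chromosome name, then start, so overlaps become adjacent.
    intervals.sort()

    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlapping or book-ended interval: extend the previous one.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])

    with open(output_file, "w") as out:
        for chrom, start, end in merged:
            out.write("{}\t{}\t{}\n".format(chrom, start, end))
    return output_file


with tempfile.TemporaryDirectory() as tmp:
    bed_a = os.path.join(tmp, "a.bed")
    bed_b = os.path.join(tmp, "b.bed")
    with open(bed_a, "w") as f:
        f.write("chr1\t0\t100\nchr1\t500\t600\n")
    with open(bed_b, "w") as f:
        f.write("chr1\t50\t200\nchr2\t10\t20\n")
    out_path = merge_bed_sketch(os.path.join(tmp, "merged.bed"), bed_a, bed_b)
    with open(out_path) as f:
        merged_lines = f.read().splitlines()
```

In this example the intervals chr1:0-100 and chr1:50-200 come from different files but collapse into a single chr1:0-200 record.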

xyalign.utils.make_region_lists_genome_filters(depthAndMapqDf, mapqCutoff, min_depth, max_depth)

Filters a pandas dataframe for mapq and depth, using thresholds calculated from all values across the entire genome

Parameters:

depthAndMapqDf : pandas dataframe

Must have ‘depth’ and ‘mapq’ columns

mapqCutoff : int

The minimum mapq for a window to be considered high quality

min_depth : float

Fraction of mean to set as minimum depth

max_depth : float

Multiple of mean to set as maximum depth

Returns:

tuple

(passing dataframe, failing dataframe)

xyalign.utils.make_region_lists_chromosome_filters(depthAndMapqDf, mapqCutoff, min_depth, max_depth)

Filters a pandas dataframe for mapq and depth based on thresholds calculated per chromosome

Parameters:

depthAndMapqDf : pandas dataframe

Must have ‘depth’ and ‘mapq’ columns

mapqCutoff : int

The minimum mapq for a window to be considered high quality

min_depth : float

Fraction of mean to set as minimum depth

max_depth : float

Multiple of mean to set as maximum depth

Returns:

tuple

(passing dataframe, failing dataframe)
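The two filters above differ only in which windows contribute to the mean depth that min_depth and max_depth are scaled by. The sketch below makes that difference concrete using plain dictionaries in place of dataframe rows. The function names, the "chrom" key, and the strict depth bounds are assumptions made for illustration.

```python
from collections import defaultdict


def depth_bounds(depths, min_depth, max_depth):
    """Scale the mean depth into (lower bound, upper bound)."""
    mean_depth = sum(depths) / len(depths)
    return mean_depth * min_depth, mean_depth * max_depth


def filter_windows(windows, mapq_cutoff, min_depth, max_depth, per_chromosome):
    """Split windows into (passing, failing) lists."""
    # Pool depths genome-wide, or keep one pool per chromosome.
    groups = defaultdict(list)
    for w in windows:
        key = w["chrom"] if per_chromosome else "genome"
        groups[key].append(w["depth"])
    bounds = {k: depth_bounds(d, min_depth, max_depth) for k, d in groups.items()}

    passing, failing = [], []
    for w in windows:
        low, high = bounds[w["chrom"] if per_chromosome else "genome"]
        ok = w["mapq"] >= mapq_cutoff and low < w["depth"] < high
        (passing if ok else failing).append(w)
    return passing, failing


# Illustrative windows: chr1 at depth 30, chrX at depth 12 (e.g. one copy).
windows = [
    {"chrom": "chr1", "depth": 30.0, "mapq": 60.0},
    {"chrom": "chr1", "depth": 30.0, "mapq": 60.0},
    {"chrom": "chrX", "depth": 12.0, "mapq": 60.0},
    {"chrom": "chrX", "depth": 12.0, "mapq": 60.0},
    {"chrom": "chrX", "depth": 12.0, "mapq": 5.0},   # low mapq window
]

genome_pass, genome_fail = filter_windows(windows, 20, 0.5, 1.25, False)
chrom_pass, chrom_fail = filter_windows(windows, 20, 0.5, 1.25, True)
```

With genome-wide thresholds the mean depth is 19.2, so the chr1 windows at depth 30 exceed the upper bound of 24 and fail. With per-chromosome thresholds each chromosome is judged against its own mean, and only the low-mapq window fails.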

xyalign.utils.output_bed(outBed, *regionDfs)

Concatenates and merges dataframes into an output bed file

Parameters:

outBed : str

The full path to and name of the output bed file

*regionDfs

Variable length list of dataframes to be included

Returns:

int

0

xyalign.utils.output_bed_no_merge(outBed, *regionDfs)

Concatenates dataframes into an output bed file without merging them. All columns after the first three are preserved as well.

Parameters:

outBed : str

The full path to and name of the output bed file

*regionDfs

Variable length list of dataframes to be included

Returns:

int

0
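As a rough illustration of the unmerged output, the sketch below writes every row of every table as a tab-separated line, keeping any extra columns. It uses lists of rows in place of dataframes and a file-like object in place of the outBed path. Those substitutions, the name write_bed_no_merge, and the absence of sorting are all assumptions.

```python
import csv
import io


def write_bed_no_merge(out_handle, *region_tables):
    """Write every row of every table, unmerged, as tab-separated bed lines."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    for table in region_tables:
        for row in table:
            writer.writerow(row)  # columns after the first three are kept
    return 0


# Illustrative rows: chrom, start, stop, then depth and mapq as extra columns.
high_quality = [["chr1", 0, 100, 31.2, 60.0]]
low_quality = [["chr1", 100, 200, 2.5, 12.0]]

buffer = io.StringIO()
status = write_bed_no_merge(buffer, high_quality, low_quality)
bed_text = buffer.getvalue()
```

Because nothing is merged, adjacent windows stay as separate records and their per-window values remain attached to them.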

xyalign.utils.chromosome_wide_plot(chrom, positions, y_value, measure_name, sampleID, output_prefix, MarkerSize, MarkerAlpha, Xlim, Ylim, x_scale=1000000)

Plots values across a chromosome, where the x axis is the position along the chromosome and the y axis is the value of the measure of interest.

Parameters:

chrom : str

Name of the chromosome

positions : numpy array

Genomic coordinates

y_value : numpy array

The values of the measure of interest

measure_name : str

The name of the measure of interest (y axis title)

sampleID : str

The name of the sample

output_prefix : str

Full path to and prefix of desired output plot

MarkerSize : float

Size in points^2

MarkerAlpha : float

Transparency (0 to 1)

Xlim : float

Maximum X value

Ylim : float

Maximum Y value

x_scale : int

Divide all x values (including Xlim) by this value. Default is 1000000 (1 Mb)

Returns:

int

0

xyalign.utils.hist_array(chrom, value_array, measure_name, sampleID, output_prefix)

Plots a histogram of an array of values of interest. Intended for mapq and depth, but generalizable. This is a separate function from variants.hist_read_balance because that function eliminates fixed variants, while this function plots all values.

Parameters:

chrom : str

Name of the chromosome

value_array : numpy array

Values of the measure of interest (e.g., mapq or depth)

measure_name : str

The name of the measure of interest (y axis title)

sampleID : str

Sample name or id to include in the plot title

output_prefix : str

Desired prefix (including full path) of the output files

Returns:

int

0 if plotting successful, 1 otherwise.

xyalign.utils.plot_depth_mapq(window_df, output_prefix, sampleID, chrom_length, MarkerSize, MarkerAlpha, x_scale=1000000)

Creates histograms and genome-wide plots of depth and mapq.

Parameters:

window_df : pandas dataframe

Columns must include chrom, start, depth, and mapq (at least)

output_prefix : str

Path and prefix of output files to create

sampleID : str

Sample ID

chrom_length : int

Length of chromosome

MarkerSize : float

Size in points^2

MarkerAlpha : float

Transparency (0 to 1)

x_scale : int

Divide all x values (including Xlim) by this value for chromosome_wide_plot. Default is 1000000 (1 Mb)

Returns:

int

0

xyalign.utils.before_after_plot(chrom, positions, values_before, values_after, measure_name, sampleID, output_prefix, MarkerSize, MarkerAlpha, Xlim, YMin='auto', YMax='auto', x_scale=1000000, Color='black')

Plots difference between before/after values (after minus before) across a chromosome.

Parameters:

chrom : str

Name of the chromosome

positions : numpy array

Genomic coordinates

values_before : numpy array

The values of the measure of interest in the “before” condition

values_after : numpy array

The values of the measure of interest in the “after” condition

measure_name : str

The name of the measure of interest (for y-axis title)

sampleID : str

The name of the sample

output_prefix : str

Full path to and prefix of desired output plot

MarkerSize : float

Size in points^2

MarkerAlpha : float

Transparency (0 to 1)

Xlim : float

Maximum X value

YMin : str, int, or float

If “auto”, will allow matplotlib to automatically determine limit. Otherwise, will set the y axis minimum to the value provided (int or float)

YMax : str, int, or float

If “auto”, will allow matplotlib to automatically determine limit. Otherwise, will set the y axis maximum to the value provided (int or float)

x_scale : int

Divide all x values (including Xlim) by this value. Default is 1000000 (1 Mb)

Color : str

Color to use for points. See matplotlib documentation for acceptable options

Returns:

int

0 if plotting successful, 1 otherwise
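The values that end up on the axes of a before/after plot can be sketched without any plotting library: x values are positions divided by x_scale, and y values are after minus before. The helper names below are hypothetical, and using None to stand for an "auto" limit is an assumption about how such a setting might be passed on.

```python
def before_after_points(positions, values_before, values_after, x_scale=1000000):
    """Return (scaled x values, after-minus-before differences)."""
    if not (len(positions) == len(values_before) == len(values_after)):
        raise ValueError("positions and both value arrays must have equal length")
    x_values = [pos / x_scale for pos in positions]
    differences = [
        after - before for before, after in zip(values_before, values_after)
    ]
    return x_values, differences


def y_limit(setting):
    """Translate a YMin/YMax-style setting: 'auto' becomes None (no fixed limit)."""
    return None if setting == "auto" else float(setting)


# Illustrative depth values for three windows, before and after remapping.
positions = [0, 500000, 2000000]
depth_before = [30.0, 28.0, 10.0]
depth_after = [30.0, 31.0, 4.0]

x_values, differences = before_after_points(positions, depth_before, depth_after)
auto_limit = y_limit("auto")
fixed_limit = y_limit(5)
```

A positive difference means the measure increased in the "after" condition, and a negative one means it decreased.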