xyalign.xyalign module
-
xyalign.xyalign.parse_args()
Parse command-line arguments.
Returns: Parser argument namespace
-
xyalign.xyalign.ref_prep(ref_obj, ref_mask, ref_dir, xx, xy, y_chromosome, samtools_path, bwa_path, bwa_index)
Reference prep part of XYalign pipeline.
- Creates two reference fasta files. Both will include masks provided with ref_mask. One will additionally have the entire Y chromosome hard masked.
- Indexes (.fai, .dict, and optionally bwa indices) both new references
Parameters: ref_obj : RefFasta() object
A reftools.RefFasta() object of a fasta reference file to be processed
ref_mask : list or None
List of files to use to hard-mask references. None will ignore masking.
ref_dir : str
Path to output directory
xx : str
Path to XX output reference
xy : str
Path to XY output reference
y_chromosome : str
Name of Y chromosome in fasta
samtools_path : str
The path to samtools (e.g., “samtools” if in path)
bwa_path : str
The path to bwa (e.g., “bwa” if in path)
bwa_index : bool
If True, create bwa indices. Don’t if False.
Returns: tuple
Paths to two masked references (y_masked, y_unmasked)
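The masking step described above can be illustrated with a minimal sketch. It is not XYalign's implementation: sequences are held in memory as a plain dict, mask regions are assumed to be BED-like (chrom, start, end) tuples with 0-based half-open coordinates, and the names apply_masks and build_references are hypothetical.

```python
def apply_masks(sequences, regions):
    """Hard-mask (replace with N) each region in a dict of name -> sequence.

    Assumption: regions are (chrom, start, end) tuples, 0-based, half-open.
    """
    masked = {name: list(seq) for name, seq in sequences.items()}
    for chrom, start, end in regions:
        if chrom not in masked:
            continue  # silently skip regions on unknown scaffolds
        start = max(start, 0)
        end = min(end, len(masked[chrom]))
        for i in range(start, end):
            masked[chrom][i] = "N"
    return {name: "".join(chars) for name, chars in masked.items()}


def build_references(sequences, regions, y_chromosome):
    """Return (xx, xy): both carry the masks, xx also has Y fully masked."""
    xy = apply_masks(sequences, regions or [])
    xx = dict(xy)
    xx[y_chromosome] = "N" * len(xy[y_chromosome])
    return xx, xy
```

The return order mirrors the documented tuple: the Y-masked reference first, then the Y-unmasked one.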
-
xyalign.xyalign.chrom_stats(bam_obj_list, chrom_list, use_counts)
Runs chrom stats module.
Calculates mean depth and mapq across entire scaffolds for a list of bam files
Returns: tuple
Tuple containing two dictionaries with results for depth and mapq, respectively. Or, if use_counts is True, returns a tuple containing the count dictionary and None.
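As a rough illustration of the depth and mapq branch of the return value, the sketch below averages per-position values for each scaffold. The nested dictionary layout and the function name chrom_means are assumptions made for this example; the real dictionaries returned by chrom_stats may be organized differently.

```python
from statistics import mean


def chrom_means(per_bam, chrom_list):
    """Compute mean depth and mean mapq per scaffold for each bam.

    Assumed input layout (hypothetical):
        {bam_name: {chrom: {"depth": [...], "mapq": [...]}}}
    Returns a (depth, mapq) tuple of {bam_name: {chrom: mean}} dicts.
    """
    depth, mapq = {}, {}
    for bam_name, chroms in per_bam.items():
        depth[bam_name] = {c: mean(chroms[c]["depth"]) for c in chrom_list}
        mapq[bam_name] = {c: mean(chroms[c]["mapq"]) for c in chrom_list}
    return depth, mapq
```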
-
xyalign.xyalign.bam_analysis(input_bam_obj, platypus_calling, platypus_path, vcf_log, ref_obj, input_chroms, cpus, out_vcf, no_variant_plots, window_size, target_bed, sample_id, readbalance_prefix, variant_site_quality, variant_genotype_quality, variant_depth, marker_size, marker_transparency, homogenize_read_balance, data_frame_readbalance, min_variant_count, no_bam_analysis, ignore_duplicates, exact_depth, whole_genome_threshold, mapq_cutoff, min_depth_filter, max_depth_filter, depth_mapq_prefix, bam_data_frame, output_bed_high, output_bed_low, use_bed_for_platypus, coordinate_scale, fixed)
Runs bam analysis part of XYalign pipeline on bam file.
- (Optionally) calls variants using Platypus
- (Optionally) parses and filters Platypus vcf, and plots read balance
- (Optionally) calculates window-based metrics from the bam file: depth and mapq
- (Optionally) plots window-based metrics
- Outputs two bed files: high quality windows, and low quality windows.
Parameters: input_bam_obj : bam.BamFile() object
platypus_calling : bool
If True, will call and analyze variants
platypus_path : str
Command to call platypus (e.g., “platypus”)
vcf_log : str
Path to file for platypus log
ref_obj : reftools.RefFasta() object
input_chroms : list
Chromosomes to analyze
cpus : int
Number of threads/cpus
out_vcf : str
Output vcf path/name
no_variant_plots : bool
If True, will not plot read balance
window_size : int or None
Window size for sliding window analyses (both bam and vcf). If None, will use regions in target_bed
target_bed : str or None
Path to bed file containing targets to use in sliding window analyses
sample_id : str
readbalance_prefix : str
Prefix, including full path, to use for output files for readbalance analyses
variant_site_quality : int
Minimum site quality (PHRED) for a site to be included in readbalance analyses
variant_genotype_quality : int
Minimum genotype quality for a site to be included in read balance analyses
variant_depth : int
Minimum depth for a site to be included in read balance analyses
marker_size : float
Marker size for plotting genome scatter plots
marker_transparency : float
Value to use for marker transparency in genome scatter plots
homogenize_read_balance : bool
If True, will subtract values less than 0.5 from 1. I.e., 0.25 and 0.75 would be treated equivalently
data_frame_readbalance : str
Path of output file for full read balance dataframe
min_variant_count : int
Minimum number of variants in a given window for the window to be plotted in window-based read balance analyses
no_bam_analysis : bool
If True, no bam analyses will take place
ignore_duplicates : bool
If True, duplicates are excluded from bam analyses
exact_depth : bool
If True, exact depth is calculated in each window. Else, a much faster approximation will be used
whole_genome_threshold : bool
If True, values for depth filters will be calculated using mean from across all chromosomes included in analyses. Else, mean will be taken per chromosome
min_depth_filter : float
Minimum depth threshold for a window to be considered high. Calculated as mean depth * min_depth_filter.
max_depth_filter : float
Maximum depth threshold for a window to be considered high. Calculated as mean depth * max_depth_filter.
depth_mapq_prefix : str
Prefix, including full path, to be used for files output from depth and mapq analyses
bam_data_frame : str
Full path to output file for dataframe containing all data from bam analyses
output_bed_high : str
Full path to output bed containing high quality (i.e., passing filters) windows
output_bed_low : str
Full path to output bed containing low quality (i.e., failing filters) windows
use_bed_for_platypus : bool
If True, use output_bed_high as regions for Platypus calling
coordinate_scale : int
Divide all coordinates by this value for plotting. In most cases, 1000000 will be ideal for eukaryotic genomes.
fixed : bool
If False, only plots histogram for values between 0.05 and 1.0 (non-inclusive). If True, plots histogram of all variants.
Returns: tuple
(list of pandas dataframes with passing windows, list of pandas dataframes with failing windows)
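Two of the parameters above lend themselves to a short worked sketch: homogenize_read_balance and the depth filters. The code below is an illustration under stated assumptions, not the library's code. Windows are plain dicts rather than pandas dataframes, the thresholds are treated as inclusive, mapq_cutoff is not modeled, and the names homogenize and split_windows are hypothetical.

```python
from statistics import mean


def homogenize(read_balance):
    """Fold read balance so that, e.g., 0.25 and 0.75 are equivalent."""
    return 1.0 - read_balance if read_balance < 0.5 else read_balance


def split_windows(windows, min_depth_filter, max_depth_filter,
                  whole_genome_threshold):
    """Split windows into (passing, failing) by depth alone.

    Assumption: each window is a dict with "chrom" and "depth" keys.
    Thresholds are mean depth * min_depth_filter and
    mean depth * max_depth_filter, where the mean is taken across all
    windows (whole_genome_threshold=True) or per chromosome (False).
    """
    genome_mean = mean(w["depth"] for w in windows)
    by_chrom = {}
    for w in windows:
        by_chrom.setdefault(w["chrom"], []).append(w["depth"])
    chrom_mean = {c: mean(d) for c, d in by_chrom.items()}

    passing, failing = [], []
    for w in windows:
        base = genome_mean if whole_genome_threshold else chrom_mean[w["chrom"]]
        low, high = base * min_depth_filter, base * max_depth_filter
        if low <= w["depth"] <= high:  # inclusive bounds are an assumption
            passing.append(w)
        else:
            failing.append(w)
    return passing, failing
```

With per-chromosome means, a scaffold that is uniformly at half depth (such as a single-copy X) is judged against its own mean rather than the genome-wide one, so the two settings can classify the same window differently.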
-
xyalign.xyalign.ploidy_analysis(passing_df, failing_df, no_perm_test, no_ks_test, no_bootstrap, input_chroms, x_chromosome, y_chromosome, results_dir, num_permutations, num_bootstraps, sample_id)
Runs the ploidy analysis part of XYalign.
- Runs permutation test to systematically compare means between every possible pair of chromosomes
- Runs K-S two sample test to systematically compare distributions between every possible pair of chromosomes
- Bootstraps the mean depth ratio for every possible pair of chromosomes
Parameters: passing_df : list
Passing pandas dataframes, one per chromosome
failing_df : list
Failing pandas dataframes, one per chromosome
no_perm_test : bool
If False, permutation test will be run
no_ks_test : bool
If False, KS test will be run
no_bootstrap : bool
If False, bootstrap analysis will be run
input_chroms : list
Chromosomes/scaffolds to analyze
x_chromosome : list
X-linked scaffolds
y_chromosome : list
Y-linked scaffolds
results_dir : str
Full path to directory to output results
num_permutations : int
Number of permutations
num_bootstraps : int
Number of bootstrap replicates
sample_id : str
Returns: dictionary
Results for each test. Keys: perm, ks, boot.
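The permutation test and bootstrap named above can be sketched for a single pair of chromosomes using only the standard library. This is a generic illustration of the two techniques, not XYalign's code: the inputs are plain lists of window depths, the permutation test is two-sided on the difference in means, the bootstrap interval is a simple 95% percentile interval, and the function names are hypothetical. The K-S test is omitted.

```python
import random
from statistics import mean


def permutation_test(a, b, num_permutations, rng):
    """Two-sided permutation p-value for a difference in mean depth."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / num_permutations


def bootstrap_ratio(a, b, num_bootstraps, rng):
    """Return (observed ratio of means, lower, upper) with a 95% interval.

    Assumes depths in b are positive so a resampled mean is never zero.
    """
    ratios = []
    for _ in range(num_bootstraps):
        resampled_a = [rng.choice(a) for _ in a]
        resampled_b = [rng.choice(b) for _ in b]
        ratios.append(mean(resampled_a) / mean(resampled_b))
    ratios.sort()
    lower = ratios[int(0.025 * (num_bootstraps - 1))]
    upper = ratios[int(0.975 * (num_bootstraps - 1))]
    return mean(a) / mean(b), lower, upper
```

Passing an explicit random.Random instance as rng keeps runs reproducible. In a full analysis these functions would be applied to every pair of chromosomes in input_chroms.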
-
xyalign.xyalign.remapping(input_bam_obj, y_pres, masked_references, samtools_path, sambamba_path, repairsh_path, shufflesh_path, bwa_path, bwa_flags, single_end, bam_dir, fastq_dir, sample_id, x_chromosome, y_chromosome, cpus, xmx, fastq_compression, cleanup, read_group_id)
Runs remapping steps of XYalign.
- Strips, sorts, and re-pairs reads from the sex chromosomes (collecting read group information)
- Maps (with sorting) reads (with read group information) to appropriate reference based on presence (or not) of Y chromosome
- Merges bam files (if more than one read group)
Parameters: input_bam_obj : bam.BamFile() object
y_pres : bool
True if Y chromosome present in individual
masked_references : tuple
Masked reference objects (xx, xy)
samtools_path : str
Path/command to call samtools
sambamba_path : str
Path/command to call sambamba
repairsh_path : str
Path/command to call repair.sh
shufflesh_path : str
Path/command to call shuffle.sh
bwa_path : str
Path/command to call bwa
bwa_flags : str
Flags to use for bwa mapping
single_end : bool
If True, reads treated as single end
bam_dir : str
Path to output directory for bam files
fastq_dir : str
Path to output directory for fastq files
sample_id : str
x_chromosome : list
X-linked scaffolds
y_chromosome : list
Y-linked scaffolds
cpus : int
Number of threads/cpus
xmx : str
Value to be combined with -Xmx for java programs (e.g., 4g would result in -Xmx4g)
fastq_compression : int
Compression level for fastq files. 0 leaves fastq files uncompressed. Otherwise values should be between 1 and 9 (inclusive), with larger values indicating more compression
cleanup : bool
If True, will delete temporary files
read_group_id : str
ID to use to add read group information
Returns: str
Path to bam containing remapped sex chromosomes
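Three of the parameters above follow simple rules that can be shown in a short sketch: choosing a reference from masked_references based on y_pres, building the java heap flag from xmx, and checking the range of fastq_compression. The helper names are hypothetical and the code only mirrors the parameter descriptions; it does not call any external tool.

```python
def pick_reference(masked_references, y_pres):
    """Choose the reference to map against from the (xx, xy) tuple."""
    xx_ref, xy_ref = masked_references
    return xy_ref if y_pres else xx_ref


def java_heap_flag(xmx):
    """Combine a value such as "4g" with -Xmx, giving "-Xmx4g"."""
    return "-Xmx" + xmx


def check_fastq_compression(level):
    """Accept 0 (uncompressed) or 1 through 9; raise otherwise."""
    if not 0 <= level <= 9:
        raise ValueError("fastq_compression must be between 0 and 9")
    return level
```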
-
xyalign.xyalign.swap_sex_chroms(input_bam_obj, new_bam_obj, samtools_path, sambamba_path, x_chromosome, y_chromosome, bam_dir, sample_id, cpus, xyalign_params)
Replaces the sex chromosomes in the original bam file with those from new_bam_obj.
Parameters: input_bam_obj : bam.BamFile() object
Original input bam file object
new_bam_obj : bam.BamFile() object
Bam file object containing newly mapped sex chromosomes (to insert)
samtools_path : str
Path/command to call samtools
sambamba_path : str
Path/command to call sambamba
x_chromosome : list
X-linked scaffolds
y_chromosome : str
Y-linked scaffolds
bam_dir : str
Path to bam output directory
sample_id : str
cpus : int
Number of threads/cpus
xyalign_params : dict
Dictionary of xyalign_params to add to bam header
Returns: str
Path to new bam file containing original autosomes and new sex chromosomes
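The bookkeeping behind the swap can be sketched as a partition of scaffold names: everything that is not X- or Y-linked is kept from the original bam, and the sex-linked scaffolds are taken from the new one. The function plan_swap is hypothetical and works on names only; it does not read or write bam files. Because y_chromosome is documented here as a str and elsewhere as a list, the sketch accepts either.

```python
def plan_swap(all_scaffolds, x_chromosome, y_chromosome):
    """Return (keep_from_original, take_from_new) lists of scaffold names."""
    if isinstance(x_chromosome, str):
        x_chromosome = [x_chromosome]
    if isinstance(y_chromosome, str):
        y_chromosome = [y_chromosome]
    sex_linked = set(x_chromosome) | set(y_chromosome)
    keep = [s for s in all_scaffolds if s not in sex_linked]
    take = [s for s in all_scaffolds if s in sex_linked]
    return keep, take
```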