Usage Overview

The Basics

Requirements

The different modules of XYalign have slightly different requirements, but in general you’ll need a bam file and the reference fasta file used to generate it. Using that same fasta is critically important, as a different fasta will cause errors. XYalign also requires a list of chromosomes to analyze, the name of the X chromosome, and the name of the Y chromosome (if present in the assembly). The chromosome names must exactly match those in the bam header and reference fasta; ‘chr19’ is not equivalent to ‘19’, for example.
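XYalign checks compatibility between the bam and the fasta itself (see --skip_compatibility_check), but it can be convenient to confirm chromosome names before launching a long run. The following is a minimal sketch, not part of XYalign: the helper names are hypothetical, and reading names from files assumes pysam is installed (its AlignmentFile.references and FastaFile.references attributes list the sequence names).

```python
from typing import Iterable, List


def missing_chromosomes(requested: Iterable[str],
                        bam_names: Iterable[str],
                        fasta_names: Iterable[str]) -> List[str]:
    """Return requested names absent from the bam header or the fasta.

    The comparison is exact and case-sensitive, so 'chr19' and '19' differ.
    """
    bam_set, fasta_set = set(bam_names), set(fasta_names)
    return [name for name in requested
            if name not in bam_set or name not in fasta_set]


def names_from_files(bam_path: str, fasta_path: str):
    """Read sequence names from a bam header and a fasta (requires pysam)."""
    import pysam  # imported lazily so missing_chromosomes works without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        bam_names = list(bam.references)
    with pysam.FastaFile(fasta_path) as fasta:
        fasta_names = list(fasta.references)
    return bam_names, fasta_names
```

For example, missing_chromosomes(["chr19"], ["chr19"], ["19"]) returns ["chr19"], flagging a ‘chr’ prefix mismatch between the bam and the fasta.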

You also need a variety of python packages and external programs installed. See Installation for more information.

The Pipeline

XYalign is composed of the following modules, which can be thought of as steps in the pipeline (with the exception of CHROM_STATS):

PREPARE_REFERENCE
ANALYZE_BAM
CHARACTERIZE_SEX_CHROMS
STRIP_READS
REMAPPING
CHROM_STATS

Each of these modules can be invoked as a command line flag with no arguments (e.g., --PREPARE_REFERENCE), and XYalign will execute only that module. If no flags are provided, XYalign will run the full pipeline in the following order: PREPARE_REFERENCE -> ANALYZE_BAM -> CHARACTERIZE_SEX_CHROMS -> STRIP_READS -> REMAPPING -> ANALYZE_BAM. This will:

1. Prepare two reference genomes - one with the Y chromosome masked, the other with both X and Y unmasked. In both cases, XYalign will optionally mask other regions of the genome provided in an input bed file (using the flag --reference_mask <file1.bed> <file2.bed> ...).

2. Analyze the bam file to calculate metrics related to read balance, read depth, and mapping quality. Read depth and mapping quality are calculated in windows, and either --window_size <integer window size> or --target_bed <path to target bed file> must be provided. --window_size is the fixed size of windows to use in a nonoverlapping sliding window analysis in bases (e.g., 10000 for 10 kb windows). --target_bed is a bed file of targets to use as windows, e.g. exome capture targets.

3. Plot read balance, depth, and mapq for each chromosome, and output bed files of high and low quality regions, based on either default or user-defined thresholds.

4. Run a series of tests comparing ANALYZE_BAM metrics for each chromosome. If the flag --CHARACTERIZE_SEX_CHROMS is invoked, XYalign will carry out the bam analysis steps above and then proceed to these tests.

5. Strip and sort reads mapping to the sex chromosomes, map to the reference with the appropriate masking (step 1) based on the results of step 4, and replace the sex chromosome alignments in the original bam file with these new ones.

6. Analyze the new bam file as in steps 2 and 3.

CHROM_STATS provides quicker, coarser statistics and is designed for cases in which a reference genome is well-understood and when multiple samples are available.
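The dependencies between modules described above can be summarized in a short Python sketch. This is our illustration of the documented behavior, not XYalign code: the function name is hypothetical, and the relative order of reference preparation and read stripping within REMAPPING is an assumption.

```python
from typing import List, Optional

# Order of the full pipeline when no module flag is given.
FULL_PIPELINE = [
    "PREPARE_REFERENCE", "ANALYZE_BAM", "CHARACTERIZE_SEX_CHROMS",
    "STRIP_READS", "REMAPPING", "ANALYZE_BAM",
]


def steps_for(flag: Optional[str] = None, refs_provided: bool = False) -> List[str]:
    """Summarize which steps a single module flag implies (illustrative only)."""
    if flag is None:
        return list(FULL_PIPELINE)
    if flag == "CHARACTERIZE_SEX_CHROMS":
        # The bam analysis runs first, then the statistical tests.
        return ["ANALYZE_BAM", "CHARACTERIZE_SEX_CHROMS"]
    if flag == "REMAPPING":
        # Masked references are built only if none are supplied; placing this
        # before STRIP_READS is an assumption.
        prep = [] if refs_provided else ["PREPARE_REFERENCE"]
        return prep + ["STRIP_READS", "REMAPPING"]
    if flag in {"PREPARE_REFERENCE", "ANALYZE_BAM", "STRIP_READS", "CHROM_STATS"}:
        return [flag]
    raise ValueError(f"unknown module flag: {flag}")
```

CHROM_STATS stands apart from the chain because it summarizes whole chromosomes across one or more bam files rather than feeding later steps.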

Suggested Command Lines

Below we highlight example command lines, along with useful optional flags, for each module (PREPARE_REFERENCE, ANALYZE_BAM, CHARACTERIZE_SEX_CHROMS, STRIP_READS, REMAPPING, CHROM_STATS) and for the full pipeline. You can print a complete list of command line flags, their descriptions, and their defaults from the command line:

xyalign -h

In all examples, reference.fasta is our input reference in fasta format, input.bam is our input bam file (created using reference.fasta), sample1 is the ID of our sample, and sample1_output is the name of our desired output directory. We’ll analyze chromosomes named ‘chr19’, ‘chrX’, and ‘chrY’, with chrX representing the X chromosome and chrY representing the Y chromosome. We’ll assume that all programs are in our PATH and can be invoked by typing the program name from the command line without any associated path (e.g., samtools). We’ll also assume that we’re working on a cluster with 4 cores available to XYalign.

1. PREPARE_REFERENCE

xyalign --PREPARE_REFERENCE --ref reference.fasta \
--output_dir sample1_output --sample_id sample1 --cpus 4 --reference_mask mask.bed \
--x_chromosome chrX --y_chromosome chrY

Here, mask.bed is a bed file containing regions to mask in both output reference genomes (e.g., coordinates of the pseudoautosomal regions on the Y chromosome). More than one bed file can be provided (e.g., --reference_mask mask.bed mask2.bed).

This will output two reference genomes, one with the Y chromosome completely masked (defaults to sample1_output/reference/xyalign_noY.masked.fa) and one with an unmasked Y (defaults to sample1_output/reference/xyalign_withY.masked.fa). These default names can be changed with the --xx_ref_out_name and --xy_ref_out_name flags; the files will still be deposited in sample1_output/reference. To write these files to a specific location, use --xx_ref_out and --xy_ref_out with the full path to, and name of, the desired output files. You can optionally index the output fasta files with BWA as well by adding --bwa_index True.

2. ANALYZE_BAM

xyalign --ANALYZE_BAM --ref reference.fasta --bam input.bam \
--output_dir sample1_output --sample_id sample1 --cpus 4 --window_size 10000 \
--chromosomes chr19 chrX chrY --x_chromosome chrX --y_chromosome chrY

Here, 10000 is the fixed window size (in bases) for the nonoverlapping sliding window analyses of the bam file. If you’re working with targeted sequencing data (e.g., exome), you can provide a list of regions to use instead of windows. For example, if your regions are in targets.bed, you would add the flag --target_bed targets.bed.
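A fixed window size divides each chromosome into consecutive, nonoverlapping windows. The sketch below shows that tiling with 0-based, half-open coordinates (as in bed files); how XYalign handles a final window shorter than --window_size is an assumption here (we simply truncate it at the chromosome end).

```python
from typing import Iterator, Tuple


def fixed_windows(chrom_length: int, window_size: int) -> Iterator[Tuple[int, int]]:
    """Yield nonoverlapping, 0-based, half-open (start, end) windows.

    The last window is truncated at the chromosome end (an assumption;
    XYalign may treat a short final window differently).
    """
    if window_size <= 0:
        raise ValueError("window_size must be a positive integer")
    for start in range(0, chrom_length, window_size):
        yield start, min(start + window_size, chrom_length)
```

With a 25 kb chromosome and --window_size 10000, this yields (0, 10000), (10000, 20000), and (20000, 25000).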

This command line will use the default thresholds: a minimum site quality of 30 (SNP), genotype quality of 30 (SNP), variant depth of 4 (SNP), and mapping quality of 20 (bam window). These can be changed with the flags --variant_site_quality, --variant_genotype_quality, --variant_depth, and --mapq_cutoff, respectively. You can also apply depth filters to bam windows with --min_depth_filter and --max_depth_filter.
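Each SNP threshold is inclusive (the flag descriptions say "greater than or equal to"), and we assume a SNP must pass all three. A minimal sketch of that filter, assuming variants have already been parsed into a hypothetical Snp record (for instance from a VCF read with pysam.VariantFile):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snp:
    qual: float  # site quality (VCF QUAL)
    gq: float    # sample genotype quality
    dp: int      # sample depth


def passing_snps(snps: List[Snp], site_quality: float = 30,
                 genotype_quality: float = 30, depth: int = 4) -> List[Snp]:
    """Keep SNPs meeting all three thresholds (each compared with >=)."""
    return [s for s in snps
            if s.qual >= site_quality and s.gq >= genotype_quality and s.dp >= depth]
```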

This will output a series of plots in sample1_output/plots, bed files containing high and low quality windows in sample1_output/bed, and the entire dataframe with values for each measure in each window in sample1_output/bed.
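To illustrate how windows might be split into high and low quality bed files, the sketch below applies a mapping quality cutoff and optional depth bounds to per-window summaries. The window tuple layout, the use of mean values, and the inclusive comparisons are assumptions for illustration; XYalign’s own criteria may differ.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-window summary: chrom, start, end, mean depth, mean mapq.
Window = Tuple[str, int, int, float, float]


def split_windows(windows: List[Window], mapq_cutoff: float = 20,
                  min_depth: Optional[float] = None,
                  max_depth: Optional[float] = None) -> Dict[str, List[str]]:
    """Sort windows into high/low quality and format each as a bed line."""
    out: Dict[str, List[str]] = {"high": [], "low": []}
    for chrom, start, end, depth, mapq in windows:
        ok = mapq >= mapq_cutoff
        if min_depth is not None:
            ok = ok and depth >= min_depth
        if max_depth is not None:
            ok = ok and depth <= max_depth
        out["high" if ok else "low"].append(f"{chrom}\t{start}\t{end}")
    return out
```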

3. CHARACTERIZE_SEX_CHROMS

xyalign --CHARACTERIZE_SEX_CHROMS --ref reference.fasta --bam input.bam \
--output_dir sample1_output --sample_id sample1 --cpus 4 --window_size 10000 \
--chromosomes chr19 chrX chrY --x_chromosome chrX --y_chromosome chrY

Settings here are identical to those in 2 because the first step of CHARACTERIZE_SEX_CHROMS is running ANALYZE_BAM.

In addition to everything in ANALYZE_BAM, CHARACTERIZE_SEX_CHROMS will output the results of a series of statistical tests in sample1_output/results.

4. STRIP_READS

xyalign --STRIP_READS --ref reference.fasta --bam input.bam \
--output_dir sample1_output --sample_id sample1 --cpus 4 \
--chromosomes chr1 chr2 chr3 chr4 chr5 --xmx 2g \
--fastq_compression 5

This will strip the reads, by read group, from chromosomes 1-5 and output, in the directory sample1_output/fastq, a pair of fastqs per read group, the read groups themselves, and a text file linking each fastq to its read group. If we were working with single-end reads, we would have had to include the flag --single_end. The reference file isn’t used here at all; it’s needed only because --ref is a general requirement of XYalign, so a dummy file can be used in its place. To strip reads from the entire genome (including unmapped reads), use --chromosomes ALL. --xmx tells the Java programs that XYalign calls how much memory to use (e.g., --xmx 2g provides 2 GB of RAM). --fastq_compression sets the compression level of the output fastqs (between 0 and 9, with 0 leaving files uncompressed).
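The two resource flags map onto simple values: --xmx 2g becomes the Java argument -Xmx2g (with no argument at all when --xmx is omitted), and --fastq_compression accepts integers from 0 to 9. A hypothetical sketch of that translation and check (the function names are ours, not XYalign’s):

```python
from typing import List, Optional


def java_memory_args(xmx: Optional[str]) -> List[str]:
    """Translate an --xmx value (e.g. '2g') into a Java -Xmx argument.

    None adds nothing, leaving the Java program to choose its own memory.
    """
    return [] if xmx is None else [f"-Xmx{xmx}"]


def check_fastq_compression(level: int) -> int:
    """Validate a compression level: 0 (uncompressed) through 9."""
    if not 0 <= level <= 9:
        raise ValueError("--fastq_compression must be between 0 and 9")
    return level
```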

5. REMAPPING

xyalign --REMAPPING --ref reference.fasta --bam input.bam \
--output_dir sample1_output --sample_id sample1 --cpus 4 \
--chromosomes chr19 chrX chrY --x_chromosome chrX --y_chromosome chrY \
--xx_ref_in sample1_output/reference/xyalign_noY.masked.fa \
--xy_ref_in sample1_output/reference/xyalign_withY.masked.fa \
--y_absent

Here, we’ve input our reference genomes generated in step 1 (if we don’t, XYalign will repeat that step). We’ve also used the flag --y_absent to indicate that there is no Y chromosome in our sample (perhaps as the result of step 3, or outside knowledge). If a Y is present, we would have used --y_present instead. REMAPPING requires one of those two flags, as it does not involve any steps to estimate sex chromosome content (those are carried out in CHARACTERIZE_SEX_CHROMS). Note that REMAPPING will run STRIP_READS first.

Full pipeline

And if we want to run the full XYalign pipeline on a sample, we’d use a command line along the lines of:

xyalign --ref reference.fasta --bam input.bam \
--output_dir sample1_output --sample_id sample1 --cpus 4 --reference_mask mask.bed \
--window_size 10000 --chromosomes chr19 chrX chrY \
--x_chromosome chrX --y_chromosome chrY

We could have optionally provided preprocessed reference genomes with --xx_ref_in and --xy_ref_in, as in 5. We could also have used --y_absent or --y_present to force mapping to a particular reference. Because we didn’t include either of these two flags, XYalign will use --sex_chrom_calling_threshold to determine the sex chromosome complement (default is 2.0).
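The threshold is the maximum X/Y depth ratio at which a sample is treated as having heterogametic sex chromosomes (e.g., XY), and --y_present or --y_absent bypasses the estimate. A minimal sketch of that decision, assuming x_depth and y_depth are already-filtered mean depths and that a ratio exactly equal to the threshold counts as XY (the function name is hypothetical):

```python
import math
from typing import Optional


def use_y_reference(x_depth: float, y_depth: float, threshold: float = 2.0,
                    y_present: Optional[bool] = None) -> bool:
    """Return True to remap with Y present, False to remap with Y absent.

    y_present=True/False mimics --y_present/--y_absent and overrides the call.
    Otherwise an X/Y depth ratio at or below `threshold` is treated as XY-like.
    """
    if y_present is not None:
        return y_present
    ratio = math.inf if y_depth == 0 else x_depth / y_depth
    return ratio <= threshold
```

For example, equal X and Y depth (ratio 1.0) selects the Y present reference, while an X/Y ratio of 30, or a Y depth of zero, selects the Y absent reference.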

6. CHROM_STATS

xyalign --CHROM_STATS --use_counts --bam input1.bam input2.bam input3.bam --ref null \
--output_dir directory_name --sample_id analysis_name --chromosomes chr19 chrX chrY

Here, --use_counts simply takes the number of reads mapped to each chromosome from the bam index. It’s by far the fastest, but also the coarsest, option. Running without this flag will calculate depth and mapq along each chromosome for more detail, but this will take longer.
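The bam index stores mapped and unmapped read counts for each reference sequence, which is what makes --use_counts fast. As a sketch (assuming pysam is installed and the bam is indexed), pysam’s AlignmentFile.get_index_statistics() exposes these counts, and AlignmentFile.get_reference_length() gives chromosome lengths; dividing counts by length and by an autosome’s value gives a rough relative measure. This normalization is our illustration, not necessarily what CHROM_STATS reports.

```python
from typing import Dict, Iterable


def mapped_counts(bam_path: str, chromosomes: Iterable[str]) -> Dict[str, int]:
    """Mapped-read counts per chromosome from the bam index (requires pysam)."""
    import pysam  # imported lazily; needs an indexed bam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        stats = {s.contig: s.mapped for s in bam.get_index_statistics()}
    return {c: stats.get(c, 0) for c in chromosomes}


def relative_density(counts: Dict[str, int], lengths: Dict[str, int],
                     reference: str) -> Dict[str, float]:
    """Reads per base for each chromosome, divided by that of `reference`."""
    ref_density = counts[reference] / lengths[reference]
    return {c: (counts[c] / lengths[c]) / ref_density for c in counts}
```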

Recommendations for Incorporating XYalign into Pipelines

While the full XYalign pipeline will be useful in certain situations, we feel that the following pipeline is better suited to most users’ needs and will save time and space.

1. Use XYalign PREPARE_REFERENCE to prepare Y present and Y absent genomes.

2. Preliminarily map reads to the standard reference (or the Y present reference) and sort the bam file, using any mapper and sorting tool. We have found that a smaller dataset can usually be used for this step (e.g., a whole exome sequencing run or one lane of a whole genome sequencing run).

3. Run CHARACTERIZE_SEX_CHROMS, to analyze the bam file, output plots, and estimate ploidy. If a number of samples are available and sex chromosomes are well-differentiated (as in humans), consider using CHROM_STATS with plot_count_stats.

4. Remap reads to the fasta produced in 1 corresponding to the sex chromosome complement characterized in 3. E.g., if Y is not detected, map to Y absent. This time run full pipeline of mapping, sorting, removing duplicates, etc., using users’ preferred tools/pipeline.

5. Optionally run ANALYZE_BAM on the bam file produced in 4.

6. Call variants using your preferred variant caller.

7. Analyze variants taking into account ploidy estimated in 3, and consider masking low quality regions using bed files output in 5.

XYalign - Speed and Memory

The minimum memory requirements for XYalign are determined by external programs rather than by any internal code. Right now, the major limiting step is BWA indexing of reference genomes, which requires 5-6 GB of memory for a human-sized genome. In addition, in certain situations (e.g., stripping all reads from deep coverage genome data with a single read group, or with none), the STRIP_READS module will require a great deal of memory to sort and match paired reads (the memory requirement is that of the external program repair.sh).

The slowest parts of the pipeline also all involve steps relying on external programs, such as genome preparation, variant calling, read mapping, swapping sex chromosome alignments, etc. In almost all cases, you’ll see substantial increases in the speed of the pipeline by increasing the number of threads/cores. You must provide information about the number of cores available to XYalign with the --cpus flag (XYalign will assume only a single thread is available unless this flag is set).

Exome data

XYalign handles exome data, with a few minor considerations. In particular, either setting --window_size to a smaller value (perhaps 5000 or less) or providing targets instead of a window size (--target_bed targets.bed) will be critical for getting more accurate window measures. In addition, users should manually check the results of CHARACTERIZE_SEX_CHROMS for a number of samples to get a feel for expected values on the sex chromosomes, as these values are likely to vary with experimental design (especially between different capture kits).
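When --target_bed is supplied, each bed interval serves as one window. A minimal parser for such a file is sketched below; it is not XYalign’s own reader, and it assumes tab-separated lines with 0-based, half-open coordinates, skipping blank, comment, track, and browser lines.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_targets(handle: TextIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) from bed text; extra columns are ignored."""
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\r\n").split("\t")[:3]
        yield chrom, int(start), int(end)


if __name__ == "__main__":
    demo = io.StringIO("track name=targets\nchrX\t100\t250\tcapture1\n")
    print(list(read_targets(demo)))
```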

Nonhuman genomes

XYalign will theoretically work with any genome, and on any combination of chromosomes or scaffolds (see more on the latter below). Simply provide the names of the chromosomes/scaffolds to analyze and the names of the sex chromosomes (e.g., --chromosomes chr1a chr1b chr2 lga lgb --x_chromosome lga --y_chromosome lgb if our X-linked scaffold was lga, our Y-linked scaffold was lgb, and we wanted to compare these scaffolds to chromosomes chr1a, chr1b, and chr2). However, please note that XYalign currently does not support multiple X or Y chromosomes/scaffolds (we are planning to support this soon, though).

Keep in mind, however, that read balance, mapq, and depth ratios might differ among organisms, so default XYalign settings will likely not be appropriate in most cases. Instead, if multiple samples are available, we recommend running XYalign’s CHARACTERIZE_SEX_CHROMS on each sample (steps 2-3 in “Recommendations for Incorporating XYalign into Pipelines” above) using the same output directory for all samples. One can then quickly concatenate results (we recommend starting with bootstrap results) and plot them to look for clustering of samples (see the XYalign publication for examples of this).
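Generating one command per sample makes it easy to keep every run pointed at the same output directory. The sketch below only assembles argument lists from flags documented on this page; the function name and the sample-to-bam mapping are hypothetical, and each list could be run with subprocess.run(cmd, check=True).

```python
from typing import Dict, List


def characterize_commands(samples: Dict[str, str], ref: str, output_dir: str,
                          chromosomes: List[str], x_chrom: str, y_chrom: str,
                          window_size: int = 10000, cpus: int = 4) -> List[List[str]]:
    """Build one CHARACTERIZE_SEX_CHROMS command per sample (sample_id -> bam)."""
    commands = []
    for sample_id, bam in sorted(samples.items()):
        commands.append([
            "xyalign", "--CHARACTERIZE_SEX_CHROMS",
            "--ref", ref, "--bam", bam,
            "--output_dir", output_dir,  # shared by all samples
            "--sample_id", sample_id, "--cpus", str(cpus),
            "--window_size", str(window_size),
            "--chromosomes", *chromosomes,
            "--x_chromosome", x_chrom, "--y_chromosome", y_chrom,
        ])
    return commands
```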

Analyzing arbitrary chromosomes

Currently, XYalign requires a minimum of two chromosomes (an “autosome” and an “X chromosome”) for analyses in ANALYZE_BAM and CHARACTERIZE_SEX_CHROMS (and therefore the whole pipeline). These chromosomes, however, can be arbitrary. Below, we highlight two example cases: looking for evidence of Trisomy 21 in human samples, and running the full XYalign pipeline on a ZW sample (perhaps a bird, squamate reptile, or moth).

If one wanted to look for evidence of Trisomy 21 in human data mapped to hg19 (which uses “chr” in chromosome names), s/he could use a command along the lines of:

xyalign --CHARACTERIZE_SEX_CHROMS --ref reference.fasta --bam input.bam \
--output_dir sample1_output --sample_id sample1 --cpus 4 --window_size 10000 \
--chromosomes chr1 chr10 chr19 chr21 --x_chromosome chr21

This would run the CHARACTERIZE_SEX_CHROMS module, systematically comparing chr21 to chr1, chr10, and chr19.
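The expectation behind this comparison is simple arithmetic: three copies of chr21 against two copies of each comparison autosome predict a chr21/autosome depth ratio near 3/2 = 1.5, versus about 1.0 in a disomic sample. XYalign assesses the difference with its own statistical tests; the sketch below, with hypothetical per-window depths as input, only shows the ratio itself.

```python
from statistics import mean
from typing import Dict, List


def depth_ratios(window_depths: Dict[str, List[float]], target: str) -> Dict[str, float]:
    """Mean window depth of `target` divided by that of each other chromosome."""
    target_mean = mean(window_depths[target])
    return {chrom: target_mean / mean(depths)
            for chrom, depths in window_depths.items() if chrom != target}
```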

To run the full pipeline on a ZW sample (in ZZ/ZW systems, males are ZZ and females are ZW), one could simply run a command like (assuming the Z scaffold was named “scaffoldz” and the W scaffold was named “scaffoldw”):

xyalign --ref reference.fasta --bam input.bam \
--output_dir sample1_output --sample_id sample1 --cpus 4 --reference_mask mask.bed \
--window_size 10000 --chromosomes scaffold1 scaffoldz scaffoldw --x_chromosome scaffoldz \
--y_chromosome scaffoldw

In this example, it’s important that the “X” and “Y” chromosomes are assigned in this way because PREPARE_REFERENCE (the first step in the full pipeline) will create two reference genomes: one with the “Y” completely masked, and one with both “X” and “Y” unmasked. This command will therefore create the appropriate references (a ZW and a Z only). Other organisms or uses might not require this consideration.

Using XYalign as a Python library

All modules in the XYalign/xyalign directory are designed to support the command line program XYalign. However, some classes and functions might be of use in other circumstances. If you’ve installed XYalign as described in Installation, then you should be able to import XYalign libraries just like you would for any other Python package. E.g.:

from xyalign import bam

Or:

import xyalign.bam

Full List of Command-Line Flags

This list can also be produced with the command: xyalign -h

Flags:

-h, --help            show this help message and exit
--bam [BAM [BAM ...]]
                                          Full path to input bam files. If more than one
                                          provided, only the first will be used for modules
                                          other than --CHROM_STATS
--cram [CRAM [CRAM ...]]
                                          Full path to input cram files. If more than one
                                          provided, only the first will be used for modules
                                          other than --CHROM_STATS. Not currently supported.
--sam [SAM [SAM ...]]
                                          Full path to input sam files. If more than one
                                          provided, only the first will be used for modules
                                          other than --CHROM_STATS. Not currently supported.
--ref REF             REQUIRED. Path to reference sequence (including file
                                          name).
--output_dir OUTPUT_DIR, -o OUTPUT_DIR
                                          REQUIRED. Output directory. XYalign will create a
                                          directory structure within this directory
--chromosomes [CHROMOSOMES [CHROMOSOMES ...]], -c [CHROMOSOMES [CHROMOSOMES ...]]
                                          Chromosomes to analyze (names must match reference
                                          exactly). For humans, we recommend at least chr19,
                                          chrX, chrY. Generally, we suggest including the sex
                                          chromosomes and at least one autosome. To analyze all
                                          chromosomes use '--chromosomes ALL' or '--chromosomes
                                          all'.
--x_chromosome [X_CHROMOSOME [X_CHROMOSOME ...]], -x [X_CHROMOSOME [X_CHROMOSOME ...]]
                                          Names of x-linked scaffolds in reference fasta (must
                                          match reference exactly).
--y_chromosome [Y_CHROMOSOME [Y_CHROMOSOME ...]], -y [Y_CHROMOSOME [Y_CHROMOSOME ...]]
                                          Names of y-linked scaffolds in reference fasta (must
                                          match reference exactly). Defaults to chrY. Give None
                                          if using an assembly without a Y chromosome
--sample_id SAMPLE_ID, -id SAMPLE_ID
                                          Name/ID of sample - for use in plot titles and file
                                          naming. Default is sample
--cpus CPUS           Number of cores/threads to use. Default is 1
--xmx XMX             Memory to be provided to java programs via -Xmx. E.g.,
                                          use the flag '--xmx 4g' to pass '-Xmx4g' as a flag
                                          when running java programs (currently just repair.sh).
                                          Default is 'None' (i.e., nothing provided on the
                                          command line), which will allow repair.sh to
                                          automatically allocate memory. Note that if you're
                                          using --STRIP_READS on deep coverage whole genome
                                          data, you might need quite a bit of memory, e.g.
                                          '--xmx 16g', '--xmx 32g', or more depending on how many
                                          reads are present per read group.
--fastq_compression {0,1,2,3,4,5,6,7,8,9}
                                          Compression level for fastqs output from repair.sh.
                                          Between (inclusive) 0 and 9. Default is 3. 1 through 9
                                          indicate compression levels. If 0, fastqs will be
                                          uncompressed.
--single_end          Include flag if reads are single-end and NOT paired-
                                          end.
--version, -V         Print version and exit.
--no_cleanup          Include flag to preserve temporary files.
--PREPARE_REFERENCE   This flag will limit XYalign to only preparing
                                          reference fastas for individuals with and without Y
                                          chromosomes. These fastas can then be passed with each
                                          sample to save subsequent processing time.
--CHROM_STATS         This flag will limit XYalign to only analyzing
                                          provided bam files for depth and mapq across entire
                                          chromosomes.
--ANALYZE_BAM         This flag will limit XYalign to only analyzing the bam
                                          file for depth, mapq, and (optionally) read balance
                                          and outputting plots.
--CHARACTERIZE_SEX_CHROMS
                                          This flag will limit XYalign to the steps required to
                                          characterize sex chromosome content (i.e., analyzing
                                          the bam for depth, mapq, and read balance and running
                                          statistical tests to help infer ploidy)
--REMAPPING           This flag will limit XYalign to only the steps
                                          required to strip reads and remap to masked
                                          references. If masked references are not provided,
                                          they will be created.
--STRIP_READS         This flag will limit XYalign to only the steps
                                          required to strip reads from a provided bam file.
--logfile LOGFILE     Name of logfile. Will overwrite if exists. Default is
                                          sample_xyalign.log
--reporting_level {DEBUG,INFO,ERROR,CRITICAL}
                                          Set level of messages printed to console. Default is
                                          'INFO'. Choose from (in decreasing amount of
                                          reporting) DEBUG, INFO, ERROR or CRITICAL
--platypus_path PLATYPUS_PATH
                                          Path to platypus. Default is 'platypus'. If platypus
                                          is not directly callable (e.g., '/path/to/platypus' or
                                          '/path/to/Platypus.py'), then provide path to python as
                                          well (e.g., '/path/to/python /path/to/platypus'). In
                                          addition, be sure provided python is version 2. See
                                          the documentation for more information about setting
                                          up an anaconda environment.
--bwa_path BWA_PATH   Path to bwa. Default is 'bwa'
--samtools_path SAMTOOLS_PATH
                                          Path to samtools. Default is 'samtools'
--repairsh_path REPAIRSH_PATH
                                          Path to bbmap's repair.sh script. Default is
                                          'repair.sh'
--shufflesh_path SHUFFLESH_PATH
                                          Path to bbmap's shuffle.sh script. Default is
                                          'shuffle.sh'
--sambamba_path SAMBAMBA_PATH
                                          Path to sambamba. Default is 'sambamba'
--bedtools_path BEDTOOLS_PATH
                                          Path to bedtools. Default is 'bedtools'
--platypus_calling {both,none,before,after}
                                          Platypus calling within the pipeline (before
                                          processing, after processing, both, or neither).
                                          Options: both, none, before, after.
--no_variant_plots    Include flag to prevent plotting read balance from VCF
                                          files.
--no_bam_analysis     Include flag to prevent depth/mapq analysis of bam
                                          file. Used to isolate platypus_calling.
--skip_compatibility_check
                                          Include flag to prevent check of compatibility between
                                          input bam and reference fasta
--no_perm_test        Include flag to turn off permutation tests.
--no_ks_test          Include flag to turn off KS Two Sample tests.
--no_bootstrap        Include flag to turn off bootstrap analyses. Requires
                                          either --y_present, --y_absent, or
                                          --sex_chrom_calling_threshold if running full
                                          pipeline.
--variant_site_quality VARIANT_SITE_QUALITY, -vsq VARIANT_SITE_QUALITY
                                          Consider all SNPs with a site quality (QUAL) greater
                                          than or equal to this value. Default is 30.
--variant_genotype_quality VARIANT_GENOTYPE_QUALITY, -vgq VARIANT_GENOTYPE_QUALITY
                                          Consider all SNPs with a sample genotype quality
                                          greater than or equal to this value. Default is 30.
--variant_depth VARIANT_DEPTH, -vd VARIANT_DEPTH
                                          Consider all SNPs with a sample depth greater than or
                                          equal to this value. Default is 4.
--platypus_logfile PLATYPUS_LOGFILE
                                          Prefix to use for Platypus log files. Will default to
                                          the sample_id argument provided
--homogenize_read_balance HOMOGENIZE_READ_BALANCE
                                          If True, read balance values will be transformed by
                                          subtracting each value from 1. For example, 0.25 and
                                          0.75 would be treated equivalently. Default is False.
--min_variant_count MIN_VARIANT_COUNT
                                          Minimum number of variants in a window for the read
                                          balance of that window to be plotted. Note that this
                                          does not affect plotting of variant counts. Default is
                                          1, though we note that many window averages will be
                                          meaningless at this setting.
--reference_mask [REFERENCE_MASK [REFERENCE_MASK ...]]
                                          Bed file containing regions to replace with Ns in the
                                          sex chromosome reference. Examples might include the
                                          pseudoautosomal regions on the Y to force all
                                          mapping/calling on those regions of the X chromosome.
                                          Default is None.
--xx_ref_out_name XX_REF_OUT_NAME
                                          Desired name for masked output fasta for samples
                                          WITHOUT a Y chromosome (e.g., XX, XXX, XO, etc.).
                                          Defaults to 'xyalign_noY.masked.fa'. Will be output in
                                          the XYalign reference directory.
--xy_ref_out_name XY_REF_OUT_NAME
                                          Desired name for masked output fasta for samples WITH
                                          a Y chromosome (e.g., XY, XXY, etc.). Defaults to
                                          'xyalign_withY.masked.fa'. Will be output in the
                                          XYalign reference directory.
--xx_ref_out XX_REF_OUT
                                          Desired path to and name of masked output fasta for
                                          samples WITHOUT a Y chromosome (e.g., XX, XXX, XO,
                                          etc.). Overwrites if exists. Use if you would like
                                          output somewhere other than XYalign reference
                                          directory. Otherwise, use --xx_ref_out_name.
--xy_ref_out XY_REF_OUT
                                          Desired path to and name of masked output fasta for
                                          samples WITH a Y chromosome (e.g., XY, XXY, etc.).
                                          Overwrites if exists. Use if you would like output
                                          somewhere other than XYalign reference directory.
                                          Otherwise, use --xy_ref_out_name.
--xx_ref_in XX_REF_IN
                                          Path to preprocessed reference fasta to be used for
                                          remapping in XO or XX samples. Default is None. If
                                          none, will produce a sample-specific reference for
                                          remapping.
--xy_ref_in XY_REF_IN
                                          Path to preprocessed reference fasta to be used for
                                          remapping in samples containing Y chromosome. Default
                                          is None. If none, will produce a sample-specific
                                          reference for remapping.
--bwa_index BWA_INDEX
                                          If True, index with BWA during PREPARE_REFERENCE. Only
                                          relevant when running the PREPARE_REFERENCE module by
                                          itself. Default is False.
--read_group_id READ_GROUP_ID
                                          If read groups are present in a bam file, they are
                                          used by default in remapping steps. However, if read
                                          groups are not present in a file, there are two
                                          options for proceeding. If '--read_group_id None' is
                                          provided (case sensitive), then no read groups will be
                                          used in subsequent mapping steps. Otherwise, any other
                                          string provided to this flag will be used as a read
                                          group ID. Default is '--read_group_id xyalign'
--bwa_flags BWA_FLAGS
                                          Provide a string (in quotes, with spaces between
                                          arguments) for additional flags desired for BWA
                                          mapping (other than -R and -t). Example: '-M -T 20 -v
                                          4'. Note that those are spaces between arguments.
--sex_chrom_bam_only  This flag skips merging the new sex chromosome bam
                                          file back into the original bam file (i.e., sex chrom
                                          swapping). This will output a bam file containing only
                                          the newly remapped sex chromosomes.
--sex_chrom_calling_threshold SEX_CHROM_CALLING_THRESHOLD
                                          This is the *maximum* filtered X/Y depth ratio for an
                                          individual to be considered as having heterogametic
                                          sex chromosomes (e.g., XY) for the REMAPPING module of
                                          XYalign. Note here that X and Y chromosomes are simply
                                          the chromosomes that have been designated as X and Y
                                          via --x_chromosome and --y_chromosome. Keep in mind
                                          that the ideal threshold will vary according to sex
                                          determination mechanism, sequence homology between the
                                          sex chromosomes, reference genome, sequencing methods,
                                          etc. See documentation for more detail. Default is
                                          2.0, which we found to be reasonable for exome, low-
                                          coverage whole-genome, and high-coverage whole-genome
                                          human data.
--y_present           Overrides sex chromosome estimation by XYalign and
                                          remaps with Y present.
--y_absent            Overrides sex chromosome estimation by XYalign and
                                          remaps with Y absent.
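The decision that --sex_chrom_calling_threshold, --y_present, and --y_absent control can be sketched as follows. This is a sketch of the documented rule, not XYalign's implementation. The function name, the treatment of zero Y depth, and the assumption that the two overrides are mutually exclusive are all hypothetical. Because the threshold is described as a *maximum*, a ratio exactly equal to it is treated here as heterogametic.

```python
def remap_with_y(x_depth, y_depth, threshold=2.0,
                 y_present=False, y_absent=False):
    """Sketch of the documented rule; not XYalign's implementation.

    x_depth, y_depth: filtered mean depths of the chromosomes named by
    --x_chromosome and --y_chromosome.
    """
    if y_present and y_absent:
        raise ValueError("choose at most one of --y_present / --y_absent")
    if y_present:
        return True   # user override
    if y_absent:
        return False  # user override
    if y_depth == 0:
        return False  # no Y signal at all; ratio undefined (assumption)
    # The threshold is a maximum: X/Y at or below it means heterogametic.
    return x_depth / y_depth <= threshold


print(remap_with_y(15.0, 14.0))                 # True  (ratio ~1.07)
print(remap_with_y(30.0, 1.5))                  # False (ratio 20.0)
print(remap_with_y(30.0, 1.5, y_present=True))  # True  (override)
```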
--window_size WINDOW_SIZE, -w WINDOW_SIZE
                                          Window size (integer) for sliding window calculations.
                                          Default is None. If set to None, will use targets
                                          provided using --target_bed.
--target_bed TARGET_BED
                                          Bed file containing targets to use in sliding window
                                          analyses instead of a fixed window width. Either this
                                          or --window_size needs to be set. Default is None, in
                                          which case the window size provided with --window_size
                                          is used. If set, and --window_size is None, analyses
                                          will use the targets in the provided file. Must be in
                                          standard bed format (0-based indexing), with the first
                                          three columns containing the chromosome name, start
                                          coordinate, and stop coordinate.
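The two ways of defining windows can be sketched side by side: fixed, nonoverlapping windows from --window_size, or targets read from the first three columns of a --target_bed file. Both helpers are hypothetical. Truncating the last window at the chromosome end and skipping `track`/`browser` header lines are assumptions about how such files are handled, not confirmed XYalign behavior.

```python
def fixed_windows(chrom, chrom_length, window_size):
    """Nonoverlapping windows as 0-based, half-open (chrom, start, stop).

    The last window is truncated at the chromosome end (an assumption;
    XYalign may handle the final partial window differently).
    """
    return [(chrom, start, min(start + window_size, chrom_length))
            for start in range(0, chrom_length, window_size)]


def bed_windows(lines):
    """Windows from bed lines; only the first three columns are used."""
    windows = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, stop = line.split("\t")[:3]
        windows.append((chrom, int(start), int(stop)))
    return windows


print(fixed_windows("chrX", 120000, 50000))
# [('chrX', 0, 50000), ('chrX', 50000, 100000), ('chrX', 100000, 120000)]
print(bed_windows(["chrX\t100\t250\texon1\n", "chrY\t0\t500\n"]))
# [('chrX', 100, 250), ('chrY', 0, 500)]
```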
--exact_depth         Calculate exact depth within windows, else use much
                                          faster approximation. *Currently exact is not
                                          implemented*. Default is False.
--whole_genome_threshold
                                          This flag will calculate the depth filter threshold
                                          based on all values from across the genome. By
                                          default, thresholds are calculated per chromosome.
--mapq_cutoff MAPQ_CUTOFF, -mq MAPQ_CUTOFF
                                          Minimum mean mapq threshold for a window to be
                                          considered high quality. Default is 20.
--min_depth_filter MIN_DEPTH_FILTER
                                          Minimum depth threshold for a window to be considered
                                          high quality. Calculated as mean depth *
                                          min_depth_filter. So, a min_depth_filter of 0.2 would
                                          require a depth of at least 2 if the mean depth
                                          was 10. Default is 0.0 to consider all windows.
--max_depth_filter MAX_DEPTH_FILTER
                                          Maximum depth threshold for a window to be considered
                                          high quality. Calculated as mean depth *
                                          max_depth_filter. So, a max_depth_filter of 4 would
                                          require depths to be less than or equal to 40 if the
                                          mean depth was 10. Default is 10000.0 to consider all
                                          windows.
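The window filters above reduce to a few multiplications. The sketch below applies --min_depth_filter, --max_depth_filter, and --mapq_cutoff to per-window mean depths and mean mapqs. The function name is hypothetical, and it assumes that "mean depth" is the mean of the window depths, that the mapq comparison is inclusive, and that --whole_genome_threshold amounts to supplying a genome-wide mean instead of a per-chromosome one.

```python
def high_quality_windows(depths, mapqs, min_depth_filter=0.0,
                         max_depth_filter=10000.0, mapq_cutoff=20,
                         mean_depth=None):
    """Flag each window as high quality (True) or not (False).

    By default the mean depth comes from the windows passed in (e.g., one
    chromosome). Passing a genome-wide mean_depth mimics
    --whole_genome_threshold; the exact mean XYalign uses is an assumption.
    """
    if mean_depth is None:
        mean_depth = sum(depths) / len(depths)
    low = mean_depth * min_depth_filter
    high = mean_depth * max_depth_filter
    # Keep windows inside [low, high] whose mean mapq meets the cutoff.
    return [low <= d <= high and q >= mapq_cutoff
            for d, q in zip(depths, mapqs)]


depths = [1, 2, 4, 41, 2]     # mean depth 10
mapqs = [60, 60, 10, 60, 60]
print(high_quality_windows(depths, mapqs, min_depth_filter=0.2,
                           max_depth_filter=4))
# [False, True, False, False, True]
```

With a mean depth of 10, the example reproduces the thresholds in the flag descriptions: windows need a depth between 2 and 40 inclusive, and the third window fails only on mapq.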
--num_permutations NUM_PERMUTATIONS
                                          Number of permutations to use for permutation
                                          analyses. Default is 10000.
--num_bootstraps NUM_BOOTSTRAPS
                                          Number of bootstrap replicates to use when
                                          bootstrapping mean depth ratios among chromosomes.
                                          Default is 10000.
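To illustrate what --num_bootstraps controls, here is a generic percentile bootstrap for the ratio of mean window depths between two chromosomes. This is a sketch of the standard technique, not XYalign's procedure. Resampling windows independently within each chromosome and reporting a 95% percentile interval are assumptions.

```python
import random


def bootstrap_depth_ratio(depths_a, depths_b, num_bootstraps=10000, seed=None):
    """Percentile bootstrap for the ratio of mean window depths (a / b).

    Windows are resampled with replacement within each chromosome. This is a
    generic sketch; XYalign's resampling scheme may differ.
    """
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    ratios = []
    for _ in range(num_bootstraps):
        a = rng.choices(depths_a, k=len(depths_a))
        b = rng.choices(depths_b, k=len(depths_b))
        if mean(b) > 0:  # skip replicates with no depth in the denominator
            ratios.append(mean(a) / mean(b))
    ratios.sort()
    lower = ratios[int(0.025 * (len(ratios) - 1))]
    upper = ratios[int(0.975 * (len(ratios) - 1))]
    return mean(depths_a) / mean(depths_b), (lower, upper)


x = [18, 20, 22, 19, 21, 20]     # window depths on the X chromosome
auto = [10, 9, 11, 10, 10, 10]   # window depths on an autosome
point, (lo, hi) = bootstrap_depth_ratio(x, auto, num_bootstraps=2000, seed=1)
print(round(point, 2), lo <= point <= hi)
```

More replicates give a more stable interval at the cost of run time, which grows linearly with the number of bootstraps.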
--ignore_duplicates   Ignore duplicate reads in bam analyses. Default is to
                                          include duplicates.
--marker_size MARKER_SIZE
                                          Marker size for genome-wide plots in matplotlib.
                                          Default is 10.
--marker_transparency MARKER_TRANSPARENCY, -mt MARKER_TRANSPARENCY
                                          Transparency of markers in genome-wide plots. Alpha in
                                          matplotlib. Default is 0.5.
--coordinate_scale COORDINATE_SCALE
                                          For genome-wide scatter plots, divide all coordinates
                                          by this value. Default is 1000000, which will plot
                                          everything in megabases.
--include_fixed INCLUDE_FIXED
                                          Default is False, which removes read balances less
                                          than 0.05 and greater than 0.95 for histogram
                                          plotting. True will include all values. Extreme values
                                          are removed by default because they often swamp the
                                          signal of the rest of the distribution.
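The default read balance filter for histograms can be sketched as a simple range check. The helper name is hypothetical, and it takes a Python boolean where the command line flag takes a value. Values of exactly 0.05 or 0.95 are kept, since only values below 0.05 or above 0.95 are described as removed.

```python
def balances_for_histogram(read_balances, include_fixed=False):
    """Select read balance values for the histogram, as described above."""
    if include_fixed:
        return list(read_balances)
    # Drop near-fixed values (< 0.05 or > 0.95), which tend to swamp the plot.
    return [rb for rb in read_balances if 0.05 <= rb <= 0.95]


values = [0.0, 0.02, 0.3, 0.5, 0.97, 1.0]
print(balances_for_histogram(values))                      # [0.3, 0.5]
print(balances_for_histogram(values, include_fixed=True))  # all six values
```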
--use_counts          If True, get counts of reads per chromosome for
                                          CHROM_STATS, rather than calculating mean depth and
                                          mapq. Much faster, but provides less information.
                                          Default is False.
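The counting approach behind --use_counts can be illustrated with plain SAM text. The sketch counts mapped primary records per reference sequence and can optionally skip reads flagged as duplicates. The helper is hypothetical. Applying --ignore_duplicates to counts is an assumption, and the SAM records are truncated to their first four columns for brevity. In practice, tools such as `samtools idxstats` read per-reference mapped counts directly from the bam index.

```python
from collections import Counter


def reads_per_chromosome(sam_lines, ignore_duplicates=False):
    """Count mapped primary reads per reference sequence from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4:  # unmapped
            continue
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800)
            continue
        if ignore_duplicates and flag & 0x400:  # PCR or optical duplicate
            continue
        counts[rname] += 1
    return dict(counts)


sam = ["@HD\tVN:1.6",
       "r1\t0\tchrX\t100",      # records truncated to 4 columns
       "r2\t1024\tchrX\t200",   # flagged as a duplicate
       "r3\t16\tchrY\t50",
       "r4\t4\t*\t0"]           # unmapped
print(reads_per_chromosome(sam))                          # {'chrX': 2, 'chrY': 1}
print(reads_per_chromosome(sam, ignore_duplicates=True))  # {'chrX': 1, 'chrY': 1}
```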