xyalign.reftools module

class xyalign.reftools.RefFasta(filepath, samtools='samtools', bwa='bwa', no_initial_index=False)[source]

A class for working with external reference fasta files

Attributes

filepath (str) Full path to external fasta file.
samtools (str) Full path to samtools. Default = ‘samtools’
bwa (str) Full path to bwa. Default = ‘bwa’

is_faidxed()[source]

Checks that fai index exists, is not empty and is newer than reference.

Returns:

bool

True if fai index exists and is newer than fasta, False otherwise.
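
The three conditions above (exists, non-empty, newer than the reference) can be sketched with the standard library alone. This is a minimal illustration, not the method's actual source; the helper name index_is_fresh is hypothetical.

```python
import os


def index_is_fresh(fasta_path, index_path):
    """Return True if index_path exists, is non-empty, and is newer than fasta_path.

    Hypothetical helper for illustration; not part of xyalign.
    """
    if not os.path.exists(index_path):
        return False
    if os.path.getsize(index_path) == 0:
        return False
    # Strictly newer: an index with the same timestamp counts as stale.
    return os.path.getmtime(index_path) > os.path.getmtime(fasta_path)
```

For a reference at ref.fa, the index path would conventionally be ref.fa.fai.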

index_fai()[source]

Create fai index for reference using samtools (‘samtools faidx ref.fa’)

Returns:

bool

True if successful

Raises:

RuntimeError

If return code from external call is not 0
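
The "True on success, RuntimeError on a non-zero return code" contract can be sketched as below. The helper run_external is hypothetical and only illustrates the pattern; a trivial Python subprocess stands in for samtools so the sketch runs without it installed.

```python
import subprocess
import sys


def run_external(command):
    """Run command (a list of strings); return True on exit code 0, else raise.

    Hypothetical helper for illustration; not part of xyalign.
    """
    return_code = subprocess.call(command)
    if return_code != 0:
        raise RuntimeError(
            "Command '{}' exited with code {}".format(" ".join(command), return_code))
    return True


# For a real reference the command would be ["samtools", "faidx", "ref.fa"].
# A stand-in command is used here so the example does not need samtools.
ok = run_external([sys.executable, "-c", "import sys; sys.exit(0)"])
```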

index_bwa()[source]

Index reference using bwa

Returns:

bool

True if successful

Raises:

RuntimeError

If return code from external call is not 0

check_bwa_index()[source]

Checks to see if bwa indices are newer than fasta.

Returns:

bool

True if indices exist and are newer than fasta. False otherwise.
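
As a rough sketch of such a check: `bwa index` writes five files next to the fasta (.amb, .ann, .bwt, .pac, .sa). The helper below assumes all five must exist and be strictly newer than the fasta; which files this method actually inspects is not stated here, so treat that as an assumption. The name bwa_indices_fresh is hypothetical.

```python
import os

# File suffixes written by `bwa index ref.fa`.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def bwa_indices_fresh(fasta_path):
    """Return True only if every bwa index file exists and is newer than the fasta.

    Hypothetical helper for illustration; not part of xyalign.
    """
    fasta_mtime = os.path.getmtime(fasta_path)
    for suffix in BWA_SUFFIXES:
        index_path = fasta_path + suffix
        if not os.path.exists(index_path):
            return False
        # Same age or older counts as stale.
        if os.path.getmtime(index_path) <= fasta_mtime:
            return False
    return True
```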

conditional_index_bwa(bwa='bwa')[source]

Indexes the reference with bwa if the existing indices are the same age as or older than the fasta. Use RefFasta.index_bwa() to force indexing.

Parameters:

bwa : str

Path to bwa program (default is ‘bwa’)

check_seq_dict()[source]

Checks that sequence dictionary exists, is not empty and is newer than reference.

Returns:

bool

True if seq dict exists and is newer than fasta, False otherwise.

seq_dict(out_dict=None)[source]

Create sequence dictionary .dict file using samtools

Parameters:

out_dict : str

Desired file name for the sequence dictionary. Defaults to the fasta name with ‘.dict’ appended.

Returns:

bool

True if exit code of external call is 0.

Raises:

RuntimeError

If external call exit code is not 0.

conditional_seq_dict()[source]

Creates a sequence dictionary if the .dict file doesn’t exist or is the same age as or older than the fasta.

Use RefFasta.seq_dict() to force creation.

mask_reference(bed_mask, output_fasta)[source]

Creates a new masked reference by hard-masking the regions included in bed_mask.

Parameters:

bed_mask : str

Bed file of regions to mask (as N) in the new reference

output_fasta : str

The full path to and filename of the output fasta

Returns:

str

Path to new (indexed and masked) fasta
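
To make "hard-masking" concrete: every base inside a bed interval is replaced with N. Bed coordinates are 0-based and half-open, so an interval (2, 5) covers positions 2, 3 and 4. The function below is a minimal in-memory sketch of that idea on a single sequence; the name hardmask is hypothetical and this is not how the method itself is implemented.

```python
def hardmask(sequence, intervals):
    """Replace bases in each 0-based, half-open (start, end) interval with N.

    Hypothetical helper for illustration; not part of xyalign.
    """
    bases = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence so out-of-range intervals are ignored safely.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


masked = hardmask("ACGTACGTAC", [(2, 5), (8, 10)])
print(masked)  # ACNNNCGTNN
```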

isolate_chroms(new_ref_prefix, chroms, bed_mask=None)[source]

Takes a reference fasta file and a list of chromosomes to include and outputs a new, indexed (and optionally masked) reference fasta.

Parameters:

new_ref_prefix : str

The desired path to and prefix of the output files

chroms : list

Chromosomes to include in the output fasta

bed_mask : str

Bed file of regions to mask (as N) in the new reference

Returns:

str

Path to new, indexed (optionally masked) fasta
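
The subsetting step can be pictured as filtering fasta records by name. The generator below is a minimal sketch under that assumption, not the method's source; filter_fasta is a hypothetical name. It keeps records in the order they appear in the input, regardless of the order of chroms.

```python
def filter_fasta(lines, chroms):
    """Yield only the lines of fasta records whose name is in chroms.

    Hypothetical helper for illustration; not part of xyalign.
    """
    wanted = set(chroms)
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The record name is the header text up to the first whitespace.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = name in wanted
        if keep:
            yield line


records = [">chr1 primary", "ACGT", ">chr2", "GGCC", ">chrX", "TTAA"]
subset = list(filter_fasta(records, ["chrX", "chr1"]))
```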

get_chrom_length(chrom)[source]

Extract chromosome length from fasta.

Parameters:

chrom : str

The name of the chromosome or scaffold.

Returns:

length : int

The length (integer) of the chromosome/scaffold

Raises:

RuntimeError

If chromosome name is not present in the fasta
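
One way to look up sequence lengths without reading the whole fasta is through its .fai index, whose first two tab-separated columns are the sequence name and its length. The sketch below assumes that approach; it is not necessarily how this method works internally, and both function names are hypothetical.

```python
def read_fai_lengths(fai_lines):
    """Return (names, lengths) as tuples, in index order, from .fai lines.

    Hypothetical helper for illustration; not part of xyalign.
    """
    names, lengths = [], []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        names.append(fields[0])
        lengths.append(int(fields[1]))
    return tuple(names), tuple(lengths)


def chrom_length(fai_lines, chrom):
    """Return the length of chrom, raising RuntimeError if it is absent."""
    names, lengths = read_fai_lengths(fai_lines)
    if chrom not in names:
        raise RuntimeError("{} not found in index".format(chrom))
    return lengths[names.index(chrom)]


# Illustrative index content for a tiny two-sequence fasta.
fai = ["chr1\t8\t6\t4\t5\n", "chrX\t4\t22\t4\t5\n"]
names, lengths = read_fai_lengths(fai)
```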

chromosome_bed(output_file, chromosome_list)[source]

Takes a list of chromosomes and writes a bed file with one line per chromosome, spanning its full length (e.g., chr1 0 247249719).

Parameters:

output_file : str

Name of (including full path to) desired output file

chromosome_list : list

Chromosome/scaffolds to include

Returns:

str

output_file

Raises:

RuntimeError

If chromosome name is not in fasta.
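
The output format can be sketched as follows: each line is the chromosome name, a start of 0, and the chromosome length as the end, separated by tabs. The helper bed_lines and the scaffold_1 entry are hypothetical and used only for illustration; the chr1 length is the one from the example above.

```python
def bed_lines(lengths_by_chrom, chromosome_list):
    """Return one whole-chromosome bed line per requested chromosome.

    Hypothetical helper for illustration; not part of xyalign.
    """
    lines = []
    for chrom in chromosome_list:
        if chrom not in lengths_by_chrom:
            raise RuntimeError("{} not in fasta".format(chrom))
        # Bed intervals are 0-based and half-open, so 0..length covers everything.
        lines.append("{}\t0\t{}".format(chrom, lengths_by_chrom[chrom]))
    return lines


# scaffold_1 and its length are made-up values for the example.
example = bed_lines({"chr1": 247249719, "scaffold_1": 1000}, ["chr1", "scaffold_1"])
```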

chromosome_lengths()[source]

Returns:

tuple

Chromosome lengths ordered by sequence order in fasta

chromosome_names()[source]

Returns:

tuple

Chromosome names ordered by sequence order in fasta