xyalign.reftools module
-
class xyalign.reftools.RefFasta(filepath, samtools='samtools', bwa='bwa', no_initial_index=False)
A class for working with external reference fasta files.
Attributes
filepath (str) Full path to external reference fasta file.
samtools (str) Full path to samtools. Default = ‘samtools’
bwa (str) Full path to bwa. Default = ‘bwa’
-
is_faidxed()
Checks that the fai index exists, is not empty, and is newer than the reference.
Returns: bool
True if fai index exists and is newer than fasta, False otherwise.
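The three conditions checked here (the index exists, is not empty, and is newer than the fasta) can be sketched with standard-library calls alone. This is an illustrative sketch rather than XYalign's own implementation; `index_is_fresh` is a hypothetical helper name, and treating an equal modification time as stale is an assumption based on the "same age or older" wording used elsewhere in this module.

```python
import os


def index_is_fresh(fasta_path, index_path):
    """Return True if index_path exists, is non-empty, and is newer than fasta_path.

    Hypothetical helper for illustration; not part of xyalign.reftools.
    """
    if not os.path.exists(index_path):
        return False
    if os.path.getsize(index_path) == 0:
        return False
    # Strictly newer: an index with the same mtime as the fasta counts as stale.
    return os.path.getmtime(index_path) > os.path.getmtime(fasta_path)
```

The same check would apply to any derived file, for example `index_is_fresh("ref.fa", "ref.fa.fai")`.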
-
index_fai()
Creates a fai index for the reference using samtools (‘samtools faidx ref.fa’).
Returns: bool
True if successful
Raises: RuntimeError
If return code from external call is not 0
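The "return True on success, raise RuntimeError on a non-zero return code" contract can be sketched as below. The helper names `run_or_raise` and `faidx_command` are hypothetical and not part of xyalign.reftools; only the `samtools faidx ref.fa` command itself comes from the description above.

```python
import subprocess


def faidx_command(fasta_path, samtools="samtools"):
    """Build the argument list for 'samtools faidx ref.fa' (hypothetical helper)."""
    return [samtools, "faidx", fasta_path]


def run_or_raise(command):
    """Run an external command; return True on exit code 0, else raise RuntimeError."""
    return_code = subprocess.call(command)
    if return_code != 0:
        raise RuntimeError(
            "Command failed with exit code {}: {}".format(
                return_code, " ".join(command)))
    return True
```

With samtools available on the PATH, `run_or_raise(faidx_command("ref.fa"))` would build the index next to the fasta.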
-
index_bwa()
Indexes the reference using bwa.
Returns: bool
True if successful
Raises: RuntimeError
If return code from external call is not 0
-
check_bwa_index()
Checks whether the bwa indices are newer than the fasta.
Returns: bool
True if indices exist and are newer than fasta. False otherwise.
-
conditional_index_bwa(bwa='bwa')
Indexes the reference with bwa if the existing indices are the same age as or older than the fasta. Use RefFasta.index_bwa() to force indexing.
Parameters: bwa : str
Path to bwa program (default is ‘bwa’)
-
check_seq_dict()
Checks that the sequence dictionary exists, is not empty, and is newer than the reference.
Returns: bool
True if seq dict exists and is newer than fasta, False otherwise.
-
seq_dict(out_dict=None)
Creates a sequence dictionary (.dict) file using samtools.
Parameters: out_dict : str
The desired file name for the sequence dictionary - defaults to adding ‘.dict’ to the fasta name
Returns: bool
True if exit code of external call is 0.
Raises: RuntimeError
If external call exit code is not 0.
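The default naming rule for out_dict can be illustrated with a small command builder. This sketch assumes the dictionary is produced with `samtools dict` and its `-o` option; the exact invocation XYalign uses may differ, and `seq_dict_command` is a hypothetical name.

```python
def seq_dict_command(fasta_path, out_dict=None, samtools="samtools"):
    """Build an argument list for creating a sequence dictionary.

    Hypothetical helper. Assumes 'samtools dict -o <out> <fasta>'; when
    out_dict is None, '.dict' is appended to the fasta name.
    """
    if out_dict is None:
        out_dict = fasta_path + ".dict"
    return [samtools, "dict", "-o", out_dict, fasta_path]
```

Under this rule a fasta named ref.fa gets a dictionary named ref.fa.dict unless another name is passed.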
-
conditional_seq_dict()
Creates a sequence dictionary if the .dict file is the same age as or older than the fasta, or doesn’t exist.
Use RefFasta.seq_dict() to force creation.
-
mask_reference(bed_mask, output_fasta)
Creates a new masked reference by hard-masking the regions included in bed_mask.
Parameters: bed_mask : str
Bed file of regions to mask (as N) in the new reference
output_fasta : str
The full path to and filename of the output fasta
Returns: str
Path to new (indexed and masked) fasta
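Hard-masking replaces every base inside a masked region with N. The pure-Python sketch below shows the idea on a single in-memory sequence; it is not how XYalign performs masking (which may rely on an external tool), and `hard_mask` is a hypothetical name. Intervals follow the BED convention of 0-based, half-open coordinates.

```python
def hard_mask(sequence, intervals):
    """Replace bases in each 0-based, half-open (start, end) interval with 'N'.

    Hypothetical helper for illustration; intervals may overlap and are
    clipped to the bounds of the sequence.
    """
    bases = list(sequence)
    for start, end in intervals:
        start = max(start, 0)
        end = min(end, len(bases))
        for i in range(start, end):
            bases[i] = "N"
    return "".join(bases)
```

For example, masking the interval (2, 5) of ACGTACGT replaces the third through fifth bases and leaves the rest untouched.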
-
isolate_chroms(new_ref_prefix, chroms, bed_mask=None)
Takes a reference fasta file and a list of chromosomes to include, and outputs a new, indexed (and optionally masked) reference fasta.
Parameters: new_ref_prefix : str
The desired path to and prefix of the output files
chroms : list
Chromosomes to include in the output fasta
bed_mask : str
Bed file of regions to mask (as N) in the new reference
Returns: str
Path to new, indexed (optionally masked) fasta
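Selecting a subset of chromosomes amounts to keeping only the fasta records whose names appear in chroms. A minimal sketch of that filtering step follows; it does not index or mask the result, is not XYalign's implementation, and `isolate_records` is a hypothetical name. It assumes the record name is the first whitespace-delimited token after ">".

```python
def isolate_records(fasta_lines, chroms):
    """Yield only the lines of fasta records whose name is in chroms.

    Hypothetical helper for illustration. Records are emitted in the order
    they appear in the input, not in the order given in chroms.
    """
    wanted = set(chroms)
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # Record name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = name in wanted
        if keep:
            yield line
```

Because it works on any iterable of lines, the generator can be fed an open file handle and its output written directly to the new fasta.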
-
get_chrom_length(chrom)
Extracts chromosome length from the fasta.
Parameters: chrom : str
The name of the chromosome or scaffold.
Returns: length : int
The length (integer) of the chromosome/scaffold
Raises: RuntimeError
If chromosome name is not present in the fasta
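One way to look up a sequence length without reading the whole fasta is to use its fai index, whose first two tab-separated columns are the sequence name and its length. The sketch below assumes such an index already exists; XYalign may obtain the length differently, and `chrom_length_from_fai` is a hypothetical name.

```python
def chrom_length_from_fai(fai_path, chrom):
    """Return the length of chrom as recorded in a .fai index.

    Hypothetical helper for illustration. Raises RuntimeError if the
    name is not listed in the index.
    """
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Column 1 is the sequence name, column 2 its length in bases.
            if fields[0] == chrom:
                return int(fields[1])
    raise RuntimeError(
        "Chromosome {} not found in {}".format(chrom, fai_path))
```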
-
chromosome_bed(output_file, chromosome_list)
Takes a list of chromosomes and outputs a bed file with the length of each chromosome on each line (e.g., chr1 0 247249719).
Parameters: output_file : str
Name of (including full path to) desired output file
chromosome_list : list
Chromosome/scaffolds to include
Returns: str
output_file
Raises: RuntimeError
If chromosome name is not in fasta.
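The output format described above (one line per chromosome, spanning 0 to its length) can be sketched as follows. Unlike the method, this hypothetical `write_chromosome_bed` takes the lengths as an explicit dictionary so the example stays self-contained; names are validated before anything is written so a failure does not leave a partial file.

```python
def write_chromosome_bed(output_file, chromosome_list, chrom_lengths):
    """Write 'name<TAB>0<TAB>length' for each chromosome and return output_file.

    Hypothetical helper for illustration; chrom_lengths maps names to
    integer lengths. Raises RuntimeError for any unknown name.
    """
    missing = [c for c in chromosome_list if c not in chrom_lengths]
    if missing:
        raise RuntimeError(
            "Chromosome(s) not in fasta: {}".format(", ".join(missing)))
    with open(output_file, "w") as handle:
        for chrom in chromosome_list:
            handle.write("{}\t0\t{}\n".format(chrom, chrom_lengths[chrom]))
    return output_file
```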
-