builder
Classes for generating fasta files and records for testing
This module contains utility classes for creating FASTA files, FASTA index files (.fai), and sequence dictionaries (.dict).
Examples of creating sets of contigs for writing to fasta
Writing a FASTA with two contigs each with 100 bases:
>>> from pathlib import Path
>>> from fgpyo.fasta.builder import FastaBuilder
>>> builder = FastaBuilder()
>>> builder.add("chr10").add("AAAAAAAAAA", 10)
<fgpyo.fasta.builder.ContigBuilder object at ...>
>>> builder.add("chr11").add("GGGGGGGGGG", 10)
<fgpyo.fasta.builder.ContigBuilder object at ...>
>>> fasta_path = Path(getfixture("tmp_path")) / "test.fasta"
>>> builder.to_file(path=fasta_path)
Writing a FASTA with one contig with 100 A's and 50 T's:
>>> from fgpyo.fasta.builder import FastaBuilder
>>> builder = FastaBuilder()
>>> builder.add("chr10").add("AAAAAAAAAA", 10).add("TTTTTTTTTT", 5)
<fgpyo.fasta.builder.ContigBuilder object at ...>
>>> builder.to_file(path=fasta_path)
Add bases to existing contig:
>>> from fgpyo.fasta.builder import FastaBuilder
>>> builder = FastaBuilder()
>>> contig_one = builder.add("chr10").add("AAAAAAAAAA", 1)
>>> contig_one.add("NNN", 1)
<fgpyo.fasta.builder.ContigBuilder object at ...>
>>> contig_one.bases
'AAAAAAAAAANNN'
Classes
ContigBuilder
Builder for constructing new contigs and adding bases to existing contigs. Existing contigs cannot be overwritten; each contig name in FastaBuilder must be unique. Instances of ContigBuilder should be created using FastaBuilder.add(), where assembly and species are optional parameters that default to FastaBuilder.assembly and FastaBuilder.species.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | | Unique contig ID, e.g., "chr10" |
| assembly | | Assembly information, if None default is 'testassembly' |
| species | | Species information, if None default is 'testspecies' |
| bases | | The bases to be added to the contig, e.g., "A" |
Source code in fgpyo/fasta/builder.py
Functions
add
add(bases: str, times: int = 1) -> ContigBuilder
Method for adding bases to a new or existing instance of ContigBuilder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bases | str | The bases to be added to the contig | required |
| times | int | The number of times the bases should be repeated | 1 |
Example: add("AAA", 2) results in the bases "AAAAAA".
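The repetition and chaining behavior described here can be sketched without fgpyo. The class `SketchContig` below is a hypothetical stand-in, not the library's implementation; it assumes only what this page states, namely that `add` appends `bases` repeated `times` times and returns the builder.

```python
class SketchContig:
    """Hypothetical stand-in for a contig builder (not fgpyo's code)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.bases = ""

    def add(self, bases: str, times: int = 1) -> "SketchContig":
        # Append the bases `times` times, then return self so calls can be chained.
        self.bases += bases * times
        return self


contig = SketchContig("chr10").add("AAA", 2).add("T", 3)
print(contig.bases)  # AAAAAATTT
```

Returning the builder from `add` is what makes chained calls such as `.add(...).add(...)` possible.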
FastaBuilder
Builder for constructing sets of one or more contigs.
Provides the ability to manufacture sets of contigs from minimal input, and automatically generates the information necessary for writing the FASTA file, index, and dictionary.
A builder is constructed from an assembly, species, and line length. All attributes have defaults, but these can be overridden.
Contigs are added to FastaBuilder using:
FastaBuilder.add()
Bases are added to existing contigs using:
ContigBuilder.add()
Once accumulated, the contigs can be written to a file using:
FastaBuilder.to_file()
Calling to_file() will also generate the fasta index (.fai) and sequence dictionary (.dict).
Attributes:
| Name | Type | Description |
|---|---|---|
| assembly | str | Assembly information, if None default is 'testassembly' |
| species | str | Species, if None default is 'testspecies' |
| line_length | int | Desired line length, if None default is 80 |
| contig_builders | Dict[str, ContigBuilder] | Private dictionary of contig names and instances of ContigBuilder |
Functions
__getitem__
__getitem__(key: str) -> ContigBuilder
add
add(name: str, assembly: Optional[str] = None, species: Optional[str] = None) -> ContigBuilder
Creates and returns a new ContigBuilder for a contig with the provided name. Contig names must be unique; attempting to create two separate contigs with the same name will result in an error.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Unique contig ID, e.g., "chr10" | required |
| assembly | Optional[str] | Assembly information, if None default is 'testassembly' | None |
| species | Optional[str] | Species information, if None default is 'testspecies' | None |
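The uniqueness rule and name-based lookup can be illustrated with a plain dictionary. This is a sketch only: `SketchFasta` and `SketchEntry` are hypothetical names, and raising `ValueError` on a duplicate is an assumption, since this page does not say which error the library raises.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SketchEntry:
    """Hypothetical record for one contig (not fgpyo's ContigBuilder)."""

    name: str
    bases: str = ""


class SketchFasta:
    """Hypothetical container keyed by contig name (not fgpyo's FastaBuilder)."""

    def __init__(self) -> None:
        # Dicts preserve insertion order, so contigs stay in the order added.
        self._contigs: Dict[str, SketchEntry] = {}

    def add(self, name: str) -> SketchEntry:
        # Assumption: duplicates raise ValueError; the real error type may differ.
        if name in self._contigs:
            raise ValueError(f"contig {name!r} already exists")
        entry = SketchEntry(name)
        self._contigs[name] = entry
        return entry

    def __getitem__(self, key: str) -> SketchEntry:
        # Look up an existing contig by name.
        return self._contigs[key]


fasta = SketchFasta()
fasta.add("chr10")
try:
    fasta.add("chr10")
except ValueError as error:
    print(error)
```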
to_file
Writes out the set of accumulated contigs to a FASTA file at the path given.
Also generates the accompanying fasta index file (.fa.fai) and sequence dictionary file (.dict).
Contigs are emitted in the order they were added to the builder. Sequence lines in the FASTA file are wrapped to the line length given when the builder was constructed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to write files to. | required |
Example: FastaBuilder.to_file(path = pathlib.Path("my_fasta.fa"))
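To make the ordering and wrapping rules concrete, here is a minimal sketch of rendering contigs as FASTA text. The function `wrap_fasta` is hypothetical and not part of fgpyo; it assumes each header line holds only the contig name and that the default line length is 80, as listed in the attributes.

```python
from typing import Dict, List


def wrap_fasta(contigs: Dict[str, str], line_length: int = 80) -> str:
    """Render contigs as FASTA text with wrapped sequence lines (illustrative only)."""
    lines: List[str] = []
    # Dicts keep insertion order, so contigs are emitted in the order added.
    for name, bases in contigs.items():
        lines.append(f">{name}")
        # Slice the sequence into chunks of at most `line_length` bases.
        for start in range(0, len(bases), line_length):
            lines.append(bases[start:start + line_length])
    return "\n".join(lines) + "\n"


text = wrap_fasta({"chr10": "A" * 100, "chr11": "G" * 30})
print(text)
```

With these inputs, chr10 is written as one line of 80 bases followed by one line of 20, and chr11 fits on a single line.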
Functions
pysam_dict
Calls pysam.dict and writes the sequence dictionary to the provided output path.
Parameters:
| Name | Description |
|---|---|
| assembly | Assembly |
| species | Species |
| output_path | File path to write dictionary to |
| input_path | Path to fasta file |
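For orientation, a sequence dictionary is a SAM-style header with one `@SQ` line per contig. The sketch below builds such a line by hand using tags defined in the SAM specification (SN, LN, M5, AS, SP). The helper `sq_line` is hypothetical; the exact tag order and any additional tags that pysam.dict emits are assumptions not confirmed by this page.

```python
import hashlib


def sq_line(name: str, bases: str, assembly: str, species: str) -> str:
    """Build one @SQ header line for a sequence dictionary (illustrative only)."""
    # M5 is the MD5 checksum of the uppercased sequence.
    md5 = hashlib.md5(bases.upper().encode("ascii")).hexdigest()
    fields = [
        "@SQ",
        f"SN:{name}",          # sequence (contig) name
        f"LN:{len(bases)}",    # sequence length
        f"M5:{md5}",
        f"AS:{assembly}",      # assembly identifier
        f"SP:{species}",       # species
    ]
    return "\t".join(fields)


line = sq_line("chr10", "A" * 100, "testassembly", "testspecies")
print(line.split("\t")[:3])  # ['@SQ', 'SN:chr10', 'LN:100']
```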
pysam_faidx
Calls pysam.faidx and writes the fasta index in the same location as the fasta file.
Parameters:
| Name | Description |
|---|---|
| input_path | Path to fasta file |
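A .fai index stores five columns per contig: name, sequence length, byte offset of the first base, bases per line, and bytes per line. The sketch below derives those values from FASTA text in pure Python to show what the index describes. The function `fai_records` is hypothetical and is not how pysam computes the index; it assumes ASCII text with "\n" line endings.

```python
from typing import List, Optional, Tuple

FaiRecord = Tuple[str, int, int, int, int]


def fai_records(fasta_text: str) -> List[FaiRecord]:
    """Compute (name, length, offset, linebases, linewidth) per contig."""
    records: List[FaiRecord] = []
    position = 0  # offset of the current line (characters equal bytes for ASCII)
    name: Optional[str] = None
    length = offset = linebases = linewidth = 0
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            if name is not None:
                records.append((name, length, offset, linebases, linewidth))
            name = line[1:].split()[0]
            length = linebases = linewidth = 0
            offset = position + len(line)  # first base follows the header line
        else:
            stripped = line.rstrip("\n")
            if linebases == 0:
                # The first sequence line defines the bases and bytes per line.
                linebases, linewidth = len(stripped), len(line)
            length += len(stripped)
        position += len(line)
    if name is not None:
        records.append((name, length, offset, linebases, linewidth))
    return records


text = ">chr10\n" + "A" * 80 + "\n" + "A" * 20 + "\n"
print(fai_records(text))  # [('chr10', 100, 7, 80, 81)]
```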