read_structure
Classes for representing Read Structures¶
A Read Structure refers to a string that describes how the bases in a sequencing run should be
allocated into logical reads. It serves a similar purpose to the --use-bases-mask option in Illumina's
bcl2fastq software, but provides some additional capabilities.
A Read Structure is a sequence of <number><operator> pairs or segments where, optionally, the last
segment in the string is allowed to use + instead of a number for its length. The + translates
to whatever bases are left after the other segments are processed and can be thought of as meaning
[0..infinity].
See more at: https://github.com/fulcrumgenomics/fgbio/wiki/Read-Structures
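The grammar described above can be illustrated with a small standalone parser. This is a minimal sketch using only the standard library, not fgpyo's own parser: the names `parse_read_structure` and `SEGMENT_PATTERN` are hypothetical, and accepting any single letter as an operator is a simplifying assumption.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern: a number (or "+") followed by a one-letter operator.
SEGMENT_PATTERN = re.compile(r"(\d+|\+)([A-Za-z])")


def parse_read_structure(text: str) -> List[Tuple[Optional[int], str]]:
    """Split a read structure string into (length, operator) pairs.

    A length of None stands for "+", i.e. whatever bases are left over.
    """
    pairs: List[Tuple[Optional[int], str]] = []
    position = 0
    while position < len(text):
        match = SEGMENT_PATTERN.match(text, position)
        if match is None:
            raise ValueError(f"Cannot parse segment at position {position}: {text!r}")
        raw_length, operator = match.groups()
        pairs.append((None if raw_length == "+" else int(raw_length), operator))
        position = match.end()
    if not pairs:
        raise ValueError("A read structure needs at least one segment")
    # Only the final segment may use "+" in place of a number.
    if any(length is None for length, _ in pairs[:-1]):
        raise ValueError(f"Only the last segment may use '+': {text!r}")
    return pairs
```

With this sketch, `parse_read_structure("1B2M+T")` yields three pairs whose last length is `None`, while a string such as `"23T2TT23T"` is rejected because the third operator has no length in front of it.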
Examples¶
>>> from fgpyo.read_structure import ReadStructure
>>> rs = ReadStructure.from_string("75T8B75T")
>>> [str(segment) for segment in rs]
['75T', '8B', '75T']
>>> rs[0]
ReadSegment(offset=0, length=75, kind=<SegmentType.Template: 'T'>)
>>> rs = rs.with_variable_last_segment()
>>> [str(segment) for segment in rs]
['75T', '8B', '+T']
>>> rs[-1]
ReadSegment(offset=83, length=None, kind=<SegmentType.Template: 'T'>)
>>> rs = ReadStructure.from_string("1B2M+T")
>>> [s.bases for s in rs.extract("A"*6)]
['A', 'AA', 'AAA']
>>> [s.bases for s in rs.extract("A"*5)]
['A', 'AA', 'AA']
>>> [s.bases for s in rs.extract("A"*4)]
['A', 'AA', 'A']
>>> [s.bases for s in rs.extract("A"*3)]
['A', 'AA', '']
>>> rs.template_segments()
(ReadSegment(offset=3, length=None, kind=<SegmentType.Template: 'T'>),)
>>> [str(segment) for segment in rs.template_segments()]
['+T']
>>> try:
...     ReadStructure.from_string("23T2TT23T")
... except ValueError as ex:
...     print(str(ex))
Read structure missing length information: 23T2T[T]23T
Attributes¶
ANY_LENGTH_CHAR module-attribute ¶
A character that can be put in place of a number in a read structure to mean "0 or more bases".
Classes¶
ReadSegment ¶
Encapsulates all the information about a segment within a read structure. A segment has either a definite length, in which case length must be an int, or an indefinite length (any length, 0 or more), in which case length must be None.
Attributes:
| Name | Type | Description |
|---|---|---|
| offset | int | The offset of the read segment in the read. |
| length | Optional[int] | The length of the segment, or None if it is variable length. |
| kind | SegmentType | The kind of read segment. |
Source code in fgpyo/read_structure.py
Attributes¶
fixed_length property ¶
The fixed length if there is one. Raises an exception on segments without fixed lengths!
Functions¶
extract ¶
extract(bases: str) -> SubReadWithoutQuals
Gets the bases associated with this read segment.
extract_with_quals ¶
extract_with_quals(bases: str, quals: str) -> SubReadWithQuals
Gets the bases and qualities associated with this read segment.
Source code in fgpyo/read_structure.py
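The slicing behind segment extraction can be sketched independently of the library. The following is a minimal illustration, not fgpyo's implementation: `SegmentSketch` is a hypothetical stand-in for a read segment, and raising `ValueError` when a read is too short for a fixed-length segment is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SegmentSketch:
    """Hypothetical stand-in for a read segment: an offset plus an optional length."""

    offset: int
    length: Optional[int]  # None means variable length: whatever bases remain

    def end(self, read_length: int) -> int:
        # A variable-length segment runs to the end of the read;
        # a fixed-length segment stops at offset + length.
        if self.length is None:
            return read_length
        return self.offset + self.length

    def extract_with_quals(self, bases: str, quals: str) -> Tuple[str, str]:
        if len(bases) != len(quals):
            raise ValueError("bases and quals must have the same length")
        stop = self.end(len(bases))
        # Assumption for this sketch: reads that end too early are rejected.
        if self.offset > len(bases) or stop > len(bases):
            raise ValueError("Read ends before the end of the segment")
        return bases[self.offset:stop], quals[self.offset:stop]
```

A variable-length segment that starts exactly at the end of the read returns empty strings, which mirrors the `rs.extract("A"*3)` example near the top of this page.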
ReadStructure ¶
Bases: Iterable[ReadSegment]
Describes the structure of a given read. A read contains one or more read segments. A read segment describes a contiguous stretch of bases of the same type (e.g. template bases) of some length and some offset from the start of the read.
Attributes:
| Name | Type | Description |
|---|---|---|
| segments | Tuple[ReadSegment, ...] | The segments composing the read structure. |
Source code in fgpyo/read_structure.py
Attributes¶
fixed_length property ¶
The fixed length if there is one. Raises an exception if the read structure does not have a fixed length!
has_fixed_length property ¶
True if the ReadStructure has a fixed (i.e. non-variable) length.
length property ¶
Length is defined as the number of segments (not bases!) in the read structure.
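The difference between these three properties is easy to mix up, so here is a minimal sketch that models a read structure as a plain list of segment lengths, with `None` for a variable-length segment. The function names are hypothetical, the reading of "fixed length" as the sum of the segment lengths is an assumption, and so is the choice of `ValueError` as the exception type.

```python
from typing import Optional, Sequence


def segment_count(lengths: Sequence[Optional[int]]) -> int:
    # "length" counts segments, not bases.
    return len(lengths)


def has_fixed_length(lengths: Sequence[Optional[int]]) -> bool:
    # Fixed only when no segment is variable (None).
    return all(length is not None for length in lengths)


def fixed_length(lengths: Sequence[Optional[int]]) -> int:
    # Assumed meaning: total number of bases across all segments.
    if not has_fixed_length(lengths):
        raise ValueError("Read structure has a variable-length segment")
    return sum(length for length in lengths if length is not None)
```

Under these assumptions a structure like `75T8B75T` has three segments and 158 bases, while `75T8B+T` still has three segments but no fixed length.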
Functions¶
extract ¶
extract(bases: str) -> Tuple[SubReadWithoutQuals, ...]
Splits the given bases into sub-reads, one per read segment, each paired with its associated read segment.
extract_with_quals ¶
extract_with_quals(bases: str, quals: str) -> Tuple[SubReadWithQuals, ...]
Splits the given bases and qualities into sub-reads, one per read segment, each holding the bases, the qualities, and the associated read segment.
Source code in fgpyo/read_structure.py
from_segments classmethod ¶
from_segments(segments: Tuple[ReadSegment, ...], reset_offsets: bool = False) -> ReadStructure
Creates a new ReadStructure, optionally resetting the offsets on each of the segments.
Source code in fgpyo/read_structure.py
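One plausible reading of "resetting the offsets" is that each segment's offset is recomputed as the running total of the lengths before it. The sketch below illustrates that idea only. It is not fgpyo's code, and `SegmentSketch` and `reset_offsets` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SegmentSketch:
    """Hypothetical stand-in for a read segment."""

    offset: int
    length: Optional[int]  # None means variable length
    kind: str


def reset_offsets(segments: Tuple[SegmentSketch, ...]) -> Tuple[SegmentSketch, ...]:
    """Return copies of the segments with offsets recomputed from their lengths."""
    result: List[SegmentSketch] = []
    offset = 0
    for segment in segments:
        result.append(replace(segment, offset=offset))
        # A variable-length segment can only be last, so it adds nothing here.
        if segment.length is not None:
            offset += segment.length
    return tuple(result)
```

For segments of lengths 75, 8, and variable, this produces offsets 0, 75, and 83, matching the `offset=83` shown for the last segment in the examples on this page.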
segments_by_kind ¶
segments_by_kind(kind: SegmentType) -> Tuple[ReadSegment, ...]
with_variable_last_segment ¶
with_variable_last_segment() -> ReadStructure
Generates a new ReadStructure that is the same as this one except that the last segment has undefined length
Source code in fgpyo/read_structure.py
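The effect of this method can be pictured as copying every segment unchanged except the last, whose length becomes `None`. A minimal sketch with hypothetical names (`SegmentSketch`, `with_variable_last`), again independent of fgpyo's actual implementation:

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SegmentSketch:
    """Hypothetical stand-in for a read segment."""

    offset: int
    length: Optional[int]  # None means variable length
    kind: str


def with_variable_last(segments: Tuple[SegmentSketch, ...]) -> Tuple[SegmentSketch, ...]:
    """Return the same segments, with the last one made variable length."""
    if not segments:
        return segments
    *head, last = segments
    # The offset and kind of the last segment are kept; only its length changes.
    return (*head, replace(last, length=None))
```

Because the segments are frozen dataclasses, the input tuple is left untouched and a new tuple is returned, which matches the "generates a new ReadStructure" wording above.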
SegmentType ¶
Bases: Enum
The type of segments that can show up in a read structure.
Source code in fgpyo/read_structure.py
Attributes¶
CellBarcode class-attribute instance-attribute ¶
The segment type for cell barcode bases.
MolecularBarcode class-attribute instance-attribute ¶
The segment type for molecular barcode bases.
SampleBarcode class-attribute instance-attribute ¶
The segment type for sample barcode bases.
Skip class-attribute instance-attribute ¶
The segment type for bases that need to be skipped.
SubReadWithQuals ¶
Contains the bases and qualities that correspond to the given read segment.
Source code in fgpyo/read_structure.py
Attributes¶
quals instance-attribute ¶
The sub-read base qualities that correspond to the given read segment.
segment instance-attribute ¶
segment: ReadSegment
The segment of the read structure that describes this sub-read.
SubReadWithoutQuals ¶
Contains the bases that correspond to the given read segment.
Source code in fgpyo/read_structure.py
Attributes¶
segment instance-attribute ¶
segment: ReadSegment
The segment of the read structure that describes this sub-read.