sam
Utility Classes and Methods for SAM/BAM¶
This module contains utility classes for working with SAM/BAM files and the data contained within them. This includes i) utilities for opening SAM/BAM files for reading and writing, ii) functions for manipulating supplementary alignments, iii) classes and functions for manipulating CIGAR strings, and iv) a class for building SAM records and files for testing.
Motivation for Reader and Writer methods¶
The following are the reasons for choosing to implement methods to open a SAM/BAM file for
reading and writing, rather than relying on pysam.AlignmentFile directly:
- Provides a centralized place for the implementation of opening a SAM/BAM for reading and writing. This is useful if any additional parameters are added, or changes to standards or defaults are made.
- Makes the requirement to provide a header when opening a file for writing more explicit.
- Adds support for pathlib.Path.
- Removes the reliance on specifying the mode correctly, including specifying the file type (i.e. SAM, BAM, or CRAM), as well as additional options (e.g. compression level). This makes the code more explicit and easier to read.
- An explicit check is performed to ensure the file type is specified when writing using a file-like object rather than a path to a file.
Examples of Opening a SAM/BAM for Reading or Writing¶
Opening a SAM/BAM file for reading, auto-recognizing the file-type by the file extension. See
SamFileType() for the supported file types.
>>> from fgpyo.sam import reader
>>> with reader("/path/to/sample.sam") as fh:
...     for record in fh:
...         print(record.query_name)  # do something
>>> with reader("/path/to/sample.bam") as fh:
...     for record in fh:
...         print(record.query_name)  # do something
Opening a SAM/BAM file for reading, explicitly passing the file type.
>>> from fgpyo.sam import SamFileType
>>> with reader(path="/path/to/sample.ext1", file_type=SamFileType.SAM) as fh:
...     for record in fh:
...         print(record.query_name)  # do something
>>> with reader(path="/path/to/sample.ext2", file_type=SamFileType.BAM) as fh:
...     for record in fh:
...         print(record.query_name)  # do something
Opening a SAM/BAM file for reading, using an existing file-like object. The file type must match the contents of the file-like object.
>>> with open("/path/to/sample.bam", "rb") as file_object:
...     with reader(path=file_object, file_type=SamFileType.BAM) as fh:
...         for record in fh:
...             print(record.query_name)  # do something
Opening a SAM/BAM file for writing is similar to using reader(), but the SAM file header object is required.
>>> from typing import Any, Dict
>>> from fgpyo.sam import writer
>>> header: Dict[str, Any] = {
...     "HD": {"VN": "1.5", "SO": "coordinate"},
...     "RG": [{"ID": "1", "SM": "1_AAAAAA", "LB": "lib", "PL": "ILLUMINA", "PU": "xxx.1"}],
...     "SQ": [
...         {"SN": "chr1", "LN": 249250621},
...         {"SN": "chr2", "LN": 243199373}
...     ]
... }
>>> with writer(path="/path/to/sample.bam", header=header) as fh:
...     pass  # do something
Examples of Manipulating Cigars¶
Creating a Cigar from a pysam.AlignedSegment.
>>> from fgpyo.sam import Cigar
>>> with reader("/path/to/sample.sam") as fh:
...     record = next(fh)
...     cigar = Cigar.from_cigartuples(record.cigartuples)
...     print(str(cigar))
50M2D5M10S
Creating a Cigar from a str().
If the cigar string is invalid, the exception message will show you the problem character(s) in square brackets.
>>> cigar = Cigar.from_cigarstring("10M5U")
Traceback (most recent call last):
...
fgpyo.sam.CigarParsingException: Malformed cigar: 10M5[U]
The cigar contains a tuple of CigarElement()s. Each element
contains the cigar operator (CigarOp()) and associated operator
length. A number of useful methods are part of both classes.
The number of bases each element consumes from the query and from the target, and whether each operator is an indel:
>>> cigar = Cigar.from_cigarstring("50M2D5M2I10S")
>>> [e.length_on_query for e in cigar.elements]
[50, 0, 5, 2, 10]
>>> [e.length_on_target for e in cigar.elements]
[50, 2, 5, 0, 0]
>>> [e.operator.is_indel for e in cigar.elements]
[False, True, False, True, False]
Any particular element can be accessed directly via .elements with its index (negative
indexes and slices also work):
>>> cigar = Cigar.from_cigarstring("50M2D5M2I10S")
>>> cigar.elements[0].length
50
>>> cigar.elements[1].operator
<CigarOp.D: (2, 'D', False, True)>
>>> cigar.elements[-1].operator
<CigarOp.S: (4, 'S', True, False)>
>>> tuple(x.operator.character for x in cigar.elements[1:3])
('D', 'M')
>>> tuple(x.operator.character for x in cigar.elements[-2:])
('I', 'S')
Examples of parsing the SA tag and individual supplementary alignments¶
>>> from fgpyo.sam import SupplementaryAlignment
>>> sup = SupplementaryAlignment.parse("chr1,123,+,50S100M,60,0")
>>> sup.reference_name
'chr1'
>>> sup.nm
0
>>> from typing import List
>>> sa_tag = "chr1,123,+,50S100M,60,0;chr2,456,-,75S75M,60,1"
>>> sups: List[SupplementaryAlignment] = SupplementaryAlignment.parse_sa_tag(tag=sa_tag)
>>> len(sups)
2
>>> [str(sup.cigar) for sup in sups]
['50S100M', '75S75M']
Attributes¶
DefaultProperlyPairedOrientations
module-attribute
¶
DefaultProperlyPairedOrientations: set[PairOrientation] = {FR}
The default orientations for properly paired reads.
NO_QUERY_QUALITIES
module-attribute
¶
NO_QUERY_QUALITIES: array = qualitystring_to_array(STRING_PLACEHOLDER)
The quality array corresponding to an unavailable query quality string ("*").
NO_REF_INDEX
module-attribute
¶
The reference index to use to indicate no reference in SAM/BAM.
NO_REF_NAME
module-attribute
¶
NO_REF_NAME: str = STRING_PLACEHOLDER
The reference name to use to indicate no reference in SAM/BAM.
NO_REF_POS
module-attribute
¶
The reference position to use to indicate no position in SAM/BAM.
STRING_PLACEHOLDER
module-attribute
¶
The value to use when a string field's information is unavailable.
SamPath
module-attribute
¶
The valid base classes for opening a SAM/BAM/CRAM file.
Classes¶
Cigar ¶
Class representing a cigar string.
Attributes:
| Name | Type | Description |
|---|---|---|
| elements | Tuple[CigarElement, ...] | zero or more cigar elements |
Source code in fgpyo/sam/__init__.py
Functions¶
from_cigarstring
classmethod
¶
from_cigarstring(cigarstring: str) -> Cigar
Constructs a Cigar from a string returned by pysam.
If "*" is given, returns an empty Cigar.
Source code in fgpyo/sam/__init__.py
from_cigartuples
classmethod
¶
from_cigartuples(cigartuples: Optional[List[Tuple[int, int]]]) -> Cigar
Returns a Cigar from a list of tuples returned by pysam.
Each tuple denotes the operation and length. See
CigarOp() for more information on the
various operators. If None is given, returns an empty Cigar.
Source code in fgpyo/sam/__init__.py
length_on_query ¶
length_on_target ¶
query_alignment_offsets ¶
Gets the 0-based, end-exclusive positions of the first and last aligned base in the query.
The resulting range will contain the range of positions in the SEQ string for
the bases that are aligned.
If counting from the end of the query is desired, use
cigar.reversed().query_alignment_offsets()
Returns:
| Type | Description |
|---|---|
| Tuple[int, int] | A tuple (start, stop) containing the start and stop positions of the aligned part of the query. These offsets are 0-based and open-ended, with respect to the beginning of the query. |
Raises:
| Type | Description |
|---|---|
| ValueError | If according to the cigar, there are no aligned query bases. |
Source code in fgpyo/sam/__init__.py
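The offsets described above can be illustrated with a small, standalone sketch. It does not use fgpyo; the helper names (`parse_cigar`, `clipped_query_bases`, `aligned_query_offsets`) are hypothetical, and the handling of edge cases is an assumption rather than the library's actual behavior.

```python
import re
from typing import Iterable, List, Tuple

# Operators that consume query bases, per the SAM specification.
CONSUMES_QUERY = set("MIS=X")
# Clipping operators: soft clips consume query bases, hard clips do not.
CLIPPING = set("SH")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Splits a CIGAR string such as "5S40M" into (length, operator) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def clipped_query_bases(elements: Iterable[Tuple[int, str]]) -> int:
    """Counts query bases in the run of clipping operators at the start of `elements`."""
    total = 0
    for length, op in elements:
        if op not in CLIPPING:
            break
        if op in CONSUMES_QUERY:
            total += length
    return total


def aligned_query_offsets(cigar: str) -> Tuple[int, int]:
    """Returns the 0-based, end-exclusive (start, stop) of the aligned query bases."""
    elements = parse_cigar(cigar)
    query_length = sum(length for length, op in elements if op in CONSUMES_QUERY)
    start = clipped_query_bases(elements)
    stop = query_length - clipped_query_bases(reversed(elements))
    if start >= stop:
        raise ValueError(f"Cigar {cigar} has no aligned query bases")
    return start, stop
```

For a 60-base read with cigar 5S40M2I3M10S, this sketch yields (5, 50), so `seq[5:50]` would cover the aligned part of the SEQ string.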
CigarElement ¶
Represents an element in a Cigar
Attributes:
| Name | Type | Description |
|---|---|---|
| length | int | the length of the element |
| operator | CigarOp | the operator of the element |
Source code in fgpyo/sam/__init__.py
CigarOp ¶
Bases: Enum
Enumeration of operators that can appear in a Cigar string.
Attributes:
| Name | Type | Description |
|---|---|---|
| code | int | The |
| character | str | The single character cigar operator. |
| consumes_query | bool | True if this operator consumes query bases, False otherwise. |
| consumes_target | bool | True if this operator consumes target bases, False otherwise. |
Source code in fgpyo/sam/__init__.py
Attributes¶
is_clipping
property
¶
Returns true if the operator is a soft/hard clip, false otherwise.
Functions¶
from_character
staticmethod
¶
from_character(character: str) -> CigarOp
Returns the operator from the single character.
CigarParsingException ¶
PairOrientation ¶
Bases: Enum
Enumerations of read pair orientations.
Source code in fgpyo/sam/__init__.py
Attributes¶
FR
class-attribute
instance-attribute
¶
A pair orientation for forward-reverse reads ("innie").
RF
class-attribute
instance-attribute
¶
A pair orientation for reverse-forward reads ("outie").
TANDEM
class-attribute
instance-attribute
¶
A pair orientation for tandem (forward-forward or reverse-reverse) reads.
Functions¶
from_recs
classmethod
¶
from_recs(rec1: AlignedSegment, rec2: Optional[AlignedSegment] = None) -> Optional[PairOrientation]
Returns the pair orientation if both reads are mapped to the same reference sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rec1 | AlignedSegment | The first record in the pair. | required |
| rec2 | Optional[AlignedSegment] | The second record in the pair. If None, then mate info on | None |
Source code in fgpyo/sam/__init__.py
ReadEditInfo ¶
Counts various stats about how a read compares to a reference sequence.
Attributes:
| Name | Type | Description |
|---|---|---|
| matches | int | the number of bases in the read that match the reference |
| mismatches | int | the number of mismatches between the read sequence and the reference sequence as dictated by the alignment. As defined for the SAM NM tag computation, any base except A/C/G/T in the read is considered a mismatch. |
| insertions | int | the number of insertions in the read vs. the reference, i.e. the number of I operators in the CIGAR string. |
| inserted_bases | int | the total number of bases contained within insertions in the read |
| deletions | int | the number of deletions in the read vs. the reference, i.e. the number of D operators in the CIGAR string. |
| deleted_bases | int | the total number of bases that are deleted within the alignment (i.e. bases in the reference but not in the read). |
| nm | int | the computed value of the SAM NM tag, calculated as mismatches + inserted_bases + deleted_bases |
Source code in fgpyo/sam/__init__.py
SamFileType ¶
Bases: Enum
Enumeration of valid SAM/BAM/CRAM file types.
Attributes:
| Name | Type | Description |
|---|---|---|
| mode | str | The additional mode character to add when opening this file type. |
| ext | str | The standard file extension for this file type. |
Source code in fgpyo/sam/__init__.py
Attributes¶
Functions¶
from_path
classmethod
¶
from_path(path: Union[Path, str]) -> SamFileType
Infers the file type based on the file extension.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Union[Path, str] | the path to the SAM/BAM/CRAM to read or write. | required |
Source code in fgpyo/sam/__init__.py
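Extension-based inference can be sketched with a small enum that carries a mode character and an extension. The `FileType` class below is a hypothetical stand-in, not SamFileType itself, and the specific mode characters ("", "b", "c") are an assumption based on the mode strings pysam accepts.

```python
from enum import Enum
from pathlib import Path
from typing import Union


class FileType(Enum):
    """Hypothetical stand-in for SamFileType: (mode character, file extension)."""

    SAM = ("", ".sam")
    BAM = ("b", ".bam")
    CRAM = ("c", ".cram")

    def __init__(self, mode: str, ext: str) -> None:
        self.mode = mode  # appended to "r" or "w" when opening (assumed)
        self.ext = ext

    @classmethod
    def from_path(cls, path: Union[Path, str]) -> "FileType":
        """Infers the file type from the file extension of the given path."""
        ext = Path(path).suffix
        for file_type in cls:
            if file_type.ext == ext:
                return file_type
        raise ValueError(f"Could not infer file type from path: {path}")
```

With this sketch, `"r" + FileType.from_path("/path/to/sample.bam").mode` gives the read mode string `"rb"`, which is the kind of detail the reader and writer functions hide from callers.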
SamOrder ¶
Bases: Enum
Enumerations of possible sort orders for a SAM file.
Source code in fgpyo/sam/__init__.py
SupplementaryAlignment ¶
Stores a supplementary alignment record produced by BWA and stored in the SA SAM tag.
Attributes:
| Name | Type | Description |
|---|---|---|
| reference_name | str | the name of the reference (i.e. contig, chromosome) aligned to |
| start | int | the 0-based start position of the alignment |
| is_forward | bool | true if the alignment is in the forward strand, false otherwise |
| cigar | Cigar | the cigar for the alignment |
| mapq | int | the mapping quality |
| nm | int | the number of edits |
Source code in fgpyo/sam/__init__.py
Attributes¶
Functions¶
from_read
classmethod
¶
from_read(read: AlignedSegment) -> List[SupplementaryAlignment]
Construct a list of SupplementaryAlignments from the SA tag in a pysam.AlignedSegment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| read | AlignedSegment | An alignment. The presence of the "SA" tag is not required. | required |
Returns:
| Type | Description |
|---|---|
| List[SupplementaryAlignment] | A list of all SupplementaryAlignments present in the SA tag. If the SA tag is not present, or it is empty, an empty list will be returned. |
Source code in fgpyo/sam/__init__.py
parse
staticmethod
¶
parse(string: str) -> SupplementaryAlignment
Returns a supplementary alignment parsed from the given string. The various fields
should be comma-delimited (e.g. chr1,123,-,100M50S,60,4).
Source code in fgpyo/sam/__init__.py
parse_sa_tag
staticmethod
¶
parse_sa_tag(tag: str) -> List[SupplementaryAlignment]
Parses an SA tag of supplementary alignments from a BAM file. If the tag is empty or contains just a single semi-colon then an empty list will be returned. Otherwise a list containing a SupplementaryAlignment per ;-separated value in the tag will be returned.
Source code in fgpyo/sam/__init__.py
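The parsing rules above can be sketched without fgpyo. The `SuppAln` class and both functions below are hypothetical stand-ins; the cigar is kept as a plain string, and converting the 1-based SA position (as defined by the SAM specification) to the 0-based `start` documented above is an assumption about how the conversion is done.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SuppAln:
    """Hypothetical stand-in for SupplementaryAlignment (cigar kept as a str)."""

    reference_name: str
    start: int  # 0-based
    is_forward: bool
    cigar: str
    mapq: int
    nm: int


def parse_one(string: str) -> SuppAln:
    """Parses one comma-delimited supplementary alignment, e.g. chr1,123,-,100M50S,60,4."""
    fields = string.split(",")
    if len(fields) != 6:
        raise ValueError(f"Expected 6 comma-delimited fields, found {len(fields)}: {string}")
    return SuppAln(
        reference_name=fields[0],
        start=int(fields[1]) - 1,  # assumed: SA positions are 1-based, start is 0-based
        is_forward=fields[2] == "+",
        cigar=fields[3],
        mapq=int(fields[4]),
        nm=int(fields[5]),
    )


def parse_tag(tag: str) -> List[SuppAln]:
    """Parses a ;-separated SA tag; empty values (e.g. after a trailing ';') are skipped."""
    return [parse_one(value) for value in tag.split(";") if value]
```

Skipping empty values is what makes both an empty tag and a lone semi-colon produce an empty list in this sketch.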
Template ¶
A container for alignment records corresponding to a single sequenced template or insert.
It is strongly preferred that new Template instances be created with Template.build()
which will ensure that reads are stored in the correct Template property, and run basic
validations of the Template by default. If constructing Template instances directly,
users are encouraged to call the validate method after construction.
In the special case where alignment records are both secondary and supplementary,
they will be stored in the r1_supplementals and r2_supplementals fields only.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | the name of the template/query |
| r1 | Optional[AlignedSegment] | Primary non-supplementary alignment for read 1, or None if there is none |
| r2 | Optional[AlignedSegment] | Primary non-supplementary alignment for read 2, or None if there is none |
| r1_supplementals | List[AlignedSegment] | Supplementary alignments for read 1 |
| r2_supplementals | List[AlignedSegment] | Supplementary alignments for read 2 |
| r1_secondaries | List[AlignedSegment] | Secondary (non-primary, non-supplementary) alignments for read 1 |
| r2_secondaries | List[AlignedSegment] | Secondary (non-primary, non-supplementary) alignments for read 2 |
Source code in fgpyo/sam/__init__.py
Functions¶
all_r1s ¶
Yields all R1 alignments of this template including secondary and supplementary.
Source code in fgpyo/sam/__init__.py
all_r2s ¶
Yields all R2 alignments of this template including secondary and supplementary.
Source code in fgpyo/sam/__init__.py
all_recs ¶
Returns a list with all the records for the template.
Source code in fgpyo/sam/__init__.py
build
staticmethod
¶
build(recs: Iterable[AlignedSegment], validate: bool = True) -> Template
Build a template from a set of records all with the same queryname.
Source code in fgpyo/sam/__init__.py
iterator
staticmethod
¶
iterator(alns: Iterator[AlignedSegment]) -> Iterator[Template]
Returns an iterator over templates. Assumes the input iterable is queryname grouped, and gathers consecutive runs of records sharing a common query name into templates.
Source code in fgpyo/sam/__init__.py
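The grouping of consecutive records by query name can be sketched with `itertools.groupby`. This is an illustration of the idea only, not the library's implementation; `Rec` is a hypothetical minimal record standing in for pysam.AlignedSegment.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, NamedTuple


class Rec(NamedTuple):
    """Hypothetical minimal record; a real record would be a pysam.AlignedSegment."""

    query_name: str
    flag: int


def group_by_query_name(recs: Iterable[Rec]) -> Iterator[List[Rec]]:
    """Yields lists of consecutive records that share the same query name."""
    for _, group in groupby(recs, key=lambda rec: rec.query_name):
        yield list(group)


# q1 appears twice in a row, then again after q2: only consecutive runs are merged.
recs = [Rec("q1", 99), Rec("q1", 147), Rec("q2", 4), Rec("q1", 2048)]
groups = list(group_by_query_name(recs))
```

In this sketch, the trailing q1 record ends up in its own group, which shows why the input is assumed to be queryname grouped: records that are not adjacent are never gathered into the same template.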
primary_recs ¶
set_mate_info ¶
set_mate_info(is_proper_pair: Callable[[AlignedSegment, AlignedSegment], bool] = is_proper_pair, isize: Callable[[AlignedSegment, AlignedSegment], int] = isize) -> Self
Reset all mate information on every alignment in the template.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| is_proper_pair | Callable[[AlignedSegment, AlignedSegment], bool] | A function that takes two alignments and determines proper pair status. | is_proper_pair |
| isize | Callable[[AlignedSegment, AlignedSegment], int] | A function that takes the two alignments and calculates their isize. | isize |
Source code in fgpyo/sam/__init__.py
set_tag ¶
Add a tag to all records associated with the template.
Setting a tag to None will remove the tag.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | The name of the tag. | required |
| value | Union[str, int, float, None] | The value of the tag. | required |
Source code in fgpyo/sam/__init__.py
validate ¶
Performs sanity checks that all the records in the Template are as expected.
Source code in fgpyo/sam/__init__.py
write_to ¶
Write the records associated with the template to file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| writer | AlignmentFile | An open, writable AlignmentFile. | required |
| primary_only | bool | If True, only write primary alignments. | False |
Source code in fgpyo/sam/__init__.py
TemplateIterator ¶
Bases: Iterator[Template]
An iterator that converts an iterator over query-grouped reads into an iterator over templates.
Source code in fgpyo/sam/__init__.py
Functions¶
calculate_edit_info ¶
calculate_edit_info(rec: AlignedSegment, reference_sequence: str, reference_offset: Optional[int] = None) -> ReadEditInfo
Constructs a ReadEditInfo instance giving summary stats about how the read aligns to the
reference. Computes the number of mismatches, indels, indel bases and the SAM NM tag.
The read must be aligned.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rec | AlignedSegment | the read/record for which to calculate values | required |
| reference_sequence | str | the reference sequence (or fragment thereof) that the read is aligned to | required |
| reference_offset | Optional[int] | if provided, assume that reference_sequence[reference_offset] is the first base aligned to in reference_sequence, otherwise use rec.reference_start | None |
Returns:
| Type | Description |
|---|---|
| ReadEditInfo | a ReadEditInfo with information about how the read differs from the reference |
Source code in fgpyo/sam/__init__.py
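The counting rules described above can be sketched in plain Python by walking the CIGAR alongside the read and the reference. The `EditInfo` class and `edit_info` function are hypothetical and operate on strings rather than a pysam.AlignedSegment; the case-insensitive base comparison is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class EditInfo:
    """Hypothetical stand-in for ReadEditInfo."""

    matches: int = 0
    mismatches: int = 0
    insertions: int = 0
    inserted_bases: int = 0
    deletions: int = 0
    deleted_bases: int = 0

    @property
    def nm(self) -> int:
        # NM as documented: mismatches + inserted_bases + deleted_bases
        return self.mismatches + self.inserted_bases + self.deleted_bases


def edit_info(read: str, cigar: str, reference: str, reference_offset: int = 0) -> EditInfo:
    """Counts matches, mismatches, and indels for a read aligned at reference_offset."""
    info = EditInfo()
    query_pos, ref_pos = 0, reference_offset
    for length_str, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length_str)
        if op in ("M", "=", "X"):
            for i in range(length):
                base = read[query_pos + i].upper()
                # Any read base other than A/C/G/T counts as a mismatch.
                if base in "ACGT" and base == reference[ref_pos + i].upper():
                    info.matches += 1
                else:
                    info.mismatches += 1
            query_pos += length
            ref_pos += length
        elif op == "I":
            info.insertions += 1
            info.inserted_bases += length
            query_pos += length
        elif op == "D":
            info.deletions += 1
            info.deleted_bases += length
            ref_pos += length
        elif op == "S":
            query_pos += length  # soft clips consume query bases only
        elif op == "N":
            ref_pos += length  # skipped regions consume reference bases only
        # H and P consume neither query nor reference bases
    return info
```

Note how one D operator of length 2 adds one to `deletions` but two to `deleted_bases`, and only the latter contributes to NM.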
is_proper_pair ¶
is_proper_pair(rec1: AlignedSegment, rec2: Optional[AlignedSegment] = None, max_insert_size: int = 1000, orientations: Collection[PairOrientation] = DefaultProperlyPairedOrientations, isize: Callable[[AlignedSegment, AlignedSegment], int] = isize) -> bool
Determines if a pair of records are properly paired or not.
Criteria for records in a proper pair are:
- Both records are aligned
- Both records are aligned to the same reference sequence
- The pair orientation of the records is one of the valid pair orientations (default "FR")
- The inferred insert size is not more than a maximum length (default 1000)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rec1 | AlignedSegment | The first record in the pair. | required |
| rec2 | Optional[AlignedSegment] | The second record in the pair. If None, then mate info on | None |
| max_insert_size | int | The maximum insert size to consider a pair "proper". | 1000 |
| orientations | Collection[PairOrientation] | The valid set of orientations to consider a pair "proper". | DefaultProperlyPairedOrientations |
| isize | Callable[[AlignedSegment, AlignedSegment], int] | A function that takes the two alignments and calculates their isize. | isize |
Source code in fgpyo/sam/__init__.py
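The four criteria listed above can be sketched as a single boolean expression. The `PairSummary` class is hypothetical: it holds facts that would normally be derived from two alignment records, with the orientation given as a plain string. Comparing the absolute value of the insert size against the maximum is an assumption made because TLEN is signed.

```python
from dataclasses import dataclass
from typing import Collection, Optional


@dataclass
class PairSummary:
    """Hypothetical pre-computed facts about a read pair."""

    r1_mapped: bool
    r2_mapped: bool
    r1_reference: Optional[str]
    r2_reference: Optional[str]
    orientation: Optional[str]  # e.g. "FR", "RF", or "TANDEM"
    insert_size: int  # signed, like the SAM TLEN field


def is_proper(
    pair: PairSummary,
    max_insert_size: int = 1000,
    orientations: Collection[str] = ("FR",),
) -> bool:
    """Applies the proper-pair criteria in the order they are listed above."""
    return (
        pair.r1_mapped
        and pair.r2_mapped
        and pair.r1_reference == pair.r2_reference
        and pair.orientation in orientations
        and abs(pair.insert_size) <= max_insert_size  # assumed: compare magnitude
    )
```

Passing `orientations=("FR", "RF")` would accept "outie" pairs as well, which mirrors how the orientations parameter widens the default.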
isize ¶
Computes the insert size ("template length" or "TLEN") for a pair of records.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rec1 | AlignedSegment | The first record in the pair. | required |
| rec2 | Optional[AlignedSegment] | The second record in the pair. If None, then mate info on | None |
Source code in fgpyo/sam/__init__.py
reader ¶
reader(path: SamPath, file_type: Optional[SamFileType] = None, unmapped: bool = False) -> AlignmentFile
Opens a SAM/BAM/CRAM for reading.
To read from standard input, provide any of "-", "stdin", or "/dev/stdin" as the input
path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | SamPath | a file handle or path to the SAM/BAM/CRAM to read or write. | required |
| file_type | Optional[SamFileType] | the file type to assume when opening the file. If None, then the file type will be auto-detected. | None |
| unmapped | bool | True if the file is unmapped and has no sequence dictionary, False otherwise. | False |
Source code in fgpyo/sam/__init__.py
set_mate_info ¶
set_mate_info(rec1: AlignedSegment, rec2: AlignedSegment, is_proper_pair: Callable[[AlignedSegment, AlignedSegment], bool] = is_proper_pair, isize: Callable[[AlignedSegment, AlignedSegment], int] = isize) -> None
Resets mate pair information between two primary alignments that share a query name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rec1 | AlignedSegment | The first record in the pair. | required |
| rec2 | AlignedSegment | The second record in the pair. | required |
| is_proper_pair | Callable[[AlignedSegment, AlignedSegment], bool] | A function that takes the two alignments and determines proper pair status. | is_proper_pair |
| isize | Callable[[AlignedSegment, AlignedSegment], int] | A function that takes the two alignments and calculates their isize. | isize |
Raises:
| Type | Description |
|---|---|
| ValueError | If rec1 and rec2 are of the same read ordinal. |
| ValueError | If either rec1 or rec2 is secondary or supplementary. |
| ValueError | If rec1 and rec2 do not share the same query name. |
Source code in fgpyo/sam/__init__.py
set_mate_info_on_secondary ¶
Set mate info on a secondary alignment from its mate's primary alignment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| secondary | AlignedSegment | The secondary alignment to set mate information upon. | required |
| mate_primary | AlignedSegment | The primary alignment of the secondary's mate. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If secondary and mate_primary are of the same read ordinal. |
| ValueError | If secondary and mate_primary do not share the same query name. |
| ValueError | If mate_primary is secondary or supplementary. |
| ValueError | If secondary is not marked as a secondary alignment. |
Source code in fgpyo/sam/__init__.py
set_mate_info_on_supplementary ¶
Set mate info on a supplementary alignment from its mate's primary alignment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| supp | AlignedSegment | The supplementary alignment to set mate information upon. | required |
| mate_primary | AlignedSegment | The primary alignment of the supplementary's mate. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If supp and mate_primary are of the same read ordinal. |
| ValueError | If supp and mate_primary do not share the same query name. |
| ValueError | If mate_primary is secondary or supplementary. |
| ValueError | If supp is not marked as a supplementary alignment. |
Source code in fgpyo/sam/__init__.py
set_pair_info ¶
Resets mate pair information between reads in a pair.
Can be handed reads that already have pairing flags set up, or independent R1 and R2 records that are currently flagged as single-end (SE) reads.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| r1 | AlignedSegment | Read 1 (first read in the template). | required |
| r2 | AlignedSegment | Read 2 with the same query name as r1 (second read in the template). | required |
| proper_pair | bool | whether the pair is proper or not. | True |
Source code in fgpyo/sam/__init__.py
sum_of_base_qualities ¶
Calculate the sum of base quality scores for an alignment record.
This function is useful for calculating the "mate score" as implemented in samtools fixmate.
Consistent with samtools fixmate, this function returns 0 if the record has no base
qualities.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rec | AlignedSegment | The alignment record to calculate the sum of base qualities from. | required |
| min_quality_score | int | The minimum base quality score to use for summation. | 15 |
Returns:
| Type | Description |
|---|---|
| int | The sum of base qualities on the input record. 0 if the record has no base qualities. |
Source code in fgpyo/sam/__init__.py
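The thresholded sum can be sketched on a plain sequence of integer qualities. The function name `sum_quals` is hypothetical, a value of None stands in for a record without base qualities, and treating the threshold as inclusive (qualities equal to the minimum are counted) is an assumption.

```python
from typing import Optional, Sequence


def sum_quals(quals: Optional[Sequence[int]], min_quality_score: int = 15) -> int:
    """Sums the base qualities that are at least min_quality_score."""
    if quals is None:
        # No base qualities available: return 0, as documented above.
        return 0
    return sum(qual for qual in quals if qual >= min_quality_score)


# Qualities below 15 (here 14 and 2) are excluded from the default sum.
score = sum_quals([30, 14, 15, 2])
```

Lowering `min_quality_score` to 0 in this sketch would include every base, which can be useful when comparing records with uniformly low qualities.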
writer ¶
writer(path: SamPath, header: Union[str, Dict[str, Any], AlignmentHeader], file_type: Optional[SamFileType] = None) -> AlignmentFile
Opens a SAM/BAM/CRAM for writing.
To write to standard output, provide any of "-", "stdout", or "/dev/stdout" as the output
path. Note: When writing to stdout, the file_type must be given.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | SamPath | a file handle or path to the SAM/BAM/CRAM to read or write. | required |
| header | Union[str, Dict[str, Any], AlignmentHeader] | Either a string to use for the header or a multi-level dictionary. The multi-level dictionary should be given as follows. The first level are the four types (‘HD’, ‘SQ’, ...). The second level are a list of lines, with each line being a list of tag-value pairs. The header is constructed first from all the defined fields, followed by user tags in alphabetical order. | required |
| file_type | Optional[SamFileType] | the file type to assume when opening the file. If | None |
Source code in fgpyo/sam/__init__.py
Modules¶
builder ¶
Classes for generating SAM and BAM files and records for testing¶
This module contains utility classes for the generation of SAM and BAM files and alignment records, for use in testing.
Classes¶
SamBuilder ¶
Builder for constructing one or more SAM records (AlignedSegments in pysam terms).
Provides the ability to manufacture records from minimal arguments, while generating any remaining attributes to ensure a valid record.
A builder is constructed with a handful of defaults including lengths for generated R1s and R2s, the default base quality score to use, a sequence dictionary and a single read group.
Records are then added using the add_pair()
method. Once accumulated the records can be accessed in the order in which they were created
through the to_unsorted_list()
function, or in a list sorted by coordinate order via
to_sorted_list(). The latter creates
a temporary file to do the sorting and is somewhat slower as a result. Lastly, the records can
be written to a temporary file using
to_path().
Source code in fgpyo/sam/builder.py
Attributes¶
Functions¶
__init__(r1_len: Optional[int] = None, r2_len: Optional[int] = None, base_quality: int = 30, mapping_quality: int = 60, sd: Optional[List[Dict[str, Any]]] = None, rg: Optional[Dict[str, str]] = None, extra_header: Optional[Dict[str, Any]] = None, seed: int = 42, sort_order: SamOrder = Coordinate) -> None
Initializes a new SamBuilder for generating alignment records and SAM/BAM files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| r1_len | Optional[int] | The length of R1s to create unless otherwise specified | None |
| r2_len | Optional[int] | The length of R2s to create unless otherwise specified | None |
| base_quality | int | The base quality of bases to create unless otherwise specified | 30 |
| sd | Optional[List[Dict[str, Any]]] | a sequence dictionary as a list of dicts; defaults to calling default_sd() if None | None |
| rg | Optional[Dict[str, str]] | a single read group as a dict; defaults to calling default_rg() if None | None |
| extra_header | Optional[Dict[str, Any]] | a dictionary of extra values to add to the header, None otherwise. See | None |
| seed | int | a seed value for random number/string generation | 42 |
| sort_order | SamOrder | Order to sort records when writing to file, or output of to_sorted_list() | Coordinate |
Source code in fgpyo/sam/builder.py
add_pair(*, name: Optional[str] = None, bases1: Optional[str] = None, bases2: Optional[str] = None, quals1: Optional[List[int]] = None, quals2: Optional[List[int]] = None, chrom: Optional[str] = None, chrom1: Optional[str] = None, chrom2: Optional[str] = None, start1: int = NO_REF_POS, start2: int = NO_REF_POS, cigar1: Optional[str] = None, cigar2: Optional[str] = None, mapq1: Optional[int] = None, mapq2: Optional[int] = None, strand1: str = '+', strand2: str = '-', attrs: Optional[Dict[str, Any]] = None) -> Tuple[AlignedSegment, AlignedSegment]
Generates a new pair of reads, adds them to the internal collection, and returns them.
Most fields are optional.
Mapped pairs can be created by specifying both start1 and start2 and either chrom, for
pairs where both reads map to the same contig, or both chrom1 and chrom2, for pairs
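where reads map to different contigs. In the shorthand below, each name stands for the
keyword argument of the same name (all arguments are keyword-only):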
- `add_pair(chrom, start1, start2)` will create a mapped pair where both reads map to
the same contig (`chrom`).
- `add_pair(chrom1, start1, chrom2, start2)` will create a mapped pair where the reads
map to different contigs (`chrom1` and `chrom2`).
A pair with only one of the two reads mapped can be created by setting only one start position. Flags will automatically be set correctly for the unmapped mate.
- `add_pair(chrom, start1)`
- `add_pair(chrom1, start1)`
- `add_pair(chrom, start2)`
- `add_pair(chrom2, start2)`
An unmapped pair can be created by calling the method with no parameters (specifically,
not setting chrom, chrom1, start1, chrom2, or start2). If either cigar is
provided, it will be ignored.
For a given read (i.e. R1 or R2) the length of the read is determined based on the presence or absence of bases, quals, and cigar. If values are provided for one or more of these parameters, the lengths must match, and the length will be used to generate any unsupplied values. If none of bases, quals, and cigar are provided, all three will be synthesized based on either the r1_len or r2_len stored on the class as appropriate.
When synthesizing, bases are always a random sequence of bases, quals are all the default base quality (supplied when constructing a SamBuilder) and the cigar is always a single M operator of the read length.
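The length-resolution rule described above can be illustrated with a small standalone sketch. It does not use fgpyo; `resolve_read` and `cigar_query_length` are hypothetical helper names chosen for this example, and the real SamBuilder may differ in detail (for instance in how it draws random bases).

```python
import random
import re
from typing import List, Optional, Tuple

# CIGAR operators that consume query bases, per the SAM specification.
QUERY_CONSUMING_OPS = frozenset("MIS=X")


def cigar_query_length(cigar: str) -> int:
    """Sums the lengths of all CIGAR operators that consume query bases."""
    pairs = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in pairs if op in QUERY_CONSUMING_OPS)


def resolve_read(
    bases: Optional[str],
    quals: Optional[List[int]],
    cigar: Optional[str],
    default_len: int,
    default_qual: int = 30,
    rng: Optional[random.Random] = None,
) -> Tuple[str, List[int], str]:
    """Hypothetical helper: fills in whichever of bases/quals/cigar were not supplied."""
    rng = rng if rng is not None else random.Random(42)

    # Collect the length implied by every value that was supplied.
    lengths = set()
    if bases is not None:
        lengths.add(len(bases))
    if quals is not None:
        lengths.add(len(quals))
    if cigar is not None:
        lengths.add(cigar_query_length(cigar))
    if len(lengths) > 1:
        raise ValueError(f"bases/quals/cigar imply different lengths: {sorted(lengths)}")

    # Use the supplied length if there is one, else the builder-level default.
    length = lengths.pop() if lengths else default_len

    if bases is None:
        bases = "".join(rng.choice("ACGT") for _ in range(length))
    if quals is None:
        quals = [default_qual] * length
    if cigar is None:
        cigar = f"{length}M"  # a single M operator spanning the whole read
    return bases, quals, cigar
```

For example, `resolve_read("ACGT", None, None, default_len=10)` yields four qualities of 30 and the cigar `4M`, while supplying `bases="ACGT"` together with `cigar="5M"` raises a ValueError.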
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | Optional[str] | The name of the template. If None is given a unique name will be auto-generated. | None |
| bases1 | Optional[str] | The bases for R1. If None is given a random sequence is generated. | None |
| bases2 | Optional[str] | The bases for R2. If None is given a random sequence is generated. | None |
| quals1 | Optional[List[int]] | The list of int qualities for R1. If None, the default base quality is used. | None |
| quals2 | Optional[List[int]] | The list of int qualities for R2. If None, the default base quality is used. | None |
| chrom | Optional[str] | The chromosome to which both reads are mapped. Defaults to the unmapped value. | None |
| chrom1 | Optional[str] | The chromosome to which R1 is mapped. If None, | None |
| chrom2 | Optional[str] | The chromosome to which R2 is mapped. If None, | None |
| start1 | int | The start position of R1. Defaults to the unmapped value. | NO_REF_POS |
| start2 | int | The start position of R2. Defaults to the unmapped value. | NO_REF_POS |
| cigar1 | Optional[str] | The cigar string for R1. Defaults to None for unmapped reads, otherwise all M. | None |
| cigar2 | Optional[str] | The cigar string for R2. Defaults to None for unmapped reads, otherwise all M. | None |
| mapq1 | Optional[int] | Mapping quality for R1. Defaults to self.mapping_quality if None. | None |
| mapq2 | Optional[int] | Mapping quality for R2. Defaults to self.mapping_quality if None. | None |
| strand1 | str | The strand for R1, either "+" or "-". Defaults to "+". | '+' |
| strand2 | str | The strand for R2, either "+" or "-". Defaults to "-". | '-' |
| attrs | Optional[Dict[str, Any]] | An optional dictionary of SAM attributes to place on both R1 and R2. | None |
Raises:
| Type | Description |
|---|---|
| ValueError | if either strand field is not "+" or "-" |
| ValueError | if bases/quals/cigar are set in a way that is not self-consistent |
Returns:
| Type | Description |
|---|---|
| Tuple[AlignedSegment, AlignedSegment] | The pair of records created, R1 then R2. |
Source code in fgpyo/sam/builder.py
add_single(*, name: Optional[str] = None, read_num: Optional[int] = None, bases: Optional[str] = None, quals: Optional[List[int]] = None, chrom: str = NO_REF_NAME, start: int = NO_REF_POS, cigar: Optional[str] = None, mapq: Optional[int] = None, strand: str = '+', secondary: bool = False, supplementary: bool = False, attrs: Optional[Dict[str, Any]] = None) -> AlignedSegment
Generates a new single read, adds it to the internal collection, and returns it.
Most fields are optional.
If read_num is None (the default) an unpaired read will be created. If read_num is
set to 1 or 2, the read will have its paired flag set and read number flags set.
An unmapped read can be created by calling the method with no parameters (specifically, not setting chrom or start). If cigar is provided, it will be ignored.
A mapped read is created by providing chrom and start.
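To make the effect of read_num, strand, secondary and supplementary concrete, here is a minimal standalone sketch of how the corresponding SAM flag bits combine. The bit values come from the SAM specification; `single_read_flag` is a hypothetical helper written for this example, not part of fgpyo, and it deliberately ignores mate-related flags.

```python
from typing import Optional

# Flag bits as defined in the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
REVERSE = 0x10
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def single_read_flag(
    read_num: Optional[int] = None,
    mapped: bool = False,
    strand: str = "+",
    secondary: bool = False,
    supplementary: bool = False,
) -> int:
    """Hypothetical helper: computes a SAM flag for a single read."""
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if read_num not in (None, 1, 2):
        raise ValueError(f"read_num must be None, 1 or 2, got {read_num!r}")

    flag = 0
    if read_num is not None:
        # A read number implies the read is part of a pair.
        flag |= PAIRED | (READ1 if read_num == 1 else READ2)
    if not mapped:
        flag |= UNMAPPED
    if strand == "-":
        flag |= REVERSE
    if secondary:
        flag |= SECONDARY
    if supplementary:
        flag |= SUPPLEMENTARY
    return flag


print(single_read_flag())                                    # 4: unpaired, unmapped
print(single_read_flag(read_num=1, mapped=True))             # 65: paired + first in pair
print(single_read_flag(read_num=2, mapped=True, strand="-")) # 145: paired + reverse + second in pair
```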
The length of the read is determined based on the presence or absence of bases, quals, and cigar. If values are provided for one or more of these parameters, the lengths must match, and the length will be used to generate any unsupplied values. If none of bases, quals, and cigar are provided, all three will be synthesized based on either the r1_len or r2_len stored on the class as appropriate.
When synthesizing, bases are always a random sequence of bases, quals are all the default base quality (supplied when constructing a SamBuilder) and the cigar is always a single M operator of the read length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | Optional[str] | The name of the template. If None is given a unique name will be auto-generated. | None |
| read_num | Optional[int] | Either None, 1 for R1 or 2 for R2 | None |
| bases | Optional[str] | The bases for the read. If None is given a random sequence is generated. | None |
| quals | Optional[List[int]] | The list of qualities for the read. If None, the default base quality is used. | None |
| chrom | str | The chromosome to which the read is mapped. Defaults to the unmapped value. | NO_REF_NAME |
| start | int | The start position of the read. Defaults to the unmapped value. | NO_REF_POS |
| cigar | Optional[str] | The cigar string for the read. Defaults to None for unmapped reads, otherwise all M. | None |
| mapq | Optional[int] | Mapping quality for the read. Defaults to self.mapping_quality if not given. | None |
| strand | str | The strand for the read, either "+" or "-". Defaults to "+". | '+' |
| secondary | bool | If true the read will be flagged as secondary | False |
| supplementary | bool | If true the read will be flagged as supplementary | False |
| attrs | Optional[Dict[str, Any]] | An optional dictionary of SAM attributes to place on the read. | None |
Raises:
| Type | Description |
|---|---|
| ValueError | if strand field is not "+" or "-" |
| ValueError | if read_num is not None, 1 or 2 |
| ValueError | if bases/quals/cigar are set in a way that is not self-consistent |
Returns:
| Name | Type | Description |
|---|---|---|
| AlignedSegment | AlignedSegment | The record created |
Source code in fgpyo/sam/builder.py
staticmethod ¶
Returns the default read group used by the SamBuilder, as a dictionary.
staticmethod ¶
Generates the sequence dictionary that is used by default by SamBuilder.
Matches the names and lengths of the HG19 reference in use in production.
Returns:
| Type | Description |
|---|---|
| List[Dict[str, Any]] | A new copy of the sequence dictionary as a list of dictionaries, one per chromosome. |
Source code in fgpyo/sam/builder.py
Returns the single read group that is defined in the header.
Source code in fgpyo/sam/builder.py
Returns the ID of the single read group that is defined in the header.
Source code in fgpyo/sam/builder.py
to_path(path: Optional[Path] = None, index: bool = True, pred: Callable[[AlignedSegment], bool] = lambda r: True, tmp_file_type: Optional[SamFileType] = None) -> Path
Writes the accumulated records to a file, sorts and indexes it, and returns the Path. If a path is provided, it will be written to; otherwise a temporary file is created and returned.
If path is provided, tmp_file_type may not be provided. In this case, the file type
(SAM/BAM/CRAM) will be automatically determined by the file extension.
See pysam for more details.
If path is not provided, the file type will default to BAM unless tmp_file_type is
provided.
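The file-type rules above can be sketched in isolation as follows. This is only an illustration: `FileType` is a hypothetical stand-in for SamFileType keyed by extension, `resolve_file_type` is an invented helper, and raising ValueError when both arguments are given is an assumption of this sketch rather than documented behavior.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class FileType(Enum):
    """Hypothetical stand-in for SamFileType, keyed by file extension."""

    SAM = ".sam"
    BAM = ".bam"
    CRAM = ".cram"


def resolve_file_type(
    path: Optional[Path] = None,
    tmp_file_type: Optional[FileType] = None,
) -> FileType:
    """Hypothetical helper: picks the output file type for a write."""
    if path is not None:
        if tmp_file_type is not None:
            # Assumption: the sketch rejects the combination outright.
            raise ValueError("tmp_file_type may not be given together with path")
        # Enum lookup by value; raises ValueError for an unknown extension.
        return FileType(path.suffix)
    # No path: default to BAM unless the caller asked for something else.
    return tmp_file_type if tmp_file_type is not None else FileType.BAM


print(resolve_file_type(Path("out.sam")))             # FileType.SAM
print(resolve_file_type())                            # FileType.BAM
print(resolve_file_type(tmp_file_type=FileType.SAM))  # FileType.SAM
```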
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Optional[Path] | a path at which to write the file, otherwise a temp file is used. | None |
| index | bool | if True and | True |
| pred | Callable[[AlignedSegment], bool] | optional predicate to specify which reads should be output | lambda r: True |
| tmp_file_type | Optional[SamFileType] | the file type to output when a path is not provided (default is BAM) | None |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | The path to the sorted (and possibly indexed) file. |
Source code in fgpyo/sam/builder.py
Returns the accumulated records in coordinate order.
Source code in fgpyo/sam/builder.py
clipping ¶
Utility Functions for Soft-Clipping records in SAM/BAM Files¶
This module contains utility functions for soft-clipping reads. There are four variants that support clipping the beginnings and ends of reads, and specifying the amount to be clipped in terms of query bases or reference bases:
- softclip_start_of_alignment_by_query() clips the start of the alignment in terms of query bases
- softclip_end_of_alignment_by_query() clips the end of the alignment in terms of query bases
- softclip_start_of_alignment_by_ref() clips the start of the alignment in terms of reference bases
- softclip_end_of_alignment_by_ref() clips the end of the alignment in terms of reference bases
The difference between query and reference based versions is apparent only when there are insertions or deletions in the read as indels have lengths on either the query (insertions) or reference (deletions) but not both.
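As a standalone illustration of why the two views diverge, the sketch below counts the aligned query bases and the reference bases spanned by a CIGAR string, using the operator semantics from the SAM specification. The helper names `parse_cigar` and `aligned_lengths` are made up for this example and are not part of fgpyo.

```python
import re
from typing import List, Tuple

# Per the SAM specification: M, I, = and X consume query bases within the
# aligned part of the read (S also consumes query bases, but those are
# already clipped), while M, D, N, = and X consume reference bases.
ALIGNED_QUERY_OPS = frozenset("MI=X")
REFERENCE_OPS = frozenset("MDN=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Splits a CIGAR string into (length, operator) pairs."""
    return [(int(length), op) for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def aligned_lengths(cigar: str) -> Tuple[int, int]:
    """Returns (aligned query bases, reference bases) for a CIGAR string."""
    elems = parse_cigar(cigar)
    query = sum(length for length, op in elems if op in ALIGNED_QUERY_OPS)
    ref = sum(length for length, op in elems if op in REFERENCE_OPS)
    return query, ref


print(aligned_lengths("101M"))           # (101, 101): no indels, both views agree
print(aligned_lengths("10M1D10M5I76M"))  # (101, 97): the indels make them differ
```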
Upon clipping a set of additional SAM tags are removed from reads as they are likely invalid.
For example, to clip the last 10 query bases of all records and reduce the qualities to Q2:
>>> from fgpyo.sam import reader, clipping
>>> with reader("./tests/fgpyo/sam/data/valid.sam") as fh:
...     for rec in fh:
...         before = rec.cigarstring
...         info = clipping.softclip_end_of_alignment_by_query(rec, 10, 2)
...         after = rec.cigarstring
...         print(f"before: {before} after: {after} info: {info}")
before: 101M after: 91M10S info: ClippingInfo(query_bases_clipped=10, ref_bases_clipped=10)
before: 101M after: 91M10S info: ClippingInfo(query_bases_clipped=10, ref_bases_clipped=10)
before: 101M after: 91M10S info: ClippingInfo(query_bases_clipped=10, ref_bases_clipped=10)
before: 101M after: 91M10S info: ClippingInfo(query_bases_clipped=10, ref_bases_clipped=10)
before: 101M after: 91M10S info: ClippingInfo(query_bases_clipped=10, ref_bases_clipped=10)
before: 101M after: 91M10S info: ClippingInfo(query_bases_clipped=10, ref_bases_clipped=10)
before: 10M1D10M5I76M after: 10M1D10M5I66M10S info: ClippingInfo(query_bases_clipped=10, ref_bases_clipped=10)
before: None after: None info: ClippingInfo(query_bases_clipped=0, ref_bases_clipped=0)
Note that any clipping potentially invalidates the common SAM tags NM, MD and UQ, as well as other alignment-based SAM tags. Any clipping added to the start of an alignment changes the position (reference_start) of the record. Any reads that have no aligned bases after clipping are set to be unmapped. If the clipped reads are written back to a BAM, note that:
- Mate pairs may have incorrect information about their mate's positions
- Even if the input was coordinate sorted, the output may be out of order
To rectify these problems it is necessary to do the equivalent of:
Classes¶
ClippingInfo ¶
Bases: NamedTuple
Named tuple holding the number of bases clipped on the query and reference respectively.
Source code in fgpyo/sam/clipping.py
Functions¶
softclip_end_of_alignment_by_query ¶
softclip_end_of_alignment_by_query(rec: AlignedSegment, bases_to_clip: int, clipped_base_quality: Optional[int] = None, tags_to_invalidate: Iterable[str] = TAGS_TO_INVALIDATE) -> ClippingInfo
Adds soft-clipping to the end of a read's alignment.
Clipping is applied before any existing hard or soft clipping. E.g. a read with cigar 100M5S that is clipped with bases_to_clip=10 will yield a cigar of 90M15S.
If the read is unmapped or bases_to_clip < 1 then nothing is done.
If the read has fewer clippable bases than requested the read will be unmapped.
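The rules above can be sketched on plain CIGAR strings. The following is a simplified illustration, not the fgpyo implementation: `softclip_end_by_query` is a hypothetical helper, it assumes CIGARs without padding (P) operators, it returns None where the real function would unmap the read, and its handling of indels next to the clip point may differ from the library.

```python
import re
from typing import List, Optional, Tuple


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Splits a CIGAR string into (length, operator) pairs."""
    return [(int(length), op) for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def softclip_end_by_query(cigar: str, bases_to_clip: int) -> Optional[Tuple[str, int, int]]:
    """Returns (new_cigar, query_bases_clipped, ref_bases_clipped), or None
    when too few clippable bases remain (the read would become unmapped)."""
    if bases_to_clip < 1:
        return cigar, 0, 0
    elems = parse_cigar(cigar)
    clippable = sum(length for length, op in elems if op in ("M", "I", "=", "X"))
    if clippable <= bases_to_clip:
        return None

    # Set aside clipping that already exists at the end of the read.
    trailing_hard = trailing_soft = 0
    while elems and elems[-1][1] in ("H", "S"):
        length, op = elems.pop()
        if op == "H":
            trailing_hard += length
        else:
            trailing_soft += length

    # Walk backwards, converting aligned query bases into soft clipping.
    query_clipped = ref_clipped = 0
    while elems and (query_clipped < bases_to_clip or elems[-1][1] in ("D", "N")):
        length, op = elems.pop()
        if op in ("D", "N"):
            ref_clipped += length  # consumes reference only, so drop it whole
            continue
        take = min(length, bases_to_clip - query_clipped)
        query_clipped += take
        if op != "I":
            ref_clipped += take  # M, = and X also consume reference bases
        if take < length:
            elems.append((length - take, op))  # keep the unclipped remainder

    # New soft clipping merges with any existing soft clipping.
    elems.append((query_clipped + trailing_soft, "S"))
    if trailing_hard:
        elems.append((trailing_hard, "H"))
    return "".join(f"{length}{op}" for length, op in elems), query_clipped, ref_clipped


print(softclip_end_by_query("100M5S", 10))         # ('90M15S', 10, 10)
print(softclip_end_by_query("10M1D10M5I76M", 10))  # ('10M1D10M5I66M10S', 10, 10)
```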
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rec | AlignedSegment | the BAM record to clip | required |
| bases_to_clip | int | the number of additional bases of clipping desired in the read/query | required |
| clipped_base_quality | Optional[int] | if not None, set bases in the clipped region to this quality | None |
| tags_to_invalidate | Iterable[str] | the set of extended attributes to remove upon clipping | TAGS_TO_INVALIDATE |
Returns:
| Name | Type | Description |
|---|---|---|
| ClippingInfo | ClippingInfo | a named tuple containing the number of query/read bases and the number of target/reference bases clipped. |
Source code in fgpyo/sam/clipping.py
softclip_end_of_alignment_by_ref ¶
softclip_end_of_alignment_by_ref(rec: AlignedSegment, bases_to_clip: int, clipped_base_quality: Optional[int] = None, tags_to_invalidate: Iterable[str] = TAGS_TO_INVALIDATE) -> ClippingInfo
Soft-clips the end of an alignment by bases_to_clip bases on the reference.
Clipping is applied before any existing hard or soft clipping. E.g. a read with cigar 100M5S that is clipped with bases_to_clip=10 will yield a cigar of 90M15S.
If the read is unmapped or bases_to_clip < 1 then nothing is done.
If the read has fewer clippable bases than requested the read will be unmapped.
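One way to think about reference-based clipping is to first work out how many query bases cover the requested number of reference bases. The sketch below does this for the end of an alignment on a plain CIGAR string. It is an illustrative assumption about the approach, not the fgpyo implementation, and `query_bases_for_ref_clip_at_end` is a hypothetical name.

```python
import re
from typing import List, Tuple


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Splits a CIGAR string into (length, operator) pairs."""
    return [(int(length), op) for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def query_bases_for_ref_clip_at_end(cigar: str, ref_bases_to_clip: int) -> int:
    """Counts the query bases that span the last ref_bases_to_clip reference bases."""
    elems = parse_cigar(cigar)
    # Ignore clipping that already exists at the end of the read.
    while elems and elems[-1][1] in ("H", "S"):
        elems.pop()

    ref_seen = 0
    query_needed = 0
    for length, op in reversed(elems):
        remaining = ref_bases_to_clip - ref_seen
        if remaining <= 0:
            break
        if op in ("M", "=", "X"):
            take = min(length, remaining)
            ref_seen += take
            query_needed += take
        elif op in ("D", "N"):
            ref_seen += min(length, remaining)  # reference bases with no query bases
        elif op == "I":
            query_needed += length  # query bases with no reference bases
        elif op in ("S", "H"):
            break  # reached clipping at the start of the read
    return query_needed


print(query_bases_for_ref_clip_at_end("101M", 10))     # 10
print(query_bases_for_ref_clip_at_end("50M5D5M", 10))  # 5: the deletion covers 5 reference bases
print(query_bases_for_ref_clip_at_end("50M5I5M", 10))  # 15: the insertion adds 5 query bases
```

The resulting count could then be handed to a query-based clipping routine, which is why the two families of functions only differ when the clipped region contains indels.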
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rec | AlignedSegment | the BAM record to clip | required |
| bases_to_clip | int | the number of additional bases of clipping desired on the reference | required |
| clipped_base_quality | Optional[int] | if not None, set bases in the clipped region to this quality | None |
| tags_to_invalidate | Iterable[str] | the set of extended attributes to remove upon clipping | TAGS_TO_INVALIDATE |
Returns:
| Name | Type | Description |
|---|---|---|
| ClippingInfo | ClippingInfo | a named tuple containing the number of query/read bases and the number of target/reference bases clipped. |
Source code in fgpyo/sam/clipping.py
softclip_start_of_alignment_by_query ¶
softclip_start_of_alignment_by_query(rec: AlignedSegment, bases_to_clip: int, clipped_base_quality: Optional[int] = None, tags_to_invalidate: Iterable[str] = TAGS_TO_INVALIDATE) -> ClippingInfo
Adds soft-clipping to the start of a read's alignment.
Clipping is applied after any existing hard or soft clipping. E.g. a read with cigar 5S100M that is clipped with bases_to_clip=10 will yield a cigar of 15S90M.
If the read is unmapped or bases_to_clip < 1 then nothing is done.
If the read has fewer clippable bases than requested the read will be unmapped.
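Clipping at the start differs from clipping at the end in one important way: the alignment start moves by the number of reference bases clipped. The sketch below shows this on plain CIGAR strings. It is a simplified illustration rather than the fgpyo implementation: `softclip_start_by_query` is a hypothetical helper, it assumes CIGARs without padding (P) operators, and it returns None where the real function would unmap the read.

```python
import re
from typing import List, Optional, Tuple


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Splits a CIGAR string into (length, operator) pairs."""
    return [(int(length), op) for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def softclip_start_by_query(
    cigar: str, reference_start: int, bases_to_clip: int
) -> Optional[Tuple[str, int, int, int]]:
    """Returns (new_cigar, new_reference_start, query_clipped, ref_clipped),
    or None when too few clippable bases remain."""
    if bases_to_clip < 1:
        return cigar, reference_start, 0, 0
    elems = parse_cigar(cigar)
    clippable = sum(length for length, op in elems if op in ("M", "I", "=", "X"))
    if clippable <= bases_to_clip:
        return None

    # Set aside clipping that already exists at the start of the read.
    leading_hard = leading_soft = 0
    while elems and elems[0][1] in ("H", "S"):
        length, op = elems.pop(0)
        if op == "H":
            leading_hard += length
        else:
            leading_soft += length

    # Walk forwards, converting aligned query bases into soft clipping.
    query_clipped = ref_clipped = 0
    while elems and (query_clipped < bases_to_clip or elems[0][1] in ("D", "N")):
        length, op = elems.pop(0)
        if op in ("D", "N"):
            ref_clipped += length  # consumes reference only, so drop it whole
            continue
        take = min(length, bases_to_clip - query_clipped)
        query_clipped += take
        if op != "I":
            ref_clipped += take
        if take < length:
            elems.insert(0, (length - take, op))  # keep the unclipped remainder

    clipped = [(query_clipped + leading_soft, "S")] + elems
    if leading_hard:
        clipped.insert(0, (leading_hard, "H"))
    new_cigar = "".join(f"{length}{op}" for length, op in clipped)
    # The alignment now starts ref_clipped bases further along the reference.
    return new_cigar, reference_start + ref_clipped, query_clipped, ref_clipped


print(softclip_start_by_query("5S100M", 100, 10))  # ('15S90M', 110, 10, 10)
```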
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rec | AlignedSegment | the BAM record to clip | required |
| bases_to_clip | int | the number of additional bases of clipping desired in the read/query | required |
| clipped_base_quality | Optional[int] | if not None, set bases in the clipped region to this quality | None |
| tags_to_invalidate | Iterable[str] | the set of extended attributes to remove upon clipping | TAGS_TO_INVALIDATE |
Returns:
| Name | Type | Description |
|---|---|---|
| ClippingInfo | ClippingInfo | a named tuple containing the number of query/read bases and the number of target/reference bases clipped. |
Source code in fgpyo/sam/clipping.py
softclip_start_of_alignment_by_ref ¶
softclip_start_of_alignment_by_ref(rec: AlignedSegment, bases_to_clip: int, clipped_base_quality: Optional[int] = None, tags_to_invalidate: Iterable[str] = TAGS_TO_INVALIDATE) -> ClippingInfo
Soft-clips the start of an alignment by bases_to_clip bases on the reference.
Clipping is applied after any existing hard or soft clipping. E.g. a read with cigar 5S100M that is clipped with bases_to_clip=10 will yield a cigar of 15S90M.
If the read is unmapped or bases_to_clip < 1 then nothing is done.
If the read has fewer clippable bases than requested the read will be unmapped.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rec | AlignedSegment | the BAM record to clip | required |
| bases_to_clip | int | the number of additional bases of clipping desired on the reference | required |
| clipped_base_quality | Optional[int] | if not None, set bases in the clipped region to this quality | None |
| tags_to_invalidate | Iterable[str] | the set of extended attributes to remove upon clipping | TAGS_TO_INVALIDATE |
Returns:
| Name | Type | Description |
|---|---|---|
| ClippingInfo | ClippingInfo | a named tuple containing the number of query/read bases and the number of target/reference bases clipped. |