platform

Modules¶

illumina ¶

Methods for working with Illumina-specific UMIs in SAM files¶

The functions in this module make it easy to:

check whether a UMI is valid
extract UMI(s) from an Illumina-style read name
copy a UMI from an alignment's read name to its RX SAM tag

Attributes¶

SAM_UMI_DELIMITER `module-attribute` ¶

SAM_UMI_DELIMITER: str = '-'

The delimiter placed between multiple UMIs in a single tag value. The SAM specification recommends a hyphen for this purpose; see https://samtools.github.io/hts-specs/SAMtags.pdf
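
As a rough illustration of what this delimiter is for, the sketch below joins two UMIs into one `RX`-style value. The constant is redefined locally so the snippet runs without fgpyo installed, and `join_umis` is a hypothetical helper, not part of the module.

```python
from typing import Sequence

# Local copy of the documented module attribute, so the example is standalone.
SAM_UMI_DELIMITER: str = "-"


def join_umis(umis: Sequence[str]) -> str:
    """Join one or more UMIs into a single string suitable for an RX tag.

    Hypothetical helper for illustration only; it is not part of fgpyo.
    """
    return SAM_UMI_DELIMITER.join(umis)


print(join_umis(["ACGT", "TTGA"]))  # prints "ACGT-TTGA"
```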

Functions¶

copy_umi_from_read_name ¶

copy_umi_from_read_name(rec: AlignedSegment, strict: bool = False, remove_umi: bool = False) -> bool

Copy a UMI from an alignment's read name to its `RX` SAM tag. An invalid UMI is not copied to the `RX` tag.

Parameters:

Name	Type	Description	Default
`rec`	`AlignedSegment`	The alignment record to update.	required
`strict`	`bool`	If `True`, raise an exception when the UMI is invalid.	`False`
`remove_umi`	`bool`	If `True`, the UMI will be removed from the read name after copying.	`False`

Returns:

Type	Description
`bool`	`True` if the UMI was successfully extracted and copied to the `RX` tag, `False` otherwise.

Raises:

Type	Description
`ValueError`	If `strict` is `True` and the read name does not end with a valid UMI.
`ValueError`	If the record already has a populated `RX` SAM tag.
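
The contract described by the tables above can be sketched without pysam or fgpyo. Everything in this example is an assumption made for illustration: `StubRecord` is a hypothetical stand-in that exposes only the `AlignedSegment` members the function touches, `copy_umi_sketch` is a simplified re-implementation rather than the library function, the read name delimiter is assumed to be `:`, and a UMI is assumed to be valid when it contains only the characters `ACGTN`.

```python
from typing import Dict

READ_NAME_DELIMITER = ":"  # assumed value of the Illumina read name delimiter
VALID_UMI_BASES = set("ACGTN")  # assumed validity rule, not taken from the source


class StubRecord:
    """Hypothetical stand-in for the AlignedSegment members used below."""

    def __init__(self, query_name: str) -> None:
        self.query_name = query_name
        self._tags: Dict[str, str] = {}

    def has_tag(self, tag: str) -> bool:
        return tag in self._tags

    def set_tag(self, tag: str, value: str) -> None:
        self._tags[tag] = value

    def get_tag(self, tag: str) -> str:
        return self._tags[tag]


def copy_umi_sketch(rec: StubRecord, strict: bool = False, remove_umi: bool = False) -> bool:
    """Copy a trailing UMI into RX and report whether anything was copied."""
    # Split off the final delimited field, which is where the UMI is expected.
    prefix, _, candidate = rec.query_name.rpartition(READ_NAME_DELIMITER)
    is_valid = bool(candidate) and set(candidate) <= VALID_UMI_BASES
    if not is_valid:
        if strict:
            raise ValueError(f"Invalid UMI {candidate!r} in read name {rec.query_name}")
        return False
    # An existing RX tag is treated as an error regardless of `strict`.
    if rec.has_tag("RX"):
        raise ValueError(f"Record {rec.query_name} already has a populated RX tag")
    rec.set_tag("RX", candidate)
    if remove_umi and prefix:
        rec.query_name = prefix
    return True


with_umi = StubRecord("A00123:45:HXXXX:1:1101:1000:2000:ACGTACGT")
print(copy_umi_sketch(with_umi, remove_umi=True))  # True
print(with_umi.get_tag("RX"), with_umi.query_name)

without_umi = StubRecord("A00123:45:HXXXX:1:1101:1000:2000")
print(copy_umi_sketch(without_umi))  # False, and no RX tag is set
```

In this sketch, as in the documented behaviour, a failed extraction is silent unless `strict` is set, while a pre-existing `RX` tag always raises.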

Source code in fgpyo/platform/illumina.py

def copy_umi_from_read_name(
    rec: AlignedSegment, strict: bool = False, remove_umi: bool = False
) -> bool:
    """
    Copy a UMI from an alignment's read name to its `RX` SAM tag. UMI will not be copied to RX
    tag if invalid.

    Args:
        rec: The alignment record to update.
        strict: If `True` and UMI invalid, will throw an exception
        remove_umi: If `True`, the UMI will be removed from the read name after copying.

    Returns:
        `True` if the UMI was successfully extracted, False if otherwise.

    Raises:
        ValueError: If the read name does not end with a valid UMI.
        ValueError: If the record already has a populated `RX` SAM tag.
    """

    umi = extract_umis_from_read_name(
        read_name=rec.query_name,
        strict=strict,
        umi_delimiter=_ILLUMINA_READ_NAME_DELIMITER,
    )
    if umi is not None:
        if rec.has_tag("RX"):
            raise ValueError(f"Record {rec.query_name} already has a populated RX tag")
        rec.set_tag(tag="RX", value=umi)
        if remove_umi:
            last_index = rec.query_name.rfind(_ILLUMINA_READ_NAME_DELIMITER)
            rec.query_name = rec.query_name[:last_index] if last_index != -1 else rec.query_name
        return True
    elif strict:
        raise ValueError(f"Invalid UMI {umi} extracted from {rec.query_name}")
    else:
        return False

extract_umis_from_read_name ¶

extract_umis_from_read_name(read_name: str, read_name_delimiter: str = _ILLUMINA_READ_NAME_DELIMITER, umi_delimiter: str = _ILLUMINA_UMI_DELIMITER, strict: bool = False) -> Optional[str]

Extract UMI(s) from an Illumina-style read name.

The UMI is expected to be the final component of the read name, delimited by `read_name_delimiter`. Multiple UMIs may be present, delimited by `umi_delimiter`. In the returned string, that delimiter is replaced by the SAM-standard hyphen (`-`).
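
The parsing described above can be approximated in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the library's implementation: the default delimiters are assumed to be `:` for the read name and `+` between UMIs, a UMI is assumed valid when it contains only `ACGTN`, and the `strict` parameter is left out because its behaviour is not described in this excerpt. The name `extract_umis_sketch` is hypothetical.

```python
from typing import Optional

SAM_UMI_DELIMITER = "-"
VALID_UMI_BASES = set("ACGTN")  # assumed validity rule, not taken from the source


def extract_umis_sketch(
    read_name: str,
    read_name_delimiter: str = ":",  # assumed default
    umi_delimiter: str = "+",  # assumed default
) -> Optional[str]:
    """Return the trailing UMI(s) joined with '-', or None if none are valid."""
    # The UMI field is the last component of the read name.
    final_field = read_name.split(read_name_delimiter)[-1]
    # Multiple UMIs in that field are separated by the UMI delimiter.
    umis = final_field.split(umi_delimiter)
    if all(bool(umi) and set(umi) <= VALID_UMI_BASES for umi in umis):
        # Normalise to the SAM-recommended delimiter.
        return SAM_UMI_DELIMITER.join(umis)
    return None


print(extract_umis_sketch("A00123:45:HXXXX:1:1101:1000:2000:ACGT+TTGA"))  # ACGT-TTGA
print(extract_umis_sketch("A00123:45:HXXXX:1:1101:1000:2000"))  # None
```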