sequence
Utility Functions for Manipulating DNA and RNA sequences.
This module contains utility functions for manipulating DNA and RNA sequences.
The levenshtein and hamming functions are included for convenience.
If you are performing many distance calculations, a C-based implementation is preferable,
e.g. https://pypi.org/project/Distance/
Functions
complement
gc_content
Calculates the fraction of G and C bases in a sequence.
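The calculation described above can be sketched in plain Python. This is an illustrative sketch, not the fgpyo implementation: the name `gc_fraction_sketch` is hypothetical, and both the handling of lowercase bases and the value returned for an empty sequence are assumptions.

```python
def gc_fraction_sketch(bases: str) -> float:
    """Fraction of bases that are G or C (hypothetical helper, not fgpyo's gc_content)."""
    if not bases:
        return 0.0  # assumption: an empty sequence has a GC fraction of 0
    upper = bases.upper()  # assumption: lowercase g/c are counted as well
    return (upper.count("G") + upper.count("C")) / len(bases)


print(gc_fraction_sketch("ACGT"))  # 0.5
print(gc_fraction_sketch("GGCC"))  # 1.0
```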
hamming
Calculates the Hamming distance between two strings. The comparison is case sensitive, and the strings must be of equal length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `string1` | `str` | first string for comparison | required |
| `string2` | `str` | second string for comparison | required |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If strings are of different lengths. |
Source code in fgpyo/sequence.py
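To illustrate the behaviour described for hamming, here is a minimal pure-Python sketch. It is not the library's source: the name `hamming_sketch` is hypothetical, and only the documented contract (case-sensitive comparison, `ValueError` on unequal lengths) is taken from the text above.

```python
def hamming_sketch(string1: str, string2: str) -> int:
    """Count positions at which two equal-length strings differ (case sensitive)."""
    if len(string1) != len(string2):
        raise ValueError(
            f"Strings must be of equal length, got {len(string1)} and {len(string2)}"
        )
    # Each mismatching pair contributes 1 to the distance.
    return sum(a != b for a, b in zip(string1, string2))


print(hamming_sketch("ACGT", "ACCT"))  # 1
print(hamming_sketch("acgt", "ACGT"))  # 4, because the comparison is case sensitive
```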
levenshtein
Calculates the Levenshtein distance between two strings. The comparison is case sensitive.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `string1` | `str` | first string for comparison | required |
| `string2` | `str` | second string for comparison | required |
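For readers unfamiliar with the metric, the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The sketch below uses the standard two-row dynamic-programming formulation; it is an illustration with the hypothetical name `levenshtein_sketch`, not necessarily how fgpyo computes the distance.

```python
def levenshtein_sketch(string1: str, string2: str) -> int:
    """Edit distance via two-row dynamic programming (case sensitive)."""
    # previous[j] holds the distance between the processed prefix of string1
    # and the first j characters of string2.
    previous = list(range(len(string2) + 1))
    for i, a in enumerate(string1, start=1):
        current = [i]  # distance from string1[:i] to the empty string
        for j, b in enumerate(string2, start=1):
            substitution_cost = 0 if a == b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete a from string1
                    current[j - 1] + 1,  # insert b into string1
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


print(levenshtein_sketch("ACGT", "AGT"))  # 1 (one deletion)
print(levenshtein_sketch("kitten", "sitting"))  # 3
```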
longest_dinucleotide_run_length
Number of bases in the longest dinucleotide run in a primer.
A dinucleotide run is when two nucleotides are repeated in tandem. For example, TTGG (length = 4) or AACCAACCAA (length = 10). If there are no such runs, returns 0.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `bases` | `str` | the bases over which to compute | required |
Returns:
| Type | Description |
|---|---|
| `int` | the number of bases in the longest dinucleotide repeat (NOT the number of repeat units) |
longest_homopolymer_length
Calculates the length of the longest homopolymer in the input sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `bases` | `str` | the bases over which to compute | required |
Returns:
| Type | Description |
|---|---|
| `int` | the length of the longest homopolymer |
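A homopolymer is a run of one base repeated consecutively, so the longest homopolymer length is the size of the largest such run. The sketch below shows one way to compute it with `itertools.groupby`. The name `longest_homopolymer_sketch` is hypothetical, and treating upper- and lowercase bases as the same base is an assumption rather than documented behaviour.

```python
from itertools import groupby


def longest_homopolymer_sketch(bases: str) -> int:
    """Length of the longest run of a single repeated base (0 for an empty string)."""
    # groupby yields one group per maximal run of identical characters.
    # Assumption: case is ignored, so "aA" counts as a run of length 2.
    run_lengths = (sum(1 for _ in run) for _, run in groupby(bases.upper()))
    return max(run_lengths, default=0)


print(longest_homopolymer_sketch("AACCCCGT"))  # 4
print(longest_homopolymer_sketch("ACGT"))  # 1
```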
longest_hp_length
Calculates the length of the longest homopolymer in the input sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `bases` | `str` | the bases over which to compute | required |
Returns:
| Type | Description |
|---|---|
| `int` | the length of the longest homopolymer |
longest_multinucleotide_run_length
Number of bases in the longest multi-nucleotide run.
A multi-nucleotide run is when N nucleotides are repeated in tandem. For example, TTGG (length = 4, N=2) or TAGTAGTAG (length = 9, N = 3). If there are no such runs, returns 0.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `bases` | `str` | the bases over which to compute | required |
| `repeat_unit_length` | `int` | the length of the multi-nucleotide repetitive unit (must be > 0) | required |
Returns:
| Type | Description |
|---|---|
| `int` | the number of bases in the longest multinucleotide repeat (NOT the number of repeat units) |
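The idea of a tandem run can be sketched as follows. This is an illustration under stated assumptions, not the fgpyo implementation: the name `longest_tandem_run_sketch` is hypothetical, the comparison is case sensitive, and only a unit that appears at least twice back to back counts as a run. That is one reading of the description above, and the library may treat edge cases (such as a single unrepeated unit) differently.

```python
def longest_tandem_run_sketch(bases: str, repeat_unit_length: int) -> int:
    """Bases (not repeat units) in the longest tandem run of a fixed-length unit."""
    if repeat_unit_length <= 0:
        raise ValueError("repeat_unit_length must be > 0")
    n = repeat_unit_length
    best = 0
    for start in range(len(bases) - n + 1):
        unit = bases[start:start + n]
        end = start + n
        # Extend the run while the next n bases repeat the unit exactly.
        while bases[end:end + n] == unit:
            end += n
        copies = (end - start) // n
        # Assumption: a single copy of the unit is not a run.
        if copies >= 2:
            best = max(best, end - start)
    return best


print(longest_tandem_run_sketch("TAGTAGTAG", 3))  # 9 (three copies of TAG)
print(longest_tandem_run_sketch("GGACACACTT", 2))  # 6 (three copies of AC)
```

The result is reported in bases, matching the return description above: three copies of a 3-base unit give 9, not 3.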
reverse_complement
Reverse complements a base sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `bases` | `str` | the bases to be reverse complemented | required |
Returns:
| Type | Description |
|---|---|
| `str` | the reverse complement of the provided base string |
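Reverse complementing means replacing each base with its pairing partner and then reversing the order. A minimal sketch, assuming a DNA alphabet of A, C, G, T and N only, is shown below. The names `_COMPLEMENT_SKETCH` and `reverse_complement_sketch` are hypothetical, and the set of bases the library itself supports (for example RNA or IUPAC ambiguity codes) is not stated in this section.

```python
# Assumption: only A/C/G/T/N are mapped, and case is preserved.
# Characters outside this table are passed through unchanged by str.translate.
_COMPLEMENT_SKETCH = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_sketch(bases: str) -> str:
    """Complement each base, then reverse the resulting string."""
    return bases.translate(_COMPLEMENT_SKETCH)[::-1]


print(reverse_complement_sketch("AACG"))  # CGTT
print(reverse_complement_sketch("acgT"))  # Acgt
```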