util
Modules¶
inspect ¶
Attributes¶
FieldType
module-attribute
¶
TypeAlias for dataclass Fields or attrs Attributes. It corresponds to the field type that matches the given _DataclassesOrAttrClass.
Functions¶
attr_from ¶
attr_from(cls: Type[_AttrFromType], kwargs: Dict[str, str], parsers: Optional[Dict[type, Callable[[str], Any]]] = None) -> _AttrFromType
Builds an attr or dataclasses class from keyword arguments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cls | Type[_AttrFromType] | the attr or dataclasses class to be built | required |
| kwargs | Dict[str, str] | a dictionary of keyword arguments | required |
| parsers | Optional[Dict[type, Callable[[str], Any]]] | a dictionary of parser functions to apply to specific types | None |
Source code in fgpyo/util/inspect.py
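The idea behind attr_from can be illustrated with the standard library alone. The following is a minimal sketch, not fgpyo's implementation: `build_from_strings` and `Sample` are hypothetical names, only dataclasses are handled, and each field's annotated type is assumed to be callable on a string unless a parser is supplied for it.

```python
import dataclasses
from datetime import date
from typing import Any, Callable, Dict, Optional, Type, TypeVar

T = TypeVar("T")


def build_from_strings(
    cls: Type[T],
    kwargs: Dict[str, str],
    parsers: Optional[Dict[type, Callable[[str], Any]]] = None,
) -> T:
    """Hypothetical helper: build a dataclass from string-valued keyword arguments."""
    parsers = parsers or {}
    parsed: Dict[str, Any] = {}
    for field in dataclasses.fields(cls):
        if field.name not in kwargs:
            continue  # leave the field to its default value
        # Prefer a user-supplied parser; otherwise call the annotated type itself.
        parser = parsers.get(field.type, field.type)
        parsed[field.name] = parser(kwargs[field.name])
    return cls(**parsed)


@dataclasses.dataclass(frozen=True)
class Sample:
    name: str
    depth: int
    collected: date
    note: str = "n/a"


sample = build_from_strings(
    Sample,
    {"name": "s1", "depth": "30", "collected": "2020-01-02"},
    parsers={date: date.fromisoformat},
)
print(sample)
```

The `parsers` mapping is what lets a type such as `date`, which cannot be built directly from a single string, take part in the conversion.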
dict_parser ¶
dict_parser(cls: Type, type_: TypeAlias, parsers: Optional[Dict[type, Callable[[str], Any]]] = None) -> partial
Returns a function that parses a stringified dict into a Dict of the correct type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cls | Type | the type of the class object this is being parsed for (used to get default val for parsers) | required |
| type_ | TypeAlias | the type of the attribute to be parsed | required |
| parsers | Optional[Dict[type, Callable[[str], Any]]] | an optional mapping from type to the function to use for parsing that type (allows for parsing of more complex types) | None |
Source code in fgpyo/util/inspect.py
get_fields ¶
get_fields(cls: Union[_DataclassesOrAttrClass, Type[_DataclassesOrAttrClass]]) -> Tuple[FieldType, ...]
Get the fields tuple from either a dataclasses or attr dataclass (or instance)
Source code in fgpyo/util/inspect.py
get_fields_dict ¶
get_fields_dict(cls: Union[_DataclassesOrAttrClass, Type[_DataclassesOrAttrClass]]) -> Mapping[str, FieldType]
Get the fields dict from either a dataclasses or attr dataclass (or instance)
Source code in fgpyo/util/inspect.py
is_attr_class ¶
list_parser ¶
list_parser(cls: Type, type_: TypeAlias, parsers: Optional[Dict[type, Callable[[str], Any]]] = None) -> partial
Returns a function that parses a "stringified" list into a List of the correct type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cls | Type | the type of the class object this is being parsed for (used to get default val for parsers) | required |
| type_ | TypeAlias | the type of the attribute to be parsed | required |
| parsers | Optional[Dict[type, Callable[[str], Any]]] | an optional mapping from type to the function to use for parsing that type (allows for parsing of more complex types) | None |
Source code in fgpyo/util/inspect.py
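To make the "returns a partial" pattern concrete, here is a minimal sketch that handles only flat lists. It is not fgpyo's implementation: `flat_list_parser` and `_parse_list` are hypothetical names, the signature is simplified to a single element parser, and nested collections are deliberately out of scope.

```python
from functools import partial
from typing import Any, Callable, List


def _parse_list(element_parser: Callable[[str], Any], text: str) -> List[Any]:
    """Parse a string such as "[1,2,3]" by applying element_parser to each item."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"expected a list flanked by '[]', got {text!r}")
    inner = text[1:-1].strip()
    if inner == "":
        return []
    return [element_parser(item.strip()) for item in inner.split(",")]


def flat_list_parser(element_parser: Callable[[str], Any]) -> partial:
    """Hypothetical helper: bind the element parser and return the parsing function."""
    return partial(_parse_list, element_parser)


parse_ints = flat_list_parser(int)
print(parse_ints("[1, 2,3]"))
```

Binding the element parser up front means the returned function takes only the string to parse, which is the shape a per-type parser mapping expects.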
set_parser ¶
set_parser(cls: Type, type_: TypeAlias, parsers: Optional[Dict[type, Callable[[str], Any]]] = None) -> partial
Returns a function that parses a stringified set into a Set of the correct type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cls | Type | the type of the class object this is being parsed for (used to get default val for parsers) | required |
| type_ | TypeAlias | the type of the attribute to be parsed | required |
| parsers | Optional[Dict[type, Callable[[str], Any]]] | an optional mapping from type to the function to use for parsing that type (allows for parsing of more complex types) | None |
Source code in fgpyo/util/inspect.py
split_at_given_level ¶
split_at_given_level(field: str, split_delim: str = ',', increase_depth_chars: Iterable[str] = ('{', '(', '['), decrease_depth_chars: Iterable[str] = ('}', ')', ']')) -> List[str]
Splits a nested field by its outer-most level
Note that this method may produce incorrect results for fields containing strings that include unpaired characters that increase or decrease the depth.
It is not currently smart enough to deal with fields enclosed in quotes ('' or ""). This is a TODO.
Source code in fgpyo/util/inspect.py
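The depth-tracking approach described above can be sketched as follows. This is an illustration under stated assumptions, not the library's source: `split_outer` is a hypothetical name, and the delimiter is assumed to be a single character.

```python
from typing import Iterable, List


def split_outer(
    field: str,
    split_delim: str = ",",
    increase_depth_chars: Iterable[str] = ("{", "(", "["),
    decrease_depth_chars: Iterable[str] = ("}", ")", "]"),
) -> List[str]:
    """Hypothetical helper: split on the delimiter only at nesting depth zero."""
    openers = set(increase_depth_chars)
    closers = set(decrease_depth_chars)
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for char in field:
        if char in openers:
            depth += 1
        elif char in closers:
            depth -= 1
        if char == split_delim and depth == 0:
            # An outer-level delimiter closes the current piece.
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    return parts


print(split_outer("1,(2,3),[4,{5,6}]"))
print(split_outer("a(,b"))  # unpaired "(": the depth never returns to zero
```

The second call shows the caveat noted above: an unpaired opening character keeps the depth above zero, so the following delimiter is never treated as an outer-level split.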
tuple_parser ¶
tuple_parser(cls: Type, type_: TypeAlias, parsers: Optional[Dict[type, Callable[[str], Any]]] = None) -> partial
Returns a function that parses a stringified tuple into a Tuple of the correct type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cls | Type | the type of the class object this is being parsed for (used to get default val for parsers) | required |
| type_ | TypeAlias | the type of the attribute to be parsed | required |
| parsers | Optional[Dict[type, Callable[[str], Any]]] | an optional mapping from type to the function to use for parsing that type (allows for parsing of more complex types) | None |
Source code in fgpyo/util/inspect.py
Modules¶
logging ¶
Methods for setting up logging for tools.¶
Progress Logging Examples¶
Input data (SAM/BAM/CRAM/VCF) are frequently iterated in genomic coordinate order. When logging
progress, it is useful to report not only how many inputs have been consumed but also their genomic
coordinate. ProgressLogger() can log progress every
fixed number of records. Progress can be written to a logging.Logger or to a custom print
method.
>>> from fgpyo.util.logging import ProgressLogger
>>> logged_lines = []
>>> progress = ProgressLogger(
... printer=lambda s: logged_lines.append(s),
... verb="recorded",
... noun="items",
... unit=2
... )
>>> progress.record(reference_name="chr1", position=1) # does not log
False
>>> progress.record(reference_name="chr1", position=2) # logs
True
>>> progress.record(reference_name="chr1", position=3) # does not log
False
>>> progress.log_last() # will log the last recorded item, if not previously logged
True
>>> logged_lines # show the lines logged
['recorded 2 items: chr1:2', 'recorded 3 items: chr1:3']
Classes¶
ProgressLogger ¶
Bases: AbstractContextManager
A little class to track progress.
This will output a log message once for every unit items recorded.
Attributes:
| Name | Type | Description |
|---|---|---|
| printer | Callable[[str], Any] | either a Logger (in which case progress will be printed at Info) or a lambda that consumes a single string |
| noun | str | the noun to use in the log message |
| verb | str | the verb to use in the log message |
| unit | int | the number of items for every log message |
| count | int | the total count of items recorded |
Source code in fgpyo/util/logging.py
Functions¶
Force logging the last record, for example when progress has completed.
Source code in fgpyo/util/logging.py
Record an item at a given genomic coordinate.
Args:
reference_name: the reference name of the item
position: the 1-based start position of the item
Returns:
true if a message was logged, false otherwise
Source code in fgpyo/util/logging.py
Correctly record pysam.AlignedSegments (zero-based coordinates).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rec | AlignedSegment | pysam.AlignedSegment object | required |
Returns:
| Type | Description |
|---|---|
| bool | true if a message was logged, false otherwise |
Source code in fgpyo/util/logging.py
Correctly record multiple pysam.AlignedSegments (zero-based coordinates).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| recs | Iterable[AlignedSegment] | pysam.AlignedSegment objects | required |
Returns:
| Type | Description |
|---|---|
| bool | true if a message was logged, false otherwise |
Source code in fgpyo/util/logging.py
Functions¶
setup_logging ¶
Globally configure logging for all modules
Configures logging to run at a specific level and output messages to stderr with useful information preceding the actual log message.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| level | str | the default level for the logger | 'INFO' |
| name | str | the name of the logger | 'fgpyo' |
Source code in fgpyo/util/logging.py
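For readers who want to see what this kind of configuration involves, here is a rough standard-library sketch. It is not fgpyo's implementation: `configure_logging` is a hypothetical name, the message format is an assumption, and the sketch configures one named logger rather than all modules globally.

```python
import logging
import sys


def configure_logging(level: str = "INFO", name: str = "fgpyo") -> logging.Logger:
    """Hypothetical stand-in: send a named logger's messages to stderr at `level`."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        # Assumed format: timestamp, logger name, and level precede the message.
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s [%(levelname)s]: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


log = configure_logging(level="DEBUG", name="demo")
log.debug("logging is configured")
```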
metric ¶
Metrics¶
Module for storing, reading, and writing metric-like tab-delimited information.
Metric files are tab-delimited, contain a header, and zero or more rows for metric values. This
makes it easy for them to be read in languages like R. For example, a row per person, with
columns for age, gender, and address.
The Metric() class makes it easy to read, write, and store
one or more metrics of the same type, all the while preserving types for each value in a metric. It is
an abstract base class whose subclasses are decorated by
@dataclass, or
@attr.s, with attributes storing one or more
typed values. If using multiple layers of inheritance, keep in mind that it's not possible to mix
these dataclass utils, e.g. a dataclasses class derived from an attr class will not appropriately
initialize the values of the attr superclass.
Examples¶
Defining a new metric class:
>>> from fgpyo.util.metric import Metric
>>> import dataclasses
>>> @dataclasses.dataclass(frozen=True)
... class Person(Metric["Person"]):
... name: str
... age: int
or using attr:
>>> from fgpyo.util.metric import Metric
>>> import attr
>>> from typing import Optional
>>> @attr.s(auto_attribs=True, frozen=True)
... class PersonAttr(Metric["PersonAttr"]):
... name: str
... age: int
... address: Optional[str] = None
Getting the attributes for a metric class. These will be used for the header when reading and writing metric files.
Getting the values from a metric class instance. The values are in the same order as the header.
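As a rough illustration of both steps using only the standard library, the sketch below derives the header names and the matching values from a plain dataclass. `field_names` and `field_values` are hypothetical helpers, not fgpyo's API, and this `Person` deliberately does not subclass Metric so that the block is self-contained.

```python
import dataclasses
from typing import Any, List


@dataclasses.dataclass(frozen=True)
class Person:
    name: str
    age: int


def field_names(cls: type) -> List[str]:
    """Hypothetical helper: attribute names in declaration (header) order."""
    return [field.name for field in dataclasses.fields(cls)]


def field_values(obj: Any) -> List[Any]:
    """Hypothetical helper: attribute values in the same order as the header."""
    return [getattr(obj, field.name) for field in dataclasses.fields(obj)]


names = field_names(Person)
values = field_values(Person(name="Alice", age=47))
print(names)   # ['name', 'age']
print(values)  # ['Alice', 47]
```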
Writing a list of metrics to a file:
>>> metrics = [
... Person(name="Alice", age=47),
... Person(name="Bob", age=24)
... ]
>>> from pathlib import Path
>>> Person.write(Path("/path/to/metrics.txt"), *metrics)
Then the contents of the written metrics file:
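Given the format described earlier (a header row followed by one tab-delimited row per metric), the layout can be reproduced with the csv module. This is a stand-in sketch that writes to an in-memory buffer rather than calling Person.write, and its `Person` is a plain dataclass rather than a Metric subclass.

```python
import csv
import dataclasses
import io


@dataclasses.dataclass(frozen=True)
class Person:
    name: str
    age: int


people = [Person(name="Alice", age=47), Person(name="Bob", age=24)]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow([field.name for field in dataclasses.fields(Person)])  # header row
for person in people:
    writer.writerow(dataclasses.astuple(person))  # one row per metric

contents = buffer.getvalue()
print(contents)
```

Under these assumptions the buffer holds three lines: `name` and `age` separated by a tab, then `Alice` and `47`, then `Bob` and `24`.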
Reading the metrics file back in:
>>> list(Person.read(Path("/path/to/metrics.txt")))
[Person(name='Alice', age=47), Person(name='Bob', age=24)]
Formatting and parsing the values for custom types is supported by overriding the _parsers() and
format_value() methods.
>>> @dataclasses.dataclass(frozen=True)
... class Name:
... first: str
... last: str
... @classmethod
... def parse(cls, value: str) -> "Name":
... fields = value.split(" ")
... return Name(first=fields[0], last=fields[1])
>>> from typing import Dict, Callable, Any
>>> @dataclasses.dataclass(frozen=True)
... class PersonWithName(Metric["PersonWithName"]):
... name: Name
... age: int
... @classmethod
... def _parsers(cls) -> Dict[type, Callable[[str], Any]]:
... return {Name: lambda value: Name.parse(value=value)}
... @classmethod
... def format_value(cls, value: Any) -> str:
... if isinstance(value, Name):
... return f"{value.first} {value.last}"
... else:
... return super().format_value(value=value)
>>> PersonWithName.parse(fields=["john doe", "42"])
PersonWithName(name=Name(first='john', last='doe'), age=42)
>>> PersonWithName(name=Name(first='john', last='doe'), age=42).formatted_values()
['john doe', '42']
Classes¶
Metric ¶
Bases: ABC, Generic[MetricType]
Abstract base class for all metric-like tab-delimited files
Metric files are tab-delimited, contain a header, and zero or more rows for metric values. This
makes it easy for them to be read in languages like R.
Subclasses of Metric() can support parsing and
formatting custom types with _parsers() and
format_value().
Source code in fgpyo/util/metric.py
Functions¶
format_value
classmethod
¶
The default method to format values of a given type.
By default, this method will comma-delimit list, tuple, and set types, and apply
str to all others.
Dictionaries / mappings will have each key and value separated by a semicolon, and key-value pairs delimited by commas.
In addition, lists will be flanked with '[]', tuples with '()', and sets and dictionaries with '{}'.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | Any | the value to format. | required |
Source code in fgpyo/util/metric.py
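The formatting rules above can be sketched in a few lines. This is an illustration, not the library's code: `format_value_sketch` is a hypothetical name, no whitespace is placed after delimiters (an assumption), and set elements are sorted here purely to make the output deterministic.

```python
from typing import Any


def format_value_sketch(value: Any) -> str:
    """Hypothetical helper following the rules described above."""
    if isinstance(value, dict):
        # key;value pairs, comma-delimited, flanked by '{}'
        pairs = (
            f"{format_value_sketch(key)};{format_value_sketch(val)}"
            for key, val in value.items()
        )
        return "{" + ",".join(pairs) + "}"
    if isinstance(value, list):
        return "[" + ",".join(format_value_sketch(item) for item in value) + "]"
    if isinstance(value, tuple):
        return "(" + ",".join(format_value_sketch(item) for item in value) + ")"
    if isinstance(value, (set, frozenset)):
        # Sorting is this sketch's choice; sets have no inherent order.
        return "{" + ",".join(sorted(format_value_sketch(item) for item in value)) + "}"
    return str(value)


print(format_value_sketch([1, 2, 3]))           # [1,2,3]
print(format_value_sketch((1, "a")))            # (1,a)
print(format_value_sketch({"a": 1, "b": 2}))    # {a;1,b;2}
```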
An iterator over formatted attribute values in the same order as the header.
An iterator over formatted attribute values in the same order as the header.
classmethod
¶An iterator over field names and their corresponding values in the same order as the header.
Source code in fgpyo/util/metric.py
classmethod
¶An iterator over field names in the same order as the header.
parse
classmethod
¶
Parses the string-representation of this metric. One string per attribute should be given.
Source code in fgpyo/util/metric.py
read
classmethod
¶
read(path: Path, ignore_extra_fields: bool = True, strip_whitespace: bool = False, threads: Optional[int] = None) -> Iterator[Any]
Reads in zero or more metrics from the given path.
The metric file must contain a matching header.
Columns that are not present in the file but are optional in the metric class will be set to their default values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | the path to the metrics file. | required |
| ignore_extra_fields | bool | True to ignore any extra columns, False to raise an exception. | True |
| strip_whitespace | bool | True to strip leading and trailing whitespace from each field, False to keep as-is. | False |
| threads | Optional[int] | the number of threads to use when decompressing gzip files | None |
Source code in fgpyo/util/metric.py
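The behaviour of the read options can be illustrated with csv.DictReader. The sketch below is a simplified stand-in, not fgpyo's implementation: `read_people` is a hypothetical function, it reads from an open text handle instead of a path, type conversion is hard-coded for one field, and the exception type raised for extra columns is an assumption.

```python
import csv
import dataclasses
import io
from typing import Iterator, Optional, TextIO


@dataclasses.dataclass(frozen=True)
class Person:
    name: str
    age: int
    address: Optional[str] = None  # optional: falls back to its default if absent


def read_people(
    handle: TextIO, ignore_extra_fields: bool = True, strip_whitespace: bool = False
) -> Iterator[Person]:
    """Hypothetical reader for a tab-delimited file with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    known = {field.name for field in dataclasses.fields(Person)}
    extra = set(reader.fieldnames or []) - known
    if extra and not ignore_extra_fields:
        raise ValueError(f"unexpected columns: {sorted(extra)}")
    for row in reader:
        kwargs = {key: val for key, val in row.items() if key in known}
        if strip_whitespace:
            kwargs = {key: val.strip() for key, val in kwargs.items()}
        kwargs["age"] = int(kwargs["age"])  # restore the declared type
        yield Person(**kwargs)


text = "name\tage\tnickname\nAlice\t47\tAl\nBob\t24\tBobby\n"
people = list(read_people(io.StringIO(text)))
print(people)
```

Here the `nickname` column is ignored because `ignore_extra_fields` defaults to True, and `address` takes its default because the file has no such column.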
classmethod
¶An iterator over attribute values in the same order as the header.
write
classmethod
¶
Writes zero or more metrics to the given path.
The header will always be written.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to the output file. | required |
| values | MetricType | Zero or more metrics. | () |
| threads | Optional[int] | the number of threads to use when compressing gzip files | None |
Source code in fgpyo/util/metric.py
MetricFileHeader
dataclass
¶
Header of a file.
A file's header contains an optional preamble, consisting of lines prefixed by a comment character and/or empty lines, and a required row of fieldnames before the data rows begin.
Attributes:
| Name | Type | Description |
|---|---|---|
| preamble | List[str] | A list of any lines preceding the fieldnames. |
| fieldnames | List[str] | The field names specified in the final line of the header. |
Source code in fgpyo/util/metric.py
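A header of this shape could be separated from the data rows roughly as follows. This is a sketch under assumptions, not the library's parser: `HeaderSketch` and `read_header` are hypothetical names, and the comment character is assumed to be `#`.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class HeaderSketch:
    preamble: List[str]
    fieldnames: List[str]


def read_header(
    lines: Iterable[str], delimiter: str = "\t", comment_prefix: str = "#"
) -> HeaderSketch:
    """Hypothetical helper: collect preamble lines, then the row of field names."""
    preamble: List[str] = []
    for line in lines:
        text = line.rstrip("\n")
        if text.strip() == "" or text.startswith(comment_prefix):
            preamble.append(text)  # comment or empty line before the field names
        else:
            return HeaderSketch(preamble=preamble, fieldnames=text.split(delimiter))
    raise ValueError("no row of field names found")


header = read_header(["# made by a tool\n", "\n", "name\tage\n", "Alice\t47\n"])
print(header)
```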
MetricWriter ¶
Bases: Generic[MetricType], AbstractContextManager
Source code in fgpyo/util/metric.py
Functions¶
__init__(filename: Union[Path, str], metric_class: Type[Metric], append: bool = False, delimiter: str = '\t', include_fields: Optional[List[str]] = None, exclude_fields: Optional[List[str]] = None, lineterminator: str = '\n', threads: Optional[int] = None) -> None
Args:
filename: Path to the file to write.
metric_class: Metric class.
append: If `True`, the file will be appended to. Otherwise, the specified file will be
overwritten.
delimiter: The output file delimiter.
include_fields: If specified, only the listed fieldnames will be included when writing
records to file. Fields will be written in the order provided.
May not be used together with `exclude_fields`.
exclude_fields: If specified, any listed fieldnames will be excluded when writing
records to file.
May not be used together with `include_fields`.
lineterminator: The string used to terminate lines produced by the MetricWriter.
Default = "\n".
threads: the number of threads to use when compressing gzip files
Raises:
TypeError: If the provided metric class is not a dataclass- or attr-decorated
subclass of `Metric`.
AssertionError: If the provided filepath is not writable.
AssertionError: If `append=True` and the provided file is not readable. (When appending,
we check to ensure that the header matches the specified metric class. The file must
be readable to get the header.)
ValueError: If `append=True` and the provided file is a FIFO (named pipe).
ValueError: If `append=True` and the provided file does not include a header.
ValueError: If `append=True` and the header of the provided file does not match the
specified metric class and the specified include/exclude fields.
Source code in fgpyo/util/metric.py
Write a single Metric instance to file.
The Metric is converted to a dictionary and then written using the underlying
csv.DictWriter. If the MetricWriter was created using the include_fields or
exclude_fields arguments, the fields of the Metric are subset and/or reordered
accordingly before writing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metric | MetricType | An instance of the specified Metric. | required |
Raises:
| Type | Description |
|---|---|
TypeError
|
If the provided |
Source code in fgpyo/util/metric.py
Write multiple Metric instances to file.
Each Metric is converted to a dictionary and then written using the underlying
csv.DictWriter. If the MetricWriter was created using the include_fields or
exclude_fields arguments, the attributes of each Metric are subset and/or reordered
accordingly before writing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metrics | Iterable[MetricType] | A sequence of instances of the specified Metric. | required |
Source code in fgpyo/util/metric.py
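The convert-to-dictionary-then-DictWriter flow described above, including subsetting and reordering by field name, can be sketched with the standard library. This is not MetricWriter itself: `write_rows` is a hypothetical function, it writes to an open handle, and the exception type for combining both field options is an assumption.

```python
import csv
import dataclasses
import io
from typing import Iterable, List, Optional, TextIO


@dataclasses.dataclass(frozen=True)
class Person:
    name: str
    age: int
    address: str


def write_rows(
    handle: TextIO,
    metrics: Iterable[Person],
    include_fields: Optional[List[str]] = None,
    exclude_fields: Optional[List[str]] = None,
) -> None:
    """Hypothetical writer: header first, then one row per record."""
    if include_fields is not None and exclude_fields is not None:
        raise ValueError("include_fields and exclude_fields are mutually exclusive")
    all_names = [field.name for field in dataclasses.fields(Person)]
    if include_fields is not None:
        fieldnames = list(include_fields)  # written in the order provided
    elif exclude_fields is not None:
        fieldnames = [name for name in all_names if name not in exclude_fields]
    else:
        fieldnames = all_names
    writer = csv.DictWriter(
        handle,
        fieldnames=fieldnames,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # silently drop fields that were not selected
    )
    writer.writeheader()
    for metric in metrics:
        writer.writerow(dataclasses.asdict(metric))


buffer = io.StringIO()
write_rows(buffer, [Person("Alice", 47, "1 Main St")], include_fields=["age", "name"])
print(buffer.getvalue())
```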
Modules¶
string ¶
Functions¶
column_it ¶
A simple version of Unix's column utility. This assumes the table is NxM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rows | List[List[str]] | the rows to adjust. Each row must have the same number of delimited fields. | required |
| delimiter | str | the delimiter for each field in a row. | ' ' |
Source code in fgpyo/util/string.py
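Column alignment of this kind amounts to padding every cell to the widest entry in its column. The sketch below shows one way to do that; `column_it_sketch` is a hypothetical name, and returning a single newline-joined string with trailing padding left in place is an assumption about the output shape.

```python
from typing import List


def column_it_sketch(rows: List[List[str]], delimiter: str = " ") -> str:
    """Hypothetical helper: left-justify each column of an NxM table."""
    if not rows:
        return ""
    # Width of each column is the length of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        delimiter.join(cell.ljust(width) for cell, width in zip(row, widths))
        for row in rows
    )


table = column_it_sketch([["name", "age"], ["Alice", "47"], ["Bob", "24"]])
print(table)
```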
types ¶
Attributes¶
TypeAnnotation
module-attribute
¶
A function parameter's type annotation may be any of the following:
1) type, when declaring any of the built-in Python types
2) typing._GenericAlias, when declaring generic collection types or union types using pre-PEP
585 and pre-PEP 604 syntax (e.g. List[int], Optional[int], or Union[int, None])
3) types.UnionType, when declaring union types using PEP604 syntax (e.g. int | None)
4) types.GenericAlias, when declaring generic collection types using PEP 585 syntax (e.g.
list[int])
types.GenericAlias is a subclass of type, but typing._GenericAlias and types.UnionType are
not and must be considered explicitly.
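One way to treat these annotation forms uniformly is through typing.get_origin and typing.get_args, which work on both the older and newer syntax. A small sketch, assuming Python 3.9 or later for `list[int]` and 3.10 or later for `int | None`; `describe` is a hypothetical helper.

```python
import sys
from typing import Any, List, Optional, Tuple, Union, get_args, get_origin


def describe(annotation: Any) -> Tuple[Any, Tuple[Any, ...]]:
    """Hypothetical helper: return the (origin, type arguments) of an annotation."""
    return get_origin(annotation), get_args(annotation)


print(describe(int))            # a plain built-in type has no origin or arguments
print(describe(List[int]))      # generic collection, pre-PEP 585 syntax
print(describe(Optional[int]))  # union, pre-PEP 604 syntax
print(describe(list[int]))      # generic collection, PEP 585 syntax

if sys.version_info >= (3, 10):
    # PEP 604 union syntax; evaluated only on interpreters that support it.
    print(describe(eval("int | None")))
```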
Functions¶
is_constructible_from_str ¶
Returns true if the provided type can be constructed from a string
Source code in fgpyo/util/types.py
is_list_like ¶
make_enum_parser ¶
make_literal_parser ¶
make_literal_parser(literal: Type[LiteralType], parsers: Iterable[Callable[[str], LiteralType]]) -> partial
Generates a parser function for a literal type object from a set of parsers for the possible values of that literal type.
Source code in fgpyo/util/types.py
make_union_parser ¶
Generates a parser function for a union type object from a set of parsers for the possible member types of that union.
Source code in fgpyo/util/types.py
none_parser ¶
Returns None if the value is 'None', else raises an error
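A union parser and a None parser combine naturally when parsing optional values. The sketch below assumes a try-each-parser-in-order strategy, which may differ from the library's actual behaviour; `none_parser_sketch`, `_try_parsers`, and `make_union_parser_sketch` are hypothetical names, and ValueError is an assumed error type.

```python
from functools import partial
from typing import Any, Callable, Iterable, Tuple


def none_parser_sketch(value: str) -> None:
    """Return None for the string 'None'; reject anything else."""
    if value == "None":
        return None
    raise ValueError(f"expected 'None', got {value!r}")


def _try_parsers(parsers: Tuple[Callable[[str], Any], ...], value: str) -> Any:
    """Return the result of the first parser that accepts the value."""
    for parser in parsers:
        try:
            return parser(value)
        except (ValueError, TypeError):
            continue
    raise ValueError(f"no parser accepted {value!r}")


def make_union_parser_sketch(parsers: Iterable[Callable[[str], Any]]) -> partial:
    """Hypothetical helper: bind the candidate parsers into one parsing function."""
    return partial(_try_parsers, tuple(parsers))


parse_optional_int = make_union_parser_sketch([int, none_parser_sketch])
print(parse_optional_int("42"))
print(parse_optional_int("None"))
```

With this strategy the order of the parsers matters: a permissive parser such as `str` placed first would accept every input and hide the others.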
parse_bool ¶
Parses strings into bools accounting for the many different text representations of bools that can be used
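A parser of this kind typically normalises case and checks membership in two sets of accepted spellings. The sketch below is illustrative only: `parse_bool_sketch` is a hypothetical name, and the accepted spellings are assumptions rather than the list the library uses.

```python
# Assumed spellings; the library's accepted representations are not listed here.
TRUE_STRINGS = {"true", "t", "yes", "y", "1"}
FALSE_STRINGS = {"false", "f", "no", "n", "0"}


def parse_bool_sketch(value: str) -> bool:
    """Hypothetical helper: map common text representations onto a bool."""
    lowered = value.strip().lower()
    if lowered in TRUE_STRINGS:
        return True
    if lowered in FALSE_STRINGS:
        return False
    raise ValueError(f"{value!r} is not a recognised boolean value")


print(parse_bool_sketch("Yes"))
print(parse_bool_sketch("0"))
```

An explicit parser is needed because calling `bool` on any non-empty string, including "False", returns True.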