Metrics
Module for storing, reading, and writing metric-like tab-delimited information.
Metric files are tab-delimited, contain a header, and zero or more rows for metric values. This
makes it easy for them to be read in languages like R. For example, a file might hold one row per
person, with columns for age, gender, and address.
The Metric() class makes it easy to read, write, and store
one or more metrics of the same type, all the while preserving types for each value in a metric. It is
an abstract base class whose subclasses are decorated with either
@dataclass or
@attr.s, with attributes storing one or more
typed values. If using multiple layers of inheritance, keep in mind that it's not possible to mix
these two decorators, e.g. a dataclasses class derived from an attr class will not appropriately
initialize the values of the attr superclass.
Examples
Defining a new metric class:
>>> from fgpyo.util.metric import Metric
>>> import dataclasses
>>> @dataclasses.dataclass(frozen=True)
... class Person(Metric["Person"]):
...     name: str
...     age: int
or using attr:
>>> from fgpyo.util.metric import Metric
>>> import attr
>>> from typing import Optional
>>> @attr.s(auto_attribs=True, frozen=True)
... class PersonAttr(Metric["PersonAttr"]):
...     name: str
...     age: int
...     address: Optional[str] = None
Getting the attributes for a metric class. These will be used for the header when reading and
writing metric files.
>>> Person.header()
['name', 'age']
Getting the values from a metric class instance. The values are in the same order as the header.
>>> list(Person(name="Alice", age=47).values())
['Alice', 47]
Writing a list of metrics to a file:
>>> metrics = [
...     Person(name="Alice", age=47),
...     Person(name="Bob", age=24)
... ]
>>> from pathlib import Path
>>> Person.write(Path("/path/to/metrics.txt"), *metrics)
Then the contents of the written metrics file:
$ column -t /path/to/metrics.txt
name   age
Alice  47
Bob    24
Reading the metrics file back in:
>>> list(Person.read(Path("/path/to/metrics.txt")))
[Person(name='Alice', age=47), Person(name='Bob', age=24)]
Formatting and parsing the values for custom types is supported by overriding the _parsers() and
format_value() methods.
>>> @dataclasses.dataclass(frozen=True)
... class Name:
...     first: str
...     last: str
...     @classmethod
...     def parse(cls, value: str) -> "Name":
...         fields = value.split(" ")
...         return Name(first=fields[0], last=fields[1])
>>> from typing import Dict, Callable, Any
>>> @dataclasses.dataclass(frozen=True)
... class PersonWithName(Metric["PersonWithName"]):
...     name: Name
...     age: int
...     @classmethod
...     def _parsers(cls) -> Dict[type, Callable[[str], Any]]:
...         return {Name: lambda value: Name.parse(value=value)}
...     @classmethod
...     def format_value(cls, value: Any) -> str:
...         if isinstance(value, Name):
...             return f"{value.first} {value.last}"
...         else:
...             return super().format_value(value=value)
>>> PersonWithName.parse(fields=["john doe", "42"])
PersonWithName(name=Name(first='john', last='doe'), age=42)
>>> PersonWithName(name=Name(first='john', last='doe'), age=42).formatted_values()
['john doe', '42']
Classes
Metric
Bases: ABC, Generic[MetricType]
Abstract base class for all metric-like tab-delimited files
Metric files are tab-delimited, contain a header, and zero or more rows for metric values. This
makes it easy for them to be read in languages like R.
Subclasses of Metric() can support parsing and
formatting custom types with _parsers() and
format_value().
Source code in fgpyo/util/metric.py
class Metric(ABC, Generic[MetricType]):
    """Abstract base class for all metric-like tab-delimited files

    Metric files are tab-delimited, contain a header, and zero or more rows for metric values. This
    makes it easy for them to be read in languages like `R`.

    Subclasses of [`Metric()`][fgpyo.util.metric.Metric] can support parsing and
    formatting custom types with `_parsers()` and
    [`format_value()`][fgpyo.util.metric.Metric.format_value].
    """

    @classmethod
    def keys(cls) -> Iterator[str]:
        """An iterator over field names in the same order as the header."""
        for field in inspect.get_fields(cls):  # type: ignore[arg-type]
            yield field.name

    def values(self) -> Iterator[Any]:
        """An iterator over attribute values in the same order as the header."""
        for field in inspect.get_fields(self.__class__):  # type: ignore[arg-type]
            yield getattr(self, field.name)

    def items(self) -> Iterator[Tuple[str, Any]]:
        """
        An iterator over field names and their corresponding values in the same order as the header.
        """
        for field in inspect.get_fields(self.__class__):  # type: ignore[arg-type]
            yield (field.name, getattr(self, field.name))

    def formatted_values(self) -> List[str]:
        """A list of formatted attribute values in the same order as the header."""
        return [self.format_value(value) for value in self.values()]

    def formatted_items(self) -> List[Tuple[str, str]]:
        """A list of (field name, formatted value) pairs in the same order as the header."""
        return [(key, self.format_value(value)) for key, value in self.items()]

    @classmethod
    def _parsers(cls) -> Dict[type, Callable[[str], Any]]:
        """Mapping of type to a specific parser for that type. The parser must accept a string
        as a single parameter and return a single value of the given type. Sub-classes may
        override this method to support custom types."""
        return {}

    @classmethod
    def read(
        cls,
        path: Path,
        ignore_extra_fields: bool = True,
        strip_whitespace: bool = False,
        threads: Optional[int] = None,
    ) -> Iterator[Any]:
        """Reads in zero or more metrics from the given path.

        The metric file must contain a matching header.

        Columns that are not present in the file but are optional in the metric class will
        be default values.

        Args:
            path: the path to the metrics file.
            ignore_extra_fields: True to ignore any extra columns, False to raise an exception.
            strip_whitespace: True to strip leading and trailing whitespace from each field,
                False to keep as-is.
            threads: the number of threads to use when decompressing gzip files
        """
        parsers = cls._parsers()
        with io.to_reader(path, threads=threads) as reader:
            header: List[str] = reader.readline().rstrip("\r\n").split("\t")
            # check the header
            class_fields = set(cls.header())
            file_fields = set(header)
            missing_from_class = file_fields.difference(class_fields)
            missing_from_file = class_fields.difference(file_fields)

            field_name_to_attribute = inspect.get_fields_dict(cls)  # type: ignore[arg-type]

            # ignore class fields that are missing from the file (via header) if they're optional
            # or have a default
            if len(missing_from_file) > 0:
                fields_with_defaults = [
                    field
                    for field in missing_from_file
                    if inspect._attribute_has_default(field_name_to_attribute[field])
                ]
                # remove optional class fields from the fields
                missing_from_file = missing_from_file.difference(fields_with_defaults)

            # raise an exception if there are non-optional class fields missing from the file
            if len(missing_from_file) > 0:
                raise ValueError(
                    f"In file: {path}, fields in class '{cls.__name__}' missing from file: "
                    + ", ".join(missing_from_file)
                )

            # raise an exception if there are fields in the file not in the class, unless they
            # should be ignored.
            if not ignore_extra_fields and len(missing_from_class) > 0:
                raise ValueError(
                    f"In file: {path}, extra fields in file missing from class '{cls.__name__}': "
                    + ", ".join(missing_from_class)
                )

            # read the metric lines
            for lineno, line in enumerate(reader, 2):
                # parse the raw values
                values: List[str] = line.rstrip("\r\n").split("\t")
                if strip_whitespace:
                    values = [v.strip() for v in values]

                # raise an exception if there aren't the same number of values as the header
                if len(header) != len(values):
                    raise ValueError(
                        f"In file: {path}, expected {len(header)} columns, got {len(values)} on "
                        f"line {lineno}: {line}"
                    )

                # build the metric
                instance: Metric[MetricType] = inspect.attr_from(
                    cls=cls, kwargs=dict(zip(header, values)), parsers=parsers
                )
                yield instance

    @classmethod
    def parse(cls, fields: List[str]) -> Any:
        """Parses the string-representation of this metric. One string per attribute should be
        given.
        """
        parsers = cls._parsers()
        header = cls.header()
        assert len(fields) == len(header)
        return inspect.attr_from(cls=cls, kwargs=dict(zip(header, fields)), parsers=parsers)

    @classmethod
    def write(cls, path: Path, *values: MetricType, threads: Optional[int] = None) -> None:
        """Writes zero or more metrics to the given path.

        The header will always be written.

        Args:
            path: Path to the output file.
            values: Zero or more metrics.
            threads: the number of threads to use when compressing gzip files
        """
        with MetricWriter[MetricType](path, metric_class=cls, threads=threads) as writer:
            writer.writeall(values)

    @classmethod
    def header(cls) -> List[str]:
        """The list of header values for the metric."""
        return [a.name for a in inspect.get_fields(cls)]  # type: ignore[arg-type]

    @classmethod
    def format_value(cls, value: Any) -> str:  # noqa: C901
        """The default method to format values of a given type.

        By default, this method will comma-delimit `list`, `tuple`, and `set` types, and apply
        `str` to all others. Enum members are formatted by their value, floats are rounded to
        five decimal places, and `None` is formatted as the empty string.

        Dictionaries / mappings will have keys and vals separated by semicolons, and key val pairs
        delimited by commas.

        In addition, tuples will be flanked with '()' and sets and dictionaries with '{}'. Lists
        are written without enclosing brackets, and empty lists and sets are formatted as the
        empty string.

        Args:
            value: the value to format.
        """
        if issubclass(type(value), Enum):
            return cls.format_value(value.value)
        if isinstance(value, (tuple)):
            if len(value) == 0:
                return "()"
            else:
                return "(" + ",".join(cls.format_value(v) for v in value) + ")"
        if isinstance(value, (list)):
            if len(value) == 0:
                return ""
            else:
                return ",".join(cls.format_value(v) for v in value)
        if isinstance(value, (set)):
            if len(value) == 0:
                return ""
            else:
                return "{" + ",".join(cls.format_value(v) for v in value) + "}"
        elif isinstance(value, dict):
            if len(value) == 0:
                return "{}"
            else:
                return (
                    "{"
                    + ",".join(
                        f"{cls.format_value(k)};{cls.format_value(v)}" for k, v in value.items()
                    )
                    + "}"
                )
        elif isinstance(value, float):
            return f"{round(value, 5)}"
        elif value is None:
            return ""
        else:
            return f"{value}"

    @classmethod
    def to_list(cls, value: str) -> List[Any]:
        """Returns a list value split on comma delimiter."""
        return [] if value == "" else value.split(",")

    @staticmethod
    def fast_concat(*inputs: Path, output: Path) -> None:
        if len(inputs) == 0:
            raise ValueError("No inputs provided")

        # Every input must start with the same header line
        headers = [next(io.read_lines(input_path)) for input_path in inputs]
        assert len(set(headers)) == 1, "Input headers do not match"
        io.write_lines(path=output, lines_to_write=set(headers))

        # Append the data rows of each input, skipping its header line
        for input_path in inputs:
            io.write_lines(
                path=output, lines_to_write=list(io.read_lines(input_path))[1:], append=True
            )

    @staticmethod
    def _read_header(
        reader: TextIOWrapper,
        delimiter: str = "\t",
        comment_prefix: str = "#",
    ) -> MetricFileHeader:
        """
        Read the header from an open file.

        The first row after any commented or empty lines will be used as the fieldnames.

        Lines preceding the fieldnames will be returned in the `preamble`. Leading and trailing
        whitespace are removed and ignored.

        Args:
            reader: An open, readable file handle.
            delimiter: The delimiter character used to separate fields in the file.
            comment_prefix: The prefix for comment lines in the file.

        Returns:
            A `MetricFileHeader` containing the field names and any preceding lines.

        Raises:
            ValueError: If the file was empty or contained only comments or empty lines.
        """
        preamble: List[str] = []

        for line in reader:
            if line.strip().startswith(comment_prefix) or line.strip() == "":
                # Skip any commented or empty lines before the header
                preamble.append(line.strip())
            else:
                # The first line with any other content is assumed to be the header
                fieldnames = line.strip().split(delimiter)
                break
        else:
            # If the file was empty, kick back an empty header
            fieldnames = []

        return MetricFileHeader(preamble=preamble, fieldnames=fieldnames)
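The fast_concat method in the listing above carries no docstring. Reading the code, it appears to require that every input starts with the same header line, write that header once, and then append the data rows of each input. A minimal standalone sketch of that idea follows, using only the standard library rather than fgpyo's io helpers; the function name concat_with_shared_header and the file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def concat_with_shared_header(inputs: List[Path], output: Path) -> None:
    """Concatenate delimited files that all start with the same header line."""
    if len(inputs) == 0:
        raise ValueError("No inputs provided")
    # Read only the first line of each input to compare headers.
    headers = set()
    for path in inputs:
        with path.open() as handle:
            headers.add(handle.readline().rstrip("\r\n"))
    if len(headers) != 1:
        raise ValueError("Input headers do not match")
    with output.open("w") as out:
        # Write the shared header once, then the data rows of every input.
        out.write(headers.pop() + "\n")
        for path in inputs:
            with path.open() as handle:
                next(handle, None)  # skip this input's header line
                for line in handle:
                    out.write(line if line.endswith("\n") else line + "\n")


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "a.txt"
    second = Path(tmp) / "b.txt"
    merged = Path(tmp) / "merged.txt"
    first.write_text("name\tage\nAlice\t47\n")
    second.write_text("name\tage\nBob\t24\n")
    concat_with_shared_header([first, second], merged)
    merged_text = merged.read_text()
```

Because rows are copied as text and never parsed, this kind of concatenation skips type conversion entirely, which is presumably where the "fast" in the method name comes from.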
Functions
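format_value
classmethod
format_value(value: Any) -> str
The default method to format values of a given type.
By default, this method will comma-delimit list, tuple, and set types, and apply
str to all others. Enum members are formatted by their value, floats are rounded to five
decimal places, and None is formatted as the empty string.
Dictionaries / mappings will have keys and vals separated by semicolons, and key val pairs
delimited by commas.
In addition, tuples will be flanked with '()' and sets and dictionaries with '{}'. Lists are
written without enclosing brackets, and empty lists and sets are formatted as the empty string.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| value | Any | the value to format. | required |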
Source code in fgpyo/util/metric.py
    @classmethod
    def format_value(cls, value: Any) -> str:  # noqa: C901
        """The default method to format values of a given type.

        By default, this method will comma-delimit `list`, `tuple`, and `set` types, and apply
        `str` to all others. Enum members are formatted by their value, floats are rounded to
        five decimal places, and `None` is formatted as the empty string.

        Dictionaries / mappings will have keys and vals separated by semicolons, and key val pairs
        delimited by commas.

        In addition, tuples will be flanked with '()' and sets and dictionaries with '{}'. Lists
        are written without enclosing brackets, and empty lists and sets are formatted as the
        empty string.

        Args:
            value: the value to format.
        """
        if issubclass(type(value), Enum):
            return cls.format_value(value.value)
        if isinstance(value, (tuple)):
            if len(value) == 0:
                return "()"
            else:
                return "(" + ",".join(cls.format_value(v) for v in value) + ")"
        if isinstance(value, (list)):
            if len(value) == 0:
                return ""
            else:
                return ",".join(cls.format_value(v) for v in value)
        if isinstance(value, (set)):
            if len(value) == 0:
                return ""
            else:
                return "{" + ",".join(cls.format_value(v) for v in value) + "}"
        elif isinstance(value, dict):
            if len(value) == 0:
                return "{}"
            else:
                return (
                    "{"
                    + ",".join(
                        f"{cls.format_value(k)};{cls.format_value(v)}" for k, v in value.items()
                    )
                    + "}"
                )
        elif isinstance(value, float):
            return f"{round(value, 5)}"
        elif value is None:
            return ""
        else:
            return f"{value}"
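To make the formatting rules concrete, the sketch below re-implements the branches of the listing above as a standalone function and applies it to a few values. It is an illustration that mirrors the source shown here, not the library's own code path; the Color enum and the sample values are hypothetical.

```python
from enum import Enum
from typing import Any


def format_value(value: Any) -> str:
    """Standalone mirror of the formatting rules in the source listing above."""
    if isinstance(value, Enum):
        return format_value(value.value)
    if isinstance(value, tuple):
        return "(" + ",".join(format_value(v) for v in value) + ")"
    if isinstance(value, list):
        # Lists are comma-joined with no enclosing brackets.
        return ",".join(format_value(v) for v in value)
    if isinstance(value, set):
        if len(value) == 0:
            return ""  # an empty set formats as "", not "{}"
        return "{" + ",".join(format_value(v) for v in value) + "}"
    if isinstance(value, dict):
        pairs = ",".join(f"{format_value(k)};{format_value(v)}" for k, v in value.items())
        return "{" + pairs + "}"
    if isinstance(value, float):
        return f"{round(value, 5)}"
    if value is None:
        return ""
    return f"{value}"


class Color(Enum):
    RED = "red"


formatted = {
    "list": format_value([1, 2, 3]),
    "tuple": format_value((1, "a")),
    "dict": format_value({"k": 1.5}),
    "enum": format_value(Color.RED),
    "float": format_value(3.14159265),
    "none": format_value(None),
    "empty_set": format_value(set()),
}
```

One consequence worth noting: an empty list, an empty set, and None all format to the same empty string, so a custom parser cannot tell them apart from the file alone.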
formatted_items
formatted_items() -> List[Tuple[str, str]]
A list of (field name, formatted value) pairs in the same order as the header.
Source code in fgpyo/util/metric.py
    def formatted_items(self) -> List[Tuple[str, str]]:
        """A list of (field name, formatted value) pairs in the same order as the header."""
        return [(key, self.format_value(value)) for key, value in self.items()]
formatted_values
formatted_values() -> List[str]
A list of formatted attribute values in the same order as the header.
Source code in fgpyo/util/metric.py
    def formatted_values(self) -> List[str]:
        """A list of formatted attribute values in the same order as the header."""
        return [self.format_value(value) for value in self.values()]
header
classmethod
header() -> List[str]
The list of header values for the metric.
Source code in fgpyo/util/metric.py
    @classmethod
    def header(cls) -> List[str]:
        """The list of header values for the metric."""
        return [a.name for a in inspect.get_fields(cls)]  # type: ignore[arg-type]
items
items() -> Iterator[Tuple[str, Any]]
An iterator over field names and their corresponding values in the same order as the header.
Source code in fgpyo/util/metric.py
    def items(self) -> Iterator[Tuple[str, Any]]:
        """
        An iterator over field names and their corresponding values in the same order as the header.
        """
        for field in inspect.get_fields(self.__class__):  # type: ignore[arg-type]
            yield (field.name, getattr(self, field.name))
keys
classmethod
keys() -> Iterator[str]
An iterator over field names in the same order as the header.
Source code in fgpyo/util/metric.py
    @classmethod
    def keys(cls) -> Iterator[str]:
        """An iterator over field names in the same order as the header."""
        for field in inspect.get_fields(cls):  # type: ignore[arg-type]
            yield field.name
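keys(), values(), and items() all walk the class's declared fields, so their output follows declaration order and therefore matches the header. The sketch below shows the same pattern with the standard library's dataclasses.fields in place of fgpyo's inspect.get_fields; treating the two as equivalent for plain dataclasses is an assumption, and the Person class is only an illustration.

```python
import dataclasses
from typing import Any, Iterator, Tuple


@dataclasses.dataclass(frozen=True)
class Person:
    name: str
    age: int


def keys(cls: type) -> Iterator[str]:
    """Yield field names in declaration order."""
    for field in dataclasses.fields(cls):
        yield field.name


def items(obj: Any) -> Iterator[Tuple[str, Any]]:
    """Yield (field name, value) pairs in declaration order."""
    for field in dataclasses.fields(obj):
        yield field.name, getattr(obj, field.name)


person_keys = list(keys(Person))
person_items = list(items(Person(name="Alice", age=47)))
```

Both results are materialized with list() here because the functions are generators and can only be consumed once.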
parse
classmethod
parse(fields: List[str]) -> Any
Parses the string-representation of this metric. One string per attribute should be
given.
Source code in fgpyo/util/metric.py
    @classmethod
    def parse(cls, fields: List[str]) -> Any:
        """Parses the string-representation of this metric. One string per attribute should be
        given.
        """
        parsers = cls._parsers()
        header = cls.header()
        assert len(fields) == len(header)
        return inspect.attr_from(cls=cls, kwargs=dict(zip(header, fields)), parsers=parsers)
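parse() pairs each string with a header name and hands the result to inspect.attr_from together with any custom parsers. A simplified standalone sketch of that idea follows. It assumes every field type can be built by calling the annotated type on a string unless a custom parser is registered for it; the library's own conversion may handle more cases, and the function and class here are illustrative only.

```python
import dataclasses
from typing import Any, Callable, Dict, List


@dataclasses.dataclass(frozen=True)
class Person:
    name: str
    age: int


def parse(cls: type, fields: List[str], parsers: Dict[type, Callable[[str], Any]]) -> Any:
    """Build an instance of `cls` from one string per field."""
    class_fields = dataclasses.fields(cls)
    if len(fields) != len(class_fields):
        raise ValueError(f"expected {len(class_fields)} fields, got {len(fields)}")
    kwargs = {}
    for field, raw in zip(class_fields, fields):
        # Prefer a custom parser; otherwise call the annotated type on the string.
        convert = parsers.get(field.type, field.type)
        kwargs[field.name] = convert(raw)
    return cls(**kwargs)


alice = parse(Person, ["Alice", "47"], parsers={})
# A custom parser keyed by type, here reading ages written in hexadecimal.
bob = parse(Person, ["Bob", "1f"], parsers={int: lambda raw: int(raw, 16)})
```

Keying parsers by type rather than by field name means a single entry covers every field of that type, which matches the Dict[type, Callable[[str], Any]] signature of _parsers().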
read
classmethod
read(path: Path, ignore_extra_fields: bool = True, strip_whitespace: bool = False, threads: Optional[int] = None) -> Iterator[Any]
Reads in zero or more metrics from the given path.
The metric file must contain a matching header.
Columns that are not present in the file but are optional in the metric class will
be default values.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | Path | the path to the metrics file. | required |
| ignore_extra_fields | bool | True to ignore any extra columns, False to raise an exception. | True |
| strip_whitespace | bool | True to strip leading and trailing whitespace from each field, False to keep as-is. | False |
| threads | Optional[int] | the number of threads to use when decompressing gzip files | None |
Source code in fgpyo/util/metric.py
    @classmethod
    def read(
        cls,
        path: Path,
        ignore_extra_fields: bool = True,
        strip_whitespace: bool = False,
        threads: Optional[int] = None,
    ) -> Iterator[Any]:
        """Reads in zero or more metrics from the given path.

        The metric file must contain a matching header.

        Columns that are not present in the file but are optional in the metric class will
        be default values.

        Args:
            path: the path to the metrics file.
            ignore_extra_fields: True to ignore any extra columns, False to raise an exception.
            strip_whitespace: True to strip leading and trailing whitespace from each field,
                False to keep as-is.
            threads: the number of threads to use when decompressing gzip files
        """
        parsers = cls._parsers()
        with io.to_reader(path, threads=threads) as reader:
            header: List[str] = reader.readline().rstrip("\r\n").split("\t")
            # check the header
            class_fields = set(cls.header())
            file_fields = set(header)
            missing_from_class = file_fields.difference(class_fields)
            missing_from_file = class_fields.difference(file_fields)

            field_name_to_attribute = inspect.get_fields_dict(cls)  # type: ignore[arg-type]

            # ignore class fields that are missing from the file (via header) if they're optional
            # or have a default
            if len(missing_from_file) > 0:
                fields_with_defaults = [
                    field
                    for field in missing_from_file
                    if inspect._attribute_has_default(field_name_to_attribute[field])
                ]
                # remove optional class fields from the fields
                missing_from_file = missing_from_file.difference(fields_with_defaults)

            # raise an exception if there are non-optional class fields missing from the file
            if len(missing_from_file) > 0:
                raise ValueError(
                    f"In file: {path}, fields in class '{cls.__name__}' missing from file: "
                    + ", ".join(missing_from_file)
                )

            # raise an exception if there are fields in the file not in the class, unless they
            # should be ignored.
            if not ignore_extra_fields and len(missing_from_class) > 0:
                raise ValueError(
                    f"In file: {path}, extra fields in file missing from class '{cls.__name__}': "
                    + ", ".join(missing_from_class)
                )

            # read the metric lines
            for lineno, line in enumerate(reader, 2):
                # parse the raw values
                values: List[str] = line.rstrip("\r\n").split("\t")
                if strip_whitespace:
                    values = [v.strip() for v in values]

                # raise an exception if there aren't the same number of values as the header
                if len(header) != len(values):
                    raise ValueError(
                        f"In file: {path}, expected {len(header)} columns, got {len(values)} on "
                        f"line {lineno}: {line}"
                    )

                # build the metric
                instance: Metric[MetricType] = inspect.attr_from(
                    cls=cls, kwargs=dict(zip(header, values)), parsers=parsers
                )
                yield instance
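The header check in read() is a comparison of two sets: fields the class declares and columns the file provides. The standalone sketch below reproduces that reconciliation for a plain dataclass, using dataclasses.MISSING to detect defaults in place of fgpyo's internal helper. The function names, error messages, and Person class are hypothetical.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass(frozen=True)
class Person:
    name: str
    age: int
    address: Optional[str] = None


def check_header(cls: type, header: List[str], ignore_extra_fields: bool = True) -> None:
    """Raise ValueError if the file header cannot be reconciled with the class fields."""
    class_fields = {f.name: f for f in dataclasses.fields(cls)}
    extra_in_file = set(header) - set(class_fields)
    # A class field may be absent from the file only if it has a default.
    required_missing = {
        name
        for name, f in class_fields.items()
        if name not in header
        and f.default is dataclasses.MISSING
        and f.default_factory is dataclasses.MISSING
    }
    if required_missing:
        raise ValueError(
            "required fields missing from file: " + ", ".join(sorted(required_missing))
        )
    if extra_in_file and not ignore_extra_fields:
        raise ValueError("extra fields in file: " + ", ".join(sorted(extra_in_file)))


def header_error(header: List[str], ignore_extra_fields: bool = True) -> Optional[str]:
    """Return the error message for a header, or None if it is accepted."""
    try:
        check_header(Person, header, ignore_extra_fields)
    except ValueError as error:
        return str(error)
    return None


accepted = header_error(["name", "age"])  # address has a default, so it may be absent
missing_age = header_error(["name"])
extra_strict = header_error(["name", "age", "height"], ignore_extra_fields=False)
```

Sorting the names before joining keeps the messages deterministic, since set iteration order is not guaranteed.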
to_list
classmethod
to_list(value: str) -> List[Any]
Returns a list value split on comma delimiter.
Source code in fgpyo/util/metric.py
    @classmethod
    def to_list(cls, value: str) -> List[Any]:
        """Returns a list value split on comma delimiter."""
        return [] if value == "" else value.split(",")
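The special case for the empty string matters because str.split never returns an empty list. The short sketch below copies the one-line body shown above into a free function to show the difference, and shows that the elements come back as strings that the caller still has to convert.

```python
from typing import List


def to_list(value: str) -> List[str]:
    """Split a comma-delimited string, mapping the empty string to an empty list."""
    return [] if value == "" else value.split(",")


naive_empty = "".split(",")  # a list holding one empty string
safe_empty = to_list("")  # an empty list
ages = [int(v) for v in to_list("47,24")]  # elements are strings until converted
```

This makes to_list() a natural building block for a custom parser in _parsers(), since format_value() writes an empty list as the empty string.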
values
values() -> Iterator[Any]
An iterator over attribute values in the same order as the header.
Source code in fgpyo/util/metric.py
    def values(self) -> Iterator[Any]:
        """An iterator over attribute values in the same order as the header."""
        for field in inspect.get_fields(self.__class__):  # type: ignore[arg-type]
            yield getattr(self, field.name)
write
classmethod
write(path: Path, *values: MetricType, threads: Optional[int] = None) -> None
Writes zero or more metrics to the given path.
The header will always be written.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | Path | Path to the output file. | required |
| values | MetricType | Zero or more metrics. | () |
| threads | Optional[int] | the number of threads to use when compressing gzip files | None |
Source code in fgpyo/util/metric.py
    @classmethod
    def write(cls, path: Path, *values: MetricType, threads: Optional[int] = None) -> None:
        """Writes zero or more metrics to the given path.

        The header will always be written.

        Args:
            path: Path to the output file.
            values: Zero or more metrics.
            threads: the number of threads to use when compressing gzip files
        """
        with MetricWriter[MetricType](path, metric_class=cls, threads=threads) as writer:
            writer.writeall(values)
MetricFileHeader
Header of a file.
A file's header contains an optional preamble, consisting of lines prefixed by a comment
character and/or empty lines, and a required row of fieldnames before the data rows begin.
Attributes:
| Name | Type | Description |
| --- | --- | --- |
| preamble | List[str] | A list of any lines preceding the fieldnames. |
| fieldnames | List[str] | The field names specified in the final line of the header. |
Source code in fgpyo/util/metric.py
@dataclass(frozen=True)
class MetricFileHeader:
    """
    Header of a file.

    A file's header contains an optional preamble, consisting of lines prefixed by a comment
    character and/or empty lines, and a required row of fieldnames before the data rows begin.

    Attributes:
        preamble: A list of any lines preceding the fieldnames.
        fieldnames: The field names specified in the final line of the header.
    """

    preamble: List[str]
    fieldnames: List[str]
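As an illustration of how a preamble and a fieldnames row are separated, the sketch below reads a header from an in-memory file. It follows the logic of the _read_header listing earlier on this page, but it is a standalone version: the FileHeader class, the read_header function, and the sample text are hypothetical stand-ins.

```python
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass(frozen=True)
class FileHeader:
    preamble: List[str]
    fieldnames: List[str]


def read_header(reader: TextIO, delimiter: str = "\t", comment_prefix: str = "#") -> FileHeader:
    """Collect comment and blank lines, then split the first content line into fieldnames."""
    preamble: List[str] = []
    fieldnames: List[str] = []  # stays empty if no content line is found
    for line in reader:
        stripped = line.strip()
        if stripped == "" or stripped.startswith(comment_prefix):
            preamble.append(stripped)
        else:
            fieldnames = stripped.split(delimiter)
            break
    return FileHeader(preamble=preamble, fieldnames=fieldnames)


handle = io.StringIO("# produced by a hypothetical tool\n\nname\tage\nAlice\t47\n")
header = read_header(handle)
remaining = handle.read()  # the reader is left positioned at the first data row
```

Stopping at the first content line leaves the handle ready for the data rows, so the same open file can be passed straight on to a row reader.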
MetricWriter
Bases: Generic[MetricType], AbstractContextManager
Source code in fgpyo/util/metric.py
class MetricWriter(Generic[MetricType], AbstractContextManager):
    _metric_class: Type[Metric]
    _fieldnames: List[str]
    _fout: TextIOWrapper
    _writer: DictWriter

    def __init__(
        self,
        filename: Union[Path, str],
        metric_class: Type[Metric],
        append: bool = False,
        delimiter: str = "\t",
        include_fields: Optional[List[str]] = None,
        exclude_fields: Optional[List[str]] = None,
        lineterminator: str = "\n",
        threads: Optional[int] = None,
    ) -> None:
        """
        Args:
            filename: Path to the file to write.
            metric_class: Metric class.
            append: If `True`, the file will be appended to. Otherwise, the specified file will be
                overwritten.
            delimiter: The output file delimiter.
            include_fields: If specified, only the listed fieldnames will be included when writing
                records to file. Fields will be written in the order provided.
                May not be used together with `exclude_fields`.
            exclude_fields: If specified, any listed fieldnames will be excluded when writing
                records to file.
                May not be used together with `include_fields`.
            lineterminator: The string used to terminate lines produced by the MetricWriter.
                Default = "\n".
            threads: the number of threads to use when compressing gzip files

        Raises:
            TypeError: If the provided metric class is not a dataclass- or attr-decorated
                subclass of `Metric`.
            AssertionError: If the provided filepath is not writable.
            AssertionError: If `append=True` and the provided file is not readable. (When appending,
                we check to ensure that the header matches the specified metric class. The file must
                be readable to get the header.)
            ValueError: If `append=True` and the provided file is a FIFO (named pipe).
            ValueError: If `append=True` and the provided file does not include a header.
            ValueError: If `append=True` and the header of the provided file does not match the
                specified metric class and the specified include/exclude fields.
        """

        filepath: Path = Path(filename)
        if (filepath.is_fifo() or filepath.is_char_device()) and append:
            raise ValueError("Cannot append to stdout, stderr, or other named pipe or stream")

        ordered_fieldnames: List[str] = _validate_and_generate_final_output_fieldnames(
            metric_class=metric_class,
            include_fields=include_fields,
            exclude_fields=exclude_fields,
        )

        _assert_is_metric_class(metric_class)
        io.assert_path_is_writable(filepath)
        if append:
            io.assert_path_is_readable(filepath)
            _assert_file_header_matches_metric(
                path=filepath,
                metric_class=metric_class,
                ordered_fieldnames=ordered_fieldnames,
                delimiter=delimiter,
            )

        self._metric_class = metric_class
        self._fieldnames = ordered_fieldnames
        self._fout = io.to_writer(filepath, append=append, threads=threads)
        self._writer = DictWriter(
            f=self._fout,
            fieldnames=self._fieldnames,
            delimiter=delimiter,
            lineterminator=lineterminator,
        )

        # If we aren't appending to an existing file, write the header before any rows
        if not append:
            self._writer.writeheader()

    def __enter__(self) -> "MetricWriter":
        return self

    def __exit__(
        self,
        exc_type: Type[BaseException],
        exc_value: BaseException,
        traceback: TracebackType,
    ) -> None:
        self.close()
        super().__exit__(exc_type, exc_value, traceback)

    def close(self) -> None:
        """Close the underlying file handle."""
        self._fout.close()

    def write(self, metric: MetricType) -> None:
        """
        Write a single Metric instance to file.

        The Metric is converted to a dictionary and then written using the underlying
        `csv.DictWriter`. If the `MetricWriter` was created using the `include_fields` or
        `exclude_fields` arguments, the fields of the Metric are subset and/or reordered
        accordingly before writing.

        Args:
            metric: An instance of the specified Metric.

        Raises:
            TypeError: If the provided `metric` is not an instance of the Metric class used to
                parametrize the writer.
        """

        # Serialize the Metric to a dict for writing by the underlying `DictWriter`
        row = {fieldname: val for fieldname, val in metric.formatted_items()}

        # Filter and/or re-order output fields if necessary
        row = {fieldname: row[fieldname] for fieldname in self._fieldnames}

        self._writer.writerow(row)

    def writeall(self, metrics: Iterable[MetricType]) -> None:
        """
        Write multiple Metric instances to file.

        Each Metric is converted to a dictionary and then written using the underlying
        `csv.DictWriter`. If the `MetricWriter` was created using the `include_fields` or
        `exclude_fields` arguments, the attributes of each Metric are subset and/or reordered
        accordingly before writing.

        Args:
            metrics: A sequence of instances of the specified Metric.
        """
        for metric in metrics:
            self.write(metric)
Functions
__init__
__init__(filename: Union[Path, str], metric_class: Type[Metric], append: bool = False, delimiter: str = '\t', include_fields: Optional[List[str]] = None, exclude_fields: Optional[List[str]] = None, lineterminator: str = '\n', threads: Optional[int] = None) -> None
Args:
    filename: Path to the file to write.
    metric_class: Metric class.
    append: If `True`, the file will be appended to. Otherwise, the specified file will be
        overwritten.
    delimiter: The output file delimiter.
    include_fields: If specified, only the listed fieldnames will be included when writing
        records to file. Fields will be written in the order provided.
        May not be used together with `exclude_fields`.
    exclude_fields: If specified, any listed fieldnames will be excluded when writing
        records to file.
        May not be used together with `include_fields`.
    lineterminator: The string used to terminate lines produced by the MetricWriter.
        Default = "\n".
    threads: the number of threads to use when compressing gzip files
Raises:
    TypeError: If the provided metric class is not a dataclass- or attr-decorated
        subclass of `Metric`.
    AssertionError: If the provided filepath is not writable.
    AssertionError: If `append=True` and the provided file is not readable. (When appending,
        we check to ensure that the header matches the specified metric class. The file must
        be readable to get the header.)
    ValueError: If `append=True` and the provided file is a FIFO (named pipe).
    ValueError: If `append=True` and the provided file does not include a header.
    ValueError: If `append=True` and the header of the provided file does not match the
        specified metric class and the specified include/exclude fields.
Source code in fgpyo/util/metric.py
    def __init__(
        self,
        filename: Union[Path, str],
        metric_class: Type[Metric],
        append: bool = False,
        delimiter: str = "\t",
        include_fields: Optional[List[str]] = None,
        exclude_fields: Optional[List[str]] = None,
        lineterminator: str = "\n",
        threads: Optional[int] = None,
    ) -> None:
        """
        Args:
            filename: Path to the file to write.
            metric_class: Metric class.
            append: If `True`, the file will be appended to. Otherwise, the specified file will be
                overwritten.
            delimiter: The output file delimiter.
            include_fields: If specified, only the listed fieldnames will be included when writing
                records to file. Fields will be written in the order provided.
                May not be used together with `exclude_fields`.
            exclude_fields: If specified, any listed fieldnames will be excluded when writing
                records to file.
                May not be used together with `include_fields`.
            lineterminator: The string used to terminate lines produced by the MetricWriter.
                Default = "\n".
            threads: the number of threads to use when compressing gzip files

        Raises:
            TypeError: If the provided metric class is not a dataclass- or attr-decorated
                subclass of `Metric`.
            AssertionError: If the provided filepath is not writable.
            AssertionError: If `append=True` and the provided file is not readable. (When appending,
                we check to ensure that the header matches the specified metric class. The file must
                be readable to get the header.)
            ValueError: If `append=True` and the provided file is a FIFO (named pipe).
            ValueError: If `append=True` and the provided file does not include a header.
            ValueError: If `append=True` and the header of the provided file does not match the
                specified metric class and the specified include/exclude fields.
        """

        filepath: Path = Path(filename)
        if (filepath.is_fifo() or filepath.is_char_device()) and append:
            raise ValueError("Cannot append to stdout, stderr, or other named pipe or stream")

        ordered_fieldnames: List[str] = _validate_and_generate_final_output_fieldnames(
            metric_class=metric_class,
            include_fields=include_fields,
            exclude_fields=exclude_fields,
        )

        _assert_is_metric_class(metric_class)
        io.assert_path_is_writable(filepath)
        if append:
            io.assert_path_is_readable(filepath)
            _assert_file_header_matches_metric(
                path=filepath,
                metric_class=metric_class,
                ordered_fieldnames=ordered_fieldnames,
                delimiter=delimiter,
            )

        self._metric_class = metric_class
        self._fieldnames = ordered_fieldnames
        self._fout = io.to_writer(filepath, append=append, threads=threads)
        self._writer = DictWriter(
            f=self._fout,
            fieldnames=self._fieldnames,
            delimiter=delimiter,
            lineterminator=lineterminator,
        )

        # If we aren't appending to an existing file, write the header before any rows
        if not append:
            self._writer.writeheader()
close
Close the underlying file handle.
Source code in fgpyo/util/metric.py
    def close(self) -> None:
        """Close the underlying file handle."""
        self._fout.close()
write
write(metric: MetricType) -> None
Write a single Metric instance to file.
The Metric is converted to a dictionary and then written using the underlying
csv.DictWriter. If the MetricWriter was created using the include_fields or
exclude_fields arguments, the fields of the Metric are subset and/or reordered
accordingly before writing.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| metric | MetricType | An instance of the specified Metric. | required |
Raises:
| Type | Description |
| --- | --- |
| TypeError | If the provided metric is not an instance of the Metric class used to parametrize the writer. |
Source code in fgpyo/util/metric.py
    def write(self, metric: MetricType) -> None:
        """
        Write a single Metric instance to file.

        The Metric is converted to a dictionary and then written using the underlying
        `csv.DictWriter`. If the `MetricWriter` was created using the `include_fields` or
        `exclude_fields` arguments, the fields of the Metric are subset and/or reordered
        accordingly before writing.

        Args:
            metric: An instance of the specified Metric.

        Raises:
            TypeError: If the provided `metric` is not an instance of the Metric class used to
                parametrize the writer.
        """

        # Serialize the Metric to a dict for writing by the underlying `DictWriter`
        row = {fieldname: val for fieldname, val in metric.formatted_items()}

        # Filter and/or re-order output fields if necessary
        row = {fieldname: row[fieldname] for fieldname in self._fieldnames}

        self._writer.writerow(row)
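The subset-and-reorder step can be tried in isolation with the standard library's csv.DictWriter. The sketch below is a minimal illustration that assumes a row already formatted to strings and a field list like the one include_fields=["age", "name"] would produce; the row contents are hypothetical.

```python
import csv
import io
from typing import Dict, List

# A fully formatted row, as formatted_items() would provide it.
row: Dict[str, str] = {"name": "Alice", "age": "47", "address": ""}
# The final output fields, in the order they should be written.
fieldnames: List[str] = ["age", "name"]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
writer.writeheader()
# Keep only the requested fields, in the requested order.
writer.writerow({name: row[name] for name in fieldnames})
written = buffer.getvalue()

# Passing the unfiltered row fails: DictWriter rejects keys outside fieldnames by default.
try:
    writer.writerow(row)
    rejected = False
except ValueError:
    rejected = True
```

The filtering dictionary comprehension is therefore not just cosmetic: with the default extrasaction setting, csv.DictWriter raises ValueError for any key that is not in fieldnames.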
writeall
writeall(metrics: Iterable[MetricType]) -> None
Write multiple Metric instances to file.
Each Metric is converted to a dictionary and then written using the underlying
csv.DictWriter. If the MetricWriter was created using the include_fields or
exclude_fields arguments, the attributes of each Metric are subset and/or reordered
accordingly before writing.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| metrics | Iterable[MetricType] | A sequence of instances of the specified Metric. | required |
Source code in fgpyo/util/metric.py
    def writeall(self, metrics: Iterable[MetricType]) -> None:
        """
        Write multiple Metric instances to file.

        Each Metric is converted to a dictionary and then written using the underlying
        `csv.DictWriter`. If the `MetricWriter` was created using the `include_fields` or
        `exclude_fields` arguments, the attributes of each Metric are subset and/or reordered
        accordingly before writing.

        Args:
            metrics: A sequence of instances of the specified Metric.
        """
        for metric in metrics:
            self.write(metric)
Modules