Metrics¶

Module for storing, reading, and writing metric-like tab-delimited information.

Metric files are tab-delimited and contain a header followed by zero or more rows of metric values. This makes it easy for them to be read in languages like R. For example, a file might contain one row per person, with columns for age, gender, and address.
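As a rough illustration of the file layout described above, the sketch below reads a small tab-delimited table using only Python's standard `csv` module. The column names and values are made up for the example, and nothing here uses the `Metric` API itself.

```python
import csv
import io

# Hypothetical metric file contents: one header line, then one row per person.
text = "name\tage\taddress\nAlice\t47\t1 Main St\nBob\t24\t2 Oak Ave\n"

# Any tab-aware reader can consume the file; here the stdlib csv module is used.
rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))

# Values come back as plain strings, e.g. the age is "47" rather than 47.
print(rows[0]["name"], rows[0]["age"])
```

A generic reader like this returns every value as a string; preserving the type of each value is the part that the `Metric` class adds.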

The Metric() class makes it easy to read, write, and store one or more metrics of the same type, all the while preserving types for each value in a metric. It is an abstract base class; subclasses are decorated with @dataclass or @attr.s and declare attributes storing one or more typed values. If using multiple layers of inheritance, keep in mind that it's not possible to mix these dataclass utils, e.g. a dataclasses class derived from an attr class will not appropriately initialize the values of the attr superclass.

Examples¶

Defining a new metric class:

>>> from fgpyo.util.metric import Metric
>>> import dataclasses
>>> @dataclasses.dataclass(frozen=True)
... class Person(Metric["Person"]):
...     name: str
...     age: int

or using attr:

>>> from fgpyo.util.metric import Metric
>>> import attr
>>> from typing import Optional
>>> @attr.s(auto_attribs=True, frozen=True)
... class PersonAttr(Metric["PersonAttr"]):
...     name: str
...     age: int
...     address: Optional[str] = None

Getting the attributes for a metric class. These will be used for the header when reading and writing metric files.

>>> Person.header()
['name', 'age']

Getting the values from a metric class instance. The values are in the same order as the header.

>>> list(Person(name="Alice", age=47).values())
['Alice', 47]

Writing a list of metrics to a file:

>>> metrics = [
...     Person(name="Alice", age=47),
...     Person(name="Bob", age=24)
... ]
>>> from pathlib import Path
>>> Person.write(Path("/path/to/metrics.txt"), *metrics)

Then the contents of the written metrics file:

$ column -t /path/to/metrics.txt
name   age
Alice  47
Bob    24

Reading the metrics file back in:

>>> list(Person.read(Path("/path/to/metrics.txt")))  
[Person(name='Alice', age=47), Person(name='Bob', age=24)]

Formatting and parsing the values for custom types is supported by overriding the _parsers() and format_value() methods.

>>> @dataclasses.dataclass(frozen=True)
... class Name:
...     first: str
...     last: str
...     @classmethod
...     def parse(cls, value: str) -> "Name":
...          fields = value.split(" ")
...          return Name(first=fields[0], last=fields[1])
>>> from typing import Dict, Callable, Any
>>> @dataclasses.dataclass(frozen=True)
... class PersonWithName(Metric["PersonWithName"]):
...     name: Name
...     age: int
...     @classmethod
...     def _parsers(cls) -> Dict[type, Callable[[str], Any]]:
...         return {Name: lambda value: Name.parse(value=value)}
...     @classmethod
...     def format_value(cls, value: Any) -> str:
...         if isinstance(value, Name):
...             return f"{value.first} {value.last}"
...         else:
...             return super().format_value(value=value)
>>> PersonWithName.parse(fields=["john doe", "42"])
PersonWithName(name=Name(first='john', last='doe'), age=42)
>>> PersonWithName(name=Name(first='john', last='doe'), age=42).formatted_values()
['john doe', '42']

Classes¶

Metric ¶

Bases: ABC, Generic[MetricType]

Abstract base class for all metric-like tab-delimited files

Metric files are tab-delimited, contain a header, and zero or more rows for metric values. This makes it easy for them to be read in languages like R.

Subclasses of Metric() can support parsing and formatting custom types with _parsers() and format_value().

Source code in fgpyo/util/metric.py

class Metric(ABC, Generic[MetricType]):
    """Abstract base class for all metric-like tab-delimited files

    Metric files are tab-delimited, contain a header, and zero or more rows for metric values.  This
    makes it easy for them to be read in languages like `R`.

    Subclasses of [`Metric()`][fgpyo.util.metric.Metric] can support parsing and
    formatting custom types with `_parsers()` and
    [`format_value()`][fgpyo.util.metric.Metric.format_value].
    """

    @classmethod
    def keys(cls) -> Iterator[str]:
        """An iterator over field names in the same order as the header."""
        for field in inspect.get_fields(cls):  # type: ignore[arg-type]
            yield field.name

    def values(self) -> Iterator[Any]:
        """An iterator over attribute values in the same order as the header."""
        for field in inspect.get_fields(self.__class__):  # type: ignore[arg-type]
            yield getattr(self, field.name)

    def items(self) -> Iterator[Tuple[str, Any]]:
        """
        An iterator over field names and their corresponding values in the same order as the header.
        """
        for field in inspect.get_fields(self.__class__):  # type: ignore[arg-type]
            yield (field.name, getattr(self, field.name))

    def formatted_values(self) -> List[str]:
        """A list of formatted attribute values in the same order as the header."""
        return [self.format_value(value) for value in self.values()]

    def formatted_items(self) -> List[Tuple[str, str]]:
        """A list of (field name, formatted value) pairs in the same order as the header."""
        return [(key, self.format_value(value)) for key, value in self.items()]

    @classmethod
    def _parsers(cls) -> Dict[type, Callable[[str], Any]]:
        """Mapping of type to a specific parser for that type.  The parser must accept a string
        as a single parameter and return a single value of the given type.  Sub-classes may
        override this method to support custom types."""
        return {}

    @classmethod
    def read(
        cls,
        path: Path,
        ignore_extra_fields: bool = True,
        strip_whitespace: bool = False,
        threads: Optional[int] = None,
    ) -> Iterator[Any]:
        """Reads in zero or more metrics from the given path.

        The metric file must contain a matching header.

        Columns that are not present in the file but are optional in the metric class will
        be set to their default values.

        Args:
            path: the path to the metrics file.
            ignore_extra_fields: True to ignore any extra columns, False to raise an exception.
            strip_whitespace: True to strip leading and trailing whitespace from each field,
                               False to keep as-is.
            threads: the number of threads to use when decompressing gzip files
        """
        parsers = cls._parsers()
        with io.to_reader(path, threads=threads) as reader:
            header: List[str] = reader.readline().rstrip("\r\n").split("\t")
            # check the header
            class_fields = set(cls.header())
            file_fields = set(header)
            missing_from_class = file_fields.difference(class_fields)
            missing_from_file = class_fields.difference(file_fields)

            field_name_to_attribute = inspect.get_fields_dict(cls)  # type: ignore[arg-type]

            # ignore class fields that are missing from the file (via header) if they're optional
            # or have a default
            if len(missing_from_file) > 0:
                fields_with_defaults = [
                    field
                    for field in missing_from_file
                    if inspect._attribute_has_default(field_name_to_attribute[field])
                ]
                # remove optional class fields from the fields
                missing_from_file = missing_from_file.difference(fields_with_defaults)

            # raise an exception if there are non-optional class fields missing from the file
            if len(missing_from_file) > 0:
                raise ValueError(
                    f"In file: {path}, fields in class '{cls.__name__}' missing from file: "
                    + ", ".join(missing_from_file)
                )

            # raise an exception if there are fields in the file not in the class, unless they
            # should be ignored.
            if not ignore_extra_fields and len(missing_from_class) > 0:
                raise ValueError(
                    f"In file: {path}, extra fields in file missing from class '{cls.__name__}': "
                    + ", ".join(missing_from_class)
                )

            # read the metric lines
            for lineno, line in enumerate(reader, 2):
                # parse the raw values
                values: List[str] = line.rstrip("\r\n").split("\t")
                if strip_whitespace:
                    values = [v.strip() for v in values]

                # raise an exception if there aren't the same number of values as the header
                if len(header) != len(values):
                    raise ValueError(
                        f"In file: {path}, expected {len(header)} columns, got {len(values)} on "
                        f"line {lineno}: {line}"
                    )

                # build the metric
                instance: Metric[MetricType] = inspect.attr_from(
                    cls=cls, kwargs=dict(zip(header, values)), parsers=parsers
                )
                yield instance

    @classmethod
    def parse(cls, fields: List[str]) -> Any:
        """Parses the string-representation of this metric.  One string per attribute should be
        given.

        """
        parsers = cls._parsers()
        header = cls.header()
        assert len(fields) == len(header)
        return inspect.attr_from(cls=cls, kwargs=dict(zip(header, fields)), parsers=parsers)

    @classmethod
    def write(cls, path: Path, *values: MetricType, threads: Optional[int] = None) -> None:
        """Writes zero or more metrics to the given path.

        The header will always be written.

        Args:
            path: Path to the output file.
            values: Zero or more metrics.
            threads: the number of threads to use when compressing gzip files

        """
        with MetricWriter[MetricType](path, metric_class=cls, threads=threads) as writer:
            writer.writeall(values)

    @classmethod
    def header(cls) -> List[str]:
        """The list of header values for the metric."""
        return [a.name for a in inspect.get_fields(cls)]  # type: ignore[arg-type]

    @classmethod
    def format_value(cls, value: Any) -> str:  # noqa: C901
        """The default method to format values of a given type.

        By default, this method will comma-delimit `list`, `tuple`, and `set` types, and apply
        `str` to all others.

        Dictionaries / mappings will have keys and vals separated by semicolons, and key val pairs
        delimited by commas.

        In addition, lists will be flanked with '[]', tuples with '()' and sets and dictionaries
        with '{}'

        Args:
            value: the value to format.
        """
        if issubclass(type(value), Enum):
            return cls.format_value(value.value)
        if isinstance(value, (tuple)):
            if len(value) == 0:
                return "()"
            else:
                return "(" + ",".join(cls.format_value(v) for v in value) + ")"
        if isinstance(value, (list)):
            if len(value) == 0:
                return ""
            else:
                return ",".join(cls.format_value(v) for v in value)
        if isinstance(value, (set)):
            if len(value) == 0:
                return ""
            else:
                return "{" + ",".join(cls.format_value(v) for v in value) + "}"

        elif isinstance(value, dict):
            if len(value) == 0:
                return "{}"
            else:
                return (
                    "{"
                    + ",".join(
                        f"{cls.format_value(k)};{cls.format_value(v)}" for k, v in value.items()
                    )
                    + "}"
                )
        elif isinstance(value, float):
            return f"{round(value, 5)}"
        elif value is None:
            return ""
        else:
            return f"{value}"

    @classmethod
    def to_list(cls, value: str) -> List[Any]:
        """Returns a list value split on comma delimiter."""
        return [] if value == "" else value.split(",")

    @staticmethod
    def fast_concat(*inputs: Path, output: Path) -> None:
        if len(inputs) == 0:
            raise ValueError("No inputs provided")

        headers = [next(io.read_lines(input_path)) for input_path in inputs]
        assert len(set(headers)) == 1, "Input headers do not match"
        io.write_lines(path=output, lines_to_write=set(headers))

        for input_path in inputs:
            io.write_lines(
                path=output, lines_to_write=list(io.read_lines(input_path))[1:], append=True
            )

    @staticmethod
    def _read_header(
        reader: TextIOWrapper,
        delimiter: str = "\t",
        comment_prefix: str = "#",
    ) -> MetricFileHeader:
        """
        Read the header from an open file.

        The first row after any commented or empty lines will be used as the fieldnames.

        Lines preceding the fieldnames will be returned in the `preamble`. Leading and trailing
        whitespace are removed and ignored.

        Args:
            reader: An open, readable file handle.
            delimiter: The delimiter character used to separate fields in the file.
            comment_prefix: The prefix for comment lines in the file.

        Returns:
            A `MetricFileHeader` containing the field names and any preceding lines. If the file
            was empty or contained only comments or empty lines, `fieldnames` is an empty list.
        """

        preamble: List[str] = []

        for line in reader:
            if line.strip().startswith(comment_prefix) or line.strip() == "":
                # Skip any commented or empty lines before the header
                preamble.append(line.strip())
            else:
                # The first line with any other content is assumed to be the header
                fieldnames = line.strip().split(delimiter)
                break
        else:
            # If the file was empty, kick back an empty header
            fieldnames = []

        return MetricFileHeader(preamble=preamble, fieldnames=fieldnames)

Functions¶

format_value `classmethod` ¶

format_value(value: Any) -> str

The default method to format values of a given type.

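By default, this method comma-delimits list, tuple, and set types and applies str to all other values, with a few exceptions noted below.

Dictionaries / mappings have keys and values separated by semicolons, and key-value pairs delimited by commas.

Tuples are flanked with '()', and sets and dictionaries with '{}'. Note that in the source listing below, lists are joined without flanking '[]', and empty lists and empty sets are formatted as the empty string. Enum members are formatted via their value, floats are rounded to five decimal places, and None is formatted as the empty string.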

Parameters:

Name	Type	Description	Default
`value`	`Any`	the value to format.	required
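To make the formatting rules concrete, here is a standalone sketch that restates the branches of the source listing on this page as a plain function. The name `format_value_sketch` and the `Color` enum are hypothetical; the function is not part of fgpyo and only mirrors the listing as shown here.

```python
from enum import Enum
from typing import Any


def format_value_sketch(value: Any) -> str:
    """Standalone restatement of the formatting rules in the source listing."""
    if isinstance(value, Enum):
        # Enum members are formatted via their underlying value.
        return format_value_sketch(value.value)
    if isinstance(value, tuple):
        return "(" + ",".join(format_value_sketch(v) for v in value) + ")"
    if isinstance(value, list):
        # Lists are joined with commas and are not wrapped in brackets.
        return ",".join(format_value_sketch(v) for v in value)
    if isinstance(value, set):
        # An empty set formats as the empty string, as in the listing.
        if len(value) == 0:
            return ""
        return "{" + ",".join(format_value_sketch(v) for v in value) + "}"
    if isinstance(value, dict):
        pairs = (f"{format_value_sketch(k)};{format_value_sketch(v)}" for k, v in value.items())
        return "{" + ",".join(pairs) + "}"
    if isinstance(value, float):
        return f"{round(value, 5)}"
    if value is None:
        return ""
    return f"{value}"


class Color(Enum):  # hypothetical enum, for illustration only
    RED = "red"


print(format_value_sketch((1, 2)))      # (1,2)
print(format_value_sketch([1, 2]))      # 1,2
print(format_value_sketch({"a": 1}))    # {a;1}
print(format_value_sketch(3.14159265))  # 3.14159
print(format_value_sketch(Color.RED))   # red
```

Because the separators are commas and semicolons, values that themselves contain those characters would not survive a round trip through this format without a custom parser.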

Source code in fgpyo/util/metric.py

@classmethod
def format_value(cls, value: Any) -> str:  # noqa: C901
    """The default method to format values of a given type.

    By default, this method will comma-delimit `list`, `tuple`, and `set` types, and apply
    `str` to all others.

    Dictionaries / mappings will have keys and vals separated by semicolons, and key val pairs
    delimited by commas.

    In addition, lists will be flanked with '[]', tuples with '()' and sets and dictionaries
    with '{}'

    Args:
        value: the value to format.
    """
    if issubclass(type(value), Enum):
        return cls.format_value(value.value)
    if isinstance(value, (tuple)):
        if len(value) == 0:
            return "()"
        else:
            return "(" + ",".join(cls.format_value(v) for v in value) + ")"
    if isinstance(value, (list)):
        if len(value) == 0:
            return ""
        else:
            return ",".join(cls.format_value(v) for v in value)
    if isinstance(value, (set)):
        if len(value) == 0:
            return ""
        else:
            return "{" + ",".join(cls.format_value(v) for v in value) + "}"

    elif isinstance(value, dict):
        if len(value) == 0:
            return "{}"
        else:
            return (
                "{"
                + ",".join(
                    f"{cls.format_value(k)};{cls.format_value(v)}" for k, v in value.items()
                )
                + "}"
            )
    elif isinstance(value, float):
        return f"{round(value, 5)}"
    elif value is None:
        return ""
    else:
        return f"{value}"

formatted_items ¶

formatted_items() -> List[Tuple[str, str]]

A list of (field name, formatted value) pairs in the same order as the header.

Source code in fgpyo/util/metric.py

def formatted_items(self) -> List[Tuple[str, str]]:
    """A list of (field name, formatted value) pairs in the same order as the header."""
    return [(key, self.format_value(value)) for key, value in self.items()]

formatted_values ¶

formatted_values() -> List[str]

A list of formatted attribute values in the same order as the header.

Source code in fgpyo/util/metric.py

def formatted_values(self) -> List[str]:
    """A list of formatted attribute values in the same order as the header."""
    return [self.format_value(value) for value in self.values()]

header `classmethod` ¶

header() -> List[str]

The list of header values for the metric.

Source code in fgpyo/util/metric.py

@classmethod
def header(cls) -> List[str]:
    """The list of header values for the metric."""
    return [a.name for a in inspect.get_fields(cls)]  # type: ignore[arg-type]

items ¶

items() -> Iterator[Tuple[str, Any]]

An iterator over field names and their corresponding values in the same order as the header.
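The ordering guarantee can be illustrated with a plain dataclass. The library itself goes through its own `inspect.get_fields` helper, as the listing shows; the sketch below substitutes the standard `dataclasses.fields` as a stand-in, and `PersonSketch` and `items_sketch` are hypothetical names.

```python
import dataclasses
from typing import Any, Iterator, Tuple


@dataclasses.dataclass(frozen=True)
class PersonSketch:
    """Hypothetical stand-in for a metric; not a Metric subclass."""

    name: str
    age: int


def items_sketch(obj: Any) -> Iterator[Tuple[str, Any]]:
    # dataclasses.fields() returns fields in declaration order, which is
    # the same order a header built from the field names would have.
    for field in dataclasses.fields(obj):
        yield (field.name, getattr(obj, field.name))


pairs = list(items_sketch(PersonSketch(name="Alice", age=47)))
print(pairs)  # [('name', 'Alice'), ('age', 47)]
```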

Source code in fgpyo/util/metric.py

def items(self) -> Iterator[Tuple[str, Any]]:
    """
    An iterator over field names and their corresponding values in the same order as the header.
    """
    for field in inspect.get_fields(self.__class__):  # type: ignore[arg-type]
        yield (field.name, getattr(self, field.name))

keys `classmethod` ¶

keys() -> Iterator[str]

An iterator over field names in the same order as the header.

Source code in fgpyo/util/metric.py

@classmethod
def keys(cls) -> Iterator[str]:
    """An iterator over field names in the same order as the header."""
    for field in inspect.get_fields(cls):  # type: ignore[arg-type]
        yield field.name

parse `classmethod` ¶

parse(fields: List[str]) -> Any

Parses the string-representation of this metric. One string per attribute should be given.
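The idea behind parsing can be sketched without the library: pair each string with a field name in header order, then convert it using either a registered parser or the annotated type. The real method delegates to `inspect.attr_from`, whose details are not shown on this page, so `parse_sketch` and `PersonSketch` below are hypothetical simplifications that only handle types callable on a string.

```python
import dataclasses
from typing import Any, Callable, Dict, List


@dataclasses.dataclass(frozen=True)
class PersonSketch:
    """Hypothetical stand-in for a metric; not a Metric subclass."""

    name: str
    age: int


def parse_sketch(cls: type, fields: List[str], parsers: Dict[type, Callable[[str], Any]]) -> Any:
    cls_fields = dataclasses.fields(cls)
    if len(fields) != len(cls_fields):
        raise ValueError(f"expected {len(cls_fields)} values, got {len(fields)}")
    kwargs = {}
    for field, raw in zip(cls_fields, fields):
        # Use a custom parser when one is registered for the field's type;
        # otherwise call the annotated type itself, e.g. int("47").
        convert = parsers.get(field.type, field.type)
        kwargs[field.name] = convert(raw)
    return cls(**kwargs)


person = parse_sketch(PersonSketch, ["Alice", "47"], parsers={})
print(person)  # PersonSketch(name='Alice', age=47)
```

Note that the sketch raises `ValueError` on a length mismatch, whereas the listed implementation uses an `assert`.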

Source code in fgpyo/util/metric.py

@classmethod
def parse(cls, fields: List[str]) -> Any:
    """Parses the string-representation of this metric.  One string per attribute should be
    given.

    """
    parsers = cls._parsers()
    header = cls.header()
    assert len(fields) == len(header)
    return inspect.attr_from(cls=cls, kwargs=dict(zip(header, fields)), parsers=parsers)

read `classmethod` ¶

read(path: Path, ignore_extra_fields: bool = True, strip_whitespace: bool = False, threads: Optional[int] = None) -> Iterator[Any]

Reads in zero or more metrics from the given path.

The metric file must contain a matching header.

Columns that are not present in the file but are optional in the metric class will be set to their default values.

Parameters:

Name	Type	Description	Default
`path`	`Path`	the path to the metrics file.	required
`ignore_extra_fields`	`bool`	True to ignore any extra columns, False to raise an exception.	`True`
`strip_whitespace`	`bool`	True to strip leading and trailing whitespace from each field, False to keep as-is.	`False`
`threads`	`Optional[int]`	the number of threads to use when decompressing gzip files	`None`
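The header check described above amounts to two set differences. The following sketch reproduces that logic for a plain dataclass; `check_header` and `PersonSketch` are hypothetical names, and the default detection uses `dataclasses.MISSING` rather than the library's own helper.

```python
import dataclasses
from typing import List, Set, Tuple


@dataclasses.dataclass(frozen=True)
class PersonSketch:
    """Hypothetical stand-in for a metric; not a Metric subclass."""

    name: str
    age: int
    address: str = "unknown"  # has a default, so it may be absent from the file


def check_header(cls: type, file_header: List[str]) -> Tuple[Set[str], Set[str]]:
    """Returns (required fields missing from the file, file columns unknown to the class)."""
    fields = {f.name: f for f in dataclasses.fields(cls)}
    file_fields = set(file_header)
    missing_from_file = {
        name
        for name, f in fields.items()
        if name not in file_fields
        and f.default is dataclasses.MISSING
        and f.default_factory is dataclasses.MISSING
    }
    extra_in_file = file_fields - set(fields)
    return missing_from_file, extra_in_file


print(check_header(PersonSketch, ["name", "age"]))       # (set(), set())
print(check_header(PersonSketch, ["name", "nickname"]))  # ({'age'}, {'nickname'})
```

In terms of `read()`, a non-empty first set corresponds to the case that always raises, while a non-empty second set only raises when `ignore_extra_fields` is `False`.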

Source code in fgpyo/util/metric.py

@classmethod
def read(
    cls,
    path: Path,
    ignore_extra_fields: bool = True,
    strip_whitespace: bool = False,
    threads: Optional[int] = None,
) -> Iterator[Any]:
    """Reads in zero or more metrics from the given path.

    The metric file must contain a matching header.

    Columns that are not present in the file but are optional in the metric class will
    be set to their default values.

    Args:
        path: the path to the metrics file.
        ignore_extra_fields: True to ignore any extra columns, False to raise an exception.
        strip_whitespace: True to strip leading and trailing whitespace from each field,
                           False to keep as-is.
        threads: the number of threads to use when decompressing gzip files
    """
    parsers = cls._parsers()
    with io.to_reader(path, threads=threads) as reader:
        header: List[str] = reader.readline().rstrip("\r\n").split("\t")
        # check the header
        class_fields = set(cls.header())
        file_fields = set(header)
        missing_from_class = file_fields.difference(class_fields)
        missing_from_file = class_fields.difference(file_fields)

        field_name_to_attribute = inspect.get_fields_dict(cls)  # type: ignore[arg-type]

        # ignore class fields that are missing from the file (via header) if they're optional
        # or have a default
        if len(missing_from_file) > 0:
            fields_with_defaults = [
                field
                for field in missing_from_file
                if inspect._attribute_has_default(field_name_to_attribute[field])
            ]
            # remove optional class fields from the fields
            missing_from_file = missing_from_file.difference(fields_with_defaults)

        # raise an exception if there are non-optional class fields missing from the file
        if len(missing_from_file) > 0:
            raise ValueError(
                f"In file: {path}, fields in class '{cls.__name__}' missing from file: "
                + ", ".join(missing_from_file)
            )

        # raise an exception if there are fields in the file not in the class, unless they
        # should be ignored.
        if not ignore_extra_fields and len(missing_from_class) > 0:
            raise ValueError(
                f"In file: {path}, extra fields in file missing from class '{cls.__name__}': "
                + ", ".join(missing_from_class)
            )

        # read the metric lines
        for lineno, line in enumerate(reader, 2):
            # parse the raw values
            values: List[str] = line.rstrip("\r\n").split("\t")
            if strip_whitespace:
                values = [v.strip() for v in values]

            # raise an exception if there aren't the same number of values as the header
            if len(header) != len(values):
                raise ValueError(
                    f"In file: {path}, expected {len(header)} columns, got {len(values)} on "
                    f"line {lineno}: {line}"
                )

            # build the metric
            instance: Metric[MetricType] = inspect.attr_from(
                cls=cls, kwargs=dict(zip(header, values)), parsers=parsers
            )
            yield instance

to_list `classmethod` ¶

to_list(value: str) -> List[Any]

Returns a list value split on comma delimiter.
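A small standalone restatement of the one-line implementation shows why the empty string is special-cased. The function name `to_list_sketch` is hypothetical.

```python
from typing import List


def to_list_sketch(value: str) -> List[str]:
    # An empty field means an empty list, not a list holding one empty string.
    return [] if value == "" else value.split(",")


print(to_list_sketch("a,b,c"))  # ['a', 'b', 'c']
print(to_list_sketch(""))       # []
print("".split(","))            # [''] -- why the empty-string check is needed
```

The elements are returned as strings; converting them to another element type is left to the caller.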

Source code in fgpyo/util/metric.py

@classmethod
def to_list(cls, value: str) -> List[Any]:
    """Returns a list value split on comma delimiter."""
    return [] if value == "" else value.split(",")

values ¶

values() -> Iterator[Any]

An iterator over attribute values in the same order as the header.

Source code in fgpyo/util/metric.py

def values(self) -> Iterator[Any]:
    """An iterator over attribute values in the same order as the header."""
    for field in inspect.get_fields(self.__class__):  # type: ignore[arg-type]
        yield getattr(self, field.name)

write `classmethod` ¶

write(path: Path, *values: MetricType, threads: Optional[int] = None) -> None

Writes zero or more metrics to the given path.

The header will always be written.

Parameters:

Name	Type	Description	Default
`path`	`Path`	Path to the output file.	required
`values`	`MetricType`	Zero or more metrics.	`()`
`threads`	`Optional[int]`	the number of threads to use when compressing gzip files	`None`
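The statement that the header is always written can be illustrated with the standard `csv.DictWriter`, which the `MetricWriter` listing on this page also builds on. The sketch writes to an in-memory buffer instead of a path and formats values with `str`; `write_sketch` and `PersonSketch` are hypothetical names, not fgpyo API.

```python
import csv
import dataclasses
import io


@dataclasses.dataclass(frozen=True)
class PersonSketch:
    """Hypothetical stand-in for a metric; not a Metric subclass."""

    name: str
    age: int


def write_sketch(out, cls, *values):
    fieldnames = [f.name for f in dataclasses.fields(cls)]
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()  # written even when there are no values
    for value in values:
        writer.writerow({name: str(getattr(value, name)) for name in fieldnames})


buffer = io.StringIO()
write_sketch(buffer, PersonSketch)  # zero metrics
empty_output = buffer.getvalue()
print(repr(empty_output))  # 'name\tage\n'
```

Writing a header-only file for zero metrics keeps the output readable by `read()`, which expects a matching header line.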

Source code in fgpyo/util/metric.py

@classmethod
def write(cls, path: Path, *values: MetricType, threads: Optional[int] = None) -> None:
    """Writes zero or more metrics to the given path.

    The header will always be written.

    Args:
        path: Path to the output file.
        values: Zero or more metrics.
        threads: the number of threads to use when compressing gzip files

    """
    with MetricWriter[MetricType](path, metric_class=cls, threads=threads) as writer:
        writer.writeall(values)

MetricFileHeader `dataclass` ¶

Header of a file.

A file's header contains an optional preamble, consisting of lines prefixed by a comment character and/or empty lines, and a required row of fieldnames before the data rows begin.

Attributes:

Name	Type	Description
`preamble`	`List[str]`	A list of any lines preceding the fieldnames.
`fieldnames`	`List[str]`	The field names specified in the final line of the header.
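The split between preamble and fieldnames can be sketched along the lines of the `_read_header` listing shown earlier on this page. The version below returns a plain tuple instead of a `MetricFileHeader`, and the function name and sample text are hypothetical.

```python
import io
from typing import List, Tuple


def read_header_sketch(
    reader, delimiter: str = "\t", comment_prefix: str = "#"
) -> Tuple[List[str], List[str]]:
    preamble: List[str] = []
    fieldnames: List[str] = []
    for line in reader:
        stripped = line.strip()
        if stripped == "" or stripped.startswith(comment_prefix):
            # Comment and empty lines before the field names form the preamble.
            preamble.append(stripped)
        else:
            # The first line with any other content holds the field names.
            fieldnames = stripped.split(delimiter)
            break
    return preamble, fieldnames


handle = io.StringIO("# produced by a hypothetical tool\n\nname\tage\nAlice\t47\n")
preamble, fieldnames = read_header_sketch(handle)
print(preamble)    # ['# produced by a hypothetical tool', '']
print(fieldnames)  # ['name', 'age']
```

After the call, the handle is positioned at the first data row, so the remaining lines can be read as records.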

Source code in fgpyo/util/metric.py

@dataclass(frozen=True)
class MetricFileHeader:
    """
    Header of a file.

    A file's header contains an optional preamble, consisting of lines prefixed by a comment
    character and/or empty lines, and a required row of fieldnames before the data rows begin.

    Attributes:
        preamble: A list of any lines preceding the fieldnames.
        fieldnames: The field names specified in the final line of the header.
    """

    preamble: List[str]
    fieldnames: List[str]

MetricWriter ¶

Bases: Generic[MetricType], AbstractContextManager

Source code in fgpyo/util/metric.py

class MetricWriter(Generic[MetricType], AbstractContextManager):
    _metric_class: Type[Metric]
    _fieldnames: List[str]
    _fout: TextIOWrapper
    _writer: DictWriter

    def __init__(
        self,
        filename: Union[Path, str],
        metric_class: Type[Metric],
        append: bool = False,
        delimiter: str = "\t",
        include_fields: Optional[List[str]] = None,
        exclude_fields: Optional[List[str]] = None,
        lineterminator: str = "\n",
        threads: Optional[int] = None,
    ) -> None:
        """
        Args:
            filename: Path to the file to write.
            metric_class: Metric class.
            append: If `True`, the file will be appended to. Otherwise, the specified file will be
                overwritten.
            delimiter: The output file delimiter.
            include_fields: If specified, only the listed fieldnames will be included when writing
                records to file. Fields will be written in the order provided.
                May not be used together with `exclude_fields`.
            exclude_fields: If specified, any listed fieldnames will be excluded when writing
                records to file.
                May not be used together with `include_fields`.
            lineterminator: The string used to terminate lines produced by the MetricWriter.
                Default = "\n".
            threads: the number of threads to use when compressing gzip files

        Raises:
            TypeError: If the provided metric class is not a dataclass- or attr-decorated
                subclass of `Metric`.
            AssertionError: If the provided filepath is not writable.
            AssertionError: If `append=True` and the provided file is not readable. (When appending,
                we check to ensure that the header matches the specified metric class. The file must
                be readable to get the header.)
            ValueError: If `append=True` and the provided file is a FIFO (named pipe).
            ValueError: If `append=True` and the provided file does not include a header.
            ValueError: If `append=True` and the header of the provided file does not match the
                specified metric class and the specified include/exclude fields.
        """

        filepath: Path = Path(filename)
        if (filepath.is_fifo() or filepath.is_char_device()) and append:
            raise ValueError("Cannot append to stdout, stderr, or other named pipe or stream")

        ordered_fieldnames: List[str] = _validate_and_generate_final_output_fieldnames(
            metric_class=metric_class,
            include_fields=include_fields,
            exclude_fields=exclude_fields,
        )

        _assert_is_metric_class(metric_class)
        io.assert_path_is_writable(filepath)
        if append:
            io.assert_path_is_readable(filepath)
            _assert_file_header_matches_metric(
                path=filepath,
                metric_class=metric_class,
                ordered_fieldnames=ordered_fieldnames,
                delimiter=delimiter,
            )

        self._metric_class = metric_class
        self._fieldnames = ordered_fieldnames
        self._fout = io.to_writer(filepath, append=append, threads=threads)
        self._writer = DictWriter(
            f=self._fout,
            fieldnames=self._fieldnames,
            delimiter=delimiter,
            lineterminator=lineterminator,
        )

        # If we aren't appending to an existing file, write the header before any rows
        if not append:
            self._writer.writeheader()

    def __enter__(self) -> "MetricWriter":
        return self

    def __exit__(
        self,
        exc_type: Type[BaseException],
        exc_value: BaseException,
        traceback: TracebackType,
    ) -> None:
        self.close()
        super().__exit__(exc_type, exc_value, traceback)

    def close(self) -> None:
        """Close the underlying file handle."""
        self._fout.close()

    def write(self, metric: MetricType) -> None:
        """
        Write a single Metric instance to file.

        The Metric is converted to a dictionary and then written using the underlying
        `csv.DictWriter`. If the `MetricWriter` was created using the `include_fields` or
        `exclude_fields` arguments, the fields of the Metric are subset and/or reordered
        accordingly before writing.

        Args:
            metric: An instance of the specified Metric.

        Raises:
            TypeError: If the provided `metric` is not an instance of the Metric class used to
                parametrize the writer.
        """

        # Serialize the Metric to a dict for writing by the underlying `DictWriter`
        row = {fieldname: val for fieldname, val in metric.formatted_items()}

        # Filter and/or re-order output fields if necessary
        row = {fieldname: row[fieldname] for fieldname in self._fieldnames}

        self._writer.writerow(row)

    def writeall(self, metrics: Iterable[MetricType]) -> None:
        """
        Write multiple Metric instances to file.

        Each Metric is converted to a dictionary and then written using the underlying
        `csv.DictWriter`. If the `MetricWriter` was created using the `include_fields` or
        `exclude_fields` arguments, the attributes of each Metric are subset and/or reordered
        accordingly before writing.

        Args:
            metrics: A sequence of instances of the specified Metric.
        """
        for metric in metrics:
            self.write(metric)

Functions¶

__init__ ¶

__init__(filename: Union[Path, str], metric_class: Type[Metric], append: bool = False, delimiter: str = '\t', include_fields: Optional[List[str]] = None, exclude_fields: Optional[List[str]] = None, lineterminator: str = '\n', threads: Optional[int] = None) -> None

    Args:
        filename: Path to the file to write.
        metric_class: Metric class.
        append: If `True`, the file will be appended to. Otherwise, the specified file will be
            overwritten.
        delimiter: The output file delimiter.
        include_fields: If specified, only the listed fieldnames will be included when writing
            records to file. Fields will be written in the order provided.
            May not be used together with `exclude_fields`.
        exclude_fields: If specified, any listed fieldnames will be excluded when writing
            records to file.
            May not be used together with `include_fields`.
        lineterminator: The string used to terminate lines produced by the MetricWriter.
            Default = "\n".
        threads: The number of threads to use when compressing gzip files.

    Raises:
        TypeError: If the provided metric class is not a dataclass- or attr-decorated
            subclass of `Metric`.
        AssertionError: If the provided filepath is not writable.
        AssertionError: If `append=True` and the provided file is not readable. (When appending,
            we check to ensure that the header matches the specified metric class. The file must
            be readable to get the header.)
        ValueError: If `append=True` and the provided file is a FIFO (named pipe).
        ValueError: If `append=True` and the provided file does not include a header.
        ValueError: If `append=True` and the header of the provided file does not match the
            specified metric class and the specified include/exclude fields.
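The `include_fields` and `exclude_fields` semantics described above can be sketched with the standard library alone. This is a rough illustration, not fgpyo's implementation: `resolve_fieldnames` is a hypothetical helper, and raising `ValueError` when both arguments are supplied is an assumption (the documentation only states that they may not be used together).

```python
from typing import List, Optional


def resolve_fieldnames(
    all_fields: List[str],
    include_fields: Optional[List[str]] = None,
    exclude_fields: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: return the output fieldnames in their final order."""
    if include_fields is not None and exclude_fields is not None:
        # Assumption: the exception type used by the real library may differ
        raise ValueError("include_fields and exclude_fields may not be used together")
    if include_fields is not None:
        # Fields are written in the order the caller listed them
        return list(include_fields)
    if exclude_fields is not None:
        # Remaining fields keep the order in which the metric declares them
        return [field for field in all_fields if field not in exclude_fields]
    return list(all_fields)


fields = ["name", "age", "address"]
reordered = resolve_fieldnames(fields, include_fields=["age", "name"])
subset = resolve_fieldnames(fields, exclude_fields=["address"])
```

Here `reordered` follows the caller's order, while `subset` preserves the declaration order of the remaining fields.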

Source code in fgpyo/util/metric.py

def __init__(
    self,
    filename: Union[Path, str],
    metric_class: Type[Metric],
    append: bool = False,
    delimiter: str = "\t",
    include_fields: Optional[List[str]] = None,
    exclude_fields: Optional[List[str]] = None,
    lineterminator: str = "\n",
    threads: Optional[int] = None,
) -> None:
    """
    Args:
        filename: Path to the file to write.
        metric_class: Metric class.
        append: If `True`, the file will be appended to. Otherwise, the specified file will be
            overwritten.
        delimiter: The output file delimiter.
        include_fields: If specified, only the listed fieldnames will be included when writing
            records to file. Fields will be written in the order provided.
            May not be used together with `exclude_fields`.
        exclude_fields: If specified, any listed fieldnames will be excluded when writing
            records to file.
            May not be used together with `include_fields`.
        lineterminator: The string used to terminate lines produced by the MetricWriter.
            Default = "\n".
        threads: the number of threads to use when compressing gzip files

    Raises:
        TypeError: If the provided metric class is not a dataclass- or attr-decorated
            subclass of `Metric`.
        AssertionError: If the provided filepath is not writable.
        AssertionError: If `append=True` and the provided file is not readable. (When appending,
            we check to ensure that the header matches the specified metric class. The file must
            be readable to get the header.)
        ValueError: If `append=True` and the provided file is a FIFO (named pipe).
        ValueError: If `append=True` and the provided file does not include a header.
        ValueError: If `append=True` and the header of the provided file does not match the
            specified metric class and the specified include/exclude fields.
    """

    filepath: Path = Path(filename)
    if (filepath.is_fifo() or filepath.is_char_device()) and append:
        raise ValueError("Cannot append to stdout, stderr, or other named pipe or stream")

    ordered_fieldnames: List[str] = _validate_and_generate_final_output_fieldnames(
        metric_class=metric_class,
        include_fields=include_fields,
        exclude_fields=exclude_fields,
    )

    _assert_is_metric_class(metric_class)
    io.assert_path_is_writable(filepath)
    if append:
        io.assert_path_is_readable(filepath)
        _assert_file_header_matches_metric(
            path=filepath,
            metric_class=metric_class,
            ordered_fieldnames=ordered_fieldnames,
            delimiter=delimiter,
        )

    self._metric_class = metric_class
    self._fieldnames = ordered_fieldnames
    self._fout = io.to_writer(filepath, append=append, threads=threads)
    self._writer = DictWriter(
        f=self._fout,
        fieldnames=self._fieldnames,
        delimiter=delimiter,
        lineterminator=lineterminator,
    )

    # If we aren't appending to an existing file, write the header before any rows
    if not append:
        self._writer.writeheader()

close ¶

close() -> None

Close the underlying file handle.

Source code in fgpyo/util/metric.py

def close(self) -> None:
    """Close the underlying file handle."""
    self._fout.close()

write ¶

write(metric: MetricType) -> None

Write a single Metric instance to file.

The Metric is converted to a dictionary and then written using the underlying csv.DictWriter. If the MetricWriter was created using the include_fields or exclude_fields arguments, the fields of the Metric are subset and/or reordered accordingly before writing.

Parameters:

Name	Type	Description	Default
`metric`	`MetricType`	An instance of the specified Metric.	required

Raises:

Type	Description
`TypeError`	If the provided `metric` is not an instance of the Metric class used to parametrize the writer.
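The subset-and-reorder step can be illustrated with a plain `csv.DictWriter`. This is a minimal sketch that does not use fgpyo: the `full_row` dictionary is a stand-in for what `formatted_items()` would produce, and the field names are made up for the example.

```python
import csv
import io

# The writer's final output fields (here: a reordered subset)
fieldnames = ["age", "name"]

# Stand-in for the formatted items of a metric with three attributes
full_row = {"name": "Alice", "age": "47", "address": "N/A"}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
writer.writeheader()

# Keep only the requested fields, in the requested order
row = {fieldname: full_row[fieldname] for fieldname in fieldnames}
writer.writerow(row)

output = buffer.getvalue()
```

The filtering matters because `csv.DictWriter` raises a `ValueError` by default when a row contains keys that are not in `fieldnames`.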

Source code in fgpyo/util/metric.py

def write(self, metric: MetricType) -> None:
    """
    Write a single Metric instance to file.

    The Metric is converted to a dictionary and then written using the underlying
    `csv.DictWriter`. If the `MetricWriter` was created using the `include_fields` or
    `exclude_fields` arguments, the fields of the Metric are subset and/or reordered
    accordingly before writing.

    Args:
        metric: An instance of the specified Metric.

    Raises:
        TypeError: If the provided `metric` is not an instance of the Metric class used to
            parametrize the writer.
    """

    # Serialize the Metric to a dict for writing by the underlying `DictWriter`
    row = {fieldname: val for fieldname, val in metric.formatted_items()}

    # Filter and/or re-order output fields if necessary
    row = {fieldname: row[fieldname] for fieldname in self._fieldnames}

    self._writer.writerow(row)

writeall ¶

writeall(metrics: Iterable[MetricType]) -> None

Write multiple Metric instances to file.

Each Metric is converted to a dictionary and then written using the underlying csv.DictWriter. If the MetricWriter was created using the include_fields or exclude_fields arguments, the attributes of each Metric are subset and/or reordered accordingly before writing.

Parameters:

Name	Type	Description	Default
`metrics`	`Iterable[MetricType]`	An iterable of instances of the specified Metric.	required
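Because the parameter is typed as `Iterable`, the rows do not need to be held in a list. The sketch below shows the same loop-over-write pattern with a generator, using only the standard library; `write_rows` and `generate_rows` are hypothetical names, not part of fgpyo.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def write_rows(writer: csv.DictWriter, rows: Iterable[Dict[str, str]]) -> None:
    """Hypothetical analogue of writeall: write each row as it is produced."""
    for row in rows:
        writer.writerow(row)


def generate_rows() -> Iterator[Dict[str, str]]:
    """Yield rows lazily, one at a time."""
    for name, age in [("Alice", 47), ("Bob", 24)]:
        yield {"name": name, "age": str(age)}


buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"], delimiter="\t", lineterminator="\n")
writer.writeheader()
write_rows(writer, generate_rows())

lines = buffer.getvalue().splitlines()
```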

Source code in fgpyo/util/metric.py

def writeall(self, metrics: Iterable[MetricType]) -> None:
    """
    Write multiple Metric instances to file.

    Each Metric is converted to a dictionary and then written using the underlying
    `csv.DictWriter`. If the `MetricWriter` was created using the `include_fields` or
    `exclude_fields` arguments, the attributes of each Metric are subset and/or reordered
    accordingly before writing.

    Args:
        metrics: A sequence of instances of the specified Metric.
    """
    for metric in metrics:
        self.write(metric)
