OnlineMetrics


OnlineMetrics provides metrics for online, streaming, and batched machine learning workflows. Metrics implement a small, consistent interface so they can be updated batch-by-batch, merged across devices/processes, and queried for current values with minimal overhead.

Features

  • Compute metrics in batches - no need to store full datasets.
  • Track individual metrics or collections of multiple metrics.
  • Mergeable states for distributed/parallel aggregation.
  • Pluggable data formats (e.g., OneHot) with validation and formatting hooks.
  • Small, explicit API surface for easy extension.

Design philosophy

  • Minimal, explicit interface: Implement AbstractMetric and the handful of required functions (name, initial_state, batch_state, merge_state, metric_value) to add a new metric.
  • Robust input handling: Metrics declare or accept data formats; validation and formatting are separated from core metric logic.
  • Usability: Friendly Base.show representations and MetricCollection tree-printing for quick inspection.

Quickstart

Install:

using Pkg
Pkg.add("OnlineMetrics")

Basic usage:

using OnlineMetrics

m = Metric(Accuracy(2))                           # track a single metric
mc = MetricCollection(Accuracy(2), Precision(2))  # track multiple metrics

step!(m, [0,1,1,0], [0,1,0,0])                    # update with a batch
step!(mc, [0,1,1,0], [0,1,0,0])                   # update collection

value(m)                                          # current value of single metric
value(mc)                                         # NamedTuple of values for collection

Merging states (useful for distributed runs):

m1 = Metric(Accuracy(2)); step!(m1, preds1, labels1)
m2 = Metric(Accuracy(2)); step!(m2, preds2, labels2)
m_merged = merge(m1, m2)
value(m_merged)

Examples

See the src folder for metric implementations and refer to the docs for usage examples.

Development & testing

Run tests from the package root:

julia --project=. -e 'using Pkg; Pkg.test()'

Or in the package REPL:

] test OnlineMetrics

Contributing

Contributions welcome. Please open issues or pull requests for bug fixes, new metrics, or API improvements. Follow existing style in src and add tests to runtests.jl.

License

Distributed under the terms of the MIT license. See the LICENSE file.


Metric Tracking

OnlineMetrics.Metric (Type)
Metric(m::AbstractMetric; name=name(m))

An object to track a single metric and its state over multiple mini-batches.

Parameters

  • m::AbstractMetric: A metric to be tracked.
  • name::String: An optional name for the metric. Defaults to the name of the metric type.

Example

julia> m = Metric(Accuracy(2); name="My Accuracy Metric");

julia> step!(m, [0,1,0,1], [0,1,1,1]);

julia> value(m)
0.75
OnlineMetrics.MetricCollection (Type)
MetricCollection(metrics...)

An object to track one or more metrics concurrently.

Parameters

  • metrics...: A variable number of metrics to be tracked. Each metric can be provided either as an instance of a subtype of AbstractMetric or as a Pair{String, AbstractMetric} to override the default name of the metric.

Example

julia> m = MetricCollection(Accuracy(2), "precision" => Precision(2, agg=nothing), mIoU(2))
MetricCollection
├─ accuracy: value=0.0 | correct=0 | n=0
├─ precision: value=[1.0, 1.0] | tp=[0, 0] | fp=[0, 0]
└─ mIoU: value=1.0 | intersection=[0, 0] | union=[0, 0]


julia> step!(m, [0, 0, 1, 0], [0, 0, 1, 1])
MetricCollection
├─ accuracy: value=0.75 | correct=3 | n=4
├─ precision: value=[0.666667, 1.0] | tp=[2, 1] | fp=[1, 0]
└─ mIoU: value=0.583333 | intersection=[2, 1] | union=[3, 2]
OnlineMetrics.step! (Function)
step!(x::Metric, y_pred, y_true)
step!(x::MetricCollection, y_pred, y_true)

Update the metric(s) with a new batch of predictions and true labels.

OnlineMetrics.merge (Function)
merge(metrics...)

Merge multiple Metric or MetricCollection objects into a single object.

Useful for aggregating metrics across multiple devices or processes.
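For example, two collections updated on separate data shards can be combined into one. This sketch assumes both collections were constructed with identical metric configurations, which is presumably required for their states to be mergeable:

```julia
using OnlineMetrics

# Each collection sees a different shard of the data
mc1 = MetricCollection(Accuracy(2), Precision(2))
mc2 = MetricCollection(Accuracy(2), Precision(2))
step!(mc1, [0,1,1,0], [0,1,0,0])
step!(mc2, [1,1,0,0], [1,1,0,1])

# Combine their states as if one collection had seen every batch
mc = merge(mc1, mc2)
value(mc)   # NamedTuple of aggregated values
```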


Metric Interface

OnlineMetrics.AbstractMetric (Type)

Metrics are measures of a model's performance, such as loss, accuracy, or squared error.

Metrics are updated incrementally as new data arrives, making them suitable for online learning scenarios.

Each metric must implement the following interface:

  • name: Returns the human-readable name of the metric.
  • initial_state: Returns the initial state of the metric.
  • batch_state: Computes the metric's state for a single batch of predictions and labels.
  • merge_state: Merges two metric states into a single state.
  • metric_value: Computes the metric's value from its current state.

Optional Methods

  • data_format: Returns the data format expected by the metric, or nothing if no specific format is required. Defaults to nothing.
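As a sketch of the interface, here is a hypothetical mean-squared-error metric. The exact argument orders of merge_state and metric_value are assumptions based on the function descriptions above, not verified signatures:

```julia
using OnlineMetrics
import OnlineMetrics: name, initial_state, batch_state, merge_state, metric_value

# A hypothetical running mean-squared-error metric
struct MeanSquaredError <: OnlineMetrics.AbstractMetric end

name(::MeanSquaredError) = "mse"

# State: running sum of squared errors and observation count
initial_state(::MeanSquaredError) = (sum=0.0, n=0)

# State contribution of a single batch
batch_state(::MeanSquaredError, y_pred, y_true) =
    (sum=sum(abs2, y_pred .- y_true), n=length(y_true))

# Combine two states (assumed signature: metric, state a, state b)
merge_state(::MeanSquaredError, a, b) = (sum=a.sum + b.sum, n=a.n + b.n)

# Current value from the accumulated state
metric_value(::MeanSquaredError, state) = state.n == 0 ? 0.0 : state.sum / state.n
```

Because no data_format method is defined, the metric falls back to the default of nothing and receives inputs unformatted.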
OnlineMetrics.ClassificationMetric (Type)

Classification metrics are used to evaluate the performance of models that predict a discrete label for each observation. Subtypes of ClassificationMetric have a default data format of OneHot, which provides batch_state with Matrix{Bool} inputs representing one-hot encoded class labels.

OnlineMetrics.batch_state (Function)
batch_state(m::AbstractMetric, y_pred, y_true) -> state

Compute the metric's state for a single batch of predictions and labels.

The types of y_pred and y_true should not be specialized. Data formatting and validation should instead be handled by the format and validate methods defined by the metric's data format. This ensures that users receive more informative error messages when providing invalid data.

OnlineMetrics.data_format (Function)
data_format(m::AbstractMetric) -> Union{AbstractDataFormat, Nothing}

Return the data format expected by the metric, or nothing if no specific format is required.

OnlineMetrics.step (Function)
step(m::AbstractMetric, y_pred, y_true, oldstate) -> newstate

Update the metric state for the given batch of labels and predictions.

Parameters

  • m::AbstractMetric: The metric to be updated.
  • y_pred: The model predictions for the current batch.
  • y_true: The true labels for the current batch.
  • oldstate: The previous state of the metric.

Returns

  • newstate: The updated state of the metric.
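One plausible composition of the interface functions is sketched below. This is an illustration of how the pieces described above could fit together, not necessarily the package's exact definition; in particular, the argument order of merge_state is an assumption:

```julia
# Sketch: how step could compose the interface functions
function step(m::AbstractMetric, y_pred, y_true, oldstate)
    df = data_format(m)                 # nothing or an AbstractDataFormat
    validate(df, y_pred)                # raise ArgumentError on bad input
    validate(df, y_true)
    batch = batch_state(m, format(df, y_pred), format(df, y_true))
    return merge_state(m, oldstate, batch)
end
```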

Classification Metrics

OnlineMetrics.Accuracy (Type)
Accuracy(nclasses::Int)

Measures the model's overall accuracy as correct / n.

Parameters

  • nclasses::Int: The number of classes for the classification task.
OnlineMetrics.IoU (Type)
IoU(nclasses::Int; agg=nothing)

Intersection over Union (IoU) is a measure of the overlap between a prediction and a label.

Parameters

  • nclasses::Int: The number of classes for the classification task.

Keyword Parameters

  • agg: Specifies the type of IoU aggregation to be computed. The possible values are:
    • :mean: Calculates mean IoU (mIoU) by averaging the IoU across all classes.
    • nothing: Calculates the per-class IoU, which is returned as a Vector of length nclasses.
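The two aggregation modes can be compared side by side. A brief sketch, assuming the constructors shown above:

```julia
using OnlineMetrics

per_class = Metric(IoU(2))              # agg=nothing: per-class IoU
mean_iou  = Metric(IoU(2; agg=:mean))   # equivalent to mIoU(2)

step!(per_class, [0,0,1,0], [0,0,1,1])
step!(mean_iou,  [0,0,1,0], [0,0,1,1])

value(per_class)   # Vector of length 2
value(mean_iou)    # single scalar: the mean over classes
```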
OnlineMetrics.mIoU (Function)
mIoU(nclasses::Int)

A convenience constructor for IoU that defaults to mean aggregation.

OnlineMetrics.Precision (Type)
Precision(nclasses::Int; agg=:macro)

Precision is the ratio of true positives to the sum of true positives and false positives, measuring the accuracy of positive predictions.

Parameters

  • nclasses::Int: The number of classes for the classification task.

Keyword Parameters

  • agg: Specifies the type of precision aggregation to be computed. The possible values are:
    • :macro: Calculates macro-averaged precision, which computes the precision for each class independently and then takes the average.
    • :micro: Calculates micro-averaged precision, which aggregates the contributions of all classes to compute a single precision value.
    • nothing: Calculates the per-class precision, which is returned as a Vector of length nclasses.
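The three aggregation modes can be tracked on the same data. A brief sketch, assuming the constructor shown above; the macro value is the mean of the per-class vector:

```julia
using OnlineMetrics

macro_p = Metric(Precision(2))               # agg=:macro (default)
micro_p = Metric(Precision(2; agg=:micro))
class_p = Metric(Precision(2; agg=nothing))  # per-class Vector

for m in (macro_p, micro_p, class_p)
    step!(m, [0,0,1,0], [0,0,1,1])
end

value(class_p)   # Vector of length 2; value(macro_p) averages it
```

The same agg keyword applies to Recall and F1Score below.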
OnlineMetrics.Recall (Type)
Recall(nclasses::Int; agg=:macro)

Recall, also known as sensitivity or true positive rate, is the ratio of true positives to the sum of true positives and false negatives, measuring the ability of the classifier to identify all positive instances.

Parameters

  • nclasses::Int: The number of classes for the classification task.

Keyword Parameters

  • agg: Specifies the type of recall aggregation to be computed. The possible values are:
    • :macro: Calculates macro-averaged recall, which computes the recall for each class independently and then takes the average.
    • :micro: Calculates micro-averaged recall, which aggregates the contributions of all classes to compute a single recall value.
    • nothing: Calculates the per-class recall, which is returned as a Vector of length nclasses.
OnlineMetrics.F1Score (Type)
F1Score(nclasses::Int; agg=:macro)

F1 Score is the harmonic mean of precision and recall, providing a single metric that balances both concerns.

Parameters

  • nclasses::Int: The number of classes for the classification task.

Keyword Parameters

  • agg: Specifies the type of F1 score aggregation to be computed. The possible values are:
    • :macro: Calculates macro-averaged F1 score, which computes the F1 score for each class independently and then takes the average.
    • :micro: Calculates micro-averaged F1 score, which aggregates the contributions of all classes to compute a single F1 score.
    • nothing: Calculates the per-class F1 score, which is returned as a Vector of length nclasses.
OnlineMetrics.ConfusionMatrix (Type)
ConfusionMatrix(nclasses::Int)

Calculate the confusion matrix over two or more classes. The columns of the resulting nclasses x nclasses matrix correspond to the true label while the rows correspond to the prediction.

Parameters

  • nclasses::Int: The number of possible classes in the classification task.
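A brief usage sketch, assuming value returns the accumulated matrix:

```julia
using OnlineMetrics

cm = Metric(ConfusionMatrix(2))
step!(cm, [0,0,1,0], [0,0,1,1])

# 2×2 matrix: columns index the true label, rows the prediction,
# so off-diagonal entries count misclassifications.
value(cm)
```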

Other Metrics

OnlineMetrics.AverageMeasure (Type)
AverageMeasure(measure::Function, name::String)

Tracks the average value of a given measure over mini-batches. Typically used to track loss functions or regression metrics.
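For example, a training loss can be tracked as a running average. This sketch assumes the measure is called as measure(y_pred, y_true), which is not spelled out above; mse here is a user-defined function, not part of the package:

```julia
using OnlineMetrics

# Hypothetical measure: per-batch mean squared error
mse(y_pred, y_true) = sum(abs2, y_pred .- y_true) / length(y_true)

loss = Metric(AverageMeasure(mse, "mse"))
step!(loss, [0.1, 0.9], [0.0, 1.0])
step!(loss, [0.4, 0.6], [0.0, 1.0])

value(loss)   # running average of mse over the two batches
```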


Data Formats

OnlineMetrics.OneHot (Type)
OneHot(nclasses::Int)

A data format consisting of one-hot encoded class labels for nclasses classes.

Input

  • If the input data x is an array of shape (D...,1,N) or (N,), it is interpreted as integer class labels in the range [0, nclasses-1].
  • If the input data x is an array of shape (D...,nclasses,N), it is interpreted as one-hot encoded vectors.

Output

The output is a Matrix{Bool} of shape (nclasses, N).
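A brief sketch of the conversion, assuming format is accessible as shown in its docstring below (it may need to be qualified as OnlineMetrics.format if not exported):

```julia
using OnlineMetrics

df = OneHot(3)

labels = [0, 2, 1, 0]        # shape (N,): integer class labels in [0, 2]
onehot = format(df, labels)  # 3×4 Matrix{Bool}, one true per column
```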

OnlineMetrics.format (Function)
format(df::Nothing, x::Array{<:Real})
format(df::AbstractDataFormat, x::Array{<:Real})

Format the input data x according to the specified data format df.

Parameters

  • df: An instance of a subtype of AbstractDataFormat or nothing.
  • x: An array of real-valued data to be formatted.

Returns

The formatted data.

OnlineMetrics.validate (Function)
validate(df::Nothing, x::Array{<:Real})
validate(df::AbstractDataFormat, x::Array{<:Real})

Validate that the input data x conforms to the specified data format df.

Raises an ArgumentError if the data does not conform.

Parameters

  • df: An instance of a subtype of AbstractDataFormat or nothing.
  • x: An array of real-valued data to be validated.
