API Reference
Anomsmith: A strict 4-layer architecture for anomaly detection.
- class anomsmith.BaseDetector(**params: Any)
Bases: BaseEstimator
Base class for anomaly detectors.
Detectors produce both scores and binary labels.
- class anomsmith.BaseScorer(**params: Any)
Bases: BaseEstimator
Base class for anomaly scorers.
Scorers assign anomaly scores to time series points. Higher scores indicate more anomalous points.
- class anomsmith.IQRScorer(factor: float = 1.5, random_state: int | None = None)
Bases: BaseScorer
Interquartile Range (IQR) based outlier scorer.
Computes outlier scores based on IQR bounds. Higher scores indicate more anomalous points.
- Parameters:
factor – IQR multiplier for outlier bounds (default: 1.5)
random_state – Random state for reproducibility (not used, kept for compatibility)
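The IQR bounds described above can be illustrated with a dependency-free sketch. This is not anomsmith's implementation: `iqr_scores` is a hypothetical helper, and scoring a point by its distance outside the fences [Q1 - factor * IQR, Q3 + factor * IQR] is an assumption about how scores "based on IQR bounds" might be derived.

```python
import statistics


def iqr_scores(values, factor=1.5):
    """Score each point by how far it lies outside the IQR fences (0.0 inside)."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    # Assumption: the score is the distance beyond the nearest fence.
    return [max(lower - v, v - upper, 0.0) for v in values]


data = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 100.0]
scores = iqr_scores(data)
```

Here only the last point falls outside the fences, so it is the only one with a non-zero score. A larger `factor` widens the fences and lowers every score.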
- class anomsmith.IsolationForestDetector(contamination: float = 0.05, n_estimators: int = 200, random_state: int | None = None, n_jobs: int = -1)
Bases: BaseDetector
Isolation Forest anomaly detector.
Isolation Forest is an ensemble method that isolates anomalies by randomly selecting features and splitting values.
- Parameters:
contamination – Expected proportion of outliers in the data
n_estimators – Number of base estimators
random_state – Random state for reproducibility
n_jobs – Number of jobs to run in parallel
- fit(y: ndarray | Series, X: ndarray | DataFrame | None = None) -> IsolationForestDetector
Fit the Isolation Forest detector.
- Parameters:
y – Training data (target)
X – Optional features (if None, uses y)
- Returns:
Self for method chaining
- class anomsmith.LOFDetector(contamination: float = 0.05, n_neighbors: int = 20, random_state: int | None = None, n_jobs: int = -1)
Bases: BaseDetector
Local Outlier Factor (LOF) anomaly detector.
LOF measures the local deviation of density of a given sample with respect to its neighbors.
- Parameters:
contamination – Expected proportion of outliers in the data
n_neighbors – Number of neighbors to use
random_state – Random state for reproducibility
n_jobs – Number of jobs to run in parallel
- fit(y: ndarray | Series, X: ndarray | DataFrame | None = None) -> LOFDetector
Fit the LOF detector.
- Parameters:
y – Training data (target)
X – Optional features (if None, uses y)
- Returns:
Self for method chaining
- class anomsmith.PCADetector(n_components: float | int = 0.95, score_method: Literal['reconstruction', 'mahalanobis', 'both'] = 'reconstruction', contamination: float = 0.05, random_state: int | None = None)
Bases: BaseDetector
PCA-based anomaly detector.
Uses Principal Component Analysis to model healthy operation boundaries. Anomalies are detected using one of the following scores, or the average of both:
- Mahalanobis distance in the principal component space
- Reconstruction error (difference between original and reconstructed data)
- Parameters:
n_components – Number of components to keep. If 0 < n_components < 1, keep the smallest number of components whose cumulative explained variance exceeds that fraction.
score_method – Method for computing anomaly scores:
- ‘reconstruction’: Use reconstruction error
- ‘mahalanobis’: Use Mahalanobis distance in PC space
- ‘both’: Compute both and return their average
contamination – Expected proportion of outliers in the data (used for threshold)
random_state – Random state for reproducibility
- fit(y: ndarray | Series | SeriesLike, X: ndarray | DataFrame | None = None) -> PCADetector
Fit the PCA detector on healthy operation data.
- Parameters:
y – Training data (target)
X – Optional features (if None, uses y)
- Returns:
Self for method chaining
- class anomsmith.PanelLike(*args, **kwargs)
Bases: Protocol
Protocol for panel-like data: DataFrame with entity key plus time index.
Can be a DataFrame with MultiIndex (entity, time) or a regular DataFrame with an entity column and time index.
- index: DatetimeIndex | MultiIndex | Index
- class anomsmith.RobustCovarianceDetector(contamination: float = 0.05, support_fraction: float = 0.8, random_state: int | None = None)
Bases: BaseDetector
Robust Covariance (Elliptic Envelope) anomaly detector.
Assumes that the data is Gaussian distributed and fits an elliptic envelope to the data.
- Parameters:
contamination – Expected proportion of outliers in the data
support_fraction – Proportion of points to be used as support
random_state – Random state for reproducibility
- fit(y: ndarray | Series, X: ndarray | DataFrame | None = None) -> RobustCovarianceDetector
Fit the Robust Covariance detector.
- Parameters:
y – Training data (target)
X – Optional features (if None, uses y)
- Returns:
Self for method chaining
- class anomsmith.SeriesLike(*args, **kwargs)
Bases: Protocol
Protocol for series-like data: pandas Series or single-column DataFrame.
Must have a datetime or integer index.
- index: DatetimeIndex | Index
- class anomsmith.ThresholdRule(method: Literal['absolute', 'quantile'], value: float, quantile: float | None = None)
Bases: object
Rule for thresholding anomaly scores.
- method
‘absolute’ (use value directly) or ‘quantile’ (use quantile)
- Type:
Literal[‘absolute’, ‘quantile’]
- class anomsmith.ZScoreScorer(n_std: float = 3.0, random_state: int | None = None)
Bases: BaseScorer
Z-score based anomaly scorer.
Computes absolute Z-scores relative to mean and standard deviation. Higher scores indicate more anomalous points.
- Parameters:
n_std – Number of standard deviations (used for thresholding, not scoring)
random_state – Random state for reproducibility (not used, kept for compatibility)
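A minimal sketch of absolute z-score scoring, written without anomsmith. `zscore_scores` is a hypothetical helper, and the use of the population standard deviation is an assumption; the library may use the sample estimate instead. Note how `n_std` plays no role in the scores themselves and only matters when they are later thresholded.

```python
import statistics


def zscore_scores(values):
    """Absolute z-score of each point relative to the mean and standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if std == 0:
        # A constant series has no spread, so no point stands out.
        return [0.0 for _ in values]
    return [abs(v - mean) / std for v in values]


data = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = zscore_scores(data)

# n_std is applied afterwards as a cutoff, not inside the scorer.
n_std = 3.0
flags = [int(s > n_std) for s in scores]
```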
- anomsmith.backtest_detector(y: Series | ndarray | SeriesLike, detector: BaseDetector | BaseScorer, threshold_rule: ThresholdRule, labels: Series | ndarray | SeriesLike | None = None, n_splits: int = 5, min_train_size: int = 10) -> DataFrame
Run backtest of detector across expanding windows.
- Parameters:
y – Time series to backtest on
detector – BaseDetector or BaseScorer instance
threshold_rule – ThresholdRule to apply
labels – Optional ground truth labels
n_splits – Number of splits
min_train_size – Minimum training set size
- Returns:
pandas DataFrame with columns fold, precision, recall, f1, avg_run_length
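The expanding-window idea can be sketched as follows. Both helpers are hypothetical and only illustrate one plausible reading of the documented parameters: how anomsmith actually sizes its folds is not stated here, and interpreting avg_run_length as the mean length of consecutive flagged runs is an assumption.

```python
def expanding_splits(n_samples, n_splits=5, min_train_size=10):
    """Yield (train_end, test_end) pairs; every training window starts at 0."""
    remaining = n_samples - min_train_size
    if remaining < n_splits:
        raise ValueError("not enough samples for the requested number of splits")
    test_size = remaining // n_splits
    for fold in range(n_splits):
        train_end = min_train_size + fold * test_size
        # The last fold absorbs any leftover samples.
        test_end = n_samples if fold == n_splits - 1 else train_end + test_size
        yield train_end, test_end


def average_run_length(flags):
    """Mean length of consecutive runs of 1s (0.0 when nothing is flagged)."""
    runs, current = [], 0
    for flag in flags:
        if flag:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0


splits = list(expanding_splits(n_samples=30, n_splits=4, min_train_size=10))
run_length = average_run_length([0, 1, 1, 0, 1, 0, 1, 1, 1])
```

In each fold the detector would be fit on points [0, train_end) and evaluated on [train_end, test_end), so the training set grows while the test block moves forward in time.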
- anomsmith.detect_anomalies(y: Series | ndarray | SeriesLike, detector: BaseDetector | BaseScorer, threshold_rule: ThresholdRule) -> DataFrame
Detect anomalies in a time series.
- Parameters:
y – Time series to detect anomalies in
detector – BaseDetector or BaseScorer instance
threshold_rule – ThresholdRule to apply
- Returns:
pandas DataFrame with ‘score’ and ‘flag’ columns, indexed by y’s index
- anomsmith.score_anomalies(y: Series | ndarray | SeriesLike, scorer: BaseScorer) -> Series
Score anomalies in a time series.
- Parameters:
y – Time series to score
scorer – BaseScorer instance
- Returns:
pandas Series of anomaly scores with same index as y
- anomsmith.sweep_thresholds(y: Series | ndarray | SeriesLike, scorer: BaseScorer, threshold_values: list[float] | ndarray, labels: Series | ndarray | SeriesLike | None = None) -> DataFrame
Evaluate multiple threshold values and return metrics.
- Parameters:
y – Time series to score
scorer – BaseScorer instance
threshold_values – List of threshold values to evaluate
labels – Optional ground truth labels
- Returns:
pandas DataFrame with columns threshold, precision, recall, f1 (metrics are NaN if labels are not provided)
Objects
Layer 1: Data and representations.
This layer uses timesmith’s SeriesLike and PanelLike types for time series data. ScoreView and LabelView are kept for anomaly-specific outputs. No domain libraries (sklearn, matplotlib, etc.) are imported here. Only numpy and pandas are allowed.
- class anomsmith.objects.LabelView(index: Index, labels: ndarray)
Bases: object
Immutable view of binary anomaly labels aligned to an index.
- index
Time index (must match input series index)
- labels
Binary flags as 1D array (1 = anomaly, 0 = normal)
- Type:
ndarray
- class anomsmith.objects.PanelLike(*args, **kwargs)
Bases: Protocol
Protocol for panel-like data: DataFrame with entity key plus time index.
Can be a DataFrame with MultiIndex (entity, time) or a regular DataFrame with an entity column and time index.
- index: DatetimeIndex | MultiIndex | Index
- class anomsmith.objects.ScoreView(index: Index, scores: ndarray)
Bases: object
Immutable view of anomaly scores aligned to an index.
- index
Time index (must match input series index)
- scores
Anomaly scores as 1D array (higher = more anomalous)
- Type:
ndarray
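One way to picture an immutable, index-aligned view is a frozen dataclass that validates its lengths. `ScoreViewSketch` is a hypothetical stand-in using tuples rather than the pandas Index and ndarray in the signature above, and the length check is an assumption about what "aligned" requires.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScoreViewSketch:
    """Hypothetical immutable pairing of an index with per-point scores."""

    index: Tuple[int, ...]
    scores: Tuple[float, ...]

    def __post_init__(self) -> None:
        # Assumption: "aligned" means one score per index entry.
        if len(self.index) != len(self.scores):
            raise ValueError("index and scores must have the same length")


view = ScoreViewSketch(index=(0, 1, 2), scores=(0.1, 0.2, 3.5))
```

Because the dataclass is frozen, assigning to `view.scores` raises an error, so downstream code cannot silently overwrite scores after they are computed.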
- class anomsmith.objects.SeriesLike(*args, **kwargs)
Bases: Protocol
Protocol for series-like data: pandas Series or single-column DataFrame.
Must have a datetime or integer index.
- index: DatetimeIndex | Index
- anomsmith.objects.SeriesView
alias of SeriesLike
- class anomsmith.objects.WindowSpec(length: int, step: int = 1, alignment: Literal['left', 'right', 'center'] = 'right')
Bases: object
Specification for sliding or expanding windows.
- alignment
‘left’ (start at beginning), ‘right’ (end at current), or ‘center’ (centered on current point)
- Type:
Literal[‘left’, ‘right’, ‘center’]
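The three alignments can be pictured with a small sketch that maps a point to the bounds of its window. `window_bounds` and `iter_windows` are hypothetical helpers, not part of anomsmith, and reading 'left' as "the window starts at the current point" is an assumption about the wording above.

```python
def window_bounds(i, length, alignment="right"):
    """Half-open [start, stop) bounds of the window associated with point i."""
    if alignment == "right":
        start = i - length + 1  # window ends at the current point
    elif alignment == "left":
        start = i  # assumption: window starts at the current point
    elif alignment == "center":
        start = i - length // 2  # window is centered on the current point
    else:
        raise ValueError(f"unknown alignment: {alignment!r}")
    return start, start + length


def iter_windows(n_points, length, step=1, alignment="right"):
    """Yield bounds for every step-th point whose window fits inside the data."""
    for i in range(0, n_points, step):
        start, stop = window_bounds(i, length, alignment)
        if start >= 0 and stop <= n_points:
            yield start, stop


right = window_bounds(5, 3, "right")
left = window_bounds(5, 3, "left")
center = window_bounds(5, 3, "center")
windows = list(iter_windows(n_points=6, length=3, step=2))
```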
Primitives
Layer 2: Primitives.
This layer defines algorithm interfaces and thin utilities. It must not know about tasks or evaluation. Only numpy and pandas are allowed (no sklearn, matplotlib, etc.).
- class anomsmith.primitives.BaseDetector(**params: Any)
Bases: BaseEstimator
Base class for anomaly detectors.
Detectors produce both scores and binary labels.
- class anomsmith.primitives.BaseEstimator(**params: Any)
Bases: BaseObject
Base class for estimators with fit and fitted state.
- _fitted
Whether the estimator has been fitted
- abstractmethod fit(y: ndarray | Series | SeriesLike, X: ndarray | DataFrame | None = None) -> BaseEstimator
Fit the estimator.
- Parameters:
y – Target values
X – Optional features
- Returns:
Self for method chaining
- class anomsmith.primitives.BaseObject(**params: Any)
Bases: ABC
Base class for all primitives with parameter management.
Provides get_params, set_params, clone, and repr methods.
- clone() -> BaseObject
Create a deep copy of this object.
- Returns:
Deep copy of this object
- get_params(deep: bool = True) -> dict[str, Any]
Get parameters for this object.
- Parameters:
deep – If True, return deep copy of parameters
- Returns:
Dictionary of parameter names to values
- set_params(**params: Any) -> BaseObject
Set parameters for this object.
- Parameters:
**params – Parameters to set
- Returns:
Self for method chaining
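The parameter-management contract (get_params, set_params, clone, repr) can be sketched with a small stand-in class. `ParamObject` is hypothetical, and storing parameters in a private dict is an assumption; BaseObject's internals are not documented here.

```python
import copy
from typing import Any, Dict


class ParamObject:
    """Hypothetical stand-in for a parameter-managed base class."""

    def __init__(self, **params: Any) -> None:
        self._params: Dict[str, Any] = dict(params)

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # A deep copy protects the stored parameters from caller mutation.
        return copy.deepcopy(self._params) if deep else dict(self._params)

    def set_params(self, **params: Any) -> "ParamObject":
        self._params.update(params)
        return self  # returning self enables method chaining

    def clone(self) -> "ParamObject":
        return copy.deepcopy(self)

    def __repr__(self) -> str:
        args = ", ".join(f"{k}={v!r}" for k, v in self._params.items())
        return f"{type(self).__name__}({args})"


original = ParamObject(factor=1.5)
modified = original.clone().set_params(factor=3.0)
```

Because `clone()` returns an independent copy, changing parameters on `modified` leaves `original` untouched, which is what makes it safe to reuse one configured object across several fits.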
- class anomsmith.primitives.BaseScorer(**params: Any)
Bases: BaseEstimator
Base class for anomaly scorers.
Scorers assign anomaly scores to time series points. Higher scores indicate more anomalous points.
- class anomsmith.primitives.ThresholdRule(method: Literal['absolute', 'quantile'], value: float, quantile: float | None = None)
Bases: object
Rule for thresholding anomaly scores.
- method
‘absolute’ (use value directly) or ‘quantile’ (use quantile)
- Type:
Literal[‘absolute’, ‘quantile’]
- anomsmith.primitives.apply_threshold(score_view: ScoreView, rule: ThresholdRule) -> LabelView
Apply threshold rule to scores to produce binary labels.
- Parameters:
score_view – ScoreView with anomaly scores
rule – ThresholdRule to apply
- Returns:
LabelView with binary labels (1 = anomaly, 0 = normal)
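A sketch of how the two threshold methods might turn scores into labels. `Rule`, `linear_quantile`, and `threshold_labels` are hypothetical names, and several details are assumptions: that a score must strictly exceed the cutoff to be flagged, that the quantile is taken over the scores being labeled, and that `value` is ignored when method is 'quantile'.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in mirroring ThresholdRule's documented fields."""

    method: Literal["absolute", "quantile"]
    value: float
    quantile: Optional[float] = None


def linear_quantile(sorted_vals, q):
    """Quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def threshold_labels(scores, rule):
    """Return 1 where a score exceeds the cutoff implied by the rule, else 0."""
    if rule.method == "absolute":
        cutoff = rule.value
    elif rule.method == "quantile":
        if rule.quantile is None:
            raise ValueError("quantile must be set when method='quantile'")
        cutoff = linear_quantile(sorted(scores), rule.quantile)
    else:
        raise ValueError(f"unknown method: {rule.method!r}")
    return [int(s > cutoff) for s in scores]


scores = [0.1, 0.2, 0.3, 0.4, 5.0]
absolute_flags = threshold_labels(scores, Rule("absolute", value=1.0))
quantile_flags = threshold_labels(scores, Rule("quantile", value=0.0, quantile=0.5))
```

An absolute cutoff keeps its meaning across datasets only if the scores share a scale, while a quantile cutoff always flags roughly the same fraction of points regardless of scale.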
- anomsmith.primitives.robust_zscore(values: ndarray, epsilon: float = 1e-08) -> ndarray
Compute robust z-scores using median and MAD.
Uses median as center and Median Absolute Deviation (MAD) as scale. Includes epsilon guard to prevent division by zero.
- Parameters:
values – Input values to scale
epsilon – Small value to prevent division by zero
- Returns:
Robust z-scores (same shape as input)
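The median/MAD scaling can be written out in a few lines of plain Python. `robust_zscore_sketch` is a hypothetical helper: adding epsilon to the MAD is one possible form of the guard, and no consistency constant (such as 1.4826) is applied, both of which are assumptions about the library's behavior.

```python
import statistics


def robust_zscore_sketch(values, epsilon=1e-8):
    """Center on the median and scale by the MAD, guarding the denominator."""
    median = statistics.median(values)
    # MAD: the median of absolute deviations from the median.
    mad = statistics.median(abs(v - median) for v in values)
    return [(v - median) / (mad + epsilon) for v in values]


data = [1.0, 2.0, 3.0, 4.0, 100.0]
z = robust_zscore_sketch(data)
constant = robust_zscore_sketch([5.0, 5.0, 5.0])
```

Unlike the mean and standard deviation, the median and MAD are barely moved by the single extreme value, so the outlier receives a very large score while the remaining points stay near zero. For a constant series the MAD is zero and the epsilon guard keeps the result finite.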
Workflows
Layer 4: Workflows.
Workflows provide the public entry points users call. Workflows can import matplotlib only if plots are added (not in first pass).
- anomsmith.workflows.backtest_detector(y: Series | ndarray | SeriesLike, detector: BaseDetector | BaseScorer, threshold_rule: ThresholdRule, labels: Series | ndarray | SeriesLike | None = None, n_splits: int = 5, min_train_size: int = 10) -> DataFrame
Run backtest of detector across expanding windows.
- Parameters:
y – Time series to backtest on
detector – BaseDetector or BaseScorer instance
threshold_rule – ThresholdRule to apply
labels – Optional ground truth labels
n_splits – Number of splits
min_train_size – Minimum training set size
- Returns:
pandas DataFrame with columns fold, precision, recall, f1, avg_run_length
- anomsmith.workflows.detect_anomalies(y: Series | ndarray | SeriesLike, detector: BaseDetector | BaseScorer, threshold_rule: ThresholdRule) -> DataFrame
Detect anomalies in a time series.
- Parameters:
y – Time series to detect anomalies in
detector – BaseDetector or BaseScorer instance
threshold_rule – ThresholdRule to apply
- Returns:
pandas DataFrame with ‘score’ and ‘flag’ columns, indexed by y’s index
- anomsmith.workflows.report_detection(y: Series | ndarray | SeriesLike, detector: BaseDetector | BaseScorer, threshold_rule: ThresholdRule) -> dict[str, Any]
Generate detection report with summary stats.
- Parameters:
y – Time series that was analyzed
detector – BaseDetector or BaseScorer instance used
threshold_rule – ThresholdRule applied
- Returns:
Dictionary with summary stats and top anomaly timestamps
- anomsmith.workflows.score_anomalies(y: Series | ndarray | SeriesLike, scorer: BaseScorer) -> Series
Score anomalies in a time series.
- Parameters:
y – Time series to score
scorer – BaseScorer instance
- Returns:
pandas Series of anomaly scores with same index as y
- anomsmith.workflows.sweep_thresholds(y: Series | ndarray | SeriesLike, scorer: BaseScorer, threshold_values: list[float] | ndarray, labels: Series | ndarray | SeriesLike | None = None) -> DataFrame
Evaluate multiple threshold values and return metrics.
- Parameters:
y – Time series to score
scorer – BaseScorer instance
threshold_values – List of threshold values to evaluate
labels – Optional ground truth labels
- Returns:
pandas DataFrame with columns threshold, precision, recall, f1 (metrics are NaN if labels are not provided)
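A threshold sweep amounts to re-labeling the same scores at each cutoff and recomputing the metrics. The sketch below is not anomsmith's code: `sweep` is a hypothetical helper returning plain dicts instead of a DataFrame, and both the strict `>` comparison and reporting 0.0 for undefined precision or recall are assumptions.

```python
import math


def sweep(scores, threshold_values, labels=None):
    """Return one row per threshold; metrics are NaN when labels are missing."""
    rows = []
    for threshold in threshold_values:
        flags = [int(s > threshold) for s in scores]
        if labels is None:
            precision = recall = f1 = math.nan
        else:
            pairs = list(zip(flags, labels))
            tp = sum(1 for flag, truth in pairs if flag and truth)
            fp = sum(1 for flag, truth in pairs if flag and not truth)
            fn = sum(1 for flag, truth in pairs if not flag and truth)
            # Assumption: undefined ratios are reported as 0.0.
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            total = precision + recall
            f1 = 2 * precision * recall / total if total else 0.0
        rows.append(
            {"threshold": threshold, "precision": precision, "recall": recall, "f1": f1}
        )
    return rows


scores = [0.1, 0.9, 0.2, 0.8, 0.3]
labels = [0, 1, 0, 0, 0]
rows = sweep(scores, [0.5, 0.85], labels)
unlabeled = sweep(scores, [0.5])
```

Raising the threshold from 0.5 to 0.85 removes the one false positive in this toy data, which lifts precision without costing recall. Real data usually forces a trade-off between the two.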