03 / 05 Deep learning · MLOps Completed

Anomaly Detection API
Autoencoder + FastAPI + Docker

An LSTM autoencoder learns to reconstruct healthy multivariate sensor windows from the NASA CMAPSS turbofan dataset. The reconstruction error becomes the anomaly score, calibrated against a held-out healthy validation set, and any window above the threshold is flagged. The trained model is wrapped in a typed FastAPI service, an interactive Streamlit demo, and a two-service Docker stack.

// Overview

About this project

The project pairs a model trained on a real public benchmark with the deployment artefacts around it: a typed REST API, an interactive Streamlit demo, and a two-service Docker stack.

The core idea is reconstruction-based anomaly detection. An LSTM autoencoder is fitted only on healthy sliding windows of the CMAPSS turbofan dataset; at inference time, anomalous windows are reconstructed poorly and the per-window MSE serves as the anomaly score. The decision threshold is the 99th percentile of the score on a healthy validation set, which by construction budgets a 1 % false-alarm rate on truly healthy data.

Engine-wise splits prevent leakage: distinct turbofans feed training, calibration, and evaluation. The test split labels each window as anomalous when RUL ≤ 30 according to the run-to-failure ground truth, giving a clean binary task to score against.

Core formulas

Reconstruction-based score:

score(x) = ‖x − D_θ(E_θ(x))‖²

Training objective (healthy windows only):

min_θ 𝔼_{x ∈ H} [ ‖x − D_θ(E_θ(x))‖² ]

Threshold calibration:

τ = quantile_0.99({ score(x) : x ∈ val_healthy })

Decision rule:

flag(x) = 1 ⇔ score(x) > τ
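
A minimal NumPy sketch of the calibration and decision steps above; the function names are illustrative rather than the project's actual API.

import numpy as np

def calibrate_threshold(val_healthy_scores: np.ndarray, q: float = 0.99) -> float:
    # tau = q-th quantile of the reconstruction error on healthy validation windows
    return float(np.quantile(val_healthy_scores, q))

def flag(scores: np.ndarray, tau: float) -> np.ndarray:
    # Binary decision per window: 1 iff the score exceeds the calibrated threshold
    return (scores > tau).astype(int)
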
// Components

How it's built

Eight pieces connected end-to-end. Each lives in its own module with a dedicated responsibility.

1 · Data pipeline

Goal: turn the raw CMAPSS .txt files into engine-disjoint train/val/test windows with labels.

How: src/data.py downloads the NASA PCOE archive once, parses 24-column run-to-failure tables, drops the 7 near-constant sensors, builds 30-cycle sliding windows, and fits per-sensor z-score statistics on the healthy training rows only. RUL is annotated automatically; build_dataset returns the three splits plus the binary labels.
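
The windowing and normalisation logic can be sketched in a few lines; names and signatures are illustrative and do not mirror src/data.py exactly.

import numpy as np

def zscore_stats(healthy_rows: np.ndarray):
    # Per-sensor mean / std fitted on healthy training rows only
    mu = healthy_rows.mean(axis=0)
    sigma = healthy_rows.std(axis=0) + 1e-8
    return mu, sigma

def sliding_windows(series: np.ndarray, length: int = 30) -> np.ndarray:
    # Stack overlapping (length, n_sensors) windows along a new leading axis
    n = series.shape[0] - length + 1
    return np.stack([series[i:i + length] for i in range(n)])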

2 · LSTM autoencoder

Goal: a sequence-to-sequence reconstructor with an explicit information bottleneck.

How: encoder LSTM (14 → 64), linear bottleneck (64 → 16), decoder LSTM driven by the bottleneck broadcast across 30 steps, linear head back to 14 channels. 44 510 trainable parameters. Defined in src/autoencoder.py.
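
A PyTorch sketch of that shape flow (14 → 64 → 16, bottleneck broadcast over 30 steps, 64 → 14); the real src/autoencoder.py may differ in details such as configuration handling.

import torch
from torch import nn

class LSTMAutoencoderSketch(nn.Module):
    def __init__(self, n_sensors: int = 14, hidden: int = 64,
                 bottleneck: int = 16, window: int = 30):
        super().__init__()
        self.window = window
        self.encoder = nn.LSTM(n_sensors, hidden, batch_first=True)
        self.to_bottleneck = nn.Linear(hidden, bottleneck)
        self.decoder = nn.LSTM(bottleneck, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_sensors)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, 30, 14)
        _, (h, _) = self.encoder(x)                       # h: (1, batch, 64)
        z = self.to_bottleneck(h[-1])                     # z: (batch, 16)
        z_seq = z.unsqueeze(1).repeat(1, self.window, 1)  # broadcast across 30 steps
        out, _ = self.decoder(z_seq)                      # (batch, 30, 64)
        return self.head(out)                             # (batch, 30, 14)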

3 · Training loop

Goal: fit the autoencoder on healthy windows only, monitor validation, save a portable checkpoint.

How: src/train.py — Adam (lr = 2·10⁻³, weight decay 10⁻⁵), cosine LR to zero across 80 epochs, gradient norm clipped at 1.0. The checkpoint persists weights, normalisation stats, and sensor list together so inference never has to re-derive them.
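
The optimiser setup maps almost one-to-one onto PyTorch primitives; this is a sketch using the hyperparameters quoted above, not a copy of src/train.py.

import torch

def fit(model, train_loader, epochs: int = 80, device: str = "cpu"):
    opt = torch.optim.Adam(model.parameters(), lr=2e-3, weight_decay=1e-5)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs, eta_min=0.0)
    loss_fn = torch.nn.MSELoss()
    model.to(device)
    for _ in range(epochs):
        model.train()
        for x in train_loader:            # healthy windows only
            x = x.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), x)   # reconstruction MSE
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            opt.step()
        sched.step()                      # cosine decay to zero across the run
    return model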

4 · Threshold + metrics

Goal: turn the score into a binary decision and quantify it.

How: src/evaluate.py — the 99th percentile (q99) of the healthy validation errors gives τ ≈ 0.5515. Precision, recall, F1, accuracy, full ROC and PR sweeps, and the confusion matrix are computed in NumPy (no scikit-learn dependency).
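
Without scikit-learn, the point metrics reduce to a few NumPy reductions; a sketch with illustrative names only.

import numpy as np

def point_metrics(scores: np.ndarray, labels: np.ndarray, tau: float) -> dict:
    pred = scores > tau
    tp = int(np.sum(pred & (labels == 1)))
    fp = int(np.sum(pred & (labels == 0)))
    fn = int(np.sum(~pred & (labels == 1)))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return {"precision": precision, "recall": recall, "f1": f1}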

5 · FastAPI service

Goal: a typed inference endpoint with a real OpenAPI spec.

How: src/api.py's lifespan handler loads the checkpoint and the threshold once at startup. POST /predict validates the window shape with Pydantic, applies the saved normalisation, and returns {score, threshold, is_anomaly}; /info exposes metadata and /health answers orchestration probes.
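
The endpoint contract can be sketched as below; the score placeholder and model names are illustrative, and the real src/api.py additionally applies the saved normalisation and runs the autoencoder.

from contextlib import asynccontextmanager
from fastapi import FastAPI
from pydantic import BaseModel

state = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Illustrative: load checkpoint, normalisation stats and threshold once at startup
    state["threshold"] = 0.5515
    yield
    state.clear()

app = FastAPI(lifespan=lifespan)

class Window(BaseModel):
    values: list[list[float]]      # expected shape: 30 x 14

class Prediction(BaseModel):
    score: float
    threshold: float
    is_anomaly: bool

@app.post("/predict", response_model=Prediction)
def predict(window: Window) -> Prediction:
    score = 0.0                    # placeholder for normalise + reconstruct + MSE
    return Prediction(score=score, threshold=state["threshold"],
                      is_anomaly=score > state["threshold"])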

6 · Streamlit demo

Goal: a minimal but realistic operator UI.

How: src/app_streamlit.py — upload a CSV with the 14 informative sensor columns or pick a built-in CMAPSS test engine. The demo windows the input, scores each one, overlays the flagged regions on the input series, and lists the raw scores in an expander.
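
A compressed Streamlit sketch of the upload path; score_window is a stub standing in for the real checkpoint-backed scorer, and the threshold value is taken from the evaluation report.

import numpy as np
import pandas as pd
import streamlit as st

THRESHOLD = 0.5515                     # q99 threshold from the eval report

def score_window(w: np.ndarray) -> float:
    # Stub: the real demo normalises w and returns the autoencoder's per-window MSE
    return float(np.mean(w ** 2))

st.title("Turbofan anomaly detection (sketch)")
uploaded = st.file_uploader("CSV with the 14 informative sensor columns", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    windows = np.stack([df.values[i:i + 30] for i in range(len(df) - 29)]).astype(float)
    scores = np.array([score_window(w) for w in windows])
    st.line_chart(pd.DataFrame({"score": scores, "threshold": THRESHOLD}))
    st.write(f"{int((scores > THRESHOLD).sum())} of {len(scores)} windows flagged")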

7 · Docker stack

Goal: one command and the API and demo are running.

How: multi-stage Dockerfile (builder venv with CPU torch wheel + slim Python runtime) plus a docker-compose with a Python-only urllib healthcheck on the API and depends_on: service_healthy on the demo. Image stays under 1 GB; ./data/raw is mounted read-only so the built-in engine button works.
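
The "Python-only urllib healthcheck" is small enough to show in full; this is a plausible sketch of such a probe, not necessarily the exact command in docker-compose.yml.

# Exits 0 when GET /health answers 200, non-zero otherwise; no curl/wget needed in the image
import sys
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:8000/health", timeout=3) as resp:
        sys.exit(0 if resp.status == 200 else 1)
except Exception:
    sys.exit(1)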

8 · Tests + reproducibility

Goal: a service that does not regress silently.

How: five pytest cases via FastAPI's TestClient cover /health, /info, the 422 on a bad shape, a sensible score on a near-zero window, and an obviously broken window being flagged. The full pipeline reproduces via python main.py in ≈ 1 minute on a CPU laptop.
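
Two of those contract tests might look like this; the import path and payload field name are assumptions about the project layout.

from fastapi.testclient import TestClient
from src.api import app               # assumes the FastAPI app object is exported as `app`

client = TestClient(app)

def test_health_is_up():
    assert client.get("/health").status_code == 200

def test_predict_rejects_bad_window_shape():
    # A 5 x 14 window instead of the expected 30 x 14 should fail validation with a 422
    resp = client.post("/predict", json={"values": [[0.0] * 14] * 5})
    assert resp.status_code == 422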

// Skills demonstrated

What skills it demonstrates

Applied deep learning
Sequence-to-sequence LSTM autoencoder with explicit linear bottleneck on multivariate time series.
Feature engineering
Per-sensor normalisation fitted on healthy rows only, sliding windows, low-variance sensor pruning.
Calibration
99th-percentile threshold from healthy val errors. ROC + PR sweeps in NumPy without scikit-learn.
Software engineering
Typed FastAPI + Pydantic, lifespan-driven model load, OpenAPI auto-generated.
Containerisation
Multi-stage Dockerfile, two-service compose, health-conditioned dependency on the API.
UX / demos
Streamlit upload + built-in engine flow with metric tiles and inline overlays.
Tests
FastAPI TestClient contract tests covering shape validation and the anomalous-input path.
Documentation
Lab-guide README, JSON metrics summary, machine-readable eval report.
PyTorch FastAPI Uvicorn Pydantic Streamlit Docker docker-compose Pandas Matplotlib pytest
// Structure

Project organization

03-anomaly-detection-api/
├── README.md             # lab-guide README, embedded figures
├── requirements.txt
├── Dockerfile            # builder venv + slim runtime
├── docker-compose.yml    # api (:8000) + demo (:8501) with healthcheck
├── main.py               # end-to-end pipeline
├── src/
│   ├── data.py           # CMAPSS download + windows + splits
│   ├── autoencoder.py    # LSTMAutoencoder + AEConfig
│   ├── train.py          # training loop + checkpoint I/O
│   ├── evaluate.py       # threshold, metrics, ROC/PR (numpy)
│   ├── plots.py          # dark-theme static figures
│   ├── api.py            # FastAPI service
│   └── app_streamlit.py  # interactive demo
├── scripts/
│   ├── export_json.py    # arrays → JSON for this page
│   └── demo_preview.py   # static replica of the demo
├── tests/
│   └── test_api.py       # 5 contract tests via TestClient
└── figures/             # tracked PNGs embedded in the README
// Roadmap

Project status

// Results

Outputs and final metrics

Engine-disjoint splits on CMAPSS FD001: train/val on engines 1-80 (healthy windows only), test on engines 81-100 (all windows, labelled anomaly = RUL ≤ 30). Hover any chart for values, zoom with the toolbar, toggle traces in the legend.

0.969 · ROC-AUC on the test split · score-only ranking quality
93.7 % · recall at the calibrated threshold · 581/620 anomalous windows caught
60.8 % · precision · the bulk of false alarms is concentrated in the early-degradation phase
0.551 · decision threshold · q99 of the healthy validation reconstruction error
Fig 1 · Training and validation loss. Reconstruction MSE on healthy CMAPSS windows, log-y. Training and validation loss track each other tightly across all 80 epochs — the small final gap (train 0.447, val 0.456) is consistent with a model that has captured a generic healthy manifold rather than memorising specific training trajectories.
Fig 2 · Reconstruction error · healthy vs anomalous. Per-window MSE on the test split (log-y). Healthy windows (cyan) sit in a tight peak around 0.5; anomalous windows (amber) develop a long tail past MSE = 10. The threshold (pink dashed) cuts the tail off the healthy bulk almost exactly at the 99th-percentile of the healthy validation distribution.
Fig 3 · ROC and operating point. Full sweep of threshold values; the q99 calibration sits on the steep upper-left of the curve at FPR ≈ 0.11, TPR ≈ 0.94. AUC of 0.969 means the score itself ranks anomalous windows above healthy ones almost perfectly — the operating point is just where one decides to cut.
Fig 4 · Per-engine reconstruction trajectory. Reconstruction MSE along one test engine's full life. A long flat segment at the healthy floor (≈ 0.45), an early-degradation deflection around RUL ≈ 50, and a near-exponential blow-up inside the labelled anomaly band (RUL ≤ 30). The deflection-before-the-band phase is the source of most false positives — and the most useful behaviour for a maintenance-scheduling use case.
// Live demo

Run the model in your browser

The trained autoencoder is deployed as an interactive Streamlit app on Hugging Face Spaces. Click Use a built-in CMAPSS engine inside the demo to score a random engine from the held-out test split, or upload a CSV with the 14 informative sensor columns. The demo windows the input, scores each window, and overlays the flagged regions on the input series.

First load can take 20–40 s while the Hugging Face Space cold-starts; subsequent interactions are instant. If the iframe is blocked by your browser, use the Open in new tab link above.