An LSTM autoencoder learns to reconstruct healthy multivariate sensor windows from the NASA CMAPSS turbofan dataset. The reconstruction error becomes the anomaly score, calibrated against a held-out healthy validation set, and any window above the threshold is flagged. The project pairs this model, trained on a real public benchmark, with the deployment artefacts around it: a typed FastAPI service, an interactive Streamlit demo, and a two-service Docker stack.
The core idea is reconstruction-based anomaly detection. An LSTM autoencoder is fitted only on healthy sliding windows of the CMAPSS turbofan dataset; at inference time, anomalous windows are reconstructed poorly and the per-window MSE serves as the anomaly score. The decision threshold is the 99th percentile of the score on a healthy validation set, which by construction budgets a 1 % false-alarm rate on truly healthy data.
Engine-wise splits prevent leakage: distinct turbofans feed training, calibration, and evaluation. The test split labels each window anomaly = (RUL ≤ 30) from the run-to-failure ground truth, giving a clean binary task to score against.
Reconstruction-based score: for a window $x \in \mathbb{R}^{T \times d}$ with $T = 30$ steps and $d = 14$ sensors, $s(x) = \frac{1}{Td} \sum_{t=1}^{T} \lVert x_t - \hat{x}_t \rVert_2^2$, where $\hat{x}$ is the autoencoder's reconstruction of $x$.

Training objective (healthy windows only): $\min_\theta \ \frac{1}{|\mathcal{D}_{\text{healthy}}|} \sum_{x \in \mathcal{D}_{\text{healthy}}} s_\theta(x)$.

Threshold calibration: $\tau = q_{0.99}\big(\{ s(x) : x \in \text{val}_{\text{healthy}} \}\big)$.

Decision rule: flag $x$ as anomalous iff $s(x) > \tau$.
Eight pieces connected end-to-end. Each lives in its own module with a dedicated responsibility.
Goal: turn the raw CMAPSS .txt files into engine-disjoint train/val/test windows with labels.
How: src/data.py downloads the NASA PCOE archive once, parses 24-column run-to-failure tables, drops the 7 near-constant sensors, builds 30-cycle sliding windows, and fits per-sensor z-score statistics on the healthy training rows only. RUL is annotated automatically; build_dataset returns the three splits plus the binary labels.
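The windowing and normalisation steps can be sketched in a few lines of NumPy (function and variable names here are illustrative, not the actual `src/data.py` API):

```python
import numpy as np

def make_windows(series: np.ndarray, window: int = 30) -> np.ndarray:
    """Stack overlapping sliding windows: (cycles, sensors) -> (n_windows, window, sensors)."""
    n = series.shape[0] - window + 1
    return np.stack([series[i:i + window] for i in range(n)])

def fit_zscore(train_rows: np.ndarray):
    """Per-sensor mean/std, fitted on healthy training rows only."""
    mu = train_rows.mean(axis=0)
    sigma = train_rows.std(axis=0) + 1e-8  # guard against near-constant sensors
    return mu, sigma

# Toy example: one engine, 100 cycles, 14 informative sensors.
rng = np.random.default_rng(0)
engine = rng.normal(size=(100, 14))
mu, sigma = fit_zscore(engine)
windows = make_windows((engine - mu) / sigma, window=30)
print(windows.shape)  # (71, 30, 14)
```

Fitting the z-score statistics on healthy training rows only matters: leaking degraded cycles into the normalisation would shrink exactly the deviations the detector is supposed to catch.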
Goal: a sequence-to-sequence reconstructor with an explicit information bottleneck.
How: encoder LSTM (14 → 64), linear bottleneck (64 → 16), decoder LSTM driven by the bottleneck broadcast across 30 steps, linear head back to 14 channels. 44 510 trainable parameters. Defined in src/autoencoder.py.
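A minimal PyTorch sketch of this shape follows; the real `src/autoencoder.py` may differ in details (for instance, how the bottleneck seeds the decoder state, which would also account for the exact parameter count):

```python
import torch
import torch.nn as nn

class LSTMAutoencoderSketch(nn.Module):
    """Encoder LSTM -> linear bottleneck -> decoder LSTM -> linear head."""
    def __init__(self, n_sensors: int = 14, hidden: int = 64,
                 latent: int = 16, window: int = 30):
        super().__init__()
        self.window = window
        self.encoder = nn.LSTM(n_sensors, hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent)
        self.decoder = nn.LSTM(latent, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_sensors)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, (h, _) = self.encoder(x)                       # h: (1, batch, hidden)
        z = self.to_latent(h[-1])                         # (batch, latent)
        z_seq = z.unsqueeze(1).repeat(1, self.window, 1)  # broadcast across 30 steps
        dec, _ = self.decoder(z_seq)
        return self.head(dec)                             # (batch, window, n_sensors)

model = LSTMAutoencoderSketch()
x = torch.randn(8, 30, 14)
print(model(x).shape)  # torch.Size([8, 30, 14])
```

The 16-dimensional bottleneck is the point of the exercise: the model can only reconstruct what fits through it, and degraded dynamics do not.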
Goal: fit the autoencoder on healthy windows only, monitor validation, save a portable checkpoint.
How: src/train.py — Adam (lr = 2·10⁻³, weight decay 10⁻⁵), cosine LR to zero across 80 epochs, gradient norm clipped at 1.0. The checkpoint persists weights, normalisation stats, and sensor list together so inference never has to re-derive them.
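The training recipe reduces to a short loop; this sketch uses a stand-in reconstructor and illustrative names rather than the actual `src/train.py` code:

```python
import torch
import torch.nn as nn

def train_healthy(model, healthy_windows, epochs=80, lr=2e-3):
    """Fit on healthy windows only: Adam + cosine LR to zero + grad-norm clipping."""
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-5)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(healthy_windows), healthy_windows)  # reconstruct the input
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        opt.step()
        sched.step()
    return loss.item()

# Stand-in reconstructor (the real model is the LSTM autoencoder).
model = nn.Sequential(nn.Linear(14, 8), nn.Tanh(), nn.Linear(8, 14))
x = torch.randn(64, 30, 14)
final_loss = train_healthy(model, x, epochs=5)

# Portable checkpoint: weights + normalisation stats + sensor list together,
# so inference never has to re-derive them.
ckpt = {"state_dict": model.state_dict(),
        "mu": [0.0] * 14, "sigma": [1.0] * 14,
        "sensors": [f"s{i}" for i in range(14)]}
torch.save(ckpt, "/tmp/ae_ckpt.pt")
```

Bundling the normalisation statistics into the checkpoint is the key design choice: a model served with mismatched statistics would score every window as anomalous.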
Goal: turn the score into a binary decision and quantify it.
How: src/evaluate.py — q99 of the val healthy errors gives τ ≈ 0.5515. Precision, recall, F1, accuracy, full ROC and PR sweeps and the confusion matrix are computed in NumPy (no scikit-learn dependency).
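The calibration and metric computations need nothing beyond NumPy; a sketch with illustrative names and toy scores:

```python
import numpy as np

def calibrate_threshold(val_healthy_scores: np.ndarray, q: float = 99.0) -> float:
    """q99 of healthy validation errors = 1% false-alarm budget on healthy data."""
    return float(np.percentile(val_healthy_scores, q))

def binary_metrics(scores: np.ndarray, labels: np.ndarray, tau: float) -> dict:
    pred = scores > tau
    tp = int(np.sum(pred & (labels == 1)))
    fp = int(np.sum(pred & (labels == 0)))
    fn = int(np.sum(~pred & (labels == 1)))
    tn = int(np.sum(~pred & (labels == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(labels)}

rng = np.random.default_rng(1)
healthy = rng.normal(0.4, 0.05, 1000)    # toy healthy scores
anomalous = rng.normal(0.9, 0.1, 200)    # toy anomalous scores
tau = calibrate_threshold(healthy)
scores = np.concatenate([healthy, anomalous])
labels = np.concatenate([np.zeros(1000), np.ones(200)])
m = binary_metrics(scores, labels, tau)
```

ROC and PR sweeps follow the same pattern: evaluate `binary_metrics` over a grid of candidate thresholds instead of the single calibrated one.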
Goal: a typed inference endpoint with a real OpenAPI spec.
How: src/api.py — lifespan loads the checkpoint and the threshold once at startup. POST /predict validates the window shape with Pydantic, applies the saved normalisation, returns {score, threshold, is_anomaly}. /info exposes metadata, /health for orchestration probes.
Goal: a minimal but realistic operator UI.
How: src/app_streamlit.py — upload a CSV with the 14 informative sensor columns or pick a built-in CMAPSS test engine. The demo windows the input, scores each one, overlays the flagged regions on the input series, and lists the raw scores in an expander.
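Behind the UI, the window-and-flag logic reduces to a few lines; this sketch (illustrative names, stub scorer in place of the model) shows how flagged regions are derived from per-window scores:

```python
import numpy as np

def flag_regions(series: np.ndarray, score_fn, tau: float, window: int = 30):
    """Score every sliding window; return per-window scores and flagged cycle spans."""
    n = series.shape[0] - window + 1
    scores = np.array([score_fn(series[i:i + window]) for i in range(n)])
    flagged = [(i, i + window) for i in np.flatnonzero(scores > tau)]
    return scores, flagged

# Toy series: healthy noise, then a drifting tail the scorer should flag.
series = np.concatenate([np.random.default_rng(2).normal(0, 1, (80, 14)),
                         np.full((40, 14), 5.0)])
scores, flagged = flag_regions(series, lambda w: float(np.mean(w ** 2)), tau=2.0)
```

The demo then shades the union of the flagged `(start, end)` spans on top of the plotted sensor traces, so an operator sees where in the run the model objects, not just that it does.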
Goal: one command and the API and demo are running.
How: multi-stage Dockerfile (builder venv with CPU torch wheel + slim Python runtime) plus a docker-compose with a Python-only urllib healthcheck on the API and depends_on: service_healthy on the demo. Image stays under 1 GB; ./data/raw is mounted read-only so the built-in engine button works.
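The healthcheck wiring in docker-compose looks roughly like this (service names and paths are illustrative; the `python -c` urllib one-liner avoids shipping curl in the slim image):

```yaml
services:
  api:
    build: .
    ports: ["8000:8000"]
    volumes:
      - ./data/raw:/app/data/raw:ro   # read-only mount for the built-in engine button
    healthcheck:
      test: ["CMD", "python", "-c",
             "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 10s
      timeout: 3s
      retries: 5
  demo:
    build: .
    command: streamlit run src/app_streamlit.py --server.port 8501
    ports: ["8501:8501"]
    depends_on:
      api:
        condition: service_healthy
```

`condition: service_healthy` is what makes the demo wait for a passing healthcheck rather than merely for the api container to start.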
Goal: a service that does not regress silently.
How: 5 pytests via FastAPI TestClient cover /health, /info, the 422 on bad shape, a sensible score on a near-zero window, and an obviously broken window flagged. The full pipeline is reproduced by python main.py in ≈ 1 minute on a CPU laptop.
```
03-anomaly-detection-api/
├── README.md             # lab-guide README, embedded figures
├── requirements.txt
├── Dockerfile            # builder venv + slim runtime
├── docker-compose.yml    # api (:8000) + demo (:8501) with healthcheck
├── main.py               # end-to-end pipeline
├── src/
│   ├── data.py           # CMAPSS download + windows + splits
│   ├── autoencoder.py    # LSTMAutoencoder + AEConfig
│   ├── train.py          # training loop + checkpoint I/O
│   ├── evaluate.py       # threshold, metrics, ROC/PR (numpy)
│   ├── plots.py          # dark-theme static figures
│   ├── api.py            # FastAPI service
│   └── app_streamlit.py  # interactive demo
├── scripts/
│   ├── export_json.py    # arrays → JSON for this page
│   └── demo_preview.py   # static replica of the demo
├── tests/
│   └── test_api.py       # 5 contract tests via TestClient
└── figures/              # tracked PNGs embedded in the README
```
Engine-disjoint splits on CMAPSS FD001: train/val on engines 1-80 (healthy windows only), test on engines 81-100 (all windows, labelled anomaly = RUL ≤ 30).
The close final reconstruction losses (train 0.447, val 0.456) are consistent with a model that has captured a generic healthy manifold rather than memorising specific training trajectories.

At the calibrated threshold the operating point sits at FPR ≈ 0.11, TPR ≈ 0.94. An AUC of 0.969 means the score itself ranks anomalous windows above healthy ones almost perfectly; the threshold is just where one decides to cut.

Scores deflect upward well before windows cross the label band (RUL ≤ 30). This deflection-before-the-band phase is the source of most false positives, and also the most useful behaviour for a maintenance-scheduling use case.

The trained autoencoder is deployed as an interactive Streamlit app on Hugging Face Spaces. Click "Use a built-in CMAPSS engine" inside the demo to score a random engine from the held-out test split, or upload a CSV with the 14 informative sensor columns. The demo windows the input, scores each window, and overlays the flagged regions on the input series.
First load can take 20–40 s while the Hugging Face Space cold-starts; subsequent interactions are instant. If the iframe is blocked by your browser, use the Open in new tab link above.