The final implementation uses a CNN-Transformer evidence pipeline. Images are analysed with the Stegformer ONNX model plus spatial and frequency-domain forensics. Audio uses multi-segment AudioStegNet with sample pair analysis (SPA) and RS (regular/singular groups) discrimination, plus codec-aware calibration (v3). Video uses adaptive sampling of 32–128 frames, with an H.264/DCT-domain path and a pixel-domain path (calibrated, v4b).
The API validates size and extension, then routes the file to the image, audio, or video analysis branch.
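A minimal sketch of the validate-then-route step. The size cap (`MAX_BYTES`), the extension sets, and the function name `route_upload` are hypothetical; the API's real limits and accepted formats are not stated here.

```python
from pathlib import Path

# Hypothetical limits: the real API's size cap and formats are not documented here.
MAX_BYTES = 200 * 1024 * 1024
BRANCHES = {
    "image": {".png", ".bmp", ".jpg", ".jpeg"},
    "audio": {".wav", ".flac", ".mp3"},
    "video": {".mp4", ".mov", ".avi"},
}


def route_upload(filename: str, size_bytes: int) -> str:
    """Validate size and extension, then return the name of the analysis branch."""
    if size_bytes <= 0 or size_bytes > MAX_BYTES:
        raise ValueError(f"file size {size_bytes} outside 1..{MAX_BYTES} bytes")
    ext = Path(filename).suffix.lower()  # case-insensitive extension match
    for branch, extensions in BRANCHES.items():
        if ext in extensions:
            return branch
    raise ValueError(f"unsupported extension: {ext or '(none)'}")
```

Checking size before the extension rejects oversized uploads cheaply, before any decoding is attempted.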
The image branch uses Stegformer ONNX for transformer-based clean/stego probability estimation.
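A clean/stego probability is commonly obtained by applying a softmax to a classifier's two output logits. The sketch below assumes the Stegformer ONNX model emits logits in the order `[clean, stego]`; that output layout is an assumption, and the inference call itself is left out.

```python
import math
from typing import Sequence


def stego_probability(logits: Sequence[float]) -> dict[str, float]:
    """Convert [clean, stego] logits into probabilities with a stable softmax."""
    if len(logits) != 2:
        raise ValueError("expected exactly two logits: [clean, stego]")
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {"clean": exps[0] / total, "stego": exps[1] / total}
```

With ONNX Runtime, the logits would come from `InferenceSession.run`; the input and output tensor names depend on how the model was exported.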
Spatial LSB, JPEG-frequency, and residual texture modules supply supporting evidence and reliability context.
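As one illustration of a spatial LSB check, the classic pairs-of-values chi-square test (Westfeld and Pfitzmann) is sketched below. It is a standard technique chosen for illustration, not necessarily what the spatial LSB module implements, and `lsb_pairs_chi_square` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def lsb_pairs_chi_square(pixels: Iterable[int]) -> tuple[float, int]:
    """Chi-square statistic over value pairs (2k, 2k+1) in 8-bit data.

    LSB replacement tends to equalise the counts inside each pair, which
    pushes the statistic toward zero. Values outside 0..255 are not counted.
    Returns (statistic, degrees_of_freedom).
    """
    counts = Counter(pixels)
    chi2 = 0.0
    pairs_used = 0
    for k in range(128):
        even, odd = counts[2 * k], counts[2 * k + 1]
        expected = (even + odd) / 2
        if expected == 0:
            continue  # skip empty pairs to avoid division by zero
        chi2 += (even - expected) ** 2 / expected
        pairs_used += 1
    return chi2, max(pairs_used - 1, 0)
```

A full test would convert the statistic into a p-value with the chi-square survival function (for example `scipy.stats.chi2.sf`); that step is omitted to keep the sketch dependency-free.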
The audio branch uses mel, PCM-LSB, and residual tensors, then runs SPA/RS forensics, codec profiling, and calibrated CNN fusion.
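The exact fusion rule is not described here. One common way to combine calibrated scores is weighted logistic fusion in logit space, sketched below with hypothetical module names (`cnn`, `spa`, `rs`) and hypothetical weights. In practice the weights and bias would be fitted on held-out labelled data, for example by logistic regression (Platt-style calibration).

```python
import math
from typing import Mapping


def logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    # Branch on sign so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse_audio_evidence(
    scores: Mapping[str, float],
    weights: Mapping[str, float],
    bias: float = 0.0,
) -> float:
    """Weighted logistic fusion of per-module stego probabilities (hypothetical)."""
    z = bias + sum(weights[name] * logit(p) for name, p in scores.items())
    return sigmoid(z)
```

For example, `fuse_audio_evidence({"cnn": 0.9, "spa": 0.7}, {"cnn": 1.0, "spa": 0.5})` lets the CNN score dominate while SPA still shifts the result.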
Videos are adaptively sampled to at most 128 frames, scored frame by frame, and fused using the mean, 90th-percentile (P90), and maximum frame scores together with support and temporal artifact metrics.
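A minimal sketch of adaptive sampling and score fusion, under stated assumptions: the frame count scales with duration at a hypothetical `per_second` rate and is clamped to the 32–128 range (and to the frames available), "support" is taken to mean the fraction of frames at or above a threshold, and the mean absolute score change between consecutive sampled frames stands in for the temporal artifact metrics, whose real definition is not given here.

```python
import math
from statistics import fmean
from typing import Sequence


def sample_frame_indices(total_frames: int, fps: float, per_second: float = 2.0,
                         lo: int = 32, hi: int = 128) -> list[int]:
    """Evenly spaced frame indices; count scales with duration, clamped to [lo, hi]."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if total_frames <= 0:
        return []
    n = min(max(round(total_frames / fps * per_second), lo), hi, total_frames)
    step = total_frames / n  # step >= 1, so indices are distinct
    return [int(i * step) for i in range(n)]


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 1]."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    i = math.floor(pos)
    j = min(i + 1, len(s) - 1)
    return s[i] + (s[j] - s[i]) * (pos - i)


def aggregate_frame_scores(scores: Sequence[float], threshold: float = 0.5) -> dict[str, float]:
    """Fuse per-frame stego scores into video-level statistics (hypothetical keys)."""
    if not scores:
        raise ValueError("no frame scores")
    diffs = [abs(b - a) for a, b in zip(scores, scores[1:])]
    return {
        "mean": fmean(scores),
        "p90": percentile(scores, 0.9),
        "max": max(scores),
        "support": sum(s >= threshold for s in scores) / len(scores),
        "temporal_jitter": fmean(diffs) if diffs else 0.0,
    }
```

Reporting P90 alongside the maximum keeps a single noisy frame from dominating the verdict while still surfacing localized embedding.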
The website renders probability bars, evidence cards, visualisations, technical findings, and the raw JSON report.
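The report's field names are not given here. The sketch below shows one purely hypothetical JSON layout from which a front end could render probability bars, evidence cards, and technical findings; `build_report` and every key in it are illustrative.

```python
import json


def build_report(branch: str, probabilities: dict[str, float],
                 evidence: list[dict], findings: list[str]) -> str:
    """Serialize a hypothetical report layout for the front end."""
    report = {
        "branch": branch,                # image, audio, or video
        "probabilities": probabilities,  # drives the probability bars
        "evidence": evidence,            # one entry per evidence card
        "findings": findings,            # technical findings list
    }
    return json.dumps(report, indent=2, sort_keys=True)
```

Keeping the raw JSON as the single source for every panel means the rendered view and the downloadable report cannot disagree.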