StegVision is a Final Year Project system for image, audio, and video steganalysis. It combines transformer image inference, CNN audio analysis, spatial bit-plane statistics, JPEG-frequency evidence, residual texture support, and temporal video aggregation in one forensic API.
The frontend sends every upload to the same Flask endpoint. The API chooses the right CNN, transformer, or temporal forensic path from the file extension.
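The extension-based routing described above can be sketched as a small lookup. This is only an illustration: the extension sets and the `choose_branch` helper are hypothetical, since the formats the API actually accepts are not listed here.

```python
from pathlib import Path

# Hypothetical extension sets; the formats StegVision really accepts are not stated.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}
AUDIO_EXTS = {".wav", ".mp3", ".flac"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}


def choose_branch(filename: str) -> str:
    """Return the analysis branch ("image", "audio" or "video") for an upload."""
    ext = Path(filename).suffix.lower()  # case-insensitive match on the extension
    if ext in IMAGE_EXTS:
        return "image"
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported file extension: {ext!r}")
```

A single endpoint could call a helper like this once per upload and then hand the file to the matching pipeline, rejecting anything that raises `ValueError`.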
A transformer image backend scores the file, then spatial LSB, JPEG-frequency, and residual texture engines add independent forensic evidence.
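To give a feel for what spatial LSB evidence can look like, here is a minimal sketch of two classical statistics over 8-bit pixel values: the share of ones in the least significant bit plane, and the chi-square test on pairs of values (2k, 2k+1). These are textbook techniques chosen for illustration, not necessarily the statistics StegVision computes, and the function names are hypothetical.

```python
from collections import Counter


def lsb_ones_ratio(pixels):
    """Fraction of samples whose least significant bit is 1."""
    if not pixels:
        raise ValueError("no pixels given")
    return sum(p & 1 for p in pixels) / len(pixels)


def chi_square_pov(pixels):
    """Chi-square statistic over pairs of values (2k, 2k+1) of 8-bit samples.

    LSB replacement tends to equalise the two counts in each pair, so a
    value close to zero is consistent with (but does not prove) embedding.
    """
    hist = Counter(pixels)  # missing values count as 0
    chi2 = 0.0
    for k in range(128):
        even, odd = hist[2 * k], hist[2 * k + 1]
        expected = (even + odd) / 2
        if expected > 0:
            chi2 += (even - expected) ** 2 / expected
    return chi2
```

On real images these statistics would be computed per channel or per block and treated as one signal among several, which matches the role the text gives to the supporting engines.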
The audio branch uses a CNN over mel, PCM-LSB, and residual feature tensors, then runs SPA/RS forensics, codec profiling, and calibrated CNN fusion before final scoring.
OpenCV samples up to 128 frames adaptively. Each frame is scored by the image evidence ensemble, then the per-frame mean, P90 (90th percentile), maximum, and support statistics are fused with temporal artifacts.
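The frame cap and the per-frame statistics can be sketched in plain Python. Several things here are assumptions: the sampler is evenly spaced rather than adaptive, "support" is taken to mean the fraction of frames at or above a threshold, the 0.5 threshold is arbitrary, and all function names are hypothetical. The final fusion with temporal artifacts is not shown.

```python
from statistics import mean

MAX_FRAMES = 128  # cap taken from the text


def sample_frame_indices(total_frames, max_frames=MAX_FRAMES):
    """Evenly spaced frame indices, at most max_frames (a stand-in for adaptive sampling)."""
    if total_frames <= 0:
        return []
    if total_frames <= max_frames:
        return list(range(total_frames))
    step = total_frames / max_frames
    return [int(i * step) for i in range(max_frames)]


def percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)  # integer ceiling of q*n/100
    return ordered[rank - 1]


def aggregate_frame_scores(scores, threshold=0.5):
    """Summarise per-frame scores; 'support' is assumed to be the share of frames >= threshold."""
    if not scores:
        raise ValueError("no frame scores given")
    return {
        "mean": mean(scores),
        "p90": percentile(scores, 90),
        "max": max(scores),
        "support": sum(s >= threshold for s in scores) / len(scores),
    }
```

Reporting the P90 and maximum alongside the mean keeps a short run of suspicious frames visible even when most of the video looks clean.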
Developed by Aroob Mukhtar, Muhammad Madni, and Umar Daraz under the supervision of Dr. Farhan Hassan.
The deployment is built around the actual inference path used by the website: transformer image analysis, CNN audio analysis, classical forensic evidence, and a JSON report that explains the decision.
Flask, ONNX Runtime, PyTorch audio inference, OpenCV frame extraction, and CORS for the static website.
Stegformer transformer ONNX for image/video evidence, AudioStegNet CNN for audio, and deterministic forensic support engines.
Static HTML, CSS, and JavaScript report confidence, evidence scores, reliability, frame charts, LSB visualisation, and downloadable JSON.