MULTISTEGANALYSIS USING CNN AND TRANSFORMER

Detect hidden data across media

StegVision is a Final Year Project system for image, audio, and video steganalysis. It combines transformer image inference, CNN audio analysis, spatial bit-plane statistics, JPEG-frequency evidence, residual texture support, and temporal video aggregation in one forensic API.


AUDIO REALWORLD

TFM

IMAGE TRANSFORMER

v4b

VIDEO CALIBRATED

LIVE API FLOW

POST /predict file=evidence.jpg
OK media_type=image
OK transformer + spatial/frequency evidence
OK P(clean), P(stego), reliability
REPORT decision engine, evidence scores, latency

POST /predict file=clip.mp4
OK sample 32-128 frames (adaptive)
OK aggregate mean, P90, max, temporal artifacts
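The flow reports P(clean), P(stego), and a reliability value for each file. The sketch below shows one plausible way to derive these from a two-class model output. It assumes the model emits two raw logits (clean, stego); the function name, the report field names, and the reliability definition (distance from the 0.5 boundary) are hypothetical, not the project's actual schema.

```python
import math


def two_class_report(logit_clean: float, logit_stego: float, media_type: str) -> dict:
    """Hypothetical report builder: two logits -> probabilities and a decision."""
    # Numerically stable two-class softmax: subtract the larger logit first.
    m = max(logit_clean, logit_stego)
    e_clean = math.exp(logit_clean - m)
    e_stego = math.exp(logit_stego - m)
    p_stego = e_stego / (e_clean + e_stego)
    p_clean = 1.0 - p_stego
    return {
        "media_type": media_type,
        "p_clean": p_clean,
        "p_stego": p_stego,
        # Hypothetical reliability: distance from the 0.5 boundary, rescaled to [0, 1].
        "reliability": abs(p_stego - 0.5) * 2.0,
        # Ties are assigned to "stego" here; this tie-break is an arbitrary choice.
        "decision": "stego" if p_stego >= 0.5 else "clean",
    }
```

Passing the returned dict to `json.dumps` would give a JSON body of the kind the frontend can render and offer for download.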

SUPPORTED MEDIA

One API, Three Pipelines

The frontend sends every upload to the same Flask endpoint. The API chooses the right CNN, transformer, or temporal forensic path from the file extension.
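Extension-based routing can be sketched as a small lookup. The extension sets below come from the supported-media lists on this page; the function name `route_media` is hypothetical. In a Flask handler it would receive the uploaded filename, e.g. `request.files["file"].filename`, matching the `file=` field shown in the API flow.

```python
from pathlib import PurePath

# Extension sets taken from the supported-media lists on this page.
IMAGE_EXTS = {".jpg", ".png", ".webp", ".bmp"}
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".ogg"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}


def route_media(filename: str) -> str:
    """Hypothetical router: map an uploaded filename to 'image', 'audio' or 'video'."""
    ext = PurePath(filename).suffix.lower()  # case-insensitive match on the last suffix
    if ext in IMAGE_EXTS:
        return "image"
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported file extension: {ext or '(none)'}")
```

For example, `route_media("clip.MP4")` returns `"video"`, while an unsupported upload such as `notes.txt` raises `ValueError` so the API can answer with an error instead of guessing a pipeline.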

Images

A transformer image backend scores the file, then spatial LSB, JPEG-frequency, and residual texture engines add independent forensic evidence.

JPG, PNG, WEBP, BMP
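As an illustration of a spatial LSB engine, the sketch below computes the classic pairs-of-values chi-square statistic (Westfeld and Pfitzmann). Whether StegVision uses this exact test is an assumption; the function name is hypothetical. LSB replacement tends to equalise the counts of each value pair (2k, 2k+1), so a statistic that is small relative to its degrees of freedom points towards embedding. A p-value can then be obtained with `scipy.stats.chi2.sf(stat, dof)`, where values near 1 suggest LSB embedding.

```python
import numpy as np


def lsb_pairs_chi_square(pixels: np.ndarray) -> tuple[float, int]:
    """Hypothetical helper: chi-square statistic over the 128 pixel-value pairs."""
    # Histogram of 8-bit values for one channel (or a flattened image).
    hist = np.bincount(np.asarray(pixels, dtype=np.uint8).ravel(), minlength=256).astype(float)
    even, odd = hist[0::2], hist[1::2]
    # Under full LSB replacement each pair is expected to split evenly.
    expected = (even + odd) / 2.0
    mask = expected > 0  # skip empty pairs to avoid division by zero
    stat = float((((even - expected) ** 2)[mask] / expected[mask]).sum())
    dof = int(mask.sum()) - 1
    return stat, dof
```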

Audio

The audio branch uses a CNN over mel, PCM-LSB, and residual feature tensors, then runs SPA/RS forensics, codec profiling, and calibrated CNN fusion before final scoring.

WAV, MP3, FLAC, OGG
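The PCM-LSB and residual inputs to the audio CNN can be sketched as two stacked channels computed from 16-bit samples. This is a minimal sketch under stated assumptions: the mel channel is omitted, and the frame length of 1024, the first-difference residual, and the normalisation by 32768 are all hypothetical choices, since the actual AudioStegNet input layout is not described here.

```python
import numpy as np


def pcm_feature_channels(pcm: np.ndarray, frame_len: int = 1024) -> np.ndarray:
    """Hypothetical feature builder: (2, n_frames, frame_len) tensor of LSB and residual."""
    pcm = np.asarray(pcm, dtype=np.int16)
    n_frames = len(pcm) // frame_len
    if n_frames == 0:
        raise ValueError("clip shorter than one frame")
    # Drop the incomplete tail frame and widen to int32 so differences cannot overflow.
    x = pcm[: n_frames * frame_len].astype(np.int32)
    # Channel 0: LSB plane of the two's-complement samples (0 or 1).
    lsb = (x & 1).astype(np.float32)
    # Channel 1: first-order prediction residual x[n] - x[n-1] (0 at n = 0),
    # scaled by the 16-bit full-scale value.
    residual = np.diff(x, prepend=x[0]).astype(np.float32) / 32768.0
    return np.stack([lsb.reshape(n_frames, frame_len), residual.reshape(n_frames, frame_len)])
```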

Video

OpenCV adaptively samples up to 128 frames. Each frame is scored by the image evidence ensemble, then the per-frame mean, P90, maximum, support, and temporal-artifact scores are fused.

MP4, AVI, MOV, MKV
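The sampling and aggregation steps can be sketched without the video decoding itself. In practice the frame count would come from `cv2.VideoCapture(path).get(cv2.CAP_PROP_FRAME_COUNT)`, and each index would be read with `cap.set(cv2.CAP_PROP_POS_FRAMES, idx)` followed by `cap.read()`. The adaptive rule (about one frame in ten, clamped to 32-128), the reading of "support" as the fraction of frames above a threshold, and `temporal_jitter` as a stand-in for temporal artifacts are all hypothetical.

```python
import numpy as np


def sample_frame_indices(total_frames: int, lo: int = 32, hi: int = 128) -> np.ndarray:
    """Hypothetical adaptive sampler: evenly spaced frame indices."""
    if total_frames <= 0:
        return np.empty(0, dtype=int)
    # About one frame in ten, clamped to [lo, hi], never more than the clip holds.
    n = min(total_frames, max(lo, min(hi, total_frames // 10)))
    return np.unique(np.linspace(0, total_frames - 1, n).round().astype(int))


def aggregate_frame_scores(scores, threshold: float = 0.5) -> dict:
    """Hypothetical aggregation of per-frame P(stego) scores."""
    s = np.asarray(scores, dtype=float)
    if s.size == 0:
        raise ValueError("no frame scores to aggregate")
    return {
        "mean": float(s.mean()),
        "p90": float(np.percentile(s, 90)),  # linear interpolation (NumPy default)
        "max": float(s.max()),
        "support": float((s >= threshold).mean()),  # fraction of frames flagged
        # Mean absolute score change between consecutive sampled frames.
        "temporal_jitter": float(np.abs(np.diff(s)).mean()) if s.size > 1 else 0.0,
    }
```

Keeping mean, P90, and max side by side lets a short burst of embedded frames (high max, low mean) be distinguished from embedding spread across the whole clip.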

PROJECT TEAM

Cybersecurity FYP 2025

Developed by Aroob Mukhtar, Muhammad Madni, and Umar Daraz under the supervision of Dr. Farhan Hassan.


STACK

CNN-Transformer Evidence Stack

The deployment is built around the actual inference path used by the website: transformer image analysis, CNN audio analysis, classical forensic evidence, and a JSON report that explains the decision.

Backend

Flask, ONNX Runtime, PyTorch audio inference, OpenCV frame extraction, and CORS for the static website.

Models

Stegformer transformer (ONNX) for image and video evidence, AudioStegNet CNN for audio, and deterministic forensic support engines.

Frontend

Static HTML, CSS, and JavaScript that display confidence, evidence scores, reliability, frame charts, an LSB visualisation, and a downloadable JSON report.
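An LSB visualisation is typically a bit-plane image in which each pixel's chosen bit is stretched to black or white, so that embedded noise shows up as texture in otherwise smooth regions. The page does not say whether this image is produced by the API or in the browser; the Python sketch below, with the hypothetical name `lsb_plane_image`, only illustrates the transform.

```python
import numpy as np


def lsb_plane_image(pixels: np.ndarray, bit: int = 0) -> np.ndarray:
    """Hypothetical helper: show one bit plane of an 8-bit image as 0/255."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in 0..7")
    arr = np.asarray(pixels, dtype=np.uint8)
    # Shift the chosen bit into position 0, isolate it, and scale 1 -> 255.
    return (((arr >> bit) & 1) * 255).astype(np.uint8)
```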