The project report should describe the Abouted system as performing multisteganalysis with CNN and Transformer models. Report results as probability, confidence, reliability, and supporting-evidence scores rather than as unsupported accuracy claims.
Describe the method as a CNN-Transformer multisteganalysis engine that combines Transformer image inference, CNN audio inference, spatial LSB checks, JPEG frequency-domain checks, residual texture support, and video temporal aggregation. Report audio results from your trained checkpoint, checkpoints/audio_best.pth, when that checkpoint is loaded.

| Media | Backend | Input | Output |
|---|---|---|---|
| Image | Transformer + forensic engines | grayscale 256x256 plus RGB bit-plane and DCT/residual statistics | P(stego), engine scores, reliability |
| Video | 24-frame temporal ensemble | sampled frames across duration | mean, P90, max, support, temporal artifact score |
| Audio | Audio CNN + forensic gate | mel + PCM-LSB + residual + spectral/sample-pair statistics | neural score, forensic score, final P(stego) |
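
The video row lists mean, P90, max, and support as outputs of the temporal ensemble. A minimal sketch of how those four summaries could be computed from per-frame P(stego) values follows. The function name `aggregate_frame_scores`, the nearest-rank percentile rule, and the definition of support as the fraction of frames at or above a threshold are assumptions for illustration, not the project's confirmed implementation. The temporal artifact score is left out because its definition is not given here.

```python
import math


def aggregate_frame_scores(frame_scores, support_threshold=0.5):
    """Summarize per-frame P(stego) values for one video (hypothetical helper)."""
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")

    ordered = sorted(frame_scores)
    n = len(ordered)

    # Nearest-rank 90th percentile: the smallest score with at least
    # 90% of all scores at or below it (assumed percentile rule).
    p90_index = max(0, math.ceil(0.9 * n) - 1)

    return {
        "mean": sum(ordered) / n,
        "p90": ordered[p90_index],
        "max": ordered[-1],
        # Assumed definition: share of frames scoring at or above the threshold.
        "support": sum(1 for s in ordered if s >= support_threshold) / n,
    }


if __name__ == "__main__":
    # 24 sampled frames, four of which look suspicious.
    scores = [0.1] * 20 + [0.9] * 4
    print(aggregate_frame_scores(scores))
```

Reporting P90 and max alongside the mean keeps a short run of suspicious frames visible even when most frames look clean.
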

No single steganalysis model detects every hiding tool. Lossless image LSB embedding, JPEG DCT-domain hiding, audio LSB embedding, phase artifacts, and video frame-level hiding each leave different traces. The system therefore reports multiple evidence channels and a reliability label, so a reader can see which evidence supports each result.
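
As an illustration of pairing evidence channels with a reliability label, the sketch below derives the label from how strongly the channels agree. Everything in it is an assumption: the function name `build_report`, the channel names, the 0.5 decision threshold, and the agreement cut-offs are hypothetical and are not taken from the project's code.

```python
def build_report(channel_scores, threshold=0.5):
    """Attach a reliability label to per-channel scores (hypothetical rule)."""
    if not channel_scores:
        raise ValueError("channel_scores must not be empty")

    total = len(channel_scores)
    flagged = sorted(name for name, s in channel_scores.items() if s >= threshold)

    # Agreement is the share of channels on the majority side of the threshold.
    agreement = max(len(flagged), total - len(flagged)) / total

    # Assumed cut-offs; tune them against validation data before use.
    if agreement >= 0.8:
        reliability = "high"
    elif agreement >= 0.6:
        reliability = "medium"
    else:
        reliability = "low"

    return {
        "channels": dict(channel_scores),
        "flagged": flagged,
        "reliability": reliability,
    }


if __name__ == "__main__":
    # Channel names are placeholders for the image evidence channels.
    print(build_report({"transformer": 0.9, "spatial_lsb": 0.8, "jpeg_dct": 0.2}))
```

Keeping the raw channel scores in the output lets a reader check which traces drove the label.
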