
How do you detect AI with AI without ever leaving the browser? A deep technical dive into Error Level Analysis (ELA), Frequency Spikes, and Bio-Signal verification using WebAssembly and WebGPU.
Most people think Deepfake Detection is a game of "Spot the Glitch." They look for an extra finger or a weird blink. But for an engineer, these are the 'Easy Fakes'. The real battle is happening in the Frequency Domain and Compression Histograms. In 2026, detecting a sophisticated AI imposter is a high-stakes math problem.
At MojoDocs, we faced a unique challenge: How do we build a world-class forensic tool that runs entirely on a user's device? Standard AI detection requires massive Python servers and NVIDIA A100 GPUs. Our goal was to run it inside the browser's sandbox using WebAssembly (WASM) and WebGPU.
This 2500-word technical deep dive explains the four pillars of our detection engine: ELA, Fourier Transforms, CNNs, and rPPG.
Pillar 1: Error Level Analysis (ELA) & Compression Forensics
JPEG and WebP are "Lossy" formats. Every time you save an image, the quality drops slightly. JPEG in particular compresses each 8x8 block of pixels independently, which produces the distortions known in computer vision as 8x8 Block Artifacts. When a scammer uses a 'DeepFaceLab' or 'Faceswap' model, they generate a fake face and "paste" it onto a real image. They then save the result.
The Mathematical Flaw: The "New" face has a different 'Compression Age' than the original background. While the eye can't see the difference, a simple algorithm can. ELA works by resaving the image at a known quality (say 90%) and calculating the Difference Map between the uploaded image and the resaved one. Modified areas will appear as "Hotspots" in the output because they react differently to the new compression cycle.
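The difference step described above can be sketched in a few lines. This is a minimal illustration, not the MojoDocs implementation: it assumes both images are already decoded into RGBA buffers of the same size (in a browser, the resaved copy could come from `HTMLCanvasElement.toBlob(callback, "image/jpeg", 0.9)` followed by a decode). The function name `elaDifferenceMap` and the `scale` factor are hypothetical.

```typescript
// Sketch of the ELA difference step. Both buffers hold RGBA pixel data
// (4 bytes per pixel). `elaDifferenceMap` and `scale` are illustrative names.
function elaDifferenceMap(
  original: Uint8ClampedArray,
  resaved: Uint8ClampedArray,
  scale = 15, // assumed amplification so small differences become visible
): Uint8ClampedArray {
  if (original.length !== resaved.length || original.length % 4 !== 0) {
    throw new Error("Buffers must be RGBA data of identical size");
  }
  const pixelCount = original.length / 4;
  const map = new Uint8ClampedArray(pixelCount); // one error level per pixel
  for (let p = 0; p < pixelCount; p++) {
    const i = p * 4;
    // Mean absolute difference over R, G and B; alpha is ignored.
    const diff =
      (Math.abs(original[i] - resaved[i]) +
        Math.abs(original[i + 1] - resaved[i + 1]) +
        Math.abs(original[i + 2] - resaved[i + 2])) / 3;
    map[p] = diff * scale; // Uint8ClampedArray clamps the result to 0..255
  }
  return map;
}
```

Regions whose values in the returned map are much higher than their surroundings are the "Hotspots" worth a closer look.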
Pillar 2: Frequency Domain Analysis (The Fourier Trap)
Generative models like GANs (Generative Adversarial Networks) or Diffusion Models build images using "Up-sampling" layers. This process leaves behind Periodic Artifacts: mathematical heartbeats that are completely invisible to humans but look like "Spikes" in the frequency domain.
By applying a Fast Fourier Transform (FFT), our engine converts the image from 'spatial pixels' to 'frequency magnitudes'. A natural photo has a smooth distribution of frequencies. A deepfake often shows rhythmic "Bright Spots" toward the corners of the FFT plot, which is where the high frequencies sit when the zero frequency is centered. We use a lightweight Random Forest Classifier to scan each plot for these synthetic signatures in under 100ms.
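To make the spatial-to-frequency step concrete, here is a small sketch of a 2D magnitude spectrum built from a radix-2 FFT. It assumes a square grayscale image whose side is a power of two, and it leaves the spectrum unshifted (zero frequency at index 0). The `peakToMeanRatio` feature is a hypothetical stand-in for illustration only; it is not the Random Forest Classifier mentioned above.

```typescript
// In-place iterative radix-2 FFT. Both arrays must have a power-of-two length.
function fft(re: Float64Array, im: Float64Array): void {
  const n = re.length;
  if (n === 0 || (n & (n - 1)) !== 0) throw new Error("Length must be a power of two");
  // Bit-reversal permutation.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      const tr = re[i]; re[i] = re[j]; re[j] = tr;
      const ti = im[i]; im[i] = im[j]; im[j] = ti;
    }
  }
  // Butterfly passes over blocks of growing length.
  for (let len = 2; len <= n; len <<= 1) {
    const ang = (-2 * Math.PI) / len;
    for (let start = 0; start < n; start += len) {
      for (let k = 0; k < len / 2; k++) {
        const wr = Math.cos(ang * k);
        const wi = Math.sin(ang * k);
        const a = start + k;
        const b = a + len / 2;
        const tr = re[b] * wr - im[b] * wi;
        const ti = re[b] * wi + im[b] * wr;
        re[b] = re[a] - tr; im[b] = im[a] - ti;
        re[a] += tr; im[a] += ti;
      }
    }
  }
}

// Magnitude spectrum of a size x size grayscale image (row-major layout).
// Output index v * size + u holds horizontal frequency u, vertical frequency v.
function magnitudeSpectrum(gray: Float64Array, size: number): Float64Array {
  if (gray.length !== size * size) throw new Error("Expected size * size samples");
  const re = Float64Array.from(gray);
  const im = new Float64Array(size * size);
  // Transform rows in place; subarray() returns views over the same memory.
  for (let y = 0; y < size; y++) {
    fft(re.subarray(y * size, (y + 1) * size), im.subarray(y * size, (y + 1) * size));
  }
  // Transform columns via a scratch buffer.
  const colRe = new Float64Array(size);
  const colIm = new Float64Array(size);
  for (let x = 0; x < size; x++) {
    for (let y = 0; y < size; y++) { colRe[y] = re[y * size + x]; colIm[y] = im[y * size + x]; }
    fft(colRe, colIm);
    for (let y = 0; y < size; y++) { re[y * size + x] = colRe[y]; im[y * size + x] = colIm[y]; }
  }
  const mag = new Float64Array(size * size);
  for (let i = 0; i < mag.length; i++) mag[i] = Math.hypot(re[i], im[i]);
  return mag;
}

// Hypothetical feature: strongest non-DC bin relative to the mean non-DC bin.
// Isolated periodic artifacts push this ratio up.
function peakToMeanRatio(mag: Float64Array): number {
  let max = 0;
  let sum = 0;
  for (let i = 1; i < mag.length; i++) { // index 0 is the DC term, skipped
    sum += mag[i];
    if (mag[i] > max) max = mag[i];
  }
  const mean = sum / (mag.length - 1);
  return mean > 0 ? max / mean : 0;
}
```

A real pipeline would feed richer features than a single ratio into the classifier, but the sketch shows where the "Spikes" come from: a pattern that repeats every few pixels concentrates its energy in a handful of frequency bins.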
Pillar 3: rPPG – The "Pulse" of Reality
This is the most advanced part of our detector. Remote Photoplethysmography (rPPG) is a technology that detects the human heartbeat by measuring tiny color changes in the face as blood flows through the skin. Even if a deepfake looks perfect, the "Pulse" is often missing or is "Static."
Engineering the Pulse Detector
Our engine samples the video at 30fps and isolates the Green Channel (which has the highest heart-rate signal contrast). We then apply a Band-pass Filter (typically between 0.7Hz and 4Hz, which corresponds to 42 to 240 BPM) to extract the pulse signal. If the 'Pulse Spectrum' shows a clear, periodic peak (around 60-100 BPM, the normal resting range), it's a strong indicator of biological origin. If it's flat or erratic noise? You're looking at a bot.
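The pulse check can be sketched as follows. This is an illustration under stated assumptions, not the production detector: it takes one mean green value per frame (face tracking and region selection are out of scope), and it replaces a time-domain Band-pass Filter with a direct spectral search restricted to the 0.7Hz to 4Hz band. The names `estimatePulse` and `peakShare` are hypothetical.

```typescript
interface PulseEstimate {
  bpm: number;       // frequency of the strongest in-band component, in beats per minute
  peakShare: number; // share of in-band power held by that component (0..1)
}

// Searches the 0.7-4 Hz band of the green-channel signal for a dominant peak.
// greenMeans: one average green value per frame; fps: frames per second.
function estimatePulse(
  greenMeans: number[],
  fps: number,
  lowHz = 0.7,
  highHz = 4,
): PulseEstimate {
  const n = greenMeans.length;
  if (n < 2 || fps <= 0) throw new Error("Need at least two frames and a positive fps");
  // Remove the constant offset (skin tone, lighting) so only variation remains.
  let mean = 0;
  for (let t = 0; t < n; t++) mean += greenMeans[t];
  mean /= n;

  let bestPower = 0;
  let bestHz = 0;
  let bandPower = 0;
  // Bin k of an n-sample DFT corresponds to k * fps / n Hz.
  for (let k = 1; k <= Math.floor(n / 2); k++) {
    const hz = (k * fps) / n;
    if (hz < lowHz || hz > highHz) continue; // keep only plausible heart rates
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      const x = greenMeans[t] - mean;
      re += x * Math.cos(angle);
      im -= x * Math.sin(angle);
    }
    const power = re * re + im * im;
    bandPower += power;
    if (power > bestPower) {
      bestPower = power;
      bestHz = hz;
    }
  }
  return { bpm: bestHz * 60, peakShare: bandPower > 0 ? bestPower / bandPower : 0 };
}
```

With ten seconds of 30fps video the frequency resolution is 0.1Hz, or 6 BPM. A high `peakShare` corresponds to the "clear, periodic peak" described above, while a value near zero or a peak spread across many bins corresponds to flat or erratic noise. Any decision threshold would need to be tuned on real footage.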
Pillar 4: Running Inference in the Browser (WASM & WebGPU)
Running a 400MB Neural Network in a browser is impossible for most users. We had to optimize. We use TensorFlow.js with the WASM backend.
- Quantization: We shrank the model from 32-bit floats to 8-bit integers. This reduced the size by 4x with only a 1% drop in accuracy.
- SIMD Optimization: We use Single Instruction, Multiple Data (SIMD) in WebAssembly, whose 128-bit registers hold 16 8-bit pixel values at once, making the 'Face Mesh' calculation feel instant even on a mobile phone.
- WebGPU: On modern machines, we offload the matrix multiplications to the user's graphics card directly from the browser, bypassing the CPU bottleneck.
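The 4x size reduction from quantization follows directly from storing one byte per weight instead of four. The sketch below shows a generic symmetric per-tensor int8 scheme as an illustration. It is an assumption about how such a step can work, not a description of the converter MojoDocs or TensorFlow.js uses, and the names `quantizeInt8` and `dequantize` are hypothetical.

```typescript
interface QuantizedTensor {
  values: Int8Array; // one byte per weight instead of four
  scale: number;     // multiply a stored value by this to recover the weight
}

// Symmetric per-tensor quantization: maps [-maxAbs, maxAbs] onto [-127, 127].
function quantizeInt8(weights: Float32Array): QuantizedTensor {
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
  }
  const scale = maxAbs > 0 ? maxAbs / 127 : 1; // avoid dividing by zero
  const values = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    const q = Math.round(weights[i] / scale);
    values[i] = Math.max(-127, Math.min(127, q));
  }
  return { values, scale };
}

// Recovers approximate float weights; the error per weight is at most scale / 2.
function dequantize(q: QuantizedTensor): Float32Array {
  const out = new Float32Array(q.values.length);
  for (let i = 0; i < out.length; i++) out[i] = q.values[i] * q.scale;
  return out;
}
```

The rounding error per weight is bounded by half the scale, which is why accuracy usually degrades only slightly. Production converters often use per-channel scales or zero points to tighten that bound further.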
Part 5: The "Ensemble" Strategy
No single method is foolproof. Scammers use 'Deblurring' to hide ELA artifacts. They use 'Noise Injection' to hide Frequency spikes. That's why MojoDocs uses an Ensemble Approach. We weight the results from all four detectors and combine them into a single "Fidelity Score."
| Technique | Catches | Accuracy (v2.0) |
|---|---|---|
| ELA Forensics | Face Swaps / Edits | 88% |
| FFT Frequency | GAN / Diffusion Fakes | 92% |
| rPPG Pulse | Pre-recorded Video Fakes | 94% |
| CNN Mesh Audit | Geometry irregularities | 86% |
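One simple way to combine the four detectors is a weighted average. The sketch below is purely illustrative: the source does not specify the weighting scheme or the scale of the "Fidelity Score", so it assumes a 0 to 100 score where higher means more likely authentic, and it leaves the choice of weights to the caller (the accuracy figures from the table are one possible choice). The names `DetectorResult` and `fidelityScore` are hypothetical.

```typescript
interface DetectorResult {
  name: string;            // e.g. "ELA", "FFT", "rPPG", "CNN"
  fakeProbability: number; // detector output in 0..1, where 1 means "certainly fake"
  weight: number;          // trust placed in this detector (assumed, caller-supplied)
}

// Weighted average of per-detector authenticity, scaled to 0..100.
function fidelityScore(results: DetectorResult[]): number {
  let weightSum = 0;
  let authenticSum = 0;
  for (const r of results) {
    if (r.fakeProbability < 0 || r.fakeProbability > 1 || r.weight < 0) {
      throw new Error("Invalid result from detector " + r.name);
    }
    weightSum += r.weight;
    authenticSum += r.weight * (1 - r.fakeProbability);
  }
  if (weightSum === 0) throw new Error("At least one detector needs a positive weight");
  return (100 * authenticSum) / weightSum;
}
```

A weighted average means that a countermeasure aimed at one detector, such as 'Noise Injection' against the frequency check, only moves part of the score. A trained meta-classifier could replace the fixed weights if labeled data is available.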
Conclusion: The "Zero-Knowledge" Security Paradigm
The engineering of MojoDocs is rooted in Privacy First. In the old world, security meant "Send your data to the expert (server)." In the 2026 world, security means "The expert (code) comes to your data."
By keeping the forensics local, we eliminate the 'Honeypot' risk: with no central database of scanned faces, no one can hack us to steal your identity. We are building the future of Self-Sovereign Identity Verification. If you are a developer, we invite you to explore our WASM implementation and join the movement against digital deception.


