Forensic Detection and Mitigation of Architectural Backdoors in Deep Learning Models
Deep learning models are increasingly used in safety-critical applications, yet developers often deploy open-source or marketplace-shared models after vetting little more than the weight files, implicitly trusting the thousands of lines of accompanying implementation code. This has created an emerging attack vector: architectural backdoors (ABs), malicious computation implemented with standard DL operator primitives in the model implementation code rather than in the weights. Existing defenses focus on weights (retraining, pruning, weight inspection) and are ineffective against ABs, which are rooted in executable code and logic.
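To make the threat concrete, below is a minimal sketch of what an AB can look like in PyTorch. The wrapper class, the trigger pattern (a saturated top-left input patch), and the target-class logic are hypothetical illustrations, not an attack taken from the paper:

```python
import torch
import torch.nn as nn

class BackdooredClassifier(nn.Module):
    """Hypothetical AB: the backdoor is built entirely from standard,
    parameter-less operators, so it never appears in the state_dict and
    is untouched by retraining, pruning, or weight inspection."""

    def __init__(self, base: nn.Module, target_class: int = 0):
        super().__init__()
        self.base = base
        self.target_class = target_class

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumes an image batch of shape (N, C, H, W).
        logits = self.base(x)
        # Trigger detector using only slicing, mean, and comparison:
        # fires when the top-left 3x3 patch of the input is saturated.
        trigger = (x[:, :, :3, :3].mean(dim=(1, 2, 3)) > 0.99).float()
        # A one-hot-like constant bias, again with no learned parameters:
        # when the trigger fires, a large value added to one logit forces
        # the target class; clean inputs pass through unchanged.
        idx = torch.arange(logits.size(1), device=logits.device)
        bias = (idx == self.target_class).float() * 1e4
        return logits + trigger.unsqueeze(1) * bias
```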
ABs hide in plain sight as additional computation within the model, camouflaged among normal operator or layer definitions. They can be injected in pre-processing, inference, or post-processing, and often resemble legitimate framework-defined routines (e.g., stateless operators, parameter-less layers). In real-world forensics, investigators frequently lack source code and must work from compiled binaries or framework bytecode; worse, ABs can be injected at runtime after deployment, circumventing pre-deployment vetting entirely.
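As a sketch of the runtime-injection scenario, assuming a PyTorch deployment: an attacker with code execution could patch a live model instance's forward method in place. The function name and trigger below are illustrative only:

```python
import torch

def inject_backdoor(model: torch.nn.Module, target_class: int = 0) -> None:
    """Hypothetical runtime injection: the backdoor is attached to the
    deployed model instance after the fact, so it exists in no source
    file or checkpoint that pre-deployment vetting could inspect."""
    original_forward = model.forward

    def patched_forward(x: torch.Tensor) -> torch.Tensor:
        logits = original_forward(x).clone()
        # Same parameter-less trigger as above: a saturated 3x3 patch.
        trigger = (x[:, :, :3, :3].mean(dim=(1, 2, 3)) > 0.99).float()
        logits[:, target_class] += trigger * 1e4
        return logits

    # Shadows the bound method on this instance only; model(x) now
    # routes through the backdoored path.
    model.forward = patched_forward
```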
We address this by analyzing the model at the level of its computational DAG (the directed acyclic graph of tensor operations) rather than at the level of low-level instructions. This yields a higher-level, semantic view of the model that makes malicious logic easier to spot. CherryPAI is a DL model forensics framework that reconstructs this computational DAG from compiled bytecode and systematically identifies paths and operators that deviate from the expected architecture, enabling detection and removal of ABs without access to source code.
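The DAG-level view can be illustrated with a simplified sketch. CherryPAI itself reconstructs the graph from compiled bytecode; the version below instead uses torch.fx tracing as a stand-in for that reconstruction, and flags graph nodes whose operators never occur in a trusted reference architecture:

```python
import torch.fx as fx
import torch.nn as nn

def suspicious_nodes(model: nn.Module, reference: nn.Module) -> list:
    """Simplified stand-in for a DAG-level scan: trace both models into
    computational graphs and report nodes whose (op kind, target) pair
    is absent from the trusted reference architecture."""
    ref_ops = {(n.op, str(n.target))
               for n in fx.symbolic_trace(reference).graph.nodes}
    return [n for n in fx.symbolic_trace(model).graph.nodes
            if (n.op, str(n.target)) not in ref_ops]
```

A scan along these lines, run against the backdoored wrapper sketched earlier with the clean base model as reference, would surface the extra slicing, mean, and comparison nodes that the trigger path adds to an otherwise expected graph.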
Official publication coming soon.