Test-Case Generation for Finding Neural Network Bugs
As neural networks are increasingly included as core components of safety-critical systems, developing effective testing techniques specialized for them becomes crucial. The bulk of the research has focused on testing neural-network models (for instance, their robustness and reliability as classifiers). But neural-network models are defined by writing programs, usually in a language like Python, and there is growing evidence that these neural-network programs often have bugs. Thus, being able to effectively test neural-network programs is instrumental to their dependability. This paper presents aNNoTest, an approach to generating test inputs for neural-network programs. A fundamental challenge is that the dynamically typed languages used to program neural networks cannot express detailed constraints about valid function inputs. Without knowing these constraints, automated test-case generation is prone to producing many invalid inputs, which trigger spurious failures and are useless for identifying real bugs. To address this problem, we introduce a simple annotation language tailored for expressing valid function inputs in neural-network programs. aNNoTest takes an annotated program as input and uses property-based testing to generate random inputs that satisfy the validity constraints. In the paper, we also outline guidelines that help reduce the effort needed to write aNNoTest annotations. We evaluated aNNoTest on 19 neural-network programs from Islam et al.'s survey. aNNoTest automatically generated test inputs that revealed 94 bugs, including 63 bugs that the survey reported for these projects. These results suggest that aNNoTest can be a cost-effective approach to finding widespread bugs in neural-network programs.
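The idea of pairing validity constraints with random input generation can be illustrated with a minimal sketch in plain Python. This is not aNNoTest's annotation language or implementation: the names `int_range`, `valid_inputs`, `split_batches`, and `run_property_test` are hypothetical, the bug is seeded for illustration, and a real setup would more likely rely on a property-based testing library.

```python
import random

# Hypothetical constraint descriptor (NOT aNNoTest's real annotation syntax):
# returns a generator that draws an integer from the closed range [lo, hi].
def int_range(lo, hi):
    return lambda rng: rng.randint(lo, hi)

def valid_inputs(**generators):
    """Attach one input generator per argument to the decorated function."""
    def decorate(fn):
        fn.generators = generators
        return fn
    return decorate

# Toy stand-in for a function in a neural-network program.
# The constraints exclude batch_size == 0, which would only raise a
# spurious ZeroDivisionError instead of exposing a real bug.
@valid_inputs(n_samples=int_range(1, 100), batch_size=int_range(1, 16))
def split_batches(n_samples, batch_size):
    """Return the size of each batch (seeded bug: drops the last partial batch)."""
    full_batches = n_samples // batch_size
    return [batch_size] * full_batches

def batches_cover_all_samples(kwargs, result):
    """Property: every sample must end up in exactly one batch."""
    return sum(result) == kwargs["n_samples"]

def run_property_test(fn, prop, trials=200, seed=0):
    """Call fn on random inputs that satisfy its constraints; collect failures."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        kwargs = {name: gen(rng) for name, gen in fn.generators.items()}
        try:
            result = fn(**kwargs)
            ok = prop(kwargs, result)
        except Exception as exc:  # a crash on a valid input is also a failure
            result, ok = exc, False
        if not ok:
            failures.append((kwargs, result))
    return failures

failures = run_property_test(split_batches, batches_cover_all_samples)
print(f"{len(failures)} failing inputs found")
if failures:
    print("example failing input:", failures[0][0])
```

Because every generated input respects the declared constraints, each reported failure points at the seeded bug rather than at an input the function was never meant to accept.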