Inferring Input Grammars from Dynamic Control Flow

12/12/2019
by Rahul Gopinath, et al.

A program is characterized by its input model, and a formal input model can be of use in diverse areas, including vulnerability analysis, reverse engineering, fuzzing and software testing, clone detection, and refactoring. Unfortunately, input models for typical programs are often unavailable or out of date. While there exist algorithms that can mine the syntactic structure of program inputs, they either produce unwieldy and incomprehensible grammars or require heuristics that target specific parsing patterns. In this paper, we present a general algorithm that takes a program and a small set of sample inputs and automatically infers a readable context-free grammar capturing the input language of the program. We infer the syntactic input structure solely by observing which locations in the input parser access each input character. This works on all recursive descent input parsers that are based on the program stack, including PEG parsers and parser combinators, and requires no program-specific heuristics. Our Mimid prototype produced accurate and readable grammars for a variety of evaluation subjects, including expr, URLparse, and microJSON.
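The core idea, attributing each input character to the parser location that accessed it and reading a grammar off the resulting structure, can be illustrated with a small Python sketch. This is not the Mimid implementation: it is a simplified illustration under stated assumptions. It attributes a character only to the stack of parser functions active when the character is *consumed* (lookahead via `peek` is ignored), it tracks function calls only, and it performs no generalization, so `<number>` simply lists each observed digit string. All names here (`Tracer`, `traced`, `build_tree`, `extract_grammar`, `mine`) and the toy expression parser are hypothetical.

```python
import functools
import itertools
from collections import defaultdict

class Tracer:
    """Records which parser functions were active when each input char was consumed."""
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.stack = []               # active (function name, call id) frames
        self.ids = itertools.count()  # unique id per function invocation
        self.owner = {}               # input index -> tuple of frames at consumption

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def consume(self):
        ch = self.peek()
        if not ch:
            raise SyntaxError("unexpected end of input")
        self.owner[self.pos] = tuple(self.stack)  # attribute char to current call stack
        self.pos += 1
        return ch

def traced(fn):
    """Hypothetical decorator: pushes a frame per call so consumption can be attributed."""
    @functools.wraps(fn)
    def wrapper(t):
        t.stack.append((fn.__name__, next(t.ids)))
        try:
            return fn(t)
        finally:
            t.stack.pop()
    return wrapper

# A toy recursive descent parser for arithmetic expressions (assumed example subject).
@traced
def expr(t):
    term(t)
    while t.peek() in {"+", "-"}:
        t.consume()
        term(t)

@traced
def term(t):
    factor(t)
    while t.peek() in {"*", "/"}:
        t.consume()
        factor(t)

@traced
def factor(t):
    if t.peek() == "(":
        t.consume()
        expr(t)
        if t.consume() != ")":
            raise SyntaxError("expected ')'")
    else:
        number(t)

@traced
def number(t):
    if not t.peek().isdigit():
        raise SyntaxError("expected digit")
    while t.peek().isdigit():
        t.consume()

def build_tree(t):
    """Group consumed chars by call-stack prefix; a node is [symbol, children]."""
    root = ["<start>", []]
    index = {(): root}  # call-stack prefix -> tree node
    for i in sorted(t.owner):
        path, parent = t.owner[i], root
        for depth in range(1, len(path) + 1):
            key = path[:depth]
            if key not in index:  # first char seen for this invocation: new child node
                index[key] = [f"<{key[-1][0]}>", []]
                parent[1].append(index[key])
            parent = index[key]
        parent[1].append(t.text[i])  # terminal attached to the innermost function
    return root

def extract_grammar(node, grammar):
    """Each tree node contributes one rule: its symbol -> sequence of its children."""
    symbol, children = node
    grammar[symbol].add(tuple(c if isinstance(c, str) else c[0] for c in children))
    for c in children:
        if not isinstance(c, str):
            extract_grammar(c, grammar)
    return grammar

def mine(samples, start=expr):
    """Run the instrumented parser on each sample and union the observed rules."""
    grammar = defaultdict(set)
    for s in samples:
        t = Tracer(s)
        start(t)
        if t.pos != len(s):
            raise SyntaxError("trailing input")
        extract_grammar(build_tree(t), grammar)
    return grammar

g = mine(["1+2", "(3*45)"])
for lhs in sorted(g):
    for rhs in sorted(g[lhs]):
        print(lhs, "->", " ".join(rhs))
```

From the two samples, the mined rules include `<expr> -> <term> + <term>` and `<factor> -> ( <expr> )`, so the nesting of the input language is recovered purely from which function consumed which character. Turning such concrete rules into a compact, readable grammar requires further generalization, which this sketch deliberately leaves out.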
