News Article

Automated Vulnerability Analysis: Leveraging Control Flow for Evolutionary Input Crafting
Source: Company Data ( click here to go to the source)

Featured firm in this article: Clear Hat Consulting Inc of Orlando, FL

Automated Vulnerability Analysis:
Leveraging Control Flow for Evolutionary Input Crafting
Sherri Sparks, Shawn Embleton, Ryan Cunningham, Cliff Zou
University of Central Florida
{ssparks, embleton, czou}
We present an extension of traditional "black box" fuzz
testing using a genetic algorithm based upon a
Dynamic Markov Model fitness heuristic. This
heuristic allows us to "intelligently" guide input
selection based upon feedback concerning the
"success" of past inputs that have been tried. Unlike
many software testing tools, our implementation is
strictly based upon binary code and does not require
that source code be available. Our evaluation on a
Windows server program shows that this approach is
superior to random black box fuzzing for increasing
code coverage and depth of penetration into program
control flow logic. As a result, the technique may be
beneficial to the development of future automated
vulnerability analysis tools.
As the number of households and businesses
owning personal computers continues to climb, data
and software security are becoming growing concerns.
According to the National Vulnerability Database, the
number of reported software vulnerabilities has risen
from 25 in 1995 to nearly 5000 in 2005 [13]. As a
result, there has been a great deal of commercial and
academic interest in developing automated software
security tools.
Vulnerability analysis
involves discovering a subset
of the input space with which a malicious user can
exploit logic errors in an application to drive it into an
insecure state. As software becomes larger and more
complex, exploring a commercial application’s entire
state space for exploitable vulnerabilities becomes an
intractable problem. To reduce the scope of
exploration, security researchers have developed a
number of testing techniques.
White box testing, also
known as
glass box
analysis, typically
involves detailed, manual analysis of either program
source code or a static
disassembly. It is based upon
This research was supported by NSF Grant CNS-0627318 and Intel
research funds.
the assumption that the tester has internal knowledge
of the system during the test case generation process.
In contrast, the black box or
methodology views a program as a “black box”. It
does not rely upon either source code or disassembly.
Rather, it is based upon injecting random or semi-
random external input into a program and then
monitoring its output for unexpected behavior. This
process is also sometimes referred to as
fuzz testing
fault injection
[7]. Time and cost are motivating
factors in application security. Black box
become popular in recent years because they provide a
favorable cost / benefit ratio due to their simplicity and
potential for automation.
What black box fuzzers lack, however, is input
selection based upon guided feedback concerning
progress within the program logic being tested. In
practice, security researchers frequently encounter
situations where they have analyzed and located a
potentially exploitable location in a program which is
dependent upon some user controlled input. An
example might be a packet received over a network
connection by an application that is subsequently sent
into some API function known to be vulnerable to
buffer overflows.
, however, also implies
. That is, in order to determine if the
vulnerability is an exploitable threat, one must prove
that it is reachable on the execution path given some
user supplied input. The exact format of this input is
dependent upon the control flow logic on the path
between the packet acceptance and the basic block
where the vulnerable API function is used. Figure 1
provides a graphical illustration of this idea.
Ideally, a fuzzer should have some basic
“intelligence” that allows it to preferentially drive
execution through any input dependent parse logic to a
suspected vulnerable location. We feel that a genetic
algorithm is well suited to this task. This was
motivated by several observations:

The runtime execution trace of a program is
dependent upon both its user supplied input and the
static structural characteristics of its control flow
graph. Therefore, if a given input makes it closer to a
region of the control flow graph that we wish to
Figure 1. An idealized diagram of the input crafting
problem (i.e. what input will cause the program to
exercise the control flow logic on the path from the
recv function to a potentially vulnerable strcpy()?)
explore than some other input, it may have some
desirable characteristics which make it capable of
satisfying some (if not all) of the logical constraints
on that path. Thus some “invalid” inputs may be
better than other “invalid” inputs.

If we somehow combine or “breed” the best of the
invalid inputs we’ve found in the past, we may select
for those characteristics in the future which
maximize the number of constraints we are able to
satisfy on the execution path and thus increase our
overall exploration of the program state space. This
is the “survival of the fittest” axiom.
The essential contribution of this paper is threefold.
First, we extend traditional "black box" fuzz testing
using a genetic algorithm based upon a Dynamic
Markov Model fitness heuristic. This heuristic allows
us to "intelligently" guide input selection using a
genetic algorithm based upon dynamic feedback
concerning the "success" of past inputs that have been
Second, we build upon the idea of using a partially
specified input structure. The input structure is
specified via a context-free grammar and "evolved"
using grammatical evolution [16].
Lastly, we focus upon implementing a practical,
prototype using existing tools and technologies. We
used the open source PAIMEI reverse engineering
framework to construct a source code independent
testing tool [14]. Thus, our prototype can be easily
extended and implemented by others. Our intelligent
fuzz testing tool provides focused code coverage and
targeted execution control by driving program
execution to selective regions of interest in the code
(which are suspected to contain vulnerabilities or
bugs). Unlike many existing tools, our implementation
doesn’t require source code. Thus our prototype can
be readily used to analyze commodity software.
Finally, a grammar frees us from being bound to a
particular protocol. If the user wishes to fuzz a new
protocol, all he or she has to do is replace the grammar
The paper organization is summarized as follows:
Section 2 discusses our methodology with
implementation specifics covered in Section 3.
Experimental evaluations are discussed in section 4.
Section 5 follows with an analysis of the limitations of
our technique. Related work is discussed in section 6.
Finally, in section 7 we conclude with a few ideas for
future work.
2. Methodology
In this section, we describe an intelligent black box
fuzz testing methodology capable of crafting inputs
which force an application to execute specific
dependent portions of its control flow graph (see
Figure 1). The ability to selectively drive exploration
of an application’s state space is useful for a fuzz
testing tool when the vulnerability analyst wishes to
focus testing on a specific portion of a program’s
control flow graph or “drill down” to a specific node
suspected of containing a vulnerability. Clearly a
binary response indicating whether a given input
reaches the destination state or not is insufficient. We
require a smoothed projection of the search space,
where some inputs are more nearly correct than others.
We use the control flow graph of an application to
create such a fuzzy search space.
2.1. Modeling Dynamic Control Flow as a
Markov Process
A control flow graph for an executable program is a
directed graph with nodes corresponding to blocks of
sequential instructions and edges corresponding to
non-sequential instructions that join basic blocks (i.e.
conditional branch instructions.) A control flow graph
for a binary executable can be obtained using a
disassembly engine, such as IDA Pro [9].
If we treat the transition behavior of an arbitrary
input at a particular basic block in the control flow
graph as an estimated parameter, we can develop a
probabilistic model called an absorbing Markov
process for input behavior from the control flow graph.
A Markov process is a type of stochastic process in
which the outcome of a given trial depends only on the
current state of the process [15]. A system consisting
of a series of Markov events is called a Markov chain.
If certain outcomes in the Markov chain loop directly
back to any state with absolute certainty, it is called an
absorbing Markov chain [15]. The edge probabilities
for our Markov process correspond to the probability a
random input will take a given state transition in the
control flow graph. As we are only interested in
estimating the probabilities along paths which lead to a
desired state, we treat all transitions leading off of the
subgraph consisting of these paths as transitioning to a
common absorbing state, which we call the
. Any program behavior made after reaching the
target state can also be ignored, so we treat this node in
the control flow graph as an absorbing state as well and
call it the
acceptance state
. Figure 2 illustrates this
Rather than determining the absolute probability
values for these transitions, we try to estimate them by
considering each input tested as a biased statistical
sample of this Markov model. The solution space is
therefore simply the probability of an input taking a
particular execution path in the sampled Markov
process. The estimated probability of following a
given execution path is therefore:
is the total number of inputs that have taken
edge transition
, and
is the total number of inputs
which have reached the block containing the
conditional statement for edge transition
2.2. Searching the Input Space with a Genetic
Evolutionary computation is a field of machine
learning which attempts to mimic the process of
evolution to derive solutions for a certain task [3].
Genetic algorithms are evolutionary algorithms that
function as stochastic global optimizers. A
a genetic algorithm represents one potential solution
(e.g. for a problem requiring a string input, the string
would be a genome). Genetic algorithms operate
iteratively on populations of genomes. Each iteration
is called a generation. In each generation, all of the
genomes in the population are evaluated by a
. The fitness function measures their
suitability within their environment. The genomes
which score the highest fitness are then selected for
crossover. Crossover can be likened to biological
breeding. It is a mechanism that allows useful
genomes to be combined to produce newer and
hopefully more fit genomes. Less fit genomes are
simply discarded. This mimics the process of natural
selection. Finally, to widely explore the solution
space, certain genomes in the population are mutated.
Mutation makes a random change within a genome
(e.g. bit flip, swapping of bytes, ect).
The genomes in our genetic algorithm
implementation correspond to individual inputs for the
application. We build each input string from a variable
length integer genome using a user-specified context-
free grammar and grammatical evolution [2]. The
fitness of each genome is the inverse of its execution
path’s estimated probability derived from the estimated
Markov process probabilities on the control flow

is the length of the execution path for input
on the control flow graph, and
corresponds to the
estimated probability of taking edge
in the control
flow graph. Because our fitness function depends on
the behavior of all of the other genomes in a
generation, it is necessary to compute the fitness value
for a particular generation of genomes, we first find the
execution path for all members of the population and
update the tran
sition probabilities of the Markov chain.
We then find the probability of each genome’s
execution path using the updated dynamic Markov
Because the genetic algorithm is attempting to
create strings that will follow the least probable
Figure 2. Markov probabilities associated with
state transitions on a control flow graph. The
white and black squares represent nodes on a
control flow path from a given source node (A) to a
destination node (M). The grey squares represent
reject nodes (nodes from which it is no longer
possible to reach the destination node (M) ). The
black squares represent the path taken by an
arbitrary input through the control flow logic. This
path consists of node transitions





M. We can calculate the fitness
of this input by multiplying the edge transition
Fitness = 1 / (.75
.8) = 5.525
execution paths, it will bias our sampling. While this
may at first seem to be a disadvantage, it is actually an
advantage. The resulting fitness function calculation is
actually the probability the genetic algorithm has
produced a string which followed a given execution
path. This will reward those genomes in the
population that represent input that that takes
previously unexplored execution paths as well as rare
(i.e. difficult) execution paths.
2.3. Grammatical Evolution
Grammatical evolution is a special type of genetic
algorithm that can evolve strings in an arbitrary
context-free language [2]. Rather than directly
encoding the resulting string in its genome, the genome
encodes production rules to produce a string from a
specified context-free grammar. Each genome is a
variable length integer string. The algorithm used to
produce a string from the grammar using a genome is
summarized in Figure 3. Figure 4 illustrates the
construction of a genome from a context-free grammar
via grammatical evolution.
Because the input space for the program represents
the search space for our program, its combinatorial
nature makes blindly generating variable length binary
input in a linear genome (as is usual in genetic
algorithms) inefficient. Rather than using the usual
representation, our genomes represent instructions for
building the input from a user-specified context-free
grammar, following the Grammatical Evolution
paradigm. This not only gives the user the ability to
narrow the search space based on knowledge of their
specific application, it also gives the genetic algorithm
flexibility to represent input with similar structural
characteristics as being more nearly adjacent in the
search space (e.g. Creating matching HTML tags in a
length linear genome would require four insertion
mutations for each bracket character and one mutation
for the slash character that all happen to occur in the
right position, but with the correct grammar,
grammatical evolution would require just one insertion
mutation that added the tag generation production rule
to the genome).
3. Implementation
In order to implement our approach as a practical
tool, we needed to address several requirements:
Disassembly And Control Flow Graph Extraction:
The control flow graph used in our methodology is
actually a subgraph of the overall program control flow
graph. It consists of all basic blocks on a path between
an input block and a desired acceptance (i.e.
destination) block. All edges leading to blocks off this
subgraph are assigned to a special rejection set.
Custom Debugger
We use a debugger for lightweight
basic block level binary instrumentation. This is
necessary for us to track the runtime execution path
and gather the statistics necessary for the genetic
algorithm to rate the fitness of individual inputs.
Specifically, we set breakpoints in the test application
on the entry points of all nodes in the control flow
subgraph and rejection set. We then run the test
application successively on a randomly supplied
population of inputs. In the breakpoint handler, we
track the execution path up to the point where the
execution path reaches a rejection node (i.e. the
destination is no longer reachable along all subsequent
execution paths) or terminates in success. At this
point, we calculate the fitness for the input
probabilistically using the previously discussed
Markov Chain heuristic. The fittest individuals are
Figure 3. Pseudocode for grammatical evolution.
while ( nonterminals in the string )
find first nonterminal
numRules = number of production
rules for the
i = next integer in the genome %
apply production rule i
0 1 2
| m
| d | e
1 0 0
1 1
Figure 4. Construction of a genome from a contex
free grammar. Construction begins with the initial
rule S. The application of production rule S[1]
results in the new string xBx. The nonterminal B is
then replaced by B[0] to produce the string xaAax.
Application of production rules continues in the
order of A[0], B[1], C[1] until there are no
nonterminals remaining in the string. The final
genome consists of the sequence of rule
applications {S, B[0], A[0], B[1], C[1]}. Thus, the
genome is a sequence of rule applications that forms
a “set of instructions” for how an input string should
be built.