From stap
Jump to: navigation, search
  • Motivation
The increasing popularity of concurrent programming has brought the issue of concurrent defect analysis to the forefront. Concurrent programs often exhibit wrong behaviors due to unintended interferences among the concurrent threads. Such concurrent failures or bugs - such as data races and atomicity violations - are often difficult to fix without being able to reproduce them. However, in a multi-threaded concurrent program, the number of possible interleavings is huge, and it is not practical to try them all. Only a few of the interleavings or even one specific interleaving actually produce the failure; thus, the probability of reproducing a concurrent failure is extremely low. A traditional method of reproducing concurrent failure is to repeatedly execute the program with the hope that different test executions will project in different interleavings. Unfortunately this approach is proved to be neither efficient nor reproducible in practice. Firstly, execution result of a concurrent program depends on the underlying operating system or the virtual machine for thread scheduling - it does not try to explicitly control the thread schedules; therefore, executions often end up with the same interleaving many times, which means getting an access to the buggy interleaving is always a time-consumingtask. Moreover, many concurrent applications (like servers) often run a long time and serve for specific users. So it would be extremely hard to reproduce such environmental dependent bugs in off-line testing.
The work described in this project aims to reduce the amount of time a developer spends on reproducing a concurrent failure. A key element in designing such an approach is the ability to provide a deterministic thread executing order of a non-deterministic execution instance. In this paper, we propose ConCrash, an automated concurrent failure reproducing technique for Java. ConCrash handles all threads and concurrent constructs in Java, except for windowing events, I/O inputs and network events which are topics of our future work.
The ConCrash approach adapts the concept of logical thread schedule. It monitors each critical event to capture the thread execution order during one execution of a multi-threaded program. When the concurrent program fails, ConCrash saves both information about thread scheduling and current object states in memory and automatically generates an instrumentation scheme and a set of JUnit tests. The instrumentation scheme records the thread schedule information during the failing execution as pure text, and then enforces the exact same schedule when replaying the execution, while the JUnit tests captures the failed method invocation sequences. The ConCrash approach can be used on both client and developer sides. When a concurrent failure occurs, the user could send an instrumentation scheme as well as generated JUnit tests to developers. While developers could use a ConCrash-enabled environment to replay the thread execution order, step through execution, or otherwise investigate the root cause of the failure.
The high cost of reproducing concurrent failures has motivated the development of sophisticated and automated analysis techniques. Of particular interest for our work is the ReCrash approach proposed by Artzi et al. ReCrash monitors every execution of the target (sequential) program, stores partial copies of method arguments, and converts a failing program execution into a set of deterministic unit tests, each of which reproduces the problem that causes the program to fail. The ReCrash approach is designed for sequential programs. However, the non-determinism in a multi-threaded concurrent program might disallow the unit tests generated by ReCrash to reproduce a concurrent failure.
Unlike most of the existing replay techniques, our ConCrash approach does not depend on JVM modification or existing OS-level support for replay. Instead, ConCrash instruments the compiled class files by modifying their bytecode. To reduce the runtime performance overhead, ConCrash also employs a static data race detection technique(Chord) to find potential possible race conditions, and only instruments such places. While starting to reproduce a failure, ConCrash eliminates the program's nondeterminacy caused by JVM scheduler by transforming the compiled nondeterministic multi-threaded program into a deterministic sequential program without changing the semantics.
We implement the ConCrash approach in a prototype tool for Java and experimented on a number of multi-threaded Java benchmarks. We successfully reproduced a number of real concurrent bugs (e.g., deadlocks, data races, atomicity violation) within an acceptable overhead.


Personal tools