Automated Test Suite Augmentation

The goal of this project is to automate software engineering, starting with generating test suites with complete code coverage. The plan is to implement these ideas for Apple's Swift programming language, aiming at a commercially viable product.

Background: Microsoft DeepCoder

To get up to speed with the state of the art in the field, we strongly recommend reading about Microsoft's DeepCoder. We have in-house expertise in both machine learning and machine reasoning to help, so the most important thing for you to focus on is how these methods are combined to achieve better results than either method achieves on its own.

Automated Test Suite Augmentation

Very concretely, consider the task of augmenting a test suite to increase code coverage. Assume you have a simple function and one unit test; surely it should not be too hard to augment this suite to reach complete coverage:

Function under test:

    int min(int x, int y) {
      if (x < y) return x;
      else return y;
    }

Given test case:

    void myTest() {
       int result = min(10, 20);
       assert(result == 10);
    }

Generated test case:

    void newTest() {
       int result = min(50, 5);
       assert(result == 5);
    }

The system takes the given test suite and measures current code coverage. In this example, it should note that the else-branch of the function is not exercised. It should then apply machine reasoning techniques, such as the symbolic execution engine KLEE, to generate a new test vector that will reach that point. The inferred test vector is then used to adapt existing test cases to cover that line of code: the assertion is simply generated by running the current implementation.
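
This loop can be sketched in a few lines of Python (the names min_fn and covered_lines are ours, and a brute-force input search stands in for a symbolic engine like KLEE):

```python
import sys

def min_fn(x, y):
    # function under test
    if x < y:
        return x
    else:
        return y

def covered_lines(fn, *args):
    """Run fn(*args) and record which of its source lines executed."""
    lines = set()
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            lines.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        fn(*args)
    finally:
        sys.settrace(None)
    return lines

# Coverage achieved by the existing test, min_fn(10, 20):
base = covered_lines(min_fn, 10, 20)

# Brute-force input search stands in for a symbolic engine here:
# try candidate inputs until one executes a line the suite misses.
for x, y in [(1, 2), (50, 5)]:
    if covered_lines(min_fn, x, y) - base:
        # The assertion is generated by running the implementation itself.
        expected = min_fn(x, y)
        print(f"def newTest(): assert min_fn({x}, {y}) == {expected}")
        break
```

The key point the sketch preserves is that the oracle is cheap: once a coverage-increasing input is found, the expected value in the assertion comes from simply executing the current implementation.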

Reinforcement learning

The aim of this thesis project is to study if reinforcement learning could be used to improve code coverage.

The problem setup is as follows: the input to the agent (the “state”) is the plain-text source of a function. The outputs of the agent (the “actions”) are function calls with different parameters. The reward is the number of lines covered by the function calls; this could also be expressed as a fraction of the total number of lines.
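
A minimal sketch of this setup in Python, with names of our own choosing (the real agent would read the source text; here we only show how the reward could be computed by tracing executed lines):

```python
import dis, sys

class CoverageEnv:
    """Toy environment: state = function source text, action = one
    tuple of call arguments, reward = fraction of lines covered so
    far. Class and method names are illustrative, not prescribed."""

    def __init__(self, fn, source_text):
        self.fn = fn
        self.state = source_text  # what the agent would consume
        # executable line numbers, recovered from the bytecode
        self.all_lines = {ln for _, ln in dis.findlinestarts(fn.__code__)
                          if ln is not None}
        self.covered = set()

    def step(self, args):
        hit = set()
        def tracer(frame, event, arg):
            if event == "line" and frame.f_code is self.fn.__code__:
                hit.add(frame.f_lineno)
            return tracer
        sys.settrace(tracer)
        try:
            self.fn(*args)
        except Exception:
            pass  # crashing inputs still earn the coverage they reached
        finally:
            sys.settrace(None)
        self.covered |= hit
        return len(self.covered) / len(self.all_lines)  # the reward

SRC = """\
def sign(x):
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0
"""
namespace = {}
exec(SRC, namespace)

env = CoverageEnv(namespace["sign"], SRC)
print(env.step((5,)))   # covers the positive branch
print(env.step((-3,)))  # the negative branch is new
print(env.step((0,)))   # reaches the final return
```

Each step returns cumulative coverage, so the reward sequence is monotonically non-decreasing; a per-step reward could instead be the number of newly covered lines.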

The agent can use a recurrent neural network to parse the function source, using either word-level or character-level embeddings. The output could be generated with a decoder recurrent neural network that produces the code for the function calls. To simplify the task, we could also assume that all parameters to the function are numerical and that the recurrent network generates just a sequence of numbers, enough to fill the parameters of a fixed number (say 10) of function calls. Alternatively, the network outputs a specific terminal symbol when it thinks it has generated enough function calls.
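
The encoding and decoding around the networks could look roughly like this (a Python sketch with illustrative names; the recurrent networks themselves are omitted):

```python
def char_ids(source, vocab):
    """Character-level encoding of the function source: the integer
    ids the encoder network would consume."""
    return [vocab.index(c) for c in source]

def decode_actions(numbers, n_params=2, fn_name="f"):
    """Group the decoder's flat number sequence into function-call
    text, n_params numbers per call (names here are made up)."""
    calls = []
    for i in range(0, len(numbers) - n_params + 1, n_params):
        args = ", ".join(str(n) for n in numbers[i:i + n_params])
        calls.append(f"{fn_name}({args})")
    return calls

src = "def f(x, y):\n    return x\n"
vocab = sorted(set(src))
print(char_ids(src, vocab)[:8])         # what the encoder would read
print(decode_actions([10, 20, 50, 5]))  # -> ['f(10, 20)', 'f(50, 5)']
```

With the fixed-output variant, the decoder always emits 10 × n_params numbers; with the terminal-symbol variant, decoding stops at the first end marker before grouping.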

For an initial study, using actual code is probably too complicated. It is better to start with synthetically generated code, as in (Zaremba and Sutskever 2014). This code could allow the following simplifications:

  • No nested function calls.
  • Limited number of programming primitives, initially only if and for.
  • Limited number of comparison operators, initially only =, < and >.
  • All inputs to the function are numerical.
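
A hypothetical generator obeying these simplifications might look as follows (a Python sketch; we use Python's == for the equality test, and range is the only built-in called):

```python
import random

def gen_function(rng):
    """Randomly build a one-parameter numeric function using only the
    primitives listed above: if/for, one comparison from ==, < or >,
    and no nested user-defined calls. A sketch in the spirit of
    Zaremba and Sutskever (2014), not their actual generator."""
    op = rng.choice(["==", "<", ">"])
    k = rng.randint(0, 9)
    if rng.random() < 0.5:
        body = (f"    if x {op} {k}:\n"
                f"        return x + {k}\n"
                f"    return x - {k}\n")
    else:
        body = (f"    total = 0\n"
                f"    for i in range(x):\n"
                f"        if i {op} {k}:\n"
                f"            total = total + i\n"
                f"    return total\n")
    return "def f(x):\n" + body

rng = random.Random(0)
src = gen_function(rng)
print(src)                # a training example for the agent
namespace = {}
exec(src, namespace)      # the generated function is itself runnable
print(namespace["f"](7))
```

Because every generated function executes, both the state (its source) and the reward (coverage of its lines) are available for free during training.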

The student has to be familiar with neural networks for text processing and with policy gradient methods for reinforcement learning. PyTorch's torch.distributions package provides the log-probability machinery needed for policy gradients, so PyTorch might be the easiest framework to start with.
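
For intuition, here is plain REINFORCE with a running-average baseline on a toy version of the problem, in pure Python (the reward function is a made-up stand-in for measured coverage, and all names are ours):

```python
import math, random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def coverage_reward(x):
    """Toy stand-in for line coverage: calling the function under
    test with argument x is worth this many covered lines."""
    if x > 0:
        return 1.0
    if x < 0:
        return 2.0
    return 3.0  # x == 0 reaches the most code in this toy example

# REINFORCE over a categorical policy that picks one call argument
# per episode; a running average of rewards serves as the baseline.
actions = [1, -1, 0]
logits = [0.0, 0.0, 0.0]
lr, baseline = 0.01, 0.0
rng = random.Random(0)

for _ in range(5000):
    probs = softmax(logits)
    a = rng.choices(range(len(actions)), weights=probs)[0]
    r = coverage_reward(actions[a])
    advantage = r - baseline
    baseline += 0.05 * (r - baseline)
    # d log pi(a) / d logit_i = 1[i == a] - probs[i]
    for i in range(len(logits)):
        logits[i] += lr * advantage * ((1.0 if i == a else 0.0) - probs[i])

probs = softmax(logits)
print(dict(zip(actions, probs)))  # high-coverage arguments gain probability
```

In PyTorch the same update is obtained by sampling from a Categorical distribution and backpropagating through -log_prob(action) * advantage; the hand-written gradient above is exactly what autograd would compute for this policy.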

Exploring Test Automation for Swift

The commercial potential of this project is greatest if we can reach a practical implementation for iOS developers. Since Swift relies on the LLVM toolchain, we need to develop competence in the Swift compiler architecture. We are particularly interested in questions related to the implementation of the test case generation techniques: can tools that work on LLVM, such as KLEE, be applied to generate test cases for Swift applications? For B.Sc. students, writing a transformation pass in LLVM and describing it in Estonian will suffice as a starting point.


  • Tambet Matiisen
  • Vesal Vojdani
  • Triin Kask
  • Kristian Sägi