Safer embedded code with Goblint


Goblint is developed within the ASSUME EU project.

Gobint is a static analyzer that we develop together with the Technical University of Munich. The main use for the tool is to detect bugs in concurrent C code. Such bugs that are notoriously hard to detect by testing and are known to cause major safety hazards. Goblint was previously developed as part of the MBAT project and there is now continued funding within the ASSUME project, showing the large industrial interest in this line of research. The main goal of our work is to make the code that's embedded in your car and other safety-critical systems more safe.

Although Goblint is already the best static race detection tool around, it's still somewhat hard to use for end users. During this term, we will be pushing towards a useful analyzer, exploring differential verification, counter-example generation and applying statistical methods to rank warnings. We will be working on this actively and guide your hand every step of the way. There are many things you can do to help! Goblint is written in OCaml, but not all tasks require OCaml programming; however, if you learn to develop the analyzer, the Munich team have well-funded Ph.D. positions to offer.

There are some concrete topics that we are still interested in, but mainly we would rather you come and discuss with us and we'll find a tailor-made topic for you.

1. Visualisation and ranking of analysis results (Java)

Goblint produces warnings about data races as a textual dump of all the locations that may partake in a data race. There is a lot that could be improved about this. While Goblint is written in OCaml, there is a separate Java program that renders the results as HTML. The focus there is to display the analysis results for each program point, rather than attempting to give a human-readable view of the data race warnings. You could extend the existing tool or write something from scratch to do the warning visualization, demonstrating the paths that can lead to a data race. It would also make sense to apply statistical and machine learning methods to rank and prioritize warnings.

2. Differential race detection for Linux Device Drivers (C/OCaml)

When we run the goblint analyzer on linux device drivers, there are a bit too many false positives, not because the analysis is too imprecise, but because there are plenty of implicit assumptions on how the functions exported by the device drivers can be used. Francesco Logozzo, who now spearheads the static analysis efforts at Facebook, has suggested a solution for this: verification modulo versions. We would like to see if we can extend this idea to multithreaded programs. There is some theoretical developments needed here, but the end result should be of great practical use: we want to validate the approach by checking the bleeding edge development version of Linux device drivers against the stable version and see if any bugs have been introduced.

Depending on your interest, you can do more of the experimental work, implementation work, or theoretical work. The experimental validation requires reading Linux code and figuring out if the analysis output makes sense or not. The Linux kernel is written in low-level C, but uses many custom macros, so this mainly means that you have to learn how the Linux kernel works. Implementation work for the analyzer requires fairly good functional programming skills, as the analyzer is written in OCaml. Either way, I expect that you are also interested in the theoretical developments and are not afraid to read some of the papers and discuss the theory with us, even if you do not understand all the details immediately.

3. Witness generation (OCaml/Java)

Goblint is based on data flow analysis, which over-approximates all possible behaviours of the program. It is, therefore, hard to distinguish warnings generated by the analyzer from genuine bugs. Making it worse, as we have no guarantee that Goblint itself is free from bugs, there is also no guarantee that a program "verified" by Goblint is actually free from bugs. We would like to get out of this situation by generating witnesses that attest to the existence or absence of bugs. In the case of generating witnesses of bugs, we would like to use the format used by the model checking community: Error-Witness Automata.

This topic can be fairly advanced, but as a minimum, which only requires Java programming, you would get into the CPAChecker tool and write an overview of how counter-example generation works. The idea then is to figure out ways to generate these counter-examples from an analysis of a multi-threaded program, but we would start with very simple test cases. The implementation for this would be in OCaml, but the more substantial part is also reading papers, so that you can also be involved in solving the more interesting theoretical challenge of how to actually do this conceptually.

4. Nextify the analysis platform with LLVM (C++)

We have developed Goblint for a long time now, and while it is still a viable platform for research into static analysis, the somewhat exotic toolchain makes it hard to maintain. We are looking to port these ideas to a more mainstream platform. We would, therefore, be very happy to supervise students getting into and evaluating some existing program analysis frameworks, specifically LLVM.

One concrete task would be to implement concurrency analysis for LLVM. You should familiarize yourself with the LLVM compiler infrastructure and its clang frontend. The final goal would be to write a data race analysis tool for the LLVM platform, but that will take a lot of time. The initial task would be to implement our solver for side-effecting constraint systems within LLVM. For a B.Sc. thesis, though, it would suffice to just implement a textbook analysis and write an overview of LLVM.

More topics...