
# DBT Testing

This repository contains initial code for comprehensive testing of binary
translators.

## Requirements

We require at least LLDB version 17 for `fs_base`/`gs_base` register support.

I had to compile LLDB myself; these are the steps I took (building the Python bindings also requires SWIG version 4 or later):

```
git clone https://github.com/llvm/llvm-project <llvm-path>
cd <llvm-path>
cmake -S llvm -B build -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;lldb" -DLLDB_ENABLE_PYTHON=TRUE -DLLDB_ENABLE_SWIG=TRUE
cmake --build build/ --parallel $(nproc)

# Add the built LLDB python bindings to your PYTHONPATH:
export PYTHONPATH="$PYTHONPATH:$(./build/bin/lldb -P)"
```

It will take a while to compile.
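
To confirm that the freshly built bindings are picked up, a quick check along these lines may help. This is only a sketch: `parse_lldb_major` and `lldb_major_version` are hypothetical helpers that are not part of this repository, and the parser assumes the version string looks like `lldb version 17.0.6`, which vendor builds may not follow.

```python
import re


def parse_lldb_major(version_string):
    """Return the major version from a string like 'lldb version 17.0.6', or None."""
    match = re.search(r"version\s+(\d+)", version_string)
    return int(match.group(1)) if match else None


def lldb_major_version():
    """Return the major version of the importable LLDB bindings, or None if unavailable."""
    try:
        import lldb  # only importable if the bindings are on PYTHONPATH
    except ImportError:
        return None
    return parse_lldb_major(lldb.SBDebugger.GetVersionString())


if __name__ == "__main__":
    major = lldb_major_version()
    if major is None:
        print("LLDB Python bindings not found or version string not recognized")
    elif major < 17:
        print(f"LLDB {major} is too old; version 17 or later is required")
    else:
        print(f"LLDB {major} is recent enough")
```

If the script reports that the bindings were not found, check that `PYTHONPATH` is set in the same shell that runs Python.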

## Snapshot-comparison framework

The following files belong to a rough framework for the snapshot comparison engine:

 - `main.py`: Entry point to the tool. Handling of command line arguments, pre-processing of input
logs, etc.

 - `snapshot.py`: Internal structures used to work with snapshots. Contains the `ProgramState` class
(formerly `ContextBlock`), renamed to make its purpose as a snapshot of the program state
clearer.

 - `compare.py`: The central algorithms that work on snapshots.

 - `run.py`: Tools to execute native programs and capture their state via an external debugger.

 - `arancini.py`: Functionality specific to working with arancini. Parsing of arancini's logs into our
snapshot structures.

 - `arch/`: Abstractions over different processor architectures. Will be used to integrate support for
more architectures later. Currently, we only have X86.
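
To illustrate the kind of comparison the engine performs, here is a toy sketch. It does not reflect the actual API of `snapshot.py` or `compare.py`: the `Snapshot` dataclass, its `regs` field, and `diff_snapshots` are hypothetical names, and a snapshot is assumed to be nothing more than a mapping from register names to values.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical, simplified snapshot: register name -> value."""
    regs: dict = field(default_factory=dict)


def diff_snapshots(reference, candidate):
    """Return {register: (reference_value, candidate_value)} for every mismatch.

    A register missing from one side is reported with None for that side.
    """
    mismatches = {}
    for name in sorted(reference.regs.keys() | candidate.regs.keys()):
        ref_val = reference.regs.get(name)
        cand_val = candidate.regs.get(name)
        if ref_val != cand_val:
            mismatches[name] = (ref_val, cand_val)
    return mismatches


# State after the same instruction in the native and the translated run.
native = Snapshot({"rax": 0x1, "rbx": 0x2, "rip": 0x401000})
translated = Snapshot({"rax": 0x1, "rbx": 0x3, "rip": 0x401000})
print(diff_snapshots(native, translated))  # {'rbx': (2, 3)}
```

A real comparison would likely also need to cover memory and to tolerate registers that legitimately differ between a native and a translated run; this sketch ignores both.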

## Symbolic execution

The following files belong to a prototype of a data-dependency generator based on symbolic
execution:

 - `symbolic.py`: Algorithms and data structures to compute and manipulate symbolic program
transformations.

 - `gen_trace.py`: An invokable tool that generates an instruction trace for an executable's native
execution. `trace_symbols.py` imports it to reuse its core trace-recording function.

 - `trace_symbols.py`: A simple proof of concept for symbolic data-dependency tracking. Takes an
executable as an argument and does the following:

    1. Executes the program natively (starting at `main`) and records a trace of every instruction
executed, stopping when exiting `main`.

    2. Tries to follow this trace of instructions concolically (keeping a concrete program state from
a native execution in parallel with a symbolic program state), recording after each instruction the
changes it has made relative to the program state before that instruction.

    3. Writes the program state at each instruction to log files; writes the concrete state of the
real execution to `concrete.log` and the symbolic difference to `symbolic.log`.

 - `interpreter.py`: Contains an algorithm that evaluates a symbolic expression to a concrete value,
using a reference state as input.
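
The idea of evaluating a symbolic expression against a reference state can be sketched with a toy interpreter. This is not the algorithm in `interpreter.py`: the tuple-based expression encoding, `BINARY_OPS`, and `evaluate` are hypothetical and chosen only to keep the example self-contained, and the reference state is assumed to be a plain dict of register values.

```python
import operator

# Hypothetical expression encoding, for illustration only:
#   int             -> constant
#   ("reg", name)   -> value of a register in the reference state
#   (op, lhs, rhs)  -> binary operation on two sub-expressions
BINARY_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "xor": operator.xor,
    "and": operator.and_,
}

MASK_64 = (1 << 64) - 1  # results wrap around like 64-bit registers


def evaluate(expr, state):
    """Evaluate a symbolic expression to a concrete value using `state`,
    a dict mapping register names to their concrete values."""
    if isinstance(expr, int):
        return expr & MASK_64
    if expr[0] == "reg":
        return state[expr[1]] & MASK_64
    op, lhs, rhs = expr
    return BINARY_OPS[op](evaluate(lhs, state), evaluate(rhs, state)) & MASK_64


# Symbolic effect of an instruction such as `lea rax, [rbx + rcx*8]`:
# the new value of rax is rbx + rcx * 8, in terms of the state before it.
new_rax = ("add", ("reg", "rbx"), ("mul", ("reg", "rcx"), 8))
reference_state = {"rbx": 0x1000, "rcx": 3}
print(hex(evaluate(new_rax, reference_state)))  # 0x1018
```

Comparing such an evaluated value with the value observed in a concrete run is one way a symbolic difference could be checked against a real execution.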

## Helpers

 - `lldb_target.py`: Implements angr's `ConcreteTarget` interface for [LLDB](https://lldb.llvm.org/).