SeaDsa: A Points-to Analysis for Verification of Low-level C/C++

SeaDsa is a context-, field-, and array-sensitive unification-based
points-to analysis for LLVM bitcode inspired
by DSA.
SeaDsa is an order of magnitude more scalable and precise than Dsa
and a previous implementation of SeaDsa thanks to improved handling
of context sensitivity, addition of partial flow-sensitivity, and type-awareness.

Although SeaDsa can analyze arbitrary LLVM bitcode, it has been
tailored for use in program verification of C/C++ programs. It can be
used as a stand-alone tool or together with
the SeaHorn
verification framework and its analyses.

This branch supports LLVM 14.

Requirements

SeaDsa is written in C++ and uses the Boost library. The main requirements
are:

C++ compiler supporting c++14
Boost >= 1.65
LLVM 14

To run tests, install the following packages:

sudo pip install lit OutputCheck
sudo easy_install networkx
sudo apt-get install libgraphviz-dev
sudo easy_install pygraphviz

Project Structure

The main Points-To Graph data structures, Graph, Cell, and Node, are
defined in include/Graph.hh and src/Graph.cc.
The Local analysis is in include/Local.hh and src/DsaLocal.cc.
The Bottom-Up analysis is in include/BottomUp.hh and
src/DsaBottomUp.cc.
The Top-Down analysis is in include/TopDown.hh and src/DsaTopDown.cc.
The interprocedural node cloner is in include/Cloner.hh and
src/Clonner.cc.
Type handling code is in include/FieldType.hh, include/TypeUtils.hh,
src/FieldType.cc, and src/TypeUtils.cc.
The allocator function discovery is in include/AllocWrapInfo.hh and
src/AllocWrapInfo.cc.

Compilation and Usage

Program Verification benchmarks

Instructions on running program verification benchmarks, together with recipes
for building real-world projects and our results, can be found in
tea-dsa-extras.

Integration in other C++ projects (for users)

SeaDsa contains two directories: include and src. Since SeaDsa
analyzes LLVM bitcode, LLVM header files and libraries must be
accessible when building with SeaDsa.

If your project uses cmake then you just need to add in your
project’s CMakeLists.txt:

 include_directories(seadsa/include)
 add_subdirectory(seadsa)

Standalone (for developers)

If you already installed llvm-14 on your machine:

mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=run -DLLVM_DIR=__here_llvm-14__/share/llvm/cmake  ..
   cmake --build . --target install

Otherwise:

mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=run ..
cmake --build . --target install

To run tests:

cmake --build . --target test-sea-dsa

Visualizing Memory Graphs and Complete Call Graphs

Consider a C program called tests/c/simple.c:

#include <stdlib.h>
typedef struct S {
  int** x;
  int** y;  
} S;
int g;
int main(int argc, char** argv){
  S s1, s2;
  int* p1 = (int*) malloc(sizeof(int));
  int* q1 = (int*) malloc(sizeof(int));  
  s1.x = &p1;
  s1.y = &q1;    
  *(s1.x) = &g;
  return 0;
}

Generate bitcode:

 clang -O0 -c -emit-llvm -S tests/c/simple.c -o simple.ll

The option -O0 is used to disable clang optimizations. In general,
it is a good idea to enable clang optimizations. However, for trivial
examples like simple.c, clang simplifies too much so nothing useful
would be observed. The options -c -emit-llvm -S generate bitcode in
human-readable format.

Run sea-dsa on the bitcode and print memory graphs to dot) format:

 seadsa -sea-dsa=butd-cs -sea-dsa-type-aware -sea-dsa-dot  simple.ll

The options -sea-dsa=butd-cs -sea-dsa-type-aware enable the analysis
implemented in our FMCAD’19 paper (see References). This command will
generate a FUN.mem.dot file for each function FUN in the bitcode
program. In our case, the only function is main and thus, there is
one file named main.mem.dot. The file is generated in the current
directory. If you want to store the .dot files in a different
directory DIR then add the option -sea-dsa-dot-outdir=DIR

Visualize main.mem.dot by transforming it to a pdf file:

 dot -Tpdf main.mem.dot -o main.mem.pdf
 open main.mem.pdf  // replace with you favorite pdf viewer

Example of a memory graph

In our memory model, a pointer value is represented by a cell
which is a pair of a memory object and offset. Memory objects are
represented as nodes in the memory graph. Edges are between cells.

Each node field represents a cell (i.e., an offset in the node). For
instance, the node fields <0,i32**> and <8,i32**> pointed by %6
and %15, respectively are two different cells from the same memory
object. The field <8,i32**> represents the cell at offset 8 in the
corresponding memory object and its type is i32**. Black edges
represent points-to relationships between cells. They are labeled with
a number that represents the offset in the destination node. Blue
edges connect formal parameters of the function with a cell. Purple
edges connect LLVM pointer variables with cells. Nodes can have
markers such as S (stack allocated memory), H (heap allocate
memory), M (modified memory), R (read memory), E (externally
allocated memory), etc. If a node is red then it means that the
analysis lost field sensitivity for that node. The label {void} is
used to denote that the node has been allocated but it has not been
used by the program.

sea-dsa can also resolve indirect calls. An indirect call is a
call where the callee is not known statically. sea-dsa identifies
all possible callees of an indirect call and generates a LLVM call
graph as output.

Consider this example in tests/c/complete_callgraph_5.c:

struct class_t;
typedef int (*FN_PTR)(struct class_t *, int);
typedef struct class_t {
  FN_PTR m_foo;
  FN_PTR m_bar;
} class_t;
int foo(class_t *self, int x)
{
  if (x > 10) {
    return self->m_bar(self, x + 1);
  } else
    return x;
}
int bar (class_t *self, int y) {
  if (y < 100) {
    return y + self->m_foo(self, 10);
  } else
    return y - 5;
}
int main(void) {
  class_t obj;
  obj.m_foo = &foo;
  obj.m_bar = &bar;
  int res;
  res = obj.m_foo(&obj, 42);
  return 0;
}

Type the commands:

clang -c -emit-llvm -S tests/c/complete_callgraph_5.c  -o ex.ll
sea-dsa --sea-dsa-callgraph-dot ex.ll

It generates a .dot file called callgraph.dot in the current
directory. Again, the .dot file can be converted to a .pdf file
and opened with the commands:

dot -Tpdf callgraph.dot -o callgraph.pdf
open callgraph.pdf

Example of a call graph

sea-dsa can also print some statistics about the call graph
resolution process (note that you need to call clang with -g to
print file,line, and column information):

sea-dsa --sea-dsa-callgraph-stats ex.ll
=== Sea-Dsa CallGraph Statistics === 
** Total number of indirect calls 0
** Total number of resolved indirect calls 3
%16 = call i32 %12(%struct.class_t* %13, i32 %15) at tests/c/complete_callgraph_5.c:14:12
RESOLVED
Callees:
  i32 bar(%struct.class_t*,i32)
%15 = call i32 %13(%struct.class_t* %14, i32 10) at tests/c/complete_callgraph_5.c:23:16
RESOLVED
Callees:
  i32 foo(%struct.class_t*,i32)
%11 = call i32 %10(%struct.class_t* %2, i32 42) at tests/c/complete_callgraph_5.c:36:9
RESOLVED
Callees:
  i32 foo(%struct.class_t*,i32)

Dealing with C/C++ library and external calls

The pointer semantics of external calls can be defined by writing a
wrapper that calls any of these functions defined in
seadsa/seadsa.h:

extern void seadsa_alias(const void *p, ...);
extern void seadsa_collapse(const void *p);
extern void seadsa_mk_seq(const void *p, unsigned sz);

seadsa_alias unifies all argument’s cells, seadsa_collapse tells
sea-dsa to collapse (i.e., loss of field-sensitivity) the cell
pointed by p, and seadsa_mk_seq tells sea-dsa to mark as
sequence the node pointed by p with size sz.

For instance, consider an external call foo defined as follows:

extern void* foo(const void*p1, void *p2, void *p3);

Suppose that the returned pointer should be unified to p2 but not to
p1. In addition, we would like to collapse the cell corresponding to
p3. Then, we can replace the above prototype of foo with the
following definition:

#include "seadsa/seadsa.h"
void* foo(const void*p1, void *p2, void*p3) {
    void* r = seadsa_new();
    seadsa_alias(r,p2);
    seadsa_collapse(p3);
    return r;
}

References

“A Context-Sensitive Memory Model for Verification of C/C++
Programs” by A. Gurfinkel and J. A. Navas. In SAS’17.
(Paper)
| (Slides)
“Unification-based Pointer Analysis without Oversharing” by J. Kuderski, J. A. Navas and A. Gurfinkel. In FMCAD’19.
(Paper)