Traversing the Clang AST using Python
What is the relationship between llvm/clang? I've been wondering this for quite a while.
Here is an excellent answer: Source
The stages are:
- The frontend, parsing original language, producing an AST, and spitting out LLVM Intermediate Representation (IR) code1.
- The optimizer, which optimizes the IR code. This stage does all the usual optimizations like constant propagation, dead code removal and so on.
- The backend, taking IR and producing machine code optimized for a specific architecture (x86, ARM, etc).
Clang is simply one of many front-ends that has been fitted onto LLVM. Clang's front-end hooked up to the LLVM optimizer and back-end makes up the Clang compiler.
Part 2: Traversing Clang's AST using Python
In the process of generating LLVM IR code, the Clang front-end produces an AST (Abstract Syntax Tree).
- while b ≠ 0
- if a > b
- a := a − b
- b := b − a
- if a > b
- return a
The AST for the above pseudocode would look something like this:
In order to access the clang AST, your python script must be provided with an interface that allows it to use Clang AST library functions. Unfortunately, libclang is a C interface to the Clang front-end. Fortunately, there exist bindings for Python that allow you to interface with the Clang front-end. Link with more information: libclang (will open in a new window)
Every node in the clang AST is a Cursor object.
Eli Bendersky has written up the only useful article on this subject that I can find. Link: Parsing C++ in Python with Clang (will open in new window)
I would like to add on several things that I had to figure out by myself.
1. You need to set your PYTHONPATH environment variable to the folder containing the folder clang, which contains the file cindex.py. For me, the path is ~/Desktop/llvm/tools/clang/bindings/python.
To set your PYTHONPATH, type the following in terminal:
To append to your PYTHONPATH variable, type:
To see what the value of PYTHONPATH is, run
Of course, change my path to your path.
2. You need to tell clang.cindex where the libclang dynamic library is. In OSX, the dynamic library is called libclang.dylib. In Windows, it is called libclang.dll, and in Ubuntu and other operating systems it is called libclang.so.
To do this, your first line of code, after imports, should be clang.cindex.Config.set_library_path("/Users/tomgong/Desktop/build/lib").
Replace my path with yours.
(Unfortunately my friend has had trouble setting up libclang python bindings on Ubuntu 16.04, so I don't know if this will work for Ubuntu users. I know it works on OSX).
3. All documentation, as of May 2017, is in the cindex.py file.
Below I have attached a sample python script. I hope it helps. I often find source code to be extremely useful.
Some things to know:
1. You must pass in the C or C++ file as the first argument when running the program
2. It will print out an enormous number of nodes if you have any headers. The file's AST includes all dependencies, apparently.
cindex.py (cindex.py) (will open in a new window)
""" Usage: call with <filename>
function_calls =  # List of AST node objects that are function calls
function_declarations =  # List of AST node objects that are fucntion declarations
# Traverse the AST tree
# Recurse for children of this node
for child in node.get_children():
# Add the node to function_calls
if node.type == clang.cindex.CursorKind.CALL_EXPR:
# Add the node to function_declarations
if node.type == clang.cindex.CursorKind.FUNCTION_DECL:
# Print out information about the node
print 'Found %s [line=%s, col=%s]' % (node.displayname, node.location.line, node.location.column)
# Tell clang.cindex where libclang.dylib is
index = clang.cindex.Index.create()
# Generate AST from filepath passed in the command line
tu = index.parse(sys.argv)
root = tu.cursor # Get the root of the AST
# Print the contents of function_calls and function_declarations