Traversing the Clang AST using Python

Traversing the Clang AST using Python

lockijazz
lockijazz
Apr 30, 2017, 3:51 PM |
0

What is the relationship between llvm/clang? I've been wondering this for quite a while.

Here is an excellent answer: Source

In the LLVM the compilation takes three stages (image from the AOSA book):

LLVMCompiler1.png

The stages are:

  • The frontend, parsing original language, producing an AST, and spitting out LLVM Intermediate Representation (IR) code1.
  • The optimizer, which optimizes the IR code. This stage does all the usual optimizations like constant propagation, dead code removal and so on.
  • The backend, taking IR and producing machine code optimized for a specific architecture (x86, ARM, etc).

 

Clang is simply one of many front-ends that has been fitted onto LLVM. Clang's front-end hooked up to the LLVM optimizer and back-end makes up the Clang compiler.

 

Part 2: Traversing Clang's AST using Python

In the process of generating LLVM IR code, the Clang front-end produces an AST (Abstract Syntax Tree).

while b ≠ 0
if a > b
a := a − b
else
b := b − a
return a

The AST for the above pseudocode would look something like this:

 

AST Tree diagram

 

In order to access the clang AST, your python script must be provided with an interface that allows it to use Clang AST library functions. Unfortunately, libclang is a C interface to the Clang front-end. Fortunately, there exist bindings for Python that allow you to interface with the Clang front-end. Link with more information: libclang (will open in a new window)

 

Every node in the clang AST is a Cursor object.

 

Eli Bendersky has written up the only useful article on this subject that I can find. Link: Parsing C++ in Python with Clang (will open in new window)

I would like to add on several things that I had to figure out by myself.

1. You need to set your PYTHONPATH environment variable to the folder containing the folder clang, which contains the file cindex.py. For me, the path is ~/Desktop/llvm/tools/clang/bindings/python.

To set your PYTHONPATH, type the following in terminal:

PYTHONPATH=~/Desktop/llvm/tools/clang/bindings/python

To append to your PYTHONPATH variable, type:

export PYTHONPATH=~/Desktop/llvm/tools/clang/bindings/python

To see what the value of PYTHONPATH is, run

echo $PYTHONPATH

Of course, change my path to your path.

2. You need to tell clang.cindex where the libclang dynamic library is. In OSX, the dynamic library is called libclang.dylib. In Windows, it is called libclang.dll, and in Ubuntu and other operating systems it is called libclang.so. 
To do this, your first line of code, after imports, should be clang.cindex.Config.set_library_path("/Users/tomgong/Desktop/build/lib").

Replace my path with yours.

(Unfortunately my friend has had trouble setting up libclang python bindings on Ubuntu 16.04, so I don't know if this will work for Ubuntu users. I know it works on OSX).

3. All documentation, as of May 2017, is in the cindex.py file.

 

Below I have attached a sample python script. I hope it helps. I often find source code to be extremely useful.

Some things to know:

1. You must pass in the C or C++ file as the first argument when running the program

2. It will print out an enormous number of nodes if you have any headers. The file's AST includes all dependencies, apparently.

 

cindex.py (cindex.py) (will open in a new window)

sample.py:

 

#!/usr/bin/env python
""" Usage: call with <filename>
"""

import sys
import clang.cindex

function_calls = []             # List of AST node objects that are function calls
function_declarations = []      # List of AST node objects that are fucntion declarations

# Traverse the AST tree
def traverse(node):

    # Recurse for children of this node
    for child in node.get_children():
        traverse(child)

    # Add the node to function_calls
    if node.type == clang.cindex.CursorKind.CALL_EXPR:
        function_calls.append(node)

    # Add the node to function_declarations
    if node.type == clang.cindex.CursorKind.FUNCTION_DECL:
        function_declarations.append(node)

    # Print out information about the node
    print 'Found %s [line=%s, col=%s]' % (node.displayname, node.location.line, node.location.column)

# Tell clang.cindex where libclang.dylib is
clang.cindex.Config.set_library_path("/Users/tomgong/Desktop/build/lib")
index = clang.cindex.Index.create()

# Generate AST from filepath passed in the command line
tu = index.parse(sys.argv[1])

root = tu.cursor        # Get the root of the AST
traverse(root)

# Print the contents of function_calls and function_declarations
print(function_calls)
print(function_declarations)