Get in touch

Get in touch with Binary

Binary Studio website uses cookies to enhance your browsing experience. By continuing to use our site, you agree to our Privacy Policy and use of cookies.

Learn more more arrow
Sergey Dmitrouk C++ Developer 08.04.2014

Clang by example: detecting postfix operators in for loops




Probably the most well known component (tool) of LLVM is Clang, which provides not only compiles languages of C-family, but also exposes its internal data structures for dealing with program being compiled and provides various means to deal with them. Later Clang has been updated with specialized facilities to process source files and even to modify them.

As it's now, Clang can be used to implement tools for processing code written in C/C++/Objective-C/Objective-C++ relatively easily. With it anyone can write tools for code processing with quite little amount of effort. The hard part is to start writing such tools as it's easy to get lost in lots of headers and libraries supplied with LLVM and Clang. Hence the main idea behind these series of articles: give one enough knowledge to be able to use Clang for writing tools. To achieve the goal, we'll look at examples of building Clang-tools from scratch. There will also be referenced materials that are recommended to read/watch/consult as using Clang and LLVM is already documented in many sources. Although there are plenty of resources on Clang, most of them have two issues:

· there are more or less out of date (because of quite high pace of development and breaking changes from time to time);

· some of covered examples are too artificial.

That's why we're going to build a small set of tools some of which might not be self contained, but nevertheless should be usable in real life. Process of building such tools will shed light on different parts of Clang, which will help construct a better image of what it is and in what ways it can be used.



Here are some details of my working environment to help solve possible issues with initial setup (see below).

· GNU/Linux operating system

· GCC version 4.8.1

· LLVM from 11.03.2014 built with GCC:

· Clang from 11.03.2014 built with GCC:

· cmake version 2.8.8

· ninja version 1.4.0

· everything is built with GCC


Your first tool

To get started one needs to have a working build of LLVM and Clang. This is not a subject of this writing and there is already a good tutorial on this (read notes below before following instructions in the tutorial):

Tutorial for building tools using LibTooling and LibASTMatchers

It seems to be a bit out of date, so here are some additional moments to consider:

1. Modern versions of cmake are built with ninja support, so good chances that there is no need to recompile it, check it by running:

2. I suggest enabling generation of documentation with Doxygen. For this one needs to set the following ccmakeoptions to YES before starting the build (see a small hint on using ccmakebelow):




This way you'll be sure that you're referring to the correct version of documentation while writing your tools.

You might want to set LLVM_BUILD_DOCS to OFFonce everything is built, it won't delete generated documentation, but will reduce time of compilation when build is started by ninjacommand with no arguments.

3. It's said that all tests should pass. Current version of tests display messages about expected failed tests, that seems to be OK.

4. If you decide to rebuilt Clang with itself don't forget to setup CMAKE_CXX_COMPILER as well as CMAKE_C_COMPILER (use full path to clang executable for it). Actually recompiling with Clang is optional, no need to waste time on that.

5. In step 1 there is a command:

Which should be:

ccmake hint: use arrow keys to navigate in terminal interface, “j”/“k” doesn't work there, but searching with “/” works.

Once you're done with that tutorial, you should feel like knowing a lot of new stuff about it, still not being 100% sure that you able to write your own tool. It's advised to watch Introduction to the Clang AST now. Actually the order can be reversed (watch the video first), but as Manuel Klimek (the guy in the video) describes more advanced topics, it makes sense to do basic tutorial first.

Now, having some background, we can start creating our small, but useful tool.

Detecting postfix operators in for-loops

Our goal is to analyse source files to find postfix increment and decrement operators in the last part of for-loop statements. The drawback of using postfix operator is that it can lead to performance penalty in some cases. As you probably know, postfix operator differs from prefix one in creating one extra object, which normally is not optimized. It's almost free for primitive types and can be free when it goes about iterators from library provided along with a compiler, but in general it's a good practice to use postfix increment/decrement operators only when you have a reason to do so.

There is already a plenty of tools that can detect misuse of postfix operators, but such warnings can get lost in output of mature static analyzers and be just skipped by a developer. This tool we'll do just that so skipping such warnings would be a stupid thing to do. This is just a good programming style to use postfix operators, so let's write a tool that ensures correctness of for-loops and learn more about Clang's AST at the same time.

By the way, we'll name the tool "for-postfix" (not to be confused with mail transfer agent). See "Additional resources" section below for a link to repository containing ready to use tool. History of commits is quite clear, so one can examine commits one by one while reading the article.



Note, I go in detail in this section just to repeat what you have learnt from the tutorial, skip this part if you feel like you don't need to repeat anything.

Just as in "Tutorial for building tools using LibTooling and LibASTMatchers" referenced above, we're going to use LibTooling and ASTMatchers. There is also RecursiveASTVisitor, but it's now superseded by ASTMatchers.

To get started go to llvm/tools/clang/tools/extra/directory and create a subdirectory called for-postfix. Navigate to new directory and create CMakeLists.txt containing:

Another file that needs to be created is for-postfix.cpp. For now let's put a dummy code there:

Add add_subdirectory (for-postfix) line at the end of the llvm/tools/clang/tools/extra/CMakeLists.txt, so that our tool is built with LLVM/Clang.

Now I suggest you to go into your build/directory and run ninja  for-postfix just to make sure the configuration is correct. If it is, you'll find an executable build/bin/for-postfix that does nothing when it's run.

Now let's write a skeleton of the tool (I give this as it is because you should already know basics of this kind of stuff):

This should compile and link fine. By the way, notice unusual const in front of argv parameter declaration, this is needed to satisfy prototype of CommonOptionsParser constructor.


Matching builting types

Finally we're ready to start matching things we're interested in. Our first matcher will match increment operators, both prefix and postfix:

Above piece of code should be self-explaining, but notice where bind method is called. It can be called on objects that correspond to real AST nodes, like for-statements (result of forStmt) and operators (result of calling unaryOperator), but can't be called on something that just examines properties of AST nodes (like hasIncrement or hasOperatorName).

To proceed further we need one more entity, MatchFinder's callback handler:

Also don't forget to include <iostream> before compiling.

You should be familiar with the idea behind this code: it tries to extract unary operator node named op and dumps it on the screen when succeed.

Let's create our helper and use it to add our matcher in the main function:

At this point running our tool over this example file:

Should output both i++ and ++i like this (ignore addresses):

Let's filter out the second match in our helper. Just surround dumping code with:

What we have now is not capable of handling decrement operator. To fix that, simplify the matcher to remove operator name check (note name change):

Note that there was one condition, which is now removed, while actually one can use many conditions separated by commas. We're not removing the check completely, just moving it to the MatchHelper, where op->isPostfix() condition needs to be replaced with:

That's it for builtin operators.


Matching overloaded operators

This example uses postfix increment operator, but our for-postfix is not able to find it yet:

Our matcher could look like this:

This one is too specific and requires its clone to match decrement operator (couldn't find a way to implement match chooser, something like OR operator).

It also has two interesting properties:

· operatorCallExpr got two conditions that are ANDed with each other;

· argumentCountIs is used to match postfix operator, here first argument is this and the second one is useless int to make overloading of postfix operators possible.

Let's throw away matching of operator name and do this in MatchHelper later:

Now we need to update MatchHelper a bit to handle overloaded operators, which it won't match now as CallExpr doesn't subclass UnaryOperator:

Where printOut contains two lines from above that dump operator nodes. As you can see matching specific operators is quite easy. The crucial part in this case is to use right type of node. Regular CallExpr won't let us to check for operator type, CXXOperatorCallExpr is needed instead.

Now let's register new matcher in MatchFinder:

Note that by adding multiple matchers to match finder one can match different things in one traverse through AST.


Formatting output

We're almost done. Current way of printing out nodes is not readable, it would be better to print path to source file and line number of the match. First of all we need to include more headers and add one more using directive:

Then update printOut method:

This snippet outputs match in the following format:

Example output (with truncated paths):

It's alike output of Unix grep command (with -F and -n switches) and can be accepted by many other tools out there, e.g. one can store output of the tool to a file and open it in Vim to navigate through errors and edit them like this:

This tool can already be used in real projects to find misuse of postfix operators.

Additional resources (not referenced above)

· The Design of LLVM by Chris Lattner, the primary author of the LLVM and related projects, such as the Clang compiler.
· Collection of Clang tutorials.
· GitHub repository with for-postfix tool.