Increasing Reliability through Automated Enforcement of Design Rules
Advanced static-analysis tools are good at finding serious programming errors. They can be customized to find violations of domain or project-specific rules faster and more efficiently, but also to enforce compliance with coding guidelines and quality standards.
PAUL ANDERSON, PH.D., GRAMMATECH
Page 1 of 1
In the ongoing effort to improve software reliability, advanced static-analysis tools have proven to be effective at finding serious programming errors. These tools use sophisticated symbolic execution techniques to detect bugs such as buffer overruns, null pointer dereferences, race conditions and many others. They excel at finding generic flaws—places where the fundamental rules of the language are violated, such as division by zero errors. They are also good at locating defects that arise because a common library is being used incorrectly, such as a memory or other resource leak. These tools can also be extended to find domain-specific defects. There are many benefits to customizing these tools—improving reliability by finding defects unique to the code base is the main advantage, but it can also reduce the cost of manual code inspections, and help reduce compliance costs by checking adherence to coding standards.
Almost all projects have rules about how code should be written, even if they are not explicitly stated. These rules may cover superficial aspects of the code, such as layout, identifier naming and syntactic restrictions (e.g., no expressions nested more than four levels deep). They may also address semantic properties of the code such as the order in which it is legal to perform certain operations, or how to handle the detection and logging of run-time errors. Although there are lightweight tools (such as lint) for detecting violations of the superficial rules, only the advanced static-analysis tools can effectively find violations of the complex semantics-oriented rules.
Often, a domain-specific rule is just another instantiation of a rule already checked by a static-analysis tool. Typically, tools can be easily configured to detect violations of these. For example, leaks of project-specific resources can be detected by configuring the tool to recognize the functions that allocate the resource and those that release it. However, a project-specific design rule can often be sufficiently different that a checker for it can only be written by programming a special-purpose extension to the tool. Whether it is possible to do so depends on the nature of the rule itself, and the degree of extensibility that the tool provides. Before explaining how this can be done, it is worthwhile to briefly describe how advanced static-analysis tools work.
Advanced Static-Analysis Tools
Static-analysis tools work very much like compilers. They take source code as input, which they then parse and convert to an intermediate representation (IR). The IR typically comprises the program’s abstract syntax tree (AST), the symbol table, the control-flow graph (CFG) and the call graph. Figure 1 shows a block diagram of the process. A compiler uses these to generate object code and then discards them. However, static-analysis tools retain the IR, and checkers are usually implemented by traversing or querying the IR looking for particular properties or patterns that indicate defects. A checker looking for a simple syntactic property (e.g., goto statements) would search the abstract syntax tree for constructs that match that pattern. The advanced tools get their power from sophisticated symbolic execution techniques that explore paths through the control-flow graph. These algorithms keep track of the abstract state of the program and know how to use that state to exclude consideration of infeasible paths. This level of complexity is required in order to find the serious bugs while keeping the level of false positives low.
The architecture of an advanced static-analysis tool.
No two tools provide exactly the same interface or techniques for implementing custom checkers, but two mechanisms are common. The first allows users to essentially annotate their code. The second is an API that gives access to some aspects of the IR, typically using a visitor pattern. We explain the first mechanism using examples written for CodeSonar. Other tools have analogous mechanisms.
Suppose, for example, that there is an internal function named foo that takes a single integer parameter, and that it is dangerous for that parameter to have the value -1. A check for this case could be implemented by adding some code to the body of foo as shown in Code Block 1.
Code Block 1
The #ifdef construct ensures that this new code is not seen by the regular compiler. The call to csonar_trigger is never actually executed, but is instead analyzed by the tool. If the analysis considers that the trigger condition may be satisifed, then it will issue a warning with the given message.
Of course in most cases it is not appropriate to require that programmers interleave these annotations with the code, so there is an alternative way to implement this kind of check that allows it to be written in a separate file, which avoids the need for the code to be edited. This approach would also be useful when the source code for foo is not available, such as when it is in a third-party library. To do this, the author of the checker would write a replacement function as shown in Code Block 2.
Code Block 2
When the analysis sees the definition of csonar_replace_foo, it treats all calls in the code to foo (except the one inside csonar_replace_foo) as if they were calls to csonar_replace_foo instead.
This trigger idiom is good for checking temporal properties, particularly sequences of function calls. Suppose there is a rule that says that bar should never be called while foo is executing. A check might be implemented as shown in Code Block 3.
Code Block 3
Note that a global state variable is used to record whether or not foo is active. Before entry to foo, it is set to one and then reset to zero after foo returns. This variable is then checked in the trigger in bar, and if set to one, a warning will be issued.
The example in code block 3 shows how a global property might be checked. The same mechanism can be used to write checks for properties of objects. For example, it is illegal to read from a file object after it has been closed. In such a case the state being checked must be associated with the file object. The extension API allows users to specify attributes that can be attached to objects. These can be thought of as state variables that can be associated with objects. An attribute can be used to encode whether a file object is open or closed, and the checker for a reader function can test the value of that attribute and issue a warning if the file is in fact closed.
This approach allows users to author static checks almost as if they were writing dynamic checks. The API for this kind of check is small, and the language is regular C, so there is a shallow learning curve. This simplicity is deceptive because the technique can be used to implement fairly sophisticated checks. In CodeSonar, many of the out-of-the-box checks are implemented this way.
The second way to implement a custom check is to use the API that gives access to the underlying IR through a visitor pattern. A visitor is a function that is invoked on every IR element of the appropriate type; there are different visitor types for different IR constructs. When they are present, visitors piggyback on the various traversals carried out by the analysis. Visitors can be defined for files, identifiers, subprograms and nodes in the control-flow graph (which correspond roughly to program statements), and the syntax tree. The interface to a visitor allows for pattern matching against the construct. This way a checker author can easily search for constructs without having to know exactly how they are represented.
For example, suppose there is a rule that no variable is allowed to contain uppercase characters. The checker for this would be best implemented by writing a function that takes an identifier as input, checks that it represents a variable, scans it for uppercase letters, and issues a warning if one is found. This function would be registered as a visitor for the table of identifiers.
CodeSonar has an additional kind of visitor, which is invoked during the path exploration of the control-flow graph. At each step along the path, the check can query the abstract state of the program, so the implementation can ask questions such as “what is the value of this variable at this point?” This allows the checker author to write sophisticated checks that leverage the built-in program analysis capabilities of the tool.