Commit b0549b78 authored by Lukáš Marek

Merged changes from stable branch disl_1.0.x -r 599:720

parent 5114d2b9
@@ -6,7 +6,7 @@
<classpathentry kind="src" path="src-agent-java"/>
<classpathentry kind="src" path="src-test"/>
<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
<classpathentry kind="lib" path="lib/asm-debug-all-4.0.jar"/>
<classpathentry kind="lib" path="build/eclipse-dynamicbypass.jar"/>
<classpathentry kind="lib" path="lib/asm-debug-all-4.1.jar"/>
<classpathentry kind="output" path="bin"/>
DiSL is inspired by AOP, but in contrast to mainstream AOP languages, it features
an open join point model where any region of bytecodes can be selected as a join
point (i.e., code location to be instrumented). DiSL reconciles high-level
language constructs resulting in concise instrumentations, high expressiveness,
and efficiency of the inserted instrumentation code. Thanks to the
pointcut/advice model adopted by DiSL, instrumentations are similarly compact as
aspects written in AspectJ. However, in contrast to AspectJ, DiSL does not
restrict the code locations that can be instrumented, and the code generated by
DiSL avoids expensive operations (such as object allocations that are not visible
to the programmer). Furthermore, DiSL supports instrumentations with complete
bytecode coverage out-of-the-box and avoids structural modifications of classes
that would be visible through reflection and could break the instrumented code.
DiSL is a Java bytecode instrumentation framework intended for observation
of programs executing in the Java Virtual Machine. It has been mainly used
for development of dynamic program analysis instrumentations, but it can be
used equally well to develop instrumentations for, e.g. runtime performance
monitoring, or other tasks not bent on altering program execution.
DiSL is inspired by AOP, but in contrast to mainstream AOP languages, it
features an open join point model where any region of bytecodes can serve as
a join point (i.e., code location to be instrumented). DiSL also reconciles
high-level language concepts, such as the pointcut/advice programming model
found in AOP, with high expressiveness, and efficiency of bytecode
manipulation performed using low-level libraries such as ASM. As a result,
instrumentations written using DiSL are almost as compact as aspects written in
AspectJ, but perform about as fast as those written using ASM.
However, in contrast to AspectJ, DiSL does not restrict the code locations
that can be instrumented, and the code generated by DiSL avoids expensive
operations (such as object allocations that are not visible to the
programmer). Furthermore, DiSL supports instrumentations with complete
bytecode coverage out-of-the-box and avoids structural modifications of
classes that would be visible through reflection and could break the
instrumented code.
DiSL currently fully supports Linux with installed Java, ant, GCC and make.
DiSL currently fully supports "Linux" and "OS X" platforms with Java, ant, GCC
and make installed and found on the executable path. DiSL has been used on the
Windows/Cygwin platform as well, but it was not extensively tested there.
While most of DiSL is written in Java, it requires a JVM enhanced with a
native agent written in C, which must be compiled first. For that, the simple
build system needs to know where your JDK is installed to be able to find JNI
header files for your platform. On many systems, the JAVA_HOME environment
variable points to the root of the JDK installation and you should be fine.
If this is not the case, please enter the src-agent-c directory, copy the
Makefile.local.tmpl file to Makefile.local and modify it to set the JAVA_HOME
variable to point to the root of the JDK installation you want to use.
Finally, to compile DiSL, run the "ant" command in the root directory.
You can create javadoc documentation by running "ant javadoc".
To compile DiSL please run the "ant" command in the root directory. If "make" is
complaining about missing java headers, modify the Makefile.local.tmpl
@@ -31,13 +55,15 @@ EXAMPLES
For the basic instrumentation example, please look in the example directory.
Also the src-test directory contains simple examples of DiSL features.
Please look at
If you get an java error during instrumentation or running your application,
If you get a Java error during instrumentation or running your application,
please look at the USERERRORS document describing the most common problems.
@@ -176,6 +176,10 @@
<target name="javadoc" depends="package,eclipse">
<javadoc access="public" author="true" overview="doc/overview.html" classpath="build/eclipse-dynamicbypass.jar:${asm.path}" destdir="doc" nodeprecated="false" nodeprecatedlist="false" noindex="false" nonavbar="false" notree="false" packagenames="ch.usi.dag.disl.guardcontext,ch.usi.dag.disl.staticcontext,ch.usi.dag.disl.dynamiccontext,ch.usi.dag.disl.classcontext,ch.usi.dag.disl.marker,ch.usi.dag.disl.transformer,ch.usi.dag.disl.processorcontext,ch.usi.dag.disl.annotation" source="1.7" sourcefiles="src/ch/usi/dag/disl/scope/,src/ch/usi/dag/disl/scope/,src/ch/usi/dag/disl/,src/ch/usi/dag/disl/snippet/" sourcepath="src-test:src-agent-java:src" splitindex="true" use="true" version="true"/>
<!-- *** test instrumentation package *** -->
<target name="check-test-property">
\ No newline at end of file
\clubpenalty = 10000
\widowpenalty = 10000
% In-text code style (breakable and unbreakable)
unicode = true,
colorlinks = true,
linkcolor = black,
anchorcolor = black,
citecolor = black,
urlcolor = black,
% Don't use guessable names for links; required for subfloat compatibility.
hypertexnames = false,
\title{Introduction to Instrumentation with DiSL}
DiSL is a domain-specific language for Java bytecode instrumentation.
DiSL is inspired by AOP, but in contrast to mainstream AOP languages, it features an open join point model where any region of bytecodes can be selected as a join point (i.e., code location to be instrumented).
DiSL reconciles high-level language constructs resulting in concise instrumentations, high expressiveness, and efficiency of the inserted instrumentation code.
Thanks to the pointcut/advice model adopted by DiSL, instrumentations are about as compact as aspects written in AspectJ.
However, in contrast to AspectJ, DiSL does not restrict the code locations that can be instrumented, and the code generated by DiSL avoids expensive operations (such as object allocations that are not visible to the programmer).
Furthermore, DiSL supports instrumentations with complete bytecode coverage out-of-the-box and avoids structural modifications of classes that would be visible through reflection and could break the instrumented code.
\section{DiSL by Example}\label{sec:DiSL}
A common example of a dynamic program analysis tool is a method execution time profiler, which usually instruments the method entry and exit join~points and introduces storage for timestamps.
We describe the main features of DiSL by gradually developing the instrumentation for such a profiler.
The same instrumentation is also available on the DiSL home page\footnote{\url{}} among the examples.
For step-by-step instructions on how to run the examples, please refer to Appendix~\ref{sec:Setup}.
\subsection{Method Execution Time Profiler}
% The structure of each step should be roughly:
% 1. What profiler features we develop now.
% 2. What DiSL features are used for that.
% 3. Code explanation.
In the first version of our execution time profiler, we simply print the entry and exit times for each method execution as it happens.
For that, we need to insert instrumentation at the method entry and method exit join~points.
Each DiSL instrumentation is defined through methods declared in standard Java classes.
Each method---called {\em snippet} in DiSL terminology---is annotated so as to specify the join~points where the code of the snippet shall be inlined.\footnote{The name of the method is not constrained and can be arbitrarily chosen by the programmer.}
The profiler instrumentation code in Figure~\ref{fig:instr-prof-simple-prof} uses two such snippets: the first prints the entry time, the second the exit time.
public class SimpleProfiler {
  @Before(marker = BodyMarker.class)
  static void onMethodEntry() {
    System.out.println("Method entry " + System.nanoTime());
  }
  @After(marker = BodyMarker.class)
  static void onMethodExit() {
    System.out.println("Method exit " + System.nanoTime());
  }
}
\caption{Instrumenting method entry and exit}
The code uses two annotations to direct inlining.
The \code{@Before} annotation requests the snippet to be inlined before each marked bytecode region (representing a join~point); the \code{@After} annotation places the second snippet after each exit, both normal and abnormal, from a marked region.
The regions themselves are specified with the \code{marker} parameter of the annotation.
In our example, \code{BodyMarker} marks the whole method (or constructor) body.
The resulting instrumentation thus prints a timestamp upon method entry and exit.
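To make the effect of the two annotations concrete, the following plain-Java sketch shows roughly how an instrumented method behaves after weaving.
It is written by hand for illustration and is not DiSL output: the class \code{WovenExample}, its method \code{work}, and the \code{log} list (standing in for \code{System.out}) are hypothetical names, and the \code{try}/\code{finally} block only approximates the ``normal and abnormal exit'' semantics of \code{@After}.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written approximation of a method after weaving (not real DiSL output).
class WovenExample {
    // Stand-in for System.out so that the effect can be inspected.
    static final List<String> log = new ArrayList<>();

    static int work(int n) {
        // Code of the @Before snippet, inlined at the start of the body.
        log.add("Method entry " + System.nanoTime());
        try {
            // Original method body.
            if (n < 0) {
                throw new IllegalArgumentException("negative input");
            }
            return n * 2;
        } finally {
            // Code of the @After snippet, executed on normal and abnormal exit.
            log.add("Method exit " + System.nanoTime());
        }
    }
}
```

Calling \code{WovenExample.work} with a negative argument still records the exit entry, because the exception passes through the \code{finally} block.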
Instead of printing the entry and exit times, we may want to print the elapsed wall-clock time from the method entry to the method exit.
The elapsed time can be computed in the after snippet, but to perform the computation, the timestamp of method entry has to be passed from the before snippet to the after snippet.
In traditional AOP languages, which do not support efficient data exchange between advices, this situation would be handled using a local variable within the around advice.
In contrast, an instrumentation framework such as DiSL has no need for the usual form of the around advice, which lets the advice code decide whether to skip or proceed with the method invocation.
DiSL therefore only supports inlining snippets before and after a particular join~point, together with a way for the snippets inlined into the same method to exchange data using \emph{synthetic local variables}, as illustrated in Figure~\ref{fig:instr-prof-time}.
public class SimpleProfiler {
  @SyntheticLocal
  static long entryTime;
  @Before(marker = BodyMarker.class)
  static void onMethodEntry() {
    entryTime = System.nanoTime();
  }
  @After(marker = BodyMarker.class)
  static void onMethodExit() {
    System.out.println("Method duration " + (System.nanoTime() - entryTime));
  }
}
\caption{Passing data between snippets using a synthetic local variable}
Synthetic local variables are static fields annotated as \code{@SyntheticLocal}.
The variables have the scope of a method invocation and can be accessed by all snippets that are inlined in the method; that is, they become local variables.
Synthetic local variables are initialized to the default value of their declared type (e.g., \code{0}, \code{false}, \code{null}).
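The practical consequence of the method-invocation scope can be seen in a hand-written approximation of the woven code.
This is a sketch under stated assumptions rather than actual DiSL output: \code{SyntheticLocalExample}, \code{factorial}, and the \code{durations} list are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written approximation of weaving with a synthetic local variable.
class SyntheticLocalExample {
    // Collected durations, one per completed invocation.
    static final List<Long> durations = new ArrayList<>();

    static int factorial(int n) {
        // The @SyntheticLocal field becomes a local variable of the method,
        // so every invocation (including recursive ones) has its own copy.
        long entryTime = 0L;            // default value of the declared type
        entryTime = System.nanoTime();  // inlined before snippet
        try {
            // Original method body.
            return (n <= 1) ? 1 : n * factorial(n - 1);
        } finally {
            // Inlined after snippet reads the value stored on entry.
            durations.add(System.nanoTime() - entryTime);
        }
    }
}
```

Had \code{entryTime} remained an ordinary static field, each recursive call would overwrite the timestamp of its caller; as a local variable, every invocation keeps its own value.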
Next, we extend the output of our profiler to include the name of each profiled method.
In DiSL, the information about the instrumented class, method, and bytecode region can be obtained through dedicated \emph{static context interfaces}.
In this case, we are interested in the \code{MethodStaticContext} interface, which provides the method name, signature, modifiers and other static data about the intercepted method and its enclosing class.
Figure~\ref{fig:instr-prof-name} refines the after snippet of Figure~\ref{fig:instr-prof-time} to access the fully qualified name of the instrumented method.
@After(marker = BodyMarker.class)
static void onMethodExit(MethodStaticContext msc) {
  System.out.println(msc.thisMethodFullName() + " duration "
    + (System.nanoTime() - entryTime));
}
\caption{Accessing the method name through static context}
Static context interfaces provide information that is already available at instrumentation time.
When inlining the snippets, DiSL therefore replaces the calls to these interfaces with the corresponding static context information, thus improving the efficiency of the resulting tools.
DiSL provides a set of commonly used static context interfaces, which can be declared as arguments to the snippets in any order.
The DiSL programmer may also define custom static context interfaces to perform additional static analysis at instrumentation time or to access information not directly provided by DiSL.
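As a rough illustration of this replacement, the sketch below shows a hand-written approximation of an instrumented method in which the static context call has already been turned into a string constant.
The names \code{StaticContextExample}, \code{compute}, and \code{lastReport} are hypothetical, and the exact format of the string produced by \code{thisMethodFullName()} is an assumption made for this example.

```java
// Hand-written approximation of an after snippet whose static context call
// has been replaced by a constant at instrumentation time.
class StaticContextExample {
    // Stand-in for System.out so that the report can be inspected.
    static String lastReport;

    static void compute() {
        long entryTime = System.nanoTime();
        try {
            // Original method body.
        } finally {
            // The call msc.thisMethodFullName() is gone; only its result remains.
            // The exact format of the name is an assumption made for this sketch.
            lastReport = "StaticContextExample.compute" + " duration "
                    + (System.nanoTime() - entryTime);
        }
    }
}
```

No lookup of the method name happens at runtime; the only remaining runtime work is reading the clock and building the report string.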
\subsection{Adding Stack Trace}
Sometimes knowing the name of the profiled method is not enough.
We may also want to know the context in which the method was called.
Such context is provided by the stack trace of the profiled method.
There are several ways to obtain the stack trace information in Java, such as calling the \code{getStackTrace()} method from \code{java.lang.Thread}, but frequent calls to this method may be expensive.
Our example therefore obtains the stack trace using instrumentation.
Figure~\ref{fig:instr-prof-cs} shows two additional snippets that maintain the call stack information in a shadow call stack.
Upon method entry, the method name is pushed onto the shadow call stack.
Upon method exit, the method name is popped off the shadow call stack.
@ThreadLocal
static Stack<String> callStack;
@Before(marker=BodyMarker.class, order=1000)
static void pushOnMethodEntry(MethodStaticContext msc) {
  if (callStack == null) { callStack = new Stack<String>(); }
  callStack.push(msc.thisMethodFullName());
}
@After(marker=BodyMarker.class, order=1000)
static void popOnMethodExit() {
  callStack.pop();
}
\caption{Reifying a thread-specific call stack using dedicated snippets}
Each thread maintains a separate shadow call stack, referenced by the thread-local variable~\code{callStack}.\footnote{DiSL offers a particularly efficient implementation of thread-local variables with the \code{@ThreadLocal} annotation.}
In our example, \code{callStack} is initialized for each thread in the before snippet.
The thread-local shadow call stack can be accessed from all snippets through the \code{callStack} variable; for example, it could be included in the profiler output.
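The per-thread behavior can be approximated in plain Java as follows.
This is only a sketch: it uses the standard \code{java.lang.ThreadLocal} class in place of the DiSL \code{@ThreadLocal} annotation, and the class \code{ShadowStackExample} with its methods \code{outer} and \code{inner} is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Plain-Java approximation of a per-thread shadow call stack.
class ShadowStackExample {
    // java.lang.ThreadLocal stands in for the DiSL @ThreadLocal annotation.
    static final ThreadLocal<Deque<String>> callStack =
            ThreadLocal.withInitial(ArrayDeque::new);

    static List<String> outer() {
        callStack.get().push("ShadowStackExample.outer");  // before snippet
        try {
            return inner();
        } finally {
            callStack.get().pop();                         // after snippet
        }
    }

    static List<String> inner() {
        callStack.get().push("ShadowStackExample.inner");  // before snippet
        try {
            // Any code running in this thread can read the stack, innermost first.
            return new ArrayList<>(callStack.get());
        } finally {
            callStack.get().pop();                         // after snippet
        }
    }
}
```

A call to \code{ShadowStackExample.outer()} returns the stack as seen inside \code{inner}, with the innermost method first, and leaves the stack of the calling thread empty again.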
To make sure all snippets observe the shadow call stack in a consistent state, the two snippets that maintain the shadow call stack have to be inserted in a correct order relative to the other snippets.
DiSL allows the programmer to specify the order in which snippets matching the same join~point should be inlined using the \code{order} integer parameter in the snippet annotation.
The smaller this number, the closer to the join~point the snippet is inlined.
In our profiler, the time measurement snippets and the shadow call stack snippets match the same join~points (method entry and method exit, respectively).
We assign a higher order value (1000) to the call stack reification snippets and keep the lower default order value (100) of the snippets for time measurement.\footnote{If snippet ordering is used, it is recommended to override the value in all snippets for improved readability.}
Consequently, the callee name is pushed onto the shadow call stack before the entry time is measured, and the exit time is measured before the callee name is popped off the stack.
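The ordering rule can be modeled with a few lines of plain Java.
The sketch below is a toy model of the rule as described in the text, not the DiSL weaver; the class \code{SnippetOrderExample} and its nested \code{Snippet} type are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of snippet ordering around one join point (not the DiSL weaver).
class SnippetOrderExample {
    static final class Snippet {
        final String name;
        final int order;
        Snippet(String name, int order) { this.name = name; this.order = order; }
    }

    // Before snippets: a larger order is farther from the join point, so it runs first.
    static List<String> beforeSequence(List<Snippet> snippets) {
        List<Snippet> sorted = new ArrayList<>(snippets);
        sorted.sort(Comparator.comparingInt((Snippet s) -> s.order).reversed());
        return names(sorted);
    }

    // After snippets: a smaller order is closer to the join point, so it runs first.
    static List<String> afterSequence(List<Snippet> snippets) {
        List<Snippet> sorted = new ArrayList<>(snippets);
        sorted.sort(Comparator.comparingInt((Snippet s) -> s.order));
        return names(sorted);
    }

    private static List<String> names(List<Snippet> snippets) {
        List<String> result = new ArrayList<>();
        for (Snippet s : snippets) { result.add(s.name); }
        return result;
    }
}
```

With the order values used in the profiler (1000 for the call stack snippets, 100 for the time measurement snippets), the model yields the push before the entry timestamp and the exit timestamp before the pop.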
\subsection{Profiling Object Instances}
Our next extension addresses situations where the dependency of the method execution time on the identity of the called object instance is of interest.
Figure~\ref{fig:instr-prof-identity} refines the after snippet of Figure~\ref{fig:instr-prof-time} by computing the identity hash code of the object instance on which the intercepted method has been called.
@After(marker = BodyMarker.class)
static void onMethodExit(MethodStaticContext msc, DynamicContext dc) {
  int identityHC = System.identityHashCode(dc.getThis()); // reported with the duration
}
\caption{Accessing dynamic context information in a snippet}
The snippet uses the \code{DynamicContext} \emph{dynamic context interface} to get a reference to the current object instance.
Similar to the static context interfaces, the dynamic context interfaces are also exposed to the snippets as method arguments.
Unlike the static context information, which is resolved at instrumentation time, calls to the dynamic context interface are replaced with code that obtains the required dynamic information at runtime.
Besides the object reference used in the example, DiSL provides access to other dynamic context information including the local variables, the method arguments, and the values on the operand stack.
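For an instance method, the reference obtained through the dynamic context corresponds to the receiver of the call.
The following hand-written sketch approximates the woven code; \code{InstanceProfileExample} and its \code{callsPerInstance} map are hypothetical names, and counting invocations instead of measuring time is a simplification made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-written approximation of a snippet that uses dc.getThis().
class InstanceProfileExample {
    // Number of observed invocations per receiver, keyed by identity hash code.
    static final Map<Integer, Integer> callsPerInstance = new HashMap<>();

    void work() {
        try {
            // Original method body.
        } finally {
            // In the woven code, dc.getThis() corresponds to the receiver "this".
            int identityHC = System.identityHashCode(this);
            callsPerInstance.merge(identityHC, 1, Integer::sum);
        }
    }
}
```

Note that identity hash codes are not guaranteed to be unique, so two distinct objects may occasionally share one profile entry.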
\subsection{Selecting Profiled Methods}
Often, it is useful to restrict the instrumentation to certain methods.
For example, we may want to profile only the execution of methods that contain loops, because such methods are likely to contribute more to the overall execution time.
DiSL allows programmers to restrict the instrumentation scope using the \emph{guard} construct.
A guard is a user-defined class that contains one method carrying the \code{@GuardMethod} annotation.
This method determines whether a snippet matching a particular join~point is inlined.
Figure~\ref{fig:instr-guard} shows the signature of a guard restricting the instrumentation only to methods containing loops.
The body of the \code{methodContainsLoop()} guard method, not shown here, would implement the detection of a loop in a method.
A loop detector based on control flow analysis is included as part of DiSL.
public class MethodsContainingLoop {
  @GuardMethod
  public static boolean methodContainsLoop() {
    ... // Loop detection based on control flow analysis
  }
}
\caption{Skeleton of a guard for selecting only methods containing a loop}
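To give an idea of what such a check might involve, the sketch below implements one simple heuristic on a toy instruction model: a method is assumed to contain a loop if any jump targets its own or an earlier position.
This is not the loop detector shipped with DiSL, which relies on control flow analysis; the class \code{BackwardJumpLoopDetector} and the \code{jumpTargets} encoding are hypothetical.

```java
// Toy loop detector: a method is assumed to contain a loop if some jump
// targets an instruction at the same or an earlier position (a backward jump).
class BackwardJumpLoopDetector {
    // jumpTargets[i] is the index instruction i may jump to, or -1 if it does not jump.
    static boolean containsLoop(int[] jumpTargets) {
        for (int i = 0; i < jumpTargets.length; i++) {
            int target = jumpTargets[i];
            if (target >= 0 && target <= i) {
                return true;
            }
        }
        return false;
    }
}
```

For example, \code{containsLoop(new int[] \{-1, -1, 4, 1, -1\})} reports a loop because the instruction at index 3 jumps back to index 1, whereas a method with only forward jumps is rejected.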
The loop guard is associated with a snippet using the \code{guard} annotation parameter, as illustrated in Figure~\ref{fig:instr-prof-loopguard}.
Note that the loop guard is not used in the shadow call stack snippets.
We want to maintain complete stack trace information without omitting the methods that do not contain loops.
@Before(marker=BodyMarker.class, guard=MethodsContainingLoop.class)
static void onMethodEntry() { ... }
@After(marker=BodyMarker.class, guard=MethodsContainingLoop.class)
static void onMethodExit(...) { ... }
\caption{Applying time measurement snippets only in methods containing a loop}
\section{Advanced DiSL Features}\label{sec:Advanced}
% Subsection names originally reflected what extension to the profiler we implement.
% This was changed to reflect what DiSL feature we use,
% which should be more in line with the section title.
The features presented so far represent basic DiSL usage.
We continue with examples illustrating the more advanced features of DiSL, which allow experienced developers to extend DiSL functionality with the aid of ASM.
Hence, writing a DiSL extension often requires familiarity with the ASM API.
Note that for developing most instrumentation tools these advanced features are not needed.
%Since some of the following features
\subsection{Join~Point Marker Library}
In all the examples presented earlier, profiles were collected with method granularity.
Such profiles may be insufficient when profiling long methods with loops and nested invocations.
In these cases, a more fine-grained measurement can help identify the problematic parts of the long methods.
In the profiler example, a more fine-grained measurement can be achieved using a different marker with the profiling snippets.
DiSL provides a library of markers (e.g., \code{BasicBlockMarker}, \code{BytecodeMarker}) for intercepting many common bytecode patterns; Figure~\ref{fig:instr-prof-basic blocks} illustrates the use of \code{BasicBlockMarker} for basic block profiling.
@Before(marker = BasicBlockMarker.class)
static void onBasicBlockEntry() { ... }
@After(marker = BasicBlockMarker.class)
static void onBasicBlockExit(...) { ... }
\caption{Writing snippets to profile entry and exit from basic blocks}
\label{fig:instr-prof-basic blocks}
As presented, the change only impacts the choice of the marker class.
Although the resulting instrumentation is valid, the resulting profile is of limited use because it lacks the identification of the basic blocks being profiled.
We add this identification next.
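Before doing so, it may help to recall how a method body is split into basic blocks.
The sketch below applies the textbook ``leader'' rule to a toy instruction model; it is not the algorithm used by \code{BasicBlockMarker}, whose exact block boundaries may differ, and the class \code{BasicBlockLeaders} with its \code{jumpTargets} encoding is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Textbook leader rule on a toy instruction model (not the DiSL marker).
class BasicBlockLeaders {
    // jumpTargets[i] is the index instruction i may jump to, or -1 if it does not jump.
    // All targets are assumed to be valid instruction indices.
    static List<Integer> leaders(int[] jumpTargets) {
        TreeSet<Integer> result = new TreeSet<>();
        if (jumpTargets.length > 0) {
            result.add(0);                      // the first instruction starts a block
        }
        for (int i = 0; i < jumpTargets.length; i++) {
            if (jumpTargets[i] >= 0) {
                result.add(jumpTargets[i]);     // a jump target starts a block
                if (i + 1 < jumpTargets.length) {
                    result.add(i + 1);          // so does the instruction after a jump
                }
            }
        }
        return new ArrayList<>(result);         // sorted start indices of the blocks
    }
}
```

Each returned index marks the first instruction of one basic block; a block extends up to the instruction preceding the next leader.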
\subsection{Custom Static Context}
There are multiple options for identifying a basic block in the profiler example.
We can use the ordinal number of the basic block as made available by the \code{BasicBlockStaticContext}; however, such identification is only useful if the information about the correspondence between the basic block numbers and the profiled code is available when interpreting the results.
The source code line number is a valuable alternative when working at the source code level; however, the identification is not necessarily unique, and the need for additional information when interpreting the results persists.
To provide an example of custom static context, we illustrate a third option, namely identifying the basic block by the ordinal number of its first instruction and its length counted in the number of instructions (numbers are valid for uninstrumented code).
Implementing the other two approaches in DiSL is of similar complexity.
Conceptually, the identification of the basic block is part of the static context of each snippet.
Thus, it would ideally be available through one of the existing static context interfaces.
Although it is our goal to equip DiSL with a rich library of static context interfaces offering all the information that may be required by an analysis tool, chances are some tools will require static context information which is not provided by DiSL.
We therefore allow defining custom static contexts, which can precompute static values at weave time.
As with other static context information, the weaver embeds these values in the snippet code as constants.
Figure~\ref{fig:instr-prof-csc} illustrates a custom static context that serves as the basic block ID calculator.
public class BasicBlockID extends AbstractStaticContext {
  public String getID() {
    // validate that the basic block has only one end
    // get starting and ending instruction from marker
    AbstractInsnNode startInsn = staticContextData.getRegionStart();
    AbstractInsnNode endInsn = staticContextData.getRegionEnds().get(0);
    // traverse entire method code and calculate instruction index
    int bbStart = -1;
    int bbLength = 0;
    boolean startFound = false;
    boolean endFound = false;
    InsnList code = staticContextData.getMethodNode().instructions;
    for (AbstractInsnNode insn = code.getFirst();
        insn != null; insn = insn.getNext()) {
      // increase block start index until start instruction found
      if (!startFound) {
        if (insn.getOpcode() != -1) ++bbStart;
        startFound = (insn == startInsn);
      }
      if (startFound) {
        // count instructions and exit when end instruction found
        if (insn.getOpcode() != -1) ++bbLength;
        if (insn == endInsn) {
          endFound = true;
          break;
        }
      }
    }
    // validate that both start and end were found
    // construct and return the basic block ID
    return bbStart + "(" + bbLength + ")";
  }
}
\caption{Custom static context computing a basic block ID}
A custom static context is a standard Java class that extends the \code{AbstractStaticContext} class or implements the \code{StaticContext} interface directly.
The methods of the custom static context class have no arguments and return a basic type or \code{String}.
The \code{BasicBlockID} class from Figure~\ref{fig:instr-prof-csc} contains one such method, \code{getID()}, which computes the ID of a basic block.
The computation queries the first and the last instruction of the region identified by the basic block marker.
After that, it iterates over the code of the entire method, first incrementing the block index until the basic block start is reached, then incrementing the block length until the basic block end is found.
The method returns the ID as \code{String} whose first part is the index and second part the length.
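The counting logic can be tried out without DiSL or ASM on a plain array of opcodes.
The sketch below mirrors the loop of \code{getID()} under simplifying assumptions: instructions are identified by their array index instead of by node identity, and the class \code{BasicBlockIdExample} is hypothetical.
As in ASM, an opcode of \code{-1} denotes a pseudo-instruction such as a label or a line number.

```java
// Toy re-implementation of the ID computation over an array of opcodes.
// As in ASM, an opcode of -1 denotes a pseudo-instruction (label, line number, frame).
class BasicBlockIdExample {
    static String getID(int[] opcodes, int startIndex, int endIndex) {
        int bbStart = -1;
        int bbLength = 0;
        boolean startFound = false;
        for (int i = 0; i < opcodes.length; i++) {
            // Increase the block start index until the start instruction is found.
            if (!startFound) {
                if (opcodes[i] != -1) ++bbStart;
                startFound = (i == startIndex);
            }
            // Count real instructions and stop at the end instruction.
            if (startFound) {
                if (opcodes[i] != -1) ++bbLength;
                if (i == endIndex) break;
            }
        }
        return bbStart + "(" + bbLength + ")";
    }
}
```

For the opcode sequence \code{\{-1, 21, 54, -1, 21, 4, 96, 172\}} (label, \code{iload}, \code{istore}, label, \code{iload}, \code{iconst\_1}, \code{iadd}, \code{ireturn}), the region from index 4 to index 7 starts at the third real instruction and spans four instructions, so its ID is \code{2(4)}.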
Custom static context methods can access the current static context information through a protected field called \code{staticContextData}.
The available information describes the marked region, snippet, method, and class where the custom static context is used.
The region description includes one starting instruction and one or more ending instructions depending on the marker.
The snippet structure holds all the information connected to the snippet where the static context is used.
The method and class data are represented by ASM objects \code{MethodNode} and \code{ClassNode}.
\subsection{Custom Bytecode Marker}
It is not always possible to profile a method by instrumenting its body.
For example, the method can be implemented in native code or can execute remotely.
To profile such methods, the instrumentation has to be placed around the method invocation.
In DiSL, method invocation can be easily captured by the \code{BytecodeMarker} with adequate parameters.
To illustrate the extensibility of DiSL, we instead implement a new custom marker that captures method invocations, displayed in Figure~\ref{fig:instr-prof-cm}.
public class MethodInvocationMarker extends AbstractDWRMarker {
  public List<MarkedRegion> markWithDefaultWeavingReg(MethodNode method) {
    List<MarkedRegion> regions = new LinkedList<MarkedRegion>();
    // traverse all instructions
    InsnList instructions = method.instructions;
    for (AbstractInsnNode instruction : instructions.toArray()) {
      // check for method invocation instructions
      if (instruction instanceof MethodInsnNode) {
        // add region containing one instruction (method invocation)
        regions.add(new MarkedRegion(instruction, instruction));
      }
    }
    return regions;
  }
}
\caption{Custom marker implementing a method invocation join~point}
The role of a marker is to select the bytecode regions for instrumentation.
A custom bytecode marker in DiSL must implement the \code{Marker} interface.
Typically, the marker would not implement this interface directly, but instead inherit from the \code{AbstractDWRMarker} abstract class, which also takes care of correctly placing the weaving points.
In our example, the \code{MethodInvocationMarker} class traverses all instructions using ASM and creates a single-instruction region for each method invocation encountered; the abstract marker class is used to compute all the weaving information automatically.
Note that the example marker captures all method invocations.
To reduce the instrumentation scope, the developer should use either a guard or a runtime check.
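The selection performed by the marker can also be illustrated on raw opcodes, without ASM.
The following sketch is a toy counterpart of the marker, assuming that instructions are given as an array of JVM opcodes; the class \code{InvocationRegionExample} is hypothetical, while the opcode values are those defined by the JVM specification for the four invocation instructions that ASM represents as \code{MethodInsnNode}.

```java
import java.util.ArrayList;
import java.util.List;

// Toy counterpart of MethodInvocationMarker working on raw opcodes.
class InvocationRegionExample {
    // JVM opcodes of the invocation instructions that ASM represents as MethodInsnNode.
    static final int INVOKEVIRTUAL = 182;
    static final int INVOKESPECIAL = 183;
    static final int INVOKESTATIC = 184;
    static final int INVOKEINTERFACE = 185;

    // Returns {start, end} index pairs; each region covers exactly one invocation.
    static List<int[]> mark(int[] opcodes) {
        List<int[]> regions = new ArrayList<>();
        for (int i = 0; i < opcodes.length; i++) {
            int op = opcodes[i];
            if (op >= INVOKEVIRTUAL && op <= INVOKEINTERFACE) {
                regions.add(new int[] {i, i});
            }
        }
        return regions;
    }
}
```

As in the DiSL marker, the start and the end of each region coincide, so a before snippet lands directly in front of the invocation and an after snippet directly behind it.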
\subsection{Analyzing Method Arguments}
DiSL provides two different mechanisms for analyzing method arguments.
The first approach provides the method arguments to the snippet in an object array.
The entire array is constructed dynamically at runtime, with arguments of primitive types boxed.
Conceptually simple, the approach requires object allocation and always processes all arguments.
The second approach aims at situations where the overhead of using object arrays is not acceptable.
The approach uses code fragments called \emph{argument processors}.
Each argument processor analyzes only one type of method arguments.
The code of the argument processor is inlined into the snippet where it is applied.
With argument processors, it is possible to access method arguments without object allocation.
Technically, the argument processor is an annotated Java class containing argument processing methods.
The first argument of each argument processor method is of the type being processed, that is, any basic Java type (\code{int}, \code{byte}, \code{double}, \ldots), \code{String}, or an object reference.
As additional arguments, the methods can receive dynamic or static contexts, including \emph{argument context}, which is a special kind of static context available only within the argument processor.
The \code{ArgumentContext} interface exposes information about the currently processed argument and can be used to limit argument processing only to arguments at a particular position or with a particular type.
The argument processor methods can also use thread-local or synthetic local variables.
An example of an argument processor that processes \code{int} arguments is given in Figure~\ref{fig:instr-argproc}.
@ArgumentProcessor
public class IntArgumentPrinter {
  public static void printIntegerArgument (
      int val, ArgumentContext ac, MethodStaticContext msc) {
    System.out.printf(
      "Int argument value in method %s at position %d of %d is %d\n",
      msc.thisMethodFullName(), ac.getPosition(), ac.getTotalCount(), val
    );
  }
}
\caption{A simple argument processor for printing the values of integer arguments}
The argument processor is used by applying it in an argument processor context within a snippet.
The argument processor context can apply an argument processor in two modes.
All snippets can apply the processor on the arguments of the current method.
Snippets inserted just before a method invocation can also apply the processor on the invocation arguments.
Figure~\ref{fig:instr-argprocapply} illustrates a snippet that uses the \code{IntArgumentPrinter} argument processor from Figure~\ref{fig:instr-argproc} to print out the values of the integer arguments of the currently executed method.
@Before(marker = BodyMarker.class)
public static void onMethodEntry(ArgumentProcessorContext apc) {
  apc.apply(IntArgumentPrinter.class, ArgumentProcessorMode.METHOD_ARGS);
}
\caption{Using an argument processor within a snippet}
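Which arguments an \code{int} processor method applies to can be derived statically from the method descriptor.
The sketch below illustrates this idea in plain Java; it is not how DiSL is implemented, the class \code{DescriptorArgsExample} is hypothetical, and the use of zero-based positions is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy analysis of a JVM method descriptor, e.g. "(ILjava/lang/String;)V".
class DescriptorArgsExample {
    // Splits the parameter part of the descriptor into one entry per parameter.
    static List<String> parameterTypes(String descriptor) {
        List<String> types = new ArrayList<>();
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        while (i < end) {
            int start = i;
            while (descriptor.charAt(i) == '[') {
                i++;                                 // skip array dimensions
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i);      // object type ends at ';'
            }
            i++;
            types.add(descriptor.substring(start, i));
        }
        return types;
    }

    // Zero-based positions at which an int processor method would be applied.
    static List<Integer> intPositions(String descriptor) {
        List<String> types = parameterTypes(descriptor);
        List<Integer> positions = new ArrayList<>();
        for (int p = 0; p < types.size(); p++) {
            if (types.get(p).equals("I")) {
                positions.add(p);
            }
        }
        return positions;
    }
}
```

For the descriptor \code{(ILjava/lang/String;[IJI)V}, the method has five parameters, of which only the first and the last are of type \code{int}; the \code{int} array in the middle is an object reference and would be handled by a different processor method.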