|
|
SECTION: Register allocation |
|
|
|
|
Combined code motion and register allocation using the value state dependence graph |
| |
Neil Johnson,
Alan Mycroft
|
|
Pages: 1-16 |
|
We define the Value State Dependence Graph (VSDG). The VSDG is a form of the Value Dependence Graph (VDG) extended by the addition of state dependence edges to model sequentialised computation. These express store dependencies and loop termination dependencies ...
We define the Value State Dependence Graph (VSDG). The VSDG is a form of the Value Dependence Graph (VDG) extended by the addition of state dependence edges to model sequentialised computation. These express store dependencies and loop termination dependencies of the original program. We also exploit them to express the additional serialization inherent in producing final object code. The central idea is that this latter serialization can be done incrementally so that we have a class of algorithms which effectively interleave register allocation and code motion, thereby avoiding a well-known phase-order problem in compilers. This class operates by first normalizing the VSDG during construction, to remove all duplicated computation, and then repeatedly choosing between: (i) allocating a value to a register, (ii) spilling a value to memory, (iii) moving a loop-invariant computation within a loop to avoid register spillage, and (iv) statically duplicating a computation to avoid register spillage. We show that the classical two-phase approach (code motion then register allocation in both Chow and Chaitin forms) are examples of this class, and propose a new algorithm based on depth-first cuts of the VSDG. expand
|
|
|
Early control of register pressure for software pipelined loops |
| |
Sid-Ahmed-Ali Touati,
Christine Eisenbeis
|
|
Pages: 17-32 |
|
The register allocation in loops is generally performed after or during the software pipelining process. This is because doing a conventional register allocation at first step without assuming a schedule lacks the information of interferences between ...
The register allocation in loops is generally performed after or during the software pipelining process. This is because doing a conventional register allocation at first step without assuming a schedule lacks the information of interferences between variable lifetime intervals. Thus, the register allocator may introduce an excessive amount of false dependences that reduce dramatically the ILP (Instruction Level Parallelism). We present a new framework for controlling the register pressure before software pipelining. This is based on inserting some anti-dependences edges (register reuse edges) labeled with reuse distances, directly on the data dependence graph. In this new graph, we are able to guarantee that the number of simultaneously alive variables in any schedule does not exceed a limit. The determination of register and distance reuse is parameterized by the desired critical circuit ratio (MII) as well as by the register pressure constraints - either can be minimized while the other one is fixed. After scheduling, register allocation is done cyclically on conventional register sets or on rotating register files. We give an optimal exact model, and another approximative one that generalizes the Ning-Gao [13] buffer optimization heuristics. expand
|
|
|
Register allocation by optimal graph coloring |
| |
Christian Andersson
|
|
Pages: 33-45 |
|
We here present new insights of properties of real-life interference graphs emerging in register allocation. These new insights imply good hopes for a possibility of improving the coloring approach towards optimal solutions. The conclusions are based ...
We here present new insights of properties of real-life interference graphs emerging in register allocation. These new insights imply good hopes for a possibility of improving the coloring approach towards optimal solutions. The conclusions are based on measurements of nearly 28, 000 real instances of such interference graphs. All the instances explored are determined to possess the so-called 1-perfectness property, a fact that seems to make them easy to color optimally. The exact algorithms presented not only produce better solutions than the traditional heuristic methods, but also, indeed, seem to perform surprisingly fast, according to the measurements on our implementations. expand
|
|
|
SECTION: Language constructs and their implementation |
|
|
|
|
A compilation and optimization model for aspect-oriented programs |
| |
H. Masuhara,
G. Kiczales,
C. Dutchyn
|
|
Pages: 46-60 |
|
This paper presents a semantics-based compilation model for an aspect-oriented programming language based on its operational semantics. Using partial evaluation, the model can explain several issues in compilation processes, including how to find places ...
This paper presents a semantics-based compilation model for an aspect-oriented programming language based on its operational semantics. Using partial evaluation, the model can explain several issues in compilation processes, including how to find places in program text to insert aspect code and how to remove unnecessary run-time checks. It also illustrates optimization of calling-context sensitive pointcuts (cflow), implemented in real compilers. expand
|
|
|
A pattern matching compiler for multiple target languages |
| |
Pierre-Etienne Moreau,
Christophe Ringeissen,
Marian Vittek
|
|
Pages: 61-76 |
|
Many processes can be seen as transformations of treelike data structures. In compiler construction, for example, we continuously manipulate trees and perform tree transformations. This paper introduces a pattern matching compiler (TOM): a set of primitives ...
Many processes can be seen as transformations of treelike data structures. In compiler construction, for example, we continuously manipulate trees and perform tree transformations. This paper introduces a pattern matching compiler (TOM): a set of primitives which add pattern matching facilities to imperative languages such as C, Java, or Eiffel. We show that this tool is extremely non-intrusive, lightweight and useful to implement tree transformations. It is also flexible enough to allow the reuse of existing data structures. expand
|
|
|
A new one-pass transformation into monadic normal form |
| |
Olivier Danvy
|
|
Pages: 77-89 |
|
We present a translation from the call-by-value λ-calculus to monadic normal forms that includes short-cut boolean evaluation. The translation is higher-order, operates in one pass, duplicates no code, generates no chains of thunks, and is properly ...
We present a translation from the call-by-value λ-calculus to monadic normal forms that includes short-cut boolean evaluation. The translation is higher-order, operates in one pass, duplicates no code, generates no chains of thunks, and is properly tail recursive. It makes a crucial use of symbolic computation at translation time. expand
|
|
|
SECTION: Type analysis |
|
|
|
|
Run-time type checking for binary programs |
| |
Michael Burrows,
Stephen N. Freund,
Janet L. Wiener
|
|
Pages: 90-105 |
|
Many important software systems are written in the C programming language. Unfortunately, the C language does not provide strong safety guarantees, and many common programming mistakes introduce type errors that are not caught by the compiler. These ...
Many important software systems are written in the C programming language. Unfortunately, the C language does not provide strong safety guarantees, and many common programming mistakes introduce type errors that are not caught by the compiler. These errors only manifest themselves at run time through unexpected program behavior, and it is often hard to isolate and identify their causes. This paper presents the Hobbes run-time type checker for compiled C programs. Our tool interprets compiled binaries, tracks type information for all memory and register locations, and reports warnings when a variety of type errors occur. Because the Hobbes type checker does not rely on source code, it is effective in many situations where similar tools are not, such as when full source code is not available or when C source is linked with program fragments written in assembly or other languages. expand
|
|
|
Precision in practice: a type-preserving java compiler |
| |
Christopher League,
Zhong Shao,
Valery Trifonov
|
|
Pages: 106-120 |
|
Popular mobile code architectures (Java and .NET) include verifiers to check for memory safety and other security properties. Since their formats are relatively high level, supporting a wide range of source language features is awkward. Further compilation ...
Popular mobile code architectures (Java and .NET) include verifiers to check for memory safety and other security properties. Since their formats are relatively high level, supporting a wide range of source language features is awkward. Further compilation and optimization, necessary for efficiency, must be trusted. We describe the design and implementation of a fully type-preserving compiler for Java and ML. Its strongly-typed intermediate language provides a low-level abstract machine model and a type system general enough to prove the safety of a variety of implementation techniques. We show that precise type preservation is within reach for real-world Java systems. expand
|
|
|
The MAGICA type inference engine for MATLAB ® |
| |
Pramod G. Joisha,
Prithviraj Banerjee
|
|
Pages: 121-125 |
|
|
|
|
SECTION: CC invited talk |
|
|
|
|
Dimensions of precision in reference analysis of object-oriented programming languages |
| |
Barbara G. Ryder
|
|
Pages: 126-137 |
|
There has been approximately a ten year history of reference analyses for object-oriented programming languages. Approaches vary as to how different analyses account for program execution flow, how they capture calling context, and how they model objects, ...
There has been approximately a ten year history of reference analyses for object-oriented programming languages. Approaches vary as to how different analyses account for program execution flow, how they capture calling context, and how they model objects, reference variables and the possible calling structure of the program. A taxonomy of analysis dimensions that affect precision (and cost) will be presented and illustrated by examples of existing reference analysis techniques. expand
|
|
|
SECTION: Java |
|
|
|
|
Polyglot: an extensible compiler framework for Java |
| |
Nathaniel Nystrom,
Michael R. Clarkson,
Andrew C. Myers
|
|
Pages: 138-152 |
|
Polyglot is an extensible compiler framework that supports the easy creation of compilers for languages similar to Java, while avoiding code duplication. The Polyglot framework is useful for domain-specific languages, exploration of language design, ...
Polyglot is an extensible compiler framework that supports the easy creation of compilers for languages similar to Java, while avoiding code duplication. The Polyglot framework is useful for domain-specific languages, exploration of language design, and for simplified versions of Java for pedagogical use. We have used Polyglot to implement several major and minor modifications to Java; the cost of implementing language extensions scales well with the degree to which the language differs from Java. This paper focuses on the design choices in Polyglot that are important for making the framework usable and highly extensible. Polyglot source code is available. expand
|
|
|
Scaling Java points-to analysis using SPARK |
| |
Ondřej Lhoták,
Laurie Hendren
|
|
Pages: 153-169 |
|
Most points-to analysis research has been done on different systems by different groups, making it difficult to compare results, and to understand interactions between individual factors each group studied. Furthermore, points-to analysis for Java has ...
Most points-to analysis research has been done on different systems by different groups, making it difficult to compare results, and to understand interactions between individual factors each group studied. Furthermore, points-to analysis for Java has been studied much less thoroughly than for C, and the tradeoffs appear very different. We introduce SPARK, a flexible framework for experimenting with points-to analyses for Java. SPARK supports equality- and subset-based analyses, variations in field sensitivity, respect for declared types, variations in call graph construction, off-line simplification, and several solving algorithms. SPARK is composed of building blocks on which new analyses can be based. We demonstrate SPARK in a substantial study of factors affecting precision and efficiency of subset-based points-to analyses, including interactions between these factors. Our results show that SPARK is not only flexible and modular, but also offers superior time/space performance when compared to other points-to analysis implementations. expand
|
|
|
Effective inline-threaded interpretation of Java bytecode using preparation sequences |
| |
Etienne Gagnon,
Laurie Hendren
|
|
Pages: 170-184 |
|
Inline-threaded interpretation is a recent technique that improves performance by eliminating dispatch overhead within basic blocks for interpreters written in C [11]. The dynamic class loading, lazy class initialization, and multi-threading features ...
Inline-threaded interpretation is a recent technique that improves performance by eliminating dispatch overhead within basic blocks for interpreters written in C [11]. The dynamic class loading, lazy class initialization, and multi-threading features of Java reduce the effectiveness of a straight-forward implementation of this technique within Java interpreters. In this paper, we introduce preparation sequences, a new technique that solves the particular challenge of effectively inline-threading Java. We have implemented our technique in the SableVM Java virtual machine, and our experimental results show that using our technique, inline-threaded interpretation of Java, on a set of benchmarks, achieves a speedup ranging from 1.20 to 2.41 over switch-based interpretation, and a speedup ranging from 1.15 to 2.14 over direct-threaded interpretation. expand
|
|
|
Integrating generations with advanced reference counting garbage collectors |
| |
Hezi Azatchi,
Erez Petrank
|
|
Pages: 185-199 |
|
We study an incorporation of generations into a modern reference counting collector. We start with the two on-the-fly collectors suggested by Levanoni and Petrank: a reference counting collector and a tracing (mark and sweep) collector. We then propose ...
We study an incorporation of generations into a modern reference counting collector. We start with the two on-the-fly collectors suggested by Levanoni and Petrank: a reference counting collector and a tracing (mark and sweep) collector. We then propose three designs for combining them so that the reference counting collector collects the young generation or the old generation or both. Our designs maintain the good properties of the Levanoni-Petrank collector. In particular, it is adequate for multithreaded environment and a multiprocessor platform, and it has an efficient write barrier with no synchronization operations. To the best of our knowledge, the use of generations with reference counting has not been tried before. We have implemented these algorithms with the Jikes JVM and compared them against the concurrent reference counting collector supplied with the Jikes package. As expected, the best combination is the one that lets the tracing collector work on the young generation (where most objects die) and the reference counting work on the old generation (where many objects survive). Matching the expected survival rate with the nature of the collector yields a large improvement in throughput while maintaining the pause times around a couple of milliseconds. expand
|
|
|
SECTION: Pot pourri |
|
|
|
|
The interprocedural express-lane transformation |
| |
David Melski,
Thomas Reps
|
|
Pages: 200-216 |
|
The express-lane transformation isolates and duplicates frequently executed program paths, aiming for better data-flow facts along the duplicated paths. An express-lane p is a copy of a frequently executed program path such that p has only ...
The express-lane transformation isolates and duplicates frequently executed program paths, aiming for better data-flow facts along the duplicated paths. An express-lane p is a copy of a frequently executed program path such that p has only one entry point at its beginning; p may have branches back to the original code, but the original code never branches into p. Classical data-flow analysis is likely to find sharper data-flow facts along an express-lane, because there are no join points. This paper describes several variants of interprocedural express-lane transformations; these duplicate hot interprocedural paths, i.e., paths that may cross procedure boundaries. The paper also reports results from an experimental study of the effects of the express-lane transformation on interprocedural range analysis. expand
|
|
|
Automatic detection of uninitialized variables |
| |
Thi Viet Nga Nguyen,
François Irigoin,
Corinne Ancourt,
Fabien Coelho
|
|
Pages: 217-231 |
|
One of the most common programming errors is the use of a variable before its definition. This undefined value may produce incorrect results, memory violations, unpredictable behaviors and program failure. To detect this kind of error, two approaches ...
One of the most common programming errors is the use of a variable before its definition. This undefined value may produce incorrect results, memory violations, unpredictable behaviors and program failure. To detect this kind of error, two approaches can be used: compile-time analysis and run-time checking. However, compile-time analysis is far from perfect because of complicated data and control flows as well as arrays with non-linear, indirection subscripts, etc. On the other hand, dynamic checking, although supported by hardware and compiler techniques, is costly due to heavy code instrumentation while information available at compile-time is not taken into account. This paper presents a combination of an efficient compile-time analysis and a source code instrumentation for run-time checking. All kinds of variables are checked by PIPS, a Fortran research compiler for program analyses, transformation, parallelization and verification. Uninitialized array elements are detected by using imported array region, an efficient inter-procedural array data flow analysis. If exact array regions cannot be computed and compile-time information is not sufficient, array elements are initialized to a special value and their utilization is accompanied by a value test to assert the legality of the access. In comparison to the dynamic instrumentation, our method greatly reduces the number of variables to be initialized and to be checked. Code instrumentation is only needed for some array sections, not for the whole array. Tests are generated as early as possible. In addition, programs can be proved to be free from used-before-set errors statically at compile-time or, on the contrary, have real undefined errors. Experiments on SPEC95 CFP show encouraging results on analysis cost and run-time overheads. expand
|
|
|
Generalised regular parsers |
| |
Adrian Johnstone,
Elizabeth Scott
|
|
Pages: 232-246 |
|
Aycock and Horspool have given an algorithm which improves the efficiency of GLR parsers. A grammar is 'reduced' so that there is no recursion apart from non-hidden left recursion, an FA recogniser is then constructed and a stack is used when the recursive ...
Aycock and Horspool have given an algorithm which improves the efficiency of GLR parsers. A grammar is 'reduced' so that there is no recursion apart from non-hidden left recursion, an FA recogniser is then constructed and a stack is used when the recursive parts of the original grammar are required. Aycock and Horspool then give an algorithm which performs all possible traversals of the resulting PDA on a given input string. This mirrors the approach taken by Tomita to perform all traversals of an LR(0) FA. However, Aycock and Horspool's algorithm does not terminate in the case where the grammar contains hidden left recursion. In this paper we give a different method for constructing an FA which recognises the language generated by the grammar provided that the only recursion in the grammar arises from left or right recursion. Using this FA allows us to reduce the number of places that the stack is required. We also give a different algorithm for constructing all traversals of the final PDA which is correct in all cases, including grammars with hidden left recursion. Thus we can apply our algorithm to all context free grammars. expand
|
|
|
Rapid and robust compiler construction using template-based metacompilation |
| |
C. van Reeuwijk
|
|
Pages: 247-261 |
|
We have developed Tm, a template-based metacompiler. Given a set of data-structure definitions and a template, Tm generates files that instantiate the template for the given data structures. With this process, Tm is able to generate program code to manipulate ...
We have developed Tm, a template-based metacompiler. Given a set of data-structure definitions and a template, Tm generates files that instantiate the template for the given data structures. With this process, Tm is able to generate program code to manipulate these data structures. Since it uses templates, the generated code is not restricted to a specific programming language: any sufficiently powerful programming language can be targeted. Tm has been used for a wide variety of tasks and languages. However, it was designed to support compiler construction, and most applications have been in that area. In this paper we outline Tm, and describe our experiences with using it to construct a static compiler for Java. As we will show, it has significantly accelerated implementation of the compiler. Almost 75% of its source code is generated by Tm, allowing us to rapidly implement a much more robust and sophisticated compiler than would have been possible otherwise. expand
|
|
|
SECTION: ETAPS invited talk |
|
|
|
|
The verifying compiler: a grand challenge for computing research |
| |
Tony Hoare
|
|
Pages: 262-272 |
|
I propose a set of criteria which distinguish a grand challenge in science or engineering from the many other kinds of short-term or long-term research problems that engage the interest of scientists and engineers. As an example drawn from Computer Science, ...
I propose a set of criteria which distinguish a grand challenge in science or engineering from the many other kinds of short-term or long-term research problems that engage the interest of scientists and engineers. As an example drawn from Computer Science, I revive an old challenge: the construction and application of a verifying compiler that guarantees correctness of a program before running it. expand
|
|
|
SECTION: Optimization |
|
|
|
|
Address register assignment for reducing code size |
| |
M. Kandemir,
M. J. Irwin,
G. Chen,
J. Ramanujam
|
|
Pages: 273-289 |
|
In DSP processors, minimizing the amount of address calculations is critical for reducing code size and improving performance since studies of programs have shown that instructions that manipulate address registers constitute a significant portion of ...
In DSP processors, minimizing the amount of address calculations is critical for reducing code size and improving performance since studies of programs have shown that instructions that manipulate address registers constitute a significant portion of the overall instruction count (up to 55&percnt). This work presents a compiler-based optimization strategy to reduce the code size in embedded systems. Our strategy maximizes the use of indirect addressing modes with post-increment and post-decrement capabilities available in DSP processors. These modes can be exploited by ensuring that successive references to variables access consecutive memory locations. To achieve this spatial locality, our approach uses both access pattern modification (program code restructuring) and memory storage reordering (data layout restructuring). expand
|
|
|
Offset assignment showdown: evaluation of DSP address code optimization algorithms |
| |
Rainer Leupers
|
|
Pages: 290-302 |
|
Offset assignment is a highly effective DSP address code optimization technique that has been implemented in a number of ANSI C compilers. In this paper we concentrate on a special class of offset assignment problems called "simple offset assignment" ...
Offset assignment is a highly effective DSP address code optimization technique that has been implemented in a number of ANSI C compilers. In this paper we concentrate on a special class of offset assignment problems called "simple offset assignment" (SOA). A number of SOA algorithms have been proposed recently, but experimental results and direct comparisons are still sparse. This makes the practical selection of a suitable SOA algorithm for implementation in a compiler very difficult. This paper aims at closing this gap by providing a comprehensive benchmark suite and empirical evaluation based on real-life application programs. Our results for the first time permit a detailed assessment of all major SOA algorithms. In addition, we propose a new and superior combination of SOA heuristics. expand
|
|
|
Integrating high-level optimizations in a production compiler: design and implementation experience |
| |
Somnath Ghosh,
Abhay Kanhere,
Rakesh Krishnaiyer,
Dattatraya Kulkarni,
Wei Li,
Chu-Cheow Lim,
John Ng
|
|
Pages: 303-319 |
|
The High-Level Optimizer (HLO) is a key part of the compiler technology that enabled Itanium™ and Itanium™2 processors deliver leading floating-point performance at their introduction. In this paper, we discuss the design and implementation ...
The High-Level Optimizer (HLO) is a key part of the compiler technology that enabled Itanium™ and Itanium™2 processors deliver leading floating-point performance at their introduction. In this paper, we discuss the design and implementation experience in integrating diverse optimizations in the HLO module. In particular, we describe decisions made in the design of HLO targeting Itanium processor family. We provide empirical data to validate the design decisions. Since HLO was implemented in a production compiler, we made certain engineering trade-offs. We discuss these trade-offs and outline key learning derived from our experience. expand
|
|
|
Improving data locality by chunking |
| |
Cédric Bastoul,
Paul Feautrier
|
|
Pages: 320-334 |
|
Cache memories were invented to decouple fast processors from slow memories. However, this decoupling is only partial, and many researchers have attempted to improve cache use by program optimization. Potential benefits are significant since both energy ...
Cache memories were invented to decouple fast processors from slow memories. However, this decoupling is only partial, and many researchers have attempted to improve cache use by program optimization. Potential benefits are significant since both energy dissipation and performance highly depend on the traffic between memory levels. But modeling the traffic is difficult; this observation has led to the use of heuristic methods for steering program transformations. In this paper, we propose another approach: we simplify the cache model and we organize the target program in such a way that an asymptotic evaluation of the memory traffic is possible. This information is used by our optimization algorithm in order to find the best reordering of the program operations, at least in an asymptotic sense. Our method optimizes both temporal and spatial locality. It can be applied to any static control program with arbitrary dependences. The optimizer has been partially implemented and applied to non-trivial programs. We present experimental evidence that the amount of cache misses is drastically reduced with corresponding performance improvements. expand
|