
About this Book

The implementation of object-oriented languages has been an active topic of research since the 1960s, when the first Simula compiler was written. The topic received renewed interest in the early 1980s with the growing popularity of object-oriented programming languages such as C++ and Smalltalk, and got another boost with the advent of Java. Polymorphic calls are at the heart of object-oriented languages, and even the first implementation of Simula-67 contained their classic implementation via virtual function tables. In fact, virtual function tables predate even Simula; for example, Ivan Sutherland's Sketchpad drawing editor employed very similar structures in 1960. Similarly, during the 1970s and 1980s the implementers of Smalltalk systems spent considerable effort on implementing polymorphic calls for this dynamically typed language, where virtual function tables could not be used.

Given this long history of research into the implementation of polymorphic calls, and the relatively mature standing the field achieved over time, why, one might ask, should there be a new book on the subject? The answer is simple. Both software and hardware have changed considerably in recent years, to the point where many assumptions underlying the original work in this field are no longer true. In particular, virtual function tables are no longer sufficient to implement polymorphic calls even for statically typed languages; for example, Java's interface calls cannot be implemented this way. Furthermore, today's processors are deeply pipelined and can execute instructions out of order, making it difficult to predict the execution time of even simple code sequences.

Table of Contents

Frontmatter

1. Introduction

Abstract
Object-oriented code looks different from procedural code. The main difference is the increased frequency of polymorphic calls. A polymorphic call looks like a procedural call, but where a procedural call has only one possible target subroutine, a polymorphic call can result in the execution of one of several different subroutines. The choice depends on the run-time type of the first argument (the receiver). Polymorphic calls enable clean, modular code design: they allow the programmer to invoke operations on an object without knowing its exact type in advance.
Karel Driesen
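
To make the distinction concrete, here is a minimal C++ sketch (class and function names are illustrative, not taken from the book) contrasting a procedural call, whose target is fixed at compile time, with a polymorphic call, whose target depends on the receiver's run-time type:

    #include <cstdio>

    // Procedural call: exactly one possible target, known at compile time.
    void draw_circle() { std::puts("circle"); }

    // Polymorphic call: the target depends on the receiver's run-time type.
    struct Shape {
        virtual void draw() const = 0;   // 'virtual' makes the call polymorphic
        virtual ~Shape() = default;
    };
    struct Circle : Shape { void draw() const override { std::puts("circle"); } };
    struct Square : Shape { void draw() const override { std::puts("square"); } };

    void render(const Shape& s) {
        s.draw();   // one call site, several possible target subroutines
    }

    int main() {
        draw_circle();       // procedural: target fixed
        Circle c; Square q;
        render(c);           // dispatches to Circle::draw
        render(q);           // dispatches to Square::draw
    }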

2. Polymorphic Calls

Abstract
In this section we discuss the polymorphic call construct as it appears in different programming languages.
Karel Driesen

3. Software Techniques for Efficient Polymorphic Calls

Abstract
In this section, we describe the major techniques in current use for the optimization of polymorphic calls. We present the techniques and discuss run-time aspects such as estimated memory cost and responsiveness. Detailed estimates of run-time call cost are delayed until Section 5, where we bring hardware into the picture.
Karel Driesen
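
The abstract names no specific technique, but as a flavor of what such software support looks like, the C++ sketch below illustrates one classic technique from this literature, the inline cache: a call site remembers the last receiver class and its lookup result, skipping the full lookup on a hit (all names here are ours and hypothetical):

    #include <cstdio>

    struct Object { int class_id; };        // receiver with a run-time type tag
    using Method = void (*)(Object*);       // dispatched function

    void print_a(Object*) { std::puts("A::print"); }
    void print_b(Object*) { std::puts("B::print"); }

    // Full lookup: the slow path a dynamically typed system would run
    // (reduced to a trivial switch for illustration).
    Method lookup_print(int class_id) {
        return class_id == 0 ? print_a : print_b;
    }

    // Inline cache: the call site remembers the last receiver class and
    // target; a hit skips the full lookup, a miss falls back and re-caches.
    void call_print(Object* recv) {
        static int    cached_class  = -1;       // per-call-site cache
        static Method cached_target = nullptr;
        if (recv->class_id != cached_class) {   // cache miss: full lookup
            cached_class  = recv->class_id;
            cached_target = lookup_print(recv->class_id);
        }
        cached_target(recv);                    // cache hit: direct call
    }

    int main() {
        Object a{0}, b{1};
        call_print(&a);   // miss, caches A::print
        call_print(&a);   // hit
        call_print(&b);   // miss, re-caches B::print
    }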

4. Row Displacement Compression of Message Dispatch Tables

Abstract
Table-based dispatch is attractive because its cost is bounded by a constant. The critical path of selector table indexing (STI), the simple, fast, but memory-greedy static technique, consists of two dependent loads and an indirect branch instruction. On older processors, this sequence takes about 5 cycles, or 4 if the branch delay slot can be filled. On newer processors, the sequence takes, in the worst case (a branch misprediction), two load latencies plus a branch misprediction penalty (about 8 cycles on the P96 processor used as the state-of-the-art representative in Section 6).
Karel Driesen
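
The critical path described above can be written out by hand. The C++ sketch below (field and table names are illustrative) shows the two dependent loads and the indirect branch of selector table indexing; row displacement, this chapter's topic, compresses the sparse two-dimensional table into a one-dimensional array without changing this critical path:

    #include <cstdio>

    struct Object;
    using Method = void (*)(Object* self);

    void point_print(Object*) { std::puts("Point::print"); }

    // One global 2-D table: rows indexed by class, columns by selector.
    // Most entries are empty; row displacement overlaps the sparse rows
    // into a single 1-D array to reclaim that space.
    enum { NUM_CLASSES = 2, NUM_SELECTORS = 3, SEL_PRINT = 1 };
    Method dispatch_table[NUM_CLASSES][NUM_SELECTORS] = {
        { nullptr, point_print, nullptr },   // class 0 (e.g. Point)
        { nullptr, point_print, nullptr },   // class 1 (inherits print)
    };

    struct Object { int class_id; };

    void send_print(Object* recv) {
        int cls = recv->class_id;                        // load 1: class id
        Method target = dispatch_table[cls][SEL_PRINT];  // load 2 (dependent)
        target(recv);                                    // indirect branch
    }

    int main() { Object p{0}; send_print(&p); }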

5. Analysis of Dispatch Sequences on Modern Processor Architectures

Abstract
Dispatch cost is intimately coupled with processor implementation. The same dispatch sequence may have different cost on different processor implementations, even if all of them implement the same architecture (e.g., the SPARC instruction set). In particular, processor pipelining and superscalar execution make it impossible to use the number of instructions in a code sequence as an accurate performance indicator. This section characterizes the run-time performance of dispatch mechanisms on modern pipelined processors by determining the performance impact of branch latency and superscalar instruction issue. We do this by analyzing call code sequences, optimally scheduled for the given instruction issue. In addition to providing specific numbers for three example architectures, our analysis allows bounds on dispatch performance to be computed for a wide range of possible (future) processors. With the rapid change in processor design, it is desirable to characterize performance in a way that makes the dependence on certain processor characteristics explicit, so that performance on a new processor can be estimated accurately as long as the processor’s characteristics are known.
Karel Driesen
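
As a rough illustration of this style of analysis, the sketch below encodes the load-latency-plus-branch-penalty accounting already hinted at in Chapter 4's abstract as a tiny C++ cost model. The parameter names and values are our assumptions, chosen only to reproduce the roughly 8-cycle worst case quoted there; they are not figures from the book:

    #include <cstdio>

    // Illustrative cost model for a table-based dispatch sequence:
    // two dependent loads followed by an indirect branch (cf. Chapter 4).
    struct Processor {
        int load_latency;     // cycles per load in a dependent chain
        int branch_penalty;   // indirect-branch misprediction penalty
    };

    // Best case: branch predicted; cost is just the dependent load chain.
    int dispatch_best(const Processor& p)  { return 2 * p.load_latency; }

    // Worst case: branch mispredicted; add the full penalty.
    int dispatch_worst(const Processor& p) { return 2 * p.load_latency + p.branch_penalty; }

    int main() {
        Processor p96{2, 4};   // illustrative values only
        std::printf("best %d, worst %d cycles\n",
                    dispatch_best(p96), dispatch_worst(p96));
    }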

6. Measurement of Virtual Function Call Overhead on Modern Processors

Abstract
In this section we measure the direct cost of virtual function table lookup for a number of realistic C++ programs running on superscalar processors employing co-scheduling and simple indirect branch prediction, and identify the processor characteristics that most affect this cost. In Section 5.2 we saw that, when analyzed in isolation, the cost of the dispatch sequences of table-based techniques is similar to that of virtual function tables. Therefore VTBL serves as a representative technique for table-based dispatch in this quantitative analysis.
Karel Driesen
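
For reference, the measured dispatch sequence can be written out by hand. The C++ sketch below mimics what a compiler generates for a virtual function (VTBL) call: two dependent loads followed by an indirect call. Real ABIs add details such as multiple-inheritance offsets; all names here are illustrative:

    #include <cstdio>

    // Hand-written analogue of compiler-generated VTBL dispatch.
    struct Object;
    using Method = void (*)(Object* self);

    struct VTable { Method slots[1]; };      // one virtual function: slot 0
    struct Object { const VTable* vtbl; };   // hidden vtable pointer

    void circle_draw(Object*) { std::puts("Circle::draw"); }
    const VTable circle_vtbl = {{ circle_draw }};

    void call_draw(Object* recv) {
        const VTable* vt = recv->vtbl;      // load 1: vtable pointer
        Method target    = vt->slots[0];    // load 2: function pointer
        target(recv);                       // indirect call (the measured cost)
    }

    int main() { Object c{&circle_vtbl}; call_draw(&c); }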

7. Hardware Techniques

Abstract
In the previous section we established the importance of indirect branch misprediction in the call cost of virtual functions in C++. Superscalar processors are likely to hide most of the cost of table-based polymorphic call resolution by co-scheduling surrounding instructions, but only when the indirect branch is predicted correctly. Since the performance of a branch target buffer maxes out at about 128 entries, more accurate prediction cannot be bought with a larger transistor budget unless better prediction techniques can be implemented in hardware. Therefore we study more complex prediction mechanisms in the remainder of this work.
Karel Driesen
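
A branch target buffer of the kind discussed here can be sketched as a small cache from branch address to last-seen target; the C++ below is our minimal, direct-mapped rendering (the 128-entry size echoes the figure in the abstract, everything else is an assumption):

    #include <cstdint>
    #include <cstdio>

    // Minimal branch target buffer: a direct-mapped cache predicting
    // "same target as last time" for each branch address.
    constexpr int BTB_ENTRIES = 128;   // size discussed in the abstract

    struct BTBEntry { uint32_t branch_pc; uint32_t target; bool valid; };
    BTBEntry btb[BTB_ENTRIES];

    uint32_t predict(uint32_t branch_pc) {
        const BTBEntry& e = btb[branch_pc % BTB_ENTRIES];
        return (e.valid && e.branch_pc == branch_pc) ? e.target : 0;  // 0 = no prediction
    }

    void update(uint32_t branch_pc, uint32_t actual_target) {
        btb[branch_pc % BTB_ENTRIES] = { branch_pc, actual_target, true };
    }

    int main() {
        update(0x400100, 0x400800);                        // branch resolves
        std::printf("%#x\n", (unsigned)predict(0x400100)); // hits: 0x400800
    }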

8. Basic Indirect Branch Predictors

Abstract
We investigate a wide range of two-level predictors dedicated exclusively to indirect branches. We first study the intrinsic predictability of indirect branches by ignoring any hardware constraints on memory size or logic complexity. Then we progressively introduce hardware constraints and minimize the loss of predictor performance at each step. Thus, we initially assume unconstrained, fully associative tables and full 32-bit addresses (unless indicated otherwise).
Karel Driesen
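
A minimal C++ sketch of such a two-level predictor follows: the first level records the targets of recent indirect branches (the path), and the second level is a target table indexed by a hash of branch address and path. The hashing scheme and sizes are our illustrative choices:

    #include <cstdint>
    #include <cstdio>

    // Two-level indirect branch predictor (sketch).
    constexpr int HISTORY_LEN = 4;        // "path length" in the text
    constexpr int TABLE_SIZE  = 1 << 10;  // second-level table entries

    uint32_t history[HISTORY_LEN];        // first level: most recent targets
    uint32_t table[TABLE_SIZE];           // second level: predicted targets

    uint32_t index_of(uint32_t branch_pc) {
        uint32_t h = branch_pc;
        for (int i = 0; i < HISTORY_LEN; ++i)
            h ^= history[i] >> (i + 1);   // fold the path into the index
        return h % TABLE_SIZE;
    }

    uint32_t predict(uint32_t branch_pc) { return table[index_of(branch_pc)]; }

    void update(uint32_t branch_pc, uint32_t actual_target) {
        table[index_of(branch_pc)] = actual_target;   // train the table
        for (int i = HISTORY_LEN - 1; i > 0; --i)     // shift in the new target
            history[i] = history[i - 1];
        history[0] = actual_target;
    }

    int main() {
        // Repeat the same branch/target; once the path history saturates,
        // the predictor returns the correct target (pass 5 here, collisions aside).
        for (int pass = 0; pass < 6; ++pass) {
            std::printf("pass %d: predicted %#x\n", pass, (unsigned)predict(0x400100));
            update(0x400100, 0x400800);
        }
    }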

9. Hybrid Indirect Branch Predictors

Abstract
As discussed in the previous section, predictors with short path lengths adapt more quickly when the program goes through a phase change, because a short history fills up quickly. Predictors with longer path lengths can detect longer-term correlations, but they take longer to adapt and suffer more from table-size limitations because a larger pattern set is mapped onto the same number of targets. In this section we combine basic predictors into a hybrid predictor in order to obtain the advantages of both.
Karel Driesen
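
The combination can be sketched in C++ as follows: a zero-history component that adapts instantly, a history-based component that captures longer correlations, and a per-branch meta counter that learns which of the two to trust. All sizing and hashing choices here are ours:

    #include <cstdint>
    #include <cstdio>

    // Hybrid indirect branch predictor (sketch).
    constexpr int SIZE = 1 << 10;

    uint32_t short_tab[SIZE];   // component 1: last target per branch (path length 0)
    uint32_t long_tab[SIZE];    // component 2: indexed by PC ^ path history
    uint32_t history;           // folded recent-target history
    uint8_t  meta[SIZE];        // 2-bit chooser: >= 2 prefers the long path

    uint32_t idx_short(uint32_t pc) { return pc % SIZE; }
    uint32_t idx_long(uint32_t pc)  { return (pc ^ history) % SIZE; }

    uint32_t predict(uint32_t pc) {
        return meta[pc % SIZE] >= 2 ? long_tab[idx_long(pc)]
                                    : short_tab[idx_short(pc)];
    }

    void update(uint32_t pc, uint32_t actual) {
        bool short_ok = short_tab[idx_short(pc)] == actual;
        bool long_ok  = long_tab[idx_long(pc)]  == actual;
        uint8_t& m = meta[pc % SIZE];
        if (long_ok && !short_ok && m < 3) ++m;   // reward the better component
        if (short_ok && !long_ok && m > 0) --m;
        short_tab[idx_short(pc)] = actual;        // train both components
        long_tab[idx_long(pc)]   = actual;
        history = (history << 4) ^ actual;        // fold target into history
    }

    int main() {
        for (int i = 0; i < 6; ++i) {
            std::printf("predicted %#x\n", (unsigned)predict(0x400100));
            update(0x400100, 0x400800);
        }
    }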

10. Related Work

Abstract
In this section we discuss related work not touched upon in other chapters.
Karel Driesen

11. Conclusions

Abstract
We have studied various techniques to optimize a high-level language construct that is essential to object-oriented programming languages: polymorphic calls. We have gathered substantial evidence that software- and hardware-based support for polymorphic call resolution is useful, effective and affordable, and, according to current projections, will become more so in the next decade.
Karel Driesen

12. Glossary

Without Abstract
Karel Driesen

13. References

Without Abstract
Karel Driesen

Backmatter
