An Overview of the Intel IA-64 Compiler (continued)

INTERPROCEDURAL ANALYSIS AND OPTIMIZATION

IA-64's Explicitly Parallel Instruction Computing (EPIC) architecture makes it possible to execute a large number of instructions in a single clock cycle. Scheduling to fill instruction words is therefore of vital importance to the compiler. As with other processors, effective use of instruction caches and branch prediction is also important. Traditionally, compilers have operated on one procedure of the program at a time. However, such intraprocedural analysis and optimization is no longer sufficient to fully exploit IA-64's architectural features. The interprocedural optimizer in the Intel IA-64 compiler is profile-guided and multifile capable, so that it can efficiently provide analysis and optimization for very large regions of application code.

The Intel IA-64 compiler provides extensive support for interprocedural analysis and optimization. One key set of features comprises points-to analysis, mod/ref analysis, side-effect propagation, and constant propagation.

The optimizer and scheduler for the IA-64 compiler may need to move instructions over large regions in order to fill scheduling slots. To do so, the compiler frequently requires knowledge of the memory references within the region. Points-to analysis aids this process by accurately determining which memory locations each memory reference may access. Figure 3 illustrates this with three memory references. If the store to the address in r37 is known not to write to the object pointed to by r33, then the second load may be eliminated. Furthermore, because of IA-64's data speculation feature, it may be possible to eliminate the load even if the accesses might infrequently conflict. Similarly, moving memory references across function calls requires knowledge of what is modified or referenced by the function call. This is provided by mod/ref analysis.

Analysis and optimization for IA-64 also expose the need for a larger program scope than traditional optimizers use. To give the optimizer and code generator larger scope, the interprocedural optimizer provides several forms of procedure integration: inlining, cloning, and partial inlining. Inlining replaces a call site with the body of the function that would be invoked. It provides the fullest opportunity for optimization, albeit with potentially large increases in code size. Cloning and partial inlining specialize functions to particular call sites, thereby providing many of the benefits of inlining without increasing code size significantly.
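To make the idea of cloning concrete, the following is a minimal hand-written C sketch, not output from the Intel compiler. The names `scale` and `scale_by_2` are hypothetical, and the choice to specialize on a constant argument is an assumption made for illustration. It shows how a copy of a function specialized for call sites that always pass the same constant can be simplified without inlining the body everywhere.

```c
#include <stddef.h>

/* General routine: multiplies each of the n elements of v by factor. */
void scale(int *v, size_t n, int factor) {
    for (size_t i = 0; i < n; i++)
        v[i] *= factor;
}

/* Hypothetical clone for call sites that always pass factor == 2.
   With the constant known, the multiply can be replaced by an add.
   A compiler would generate such a clone itself; it is written by
   hand here only to show the transformation. */
void scale_by_2(int *v, size_t n) {
    for (size_t i = 0; i < n; i++)
        v[i] += v[i];
}
```

A call such as `scale(buf, len, 2)` could then be redirected to `scale_by_2(buf, len)`. Only one extra copy of the body exists, however many call sites share it, which is why cloning tends to cost less code size than inlining at every site.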
![]()
Figure 3: An example of a situation requiring
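The three-reference pattern discussed for Figure 3 can be sketched in C as follows. This is an illustration under stated assumptions rather than the figure's actual code: the function names are hypothetical, and the pointers `p` and `q` stand in for the addresses held in r33 and r37.

```c
/* Original form: load through p, store through q, load through p. */
int sum_before(int *p, int *q, int v) {
    int first = *p;    /* first load (address in r33)   */
    *q = v;            /* store (address in r37)        */
    int second = *p;   /* second load (address in r33)  */
    return first + second;
}

/* Form that is valid only when points-to analysis shows that p and q
   never refer to the same object: the second load is eliminated and
   the value of the first load is reused. */
int sum_after(int *p, int *q, int v) {
    int first = *p;
    *q = v;
    return first + first;
}
```

When `p` and `q` point to different objects the two functions return the same value. When they point to the same object the results differ, which is exactly why the compiler needs the points-to result before it may drop the second load.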
Memory Disambiguation

The simplest disambiguation cases are direct scalar or structure references. Figure 4 shows a pair of direct structure references. The compiler may disambiguate these two memory references by determining either that a and b are different memory objects or that field1 and field2 are non-overlapping fields.
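Both arguments for the direct structure case can be sketched in C. The structure type `struct rec`, its field types, and the function names below are assumptions for illustration. Only the names a, b, field1, and field2 come from the text.

```c
#include <stddef.h>

struct rec { int field1; int field2; };   /* hypothetical layout */
struct rec a, b;                          /* two distinct memory objects */

/* Returns 1 when the byte ranges of field1 and field2 within struct rec
   do not overlap. In that case a store to one field cannot change the
   other, even when both references use the same object. */
int fields_disjoint(void) {
    size_t lo1 = offsetof(struct rec, field1), hi1 = lo1 + sizeof a.field1;
    size_t lo2 = offsetof(struct rec, field2), hi2 = lo2 + sizeof a.field2;
    return hi1 <= lo2 || hi2 <= lo1;
}

/* A store to a.field1 followed by a load of b.field2. Because a and b
   are different objects (and the fields are disjoint as well), the load
   does not depend on the store and the two may be reordered. */
int store_then_load(int v) {
    a.field1 = v;
    return b.field2;
}
```

Either fact alone is sufficient here. The object-based argument needs only the identities of `a` and `b`, while the field-based argument would still hold if both references named the same structure.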
![]()
Figure 4: Disambiguation of
![]()
Figure 5: Disambiguation of
![]()
Figure 6: Disambiguation of
Function calls can inhibit optimization. Figure 7 shows an example where a function call may inhibit dead store elimination. If the function foo() reads *p, then the first store to *p is not dead. Interprocedural mod/ref information [10] is used to determine the set of memory locations written/read as a result of a function call.
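The dead store situation can be sketched in C as follows. The text does not give the body of foo(), so the version here is hypothetical: it is assumed to read the location through a global pointer named `shared`, and the global `observed` exists only to record what it saw.

```c
int *shared;     /* hypothetical global through which foo() reaches *p */
int observed;    /* records the value foo() read */

/* Hypothetical callee: reads the location that shared points to. */
void foo(void) {
    observed = *shared;
}

void caller(int *p) {
    *p = 1;   /* first store: dead only if foo() never reads *p */
    foo();    /* mod/ref information says whether *p is read here */
    *p = 2;   /* second store overwrites the first */
}
```

If `shared` points to the same object as `p`, foo() observes the value 1, so removing the first store would change program behavior. If mod/ref analysis could show that foo() never reads `*p`, the first store could be deleted safely.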
![]()
Figure 7: Disambiguation of a memory