CA2392122A1 - An unrolling transformation on nested loops - Google Patents

An unrolling transformation on nested loops Download PDF

Info

Publication number
CA2392122A1
CA2392122A1 CA002392122A CA2392122A CA2392122A1 CA 2392122 A1 CA2392122 A1 CA 2392122A1 CA 002392122 A CA002392122 A CA 002392122A CA 2392122 A CA2392122 A CA 2392122A CA 2392122 A1 CA2392122 A1 CA 2392122A1
Authority
CA
Canada
Prior art keywords
loop
iteration space
nest
residue
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002392122A
Other languages
French (fr)
Inventor
Arie Tal
Robert J. Blainey
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IBM Canada Ltd
Original Assignee
IBM Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by IBM Canada Ltd filed Critical IBM Canada Ltd
Priority to CA002392122A priority Critical patent/CA2392122A1/en
Priority to US10/294,938 priority patent/US7140009B2/en
Publication of CA2392122A1 publication Critical patent/CA2392122A1/en
Abandoned legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation

Abstract

The present invention is directed to a transformation technique for nested loops. A virtual iteration space may be determined based on an unroll factor (UF). The virtual iteration space, which includes the actual iteration space, is formed such that, the virtual iteration space may be evenly divided by a selected UF. Once the virtual iteration space has been calculated or determined, the virtual iteration space is "cut" into regular portions by one or more unroll factors. Portions of the actual iteration space which do not fill the cut portions of the virtual iteration space or which fall outside these cuts which have been evenly divided by the unroll factor form a residue which is calculated. The portions of the actual iteration space which remain are also evenly divided by the unroll factor(s). An outer loop for this remaining portion of the actual iteration space is then unrolled. This unrolled portion forms a perfect nested loop. Accordingly, the operations for the unrolled remaining portion of the actual iteration space when combined with the operations for the residue of the actual iteration space which was not evenly divided by the unroll factor is, in appropriate situations, semantically equivalent to the original nested loops.
Aspects of the invention are applicable to rectangular and triangular loop nests, and combinations thereof. Moreover, the invention is applicable to loops having n-dimensions.

Description

AN UNROLLING TRANSFORMATION OF NESTED LOOPS
FIELD OF THE INVENTION
This invention relates generally; to the optimization of computer instructions and, more particularly, to an unrolling transformation of nested loops.
BACKGROUND OF THE INVENTION
Generating computer code that is ei~iciently processed (i.e., "optimized") is one of the most important goals in software design and execution. Computer code which performs the desired function accurately and;reliably but too slowly (i:e.;: code which is not optimized) is often discarded or unused by the computer users.
As those of ordinary skill in the art are aware, most source code (i.e., that code which is a human readable form) is typically converted into object code, and thereafter an executable application by use of a compiler and a linker. The executable application is in a form and language that is machine readable (i.e., capable of being interpreted and executed by a computer). Other languages, such as Java available from Sun Microsystems; Inc. of California, USA, may be in source code form that is, on execution, transformed into a form understood by a computer system which then executes the transformed instructions: In any case, the source code, when transformed into a form capable of being understood and executed by a computer system, is frequently optimized: That is, a transformation is performed such that the instructions are performed more efficiently (i.e., optimized) and, hopefully; without ;any undue delay.
24 One common structure found in source code is a loop. Nested loops - a loop within another loop - are also common in the art. Loops are used to repeat one or more operations or instructions.
For example, an array may be used to store the purchase price of individual articles (e.g., where the ith element in the array A is denoted, in Fortran, as A(i) = other similar notations are used in other languages ). If it is desired to add all of the prices stored therein, a computer programmer could generate a single instruction to add each of the purchase prices together (e.g., sum = A(1) + A(2) +
... +A(n)). 'This however would take the programmer some time to code and is not easily adapted to the situation where the computer programmer does not know, at development time, the number of articles in the array. That is, when the number of elements in the array can only be determined at run time (i.e., during execution). Accordingly, the loop was developed to repeat an operation (e.g., sum = sum + A(i))) where the induction variable, l, is changed for each iteration. Other forms of loops are known and: are equally applicable. However, when the instructions of loop are transformed into machine readable code (e.g., executable code), the executed instructions may not be processed efficiently. For the example above, some computer systems may require that the processor fetch from memory, rather than from a register or cache memory, the various elements of the array "A". Fetching; data from memory requires the processor to wait while the data is retrieved.
Also, while loops maybe an efficient way to write certain repetitive source code operations, a loop does insert additional operations that would not be present if the repetitive operations were replicated: These additional operations (e.g:; branching operations) are considered to be the loop "overhead". ' To address some of the ine~ciencies in processing loops, various optimization techniques have been created and applied. For example, one optimization technique is to unroll portions of the loop (hereinafter "unrolling"), replicate the portions and then insert the replicated portions into the code (also known as "jamming"): Typically, when the unroll and jam loop transformation technique is applied to the outer loop of a nested loop pair, the outer loop's induction variable (e.g., "l") is advanced only a few times (the number of times being governed by a parameter referred to as the unroll factor - UF) rather than completely during the unrolling portion of this optimization technique. During the jamming portion of this technique, the inner loop would be replicated "UF"
times. Persons of ordinary skill in the art will appreciate that the replicated loop bodies are not identical but only similar. In the replicated loop bodies, portions of the loop bodies which use the induction of the outer loop will be advanced as required (e.g., if the loop body included reference to array element A(i), where "l" is the outer loop induction variable, a replicated loop body would include reference to the next required array element - A(i+1 )). The unroll and jam technique effectively reorders the calculations being performed in the nested loop.

The "unroll and jam" technique does offer some advantages but also has some disadvantages.
One disadvantage of the unroll and jam technique is that residues are created.
Residues form the portion of a loop that is would not be executed when the loop is unrolled by a fixed factor - the unroll factor. That is, since the controlling induction variable of the unrolled outer loop is advanced a fixed number of times in every iteration, if the upper bound does not divide evenly by the unroll factor (i.e., when there is a remainder or, the modulus of the upper bound of the outer loop induction variable "l" and the unroll factox is not zero), then code must be generated to address this remaining portion - the residue. Code generated to handle these residues may add overhead and inefficiencies that can result in performarsce degradation.
The unroll and jam technique, as a result of the creation of code to address the residue problem, introduces some significant disadvantages. Notable amongst these disadvantages is that the creation of the residue causes perfect triangular nested loops (i.e.;
nested loops where the inner loop induction variable - "j" - is bounded on the upper end by the value of the outer loop induction variable "l") to no longer be "perfect". As a result, other optimization techniques which are only applicable to perfect loop nests cannot be additionallyapplied. Therefore, using the unroll and jam technique eliminates use of many further optimization techniques.
Other optimization techniques known to those skilled in the art do not scale well. That is, the optimization techniques may provide some benefit when applied to a nested loop pair (i.e., only two dimensions): However, such techniques are not known to the inventors of the present invention to be applicable or easily applicable to nest loops of three or greater dimensions.
Accordingly, an optimization! technique which addresses at least some of these shortcomings would be desirable.

SUMMARY OF THE INVENTION
The present invention is directed to a transformation technique for nested loops.In one aspect, embodiments of the invention calculate a virtual iteration space. The actual iteration space (hereinafter simply the "iteration space" or IS); is the space formed by the set of all of values of the induction-variables in all of the iterations of the loop nest. Values which do not belong to some iteration of the loop nest do not form part of the iteration space. For example, in a simple nested loop formed by an outer loop having an induction variable "i" iterated in increments of one from a value of zero to a value "n" (i.e., i = 0, n, 1) and an inner loop having an induction variable "j"
iterated in increments of one from a value of zero to a value of "m" (i.e., j = 0, m, 1), the iteration space would be composed of those values comprising the data sets {0, 0), (0, 1); (0, 2), ... (0, m), (1, 0), (1, 1), ..., (l, m), ... (n, 0), (n, 1), ..:, (n, m):
The virtual iteration space is determined based on the unroll factor (UF). The virtual iteration space; which includes the actual iteration space, is formed such that, for a given UF, unrolling the outer loop of a rectangular nested loop pair would not result in any residues being formed.
Once the virtual iteration space has been calculated or determined, the virtual iteration space is "cut" into regular portions by one or more unroll factors. Portions of the actual iteration space which do not fill the cut portions of the virtual iteration space or which fall outside these cuts form a residue which is calculated. The portions of the actual iteration space which remain are also evenly divided by the unroll factor(s). An outer loop for this remaining portion of the actual iteration space is then unrolled. This unrolled portion forms a perfect nested loop. The operations for the unrolled remaining portion of the actual iteration space when combined with the operations for the residue of the actual iteration space is, in appropriate situations, semantically equivalent to the original nested loops. As those of ordinary skill in the art are aware, there are some instances where the unrolling of nested loops through application of the disclosed transformation and known transformation techniques (e.g., "unroll and jam") is not desirable: For example, where there is a dependency between a later operation and an earlier operation, reordering of these operations can result in an unrolled version of the nested loop being not semantically equivalent to the original nested loops.
Embodiments of the present invention applied to perfect triangular loop nests preserve this property thus enabling the loop nesfs optimized by embodiments of the present invention to be further optimized using additional optimization techniques known to those of ordinary skill in the art.
Embodiments of the invention provide code generated from the unrolling technique described and claimed herein to be compact and efficient thus providing numerous advantages that would be apparent to those of ordinary skill in the art.
In a fuxther advantage of the present invention; embodiments of the invention can be applied to nested loops having three or more dimensis.
Advantageously, the unrolling transft>nmation of nested loops technique described and claimed herein is adapted to handle a variety of nested loop structures. The unrolling transformation technique of the present invention is advantageously applicable to rectangular and triangular nested loops, and mixtures hereof. Moreover, aspects of the .present invention are applicable not only to two-dimensional nested loops, but also to n-dimensional nested loops (where n >= 2). These advantages result in embodiments of the invention transforming nested loops into compact code which in many instances is more efficiently processed.
In accordance with an aspect of the present invention there is provided a method for unrolling loops in a loop nest, said: loop nest iterating over an actual iteration space of n-dimension, said method comprising accounting for residues; said residues comprising portions of said actual iteration space falling outside of, or incompletely overlapping with, cuts of a virtual iteration space, said virtual iteration space comprising said actual iteration space and said virtual iteration space evenly divided by an unrolling f~tor, said cuts and said virtual iteration space having n-dimensions; unrolling at least one outer loop of said loop nest, said unrolled outer loop bounded by cuts of said virtual iteration space falling completely within said actual iteration space.

In accordance with another aspect of the present invention there is provided a computer readable media storing data and instructions, said data and instructions, when executed, adapting a computer system to unroll loops in a loop nest, said nested loop nest iterating over an actual iteration space of n-dimension, said computer system adapted to account for residues, said residues comprising portions of said actual iteration space falling outside of, or incompletely overlapping with, cuts of a virtual iteration space, said virtual iteration space comprising said actual iteration space and said virtual iteration space evenly divided by an unrolling factor, said cuts and said virtual iteration space having n-dimensions; unroll at least one outer loop of said nested loop nest, said unrolled outer loop bounded by cutsslices of said virtual iteration pace falling completely within said actual iteration space.
In accordance with still another aspect of the present invention there is provided a method for unrolling loops in a loop nest, said nested loop nest iterating over an actual iteration space of n-dimension, said method comprising means accounting for residues, said residues comprising portions of said actual iteration space falling outside of, or incompletely overlapping with, cuts of a virtual iteration space, said virtual iteration space: comprising said actual iteration space and said virtual iteration space evenly divided by an unrolling factor, said cuts and said virtual iteration space having n-dimensions; means unrolling at least one outer loop of said nested loop nest, said unrolled outer loop bounded by cutsslices of said virtual iteration space falling completely within said actual iteration space:
In accordance with still another aspect of the present invention there is provided a compiled file corresponding to a source code file, said source code file comprising a nested loop nest iterating over an actual iteration space of n-dimension; said compiled file comprising machine readable instructions corresponding to said ' nested loop, said machine readable instructions comprising machine readable instructions accounting, for residues, said residues comprising portions of said actual iteration space falling outside of, or incompletely overlapping with, cuts of a virtual iteration space, said virtual. iteration space comprising said actual iteration space and said virtual iteration space evenly divided by an unrolling factor, said cuts and said virtual iteration space having n-dimensions; machine readable instructions unrolling at least one outer loop of said loop nest, said unrolled outer loop bounded by cuts of said virtual iteration space falling completely within said actual iteration space.
Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures:
BRIEF DESCRIPTION OF THE DRAWINGS
In the figures which illustrate an example embodiment of this invention:
FIG. 1 schematically illustrates a computer system embodying aspects of the invention;
FIG. 2 schematically illustrates, in greater :detail, a portion of the computer system of FIG.
FIG. 3 illustrates, in functional block form, a portion of FIG. 2;
FIG. 4 is a flowchart of exemplary operations of the computer system of FIG.
1;
FIGS. 5A and SB illustrate two dimensional visualizations of iteration spaces for exemplary nested loops exemplary of nested loops processed during the operations illustrated in FIG. 4; and FIGS. 6 and '7 illustrate : a three dimensional visualization of iteration spaces for an exemplary three dimensional nested loop processed during the operations illustrated in FiG. 4.

DETAILED DESCRIPTION
It is to be understood that the particular orders of steps or operations described or shown herein axe not to be W derstood as limiting the scope of the general aspects of the invention provided that the result for the intended purpose is the same. As will be understood by those skilled in the art, it is often possible to perform steps or operations in a different order yet obtain the same result. This is often particularly true when implementing a method of steps or operations using computer technology.
An appendix is attached hereto forming part. of the present application which includes a paper by the inventors of the present invention entitled "An Outer Loop Unrolling Transformation on Perfect Triangular Loop Nests, Generating Compact Code and Preserving Perfect Nests". The paper forming the attached appendix describes and discloses embodiments of the present invention which are not limited simply to perfect triangular nests, as suggested by the title of the appended paper. The paper is to be submitted to a peer reviewed journal shortly after the filing of the present application.
An embodiment of the invention; computer system 100, is illustrated in FIG. 1.
Computer system 100, illustrated for exemplary purposes as a networked computing device, is in communication with other networked computing devices (not shown) via network 108. As will be appreciated by those of ordinary skill in the art, network 108 may be embodied using conventional networking technologies and may include one or more of the following: local area networks, wide area networks, intranets, public Internet and the like. As is discussed with reference to FIG. 8, computer system 100 may interact with other networked computer systems (not shown) providing application analysis of a distributed application:
Throughout the description herein, an embodiment of the invention is illustrated with aspects of the invention embodied solely on computer system 100. As will be appreciated by those of ordinary skill in the art, aspects of the invention may be distributed amongst one or more networked computing devices which interact with computer system 100 via one or more data networks such as, for example; network i 08. However, for ease of understanding, aspects of the invention have been embodied in a single'computing device - computer system 100.
Computer system i 00 includes processing system 102 which communicates with various input devices 104, output devices 106 and network 108. Input devices 104, two of which are shown, may include, for example; a keyboard; a mouse, a scanner, an imaging system (e.g., a camera, etc.) or the like. Similarly, output devices 106 (only one of which is illustrated) may include displays, information display unit printers and the like.
Additionally, combination input/output (I/O) devices may also be in communication with processing system 102. Examples of conventional I/O devices include removable and fixed recordable media (e.g., floppy disk drives, tape drives, CD-ROM drives, DVD-RW drives; etc.), touch screen displays and the like.
Exemplary processing system 102 is illustrated in greater detail in FIG. 2. As illustrated, processing system 102 includes several components - central processing unit (CPU) 202, memory 204, network interface (i/F) 208 and I/O I!F 210. Each component is in communication with the other components via a suitable communications bus 206 as required.
CPU 202 is a processing unit; such as an Intel PentiumTM, IBM PowerPCTM, Sun Microsystems UltraSpareTM processor or the like, suitable for the operations described herein. As will be appreciated bythose of ordinary skill in the art, other embodiments of processing system 102 could use alternative CPUs and may include embodiments in which one or more CPUs are employed. CPU 202 may include various support circuits to enable communication between itself and the other components of processing system 102.
Memory 204 includes both volatile and persistent memory for the storage of operational instructions for execution by CPU 202, data registers, application storage and the like: Memory 204 preferably includes a combination of random access memory (RAM), read only memory (ROM) and persistent memory such as that provided by a hard disk drive.

Network I/F 208 enables communication between computer system 100 and other network computing devices (not shown) via network 108: Network I/F 208 may be embodied in one or more conventional communication devices. Examples of a conventional communication device include an Ethernet card, a token ring card, a modem or the like. Network I/F 208 may:
also enable the retrieval or transmission of instructions for execution by CPU 202 from or to a remote storage media or device via network 108.
I/O I/F 210 enables communication between processing system 102 and the various I/O
devices 104, 106. I/O I/F 210 may include, for example, a video card for interfacing with an external display such as output device 106. Additionally, I/O I/F 210 may enable communication between processing system 102 and a removable media 212. Although removable media 2i2 is illustrated as a conventional diskette other removable memory devices such as ZipTM drives, flash cards, CD-ROMs, static memory devices and the like may also be employed.
Removable media 212 may be used to provide instructions for execution by CPU 202 or as a removable data storage device.
The computer instructions/applications stored in memory 204 and executed by (thus adapting the operation of computer system 100 as described herein) are illustrated in functional block form in FIG: 3. As will be appreciated by those of ordinary skill in the art, the delineation between aspects of the applications illustrated as functional blocks in FIG. 3 is somewhat arbitrary as he various operations attributed to a particular application as described herein may, in alternative embodiments; be subsumed by another application.
As illustrated, for exemplary purposes only, memory 202 stores operating system (OS) 302, communications suite 304, compiler 305, input source file 308, output code 310 and general data storage area 312.
OS 302 is an operating system suitable for operation with a selected CPU 202 and the operations described herein. Multitasking, multithreaded OSes such as, for example, IBM AIXTM, Microsoft Windows NTTM, Linux or the like; are expected in many embodiments to be preferred.
CA9-2002-0049: 10 Communication suite 304 provides, through; interaction with OS 302 and network (FIG. 2), suitable communication protocols to enable communication with other networked computing devices via network 108 (FIG.:1 ). Communication suite 304 may include one or more of such protocols such as TCP/IP, ethernet, token ring and the like.
Compiler 306 is adapted to receive input source code 308 and generate and output file 3I0.
Compiler 306 identifies nested loops of "n" dimensions (where "n" >= 2) and modifies the identified nested loop by calculating any residue and unrolling the outer loop. The operations performed by compiler 306 are best understood with reference to the flow chart illustrated of FIG. 4 and the examples illustrated in FIGS. SA,;SB, 6 and 7.
Input source code 308, as noted above, is conventional source code (any source code language including looping structures - eg.; for-next loops; for loops; while loops; loop untils; do loops; etc.) which includes a nested loop of "n" dimension (where "n" >= 2) where the upper and lower bounds of the loops are either loop nest invariant or are a linear function of some outer loop induction variable. An exemplary two dimensional nested loop having an outer loop with an induction variable "i" and an inner loop with an induction variable "j" is illustrated below as Nested Loop Source Code Example I:
for (i = 0; i < n; i++) for (j = 0; j < m; j++) loop body end .for end for Nested .Loop Source Code Ezample 1 In the example,nested loop above; a rectangular iteration space is formed. The rectangular iteration space being comprised of the set of all the values in the induction variables in all the iterations of the loop nests. This example nested loop has a loop depth, or dimension, of 2. The rectangular iteration space defined by the above exemplary code is illustrated in FIG. 5A by solid lines.

Output code 310 is the output code generated by compiler 306 from the processing of input source code file 308. Typically, output code 310 will be either an object code file (that can be linked with other objects to create an executable file) or an executable file.
Other forms of output file 310 (such byte codes) could equally be employed in alternative embodiments:
The operations 400 performed by compiler 306 are illustrated in a flow chart in FIG. 4.
References will be made to FIGS. 5A and 5B which are visualizations of the operations performed by compiler 306 illustrated in FIG. 4. While operations 440 are applicable to nested loops of any depth greater than, or equal to, two, the visualizations in FIGS. 5A and SB
are of depth two. FIGS.
6 and 7 illustrate a visualization of the iteration space of nested loops having a depth of three.
f IG. 5A is a visualization of the rectangular iteration space which can be created by Nested Loop Source Code Example 1 (above) at~d FIG. 5B being a visualization of a triangular iteration space which can be created by Nested Loop Source Code Example 2 (below). It must be noted that the iteration space comprises discrete points within the visualizations illustrated in FIGS. 5A and SB since the induction variables do not include all real numbers within the upper and lower bounds 1 f but only discrete values within those bounds.
for (i = 0; i < n; i++) for (j = 0; j < i; j++) loop body end for end for Nested Loop Source Code Example 2 On receipt of an: input source code file 308 (402), compiler 306 identifies any nested loops (404). The receipt and parsing of source code file 308 to identify nested loops is known to those of ordinary skill in the art:
Once a nested loop has been identified (404), a virtual iteration space 506 (FIG. 5A) which includes the actual iteration space 502 (FIG. 5A) is determined (406). The iteration space defined by the exemplary nested loops in code examples l and 2 can be visualized as rectangular (see the \ solid line portion of FIG. 5A) or triangular (see solid line portion of FIG.
5B), respectively. As noted above, the virtual iteration space used is dependent upon the unrolling factor UF. The unroll factor will in most instances be determined by the compiler :306, user input, or preferably a combination of the two: In the present embodiment, the unroll factor is determined by compiler 305. Compiler 306 uses an algorithm based on factors related to the hardware on which output file 310 is.to be-execute to determine a reasonably efficient unroll factor: The unroll factor determined by compiler 306 maybe overridden by specific user input: The virtual iteration space is illustrated in as dotted lines in the visualizations of FIGS. 5A and SB: The distance between the dashed-dotted lines illustrates the OF for the exemplary visualizations.
In the visualizations of two and three dimensional iteration spaces of FIGS. 5-7, the following convention is followed: solid lines indicate the actual iteration space 502; -dotted lines indicate the virtual iteration space 506; and dashed-dotted lines (e.g., line 510 in FIG. 5A) illustrates slices or cuts of virtual iteration space 5U6.
The virtual iteration space is selected such that the unrolling factor will result in the virtual l5 iteration space being divided into equal sized portions: To determine the virtual iteration space for a rectangular iteration space, the unroll factor (UF) is used to determine the next value greater than "n" (the upper bound of the outer loop) which is evenly divisible by UF: When the nested loops are normalized, the virtual iteration space is bounded by the rectangle having vertices at f (V0,0), (n-1, 0), (V0, m-1), (n-l, m-1)) where VO is the Virtual Origin of the virtual iteration space and is illustrated in FIG. 5A. The virtual origin is defined by the following equation:
VO = modulus (n, UF) - OF Eq. l In the rectangular nested loop situation; the virtual origin point is located at (V0, 0).
By creating the virtual iteration space 506 (FIG: SA), the virtual iteration space can be evenly divided by the unroll factor (L1F). The unroll factor is used to slice or cut the virtual iteration space into rectangular portions (hereinafter referred to as "rectangular cuts"
or "cuts"). Where the actual iteration space 502 only overlaps a portion of a cut of the virtual iteration space 506, a residue will be created. In the preferred embodiment, any residues are calculated first (408): More generally, residues are created when portions of the actual iteration space which fall outside of a rectangular cut of the virtual iteration space: It is to be noted that the virtual iteration space itself includes the entirety of the actual iteration Space. However, the iteration space formed only of the aggregation of the rectangular cuts will in most situations not include the entirety of the actual iteration space (this does not occur when the-actual iteration space is rectangular since all slices of the virtual iteration space terminate exae~ly on the right 516 and left -514 boundaries of the actual iteration space). In the rectangular iteration space illustrated in FIG. 5A
this residue portion is visualized as the small portion bounded by solid line 508 and the dashed-dotted line 510. This residue portion overlaps only partially with the portion of the virtual iteration space 506 bounded by upper dotted line 504 and the lower dashed dotted line 510. The residue portion calculated in operation 408 is bounded, for the outer loop induction variable, by the origin (0) and the modulus of the upper bound of the induction variable -and the unroll factor {i.e.; mod {n, UF); which is written in the appendix as "n%UF").
Once the residue portion has been accounted, the outer loop can be unrolled, the loop body replicated (thus creating a "UF" number of loop bodies) and jammed into the unrolled loop (410).
The outer loop will be traversed starting from the first portion (or cut) of the virtual iteration space which is completely overlapped by a similar portion of the actual iteration space. In the visualized iteration spaces of FIG. 5A. The first cut of the virtual iteration space which is completely overlapped by a portion of the actual iteration space commences at the point (mod {n,UF), 0). That is, the lower bound of the outer loop is bounded by line 510. That is, the unrolled outer loop is bounded by those slices of the virtual iteration space which fall completely within the actual iteration space. Once the bounds of the unrolled outer loop have been determined, the unrolled loop can be executed.
It is to be noted that due to the nature of unrolling. and replicating the main loop body, the operations performed during the main unrolled loop (which correspond to the replicated portions of the loop body) form a rectangular slice. An exemplary of a rectangular slice is bounded on the upper edge by line 510 and the lower edgy by litze 512 and have a "height" of UF:
For the rectangular iteration space formed by the nested loop source code of example 1 (above), operations 400 will generate and execute the following unrolled source code:

/* Residue Loop */
for (i = 0; i < mod (n, UF); i++) for (j':= O; j < m~ j~) loop body end for end for /* Main Unrolled Loop */
for (i = mod'.{n, UF); i < n; i+=UF) for (j" = 0; j < m; J++) /* OF number of Loop Bodies created */
loop body ; l*,for values of i *I
loop body; /* for values of i + 1 */
...
loop body; /* for values of i + OF - 1 */
end for end for Unrolled Loop Source Code Example 1 As will be apparent by those of ordinary skill in the art, the source code generated by operations 400 creates a nested loop (identified by the heading comment "/*
Main Unrolled Loop */") which is perfect and; thus, additional optimization 'techniques can be applied to this portion. As will be noted, the residues are also calculated by a perfect loop nest.
Accordingly, transformation techniques (including the transformation technique described herein) could be applied to the loop nest employed to calculate the residue: However; in most instances; the residue nest loops are likely to take a relatively small amount of time to calculate as compared to the main unrolled loop portion. Accordingly, transforming the residue loop nest is unlikely to achieve (in most instances, but not all) any significant efficiency advantages. It is to be further noted that the complete source code for the unrolled loop (as illustrated in one embodiment in the Unrolled Loop Source Code Example 1, above) is considerably more efficient than those presently known.
As will be appreciated, in alternative embodiments of the present invention, the residues) could be calculated after the unrolled loop. However, this will result in the lower bound formula having to be modified in'these alternative embodiments. The embodiments described herein use the above-noted lower bound out of convenience - sincecompiler 306 will generate output source code with the loops starting from zero after loop normalization. Additionally, it may be preferable to calculate the residue first as there maybe some calculations performed in the unrolled nested loop portion which are dependent upon a residue.
The virtual iteration space for triangular nested loops is slightly more complicated. FIG. 5B
illustrates both the actual iteration space 502' (shown by solid lines) and the virtual iteration space 506' (shown by dotted lines). As with the rectangular example, the virtual iteration space is evenly divided by the unroll factor.
The operations performed by compiler 306 to determine the virtual iteration space (406) is governed by the selection of a virtual origin point such that the range of the induction variable for the outer loop of the virtual iteration space is evenly divided by the unroll factor - UF. As before, the virtual origin VO (for the outer induction variable) for the two dimensional case is governed by equation (1) (above).
However, with a triangular iteration space, the residue will not be as simple as in the rectangular case (which generated a residue which itself was a simple rectangle). Rather, the residue generated by embodiments of the present invention applied to triangular nested loops will comprise similarly shaped (i.e.; triangular) residues. The residues for the iteration space of a triangular nested loop are identified in FIG. 5A as triangular residues 504.
The triangular shaped residues result from the creation of rectangular cuts (described above with reference to the rectangular iteration space) that result from the unrolling of the outer loop for the main nested loop (see the Unrolled Main Loop Source Code Example 2, below). The triangular residues will be of the same size in the virtual iteration space. However; the first (top-most) residue is potentially of a smaller size in the actual iteration space (see FIG: SB).
As will be appreciated, each residue 504 does not commence with the same lower bound (i.e., the starting value of "j" is different for each residue 504).
Accordingly, the lower bound for each residue 504 is governed by the following equation:
j,oW~,~",~a = max (0, i - mod (i + OF - mod (n; UF), Uf)) Eq. 2 For example, given "n" having a value of even (ie:, n = 7; for i < n), and OF
having a value of 2, the lower bounds for j, as calculated by the above noted equation (2), will be: 0, 1, 1, 3; 3, 5 and 5.
Similarly, unlike the residue in the rectangular iteration space (of which there is only one), there are a plurality of residues 504 created in the triangular nested loop situation which commence at different lower bounds. As such; unlike the outer nested residue loop created for the rectangular case which iterates only over limited subset of they outer induction variable space, the outer induction variable ("i") ,for the residue nested loops must iterate over the entire iteration space so that each of the plurality of residues 504 is properly accounted. The residues in the triangular iteration space are located at different coordinates (i, j) - as compared to the rectangular nested loops. These different coordinates result from the diagonal line 510 which can be said to effect two dimensions.
As with the rectangular nested loops, after the virtual iteration space has been created for an identified nested triangular loop (404; 40~), the residues will be calculated (408). For the exemplary triangular nested loop of example 2 (above), the residue calculation wily be governed by the following source code (which uses Equation (2) above):
/* Residue hoop */
for (i _ 0; i < n; i++) for (j - f Equation (2) }.; j < i; j++) loop body end for end for Residue Source Code Example 2 The main nested loop (i:e., the perfect nested loops which are an unrolled version of the triangular nested Ioops), is generated (410) in a manner similax to that for the rectangular nested loop scenario as shown below:
S /* Main Unrolled Loop */
for (i = mod' (n, UF); i < n; i+=UF) for (j ='0; J ~ i; j++) /~ OF number of Loop Bodies created */
loop body;
loop body;
end for end for Unrolled Main Loop Source Code Example 2 1 S As will be appreciated, the code generated from unrolling the triangular loop has resulted in both compact code (which care be efficiently executed) and a perfect nested loop for the main portion.
The above two exemplary two dimensional iteration spaces (a rectangular iteration space created from source code example 1, and a triangular iteration space created from source code example 2), can be combined to create n-dimensional (or n-depth) iteration spaces vc~hich, as with the two dimensional cases, will generate a residue for each unrolled loop.
The three dimensional rectangular loop nests and a mixed triangular and rectangular nested loops are explained in detail in the attached appendix. As will be understood by those of ordinary skill in the art, an n-dimensional loop nest could result in n-1 unroll factors being selected: It is to 2S be noted that the unroll factor is used for two purposes: first, the unroll factor is used for computing the virtual iteration space and calculating the residues; second, the unroll factor is also used for unrolling the actual loop body.
In a triangular iteration space; iterating through the triangular residue has an effect on two dimensions (as described above). Accordingly, the same unroll factor should be used for both dimensions when calcu-Iating the virtual iteration space and the residues created therefrom. The CA9-2002-0044. 18 common unroll factor that is used for those iteration spaces should be evenly divisible by each of the unroll factors (UFl and UF2) that could be used for the dimensions at issue. The minimum common unroll factor that could be employed is the lowest common multiplier ("LCM") for those two dimensions affected by the diagonal of a triangular iteration space. As will be appreciated, other common unroll factors could be erriployed that are evenly divisible by each OF l and UF2.
While a common unroll factor is used to calculate the virtual iteration space and the residues; when the nested loops are actually unrolled, the original unroll factors ,(e.g.;
not the lowest common multiplier) can be employed.
It is to be further noted, the cuts of the actual iteration space {being of n-dimension) which are calculated during the execution of the unrolled nested loop structure generated by embodiments of the invention will also be of n-dimensions (e.g., a three dimensional nested loop will result in an embodiment of the invention generating an unrolled nested loop also being of three dimensions and the cuts of the actual iteration space calculated by this unrolled nested loop will be a rectangular prism of three-dimensions). Further; in the mufti-dimensional triangular case, the factor determining VO could be, for example, LCM(LTF1,UF2) since they are both involved with the same diagonal.
From the foregoing, and from an ;understanding of the materials included in the appendix attached hereto, persons of ordinary skill in the art will appreciate that aspects of the present invention are easily extended to applications where nested loops of n-dimensions (i.e., of depth "n") ~0 are identified. Embodiments of the invention will result in n-dimensional nested loops being unrolled to generate compact nested loops to address any residues and compact nested loops for the non-residue portion of the nested loops. Also, and more interestingly, a perfect nested loop will also be generated for the main loop body (i.e., the non-residue portions). It is from this perfect nested loop structure that most of the benefits of the present invention are obtained. Additionally, embodiments of the present invention result in output code having a relatively small number of residue nests. Moreover, embodiments of the present invention are advantageously able to handle mufti-dimensional loop nests of rectangular, triangular or mixed rectangular and triangular loop nests.

As will be appreciated by those skilled in the art, modifications to the above-described embodiment can be made without departing from the: essence of the invention.
While one (or more) embbdiment(s) of this invention has been illustrated in the accompanying drawings and described above, it will be evident to hose skilled in the art that S changes and modifications may be made therein without departing from the essence of this invention. All such modif cations or variations are believed to be within the sphere and scope of the invention as defined by the claims appended hereto. Other modifications will be apparent to those skilled in the art and; therefore, the invention is,defined inthe claims.

Claims (46)

1. A method for unrolling loops in a loop nest, said loop nest iterating over an actual iteration space of n-dimension, said method comprising:
accounting for residues, said residues comprising portions of said actual iteration space falling outside of, or incompletely overlapping with, cuts of a virtual iteration space; said virtual iteration space comprising said actual iteration space and said virtual iteration space evenly divided by an unrolling factor, said cuts and said virtual iteration space having n-dimensions;
unrolling at least one outer loop of said loop nest, said unrolled outer loop bounded by cuts of said virtual iteration space falling completely within said actual iteration space.
2. The method of claim 1, further comprising:
calculating a virtual iteration space, said virtual iteration space comprising said actual iteration space and being evenly divided by said unrolling factor.
3. The method of claim 2 wherein said accounting for residues comprises:
generating a residue loop nest, said residue loop nest iterating over said portion of said actual iteration space falling outside of, or incompletely overlapping with, cuts of said virtual iteration space.
4. The method of claim 3 wherein said residue loop nest comprises a perfect loop nest.
5. The method of claim 1 wherein said unrolling at least one outer loop of said loop nest comprises:
iterating over said at least one outer loop of said loop nest, the induction variable of said at least one outer loop being incremented by said unrolling factor;
replicating an inner portion of said loop nest, whereby the total number of said inner portions of said loop nest equal said unrolling factor.
6. The method of claim 1 wherein said loop nest comprises a two dimensional loop nest.
7. The method of claim 6 wherein said loop nest comprises a perfect triangular loop.
8. The method of claim 7 wherein said virtual iteration space is bounded by the modulus of an upper bound of the induction variable of said outer loop and said unrolling factor.
9. The method of claim 7 wherein said accounting for residues comprises:
generating a residue loop nest iterating over said portion of said actual iteration space falling outside of, or incompletely overlapping with, cuts of said virtual iteration space, said residue loop nest comprising:
an outer residue loop; and an inner residue loop;
an induction variable "i" for said outer residue loop bounded by the bounds of the induction variable of said outer loop; and an induction taxiable "j" for said inner residue loop having a lower bound governed by the equation:
j lower bound = max (0, i - mod (i + UF - mod (n, UF), UF));

where "n" is the upper bound of said outer loop;
and said induction variable "j" for said inner residue loop having an upper bound governed by said induction variable for said. outer residue loop.
10. The method of claim 6 wherein said loop nest comprises a rectangular loop.
11. The method of claim 10 wherein said virtual iteration space is bounded by the modulus of an upper bound of the induction variable of said outer loop and said unrolling factor.
12. The method of claim 11 wherein said accounting for residues comprises:
generating a residue loop nest iterating over said portion of said actual iteration space falling outside of, or incompletely overlapping with, cuts of said virtual iteration space, said residue loop nest comprising:
an outer residue loop; and an inner residue loop;
an induction variable "i" for said outer residue loop comprising a lower bound of the induction variable of said outer loop and an upper bound governed by the modulus of an upper bound of the induction variable of said outer loop and said unrolling factor.
13. The method of claim 1 wherein said loop nest comprises at least one of a rectangular and a triangular loop nest.
14. A computer readable media storing data and instructions, said data and instructions, when executed, adapting a computer system to unroll loops in a loop nest, said nested loop nest iterating over an actual iteration space of n-dimension, said computer system adapted to:
account for residues, said residues comprising portions of said actual iteration space falling outside of, or incompletely overlapping with, cuts of a virtual iteration space; said virtual iteration space comprising said actual iteration space and said virtual iteration space evenly divided by an unrolling factor, said cuts and said virtual iteration space having n-dimensions;
unroll at least one outer loop of said nested loop nest, said unrolled outer loop bounded by cutsslices of said virtual iteration space falling completely within said actual iteration space.
15. The computer readable media of claim 14, further adapting said computer system to:
calculate a virtual iteration space, said virtual iteration space comprising said actual iteration space and being evenly divided by said unrolling factor.
16. The computer readable media of claim 15 wherein said adaptation to account for residues comprises adapting said computer system to:

generate a residue loop nest; said residue loop nest iterating over said portion of said actual iteration space falling outside of, or incompletely overlapping with, cuts of said virtual iteration space.
17. The computer readable media of claim 16 wherein said residue loop nest comprises a perfect loop nest.
18. The computer readable media of claim 14 wherein said adaptation to unroll at least one outer loop of said loop nest comprises adapting said computer system to:
iterate over said at least one outer loop of said loop nest, the induction variable of said at least one outer loop being incremented by said unrolling factor;
replicate an inner portion of said loop nest, whereby the total number of said inner portions of said loop nest equal said unrolling factor.
19. The computer readable media of claim 14 wherein said loop nest comprises a two dimensional loop nest.
20. The computer readable media of claim 19 wherein said loop nest comprises a perfect triangular loop.
21. The computer readable media of claim 20 wherein said virtual iteration space is bounded by the modulus of an upper bound of the induction variable of said outer loop and said unrolling factor.
22. The computer readable media of claim 20 wherein said adaptation to account for residues comprises adapting said computer system to:
create a residue loop nest iterating over said portion of said actual iteration space falling outside of, or incompletely overlapping with, cuts of said virtual iteration space, said residue loop nest comprising:
an outer residue loop; and an inner residue loop;
an induction variable "i" for said outer residue loop bounded by the bounds of the induction variable of said outer loop; and an induction variable "j" for said inner residue loop having a lower bound governed by the equation:
J lower bound = max (0, i - mod (i + UF - mod (n, UF), UF));

where "n" is the upper bound of said outer loop;
and said induction variable "j" for said inner residue loop having an upper bound governed by said induction variable for aid outer residue loop.
23. The computer readable media of claim 19 wherein said loop nest comprises a rectangular loop.
24. The computer readable media of claim 23 wherein said virtual iteration space is bounded by the modulus of an upper bound of the induction variable of said outer loop and said unrolling factor.
25 25. The computer readable media of claim 24 wherein said adaptation to account for residues comprises adapting said computer system to:
generate a residue loop nest iterating over said portion of said actual iteration space falling outside of, or incompletely overlapping with; cuts of said virtual iteration space, said residue loop nest comprising:
an outer residue loop; and an inner residue loop;
an induction variable "i" for said outer residue loop comprising a lower bound of the induction variable of said outer loop and an upper bound governed by the modulus of an upper bound of the induction variable of said outer loop and said unrolling factor:
26. The computer readable media of claim 14 wherein said loop nest comprises at least one of a rectangular and a triangular loop nest.
27. A method for unrolling loops in a loop nest, said nested loop nest iterating over an actual iteration space of n-dimension, said method comprising:
means accounting for residues, said residues comprising portions of said actual iteration space falling outside of, or incompletely overlapping with, cuts of a virtual iteration space; said virtual iteration space comprising said actual iteration space and said virtual iteration space evenly divided by an unrolling factor, said cuts and said virtual iteration space having n-dimensions;
means unrolling at least one outer loop of said nested loop nest, said unrolled outer loop bounded by cutsslices of said virtual iteration pace falling completely within said actual iteration space.
28. The method of claim 27, further comprising:
means for calculating a virtual iteration space, said virtual iteration space comprising said actual iteration space and being evenly divided by said unrolling factor.
29. The method of claim 28 wherein said means for accounting for residues comprises:
means for generating a residue loop nest, said residue loop nest iterating over said portion of said actual iteration space falling outside of, or incompletely overlapping with, cuts of said virtual iteration space.
30. The method of claim 29 wherein said residue loop nest comprises a perfect loop nest.
31. The method of claim 27 wherein aid means for unrolling at least one outer loop of said loop nest comprises:

means for iterating over said at least one outer loop of said loop nest, the induction variable of said at least one outer loop being incremented by said unrolling factor;
means for replicating an inner portion of said loop nest, whereby the total number of said inner portions of said loop rest equal said unrolling factor:
32. The method of claim 27 wherein said loop nest comprises a two dimensional loop nest.
33. The method of claim 32 wherein said loop nest comprises a perfect triangular loop.
34. The method of claim 33 wherein-said virtual iteration space is bounded by the modulus of an upper bound of the induction variable of said outer loop and said unrolling factor.
35. A compiled file corresponding to a source code file, said source code file comprising a nested loop nest iterating over an actual iteration space of n-dimension, said compiled file comprising machine readable instructions .corresponding to said nested loop, said machine readable instructions comprising:
machine readable instructions accounting for residues; said residues comprising portions of said actual iteration space falling outside of, or incompletely overlapping with, cuts of a virtual iteration space, said virtual iteration space comprising said actual iteration space and said virtual iteration space evenly divided by an unrolling factor, said cuts and said virtual iteration space having n-dimensions;
machine readable instructions unrolling at least one outer loop of said loop nest; said unrolled outer loop bounded by cuts of said virtual iteration space falling completely within said actual iteration space.
36. The compiled file of claim 35 wherein said machine readable instructions accounting for residues comprises:
machine readable instructions for a residue loop nest, said residue loop nest iterating over said portion of said actual iteration space falling outside of, or incompletely overlapping with; cuts of said virtual iteration space.
37. The compiled file of claim 36 wherein said residue loop nest comprises a perfect loop nest.
38. The compiled file of claim 35 wherein said machine readable instructions unrolling at least one outer loop of said loop nest comprises:
machine readable instructions iterating over said at least one outer loop of said loop nest, the induction variable of said at least one outer loop being incremented by said unrolling factor;
machine readable instructions iterating over replicated inner portions of said loop nest, whereby the total number of said inner portions of said loop nest equal said unrolling factor.
39. The compiled file of claim 38 wherein said loop nest comprises a two dimensional loop nest.
40. The compiled file of claim 39 wherein said loop nest comprises a perfect triangular loop.
41. The compiled file of claim 41 wherein said virtual iteration space is bounded by the modulus of an upper bound of the induction variable of said outer loop and said unrolling factor.
42. The compiled file of claim 41 wherein said machine readable instructions accounting for residues comprises:
machine readable instructions for; a residue loop nest iterating over said portion of said actual iteration space falling outside, of, or incompletely overlapping with, cuts of said virtual iteration space, said residue loop nest comprising:
an outer residue loop; and an inner residue loop;
an induction variable "i" for said outer residue loop bounded by the bounds of the induction variable of said outer loop; and an induction variable "j" for said inner residue loop having a lower bound governed by the equation:
j lowerbond = max (0, i - mod (i + UF - mod (n; UF), UF));
where "n" is the upper bound-of said outer loop;
and said induction variable "j" for said inner residue loop having an upper bound governed by said induction variable for said outer residue loop.
43. The compiled file of claim 40 wherein said loop nest comprises a rectangular loop.
44. The compiled file of claim 43 wherein said virtual iteration space is bounded by the modulus of an upper bound of the induction variable of said outer loop and said unrolling factor.
45. The compiled file of claim 44 wherein said machine readable instructions accounting for residues comprises:
machine readable instructions for a residue loop nest iterating over said portion of said actual iteration space falling outside of; or incompletely overlapping with, cuts of said virtual iteration space, said residue loop nest comprising:
an outer residue loop; and an inner residue loop;
an induction variable "i" for said outer residue loop comprising a lower bound of the induction variable of said outer loop and an upper bound governed by the modulus of an upper bound of the induction variable of said outer loop and said unrolling factor.
46. The compiled file claim 35 wherein said loop nest comprises at least one of a rectangular and a triangular loop nest.
CA002392122A 2002-06-28 2002-06-28 An unrolling transformation on nested loops Abandoned CA2392122A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CA002392122A CA2392122A1 (en) 2002-06-28 2002-06-28 An unrolling transformation on nested loops
US10/294,938 US7140009B2 (en) 2002-06-28 2002-11-14 Unrolling transformation of nested loops

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA002392122A CA2392122A1 (en) 2002-06-28 2002-06-28 An unrolling transformation on nested loops

Publications (1)

Publication Number Publication Date
CA2392122A1 true CA2392122A1 (en) 2003-12-28

Family

ID=29741665

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002392122A Abandoned CA2392122A1 (en) 2002-06-28 2002-06-28 An unrolling transformation on nested loops

Country Status (2)

Country Link
US (1) US7140009B2 (en)
CA (1) CA2392122A1 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7302680B2 (en) * 2002-11-04 2007-11-27 Intel Corporation Data repacking for memory accesses
CA2453777A1 (en) * 2003-12-19 2005-06-19 Ibm Canada Limited-Ibm Canada Limitee Generalized index set splitting in software loops
US20050283772A1 (en) * 2004-06-22 2005-12-22 Kalyan Muthukumar Determination of loop unrolling factor for software loops
US7814468B1 (en) * 2005-04-20 2010-10-12 Oracle America, Inc. Method for loop reformulation
KR100707781B1 (en) 2006-01-19 2007-04-17 삼성전자주식회사 Method for efficiently processing array operation in computer system
US8443351B2 (en) * 2006-02-23 2013-05-14 Microsoft Corporation Parallel loops in a workflow
US8671401B2 (en) * 2007-04-09 2014-03-11 Microsoft Corporation Tiling across loop nests with possible recomputation
US9043771B1 (en) * 2007-06-29 2015-05-26 Cadence Design Systems, Inc. Software modification methods to provide master-slave execution for multi-processing and/or distributed parallel processing
US8056065B2 (en) * 2007-09-26 2011-11-08 International Business Machines Corporation Stable transitions in the presence of conditionals for an advanced dual-representation polyhedral loop transformation framework
US8087010B2 (en) * 2007-09-26 2011-12-27 International Business Machines Corporation Selective code generation optimization for an advanced dual-representation polyhedral loop transformation framework
US8060870B2 (en) * 2007-09-26 2011-11-15 International Business Machines Corporation System and method for advanced polyhedral loop transformations of source code in a compiler
US8087011B2 (en) * 2007-09-26 2011-12-27 International Business Machines Corporation Domain stretching for an advanced dual-representation polyhedral loop transformation framework
US20090158247A1 (en) * 2007-12-14 2009-06-18 International Business Machines Corporation Method and system for the efficient unrolling of loop nests with an imperfect nest structure
US8819647B2 (en) * 2008-01-25 2014-08-26 International Business Machines Corporation Performance improvements for nested virtual machines
US20100004542A1 (en) 2008-07-03 2010-01-07 Texas Instruments Incorporated System and method for ultrasound color doppler imaging
US20100318980A1 (en) * 2009-06-13 2010-12-16 Microsoft Corporation Static program reduction for complexity analysis
KR101756820B1 (en) * 2010-10-21 2017-07-12 삼성전자주식회사 Reconfigurable processor and method for processing nested loop
US8745607B2 (en) 2011-11-11 2014-06-03 International Business Machines Corporation Reducing branch misprediction impact in nested loop code
US9760356B2 (en) * 2014-09-23 2017-09-12 Intel Corporation Loop nest parallelization without loop linearization
US11256489B2 (en) * 2017-09-22 2022-02-22 Intel Corporation Nested loops reversal enhancements

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3218932B2 (en) * 1995-07-06 2001-10-15 株式会社日立製作所 Data prefetch code generation method
US5797013A (en) 1995-11-29 1998-08-18 Hewlett-Packard Company Intelligent loop unrolling
WO1997046962A1 (en) 1996-06-07 1997-12-11 At & T Corp. Finding an e-mail message to which another e-mail message is a response
US6035125A (en) * 1997-07-25 2000-03-07 International Business Machines Corporation Method and system for generating compact code for the loop unrolling transformation
US6341370B1 (en) * 1998-04-24 2002-01-22 Sun Microsystems, Inc. Integration of data prefetching and modulo scheduling using postpass prefetch insertion
US6064820A (en) 1998-05-27 2000-05-16 Hewlett-Packard Company Apparatus and method to incrementally update single static assignment (SSA) form
US6192515B1 (en) * 1998-07-17 2001-02-20 Intel Corporation Method for software pipelining nested loops
US6341371B1 (en) 1999-02-23 2002-01-22 International Business Machines Corporation System and method for optimizing program execution in a computer system
US6438747B1 (en) * 1999-08-20 2002-08-20 Hewlett-Packard Company Programmatic iteration scheduling for parallel processors
US6507947B1 (en) * 1999-08-20 2003-01-14 Hewlett-Packard Company Programmatic synthesis of processor element arrays
US6948160B2 (en) * 2001-05-31 2005-09-20 Sun Microsystems, Inc. System and method for loop unrolling in a dynamic compiler

Also Published As

Publication number Publication date
US7140009B2 (en) 2006-11-21
US20040003386A1 (en) 2004-01-01

Similar Documents

Publication Publication Date Title
US7140009B2 (en) Unrolling transformation of nested loops
JP5102758B2 (en) Method of forming an instruction group in a processor having a plurality of issue ports, and apparatus and computer program thereof
US8516452B2 (en) Feedback-directed call graph expansion
US7636715B2 (en) Method for fast large scale data mining using logistic regression
US20140354658A1 (en) Shader Function Linking Graph
JP2008535074A5 (en)
MacCallum Computer algebra in gravity research
Candido et al. EKO: evolution kernel operators
Guenter Efficient symbolic differentiation for graphics applications
Pitchanathan et al. FPL: Fast Presburger arithmetic through transprecision
Antonov et al. An AlgoView web-visualization system for the AlgoWiki project
Scholz et al. Tensor comprehensions in SaC
Jang et al. Automatic code overlay generation and partially redundant code fetch elimination
Cservenka et al. Data structures and dynamic memory management in reversible languages
Lim et al. R High Performance Programming
Cano et al. Fast optimal labelings for rotating maps
Milutinovic et al. DataFlow supercomputing essentials: research, development and education
CN114041116A (en) Method and device for optimizing data movement task
Hamez A symbolic model checker for Petri nets: pnmc
Ruckert The MMIX Supplement: Supplement to The Art of Computer Programming Volumes 1, 2, 3 by Donald E. Knuth
Martinez et al. Search by constraint propagation
US7478378B2 (en) Semantically consistent adaptation of software applications
Mailund et al. Stacks and Queues
Li et al. Multithreaded parallel implementation of arithmetic operations modulo a triangular set
Paavola Reduced Circular Slice Buffer II: pragmatic multiplatform implementation

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued