US20080028381A1 - Optimizing source code for iterative execution - Google Patents
- Publication number
- US20080028381A1 (application No. US 11/870,121)
- Authority
- US
- United States
- Prior art keywords
- source code
- cpu
- recurrence
- instructions
- recurrence element
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
Definitions
- This invention relates to optimizing source code and more specifically to optimizing source code having instructions for iterative execution by a central processing unit.
- Known to the inventor, and depicted in FIG. 1, is a computing environment for executing executable code that includes a computer programmed loop having related instructions.
- The computing environment includes computer system 112 having CPU (Central Processing Unit) 116 and memory 114 operatively connected to CPU 116.
- Memory 114 stores source code 100, compiler 118, executable code 120, and memory storage locations 122.
- Compiler 118 and source code 100 reside or are stored in long-term memory (not depicted) such as a hard disk or a floppy disk.
- CPU 116 transfers compiler 118 and source code 100 from long-term memory to memory 114.
- Compiler 118 instructs CPU 116 to compile source code 100 to generate executable code 120.
- Memory 114 is RAM (Random Access Memory).
- Source code 100 includes computer programmed instructions written in a computer programming language. Instructions forming source code 100 are used for instructing CPU 116 to achieve or perform specific tasks. Source code 100 includes start instructions 102 for starting operations of CPU 116, set of instructions 104 (which will be executed once by CPU 116), computer programmed loop 105 having instructions 106 (which will be repeatedly executed N-1 times by CPU 116) for computing numerical values of various array elements, and stop instructions 110 for stopping execution of source code 100.
- Executable code 120 includes executable instructions related to loop 105 for instructing or directing CPU 116 to compute numerical values for array elements A[1], A[2], A[3], ..., A[N-1], provided that a numerical value for array element A[0] exists before the computation begins.
- The compiled instructions related to block 102 are executed first, followed by the compiled instructions related to blocks 104 and 105, and then by the compiled instructions of block 110.
- CPU 116 repeatedly executes the compiled instructions of computer programmed loop 105 a predetermined number of times.
- In each iteration, CPU 116 computes the numerical value of an array element (such as A[i]) and stores the computed value to a memory storage location 122 before computing the value of the next array element.
- A computer programmed loop is a series of instructions that are performed repeatedly until some specific condition is satisfied, whereupon a branch instruction is executed to exit from the loop.
- The branch instruction specifies the address of the next instruction to be performed by the CPU.
- Computer programmed loop 105 includes instructions for repeated execution by CPU 116 .
- Computer programmed loops are also known as strongly connected regions.
- Computer programmed loop 105 includes an induction variable (depicted as “i”) which has a related induction value that changes for each iterated or repeated step of computer programmed loop 105 .
- The induction value is changed in a predetermined manner, such as adding a numerical value of 1 to the current induction value at each iterative step.
- In computation 106, CPU 116 computes the value of array element A[i] (block 107) by adding the numerical value 1 to the value of the previously computed array element A[i-1].
- The computational task is depicted in block 108.
- The changed induction value is subsequently used in the next iterative step to modify the instructions related to that step.
- Computer programmed loop 105 provides a convenient way to avoid repeatedly expressing repetitive instructions by expressing the instructions once. CPU 116 will execute the instructions of the loop N-1 times, which spares the programmer from explicitly writing them out N-1 times. Disadvantageously, a significant amount of CPU processing time is spent executing the compiled instructions of computer programmed loop 105.
- For each iteration, executable code 120 instructs CPU 116 to obtain (load/read) the value of array element A[i-1] from a specific location in memory storage locations 122, to add a numerical value of 1 to it, and to place (store/write) the result (that is, array element A[i]) in another specific location in memory storage locations 122.
- Thus, with each iterative step of the induction variable, computer programmed loop 105 requires CPU 116 to load/read recurrence elements from main memory, compute a value for a primary recurrence element, and then store/write the primary recurrence element to main memory (such as locations 122).
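As a rough illustration, loop 105 of FIG. 1 might look like the following C function. It is reconstructed from the description above rather than from the figure, and the function name `loop_105` is hypothetical. Read literally, each iteration loads A[i-1] from memory, adds 1, and stores A[i] back to memory. That per-iteration load/store traffic is what the invention targets, although in practice an optimizing compiler may already keep A[i-1] in a register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction of loop 105 (FIG. 1): A[0] must already hold a
 * value; A[1]..A[n-1] are computed, so the body runs n-1 times. */
void loop_105(int *A, size_t n)
{
    assert(A != NULL && n >= 1);   /* A[0] must exist before the loop starts */
    for (size_t i = 1; i < n; i++) {
        /* load A[i-1] from memory, add 1, store the result to A[i] */
        A[i] = A[i - 1] + 1;
    }
}
```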
- Recurrence elements are values which are re-computed for each iterative step of a computation process.
- An example of a computation process which re-computes values of recurrence elements is a computer programmed loop which computes various array elements (which act like recurrence elements) for each step of the loop.
- An object of the present invention is to reduce the amount of CPU processing time to be spent executing compiled instructions of a computer programmed loop.
- Another object of the present invention is to construct a computer programmed loop that reduces the need to repetitively require a CPU to load/read values of recurrence elements from slow operating memory for computing a value for a primary recurrence element.
- An embodiment of the present invention provides an optimizer for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a primary recurrence element.
- A computer programmed loop for computing the primary recurrence element and subsequent recurrence elements is one example of iteratively computing the primary recurrence element.
- The CPU is operatively coupled to fast operating memory (FOM) and to slow operating memory (SOM).
- The SOM stores the generated optimized source code.
- The optimized source code includes instructions for instructing the CPU to store a computed value of the primary recurrence element in a storage location of the FOM.
- The instructions also include instructions to consign the computed value of the primary recurrence element from that storage location to another storage location of the FOM.
- Another embodiment of the present invention provides an optimization mechanism for optimizing computer programmed instructions which direct a Central Processing Unit (CPU) to iteratively compute values for a primary recurrence value based on the values of various recurrence elements.
- Unoptimized, the computer programmed instructions direct the CPU to alternately execute load/read and store/write instructions that transfer computed recurrence values between main memory and fast operating memory for each iteration.
- The optimized computer programmed instructions direct the CPU to execute a single load/read instruction for moving initial recurrence values from main memory to fast operating memory.
- The optimized instructions direct the CPU to compute final values of recurrence elements and store/write them to main memory, and to set up subsequently required values of recurrence elements by interchanging loaded values of recurrence elements in fast operating memory.
- The optimization mechanism can be incorporated into a compiler for compiling the optimized code to generate optimized executable code for execution by the CPU.
- Another embodiment of the present invention provides a compiler for compiling computer programmed instructions that will be iteratively executed by a CPU.
- An example of computer programmed instructions to be iteratively executed are instructions associated with a computer programmed loop.
- The computer programmed loop is also known as a 'strongly connected region' because the region of instructions or code is re-executed each time the CPU executes the loop's branching instruction.
- The compiler includes mechanisms for detecting when a branching instruction causes a portion of code to be repeated. The compiler can detect whether a value associated with a variable within that portion of code must change with each iterative step (that is, each time the branching operation occurs).
- An optimizer for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a recurrence element, the CPU operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM) for storing the generated optimized source code, wherein the generated optimized source code comprises instructions for instructing the CPU to store a computed value of the recurrence element in a storage location of the FOM for use in a further iteration.
- A method for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a recurrence element, the CPU operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM) for storing the generated optimized source code, wherein the generated optimized source code comprises instructions for instructing the CPU to store a computed value of the recurrence element in a storage location of the FOM for use in a further iteration.
- A computer program product for use in a computer system operatively coupled to a computer-readable memory, the computer program product including a computer-readable data storage medium tangibly embodying computer-readable program instructions for providing an optimizer for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a recurrence element, the CPU operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM) for storing the generated optimized source code, wherein the generated optimized source code comprises instructions for instructing the CPU to store a computed value of the recurrence element in a storage location of the FOM for use in a further iteration.
- A computer program product for use in a computer system operatively coupled to a computer-readable memory, the computer program product including a computer-readable data storage medium tangibly embodying computer-readable program instructions for providing a method for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a recurrence element, the CPU operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM) for storing the generated optimized source code, wherein the generated optimized source code comprises instructions for instructing the CPU to store a computed value of the recurrence element in a storage location of the FOM for use in a further iteration.
- An optimizer for generating optimized source code from source code including code for instructing a central processing unit (CPU) to compute a primary recurrence element, the CPU operatively coupled to fast operating memory (FOM) and slow operating memory (SOM) for storing the generated optimized source code, including means for replacing instructions to direct the CPU to store a computed value of the primary recurrence element in a storage location of the SOM with instructions to direct the CPU to place the computed value of the primary recurrence element in a storage location of the FOM, and means for inserting instructions to direct the CPU to consign a value of the primary recurrence element loaded in the storage location of the FOM to another storage location of the FOM.
- A method for generating optimized source code from source code including code for instructing a central processing unit (CPU) to compute a primary recurrence element, the CPU operatively coupled to fast operating memory (FOM) and slow operating memory (SOM) for storing the generated optimized source code, the method including replacing instructions to direct the CPU to store a computed value of the primary recurrence element in a storage location of the SOM with instructions to direct the CPU to place the computed value of the primary recurrence element in a storage location of the FOM, and inserting instructions to direct the CPU to consign a value of the primary recurrence element loaded in the storage location of the FOM to another storage location of the FOM.
- A computer program product for use in a computer system operatively coupled to a computer-readable memory, the computer program product including a computer-readable data storage medium tangibly embodying computer-readable program instructions for providing an optimizer for generating optimized source code from source code including code for instructing a central processing unit (CPU) to compute a primary recurrence element, the CPU operatively coupled to fast operating memory (FOM) and slow operating memory (SOM) for storing the generated optimized source code, including means for replacing instructions to direct the CPU to store a computed value of the primary recurrence element in a storage location of the SOM with instructions to direct the CPU to place the computed value of the primary recurrence element in a storage location of the FOM, and means for inserting instructions to direct the CPU to consign a value of the primary recurrence element loaded in the storage location of the FOM to another storage location of the FOM.
- A computer program product for use in a computer system operatively coupled to a computer-readable memory, the computer program product including a computer-readable data storage medium tangibly embodying computer-readable program instructions for providing a method for generating optimized source code from source code including code for instructing a central processing unit (CPU) to compute a primary recurrence element, the CPU operatively coupled to fast operating memory (FOM) and slow operating memory (SOM) for storing the generated optimized source code, the method including replacing instructions to direct the CPU to store a computed value of the primary recurrence element in a storage location of the SOM with instructions to direct the CPU to place the computed value of the primary recurrence element in a storage location of the FOM, and inserting instructions to direct the CPU to consign a value of the primary recurrence element loaded in the storage location of the FOM to another storage location of the FOM.
- FIG. 1 depicts a computational environment for executing unoptimized executable code for directing a CPU to execute a computer programmed loop;
- FIG. 2 depicts a compiler embodying the present invention for generating optimized executable code for directing a CPU to execute a computer programmed loop;
- FIG. 3 depicts operations of the compiler of FIG. 2 ;
- FIG. 4 depicts the CPU of FIG. 2 executing the optimized code of FIG. 2;
- FIG. 5 depicts a second compiler embodying the present invention;
- FIG. 6 depicts a third compiler embodying the present invention;
- FIG. 7 depicts a fourth compiler embodying the present invention; and
- FIG. 8 depicts a fifth compiler embodying the present invention.
- Computer-readable memory can be classified by the speed with which a CPU can access, manipulate, or operate the contents of the memory.
- Disk memory (such as floppy disks, hard drives, compact disks and the like) is the slowest type of memory that can be accessed by the CPU. Additionally, disk memory is economical and thus abundantly available.
- Main memory such as RAM (Random Access Memory) or ROM (Read Only Memory) can be accessed faster by the CPU compared to accessing disk memory.
- Cache memory can be accessed faster by the CPU compared to accessing main memory; however, there is a sub-classification of cache memory in which primary-level cache is the fastest type of cache memory that the CPU can access compared to accessing second-level cache memory or accessing third-level cache memory.
- Hardware registers are the fastest type of memory that can be accessed by the CPU; however, hardware registers are expensive to implement. It will be understood that computer-readable instructions that direct the CPU to access slow operating memory (that is disk memory or main memory) require significantly more computer processing time to execute than instructions that direct the CPU to access fast operating memory (that is cache memory or hardware registers). Therefore, it would be advantageous to provide instructions to direct the CPU to access fast operating memory (such as hardware registers or cache memory) more frequently than directing the CPU to access slow operating memory (such as disk memory or main memory).
- Compiler 200 includes an optimization module for optimizing source code.
- Computing environment 250 also includes computer system 210 having CPU 211 operatively coupled to slow operating memory 212 and fast operating memory 213 .
- Stored or residing in memory 212 at times during operation of computer system 210 are compiler 200, source code 202, block 204 (including code optimized at various stages of optimization), and optimized executable code 206 generated from optimized source code provided by the optimizer module of compiler 200.
- Source code 202 includes a computer programmed loop.
- A user directs compiler 200 (which embodies aspects of the present invention) to compile source code 202 to generate optimized executable code 206.
- The optimizer module (not depicted) of compiler 200 optimizes source code 202, producing the various stages of optimization depicted in block 204.
- The task of optimizing source code 202 is described below. It will be understood that optimizing includes rearranging, adding, and/or removing instructions related to source code 202.
- Source code 202 includes a computer programmed loop including an induction variable “i” having an induction value.
- The programmed loop includes computer-readable programmed instructions for computing data.
- CPU 211 uses the instructions depicted in source code 202 to iteratively compute numerical values of array elements.
- Optimized executable code 206 directs or instructs CPU 211 to achieve specific computational tasks as will be described below.
- Memory 212 includes RAM or other slow operating computer-readable memory (such as disk memory) operationally coupled to CPU 211. Also coupled to CPU 211 is fast operating memory, which includes a set of hardware registers.
- Compiler 200 reads source code 202, optimizes it (producing the optimization stages 214, 218, 222, and 226 depicted in block 204), and then generates optimized executable code 206.
- Optimized executable code 206 instructs CPU 211 to perform the load/read instructions associated with each computational iteration of the computer programmed loop using fast operating memory.
- Optimized executable code 206 instructs CPU 211 to use hardware registers (not depicted) operationally coupled to CPU 211 for loading/reading computed data associated with each iterative step of the optimized computer programmed loop (depicted in block 226).
- Alternatively, optimized executable code 206 instructs CPU 211 to use cache memory (not depicted) operationally coupled to CPU 211 for loading/reading computed data associated with each iterative step of the computer programmed loop.
- Source code 202 instructs or directs CPU 211 to iteratively (that is, repeatedly) execute the computational instructions of a computer programmed loop for N-2 iterative steps.
- Induction variable 'i' starts with a numerical value of 2, increases by 1 at each iterative step, and ends with a numerical value of N-1.
- The computer programmed loop of source code 202 has a recurrence length of 3, where recurrence length is the number of recurrence elements used in a programmed loop. Each recurrence element has a corresponding numerical value for each iterative step of the computer programmed loop.
- The recurrence elements of source code 202 are A[i], A[i-1], and A[i-2].
- Consider first the case in which the induction value of induction variable 'i' increases with each iterative step of a computer programmed loop having recurrence elements A[i], A[i-1], and A[i-2].
- The highest recurrence element (in this example, A[i]) is called the primary feeder or primary recurrence element.
- The remaining recurrence elements are named in descending order, such as secondary recurrence element A[i-1] and tertiary recurrence element A[i-2], or are simply called subsequent recurrence elements A[i-1], A[i-2], and so on.
- Consider next the case in which the induction value of induction variable 'i' decreases with each iterative step of a computer programmed loop having recurrence elements A[i], A[i+1], and A[i+2].
- Here the primary feeder or primary recurrence element is again array element A[i].
- The remaining recurrence elements are named in the same way: secondary recurrence element A[i+1] and tertiary recurrence element A[i+2], or simply subsequent recurrence elements A[i+1], A[i+2], and so on.
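For the decreasing case, a sketch of the same idea in C is shown below. The loop body A[i] = A[i+1] + A[i+2] is a hypothetical example, since the text names the recurrence elements but gives no body, and `desc_plain` and `desc_rotated` are illustrative names. The locals t1 and t2 play the role of registers holding the secondary and tertiary recurrence elements. As a result, the loop stores A[i] once per iteration without reloading A[i+1] or A[i+2].

```c
#include <assert.h>

/* Plain form: primary A[i], secondary A[i+1], tertiary A[i+2];
 * i decreases from n-3 down to 0. */
void desc_plain(int *A, int n)
{
    assert(n >= 2);
    for (int i = n - 3; i >= 0; i--)
        A[i] = A[i + 1] + A[i + 2];
}

/* Same loop with subsequent recurrence elements kept in locals that stand
 * in for hardware registers T1, T2, T3 (hypothetical register mapping). */
void desc_rotated(int *A, int n)
{
    assert(n >= 2);
    int t1 = A[n - 2];   /* A[i+1] for the first iteration (i = n-3) */
    int t2 = A[n - 1];   /* A[i+2] for the first iteration */
    for (int i = n - 3; i >= 0; i--) {
        int t3 = t1 + t2;  /* primary recurrence element A[i] */
        A[i] = t3;         /* one store per iteration, no reloads */
        t2 = t1;           /* old A[i+1] becomes the next A[i+2] */
        t1 = t3;           /* new A[i] becomes the next A[i+1] */
    }
}
```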
- For source code 202, the primary feeder is array element A[i],
- the secondary feeder is array element A[i-1], and
- the tertiary feeder is array element A[i-2].
- The subsequent recurrence elements are therefore array elements A[i-1] and A[i-2].
- Compiler 200 begins to optimize source code 202 by identifying a computer programmed loop, identifying the induction variable associated with the identified computer programmed loop, determining primary and subsequent recurrence elements associated with the identified induction variable, and converting instructions related to the identified computer programmed loop for the case when compiler 200 identifies a recurrence pattern.
- The recurrence pattern interrelates the recurrence elements.
- Referring to FIG. 3, there are depicted the operations of compiler 200 of FIG. 2.
- The operations depicted in flowchart 300 are performed by compiler 200 unless stated otherwise.
- S302 indicates the start of operations of compiler 200.
- At S304, compiler 200 identifies a computer programmed loop in source code 202.
- Compiler 200 identifies the induction variable related to the identified computer programmed loop (S306).
- Compiler 200 identifies a set of recurrence elements related to the identified induction variable (S308).
- Compiler 200 then ascertains whether the identified set of recurrence elements is related by a recurrence pattern.
- A recurrence pattern includes a primary recurrence element and at least one subsequent recurrence element (secondary, tertiary, and so on), and all of the recurrence elements use the same induction variable.
- Compiler 200 determines whether the computer programmed loop includes a primary recurrence element and subsequent recurrence elements. If compiler 200 detects that the primary and subsequent recurrence elements are not included in the loop, processing continues to S320, in which compiler 200 attempts to identify another induction variable that may exist in the identified loop of source code 202.
- If compiler 200 detects that the primary and subsequent recurrence elements are included in the loop, processing continues to S312, in which instructions related to the computer programmed loop are converted into instructions related to block 214.
- In the present example, compiler 200 identifies the recurrence pattern "A[i], A[i-1], A[i-2]", in which the primary recurrence element is A[i] and the subsequent recurrence elements (also known as feeders) are A[i-1] and A[i-2], and therefore generates the instructions related to block 214.
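The recurrence-pattern check in flowchart 300 might be sketched in C as follows. The `LoopRefs` structure and `find_recurrence` function are hypothetical, because the text does not describe the compiler's internal data structures. The sketch covers only an increasing induction variable with all references to a single array. It summarizes each reference by its constant offset from i. It computes recurrence length as the span from the primary recurrence element down to the oldest feeder. That span is 3 both for the reads A[i-1], A[i-2] of source code 202 and for a loop that reads only A[i-2].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-loop summary: every reference is to the same array and
 * uses the same induction variable i, so each is described by its offset. */
typedef struct {
    int write_offset;        /* element the loop stores, e.g. 0 for A[i]   */
    const int *read_offsets; /* elements the loop loads, e.g. -1, -2       */
    size_t num_reads;
} LoopRefs;

/* Returns true if the references form a recurrence pattern for an increasing
 * induction variable (at least one load of a value computed by an earlier
 * iteration) and stores the recurrence length in *length. */
bool find_recurrence(const LoopRefs *refs, int *length)
{
    assert(refs != NULL && length != NULL);
    int lowest = refs->write_offset;
    bool has_feeder = false;
    for (size_t k = 0; k < refs->num_reads; k++) {
        int off = refs->read_offsets[k];
        if (off < refs->write_offset) {   /* a subsequent recurrence element */
            has_feeder = true;
            if (off < lowest)
                lowest = off;
        }
    }
    if (!has_feeder)
        return false;
    /* Primary element plus every slot down to the oldest feeder, including
     * slots that are never read (a "missing" recurrence element). */
    *length = refs->write_offset - lowest + 1;
    return true;
}
```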
- Compiler 200 places the initial instances (values) of the subsequent recurrence elements outside of the identified programmed loop.
- The primary recurrence element remains in the computer programmed loop.
- Initial values of subsequent recurrence elements A[i-1] and A[i-2] are placed outside the identified computer programmed loop, immediately before it begins.
- The relocated subsequent recurrence elements are depicted in block 216.
- Primary recurrence element A[i] remains in the programmed loop.
- Instruction identifiers for locations in fast operating memory (such as hardware registers T1 and T2) are set equal to the values of the subsequent recurrence elements at the start value of the induction variable, that is, for the first iterative step.
- Compiler 200 replaces the recurrence elements inside the identified computer programmed loop with instruction identifiers that identify hardware registers.
- Block 218 includes block 220, in which the instructions inside the computer programmed loop have been modified so that the recurrence elements are replaced by instruction identifiers that identify locations in fast operating memory (such as hardware registers).
- That is, compiler 200 has replaced the occurrences of the recurrence elements located inside the computer programmed loop with instruction identifiers for hardware registers T1 and T2.
- Operation S314 converts instructions related to block 214 to instructions related to block 218.
- Compiler 200 then inserts, inside the identified programmed loop, another instruction identifier that identifies a location in fast operating memory to hold the value of the primary feeder or primary recurrence element.
- The primary recurrence element A[i] is assigned to another location in fast operating memory (such as a third hardware register) T3, where T3 is set equal to the result of the computational operation T1 + T2 (as depicted in block 224).
- Operation S316 converts instructions related to block 218 to instructions related to block 222.
- Compiler 200 then consigns (copies) values between the fast operating memory locations at the end of the computer programmed loop to set up the computation for the next iterative step.
- The value of register T2 is updated to equal the value of register T1.
- The value of register T1 is then updated to equal the value of register T3.
- In the next iterative step, the values of registers T2 and T1 are used to compute the new value of register T3. This avoids repeatedly transferring the values of the subsequent recurrence elements to and from slow operating memory in subsequent iteration steps of the computer programmed loop.
- Operation S318 converts instructions related to block 222 to instructions related to block 226.
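Putting stages 214 through 226 together, the transformation can be sketched in C as below. The loop body of source code 202 is not reproduced in the text, so A[i] = A[i-1] + A[i-2] is an assumption. It matches T3 = T1 + T2 with T1 holding A[i-1] and T2 holding A[i-2]. The function names are hypothetical, and the locals t1, t2, t3 only stand in for hardware registers; a real C compiler makes its own register assignment.

```c
#include <assert.h>

/* Assumed form of source code 202: i runs from 2 to n-1 (n-2 iterations)
 * and A[i] depends on A[i-1] and A[i-2]. */
void source_202(int *A, int n)
{
    for (int i = 2; i < n; i++)
        A[i] = A[i - 1] + A[i - 2];
}

/* Sketch of block 226: t1, t2, t3 stand in for registers T1, T2, T3. */
void block_226(int *A, int n)
{
    assert(n >= 2);
    int t2 = A[0];          /* A[i-2] for the first iteration (i = 2) */
    int t1 = A[1];          /* A[i-1] for the first iteration */
    for (int i = 2; i < n; i++) {
        int t3 = t1 + t2;   /* primary recurrence element A[i] */
        A[i] = t3;          /* store the computed value; no reloads needed */
        t2 = t1;            /* consign T1 to T2 ... */
        t1 = t3;            /* ... then T3 to T1, in that order */
    }
}
```

The order of the two copies matters. Assigning t3 to t1 first would overwrite the value that t2 still needs.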
- At S320, compiler 200 determines whether there is another induction variable in the identified computer programmed loop. If compiler 200 detects another induction variable, processing continues to S306, in which instructions related to the newly identified induction variable are optimized. If compiler 200 detects no other induction variable in the identified loop, processing continues to S322.
- At S322, compiler 200 determines whether source code 202 includes another computer programmed loop. If it does, processing continues to S304, in which compiler 200 further optimizes instructions related to the newly identified loop. If compiler 200 does not detect any other computer programmed loop, operation continues to S324, in which compiler 200 stops optimizing source code 202 and begins compiling the instructions related to block 226 to generate optimized executable code 206.
- Aliased memory is memory shared with other tasks. The contents of aliased memory may change in unexpected ways if due care is not taken. To prevent aliasing problems, memory should be reserved for performing programmed loops, or special attention should be paid to ensuring that values in memory are not corrupted by other tasks that use the shared memory. Unchecked memory aliasing may corrupt the values of a recurrence pattern; that is, operation S308 should ensure that the memory is protected so that unpredictable changes in the values of the recurrence elements do not occur. Where memory is shared or aliased, the recurrence values may have to be transferred between slow operating memory and fast operating memory, in which case the recurrence values are not kept continuously in fast operating memory.
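A small C sketch shows why aliasing matters for this transformation. The extra store `B[i - 1] = 0` and both function names are hypothetical. They only model another piece of code writing to memory that may overlap array A. When B and A are distinct, the register-rotated loop matches the plain loop. When B points at A, the plain loop reloads the zeroed values from memory while the rotated loop keeps using its stale register copies, so the results differ.

```c
#include <assert.h>

/* Plain loop with a hypothetical extra store through a second pointer B.
 * Every iteration reloads A[i-1] and A[i-2] from memory. */
void plain_with_store(int *A, int *B, int n)
{
    for (int i = 2; i < n; i++) {
        A[i] = A[i - 1] + A[i - 2];
        B[i - 1] = 0;
    }
}

/* Register-rotated version: correct only if B never refers to A's storage,
 * because the copies held in t1 and t2 do not see stores made through B. */
void rotated_with_store(int *A, int *B, int n)
{
    assert(n >= 2);
    int t2 = A[0], t1 = A[1];
    for (int i = 2; i < n; i++) {
        int t3 = t1 + t2;
        A[i] = t3;
        B[i - 1] = 0;
        t2 = t1;
        t1 = t3;
    }
}
```

In C, the C99 `restrict` qualifier is one way for a programmer to promise the compiler that two pointers do not alias. That is the kind of guarantee this optimization relies on.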
- FIG. 4 there is depicted the computing environment of FIG. 2 in which CPU 211 is ready to execute optimized executable code 206 for computing values related to a computer programmed loop included in optimized executable code 206 . Subsequent load/read instruction operations in each iteration step of the computer programmed loop are performed in fast operating memory 213 . By using fast operating memory 213 for each iterative step, CPU 211 avoids executing load/read operations for transferring numerical values from slow operating memory 212 to fast operating memory 213 for each subsequent iterative step of the computer programmed loop.
- transfer operations that is store/write or load/read operations for transferring numerical values from a fast operating memory 213 to another fast operating memory 213 is performed faster than transfer operations for transferring numerical values from a storage location in slow operating memory 212 to another storage location in slow operating memory 212 .
- Slow operating memory 212 includes memory portion 402 having various memory storage locations for storing numerical values for array elements A[1], A[2], . . . , A[i]. Memory storage locations are depicted for containing values for array elements A[1] to A[4].
- Fast operating memory 213 includes units of fast operating memory depicted as hardware registers T1, T2, and T3.
- Rows 404A, 404B, and 404C depict the values of hardware registers T1, T2, and T3 for several iterative values of induction variable “i” (that is, the iterative steps in which ‘i’ starts at ‘2’, then steps to ‘3’, and then steps to ‘4’).
- When executable code 206 is executed by CPU 211, CPU 211 performs load/read operations to transfer the values of A[0] and A[1] from memory 406 to hardware registers T2 and T1, respectively.
- The transfer of A[1] and A[0] into the hardware registers is depicted in row 404A, columns 406A and 406B, respectively.
- A store/write operation is performed by CPU 211, in which the value stored in hardware register T3 is transferred to a memory storage location in memory portion 402 that stores the value of array element A[2].
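The FIG. 4 sequence can be sketched in C. This text does not reproduce the recurrence expression of source code 202, so the sketch assumes a hypothetical A[i] = A[i−1] + A[i−2]. The locals `t1`, `t2`, and `t3` stand in for hardware registers T1, T2, and T3. In real C code, actual register allocation is the compiler's choice. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Reference form: A[i-1] and A[i-2] are reloaded from memory each iteration. */
void recur_naive(long *A, size_t N)
{
    for (size_t i = 2; i < N; i++)
        A[i] = A[i - 1] + A[i - 2];        /* hypothetical recurrence */
}

/* Register-carried form modelled on FIG. 4. */
void recur_rotated(long *A, size_t N)
{
    assert(N >= 2);                        /* A[0] and A[1] must exist */
    long t2 = A[0];                        /* the only loads: initial values */
    long t1 = A[1];
    for (size_t i = 2; i < N; i++) {
        long t3 = t1 + t2;                 /* primary recurrence element A[i] */
        A[i] = t3;                         /* one store, no loads */
        t2 = t1;                           /* consign values within fast memory */
        t1 = t3;
    }
}
```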
- FIG. 5 depicts source code 502, in which a recurrence element is missing from the computation of array element A[i] in each iterative step.
- Source code 502 is used as an example of how an aspect of the present invention can be used for handling recurrence elements which are missing from source code.
- Source code 502 depicts a missing secondary recurrence element. Even though a recurrence element is missing, the number of hardware registers required for iteratively computing the primary recurrence element is still equal to the recurrence length. For source code 502 , the recurrence length is “3” and hence three hardware registers are required.
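The following sketch shows why the missing element still costs a register. It assumes a hypothetical recurrence A[i] = 2·A[i−2], in which A[i−1] never appears in the expression. A value computed in one iteration is not consumed until two iterations later. It must therefore be parked in the middle local (`t1`) for one step, so three locals are still needed for a recurrence length of 3. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Reference form: hypothetical recurrence with A[i-1] absent. */
void skip_naive(long *A, size_t N)
{
    for (size_t i = 2; i < N; i++)
        A[i] = 2 * A[i - 2];
}

/* Register-carried form: three locals, although only t2 feeds the expression. */
void skip_rotated(long *A, size_t N)
{
    assert(N >= 2);          /* A[0] and A[1] must exist */
    long t2 = A[0];          /* A[i-2] */
    long t1 = A[1];          /* A[i-1]: unused now, but it becomes A[i-2] next */
    for (size_t i = 2; i < N; i++) {
        long t3 = 2 * t2;    /* primary recurrence element A[i] */
        A[i] = t3;
        t2 = t1;
        t1 = t3;
    }
}
```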
- Source code 602 includes a computer programmed loop having a recurrence length of “2”, with a primary recurrence element A[i] and a secondary recurrence element A[i−1].
- Compiler 606 optimizes source code 602 to generate optimized source code 608. Once optimized source code 608 is generated, compiler 606 further optimizes optimized source code 608 to generate optimized source code 609. It will be appreciated that operations can be enhanced by reducing the number of copy operations when the value of register T2 is not required after its initial use in the loop. This improvement (minimizing the number of hardware registers) can be realized while optimizing the instructions by following flowchart 300 of FIG. 3, or in a subsequent optimization phase.
- The optimization module of compiler 606 uses a minimum number of storage locations of the fast operating memory.
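The copy-elimination enhancement can be sketched at the source level. The sketch assumes a hypothetical recurrence A[i] = 3·A[i−1] + 1, since source code 602 is not reproduced here. `len2_with_copy` applies the rotation directly: it uses two locals and copies one into the other. `len2_single` reflects the enhancement. Because the value of `t2` is not needed after its first use, one local suffices and the copy disappears. An optimizing compiler's register allocator may treat the two forms identically. The distinction shown here is the one the text draws at the register level.

```c
#include <assert.h>
#include <stddef.h>

/* Direct rotation: t2 receives A[i] and is then copied into t1. */
void len2_with_copy(long *A, size_t N)
{
    assert(N >= 1);                 /* A[0] must exist */
    long t1 = A[0];                 /* A[i-1] */
    for (size_t i = 1; i < N; i++) {
        long t2 = 3 * t1 + 1;       /* hypothetical recurrence */
        A[i] = t2;
        t1 = t2;                    /* copy operation */
    }
}

/* Enhanced form: one local, no copy. */
void len2_single(long *A, size_t N)
{
    assert(N >= 1);
    long t = A[0];
    for (size_t i = 1; i < N; i++) {
        t = 3 * t + 1;
        A[i] = t;
    }
}
```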
- Source code 702 computes values by calling a function, such as a square root function.
- Memory 212 stores source code 702 having a computer programmed loop, compiler 706, optimized source code 708, and optimized executable code 710.
- Compiler 706 optimizes source code 702 to generate optimized source code 708, and then compiles optimized source code 708 to generate optimized executable code 710.
- The computer programmed loop has a recurrence length of “2”, with a primary and a secondary recurrence element.
- The instructions related to block 712 perform a single function call before execution of the computer programmed loop.
- The instructions related to block 714 depict that, for each iterative step of the computer programmed loop, a single function call is performed to compute the value of A[i].
- The instructions related to block 716 depict that, for each iterative step of the programmed loop, the next value of the recurrence element is computed. It will be appreciated that a function call has been eliminated from each iterative step, and that recurrence elements are not restricted to array references.
- The optimizer module of compiler 706 is used for source code that directs the CPU to compute recurrence elements from a function call.
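The following C sketch illustrates this case. It assumes a hypothetical loop body A[i] = f(i) + f(i−1), where `f` stands in for an expensive function such as a square root. Source code 702 is not reproduced in this text, so the body is an assumption. The counter `f_calls` exists only to make the saved calls visible. The transformation itself is safe only when `f` has no side effects that the program relies on.

```c
#include <assert.h>
#include <stddef.h>

static int f_calls = 0;   /* instrumentation only: counts calls to f */

/* Hypothetical stand-in for an expensive pure function such as a square root;
   like sqrt, it is only defined here for non-negative arguments. */
static long f(long x)
{
    assert(x >= 0);
    f_calls++;
    return x * x + 1;
}

/* Unoptimized: two calls per iteration; f(i-1) repeats the previous f(i). */
void call_naive(long *A, size_t N)
{
    for (size_t i = 1; i < N; i++)
        A[i] = f((long)i) + f((long)i - 1);
}

/* Optimized: one call before the loop, then one call per iteration. */
void call_carried(long *A, size_t N)
{
    long prev = f(0);                 /* f(i-1) for the first iteration, i = 1 */
    for (size_t i = 1; i < N; i++) {
        long cur = f((long)i);        /* primary recurrence element */
        A[i] = cur + prev;
        prev = cur;                   /* becomes f(i-1) in the next iteration */
    }
}
```

For N = 5, `call_naive` calls `f` eight times while `call_carried` calls it five times.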
- Referring to FIG. 8, memory 212 stores source code 802, compiler 806 (including an optimizer module which is not depicted), stages of optimization 807, and optimized executable code 812.
- Stages of optimization 807 include optimized source code 808 and 810 formed by compiler 806.
- Compiler 806 optimizes source code 802 to generate optimized source code 808, further optimizes optimized source code 808 to generate optimized source code 810, and then compiles optimized source code 810 to generate optimized executable code 812.
- Source code 802 includes instructions for a second-order computation of a recurrence element, whereas the previous embodiments depicted a first-order computation of the recurrence element. Optimized source code 808 depicts instructions optimized for a first-order correction (that is, the elimination of a load/read operation). Optimized source code 810 depicts instructions optimized for a second-order correction.
- Compiler 806 finds any loop invariant computation applied to all recurrence elements.
- Operation S312 is replaced with the following operation: compiler 806 places all recurrence elements, and the loop invariant computations applied to them, outside of the computer programmed loop. The replacement operation replaces the recurrence element together with its loop invariant computation, and the insertion operation holds the value of the primary feeder together with any identified loop invariant computation on it.
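Source code 802 is not reproduced in this text. The following sketch therefore assumes a hypothetical second-order recurrence A[i] = k·A[i−1] + k·A[i−2], where `k` is loop invariant. `first_order` removes the reloads of A[i−1] and A[i−2] by carrying those values in locals, in the spirit of optimized source code 808. `second_order` carries the products k·A[i−1] and k·A[i−2] instead, so each iteration performs one multiplication rather than two. This follows the spirit of optimized source code 810. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Reference form: reloads both elements and multiplies twice per iteration. */
void naive(long *A, size_t N, long k)
{
    for (size_t i = 2; i < N; i++)
        A[i] = k * A[i - 1] + k * A[i - 2];
}

/* First-order: recurrence elements carried in locals, no reloads. */
void first_order(long *A, size_t N, long k)
{
    assert(N >= 2);
    long t2 = A[0], t1 = A[1];            /* A[i-2], A[i-1] */
    for (size_t i = 2; i < N; i++) {
        long t3 = k * t1 + k * t2;        /* still two multiplications */
        A[i] = t3;
        t2 = t1;
        t1 = t3;
    }
}

/* Second-order: the loop-invariant computation k*x is carried as well. */
void second_order(long *A, size_t N, long k)
{
    assert(N >= 2);
    long kt2 = k * A[0], kt1 = k * A[1];  /* k*A[i-2], k*A[i-1] */
    for (size_t i = 2; i < N; i++) {
        long t3 = kt1 + kt2;              /* no multiplication needed here */
        A[i] = t3;
        kt2 = kt1;
        kt1 = k * t3;                     /* one multiplication per iteration */
    }
}
```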
Abstract
An embodiment of the present invention provides an optimizer for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a primary recurrence element. A computer programmed loop for computing the primary recurrence element and subsequent recurrence elements is an example of a case involving iteratively computing the primary recurrence element. The CPU is operatively coupled to fast operating memory (FOM) and to slow operating memory (SOM). SOM stores the generated optimized source code. The optimized source code includes instructions for instructing said CPU to store a computed value of the primary recurrence element in a storage location of FOM. The instructions also include instructions to consign the computed value of the primary recurrence element from that storage location to another storage location of the FOM.
Description
- This invention relates to optimizing source code and more specifically to optimizing source code having instructions for iterative execution by a central processing unit.
- Known to the inventor, and depicted in FIG. 1, is a computing environment for executing executable code that includes a computer programmed loop having related instructions. The computing environment includes computer system 112 having CPU (Central Processing Unit) 116 and memory 114 operatively connected to CPU 116.
- Memory 114 stores source code 100, compiler 118, executable code 120, and memory storage locations 122. Typically, compiler 118 and source code 100 reside or are stored in long-term memory (not depicted) such as a hard disk or a floppy disk. As directed by a user, CPU 116 transfers compiler 118 and source code 100 from long-term memory to memory 114. Once they are transferred to memory 114, compiler 118 instructs CPU 116 to compile source code 100 to generate executable code 120. Typically, memory 114 is RAM (Random Access Memory).
- Source code 100 includes computer programmed instructions written in a computer programming language. Instructions forming source code 100 are used for instructing CPU 116 to achieve or perform specific tasks. Source code 100 includes start instructions 102 for starting operations of CPU 116, set of instructions 104 (which will be executed once by CPU 116), computer programmed loop 105 having instructions 106 (which will be repeatedly executed “N−1” times by CPU 116) for computing numerical values of various array elements, and stop instructions 110 for stopping execution of source code 100.
- Executable code 120 includes executable instructions related to loop 105 for instructing or directing CPU 116 to compute numerical values for the elements of array A[1], A[2], A[3], . . . , A[N−1], provided that a numerical value for array element A[0] exists prior to the commencement of computation. When CPU 116 executes executable code 120, the compiled instructions related to block 102 are executed first, followed by the compiled instructions related to block 104 and block 105, and finally the compiled instructions of block 110. CPU 116 will repetitively execute the compiled instructions of computer programmed loop 105 a predetermined number of times. For each iterative step of a computer programmed loop, CPU 116 computes a numerical value of an array element (such as A[i]) and then stores the computed numerical value to a memory storage location 122 before computing another numerical value for another array element.
- A computer programmed loop is a series of instructions which are performed repeatedly until some specific condition is satisfied, whereupon a branch instruction is obeyed to exit from the computer programmed loop. The branch instruction specifies the address of the next instruction to be performed by a CPU. Computer programmed loop 105 includes instructions for repeated execution by CPU 116. Computer programmed loops are also known as strongly connected regions. Computer programmed loop 105 includes an induction variable (depicted as “i”) which has a related induction value that changes for each iterated or repeated step of computer programmed loop 105. For each iterated step of computer programmed loop 105, the induction value is changed in a predetermined manner, such as adding a numerical value of ‘1’ to a current induction value related to a current iterated step. As shown in FIG. 1, for each iterative step of the computer programmed loop, CPU 116 performs computation 106, in which a value for an array element A[i] in block 107 is computed by adding the numerical value “1” to the value of a previously computed array element A[i−1]. The computational task is depicted in block 108. Typically, the changed induction value is subsequently used in the next iterative step for modifying the instructions related to that step. Computer programmed loop 105 provides a convenient way to avoid repeatedly expressing repetitive instructions by expressing the instructions once. It is understood that CPU 116 will repeatedly execute the instructions of computer programmed loop 105 ‘N−1’ times. This conveniently allows a software programmer to avoid explicitly writing the instructions ‘N−1’ times. Disadvantageously, a significant amount of CPU processing time will be spent executing the compiled instructions of computer programmed loop 105.
- It will be understood that for each iterative step of computer programmed loop 105, executable code 120 instructs CPU 116 to obtain (load/read) a value of an array element A[i−1] from a specific location in memory storage locations 122, to add a numerical value of “1” to array element A[i−1], and to place (store/write) the computational result (that is, array element A[i]) in another specific location in memory storage locations 122. Disadvantageously, computer programmed loop 105 requires CPU 116, with each iterative step of the induction variable, to load/read various recurrence elements from main memory, compute a value for a primary recurrence element, and then store/write the primary recurrence element to main memory (such as locations 122). Recurrence elements are values which are re-computed for each iterative step of a computation process. An example of a computation process which re-computes values of recurrence elements is a computer programmed loop which computes various array elements (which act like recurrence elements) for each step of the loop. Computing values (such as numerical data or alphanumeric data) in this way is inefficient, because time is wasted whenever the CPU interacts with slow operating memory to perform the multitude of load/read or store/write operations required for each iterative step of the computer programmed loop. Additionally, if storage locations 122 are storage locations in nonvolatile memory (that is, not RAM), the effects are amplified.
- Accordingly, a system which addresses, at least in part, these and other shortcomings is desired.
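Loop 105, as described above, corresponds roughly to the first function in the following C sketch. The drawings use C syntax. The function names are hypothetical. The second function is a hand-written preview of the kind of transformation that the embodiments automate, not the patent's own code. Whether the local actually stays in a register is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Loop 105 as described: load A[i-1], add 1, store A[i] on every step. */
void loop105(long *A, size_t N)
{
    for (size_t i = 1; i < N; i++)
        A[i] = A[i - 1] + 1;
}

/* Same results, but the running value stays in a local that can live in a
   register, so each iteration performs one store and no load. */
void loop105_carried(long *A, size_t N)
{
    assert(N >= 1);               /* A[0] must exist before the loop */
    long t = A[0];
    for (size_t i = 1; i < N; i++) {
        t = t + 1;
        A[i] = t;
    }
}
```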
- An object of the present invention is to reduce the amount of CPU processing time to be spent executing compiled instructions of a computer programmed loop.
- Another object of the present invention is to construct a computer programmed loop that reduces how often a CPU must load/read values of recurrence elements from slow operating memory in order to compute a value for a primary recurrence element.
- An embodiment of the present invention provides an optimizer for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a primary recurrence element. A computer programmed loop for computing the primary recurrence element and subsequent recurrence elements is an example of a case involving iteratively computing the primary recurrence element. The CPU is operatively coupled to fast operating memory (FOM) and to slow operating memory (SOM). SOM stores the generated optimized source code. The optimized source code includes instructions for instructing said CPU to store a computed value of the primary recurrence element in a storage location of FOM. The instructions also include instructions to consign the computed value of the primary recurrence element from that storage location to another storage location of the FOM.
- Another embodiment of the present invention provides an optimization mechanism for optimizing computer programmed instructions which direct a Central Processing Unit (CPU) to iteratively compute values for a primary recurrence element based on the values of various recurrence elements. The unoptimized computer programmed instructions direct the CPU to alternately execute load/read and store/write instructions, which transfer computed recurrence values between main memory and fast operating memory, for each iteration. The optimized computer programmed instructions direct the CPU to execute a single read/load instruction for moving initial recurrence values from main memory to fast operating memory. For each subsequent iteration, the instructions direct the CPU to compute final values of recurrence elements and store/write them to main memory, and to set up subsequently required values of recurrence elements by interchanging loaded values of recurrence elements in fast operating memory. The optimization mechanism can be incorporated into a compiler for compiling the optimized code to generate optimized executable code for execution by the CPU.
- Another embodiment of the present invention provides a compiler for compiling computer programmed instructions that will be iteratively executed by a CPU. An example of computer programmed instructions to be iteratively executed are instructions associated with a computer programmed loop. The computer programmed loop is also known as a ‘strongly connected region’ because the ‘region’ of instructions or code is re-executed in response to the CPU repeatedly executing a branching instruction. The compiler includes mechanisms for detecting when a branching instruction occurs such that a portion of code is being repeated. The compiler can detect whether a value associated with a variable within the portion of code must change with each iterative step (that is, each time the branching operation occurs).
- In a first aspect of the present invention, there is provided an optimizer for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a recurrence element, the CPU operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM) for storing the generated optimized source code, wherein the generated optimized source code comprises instructions for instructing the CPU to store a computed value of the recurrence element in a storage location of the FOM for use in a further iteration.
- In a further aspect of the present invention, there is provided a method for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a recurrence element, the CPU operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM) for storing the generated optimized source code, wherein the generated optimized source code comprises instructions for instructing the CPU to store a computed value of the recurrence element in a storage location of the FOM for use in a further iteration.
- In a further aspect of the present invention, there is provided a computer program product for use in a computer system operatively coupled to a computer readable memory, the computer program product including a computer-readable data storage medium tangibly embodying computer readable program instructions for providing an optimizer for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a recurrence element, the CPU operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM) for storing the generated optimized source code, wherein the generated optimized source code comprises instructions for instructing the CPU to store a computed value of the recurrence element in a storage location of the FOM for use in a further iteration.
- In a further aspect of the present invention, there is provided a computer program product for use in a computer system operatively coupled to a computer readable memory, the computer program product including a computer-readable data storage medium tangibly embodying computer readable program instructions for providing a method for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a recurrence element, the CPU operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM) for storing the generated optimized source code, wherein the generated optimized source code comprises instructions for instructing the CPU to store a computed value of the recurrence element in a storage location of the FOM for use in a further iteration.
- In a further aspect of the present invention, there is provided an optimizer for generating optimized source code from source code including code for instructing a central processing unit (CPU) to compute a primary recurrence element, the CPU operatively coupled to fast operating memory (FOM) and slow operating memory (SOM) for storing the generated optimized source code, including means for replacing instructions to direct the CPU to store a computed value of the primary recurrence element in a storage location of the SOM with instructions to direct the CPU to place the computed value of the primary recurrence element in a storage location of the FOM, and means for inserting instructions to direct the CPU to consign a value of the primary recurrence element loaded in the storage location of the FOM to another storage location of the FOM.
- In a further aspect of the present invention, there is provided a method for generating optimized source code from source code including code for instructing a central processing unit (CPU) to compute a primary recurrence element, the CPU operatively coupled to fast operating memory (FOM) and slow operating memory (SOM) for storing the generated optimized source code, the method including replacing instructions to direct the CPU to store a computed value of the primary recurrence element in a storage location of the SOM with instructions to direct the CPU to place the computed value of the primary recurrence element in a storage location of the FOM, and inserting instructions to direct the CPU to consign a value of the primary recurrence element loaded in the storage location of the FOM to another storage location of the FOM.
- In a further aspect of the present invention, there is provided a computer program product for use in a computer system operatively coupled to a computer readable memory, the computer program product including a computer-readable data storage medium tangibly embodying computer readable program instructions for providing an optimizer for generating optimized source code from source code including code for instructing a central processing unit (CPU) to compute a primary recurrence element, the CPU operatively coupled to fast operating memory (FOM) and slow operating memory (SOM) for storing the generated optimized source code, including means for replacing instructions to direct the CPU to store a computed value of the primary recurrence element in a storage location of the SOM with instructions to direct the CPU to place the computed value of the primary recurrence element in a storage location of the FOM, and means for inserting instructions to direct the CPU to consign a value of the primary recurrence element loaded in the storage location of the FOM to another storage location of the FOM.
- In a further aspect of the present invention there is provided a computer program product for use in a computer system operatively coupled to a computer readable memory, the computer program product including a computer-readable data storage medium tangibly embodying computer readable program instructions for providing a method for generating optimized source code from source code including code for instructing a central processing unit (CPU) to compute a primary recurrence element, the CPU operatively coupled to fast operating memory (FOM) and slow operating memory (SOM) for storing the generated optimized source code, the method including replacing instructions to direct the CPU to store a computed value of the primary recurrence element in a storage location of the SOM with instructions to direct the CPU to place the computed value of the primary recurrence element in a storage location of the FOM, and inserting instructions to direct the CPU to consign a value of the primary recurrence element loaded in the storage location of the FOM to another storage location of the FOM.
- A better understanding of these and other aspects of the embodiments of the present invention can be obtained with reference to the following drawings and description of the preferred embodiments.
- The following figures are examples of the embodiments of the present invention, in which:
- FIG. 1 depicts a computational environment for executing unoptimized executable code for directing a CPU to execute a computer programmed loop;
- FIG. 2 depicts a compiler embodying the present invention for generating optimized executable code for directing a CPU to execute a computer programmed loop;
- FIG. 3 depicts operations of the compiler of FIG. 2;
- FIG. 4 depicts the CPU of FIG. 2 executing the optimized code of FIG. 2;
- FIG. 5 depicts a second compiler embodying the present invention;
- FIG. 6 depicts a third compiler embodying the present invention;
- FIG. 7 depicts a fourth compiler embodying the present invention; and
- FIG. 8 depicts a fifth compiler embodying the present invention.
- It will be understood that, for purposes of illustrating the embodiments of the present invention, the drawings incorporate syntax related to the C computer programming language. However, the present invention is not limited to any particular computer programming language.
- Computer-readable memory can be classified by the speed with which a CPU can access, manipulate, or operate on the contents of the memory. Disk memory (such as floppy disks, hard drives, compact disks and the like) is the slowest type of memory that can be accessed by the CPU; however, disk memory is economical and thus abundantly available. Main memory such as RAM (Random Access Memory) or ROM (Read Only Memory) can be accessed faster by the CPU than disk memory. Cache memory can be accessed faster than main memory, and cache memory is further sub-classified: primary-level cache is faster for the CPU to access than second-level or third-level cache memory. Hardware registers are the fastest type of memory that can be accessed by the CPU; however, hardware registers are expensive to implement. It will be understood that computer-readable instructions that direct the CPU to access slow operating memory (that is, disk memory or main memory) require significantly more computer processing time to execute than instructions that direct the CPU to access fast operating memory (that is, cache memory or hardware registers). Therefore, it is advantageous to provide instructions that direct the CPU to access fast operating memory (such as hardware registers or cache memory) more often than slow operating memory (such as disk memory or main memory).
- Referring to FIG. 2, there is depicted computing environment 250 including compiler 200 embodying aspects of the present invention. Compiler 200 includes an optimization module for optimizing source code. Computing environment 250 also includes computer system 210 having CPU 211 operatively coupled to slow operating memory 212 and fast operating memory 213.
- Stored or residing in memory 212 at times during operation of computer system 210 are compiler 200, source code 202, block 204 (including code optimized at various stages of optimization), and optimized executable code 206 generated from optimized source code provided by the optimizer module of compiler 200. Source code 202 includes a computer programmed loop. A user directs compiler 200 (which includes embodied aspects of the present invention) to compile source code 202 to generate optimized executable code 206. The optimizer module (not depicted) of compiler 200 optimizes source code 202 to generate the various stages of optimization depicted in block 204. The task of optimizing source code 202 is described below. It will be understood that the task of optimizing includes rearranging instructions, adding instructions, and/or removing instructions related to source code 202.
- Source code 202 includes a computer programmed loop including an induction variable “i” having an induction value. The programmed loop includes computer-readable programmed instructions for computing data. For example, the instructions depicted in source code 202 will be used by CPU 211 for iteratively computing numerical values of array elements. Optimized executable code 206 directs or instructs CPU 211 to achieve specific computational tasks as will be described below.
- In the preferred embodiment, memory 212 includes RAM or other slow operating computer-readable memory (such as disk memory) operationally coupled to CPU 211. Also coupled to CPU 211 is fast operating memory, which includes a set of hardware registers.
- Compiler 200 reads source code 202, optimizes source code 202 (resulting in the various optimization stages depicted in block 204, that is, stages 214, 218, 222, and 226), and then generates optimized executable code 206. When executed by CPU 211, optimized executable code 206 instructs CPU 211 to perform the load/read instructions associated with each computational iteration of the computer programmed loop using fast operating memory. In the preferred embodiment, optimized executable code 206 instructs CPU 211 to use hardware registers (not depicted) operationally coupled to CPU 211 for loading/reading computed data associated with each iterative step of the optimized computer programmed loop (depicted in block 226). In another preferred embodiment, optimized executable code 206 instructs CPU 211 to use cache memory (not depicted) operationally coupled to CPU 211 for loading/reading computed data associated with each iterative step of the computer programmed loop.
- Source code 202 instructs or directs CPU 211 to iteratively (that is, repeatedly) execute the computational instructions of a computer programmed loop for “N−2” iterative steps. During the execution of the computer programmed loop, induction variable ‘i’ starts with a numerical value of ‘2’, increases by a numerical value of ‘1’ for each iterative step, and ends with a numerical value of ‘N−1’. After the iterative step in which i=(N−1), the induction value becomes N, the loop condition i<N is no longer satisfied, and CPU 211 stops further iterative executions of the computer programmed loop of source code 202. The computer programmed loop of source code 202 has a recurrence length of “3”, where recurrence length is the number of recurrence elements used in a programmed loop. Each recurrence element has a corresponding numerical value for each iterative step of the computer programmed loop. For example, the recurrence elements of source code 202 are A[i], A[i−1], and A[i−2]. Recurrence elements are values which are re-computed for each iterative step of a computation process. An example of a computation process which re-computes values of recurrence elements is a computer programmed loop which computes various array elements (which act like recurrence elements) for each step of the loop.
- The following description identifies recurrence elements for the case when the induction value of the induction variable “i” increases with each iterative step of a computer programmed loop having recurrence elements A[i], A[i−1], A[i−2]. The largest or highest recurrence element (for example, A[i]) is called the primary feeder or primary recurrence element. The remaining recurrence elements are named in descending order, such as secondary recurrence element A[i−1] and tertiary recurrence element A[i−2], and so on; or they are simply called subsequent recurrence elements A[i−1], A[i−2], and so on.
- The following description identifies recurrence elements for the case when the induction value of the induction variable “i” decreases with each iterative step of a computer programmed loop having recurrence elements A[i], A[i+1], A[i+2]. The primary feeder or primary recurrence element is array element A[i]. The remaining recurrence elements are named in descending order, such as secondary recurrence element A[i+1] and tertiary recurrence element A[i+2], and so on; or they are simply called subsequent recurrence elements A[i+1], A[i+2], and so on.
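The decreasing case can be sketched in C with the same register rotation. The sketch assumes a hypothetical recurrence A[i] = A[i+1] + A[i+2]. The initial values are loaded from the high end of the array, and the loop counts down to 0. The function names are illustrative. The idiom `i-- > 0` relies on well-defined unsigned wraparound to stop the loop after `i` reaches 0.

```c
#include <assert.h>
#include <stddef.h>

/* Reference form: reloads A[i+1] and A[i+2] on every step. */
void down_naive(long *A, size_t N)
{
    for (size_t i = N - 2; i-- > 0; )       /* i runs N-3, ..., 0 */
        A[i] = A[i + 1] + A[i + 2];         /* hypothetical recurrence */
}

/* Register-carried form: primary A[i], secondary A[i+1], tertiary A[i+2]. */
void down_rotated(long *A, size_t N)
{
    assert(N >= 2);                         /* A[N-1] and A[N-2] must exist */
    long t2 = A[N - 1];                     /* A[i+2] for the first step */
    long t1 = A[N - 2];                     /* A[i+1] for the first step */
    for (size_t i = N - 2; i-- > 0; ) {
        long t3 = t1 + t2;                  /* primary recurrence element */
        A[i] = t3;
        t2 = t1;
        t1 = t3;
    }
}
```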
- Referring to the
exemplary source code 202, since the induction variable “i” increases for each iterative step, the primary feeder is array element A[i], the secondary feeder is array element A[i−1], and the tertiary feeder is array element A[i−2]. Alternatively, subsequent recurrence elements are array elements A[i−1] and A[i−2]. -
Compiler 200 begins to optimize source code 202 by identifying a computer programmed loop, identifying the induction variable associated with the identified computer programmed loop, determining the primary and subsequent recurrence elements associated with the identified induction variable, and, when compiler 200 identifies a recurrence pattern, converting the instructions related to the identified computer programmed loop. The recurrence pattern interrelates the recurrence elements. Once the recurrence pattern is identified, source code 202 is optimized in several stages, each depicted as a block. How source code 202 is optimized (prior to generating optimized executable code 206) is described below.
- Referring to FIG. 3, there are depicted the operations of compiler 200 of FIG. 2. The operations depicted in flowchart 300 are performed by compiler 200 unless stated otherwise.
- S302 indicates the start of operations of compiler 200. In S304, compiler 200 identifies a computer programmed loop in source code 202. Compiler 200 identifies the induction variable related to the identified computer programmed loop (S306). Compiler 200 identifies a set of recurrence elements related to the identified induction variable (S308).
- In S310, compiler 200 ascertains whether the identified set of recurrence elements is related by a recurrence pattern. The recurrence pattern includes a primary recurrence element and at least one subsequent recurrence element (secondary, tertiary, and so on), and the recurrence elements use the same induction variable. Compiler 200 determines whether the computer programmed loop includes a primary recurrence element and subsequent recurrence elements. If compiler 200 detects that the primary and subsequent recurrence elements are not included in the computer programmed loop, processing continues to S320, in which compiler 200 attempts to identify another induction variable that may exist in the identified loop of source code 202. If compiler 200 detects that the primary and subsequent recurrence elements are included in the computer programmed loop, processing continues to S312, in which instructions related to the computer programmed loop are converted into instructions related to block 214. Referring to source code 202, since compiler 200 identifies the recurrence pattern “A[i], A[i−1], A[i−2]”, in which the primary recurrence element is “A[i]” and the subsequent recurrence elements (also known as feeders) are “A[i−1], A[i−2]”, compiler 200 generates the instructions related to block 214. Referring to block 214, compiler 200 places the initial instances (values) of the subsequent recurrence elements outside of the identified programmed loop, while the primary recurrence element “A[i]” remains in the computer programmed loop. The initial values of subsequent recurrence elements “A[i−1]” and “A[i−2]” are placed outside of the identified computer programmed loop, immediately before it commences. The relocated subsequent recurrence elements are depicted in block 216.
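Operations S308 to S312 can be sketched on a toy representation in which each array reference inside the loop is a tuple (array name, induction variable, constant offset). The tuple layout and the function names `find_recurrence_pattern` and `initial_feeder_indices` are assumptions made for illustration; a real compiler would work on its own intermediate representation and would also have to establish that the references are not aliased.

```python
from collections import defaultdict


def find_recurrence_pattern(refs, step=1):
    """S310 (sketch): group references by (array, induction variable) and keep
    the groups that contain a primary element plus at least one subsequent one.

    refs: iterable of (array, induction_var, offset) tuples; ("A", "i", -1)
    stands for A[i-1]. Returns {(array, var): offsets ranked primary first}.
    """
    groups = defaultdict(set)
    for array, var, offset in refs:
        groups[(array, var)].add(offset)
    patterns = {}
    for key, offsets in groups.items():
        if len(offsets) >= 2:                      # primary + at least one subsequent
            patterns[key] = sorted(offsets, reverse=step > 0)
    return patterns


def initial_feeder_indices(ranked_offsets, start):
    """S312 (sketch): array indices loaded into fast memory before the first
    iteration, i.e. each subsequent recurrence element evaluated at i = start."""
    return [start + offset for offset in ranked_offsets[1:]]


# Source code 202 (assumed body A[i] = A[i-1] + A[i-2]), i starting at 2
refs = [("A", "i", 0), ("A", "i", -1), ("A", "i", -2)]
pattern = find_recurrence_pattern(refs)
print(pattern)                                          # {('A', 'i'): [0, -1, -2]}
print(initial_feeder_indices(pattern[("A", "i")], 2))   # [1, 0] -> A[1] and A[0]
```

The two indices returned for source code 202, A[1] and A[0], are the initial values later held in T1 and T2; references to different arrays never form a pattern in this sketch.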
In block 214, instruction identifiers that identify the contents of locations in fast operating memory (such as hardware registers T1 and T2) are equated to the values of the subsequent recurrence elements for the case when the induction variable equals its start value in the first iterative step. In the depicted example, the initial numerical value of the induction variable is 2, because the induction variable starts with a numerical value of 2 in the computer programmed loop. Initial values are then computed for registers T1 and T2 for “i”=2, as depicted in block 216: the initial numerical values of T1 and T2 are A[1] and A[0], respectively. Operation S312 converts instructions related to source code 202 into instructions related to block 214.
- Referring to operation S314, compiler 200 replaces the recurrence elements inside the identified computer programmed loop with instruction identifiers that identify hardware registers. Block 218 includes block 220, which contains the modified instructions inside the computer programmed loop, in which the recurrence elements have been replaced by instruction identifiers that identify locations of contents in fast operating memory (such as hardware registers and the like). In block 220, compiler 200 has replaced the occurrences of the recurrence elements located inside the computer programmed loop with instruction identifiers that identify hardware registers T1 and T2. Operation S314 converts instructions related to block 214 into instructions related to block 218.
- Referring to operation S316, compiler 200 inserts, inside the identified programmed loop, another instruction identifier that identifies a location in fast operating memory to hold the value of the primary feeder or primary recurrence element. Referring to block 222, the primary recurrence element A[i] is assigned to another location in fast operating memory (such as a third hardware register) T3, in which T3 is equated to the computational operation T1+T2 (as depicted in block 224). Operation S316 converts instructions related to block 218 into instructions related to block 222.
- In S318, compiler 200 inserts, at the end of the computer programmed loop body, instructions that consign values between the fast operating memory locations to set up the computation for the next iterative step of the computer programmed loop. Referring to block 228, the value of register T2 is updated to equal the value of register T1, and then the value of register T1 is updated to equal the value of register T3. In the next iteration, the values of registers T2 and T1 are used when computing the value of register T3. This operation avoids repeated load/read operations from slow operating memory in subsequent iterative steps of the computer programmed loop. Operation S318 converts instructions related to block 222 into instructions related to block 226.
- In S320, compiler 200 determines whether there is another induction variable in the identified computer programmed loop. If compiler 200 detects another induction variable in the identified computer programmed loop, processing continues to S306, in which case instructions related to the newly identified induction variable are optimized. If compiler 200 detects no other induction variable in the identified computer programmed loop, processing continues to S322.
- In S322, compiler 200 determines whether source code 202 includes another computer programmed loop. If compiler 200 detects another computer programmed loop, processing continues to S304, in which case compiler 200 further optimizes instructions related to the newly identified computer programmed loop. If compiler 200 does not detect any other computer programmed loop, operation continues to S324, in which case compiler 200 stops optimizing source code 202 and begins compiling the instructions related to block 226 to generate optimized executable code 206.
- Special care must be taken when memory is aliased. Aliased memory is memory shared with other tasks, and its contents may change in unexpected ways if due care is not taken. To prevent aliasing problems, memory should be reserved for performing programmed loops, or special attention should be paid to ensuring that values in memory are not corrupted by other tasks that use the shared aliased memory. Unchecked aliased memory may corrupt the values of a recurrence pattern. That is, operation S308 should ensure that the memory is protected so that unpredictable changes in the values of the recurrence elements do not occur. Memory sharing or aliasing may require that the recurrence values be transferred between memory (that is, slow operating memory) and fast operating memory, in which case the recurrence values are not kept constantly in fast operating memory.
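Taken together, operations S312 to S318 turn the loop of source code 202 into the form of block 226. The Python sketch below mirrors that form, again assuming the loop body A[i] = A[i−1] + A[i−2]; local variables `t1`, `t2`, and `t3` stand in for hardware registers T1, T2, and T3 (Python gives no control over registers, so this shows the data flow only, not the speed-up). The function names are hypothetical. The transformation is valid only under the no-aliasing condition discussed above: nothing else may write A[i−1] or A[i−2] while the loop runs.

```python
def optimized_loop(A, N):
    """Register form (block 226) of the assumed loop A[i] = A[i-1] + A[i-2], i = 2 .. N-1."""
    if N <= 2:
        # Zero-trip loop: skip the hoisted loads so they cannot touch
        # elements the original loop would never read.
        return A
    # S312: load the subsequent recurrence elements once, before the loop.
    t1 = A[1]                      # T1 holds A[i-1] for i = 2
    t2 = A[0]                      # T2 holds A[i-2] for i = 2
    for i in range(2, N):
        # S314/S316: compute the primary element from registers only.
        t3 = t1 + t2               # T3 = T1 + T2
        A[i] = t3                  # the only memory access per step: one store
        # S318: consign values for the next step (order matters).
        t2 = t1                    # T2 = T1
        t1 = t3                    # T1 = T3
    return A


def reference_loop(A, N):
    """Unoptimized form, used only to check equivalence."""
    for i in range(2, N):
        A[i] = A[i - 1] + A[i - 2]
    return A


# Equivalence check over several trip counts.
for n in range(2, 12):
    seed = [2, 5] + [0] * (n - 2)
    assert optimized_loop(list(seed), n) == reference_loop(list(seed), n)

print(optimized_loop([1, 1, 0, 0, 0, 0], 6))   # [1, 1, 2, 3, 5, 8]
```

Swapping the two S318 assignments (T1 = T3 before T2 = T1) would make T2 equal to T3 and break the recurrence, which is why block 228 updates T2 first.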
- Referring to FIG. 4, there is depicted the computing environment of FIG. 2, in which CPU 211 is ready to execute optimized executable code 206 to compute the values of a computer programmed loop included in optimized executable code 206. After the initial loads, the operands needed in each iterative step of the computer programmed loop are read from fast operating memory 213. By using fast operating memory 213 for each iterative step, CPU 211 avoids executing load/read operations that transfer numerical values from slow operating memory 212 to fast operating memory 213 for each subsequent iterative step of the computer programmed loop. It will be appreciated that transfer operations (that is, store/write or load/read operations) that move numerical values from one location of fast operating memory 213 to another are performed faster than transfer operations that move numerical values from one storage location in slow operating memory 212 to another.
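The saving described here can be made concrete by counting memory accesses. The sketch below wraps a Python list in a hypothetical `CountingMemory` class that tallies reads and writes, standing in for slow operating memory 212, and runs both the original and the register-based form of the assumed loop A[i] = A[i−1] + A[i−2]. It counts accesses only; it does not model timing.

```python
class CountingMemory:
    """Hypothetical stand-in for slow operating memory: a list that counts
    every load (read) and store (write) made through indexing."""

    def __init__(self, values):
        self.cells = list(values)
        self.loads = 0
        self.stores = 0

    def __getitem__(self, index):
        self.loads += 1
        return self.cells[index]

    def __setitem__(self, index, value):
        self.stores += 1
        self.cells[index] = value


def run_original(mem, N):
    for i in range(2, N):
        mem[i] = mem[i - 1] + mem[i - 2]      # 2 loads + 1 store per step


def run_register_form(mem, N):
    if N <= 2:
        return
    t1, t2 = mem[1], mem[0]                   # T1, T2: 2 loads, once
    for i in range(2, N):
        t3 = t1 + t2                          # T3 = T1 + T2
        mem[i] = t3                           # 1 store per step, no loads
        t2, t1 = t1, t3                       # T2 = T1, T1 = T3


N = 10
a = CountingMemory([1, 1] + [0] * (N - 2))
b = CountingMemory([1, 1] + [0] * (N - 2))
run_original(a, N)
run_register_form(b, N)
assert a.cells == b.cells
print(a.loads, a.stores)   # 16 8  -> 2*(N-2) loads, N-2 stores
print(b.loads, b.stores)   # 2 8   -> 2 loads, N-2 stores
```

The number of stores is unchanged, because every A[i] must still reach memory; only the loads of the subsequent recurrence elements disappear from the loop.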
Slow operating memory 212 includes memory portion 402, which has memory storage locations for storing the numerical values of array elements A[1], A[2], ..., A[i]. Memory storage locations are depicted for the values of array elements A[1] to A[4].
- In the preferred embodiment, fast operating memory 213 includes units of fast operating memory depicted as T1, T2, and T3. Registers T1, T2, and T3 are depicted in columns 406A, 406B, and 406C, respectively, and rows 404A, 404B, and 404C depict the register contents for the iterative steps “i”=2, 3, and 4, respectively.
- When executable code 206 is executed by CPU 211, CPU 211 performs load/read operations to transfer the values of A[0] and A[1] from memory portion 402 to hardware registers T2 and T1, respectively. The transfer of A[1] and A[0] into the hardware registers is depicted in row 404A and columns 406A and 406B.
- The computer programmed loop is then ready to be executed by CPU 211 from “i”=2 to “i”=(N−1). For “i”=2, a numerical value for T3 is computed, in which T3=T1+T2=A[1]+A[0]. CPU 211 then performs a store/write operation that transfers the value in hardware register T3 to the memory storage location in memory portion 402 for array element A[2]. Referring to the intersection of column 406B and row 404A, the value of hardware register T1 (that is, A[1]) is consigned to hardware register T2 by the instruction T2=T1. Referring to the intersection of column 406A and row 404A, the value of hardware register T3 (that is, A[2]) is consigned to hardware register T1 by the instruction T1=T3.
- Referring to block 226, for the next iterative step, in which “i”=3, the value of hardware register T3 is set to the sum of registers T1 and T2, that is, A[2]+A[1] (by the instruction T3=T1+T2), as depicted at the intersection of row 404B and column 406C. The value of hardware register T3 is stored/written to the memory location for array element A[3] in memory portion 402, as directed by the instruction A[3]=T3. The value of register T1 is then consigned to register T2, and the value of register T3 to register T1 (by the instructions T2=T1 and T1=T3), as depicted in row 404B, columns 406A and 406B.
- For the next iterative step, in which “i”=4, the value of hardware register T3 is set to the sum of registers T1 and T2, that is, A[3]+A[2] (by the instruction T3=T1+T2), as depicted at the intersection of row 404C and column 406C. The value of hardware register T3 is stored/written to the memory location for array element A[4] in memory portion 402, as directed by the instruction A[4]=T3. The value of register T1 is then consigned to register T2, and the value of register T3 to register T1, for use by the next iterative step of the programmed loop (by the instructions T2=T1 and T1=T3), as depicted in row 404C and columns 406A and 406B.
- Referring to FIG. 5, there is depicted source code 502, in which a recurrence element is missing from the computation of array element A[i] in each iterative step. Source code 502 is used as an example of how an aspect of the present invention handles recurrence elements that are missing from source code; here the secondary recurrence element is missing. Even though a recurrence element is missing, the number of hardware registers required for iteratively computing the primary recurrence element is still equal to the recurrence length. For source code 502, the recurrence length is 3, and hence three hardware registers are required. A value for the secondary recurrence element is still required in each iteration of “i”, because that value becomes the tertiary recurrence element in the next iterative step. The transformation of the blocks depicted in FIG. 5 by compiler 504 follows the operations depicted in flowchart 300 of FIG. 3.
- Referring to FIG. 6, there is depicted memory 212 storing source code 602 having a computer programmed loop, compiler 606, various stages of optimization 607, and optimized executable code 610. Compiler 606 includes an optimization module (not depicted) for optimizing source code 602. Stages of optimization 607 depict optimized source code 608 and 609. When a user executes compiler 606, compiler 606 optimizes the instructions related to source code 602 to generate optimized source code 609, and then compiles optimized source code 609 to generate optimized executable code 610. Source code 602 includes a computer programmed loop having a recurrence length of 2, with primary and secondary recurrence elements A[i] and A[i−1], respectively.
- When compiler 606 uses the operations depicted in flowchart 300 of FIG. 3, compiler 606 optimizes source code 602 to generate optimized source code 608. Once optimized source code 608 is generated, compiler 606 further optimizes it to generate optimized source code 609. It will be appreciated that a further enhancement can be achieved by reducing the number of copy operations when the value of register T2 is not required after its initial use in the loop. This improvement (which also minimizes the number of hardware registers) can be realized either while optimizing the instructions by following flowchart 300 of FIG. 3 or in a subsequent optimization phase. The optimization module of compiler 606 thus uses a minimum number of storage locations of the fast operating memory.
- Referring to FIG. 7, there is depicted source code 702 for computing a function, such as a square root function. Memory 212 stores source code 702 having a computer programmed loop, compiler 706, optimized source code 708, and optimized executable source code 710. Compiler 706 optimizes source code 702 to generate optimized source code 708, and then compiles optimized source code 708 to generate optimized executable source code 710. The computer programmed loop has a recurrence length of 2 and includes a primary and a secondary recurrence element.
- Referring to optimized source code 708, the instructions related to block 712 perform a single function call before execution of the computer programmed loop. The instructions related to block 714 show that, in each iterative step of the computer programmed loop, a single function call is performed to compute the value of A[i]. The instructions related to block 716 show that, in each iterative step of the programmed loop, the next value of the recurrence element is to be computed. It will be appreciated that one function call has thereby been eliminated from each iterative step, and that recurrence elements are not restricted to array references: the optimizer module of compiler 706 also handles source code that directs the CPU to compute recurrence elements by function calls.
- Referring to FIG. 8, there is depicted memory 212 storing source code 802, compiler 806 (including an optimizer module, which is not depicted), stages of optimization 807, and optimized executable code 812. Stages of optimization 807 include optimized source code 808 and 810 generated by compiler 806. When a user executes compiler 806, compiler 806 optimizes source code 802 to generate optimized source code 808, further optimizes optimized source code 808 to generate optimized source code 810, and then compiles optimized source code 810 to generate optimized executable code 812.
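Source code 802 is not reproduced in this text, so the sketch below assumes a hypothetical loop B[i] = k*A[i] + k*A[i−1], in which multiplication by the loop-invariant k is applied to both recurrence elements. The functions `stage_802`, `stage_808`, and `stage_810` (hypothetical names) mirror the three stages: the first-order correction carries A[i] forward in a register so that A[i−1] is not reloaded, and the second-order correction carries the product k*A[i] forward instead, so each product is computed once rather than twice. The `mults` counter exists only to make the saving visible.

```python
def stage_802(A, k):
    """Hypothetical source loop: B[i] = k*A[i] + k*A[i-1] for i = 1 .. len(A)-1."""
    B = [0] * len(A)
    mults = 0
    for i in range(1, len(A)):
        B[i] = k * A[i] + k * A[i - 1]        # loads A[i] and A[i-1]; 2 multiplies
        mults += 2
    return B, mults


def stage_808(A, k):
    """First-order correction: A[i-1] is no longer reloaded from memory."""
    B = [0] * len(A)
    mults = 0
    if len(A) < 2:
        return B, mults
    t1 = A[0]                                 # feeder loaded once, before the loop
    for i in range(1, len(A)):
        t2 = A[i]                             # primary recurrence element
        B[i] = k * t2 + k * t1                # still 2 multiplies per step
        mults += 2
        t1 = t2                               # carry A[i] into the next step
    return B, mults


def stage_810(A, k):
    """Second-order correction: the loop-invariant multiply by k is applied
    once per element, and the product, not the raw value, is carried forward."""
    B = [0] * len(A)
    mults = 0
    if len(A) < 2:
        return B, mults
    t1 = k * A[0]                             # feeder and invariant work hoisted
    mults += 1
    for i in range(1, len(A)):
        t2 = k * A[i]                         # 1 multiply per step
        mults += 1
        B[i] = t2 + t1
        t1 = t2                               # carry k*A[i] into the next step
    return B, mults


A = [3, 1, 4, 1, 5, 9, 2, 6]
for stage in (stage_802, stage_808, stage_810):
    print(stage.__name__, stage(A, 10))
# All three stages produce B = [0, 40, 50, 50, 60, 140, 110, 80];
# the multiply counts are 14, 14, and 8.
```

Because the products are added in the same order in every stage, the results are identical even for floating-point data; only the amount of work inside the loop changes.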
Source code 802 includes instructions for a second-order computation of a recurrence element, whereas the previous embodiments depicted a first-order computation of the recurrence element. Optimized source code 808 depicts instructions optimized for a first-order correction (that is, the elimination of a load/read operation). Optimized source code 810 depicts instructions optimized for a second-order correction.
- This embodiment needs operations beyond those depicted in flowchart 300 of FIG. 3. After operation S310 (that is, identifying a recurrence pattern), compiler 806 finds any loop-invariant computation applied to all recurrence elements. Operation S312 is replaced with the following operation: compiler 806 places all recurrence elements, together with the loop-invariant computation applied to them, outside of the computer programmed loop. The replacement operation then replaces each recurrence element together with its loop-invariant computation, and the insertion operation holds the value of the primary feeder with any identified loop-invariant computation applied to it.
- The present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Therefore, the presently discussed embodiments are considered to be illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims (14)
1: An optimizer stored within a memory of a computer system for optimizing source code, comprising:
means for generating the optimized source code having first instructions for instructing a central processing unit (CPU) to iteratively compute values for a recurrence element, said CPU operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM) for storing said generated optimized source code; and
means for generating the optimized source code having second instructions for instructing said CPU to store a computed value of said recurrence element in a storage location of said FOM for use in a further iteration.
2: The optimizer of claim 1 wherein said recurrence element is a primary recurrence element, and further comprising means for generating said generated optimized source code having third instructions for instructing said CPU to consign, for use in a further iteration step, said computed value of said primary recurrence element from said storage location to another storage location of said FOM.
3: The optimizer of claim 2 further comprising means for causing said CPU to iteratively compute values for a subsequent recurrence element, and means for generating optimized source code having fourth instructions for instructing said CPU to compute a value of said primary recurrence element using a computed value of said subsequent recurrence element located in other storage locations of said FOM.
4: The optimizer of claim 2 wherein said another storage location contains at least one subsequent recurrence element.
5: The optimizer of claim 3 further comprising means for generating said optimized source code having fifth instructions for instructing said CPU to load an initial value of said subsequent recurrence element from said SOM to said FOM prior to computing an initial value of said primary recurrence element.
6: The optimizer of claim 3 wherein said subsequent recurrence element is a secondary recurrence element.
7-15. (canceled)
16: A method for optimizing source code, comprising:
instructing, by optimized source code, in a first source code instruction, a central processing unit (CPU) to iteratively compute values for a recurrence element; and
instructing, by said optimized source code, in a second source code instruction, the CPU to store a computed value of said recurrence element in a storage location of fast operating memory (FOM) for use in a further iteration by replacing said recurrence element with an instruction identifier for identifying a particular storage location within said FOM, wherein said CPU is operatively coupled to said FOM and operatively coupled to slow operating memory (SOM) for storing said optimized source code.
17: The method of claim 16 wherein said optimized source code is compiled and executed as machine code on said CPU.
18: The method of claim 17 wherein said recurrence element is a primary recurrence element, and said method further comprises consigning, by said optimized source code, in a third source code instruction, for use in a further iteration step, said computed value of said primary recurrence element from said storage location to another storage location of said FOM.
19: The method of claim 18 further comprising:
instructing, by said optimized source code, in a fourth source code instruction, said CPU to:
iteratively compute values for a subsequent recurrence element; and
compute a value of said primary recurrence element using a computed value of said subsequent recurrence element located in other storage locations of said FOM.
20-31. (canceled)
32: A computer program product for use in a computer system operatively coupled to a computer readable memory, the computer program product including a computer-readable data storage medium tangibly embodying computer readable program instructions for providing an optimizer, comprising:
first instructions for instructing a central processing unit (CPU) to iteratively compute values for a recurrence element, said CPU operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM) for storing said generated optimized source code; and
second instructions for instructing said CPU to store a computed value of said recurrence element in a storage location of said FOM for use in a further iteration.
33-50. (canceled)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/870,121 US20080028381A1 (en) | 2001-12-18 | 2007-10-10 | Optimizing source code for iterative execution |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002365375A CA2365375A1 (en) | 2001-12-18 | 2001-12-18 | Optimizing source code for iterative execution |
CA2365375 | 2001-12-18 | ||
US10/314,094 US7340733B2 (en) | 2001-12-18 | 2002-12-05 | Optimizing source code for iterative execution |
US11/870,121 US20080028381A1 (en) | 2001-12-18 | 2007-10-10 | Optimizing source code for iterative execution |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/314,094 Continuation US7340733B2 (en) | 2001-12-18 | 2002-12-05 | Optimizing source code for iterative execution |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080028381A1 true US20080028381A1 (en) | 2008-01-31 |
Family
ID=4170874
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/314,094 Active 2024-11-04 US7340733B2 (en) | 2001-12-18 | 2002-12-05 | Optimizing source code for iterative execution |
US11/870,121 Abandoned US20080028381A1 (en) | 2001-12-18 | 2007-10-10 | Optimizing source code for iterative execution |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/314,094 Active 2024-11-04 US7340733B2 (en) | 2001-12-18 | 2002-12-05 | Optimizing source code for iterative execution |
Country Status (2)
Country | Link |
---|---|
US (2) | US7340733B2 (en) |
CA (1) | CA2365375A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070074186A1 (en) * | 2005-09-29 | 2007-03-29 | Intel Corporation | Method and system for performing reassociation in software loops |
US20070079302A1 (en) * | 2005-09-30 | 2007-04-05 | Intel Corporation | Method for predicate promotion in a software loop |
US8260602B1 (en) * | 2006-11-02 | 2012-09-04 | The Math Works, Inc. | Timer analysis and identification |
US8359586B1 (en) * | 2007-08-20 | 2013-01-22 | The Mathworks, Inc. | Code generation |
CN103865093A (en) * | 2012-12-18 | 2014-06-18 | 中国第一汽车股份有限公司 | Pretreatment method for lithium ion battery separator surface |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2365375A1 (en) * | 2001-12-18 | 2003-06-18 | Ibm Canada Limited-Ibm Canada Limitee | Optimizing source code for iterative execution |
US7493609B2 (en) * | 2004-08-30 | 2009-02-17 | International Business Machines Corporation | Method and apparatus for automatic second-order predictive commoning |
KR100854032B1 (en) * | 2007-02-09 | 2008-08-26 | 삼성전자주식회사 | Memory system and data storaging method thereof |
US8813057B2 (en) * | 2007-03-31 | 2014-08-19 | Intel Corporation | Branch pruning in architectures with speculation support |
JP2009181558A (en) * | 2008-02-01 | 2009-08-13 | Panasonic Corp | Program conversion device |
US8495607B2 (en) * | 2010-03-01 | 2013-07-23 | International Business Machines Corporation | Performing aggressive code optimization with an ability to rollback changes made by the aggressive optimizations |
US10180829B2 (en) * | 2015-12-15 | 2019-01-15 | Nxp Usa, Inc. | System and method for modulo addressing vectorization with invariant code motion |
FR3056782B1 (en) * | 2016-09-26 | 2019-12-13 | Airbus Operations | GENERATION OF APPLICABLE CODES FROM A FORMAL SPECIFICATION |
US10108406B2 (en) * | 2016-10-24 | 2018-10-23 | International Business Machines Corporation | Linking optimized entry points for local-use-only function pointers |
US10108404B2 (en) * | 2016-10-24 | 2018-10-23 | International Business Machines Corporation | Compiling optimized entry points for local-use-only function pointers |
US10365904B2 (en) * | 2017-09-29 | 2019-07-30 | Microsoft Technology Licensing, Llc | Interactive code optimizer |
CN109587079B (en) * | 2018-12-14 | 2020-11-20 | 北京物芯科技有限责任公司 | OAM service processing system and method |
- 2001-12-18: CA application CA002365375A, published as CA2365375A1, not active (abandoned)
- 2002-12-05: US application US10/314,094, patent US7340733B2, active
- 2007-10-10: US application US11/870,121, published as US20080028381A1, not active (abandoned)
Patent Citations (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5136696A (en) * | 1988-06-27 | 1992-08-04 | Prime Computer, Inc. | High-performance pipelined central processor for predicting the occurrence of executing single-cycle instructions and multicycle instructions |
US5778423A (en) * | 1990-06-29 | 1998-07-07 | Digital Equipment Corporation | Prefetch instruction for improving performance in reduced instruction set processor |
US5333283A (en) * | 1991-10-29 | 1994-07-26 | International Business Machines Corporation | Case block table for predicting the outcome of blocks of conditional branches having a common operand |
US5867683A (en) * | 1993-10-29 | 1999-02-02 | Advanced Micro Devices, Inc. | Method of operating a high performance superscalar microprocessor including a common reorder buffer and common register file for both integer and floating point operations |
US5751981A (en) * | 1993-10-29 | 1998-05-12 | Advanced Micro Devices, Inc. | High performance superscalar microprocessor including a speculative instruction queue for byte-aligning CISC instructions stored in a variable byte-length format |
US5867682A (en) * | 1993-10-29 | 1999-02-02 | Advanced Micro Devices, Inc. | High performance superscalar microprocessor including a circuit for converting CISC instructions to RISC operations |
US5704053A (en) * | 1995-05-18 | 1997-12-30 | Hewlett-Packard Company | Efficient explicit data prefetching analysis and code generation in a low-level optimizer for inserting prefetch instructions into loops of applications |
US5805863A (en) * | 1995-12-27 | 1998-09-08 | Intel Corporation | Memory pattern analysis tool for use in optimizing computer program code |
US5794028A (en) * | 1996-10-17 | 1998-08-11 | Advanced Micro Devices, Inc. | Shared branch prediction structure |
US6226790B1 (en) * | 1997-02-28 | 2001-05-01 | Silicon Graphics, Inc. | Method for selecting optimal parameters for compiling source code |
US6243864B1 (en) * | 1997-07-17 | 2001-06-05 | Matsushita Electric Industrial Co., Ltd. | Compiler for optimizing memory instruction sequences by marking instructions not having multiple memory address paths |
US6351849B1 (en) * | 1999-05-21 | 2002-02-26 | Intel Corporation | Compiler optimization through combining of memory operations |
US6539541B1 (en) * | 1999-08-20 | 2003-03-25 | Intel Corporation | Method of constructing and unrolling speculatively counted loops |
US6748589B1 (en) * | 1999-10-20 | 2004-06-08 | Transmeta Corporation | Method for increasing the speed of speculative execution |
US20020100031A1 (en) * | 2000-01-14 | 2002-07-25 | Miguel Miranda | System and method for optimizing source code |
US20010016901A1 (en) * | 2000-02-08 | 2001-08-23 | Siroyan Limited | Communicating instruction results in processors and compiling methods for processors |
US20010020294A1 (en) * | 2000-03-03 | 2001-09-06 | Hajime Ogawa | Optimization apparatus that decreases delays in pipeline processing of loop and computer-readable storage medium storing optimization program |
US6651247B1 (en) * | 2000-05-09 | 2003-11-18 | Hewlett-Packard Development Company, L.P. | Method, apparatus, and product for optimizing compiler with rotating register assignment to modulo scheduled code in SSA form |
US6564297B1 (en) * | 2000-06-15 | 2003-05-13 | Sun Microsystems, Inc. | Compiler-based cache line optimization |
US7000227B1 (en) * | 2000-09-29 | 2006-02-14 | Intel Corporation | Iterative optimizing compiler |
US20040205320A1 (en) * | 2001-03-20 | 2004-10-14 | International Business Machines Corporation | Method and apparatus for refining an alias set of address taken variables |
US20030079209A1 (en) * | 2001-05-30 | 2003-04-24 | International Business Machines Corporation | Code optimization |
US20040255284A1 (en) * | 2001-09-18 | 2004-12-16 | Shiro Kobayashi | Compiler |
US7340733B2 (en) * | 2001-12-18 | 2008-03-04 | International Business Machines Corporation | Optimizing source code for iterative execution |
US20040064811A1 (en) * | 2002-09-30 | 2004-04-01 | Advanced Micro Devices, Inc. | Optimal register allocation in compilers |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070074186A1 (en) * | 2005-09-29 | 2007-03-29 | Intel Corporation | Method and system for performing reassociation in software loops |
US7774766B2 (en) * | 2005-09-29 | 2010-08-10 | Intel Corporation | Method and system for performing reassociation in software loops |
US20070079302A1 (en) * | 2005-09-30 | 2007-04-05 | Intel Corporation | Method for predicate promotion in a software loop |
US7712091B2 (en) * | 2005-09-30 | 2010-05-04 | Intel Corporation | Method for predicate promotion in a software loop |
US8260602B1 (en) * | 2006-11-02 | 2012-09-04 | The Math Works, Inc. | Timer analysis and identification |
US8868399B1 (en) | 2006-11-02 | 2014-10-21 | The Mathworks, Inc. | Timer analysis and identification |
US8359586B1 (en) * | 2007-08-20 | 2013-01-22 | The Mathworks, Inc. | Code generation |
US9009690B1 (en) | 2007-08-20 | 2015-04-14 | The Mathworks, Inc. | Code generation |
CN103865093A (en) * | 2012-12-18 | 2014-06-18 | 中国第一汽车股份有限公司 | Pretreatment method for lithium ion battery separator surface |
Also Published As
Publication number | Publication date |
---|---|
US20030115579A1 (en) | 2003-06-19 |
US7340733B2 (en) | 2008-03-04 |
CA2365375A1 (en) | 2003-06-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |