US9436450B2 - Method and apparatus for optimising computer program code - Google Patents
Method and apparatus for optimising computer program code
- Publication number
- US9436450B2
- Authority
- US
- United States
- Prior art keywords
- instructions
- memory
- candidate
- candidate instructions
- computer program
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
Definitions
- This invention relates to a method of optimising computer program code, and an apparatus for performing such a method.
- Compiler optimisation is a transformation of code which tries to minimize or maximize some attributes of an executable computer program, most often to minimize execution time and memory space occupied by the resulting executable computer program code.
- Initialisation code is used within embedded applications to configure and set up ports, physical addresses, etc. and typically involves the initialisation of local and/or global variables, including structures and classes, with constants. Conventionally, such initialisation usually results in multiple assignments of constants to variables in contiguous memory locations. Such multiple assignments of constants are often inefficient in terms of both code size and code speed.
- The present invention provides a method of optimising computer program code and a computer program code optimisation apparatus, as described in the accompanying claims.
- FIG. 1 illustrates a simplified block diagram of a first example of computer program code optimisation.
- FIG. 2 illustrates a simplified representation of locations within memory for initialised values relative to a stack pointer.
- FIG. 3 illustrates a simplified flowchart of an example of a method of performing computer program code optimisation.
- FIGS. 4 to 6 illustrate simplified block diagrams of further examples of computer program code optimisation.
- FIG. 7 illustrates a simplified flowchart of a further example of optimising computer program code.
- FIG. 8 illustrates a simplified block diagram of an example of a computer program code optimisation apparatus.
- Referring to FIG. 1, there is illustrated a simplified block diagram of a first example of computer program code optimisation.
- the example illustrated in FIG. 1 relates to computer program code intended for execution on CISC (complex instruction set computer) machines, with big endian byte ordering assumed.
- source code 100 for the computer program code defines a structure (struct S) comprising members (char a, char b, short c, long d, short e) that are initialised out of order, together with several local variables (short f, long g, long h).
- the following initialisations are shown:
- these seven members/variables constitute 16 bytes of data, made up of three short data types, two char data types and two long data types.
- the source code 100 is translated into a low level intermediate language 110 , for example an assembly language corresponding to the intended CISC computer architecture on which the resulting executable program code is to be run.
- the storing of a 0 value may be achieved using either a CLR type instruction or a MOV instruction.
- all of the instructions use the same stack-indexed addressing mode, with the offset of the structure (struct S) and its first member being a four byte offset from the stack pointer (4, S) and the offset of the first local variable being a fourteen byte offset from the stack pointer (14, S).
- FIG. 2 illustrates a simplified representation of the locations within memory for the initialised values relative to a stack pointer.
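- The layout described for FIG. 2 can be reproduced with a short sketch. It assumes the sizes stated in the text (char = 1 byte, short = 2 bytes, long = 4 bytes) and contiguous packing starting at a four byte offset from the stack pointer; the names `SIZES`, `FIELDS` and `layout` are illustrative and do not come from the patent.

```python
# Sizes in bytes, as stated in the text: char = 1, short = 2, long = 4.
SIZES = {"char": 1, "short": 2, "long": 4}

# Members of struct S followed by the local variables, in declaration order.
FIELDS = [("a", "char"), ("b", "char"), ("c", "short"), ("d", "long"),
          ("e", "short"), ("f", "short"), ("g", "long"), ("h", "long")]


def layout(fields, base):
    """Return {name: offset from the stack pointer}, packing fields contiguously."""
    offsets, cursor = {}, base
    for name, ctype in fields:
        offsets[name] = cursor
        cursor += SIZES[ctype]
    return offsets


offsets = layout(FIELDS, base=4)

# Member d is declared but not initialised in the example.
initialised = ["a", "b", "c", "e", "f", "g", "h"]
total = sum(SIZES[ctype] for name, ctype in FIELDS if name in initialised)
print(offsets, total)
```

- Under these assumptions the computed offsets agree with the operands in the listings, for example (6,S) for member c and (14,S) for variable f, and the seven initialised items add up to sixteen bytes.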
- the above identified initialisations within the source code 100 have been translated into the following instructions within the low level intermediate language 110 respectively:
- each structure member and variable would be initialised by way of an individual memory access, with the size of each memory access corresponding to the size of the respective variable: byte for char, word (2 bytes) for short and long (4 bytes) for long (and int).
- seven memory accesses would be performed (one per instruction) in order to initialise just sixteen bytes of data.
- Such individual assignments of constants are inefficient in terms of both code size and code speed.
- FIG. 3 illustrates a simplified flowchart 300 of an example of a method of performing computer program code optimisation, and in particular for optimising the assignment of constants to variables residing in nearby (e.g. contiguous) memory locations.
- the method illustrated in FIG. 3 starts at 310 with the receipt of (or otherwise obtaining) computer program code to be optimised.
- the computer program code comprises a low level intermediate language such as the assembly language 110 of FIG. 1 .
- candidate instructions are identified within the received computer program code, the candidate instructions comprising instructions for writing constant values to memory.
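- Identifying candidate instructions can be sketched as a simple scan over the low level code. The mnemonic syntax is taken from the listings in this document; the regular expressions, the `WIDTH` table and the `find_candidates` helper are hypothetical and only cover the stack-indexed MOV and CLR forms shown here.

```python
import re

# Access width in bytes for the .B / .W / .L suffixes used in the listings.
WIDTH = {"B": 1, "W": 2, "L": 4}

MOV = re.compile(r"MOV\.([BWL])\s*#(0x[0-9A-Fa-f]+),\s*\((\d+),S\)")
CLR = re.compile(r"CLR\.([BWL])\s*\((\d+),S\)")


def find_candidates(lines):
    """Return (offset, size, value) for each constant write to a stack slot."""
    found = []
    for line in lines:
        text = line.strip()
        if m := MOV.fullmatch(text):
            found.append((int(m.group(3)), WIDTH[m.group(1)], int(m.group(2), 16)))
        elif m := CLR.fullmatch(text):
            # CLR stores a zero of the given width.
            found.append((int(m.group(2)), WIDTH[m.group(1)], 0))
    return found


code = ["MOV.W #0xA, (6,S)", "MOV.B #0x2, (4,S)", "MOV.W #0x7, (12,S)",
        "CLR.B (5,S)", "MOV.W #0x64, (14,S)", "CLR.L (16,S)", "MOV.L #0x1, (20,S)"]

# Sorting by offset puts the writes in memory order rather than program order.
candidates = sorted(find_candidates(code))
print(candidates)
```

- Sorting by offset is a design choice of this sketch: it makes contiguous runs of writes easy to spot even though the source initialises the members out of order.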
- One or more sets of the identified candidate instructions are then selected at 320 for aggregation, for example as described in greater detail below.
- An aggregate constant value for the (or each) selected set of candidate instructions is then computed, at 330 , and the (or each) selected set of candidate instructions may then be replaced with a more efficient instruction or set of instructions for writing the (or each) aggregate constant value to memory at 340 , such as described in greater detail below.
- the number of memory accesses used to write the constant values to memory may be reduced by using one or more instructions that access a larger block of memory per access. In this manner the number of memory accesses required for, say, assigning constants to variables etc. may be reduced, thereby achieving more efficient computer program code in terms of size and/or execution speed.
- three constant values for the members a, b and c of the structure struct S are required to be written to four contiguous bytes within memory.
- three separate write instructions are used to individually write the three constant values to memory.
- four further constant values for the member e of the structure struct S and variables f, g and h are required to be written to 12 contiguous bytes within memory.
- four separate write instructions are used to individually write the four constant values to memory.
- This aggregate constant value takes up four bytes within memory, the equivalent of a single long data type.
- the three instructions used to individually write the three constant values for the members a, b and c of the structure struct S to memory in the initial low level intermediate language 110 may then be replaced by a single long data type write instruction in an optimised low level intermediate language 120 version of the computer program code.
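- As a worked example of the aggregation for members a, b and c, the three constants can be packed in memory order with big endian byte ordering, as the text assumes. This is a minimal sketch using Python's standard `struct` module; it is not the patent's implementation.

```python
import struct

# Values in memory order: a (char), b (char), c (short); big endian assumed.
raw = struct.pack(">BBH", 0x2, 0x0, 0xA)

# The four packed bytes read back as one long give the aggregate constant.
aggregate = int.from_bytes(raw, "big")
instruction = f"MOV.L #0x{aggregate:X}, (4,S)"
print(instruction)
```

- The resulting value is 0x0200000A, which occupies exactly the four bytes at offsets 4 to 7 from the stack pointer.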
- the four instructions used to individually write the four constant values for the member e of the structure struct S and variables f, g and h to memory may additionally/alternatively be selected to comprise a set of candidate instructions, and an aggregate constant value therefor computed.
- the constant values to be written to memory by these four instructions are, in the order in which they are to be stored in memory: 0x7; 0x64; 0x0; and 0x1.
- an aggregate constant value for these four instructions may be computed as:
- This aggregate constant value takes up twelve bytes within memory, the equivalent of three long data types.
- the four instructions used to individually write the four constant values for member e of the structure struct S and variables f, g and h to memory in the initial low level intermediate language 110 may then be replaced by three long data type write instructions in the optimised low level intermediate language 120 version of the computer program code.
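- The twelve byte aggregate for e, f, g and h can be split into long-sized writes in the same way. The sketch below assumes big endian ordering and a base offset of 12 from the stack pointer; the helper name `long_writes` and the choice to emit CLR for a zero chunk are assumptions made for illustration.

```python
import struct

# e and f (shorts) then g and h (longs), in memory order starting at offset 12.
raw = struct.pack(">HHLL", 0x7, 0x64, 0x0, 0x1)


def long_writes(raw, base):
    """Split an aggregate constant into 4-byte writes, using CLR for zero."""
    out = []
    for i in range(0, len(raw), 4):
        value = int.from_bytes(raw[i:i + 4], "big")
        slot = f"({base + i},S)"
        out.append(f"CLR.L {slot}" if value == 0 else f"MOV.L #0x{value:X}, {slot}")
    return out


writes = long_writes(raw, base=12)
print(raw.hex(), writes)
```

- Only the first two short writes are actually merged here, since g and h were already long-sized; the saving for this set is one instruction.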
- Candidate instructions for aggregation may be identified based on any appropriate criteria. For example, instructions using a same addressing mode may be identified as candidate instructions, (in the illustrated example the instructions all use a stack-indexed addressing mode).
- the relevant set of candidate instructions may be altered in order to avoid the invalidating condition(s). For example, any instruction comprising a volatile operand may be disregarded as a candidate instruction, and the process of selecting one or more sets of candidate instructions repeated. Additionally/alternatively, if an access to memory occurs between the candidate instructions within the computer program code, or a register used by the candidate instructions is modified between the candidate instructions within the computer program code, the set of candidate instructions may be divided into subsets at the point of such a condition occurring within the computer program code. The validity of aggregating constant values for the (or each) subset of candidate instructions may then be checked.
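- Dividing a set of candidate instructions at invalidating conditions might look like the sketch below. The `Instr` record, its flags and the `split_candidate_sets` helper are hypothetical, as is the intervening `LD D0, (8,S)` instruction. The sketch also assumes that a write with a volatile operand, once disregarded as a candidate, counts as an intervening memory access.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str
    const_write: bool = False     # writes a constant value to memory
    volatile: bool = False        # has a volatile operand
    touches_memory: bool = False  # some other access to memory
    clobbers_reg: bool = False    # modifies a register the candidates use


def split_candidate_sets(stream):
    """Group candidate writes, starting a new group at each invalidating instruction."""
    groups, current = [], []
    for ins in stream:
        if ins.const_write and not ins.volatile:
            current.append(ins)
        elif ins.volatile or ins.touches_memory or ins.clobbers_reg:
            if current:
                groups.append(current)
            current = []
        # Any other instruction leaves the current group untouched.
    if current:
        groups.append(current)
    # Only sets holding a plurality of candidates are worth aggregating.
    return [g for g in groups if len(g) > 1]


stream = [
    Instr("MOV.B #0x2, (4,S)", const_write=True),
    Instr("CLR.B (5,S)", const_write=True),
    Instr("MOV.W #0xA, (6,S)", const_write=True),
    Instr("LD D0, (8,S)", touches_memory=True),   # hypothetical intervening read
    Instr("MOV.W #0x7, (12,S)", const_write=True),
    Instr("MOV.W #0x64, (14,S)", const_write=True),
]

groups = split_candidate_sets(stream)
largest = max(groups, key=len)
print([len(g) for g in groups], largest[0].text)
```

- Each resulting subset would still need its own validity check before an aggregate constant value is computed for it.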
- the largest set (or sets) of candidate instructions for which a valid aggregate constant value is achievable is/are selected for computing the (or each) aggregate constant value.
- the example illustrated in FIG. 1 relates to computer program code intended for execution on CISC machines.
- Referring to FIG. 4, there is illustrated a simplified block diagram of a further example of computer program code optimisation.
- the example illustrated in FIG. 4 relates to computer program code intended for execution on RISC (reduced instruction set computer) machines, with big endian byte ordering assumed, and comprises optimisation of the same source code 100 as the example illustrated in FIG. 1 .
- the source code 100 is translated into a low level intermediate language 410 , for example an assembly language corresponding to the intended RISC computer architecture on which the resulting executable program code is to be run.
- each source code write instruction is translated into two low level intermediate language (or assembly) instructions: a load instruction and a store instruction.
- all of the instructions use the same stack-indexed addressing mode, with the offset of the structure (struct S) and its first member being a four byte offset from the stack pointer (4, S) and the offset of the first local variable being a fourteen byte offset from the stack pointer (14, S), as illustrated in FIG. 2 .
- the constant value initialisations within the source code 100 (identified above in relation to FIG. 1 ) have been translated into fourteen load and store instructions within the low level intermediate language 410 .
- each structure member and variable would be initialised by way of an individual load/store memory access, with the size of each memory access corresponding to the size of the respective variable: byte for char, word (2 bytes) for short and long (4 bytes) for long (and int).
- seven load/store memory accesses would be performed (one per source code instruction) in order to initialise just sixteen bytes of data.
- Such individual assignments of constants are inefficient in terms of both code size and code speed.
- three pairs of load/store instructions used to individually write the three constant values for the members a, b and c of the structure struct S to memory may be selected to comprise a set of candidate instructions, and an aggregate constant value therefor computed.
- the constant values to be written to memory by these three pairs of load/store instructions are, in the order in which they are to be stored in memory: 0x2; 0x0; and 0xA.
- an aggregate constant value for these three pairs of load/store instructions may be computed as:
- This aggregate constant value takes up four bytes within memory, the equivalent of a single long data type.
- the three pairs of load/store instructions used to individually write the three constant values for the members a, b and c of the structure struct S to memory in the initial low level intermediate language 410 may then be replaced by a single long data type load/store instruction pair in an optimised low level intermediate language 420 version of the computer program code.
- the three original pairs of load/store instructions of:
- This aggregate constant value takes up twelve bytes within memory, the equivalent of three long data types.
- the four pairs of load/store instructions used to individually write the four constant values for member e of the structure struct S and variables f, g and h to memory in the initial low level intermediate language 410 may then be replaced by three long data type pairs of load/store instructions in the optimised low level intermediate language 420 version of the computer program code.
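- For the RISC case, the effect on instruction count can be checked with a small sketch that emits one load/store pair per long-sized chunk of each aggregate constant. The mnemonics LDRL and STRL follow the listings in this document; the `load_store_pairs` helper and the register numbering are assumptions for illustration.

```python
def load_store_pairs(raw, base, first_reg=0):
    """Emit one LDRL/STRL pair per 4-byte chunk of an aggregate constant."""
    out = []
    for n, i in enumerate(range(0, len(raw), 4)):
        value = int.from_bytes(raw[i:i + 4], "big")
        reg = f"D{first_reg + n}"
        out += [f"LDRL {reg}, #0x{value:X}", f"STRL {reg}, ({base + i},S)"]
    return out


# Aggregate constants for a, b, c (offset 4) and e, f, g, h (offset 12).
abc = bytes.fromhex("0200000A")
efgh = bytes.fromhex("000700640000000000000001")

optimised = load_store_pairs(abc, 4) + load_store_pairs(efgh, 12, first_reg=1)
original_count = 7 * 2   # seven initialisations, one load and one store each
print(original_count, len(optimised))
```

- Under these assumptions the fourteen original load and store instructions reduce to eight, that is four long-sized load/store pairs.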
- the four original pairs of load/store instructions of:
- candidate instructions within the initial low level intermediate language 110 , 410 are directly replaced with more efficient instructions for writing the aggregate constant value(s) to memory.
- candidate instructions may additionally/alternatively be replaced with one or more library copy routine(s) for copying the aggregate constant value(s) to memory from a data section within an object file.
- FIG. 5 illustrates a simplified block diagram of a further example of computer program code optimisation in which candidate instructions are replaced with a library copy routine for copying aggregate constant value(s) to memory from a data section within an object file.
- the example illustrated in FIG. 5 relates to computer program code intended for execution on CISC machines, with big endian byte ordering assumed, and comprises optimisation of the same source code 100 and low level intermediate language code 110 as the example illustrated in FIG. 1 .
- the three instructions used to individually write the three constant values for the members a, b and c of the structure struct S to memory may be selected to comprise a set of candidate instructions, and an aggregate constant value therefor computed.
- the constant values to be written to memory by these three instructions are, in the order in which they are to be stored in memory: 0x2; 0x0; and 0xA.
- an aggregate constant value for these three instructions may be computed as:
- This aggregate constant value takes up four bytes within memory, the equivalent of a single long data type.
- the three instructions used to individually write the three constant values for the members a, b and c of the structure struct S to memory in the initial low level intermediate language 110 may then be replaced by a single long data type write instruction in an optimised low level intermediate language 120 version of the computer program code.
- an aggregate constant value for the four instructions used to individually write the four constant values for the member e of the structure struct S and variables f, g and h to memory may be computed as:
- this aggregate constant value is stored within a data section 525 of an object file (such object file may comprise the same object file as the resulting executable program code or a separate object file).
- the four instructions used to individually write the four constant values for the member e of the structure struct S and variables f, g and h to memory are then replaced within the optimised low level intermediate language code 520 by the instructions:
- These new instructions respectively: load the address in memory of the aggregate constant value within the data section 525 (“0xAggConstAdd”); load the address in memory to which the aggregate constant value is to be written/copied (“(12,S)”, i.e. a 12 byte offset from the stack pointer); load the size of the constant (“#0x3”, i.e. three 4-byte chunks); and call the library routine (“_copy_L”) for copying the aggregate constant value to memory.
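- The behaviour of the library copy routine can be modelled on a flat byte array. The patent names the routine `_copy_L` but does not give its body, so the `copy_longs` function, the 64 byte memory and the address chosen for the data section entry are all assumptions for illustration.

```python
def copy_longs(memory, src, dst, count):
    """Copy `count` 4-byte chunks from address `src` to address `dst`."""
    for i in range(count):
        lo = 4 * i
        memory[dst + lo:dst + lo + 4] = memory[src + lo:src + lo + 4]


memory = bytearray(64)
SP = 0                 # stack pointer, placed at address 0 for simplicity
AGG_CONST_ADDR = 40    # hypothetical address of the data section entry

# The data section holds the twelve byte aggregate constant.
memory[AGG_CONST_ADDR:AGG_CONST_ADDR + 12] = bytes.fromhex("000700640000000000000001")

# Equivalent of: LEA X, 0xAggConstAdd / LEA Y, (12,S) / LD D0, #0x3 / JSR _copy_L
copy_longs(memory, src=AGG_CONST_ADDR, dst=SP + 12, count=3)

e = int.from_bytes(memory[12:14], "big")
f = int.from_bytes(memory[14:16], "big")
h = int.from_bytes(memory[20:24], "big")
print(e, f, h)
```

- After the copy, the stack slots for e, f, g and h hold 7, 100, 0 and 1, matching the initialisations in the source code.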
- FIG. 6 illustrates a simplified block diagram of a still further example of computer program code optimisation in which candidate instructions are replaced with a library copy routine for copying aggregate constant value(s) to memory from a data section within an object file.
- the example illustrated in FIG. 6 relates to a computer program code intended for execution on RISC machines, with big endian byte ordering assumed, and comprises optimisation of the same source code 100 and low level intermediate language code 410 as the example illustrated in FIG. 4 .
- the three pairs of instructions used to individually write the three constant values for the members a, b and c of the structure struct S to memory may be selected to comprise a set of candidate instructions, and an aggregate constant value therefor computed.
- the constant values to be written to memory by these three pairs of load/store instructions are, in the order in which they are to be stored in memory: 0x2; 0x0; and 0xA.
- an aggregate constant value for these three pairs of load/store instructions may be computed as:
- This aggregate constant value takes up four bytes within memory, the equivalent of a single long data type.
- the three pairs of load/store instructions used to individually write the three constant values for the members a, b and c of the structure struct S to memory in the initial low level intermediate language 410 may then be replaced by a single long data type load/store instruction pair in an optimised low level intermediate language 620 version of the computer program code.
- the three original pairs of load/store instructions of:
- this aggregate constant value is stored within a data section 625 of an object file (such object file may comprise the same object file as the resulting executable program code or a separate object file).
- the four pairs of load/store instructions used to individually write the four constant values for the member e of the structure struct S and variables f, g and h to memory are then replaced within the optimised low level intermediate language code 620 by the instructions:
- These new instructions respectively: load the address in memory of the aggregate constant value within the data section 625 (“0xAggConstAdd”); load the address in memory to which the aggregate constant value is to be written/copied (“(12,S)”, i.e. a 12 byte offset from the stack pointer); load the size of the constant (“#0x3”, i.e. three 4-byte chunks); and call the library routine (“_copy_L”) for copying the aggregate constant value to memory.
- the method then comprises evaluating one or more efficiency metric(s) for each of the instruction replacement options of:
- the method determines whether aggregation of write instructions has been performed for all addressing modes within the computer program code, at 760 . If it is determined that aggregation of write instructions has been performed for all addressing modes within the computer program code, the method ends at 765 . Conversely, if it is determined that aggregation of write instructions has not been performed for all addressing modes within the computer program code, the method loops back to 720 where a next addressing mode is selected.
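- The loop over addressing modes and the choice between replacement options can be sketched as follows. The patent does not specify the efficiency metric, so the instruction-count cost model, the `choose` helper and the example set sizes below are purely hypothetical.

```python
import math

# Instructions needed to call the copy routine in the listings: LEA, LEA, LD, JSR.
LIBRARY_CALL_COST = 4


def direct_cost(n_bytes, instr_per_write=1):
    """Instructions needed to write n_bytes directly in long-sized chunks."""
    return math.ceil(n_bytes / 4) * instr_per_write


def choose(n_bytes, instr_per_write=1):
    """Pick the replacement option with the lower instruction count."""
    cost = direct_cost(n_bytes, instr_per_write)
    if cost <= LIBRARY_CALL_COST:
        return ("direct", cost)
    return ("library", LIBRARY_CALL_COST)


# Hypothetical aggregate sizes (in bytes) found for each addressing mode.
sets_by_mode = {"stack-indexed": [4, 12], "absolute": [40]}

# One pass per addressing mode, mirroring the loop back to step 720.
plan = {mode: [choose(n) for n in sizes] for mode, sizes in sets_by_mode.items()}
print(plan)
```

- A real metric would likely also weigh execution speed and the data section space taken by the stored constant, which this sketch ignores.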
- Referring to FIG. 8, there is illustrated a simplified block diagram of an example of a computer program code optimisation apparatus 800 comprising at least one processing component 810 arranged to optimise computer program code, for example as hereinbefore described with reference to FIGS. 1 to 7 .
- the at least one processing component 810 is arranged to identify candidate instructions within the computer program code, each candidate instruction comprising an instruction for writing a constant value to memory, select at least one set of candidate instructions, the at least one set comprising a plurality of candidate instructions, compute an aggregate constant value for the at least one set of candidate instructions, and replace the at least one set of candidate instructions with at least one instruction for writing the aggregate constant value to memory.
- the (or each) processing component 810 may comprise a central processing unit, digital signal processor unit, microcontroller unit, microprocessor unit, or the like, and may be operably coupled to one or more memory elements, such as memory element 820 , in which computer program code is stored.
- the memory element 820 may have executable program code stored therein for execution by the (or each) processing component 810 for optimising computer program code, the program code operable for identifying candidate instructions within the computer program code, each candidate instruction comprising an instruction for writing a constant value to memory, selecting at least one set of candidate instructions, the at least one set comprising a plurality of candidate instructions, computing an aggregate constant value for the at least one set of candidate instructions, and replacing the at least one set of candidate instructions with at least one instruction for writing the aggregate constant value to memory.
- such program code comprises compiler backend program code 830 arranged to perform such computer program code optimisation of low level intermediate program code.
- a computer program is a list of instructions such as a particular application program and/or an operating system.
- the computer program may for instance include one or more of: a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
- the computer program may be stored internally on a tangible and non-transitory computer readable storage medium or transmitted to the computer system via a computer readable transmission medium. All or some of the computer program may be provided on computer readable media permanently, removably or remotely coupled to an information processing system.
- the tangible and non-transitory computer readable media may include, for example and without limitation, any number of the following: magnetic storage media including disk and tape storage media; optical storage media such as compact disk media (e.g., CD-ROM, CD-R, etc.) and digital video disk storage media; non-volatile memory storage media including semiconductor-based memory units such as FLASH memory, EEPROM, EPROM, ROM; ferromagnetic digital memories; MRAM; volatile storage media including registers, buffers or caches, main memory, RAM, etc.
- the computer system may for instance include at least one processing unit, associated memory and a number of input/output (I/O) devices.
- the computer system processes information according to the computer program and produces resultant output information via I/O devices.
- logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or circuit elements or impose an alternate decomposition of functionality upon various logic blocks or circuit elements.
- architectures depicted herein are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality.
- any arrangement of components to achieve the same functionality is effectively ‘associated’ such that the desired functionality is achieved.
- any two components herein combined to achieve a particular functionality can be seen as ‘associated with’ each other such that the desired functionality is achieved, irrespective of architectures or intermediary components.
- any two components so associated can also be viewed as being ‘operably connected,’ or ‘operably coupled,’ to each other to achieve the desired functionality.
- any reference signs placed between parentheses shall not be construed as limiting the claim.
- the word ‘comprising’ does not exclude the presence of other elements or steps than those listed in a claim.
- the terms ‘a’ or ‘an,’ as used herein, are defined as one or more than one.
Abstract
Description
-
- member c of struct S: 10
- member a of struct S: 2
- member e of struct S: 7
- member b of struct S: 0
- variable f: 100
- variable g: 0
- variable h: 1
-
- MOV.W #0xA, (6,S)
- MOV.B #0x2, (4,S)
- MOV.W #0x7, (12,S)
- CLR.B (5,S)
- MOV.W #0x64, (14,S)
- CLR.L (16,S)
- MOV.L #0x1, (20,S)
-
- 0x 0200 000A
-
- MOV.W #0xA, (6,S)
- MOV.B #0x2, (4,S)
- CLR.B (5,S)
may be replaced by a single write instruction of:
- MOV.L #0x200000A, (4,S)
-
- 0x 0007 0064 0000 0000 0000 0001
-
- MOV.W #0x7, (12,S)
- MOV.W #0x64, (14,S)
- CLR.L (16,S)
- MOV.L #0x1, (20,S)
may be replaced by three write instructions of:
- MOV.L #0x70064, (12,S)
- CLR.L (16,S)
- MOV.L #0x1, (20,S)
-
- checking whether the candidate instructions comprise volatile operands;
- checking whether accesses to memory occur between the candidate instructions within the computer program code;
- checking whether registers used by the candidate instructions are modified between the candidate instructions within the computer program code.
-
- 0x 0200 000A
-
- LDRW D0, #0xA
- STRW D0, (6,S)
- LDRB D0, #0x2
- STRB D0, (4,S)
- LDRB D1, #0x0
- STRB D1, (5,S)
may be replaced by a single load/store instruction pair of:
- LDRL D0, #0x200000A
- STRL D0, (4,S)
-
- 0x 0007 0064 0000 0000 0000 0001
-
- LDRW D1, #0x7
- STRW D1, (12,S)
- LDRW D2, #0x64
- STRW D2, (14,S)
- LDRL D2, #0x0
- STRL D2, (16,S)
- LDRL D2, #0x1
- STRL D2, (20,S)
may be replaced by three pairs of load/store instructions of:
- LDRL D1, #0x70064
- STRL D1, (12,S)
- LDRL D2, #0x0
- STRL D2, (16,S)
- LDRL D3, #0x1
- STRL D3, (20,S)
-
- 0x 0200 000A
-
- MOV.W #0xA, (6,S)
- MOV.B #0x2, (4,S)
- CLR.B (5,S)
may be directly replaced by a single write instruction within an optimised low level intermediate language code 520 of:
- MOV.L #0x200000A, (4,S)
-
- 0x 0007 0064 0000 0000 0000 0001
-
- LEA X, 0xAggConstAdd
- LEA Y, (12,S)
- LD D0, #0x3
- JSR _copy_L
-
- 0x 0200 000A
-
- LDRW D0, #0xA
- STRW D0, (6,S)
- LDRB D0, #0x2
- STRB D0, (4,S)
- LDRB D1, #0x0
- STRB D1, (5,S)
may be replaced by a single load/store instruction pair of:
- LDRL D0, #0x200000A
- STRL D0, (4,S)
-
- 0x 0007 0064 0000 0000 0000 0001
-
- LEA X, 0xAggConstAdd
- LEA Y, (12,S)
- LD D0, #0x3
- JSR _copy_L
-
- checking whether the candidate instructions comprise volatile operands;
- checking whether accesses to memory occur between the candidate instructions within the computer program code;
- checking whether registers used by the candidate instructions are modified between the candidate instructions within the computer program code.
-
- (i) replacing the set(s) of candidate instructions substantially directly with more efficient write instruction(s) for writing the aggregate constant value(s) to memory (such as performed in the examples illustrated in FIGS. 1 and 4); and
- (ii) replacing the set(s) of candidate instructions with one or more library copy routine(s) for copying the aggregate constant value from a data section within an object file to memory (such as performed in the examples illustrated in FIGS. 5 and 6).
Claims (14)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RO14-0664 | 2014-09-01 | ||
ROA201400664 | 2014-09-01 | ||
RO201400664 | 2014-09-01 |
Publications (2)
Publication Number | Publication Date |
---|---|
US20160062751A1 (en) | 2016-03-03 |
US9436450B2 (en) | 2016-09-06 |
Family
ID=55402573
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/531,024 (Active, US9436450B2) | 2014-09-01 | 2014-11-03 | Method and apparatus for optimising computer program code |
Country Status (1)
Country | Link |
---|---|
US (1) | US9436450B2 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10157164B2 (en) * | 2016-09-20 | 2018-12-18 | Qualcomm Incorporated | Hierarchical synthesis of computer machine instructions |
US10534593B2 (en) | 2016-10-24 | 2020-01-14 | International Business Machines Corporation | Optimized entry points and local function call tailoring for function pointers |
US10585652B2 (en) | 2016-10-24 | 2020-03-10 | International Business Machines Corporation | Compiling optimized entry points for local-use-only function pointers |
US10606574B2 (en) | 2016-10-24 | 2020-03-31 | International Business Machines Corporation | Executing optimized local entry points and function call sites |
US10620926B2 (en) | 2016-10-24 | 2020-04-14 | International Business Machines Corporation | Linking optimized entry points for local-use-only function pointers |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9940267B2 (en) * | 2016-05-17 | 2018-04-10 | Nxp Usa, Inc. | Compiler global memory access optimization in code regions using most appropriate base pointer registers |
US10776087B2 (en) * | 2018-06-25 | 2020-09-15 | Intel Corporation | Sequence optimizations in a high-performance computing environment |
Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6061772A (en) * | 1997-06-30 | 2000-05-09 | Sun Microsystems, Inc. | Split write data processing mechanism for memory controllers utilizing inactive periods during write data processing for other transactions |
US6072952A (en) * | 1998-04-22 | 2000-06-06 | Hewlett-Packard Co. | Method and apparatus for coalescing variables |
US6141791A (en) * | 1997-08-29 | 2000-10-31 | Matsushita Electric Industrial Co., Ltd. | Debug aid device, program compiler device, storage medium storing computer-readable debugger program, and storage medium storing program compiler program |
US6427234B1 (en) * | 1998-06-11 | 2002-07-30 | University Of Washington | System and method for performing selective dynamic compilation using run-time information |
US20020144244A1 (en) * | 2001-03-30 | 2002-10-03 | Rakesh Krishnaiyer | Compile-time memory coalescing for dynamic arrays |
US20030056041A1 (en) * | 2001-09-20 | 2003-03-20 | Connor Patrick L. | Method and apparatus for dynamic coalescing |
US20030163679A1 (en) * | 2000-01-31 | 2003-08-28 | Kumar Ganapathy | Method and apparatus for loop buffering digital signal processing instructions |
US20040088501A1 (en) * | 2002-11-04 | 2004-05-06 | Collard Jean-Francois C. | Data repacking for memory accesses |
US6760743B1 (en) * | 2000-01-04 | 2004-07-06 | International Business Machines Corporation | Instruction memory system for multi-processor environment and disjoint tasks |
US20050044327A1 (en) * | 2003-08-19 | 2005-02-24 | Quicksilver Technology, Inc. | Asynchronous, independent and multiple process shared memory system in an adaptive computing architecture |
US6877150B1 (en) * | 2002-12-04 | 2005-04-05 | Xilinx, Inc. | Method of transforming software language constructs to functional hardware equivalents |
US7457936B2 (en) * | 2003-11-19 | 2008-11-25 | Intel Corporation | Memory access instruction vectorization |
US20090271775A1 (en) * | 2008-04-24 | 2009-10-29 | International Business Machines Corporation | Optimizing Just-In-Time Compiling For A Java Application Executing On A Compute Node |
US20100325621A1 (en) * | 2009-06-23 | 2010-12-23 | International Business Machines Corporation | Partitioning operator flow graphs |
US8234636B2 (en) * | 2006-09-12 | 2012-07-31 | International Business Machines Corporation | Source code modification technique |
US8392669B1 (en) * | 2008-03-24 | 2013-03-05 | Nvidia Corporation | Systems and methods for coalescing memory accesses of parallel threads |
US8527975B2 (en) * | 2007-11-02 | 2013-09-03 | Hewlett-Packard Development Company, L.P. | Apparatus and method for analyzing source code using memory operation evaluation and boolean satisfiability |
US8561044B2 (en) * | 2008-10-07 | 2013-10-15 | International Business Machines Corporation | Optimized code generation targeting a high locality software cache |
US9110684B2 (en) * | 2007-07-10 | 2015-08-18 | International Business Machines Corporation | Data splitting for recursive data structures |
US9128722B2 (en) * | 2009-05-01 | 2015-09-08 | Apple Inc. | Systems, methods, and computer-readable media for fertilizing machine-executable code |
2014-11-03: US application US14/531,024, patent US9436450B2 (en), status: Active
Non-Patent Citations (3)
Title |
---|
Liao, Stan et al., "Storage Assignment to Decrease Code Size," ACM Transactions on Programming Languages and Systems, vol. 18, No. 3; May 1996, pp. 235-253. |
Liao, Title: Storage Assignment to Decrease Code Size, ACM May 1996. * |
Nieplocha et al., Title: ARMCI: A Portable Aggregate Remote Memory Copy Interface, Version 1.1, Oct. 30, 2000 Located at: http://hpc.pnl.gov/globalarrays/papers/armci1-1.pdf. * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10157164B2 (en) * | 2016-09-20 | 2018-12-18 | Qualcomm Incorporated | Hierarchical synthesis of computer machine instructions |
US10534593B2 (en) | 2016-10-24 | 2020-01-14 | International Business Machines Corporation | Optimized entry points and local function call tailoring for function pointers |
US10534594B2 (en) | 2016-10-24 | 2020-01-14 | International Business Machines Corporation | Optimized entry points and local function call tailoring for function pointers |
US10585652B2 (en) | 2016-10-24 | 2020-03-10 | International Business Machines Corporation | Compiling optimized entry points for local-use-only function pointers |
US10606574B2 (en) | 2016-10-24 | 2020-03-31 | International Business Machines Corporation | Executing optimized local entry points and function call sites |
US10620926B2 (en) | 2016-10-24 | 2020-04-14 | International Business Machines Corporation | Linking optimized entry points for local-use-only function pointers |
Also Published As
Publication number | Publication date |
---|---|
US20160062751A1 (en) | 2016-03-03 |
Similar Documents
Publication | Title |
---|---|
US9436450B2 (en) | Method and apparatus for optimising computer program code |
US11221850B2 (en) | Sort and merge instruction for a general-purpose processor |
US10635308B2 (en) | Memory state indicator |
CN113468079B (en) | Memory access method and device |
US20210096876A1 (en) | Saving and restoring machine state between multiple executions of an instruction |
US20200142669A1 (en) | Controlling storage accesses for merge operations |
KR102598929B1 (en) | Negative zero control for execution of commands |
JP2021525919A (en) | Scheduler queue allocation |
US20210258311A1 (en) | Protecting supervisor mode information |
KR102238188B1 (en) | Temporary prohibition of processing restricted storage operand requests |
US8560805B1 (en) | Efficient allocation of address space resources to bus devices |
US20130339667A1 (en) | Special case register update without execution |
US10831502B2 (en) | Migration of partially completed instructions |
CN107851015B (en) | Vector operation digit size control |
US9753776B2 (en) | Simultaneous multithreading resource sharing |
US9672042B2 (en) | Processing system and method of instruction set encoding space utilization |
US10339049B2 (en) | Garbage collection facility grouping infrequently accessed data units in designated transient memory area |
WO2023072790A1 (en) | Providing a dynamic random-access memory cache as second type memory per application process |
CN117632778A (en) | Electronic device and method of operating the same |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: FREESCALE SEMICONDUCTOR, INC., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:OPREA, MIHAI DANIEL;ARBONE, CIPRIAN;DITU, BOGDAN FLORIN;REEL/FRAME:034088/0989 Effective date: 20140902 |
AS | Assignment | Owner name: CITIBANK, N.A., AS NOTES COLLATERAL AGENT, NEW YOR Free format text: SUPPLEMENT TO IP SECURITY AGREEMENT;ASSIGNOR:FREESCALE SEMICONDUCTOR, INC.;REEL/FRAME:035033/0001 Effective date: 20150213 Owner name: CITIBANK, N.A., AS NOTES COLLATERAL AGENT, NEW YOR Free format text: SUPPLEMENT TO IP SECURITY AGREEMENT;ASSIGNOR:FREESCALE SEMICONDUCTOR, INC.;REEL/FRAME:035033/0923 Effective date: 20150213 Owner name: CITIBANK, N.A., AS NOTES COLLATERAL AGENT, NEW YOR Free format text: SUPPLEMENT TO IP SECURITY AGREEMENT;ASSIGNOR:FREESCALE SEMICONDUCTOR, INC.;REEL/FRAME:035034/0019 Effective date: 20150213 |
AS | Assignment | Owner name: FREESCALE SEMICONDUCTOR, INC., TEXAS Free format text: PATENT RELEASE;ASSIGNOR:CITIBANK, N.A., AS COLLATERAL AGENT;REEL/FRAME:037358/0001 Effective date: 20151207 |
AS | Assignment | Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: ASSIGNMENT AND ASSUMPTION OF SECURITY INTEREST IN PATENTS;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:037444/0535 Effective date: 20151207 Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: ASSIGNMENT AND ASSUMPTION OF SECURITY INTEREST IN PATENTS;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:037444/0444 Effective date: 20151207 |
AS | Assignment | Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: SUPPLEMENT TO THE SECURITY AGREEMENT;ASSIGNOR:FREESCALE SEMICONDUCTOR, INC.;REEL/FRAME:039138/0001 Effective date: 20160525 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: NXP USA, INC., TEXAS Free format text: MERGER;ASSIGNOR:FREESCALE SEMICONDUCTOR, INC.;REEL/FRAME:041144/0363 Effective date: 20161107 |
AS | Assignment | Owner name: NXP B.V., NETHERLANDS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC.;REEL/FRAME:050744/0097 Effective date: 20190903 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |