WO2005033926A3 - Methods and apparatus for reducing memory latency in a software application

Info

Publication number
WO2005033926A3
Authority
WO
WIPO (PCT)
Prior art keywords
software application
thread
helper
memory latency
main thread
Prior art date
Application number
PCT/US2004/032212
Other languages
French (fr)
Other versions
WO2005033926A2 (en)
Inventor
Xinmin Tian
Shih-Wei Liao
Hong Wang
Milind Girkar
John Shen
Perry Wang
Grant Haab
Gerolf Hoflehner
Daniel Lavery
Hideki Saito
Sanjiv Shah
Dongkeun Kim
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2003-10-02
Filing date
2004-09-29
Publication date
2005-12-29
Application filed by Intel Corp
Priority to JP2006534105A (JP4783291B2)
Priority to EP04789368A (EP1678610A2)
Priority to CN200480035709XA (CN1890635B)
Publication of WO2005033926A2
Publication of WO2005033926A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836: Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851: Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G06F8/44: Encoding
    • G06F8/443: Optimisation
    • G06F8/4441: Reducing the execution time required by the program code
    • G06F8/4442: Reducing the number of cache misses; Data prefetching
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824: Operand accessing
    • G06F9/383: Operand prefetching
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/52: Program synchronisation; Mutual exclusion, e.g. by means of semaphores

Abstract

Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread, reducing performance bottlenecks caused by memory latency and/or cache misses. A performance analysis tool profiles the software application's resource usage and identifies the areas of the application that experience performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas. A counting mechanism is inserted into the helper thread and another into the main thread to coordinate the execution of the two threads and to help ensure that prefetched data is not evicted from the cache before the main thread can use it.
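The coordination between the two counting mechanisms can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the compiler-generated code the patent describes: an `OrderedDict` stands in for the hardware cache, `slow_load` stands in for a load that would miss, and the names `Shared`, `helper`, `run`, `MAX_AHEAD`, and `CACHE_LINES` are hypothetical. A real helper thread would issue prefetch instructions rather than store values. The helper stalls once it is `MAX_AHEAD` iterations in front of the main thread, so it never holds more unconsumed entries than the simulated cache can keep.

```python
import threading
from collections import OrderedDict

MAX_AHEAD = 4      # hypothetical run-ahead limit for the helper thread
CACHE_LINES = 8    # hypothetical cache capacity; chosen >= MAX_AHEAD


class Shared:
    """State shared by both threads, guarded by one condition variable."""

    def __init__(self):
        self.cond = threading.Condition()
        self.cache = OrderedDict()   # simulated cache: index -> value
        self.main_count = 0          # iterations completed by the main thread
        self.helper_count = 0        # iterations handled by the helper thread
        self.max_lead = 0            # largest observed helper lead (diagnostic)
        self.evicted_unused = 0      # prefetched entries lost before use


def slow_load(i):
    """Stand-in for a load that would miss the cache."""
    return i * i


def helper(n, s):
    for i in range(n):
        with s.cond:
            # Stall while the helper is MAX_AHEAD iterations in front.
            while i - s.main_count >= MAX_AHEAD:
                s.cond.wait()
            if i >= s.main_count:    # skip items the main thread already passed
                s.cache[i] = slow_load(i)
                while len(s.cache) > CACHE_LINES:
                    s.cache.popitem(last=False)   # evict the oldest entry
                    s.evicted_unused += 1
            s.helper_count = i + 1
            s.max_lead = max(s.max_lead, s.helper_count - s.main_count)


def run(n):
    """Sum slow_load(i) for i < n on the calling thread while a helper prefetches."""
    s = Shared()
    t = threading.Thread(target=helper, args=(n, s))
    t.start()
    total = hits = 0
    for i in range(n):
        with s.cond:
            if i in s.cache:
                value = s.cache.pop(i)   # prefetched in time
                hits += 1
            else:
                value = slow_load(i)     # helper was late; load directly
            s.main_count = i + 1
            s.cond.notify_all()          # let a stalled helper advance
        total += value
    t.join()
    return total, hits, s.max_lead, s.evicted_unused
```

Calling `run(200)` returns the sum, the number of iterations served from the simulated cache, the largest lead the helper reached, and the number of prefetched entries evicted before use. The hit count depends on thread scheduling, but because `MAX_AHEAD` does not exceed `CACHE_LINES`, the lead stays bounded and no unused entry is evicted.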
Application Number | Priority Date | Filing Date | Title | Publication Number
PCT/US2004/032212 | 2003-10-02 | 2004-09-29 | Methods and apparatus for reducing memory latency in a software application | WO2005033926A2 (en)

Priority Applications (3)

Application Number | Publication Number | Priority Date | Filing Date | Title
JP2006534105A | JP4783291B2 (en) | 2003-10-02 | 2004-09-29 | Method and apparatus for reducing memory latency in software applications
EP04789368A | EP1678610A2 (en) | 2003-10-02 | 2004-09-29 | Methods and apparatus for reducing memory latency in a software application
CN200480035709XA | CN1890635B (en) | 2003-10-02 | 2004-09-29 | Methods and apparatus for reducing memory latency in a software application

Applications Claiming Priority (2)

Application Number | Publication Number | Priority Date | Filing Date | Title
US10/677,414 | US7328433B2 (en) | 2003-10-02 | 2003-10-02 | Methods and apparatus for reducing memory latency in a software application
US10/677,414 2003-10-02

Publications (2)

Publication Number Publication Date
WO2005033926A2 (en) 2005-04-14
WO2005033926A3 (en) 2005-12-29

Family

ID=34422137

Family Applications (1)

Application Number | Publication Number | Priority Date | Filing Date | Title
PCT/US2004/032212 | WO2005033926A2 (en) | 2003-10-02 | 2004-09-29 | Methods and apparatus for reducing memory latency in a software application

Country Status (5)

Country Link
US (1) US7328433B2 (en)
EP (1) EP1678610A2 (en)
JP (2) JP4783291B2 (en)
CN (1) CN1890635B (en)
WO (1) WO2005033926A2 (en)

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20040128489A1 (en) * | 2002-12-31 | 2004-07-01 | Hong Wang | Transformation of single-threaded code to speculative precomputation enabled code
US20040243767A1 (en) * | 2003-06-02 | 2004-12-02 | Cierniak Michal J. | Method and apparatus for prefetching based upon type identifier tags
US7707554B1 (en) * | 2004-04-21 | 2010-04-27 | Oracle America, Inc. | Associating data source information with runtime events
US20060080661A1 (en) * | 2004-10-07 | 2006-04-13 | International Business Machines Corporation | System and method for hiding memory latency
US7506325B2 (en) | 2004-10-07 | 2009-03-17 | International Business Machines Corporation | Partitioning processor resources based on memory usage
US7752016B2 (en) * | 2005-01-11 | 2010-07-06 | Hewlett-Packard Development Company, L.P. | System and method for data analysis
US7809991B2 (en) * | 2005-01-11 | 2010-10-05 | Hewlett-Packard Development Company, L.P. | System and method to qualify data capture
US7849453B2 (en) * | 2005-03-16 | 2010-12-07 | Oracle America, Inc. | Method and apparatus for software scouting regions of a program
US7950012B2 (en) * | 2005-03-16 | 2011-05-24 | Oracle America, Inc. | Facilitating communication and synchronization between main and scout threads
US7472256B1 (en) | 2005-04-12 | 2008-12-30 | Sun Microsystems, Inc. | Software value prediction using pendency records of predicted prefetch values
US20070130114A1 (en) * | 2005-06-20 | 2007-06-07 | Xiao-Feng Li | Methods and apparatus to optimize processing throughput of data structures in programs
US7784040B2 (en) * | 2005-11-15 | 2010-08-24 | International Business Machines Corporation | Profiling of performance behaviour of executed loops
US7856622B2 (en) * | 2006-03-28 | 2010-12-21 | Inventec Corporation | Computer program runtime bottleneck diagnostic method and system
US7383402B2 (en) * | 2006-06-05 | 2008-06-03 | Sun Microsystems, Inc. | Method and system for generating prefetch information for multi-block indirect memory access chains
US7383401B2 (en) * | 2006-06-05 | 2008-06-03 | Sun Microsystems, Inc. | Method and system for identifying multi-block indirect memory access chains
US7596668B2 (en) * | 2007-02-20 | 2009-09-29 | International Business Machines Corporation | Method, system and program product for associating threads within non-related processes based on memory paging behaviors
JP4821907B2 (en) * | 2007-03-06 | 2011-11-24 | 日本電気株式会社 | Memory access control system, memory access control method and program thereof
US8886887B2 (en) * | 2007-03-15 | 2014-11-11 | International Business Machines Corporation | Uniform external and internal interfaces for delinquent memory operations to facilitate cache optimization
US8271963B2 (en) * | 2007-11-19 | 2012-09-18 | Microsoft Corporation | Mimicking of functionality exposed through an abstraction
CN101482831B (en) * | 2008-01-08 | 2013-05-15 | 国际商业机器公司 | Method and equipment for concomitant scheduling of working thread and worker thread
US8359589B2 (en) * | 2008-02-01 | 2013-01-22 | International Business Machines Corporation | Helper thread for pre-fetching data
CN101639799B (en) * | 2008-07-31 | 2013-02-13 | 英赛特半导体有限公司 | Integrated circuit characterization system and method
US8312442B2 (en) * | 2008-12-10 | 2012-11-13 | Oracle America, Inc. | Method and system for interprocedural prefetching
US20100153934A1 (en) * | 2008-12-12 | 2010-06-17 | Peter Lachner | Prefetch for systems with heterogeneous architectures
US8327325B2 (en) * | 2009-01-14 | 2012-12-04 | International Business Machines Corporation | Programmable framework for automatic tuning of software applications
CA2680597C (en) * | 2009-10-16 | 2011-06-07 | Ibm Canada Limited - Ibm Canada Limitee | Managing speculative assist threads
US8572337B1 (en) * | 2009-12-14 | 2013-10-29 | Symantec Corporation | Systems and methods for performing live backups
JP5541491B2 (en) * | 2010-01-07 | 2014-07-09 | 日本電気株式会社 | Multiprocessor, computer system using the same, and multiprocessor processing method
CN101807144B (en) * | 2010-03-17 | 2014-05-14 | 上海大学 | Prospective multi-threaded parallel execution optimization method
US8423750B2 (en) | 2010-05-12 | 2013-04-16 | International Business Machines Corporation | Hardware assist thread for increasing code parallelism
US8468531B2 (en) | 2010-05-26 | 2013-06-18 | International Business Machines Corporation | Method and apparatus for efficient inter-thread synchronization for helper threads
US8612730B2 (en) | 2010-06-08 | 2013-12-17 | International Business Machines Corporation | Hardware assist thread for dynamic performance profiling
US20120005457A1 (en) * | 2010-07-01 | 2012-01-05 | International Business Machines Corporation | Using software-controlled SMT priority to optimize data prefetch with assist thread
FR2962567B1 (en) * | 2010-07-12 | 2013-04-26 | Bull Sas | METHOD FOR OPTIMIZING MEMORY ACCESS, WHEN RE-EXECUTING AN APPLICATION, IN A MICROPROCESSOR COMPRISING SEVERAL LOGICAL CORES AND COMPUTER PROGRAM USING SUCH A METHOD
US8683129B2 (en) * | 2010-10-21 | 2014-03-25 | Oracle International Corporation | Using speculative cache requests to reduce cache miss delays
US20130086564A1 (en) * | 2011-08-26 | 2013-04-04 | Cognitive Electronics, Inc. | Methods and systems for optimizing execution of a program in an environment having simultaneously parallel and serial processing capability
US9021152B2 (en) * | 2013-09-30 | 2015-04-28 | Google Inc. | Methods and systems for determining memory usage ratings for a process configured to run on a device
KR102525295B1 (en) | 2016-01-06 | 2023-04-25 | 삼성전자주식회사 | Method for managing data and apparatus thereof
JP6845657B2 (en) * | 2016-10-12 | 2021-03-24 | 株式会社日立製作所 | Management server, management method and its program
CN106776047B (en) * | 2017-01-19 | 2019-08-02 | 郑州轻工业学院 | Group-wise thread forecasting method towards irregular data-intensive application
US20180260255A1 (en) * | 2017-03-10 | 2018-09-13 | Futurewei Technologies, Inc. | Lock-free reference counting
US11816500B2 (en) | 2019-03-15 | 2023-11-14 | Intel Corporation | Systems and methods for synchronization of multi-thread lanes
US11132268B2 (en) | 2019-10-21 | 2021-09-28 | The Boeing Company | System and method for synchronizing communications between a plurality of processors

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5590293A (en) * | 1988-07-20 | 1996-12-31 | Digital Equipment Corporation | Dynamic microbranching with programmable hold on condition, to programmable dynamic microbranching delay minimization
US5835947A (en) * | 1996-05-31 | 1998-11-10 | Sun Microsystems, Inc. | Central processing unit and method for improving instruction cache miss latencies using an instruction buffer which conditionally stores additional addresses
US5809566A (en) * | 1996-08-14 | 1998-09-15 | International Business Machines Corporation | Automatic cache prefetch timing with dynamic trigger migration
US6199154B1 (en) * | 1997-11-17 | 2001-03-06 | Advanced Micro Devices, Inc. | Selecting cache to fetch in multi-level cache system based on fetch address source and pre-fetching additional data to the cache for future access
US6223276B1 (en) * | 1998-03-31 | 2001-04-24 | Intel Corporation | Pipelined processing of short data streams using data prefetching
US6643766B1 (en) * | 2000-05-04 | 2003-11-04 | Hewlett-Packard Development Company, L.P. | Speculative pre-fetching additional line on cache miss if no request pending in out-of-order processor

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DORAI G ET AL: "Optimizing SMT Processors for High Single-Thread Performance", THE JOURNAL OF INSTRUCTION-LEVEL PARALLELISM, vol. 5, April 2003 (2003-04-01), XP002348824, Retrieved from the Internet <URL:http://www.jilp.org/vol5/v5paper3.pdf> [retrieved on 20051010] *
HONG WANG ET AL: "Speculative precomputation: exploring the use of multithreading for latency", INTEL TECHNOLOGY JOURNAL, vol. 6, no. 1, 14 February 2002 (2002-02-14), XP002303432 *
KIM D ET AL: "Design and evaluation of compiler algorithms for pre-execution", ASPLOS. PROCEEDINGS. INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, NEW YORK, NY, US, October 2002 (2002-10-01), pages 159 - 170, XP002311601 *
LIAO S S W ET AL: "Post-pass binary adaptation for software-based speculative precomputation", ACM SIGPLAN NOTICES, ACM, ASSOCIATION FOR COMPUTING MACHINERY, NEW YORK, NY, US, vol. 37, no. 5, May 2002 (2002-05-01), pages 117 - 128, XP002302652, ISSN: 0362-1340 *

Also Published As

Publication number Publication date
US7328433B2 (en) 2008-02-05
JP5118744B2 (en) 2013-01-16
CN1890635B (en) 2011-03-09
CN1890635A (en) 2007-01-03
JP2007507807A (en) 2007-03-29
JP4783291B2 (en) 2011-09-28
JP2011090705A (en) 2011-05-06
EP1678610A2 (en) 2006-07-12
US20050086652A1 (en) 2005-04-21
WO2005033926A2 (en) 2005-04-14

Similar Documents

Publication Publication Date Title
WO2005033926A3 (en) Methods and apparatus for reducing memory latency in a software application
EP1702269B1 (en) Dynamic performance monitoring-based approach to memory management
Schoeberl A time predictable instruction cache for a Java processor
US9652230B2 (en) Computer processor employing dedicated hardware mechanism controlling the initialization and invalidation of cache lines
WO2004027605A3 (en) Post-pass binary adaptation for software-based speculative precomputation
Ekman et al. A robust main-memory compression scheme
US6397296B1 (en) Two-level instruction cache for embedded processors
US9274965B2 (en) Prefetching data
US7401188B2 (en) Method, device, and system to avoid flushing the contents of a cache by not inserting data from large requests
Zhuang et al. Reducing cache pollution via dynamic data prefetch filtering
US7278136B2 (en) Reducing processor energy consumption using compile-time information
GB2409747A (en) Processor cache memory as RAM for execution of boot code
Luk et al. Automatic compiler-inserted prefetching for pointer-based applications
WO2004068339A3 (en) Multithreaded processor with recoupled data and instruction prefetch
Chen et al. TEST: a tracer for extracting speculative threads
WO2004055667A3 (en) System and method for data prefetching
EP1460532A3 (en) Computer processor data fetch unit and related method
WO2004102376A3 (en) Apparatus and method to provide multithreaded computer processing
McCurdy et al. Characterizing the impact of prefetching on scientific application performance
Guttman et al. Performance and energy evaluation of data prefetching on Intel Xeon Phi
WO2002027498A3 (en) System and method for identifying and managing streaming-data
Kiani et al. Skerd: Reuse distance analysis for simultaneous multiple GPU kernel executions
Lewis et al. Avoiding initialization misses to the heap
Liu et al. Enhancements for accurate and timely streaming prefetcher
Cebrian et al. Boosting Store Buffer Efficiency with Store-Prefetch Bursts

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase
    Ref document number: 200480035709.X
    Country of ref document: CN
AK Designated states
    Kind code of ref document: A2
    Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW
AL Designated countries for regional patents
    Kind code of ref document: A2
    Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase
    Ref document number: 2006534105
    Country of ref document: JP
WWE WIPO information: entry into national phase
    Ref document number: 2004789368
    Country of ref document: EP
WWP WIPO information: published in national office
    Ref document number: 2004789368
    Country of ref document: EP