Skip to main content Accessibility help
×
  • Cited by 13
Publisher:
Cambridge University Press
Online publication date:
June 2012
Print publication year:
2009
Online ISBN:
9780511811258

Book description

This book gives a comprehensive description of the architecture of microprocessors from simple in-order short pipeline designs to out-of-order superscalars. It discusses topics such as:The policies and mechanisms needed for out-of-order processing such as register renaming, reservation stations, and reorder buffers Optimizations for high performance such as branch predictors, instruction scheduling, and load-store speculationsDesign choices and enhancements to tolerate latency in the cache hierarchy of single and multiple processorsState-of-the-art multithreading and multiprocessing emphasizing single chip implementationsTopics are presented as conceptual ideas, with metrics to assess the performance impact, if appropriate, and examples of realization. The emphasis is on how things work at a black box and algorithmic level. The author also provides sufficient detail at the register transfer level so that readers can appreciate how design features enhance performance as well as complexity.

Reviews

'The book gives a comprehensive and profound description of the architecture of microprocessors from simple in-order short pipeline designs to out-of-order superscalers … The book is very well structured with good presentation of the material. Each chapter finishes with many exercises, examples and additional literature. All the references are very up-to-date. As a conclusion I warmly recommend this book as a textbook for a course or just as a reference book.'

Source: Zentralblatt MATH

Refine List

Actions for selected content:

Select all | Deselect all
  • View selected items
  • Export citations
  • Download PDF (zip)
  • Save to Kindle
  • Save to Dropbox
  • Save to Google Drive

Save Search

You can save your searches here and later view and run them again in "My saved searches".

Please provide a title, maximum of 40 characters.
×

Contents

Bibliography
Abel, N., Budnick, D., Kuck, D., Muraoka, Y., Northcote, R., and Wilhelmson, R., “TRANQUIL: A Language for an Array Processing Computer,” Proc. AFIPS SJCC, 1969, 57–73
Adiletta, M., Rosenbluth, M., Bernstein, D., Wolrich, G., and Wilkinson, H., “The Next Generation of Intel IXP Network Processors,” Intel Tech. Journal, 6, 3, Aug. 2002, 6–18
Adve, S. and Gharachorloo, K., “Shared Memory Consistency Models: A Tutorial,” IEEE Computer, 29, 12, Dec. 1996, 66–76
Agarwal, A., Bianchini, R., Chaiken, D., Johnson, K., Kranz, D., Kubiatowicz, J., Lim, B.-H., Mackenzie, K., and Yeung, D., “The MIT Alewife Machine: Architecture and Performance,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 2–13
Agarwal, A., Lim, B.-H., Kranz, D., and Kubiatowicz, J., “APRIL: A Processor Architecture for Multiprocessing,” Proc. 17th Int. Symp. on Computer Architecture, 1990, 104–114
Agarwal, A. and Pudar, S., “Column-Associative Caches: A Technique for Reducing the Miss Rate of Direct-Mapped Caches,” Proc. 20th Int. Symp. on Computer Architecture, 1993, 179–190
Agarwal, A., Simoni, R., Hennessy, J., and Horowitz, M., “An Evaluation of Directory Schemes for Cache Coherence,” Proc. 15th Int. Symp. on Computer Architecture, 1988, 280–289
Aggarwal, A. and Franklin, M., “Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors,” IEEE Trans. on Parallel and Distributed Systems, 16, 10, Oct. 2005, 944–955
Akkary, H. and Driscoll, M., “A Dynamic Multithreading Processor,” Proc. 31st Int. Symp. on Microarchitecture, 1998, 226–236
Albonesi, D., Balasubramonian, R., Dropsho, S., Dwarkadas, S., Friedman, E., Huang, M., Kursun, V., Magklis, G., Scott, M., Semeraro, G., Bose, P., Buyuktosunoglu, A., Cook, P., and Schuster, S., “Dynamic Tuning Processor Resources with Adaptive Processing,” IEEE Computer, 36, 12, Dec. 2003, 49–58
Alverson, R., Callahan, D., Cummings, D., Koblenz, B., Porterfield, A., and Smith, B., “The Tera Computer System,” Proc. Int. Conf. on Supercomputing, 1990, 1–6
Amdahl, G., “Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities,” Proc. AFIPS SJCC, 30, Apr. 1967, 483–485
Anderson, D., Sparacio, F., and Tomasulo, R., “Machine Philosophy and Instruction Handling,” IBM Journal of Research and Development, 11, 1, Jan. 1967, 8–24
Anderson, S., Earle, J., Goldschmitt, R., and Powers, D., “The IBM System/360 Model 91: Floating-point Execution Unit,” IBM Journal of Research and Development, 11, Jan. 1967, 34–53
Anderson, T., “The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors,” IEEE Trans. on Parallel and Distributed Systems, 1, 1, Jan. 1990, 6–16
Archibald, J. and Baer, J.-L., “An Economical Solution to the Cache Coherence Problem,” Proc. 12th Int. Symp. on Computer Architecture, 1985, 355–362
Archibald, J. and Baer, J.-L., “Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model,” ACM Trans. on Computing Systems, 4, 4, Nov. 1986, 273–298
August, D., Connors, D., Mahlke, S., Sias, J., Crozier, K., Cheng, B., Eaton, P., Olaniran, Q., and Hwu, W.-m., “Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture,” Proc. 25th Int. Symp. on Computer Architecture, 1998, 227–237
Austin, T., Larson, D., and Ernst, D., “SimpleScalar: An Infrastructure for Computer System Modeling,” IEEE Computer, 35, 2, Feb. 2002, 59–67
Baer, J.-L. and Wang, W.-H., “On the Inclusion Properties for Multi-Level Cache Hierarchies,” Proc. 15th Int. Symp. on Computer Architecture, 1988, 73–80
Baetke, F., “The CONVEX Exemplar SPP1000 and SPP1200 – New Scalable Parallel Systems with a Virtual Shared Memory Architecture,” in Dongarra, J., Grandinetti, L., Joubert, G., and Kowalik, J., Eds., High Performance Computing: Technology, Methods and Applications, Elsevier Press, 1995, 81–102
Balasubramonian, R., Albonesi, D., Buyuktosunoglu, A., and Dwarkadas, S., “Memory Hierarchy Reconfiguration for Energy and Performance in General-purpose Processor Architectures,” Proc. 33rd Int. Symp. on Microarchitecture, 2000, 245–257
Belady, L., “A Study of Replacement Algorithms for a Virtual Storage Computer,” IBM Systems Journal, 5, 1966, 78–101
Bernstein, A., “Analysis of Programs for Parallel Processing,” IEEE Trans. on Electronic Computers, EC-15, Oct. 1966, 746–757
Bhandarkar, D., Alpha Implementations and Architecture. Complete Reference and Guide, Digital Press, Boston, 1995
Boggs, D., Baktha, A., Hawkins, J., Marr, D., Miller, J., Roussel, P., Singhal, R., Toll, B., and Venkatraman, K., “The Microarchitecture of the Pentium 4 Processor on 90nm Technology,” Intel Tech. Journal, 8, 1, Feb. 2004, 1–17
Borkenhagen, J., Eickemeyer, R., Kalla, R., and Kunkel, S., “A Multithreaded PowerPC Processor for Commercial Servers,” IBM Journal of Research and Development, 44, 6, 2000, 885–899
Brooks, D. and Martonosi, M., “Dynamic Thermal Management in High-Performance Microprocessors,” Proc.7th Int. Symp. on High-Performance Computer Architecture, 2001, 171–182
Bucholz, W., Ed., Planning a Computer System: Project Stretch, McGraw-Hill, New York, 1962
Calder, B. and Grunwald, D., “Fast & Accurate Instruction Fetch and Branch Prediction,” Proc. 21st Int. Symp. on Computer Architecture, 1994, 2–11
Calder, B. and Grunwald, D., “Next Cache Line and Set Prediction,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 287–296
Calder, B., Grunwald, D., and Emer, J., “Predictive Sequential Associative Cache,” Proc. 2nd Int. Symp. on High-Performance Computer Architecture, 1996, 244–253
Calder, B. and Reinmann, G., “A Comparative Survey of Load Speculation Architectures,” Journal of Instruction-Level Parallelism, 1, 2000, 1–39
Canal, R., Parcerisa, J.M., and Gonzales, A., “Dynamic Cluster Assignment Mechanisms,” Proc. 6th Int. Symp. on High-Performance Computer Architecture, 2000, 133–141
Cantin, J. and Hill, M., Cache Performance for SPEC CPU2000 Benchmarks, Version 3.0, May 2003, http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data/
Case, R. and Padegs, A., “The Architecture of the IBM System/370,” Communications of the ACM, 21, 1, Jan. 1978, 73–96
Censier, L. and Feautrier, P., “A New Solution to Coherence Problems in Multicache Systems,” IEEE Trans. on Computers, 27, 12, Dec. 1978, 1112–1118
Chan, K., Hay, C., Keller, J., Kurpanek, G., Shumaker, F., and Zheng, J., “Design of the HP PA 7200 CPU,” Hewlett Packard Journal, 47, 1, Jan. 1996, 25–33
Chaudhry, S., Caprioli, P., Yip, S., and Tremblay, M., “High-Performance Throughput Computing,” IEEE Micro, 25, 3, May 2005, 32–45
Chen, T.-F. and Baer, J.-L., “Effective Hardware-based Data Prefetching for High-Performance Processors,” IEEE Trans. on Computers, 44, 5, May 1995, 609–623
Cheng, I-C., Coffey, J., and Mudge, T., “Analysis of Branch Prediction via Data Compression,” Proc. 7th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1996, 128–137
Christie, D., “Developing the AMD-K5 Architecture,” IEEE Micro, 16, 2, Mar. 1996, 16–27
Chryzos, G. and Emer, J., “Memory Dependence Prediction Using Store Sets,” Proc. 25th Int. Symp. on Computer Architecture, 1998, 142–153
Citron, D., Hurani, A., and Gnadrey, A., “The Harmonic or Geometric Mean: Does it Really Matter,” Computer Architecture News, 34, 6, Sep. 2006, 19–26
Colwell, R., Papworth, D., Hinton, G., Fetterman, M., and Glew, A., “Intel's P6 Microarchitecture,” Chapter 7 in Shen, J. P. and Lipasti, M., Eds., Modern Processor Design, 2005, 329–367
Conte, T., Memezes, K., Mills, P., and Patel, B., “Optimization of Instruction Fetch Mechanisms for High Issue Rates,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 333–344
Conti, C., Gibson, D., and Pitkowsky, S., “Structural Aspects of the IBM System 360/85; General Organization,” IBM Systems Journal, 7, 1968, 2–14
Cooksey, R., Jourdan, S., and Grunwald, D., “A Stateless, Content-Directed Data Prefetching Mechanism,” Proc. 10th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 2002, 279–290
Crisp, R., “Direct Rambus Technology: The New Main Memory Standard,” IEEE Micro, 17, 6, Nov.–Dec. 1997, 18–28
Culler, D. and Singh, J.P. with Gupta, A., Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufman Publishers, San Francisco, 1999
Cuppu, V., Jacob, B., Davis, B., and Mudge, T., “High-Performance DRAMs in Workstation Environments,” IEEE Trans. on Computers, 50, 11, Nov. 2001, 1133–1153
Curnow, H. and Wichman, B., “Synthetic Benchmark,” Computer Journal, 19, 1, Feb. 1976
Cvetanovic, Z. and Bhandarkar, D., “Performance Characterization of the Alpha 21164 Microprocessor Using TP and SPEC Workloads,” Proc. 2nd Int. Symp. on High-Performance Computer Architecture, 1996, 270–280
Cvetanovic, Z. and Kessler, R., “Performance Analysis of the Alpha 21264-based Compaq ES40 System,” Proc. 27th Int. Symp. on Computer Architecture, 2000, 192–202
Dally, W., “Virtual-Channel Flow Control,” Proc. 17th Int. Symp. on Computer Architecture, 1990, 60–68
Denning, P., “Virtual Memory,” ACM Computing Surveys, 2, Sep. 1970, 153–189
Dennis, J. and Misunas, D., “A Preliminary Data Flow Architecture for a Basic Data Flow Processor,” Proc. 2nd Int. Symp. on Computer Architecture, 1974, 126–132
Dongarra, J., Bunch, J., Moler, C., and Stewart, G., LINPACK User's Guide, SIAM, Philadelphia, 1979
Dongarra, J., Luszczek, P., and Petitet, A., “The LINPACK Benchmark: Past, Present, and Future,” Concurrency and Computation: Practice and Experience, 15, 2003, 1–18
Dubois, M., Scheurich, C., and Briggs, F., “Memory Access Buffering in Multiprocessors,” Proc. 13th Int. Symp. on Computer Architecture, 1986, 434–442
Eden, A. and Mudge, T., “The YAGS Branch Prediction Scheme,” Proc. 31st Int. Symp. on Microarchitecture, 1998, 69–77
Edmondson, J., Rubinfeld, P., Preston, R., and Rajagopalan, V., “Superscalar Instruction Execution in the 21164 Alpha Microprocessor,” IEEE Micro, 15, 2, Apr. 1995, 33–43
Eggers, S., Emer, J., Levy, H., Lo, J., Stamm, R., and Tullsen, D., “Simultaneous Multithreading: A Platform for Next-Generation Processors,” IEEE Micro, 17, 5, Sep. 1997, 12–19
Fagin, B. and Russell, K., “Partial Resolution in Branch Target Buffers,” Proc. 28th Int. Symp. on Microarchitecture, 1995, 193–198
Farkas, D. and Jouppi, N., “Complexity/Performance Trade-offs with Non-Blocking Loads,” Proc. 21st Int. Symp. on Computer Architecture, 1994, 211–222
Fields, B., Bodik, R., and Hill, M., “Slack: Maximizing Performance under Technological Constraints,” Proc. 29th Int. Symp. on Computer Architecture, 2002, 47–58
Flynn, M., “Very High Speed Computing Systems,” Proc. IEEE, 54, 12, Dec. 1966, 1901–1909
Folegnani, D. and Gonzales, A., “Energy-effective Issue Logic,” Proc. 28th Int. Symp. on Computer Architecture, 2001, 230–239
Franklin, M. and Sohi, G., “A Hardware Mechanism for Dynamic Reordering of Memory References,” IEEE Trans. on Computers, 45, 6, Jun. 1996, 552–571
Gharachorloo, K., Gupta, A., and Hennessy, J., “Two Techniques to Enhance the Performance of Memory Consistency Models,” Proc. Int. Conf. on Parallel Processing, 1991, I-355–364
Gochman, S., Ronen, R., Anati, I., Berkovits, R., Kurts, T., Naveh, A., Saeed, A., Sperber, Z., and Valentine, R., “The Intel Pentium M Processor: Microarchitecture and Performance,” Intel Tech. Journal, 07, 2, May 2003, 21–39
Golden, M. and Mudge, T., “A Comparison of Two Pipeline Organizations,” Proc. 27th Int. Symp. on Microarchitecture, 1994, 153–161
Goodman, J., Vernon, M., and Woest, P., “Efficient Synchronization Primitives for Large-Scale Cache Coherent Multiprocessors,” Proc. 3rd Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Apr. 1989, 64–73
Graunke, G. and Thakkar, S., “Synchronization Algorithms for Shared-Memory Multiprocessors,” IEEE Computer, 23, 6, Jun. 1990, 60–70
Grunwald, D., Levis, P., Farkas, K., Morrey, C., and Neufeld, M., “Policies for Dynamic Clock Scheduling,” Proc. 4th USENIX Symp. on Operating Systems Design and Implementation, 2000, 73–86
Gschwind, M., Hofstee, H., Flachs, B., Hopkins, M., Watanabe, Y., and Yamazaki, T., “Synergistic Processing in Cell's Multicore Architecture,” IEEE Micro, 26, 2, Mar. 2006, 11–24
Gunther, S., Beans, F., Carmean, D., and Hall, J., “Managing the Impact of Increasing Power Consumption,” Intel Tech. Journal, 5, 1, Feb. 2001, 1–9
Gwennap, L., “Brainiacs, Speed Demons, and Farewell,” Microprocessor Report Newsletter, 13, 7, Dec. 1999
Hallnor, E. and Reinhardt, S., “A Fully Associative Software-Managed Cache Design,” Proc. 27th Int. Symp. on Computer Architecture, 2000, 107–116
Hao, E., Chang, P.-Y., and Patt, Y., “The Effect of Speculatively Updating Branch History on Branch Prediction Accuracy, Revisited,” Proc. 27th Int. Symp. on Microarchitecture, 1994, 228–232
Harstein, A. and Puzak, T., “The Optimum Pipeline Depth for a Microprocessor,” Proc. 29th Int. Symp. on Computer Architecture, 2002, 7–13
Hennessy, J. and Patterson, D., Computer Architecture: A Quantitative Approach, Fourth Edition, Elsevier Inc., San Francisco, 2007
Henning, J., Ed., “SPEC CPU2006 Benchmark Descriptions,” Computer Architecture News, 36, 4, Sep. 2006, 1–17
Hill, M., Aspects of Cache Memory and Instruction Buffer Performance, Ph.D. Dissertation, Univ. of California, Berkeley, Nov. 1987
Hill, M., “Multiprocessors Should Support Simple Memory-Consistency Models,” IEEE Computer, 31, 8, Aug. 1998, 28–34
Hinton, G., Sager, D., Upton, M., Boggs, D., Carmean, D., Kyker, A., and Roussel, P., “The Microarchitecture of the Pentium4 Processor,” Intel Tech. Journal, 1, Feb. 2001
Ho, R., Mai, K., and Horowitz, M., “The Future of Wires,” Proc. of the IEEE, 89, 4, Apr. 2001, 490–504
Hrishikesh, M., Jouppi, N., Farkas, K., Burger, D., Keckler, S., and Shivakumar, P., “The Optimal Logic Depth per Pipeline Stage is 6 to 8 FO4 Inverter Delays,” Proc. 29th Int. Symp. on Computer Architecture, 2002, 14–24
Huck, J., Morris, D., Ross, J., Knies, A., Mulder, H., and Zahir, R., “Introducing the IA-64 Architecture,” IEEE Micro, 20, 5, Sep. 2000, 12–23
Hwu, W.-m. and Patt, Y., “HPSm, A High-Performance Restricted Data Flow Architecture Having Minimal Functionality,” Proc. 13th Int. Symp. on Computer Architecture, 1986, 297–307
,Intel Corp., A Tour of the P6 Microarchitecture, 1995, http://www.x86.org/ftp/manuals/686/p6tour.pdf
Jeremiassen, T. and Eggers, S., “Reducing False Sharing on Shared Memory Multiprocessors through Compile Time Data Transformations,” Proc. 5th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, 1995, 179–188
Jiménez, D., Keckler, S., and Lin, C., “The Impact of Delay on the Design of Branch Predictors,” Proc. 33rd Int. Symp. on Microarchitecture, 2000, 67–76
John, L., “More on Finding a Single Number to Indicate Overall Performance of a Benchmark Suite,” Computer Architecture News, 32, 1, Mar. 2004, 3–8
Joseph, D. and Grunwald, D., “Prefetching Using Markov Predictors,” Proc. 24th Int. Symp. on Computer Architecture, 1997, 252–263
Jouppi, N., “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” Proc. 17th Int. Symp. on Computer Architecture, 1990, 364–373
Jourdan, S., Stark, J., Hsing, T.-H., and Patt, Y., “Recovery Requirements of Branch Prediction Storage Structures in the Presence of Mispredicted-path Execution,” International Journal of Parallel Programming, 25, Oct. 1997, 363–383
Kaeli, D. and Emma, P., “Branch History Table Prediction of Moving Target Branches Due to Subroutine Returns,” Proc. 18th Int. Symp. on Computer Architecture, 1991, 34–42
Kagi, A., Burger, D., and Goodman, J., “Efficient Synchronization: Let them Eat QOLB,” Proc. 24th Int. Symp. on Computer Architecture, 1997, 170–180
Kahle, J., Day, M., Hofstee, H., Johns, C., Maeurer, T., and Shippy, D., “Introduction to the Cell Multiprocessor,” IBM Journal of Research and Development, 49, 4/5, Jul. 2005, 589–604
Kalamatianos, J., Khalafi, A., Kaeli, D., and Meleis, W., “Analysis of Temporal-based Program Behavior for Improved Instruction Cache Performance,” IEEE Trans. on Computers, 48, 2, Feb. 1999, 168–175
Kalla, R., Sinharoy, B., and Tendler, J., “IBM Power5 Chip: A Dual-Core Multithreaded Processor,” IEEE Micro, 24, 2, Apr. 2004, 40–47
Kapasi, U., Rixner, S., Dally, W., Khailany, B., Ahn, J., Mattson, P., and Owens, J., “Programmable Stream Processors,” IEEE Computer, 36, 8, Aug. 2003, 54–62
Kaxiras, S., Hu, Z., and Martonosi, M., “Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power,” Proc. 28th Int. Symp. on Computer Architecture, 2001, 240–251
Keller, R., “Look-Ahead Processors,” ACM Computing Surveys, 7, 4, Dec. 1975, 177–195
Keltcher, C., McGrath, J., Ahmed, A., and Conway, P., “The AMD Opteron for Multiprocessor Servers,” IEEE Micro, 23, 2, 2003, 66–76
,Kendall Square Research, KSR1 Technology Background, Waltham, MA, 1992
Kermani, P. and Kleinrock, L., “Virtual Cut-through: A New Computer Communication Switching Technique,” Computer Networks, 3, 4, Sep. 1979, 267–286
Kerns, D. and Eggers, S., “Balanced Scheduling: Instruction Scheduling when Memory Latency is Uncertain,” Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, SIGPLAN Notices, 28, 6, Jun. 1993, 278–289
Keshava, J. and Pentkovski, V., “Pentium III Processor Implementation Tradeoffs,” Intel Tech. Journal, 2, May 1999
Kessler, R., “The Alpha 21264 Microprocessor,” IEEE Micro, 19, 2, Mar. 1999, 24–36
Kessler, R., Jooss, R., Lebeck, A., and Hill, M., “Inexpensive Implementations of Set-Associativity,” Proc. 16th Int. Symp. on Computer Architecture, 1989, 131–139
Kilburn, T., Edwards, D., Lanigan, M., and Sumner, F., “One-level Storage System,” IRE Trans. on Electronic Computers, EC-11, 2, Apr. 1962, 223–235
Kim, C., Burger, D., and Keckler, S., “An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches,” Proc. 10th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 2002, 211–222
Kim, N., Flautner, K., Blaauw, D., and Mudge, T., “Drowsy Instruction Caches – Leakage Power Reduction Using Dynamic Voltage Scaling and Cache Sub-bank Prediction,” Proc. 29th Int. Symp. on Computer Architecture, 2002, 219–230
KleinOsowski, A. and Lilja, D., “MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research,” Computer Architecture Letters, 1, Jun. 2002
Kogge, P., The Architecture of Pipelined Computers, McGraw-Hill, New York, 1981
Kongetira, P., Aingaran, K., and Olukotun, K., “Niagara: A 32-way Multithreaded Sparc Processor,” IEEE Micro, 24, 2, Apr. 2005, 21–29
Koufaty, D. and Marr, D., “Hyperthreading Technology in the Netburst Microarchitecture,” IEEE Micro, 23, 2, Mar. 2003, 56–65
Kroft, D., “Lockup-Free Instruction Fetch/Prefetch Cache Organization,” Proc. 8th Int. Symp. on Computer Architecture, 1981, 81–87
Lai, A., Fide, C., and Falsafi, B., “Dead-block Prediction & Dead-block Correlation Prefetchers,” Proc. 28th Int. Symp. on Computer Architecture, 2001, 144–154
Lam, M., “Software Pipelining: An Effective Scheduling Technique for VLIW Machines,” Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, SIGPLAN Notices, 23, 7, Jul. 1988, 318–328
Lamport, L., “How to Make a Multiprocessor Computer that Correctly Executes Programs,” IEEE Trans. on Computers, 28, 9, Sep. 1979, 690–691
Larus, J. and Kozyrakis, C., “Transactional Memory,” Communications of the ACM, 51, 7, Jul. 2008, 80–88
Lee, D., Crowley, P., Baer, J.-L., Anderson, T., and Bershad, B., “Execution Characteristics of Desktop Applications on Windows NT,” Proc. 25th Int. Symp. on Computer Architecture, 1998, 27–38
Lee, J., “Study of ‘Look-Aside’ Memory,” IEEE Trans. on Computers, C-18, 11, Nov. 1969, 1062–1065
Lee, J. and Smith, A., “Branch Prediction Strategies and Branch Target Buffer Design,” IEEE Computer, 17, 1, Jan. 1984, 6–22
Lin, W.-F., Reinhardt, S., and Burger, D., “Designing a Modern Memory Hierarchy with Hardware Prefetching,” IEEE Trans. on Computers, 50, 11, Nov. 2001, 1202–1218
Lipasti, M. and Shen, J.P., “Exceeding the Dataflow Limit with Value Prediction,” Proc. 29th Int. Symp. on Microarchitecture, 1996, 226–237
Liptay, J., “Design of the IBM Enterprise System/9000 High-end Processor,” IBM Journal of Research and Development, 36, 4, Jul. 1992, 713–731
Lo, J., Barroso, L., Eggers, S., Gharachorloo, K., Levy, H., and Parekh, S., “An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors,” Proc. 25th Int. Symp. on Computer Architecture, 1998, 39–50
Loh, G., “Advanced Instruction Flow Techniques,” Chapter 9 in Shen, J. P. and Lipasti, M., Eds., Modern Processor Design, 2005, 453–518
Lovett, T. and Clapp, R., “STiNG: A CC-NUMA Computer System for the Commercial Marketplace,” Proc. 23rd Int. Symp. on Computer Architecture, 1996, 308–317
Lovett, T. and Thakkar, S., “The Symmetry Multiprocessor System,” Proc. Int. Conf. on Parallel Processing, Aug. 1988, pp. 303–310
Mathis, H., Mericas, A., McCalpin, J., Eickemeyer, R., and Kunkel, S., “Characterization of Simultaneous Multithreading (SMT) Efficiency in Power5,” IBM Journal of Research and Development, 49, 4, Jul. 2005, 555–564
Mattson, R., Gecsei, J., Slutz, D., and Traiger, I., “Evaluation Techniques for Storage Hierarchies,” IBM Systems Journal, 9, 1970, 78–117
McFarling, S., “Combining Branch Predictors,” WRL Technical Note, TN-36, Jun. 1993
McMahon, H., “The Livermore Fortran Kernels Test of the Numerical Performance Range,” in Martin, J. L., Ed., Performance Evaluation of Supercomputers, Elsevier Science B.V., North-Holland, Amsterdam, 1988, 143–186.
McNairy, C. and Soltis, D., “Itanium 2 Processor Microarchitecture,” IEEE Micro, 23, 2, Mar. 2003, 44–55
Mendelson, A., Mandelblat, J., Gochman, S., Shemer, A., Chabukswar, R., Niemeyer, E., and Kumar, A., “CMP Implementation in Systems Based on the Intel Core Duo Processor,” Intel Tech. Journal, 10, 2, May 2006, 99–107
Moore, G., “Cramming More Components onto Integrated Circuits,” Electronics, 38, 8, Apr. 1965
Moshovos, A., Breach, S., Vijaykumar, T., and Sohi, G., “Dynamic Speculation and Synchronization of Data Dependences,” Proc. 24th Int. Symp. on Computer Architecture, 1997, 181–193
Mowry, T., Lam, M., and Gupta, A., “Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors,” Proc. 5th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1992, 62–73
Mutlu, O., Stark, J., Wilkerson, C., and Patt, Y., “Run-ahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,” Proc. 9th Int. Symp. on High-Performance Computer Architecture, 2003, 129–140
Naveh, A., Rotem, E., Mendelson, A., Gochman, S., Chabuskwar, R., Krishnan, K., and Kumar, A., “Power and Thermal Management in the Intel Core Dual Processor,” Intel Tech. Journal, 10, 2, May 2006, 109–122
Ozer, E., Banerjia, S., and Conte, T., “Unified Assign and Schedule: A New Approach to Scheduling for Clustered Register File Microarchitectures,” Proc. 31st Int. Symp. on Microarchitecture, 1998, 308–315
Palacharla, S., Jouppi, N., and Smith, J., “Complexity-Effective Superscalar Processors,” Proc. 24th Int. Symp. on Computer Architecture, 1997, 206–218
Palacharla, S. and Kessler, R., “Evaluating Stream Buffers as a Secondary Cache Replacement,” Proc. 21st Int. Symp. on Computer Architecture, 1994, 24–33
Pan, S., So, K., and Rahmey, J., “Improving the Accuracy of Dynamic Branch Prediction using Branch Correlation,” Proc. 5th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1992, 76–84
Papamarcos, M. and Patel, J., “A Low-overhead Coherence Solution for Multiprocessors with Private Cache Memories,” Proc. 12th Int. Symp. on Computer Architecture, 1985, 348–354
Papworth, D., “Tuning the Pentium Pro Microarchitecture,” IEEE Micro, 16, 2, Mar. 1996, 8–15
Patel, S., Friendly, D., and Patt, Y., “Evaluation of Design Options for the Trace Cache Fetch Mechanism,” IEEE Trans. on Computers, 48, 2, Feb. 1999, 193–204
Patterson, D. and Hennessy, J., Computer Organization & Design: The Hardware/Software Interface, Third Edition, Morgan Kaufman Publishers, San Francisco, 2004
Patterson, D. and Séquin, C., “RISC I: A Reduced Instruction Set VLSI Computer,” Proc. 8th Int. Symp. on Computer Architecture, 1981, 443–457.
Peir, J.-K., Hsu, W., and Smith, A., “Functional Implementations Techniques for CPU Cache Memories,” IEEE Trans. on Computers, 48, 2, Feb. 1999, 100–110
Peleg, A. and Weiser, U., “Dynamic Flow Instruction Cache Memory Organized Around Trace Segments Independent of Virtual Address Line,” U.S. Patent Number 5,381,533, 1994
Peleg, A. and Weiser, U., “MMX Technology Extension to the Intel Architecture,” IEEE Micro, 16, 4, Aug. 1996, 42–50
Perleberg, C. and Smith, A., “Branch Target Buffer Design and Optimization,” IEEE Trans. on Computers, 42, 4, Apr. 1993, 396–412
Pettis, K. and Hansen, R., “Profile Guided Code Positioning,” Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, SIGPLAN Notices, 25, Jun. 1990, 16–27
Ponomarev, D., Kucuk, G., and Ghose, K., “Reducing Power Requirements of Instruction Scheduling through Dynamic Allocation of Multiple Datapath resources,” Proc. 34th Int. Symp. on Microarchitecture, 2001, 90–101
Postiff, M., Tyson, G., and Mudge, T., “Performance Limits of Trace Caches,” Journal of Instruction-Level Parallelism, 1, Sep. 1999, 1–17
Przybylski, S., Cache Design: A Performance Directed Approach, Morgan Kaufman Publishers, San Francisco, 1990
Pugh, E., Johnson, L., and Palmer, J., IBM's 360 and Early 370 Systems, The MIT Press, Cambridge, MA, 1991
Ranganathan, P., Adve, S., and Jouppi, N., “Performance of Image and Video Processing with General-Purpose Processors and Media ISA Extensions,” Proc. 26th Int. Symp. on Computer Architecture, 1999, 124–135
Riseman, E. and Foster, C., “The Inhibition of Potential Parallelism by Conditional Jumps,” IEEE Trans. on Computers, C-21, 12, Dec. 1972, 1405–1411
Romer, T., Lee, D., Volker, G., Wolman, A., Wong, W., Baer, J.-L., Bershad, B., and Levy, H., “The Structure and Performance of Interpreters,” Proc. 7th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1996, pp. 150–159
Rotenberg, E., Bennett, S., and Smith, J., “Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching,” Proc. 29th Int. Symp. on Microarchitecture, 1996, 24–34
Rudolf, L. and Segall, Z., “Dynamic Decentralized Cache Schemes for MIMD Parallel Processors,” Proc. 11th Int. Symp. on Computer Architecture, 1984, 340–347
Salverda, P. and Zilles, C., “A Criticality Analysis of Clustering in Superscalar Processors,” Proc. 38th Int. Symp. on Microarchitecture, 2005, 55–66
Schlansker, M. and Rau, B., “EPIC: Explicitly Parallel Instruction Computing,” IEEE Computer, 33, 2, Feb. 2000, 37–45
Scott, S., “Synchronization and Communication in the Cray 3TE Multiprocessor,” Proc. 7th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1996, 26–36
Seznec, A., “A Case for Two-way Skewed-Associative Caches,” Proc. 20th Int. Symp. on Computer Architecture, 1993, 169–178
Sharangpani, H. and Arora, K., “Itanium Processor Microarchitecture,” IEEE Micro, 20, 5, Sep. 2000, 24–43
Shen, J. P. and Lipasti, M., Modern Processor Design Fundamentals of Superscalar Processors, McGraw-Hill, 2005
Sherwood, T., Perelman, E., Hamerly, G., Sair, S., and Calder, B., “Discovering and Exploiting Program Phases,” IEEE Micro, 23, 6, Nov.–Dec. 2003, 84–93
Sima, D., “The Design Space of Register Renaming Techniques,” IEEE Micro, 20, 5, Sep. 2000, 70–83
Skadron, K., Martonosi, M., and Clark, D., “Speculative Updates of Local and Global Branch History: A Quantitative Analysis,” Journal of Instruction-Level Parallelism, 2, 2000, 1–23
Skadron, K., Stan, M., Huang, W., Velusamy, S., Sankararayanan, K., and Tarjan, D., “Temperature-Aware Microarchitecture,” Proc. 30th Int. Symp. on Computer Architecture, 2003, 2–13
Slingerland, N. and Smith, A., “Multimedia Extensions for General-Purpose Microprocessors: A Survey,” Microprocessors and Microsystems, 29, 5, Jan. 2005, 225–246
Smith, A., “Cache Memories,” ACM Computing Surveys, 14, 3, Sep. 1982, 473–530
Smith, B., “A Pipelined, Shared Resource MIMD Computer,” Proc. Int. Conf. on Parallel Processing, 1978, 6–8
Smith, J., “A Study of Branch Prediction Strategies,” Proc. 8th Int. Symp. on Computer Architecture, 1981, 135–148
Smith, J., “Characterizing Computer Performance with a Single Number,” Communications of the ACM, 31, 10, Oct. 1988, 1201–1206
Smith, J. and Pleszkun, A., “Implementation of Precise Interrupts in Pipelined Processors,” IEEE Trans. on Computers, C-37, 5, May 1988, 562–573 (an earlier version was published in Proc. 12th Int. Symp. on Computer Architecture, 1985)
Smith, J. and Sohi, G., “The Microarchitecture of Superscalar Processors,” Proc. IEEE, 83, 12, Dec. 1995, 1609–1624
Sohi, G., “Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers,” IEEE Trans. on Computers, C-39, 3, Mar. 1990, 349–359 (an earlier version with co-author S. Vajapeyam was published in Proc. 14th Int. Symp. on Computer Architecture, 1987)
Sohi, G., Breach, S., and Vijaykumar, T., “Multiscalar Processors,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 414–425
Sohi, G. and Roth, A., “Speculative Multithreaded Processors,” IEEE Computer, 34, 4, Apr. 2001, 66–73
Srinivasan, S., Ju, D.-C., Lebeck, A., and Wilkerson, C., “Locality vs. Criticality,” Proc. 28th Int. Symp. on Computer Architecture, 2001, 132–143
Stark, J., Brown, M., and Patt, Y., “On Pipelining Dynamic Instruction Scheduling Logic,” Proc. 34th Int. Symp. on Microarchitecture, 2000, 57–66
Stunkel, C., Herring, J., Abali, B., and Sivaram, R., “A New Switch Chip for IBM RS/6000 SP Systems,” Proc. Int. Conf. on Supercomputing, 1999, 16–33
Sweazey, P. and Smith, A., “A Class of Compatible Cache Consistency Protocols and their Support by the IEEE Future Bus,” Proc. 13th Int. Symp. on Computer Architecture, 1986, 414–423
Tendler, J., Dodson, J., Fields, Jr. J., Le, H., and Sinharoy, B., “POWER 4 System Microarchitecture,” IBM Journal of Research and Development, 46, 1, Jan. 2002, 5–25
Thornton, J., “Parallel Operation in the Control Data 6600,” Proc. AFIPS. FJCC, pt. 2, vol. 26, 1964, 33–40 (reprinted as Chapter 39 of Bell, C. and Newell, A., Eds., Computer Structures: Readings and Examples, McGraw-Hill, New York, 1971, and Chapter 43 of Siewiorek, D., Bell, C., and Newell, A., Eds., Computer Structures: Principles and Examples, McGraw-Hill, New York, 1982)
Thornton, J., Design of a Computer. The Control Data 6600, Scott, Foresman and Co., Glenview, IL, 1970
Tjaden, G. and Flynn, M., “Detection and Parallel Execution of Independent Instructions,” IEEE Trans. on Computers, C-19, 10, Oct. 1970, 889–895
Tomasulo, R., “An Efficient Algorithm for Exploiting Multiple Arithmetic Units,” IBM Journal of Research and Development, 11, 1, Jan. 1967, 25–33
Tremblay, M., Chan, J., Chaudhry, S., Coniglaro, A., and Tse, S., “The MAJC Architecture: A Synthesis of Parallelism and Scalability,” IEEE Micro, 20, 6, Nov. 2000, 12–25
Tremblay, M. and O'Connor, J., “UltraSparc I: A Four-issue Processor Supporting Multimedia,” IEEE Micro, 16, 2, Apr. 1996, 42–50
Tucker, L. and Robertson, G., “Architecture and Applications of the Connection Machine,” IEEE Computer, 21, 8, Aug. 1988, 26–38
Tullsen, D., Eggers, S., and Levy, H., “Simultaneous Multithreading: Maximizing On-chip Parallelism,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 392–403
Tune, E., Liang, D., Tullsen, D., and Calder, B., “Dynamic Prediction of Critical Path Instructions,” Proc. 7th Int. Symp. on High-Performance Computer Architecture, 2001, 185–195
Uhlig, R. and Mudge, T., “Trace-driven Memory Simulation: A Survey,” ACM Computing Surveys, 29, 2, Jun. 1997, 128–170
Vanderwiel, S. and Lilja, D., “Data Prefetch Mechanisms,” ACM Computing Surveys, 32, 2, Jun. 2000, 174–199
VanVleet, P., Anderson, E., Brown, L., Baer, J.-L., and Karlin, A., “Pursuing the Performance Potential of Dynamic Cache Lines,” Proc. ICCD, Oct. 1999, 528–537
Venkatachalam, V. and Franz, M., “Power Reduction Techniques for Microprocessor Systems,” ACM Computing Surveys, 37, 3, Sep. 2005, 195–237
Weicker, R., “Dhrystone: A Synthetic Systems Programming Benchmark,” Communications of the ACM, 27, Oct. 1984, 1013–1030
Weiser, M., Welch, B., Demers, A., and Shenker, S., “Scheduling for Reduced CPU Energy,” Proc. 1st USENIX Symp. on Operating Systems Design and Implementation, 1994, 13–23
Weschler, O., “Inside Intel Core Microarchitecture,” Intel White Paper, 2006, http://download.intel.com/technology/architecture/new_architecture_06.pdf
Wilkes, M., “Slave Memories and Dynamic Storage Allocation,” IEEE Trans. on Electronic Computers, EC-14, Apr. 1965, 270–271
Wong, W. and Baer, J.-L., “Modified LRU Policies for Improving Second-Level Cache Behavior,” Proc. 6th Int. Symp. on High-Performance Computer Architecture, 2000, 49–60
Yeager, K., “The MIPS R10000 Superscalar Microprocessor,” IEEE Micro, 16, 2, Apr. 1996, 28–41
Yeh, T.-Y. and Patt, Y., “Alternative Implementations of Two-Level Adaptive Branch Prediction,” Proc. 19th Int. Symp. on Computer Architecture, 1992, 124–134
Yeh, T.-Y. and Patt, Y., “A Comprehensive Instruction Fetch Mechanism for a Processor Supporting Speculative Execution,” Proc. 25th Ann. Symp. on Microarchitecture, 1992, 129–139
Yoaz, A., Erez, M., Ronen, R., and Jourdan, S., “Speculation Techniques for Improving Load Related Instruction Scheduling,” Proc. 26th Int. Symp. on Computer Architecture, 1999, 42–53
Zhang, Z., Zhu, Z., and Zhang, X., “A Permutation-based Page Interleaving Scheme to Reduce Row-buffer Conflicts and Exploit Data Locality,” Proc. 33rd Int. Symp. on Microarchitecture, 2000, 32–41

Metrics

Altmetric attention score

Full text views

Total number of HTML views: 0
Total number of PDF views: 0 *
Loading metrics...

Book summary page views

Total views: 0 *
Loading metrics...

* Views captured on Cambridge Core between #date#. This data will be updated every 24 hours.

Usage data cannot currently be displayed.