Skip to main content

2019 | OriginalPaper | Buchkapitel

5. The Processing-in-Memory Paradigm: Mechanisms to Enable Adoption

verfasst von : Saugata Ghose, Kevin Hsieh, Amirali Boroumand, Rachata Ausavarungnirun, Onur Mutlu

Erschienen in: Beyond-CMOS Technologies for Next Generation Computer Design

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Performance improvements from DRAM technology scaling have been lagging behind the improvements from logic technology scaling for many years. As application demand for main memory continues to grow, DRAM-based main memory is increasingly becoming a larger system bottleneck in terms of both performance and energy consumption. A major reason for poor memory performance and energy efficiency is memory’s inability to perform computation. Instead, data stored within DRAM memory must be moved into the CPU before any computation can take place. This data movement is costly, as it requires a high latency and consumes significant energy to transfer the data across the pin-limited memory channel. Moreover, the data moved to the CPU is often not reused, and thus does not benefit from being cached within the CPU, which makes it difficult to amortize the overhead of data movement.
Modern 3D-stacked DRAM architectures provide an opportunity to avoid unnecessary data movement between memory and the CPU. These multi-layer architectures include a logic layer, where compute logic can be integrated underneath multiple layers of DRAM cell arrays (i.e., the memory layers) within the same chip. Architects can take advantage of the logic layer to perform processing-in-memory (PIM), or near-data processing, where some of the computation is moved from the CPU to the logic layer underneath the memory layer. In a PIM architecture, the logic layer within DRAM has access to the high internal bandwidth available within 3D-stacked DRAM (which is much greater than the bandwidth available in the narrow memory channel between DRAM and the CPU). Thus, PIM architectures can effectively free up valuable bandwidth on the bandwidth-limited memory channel while at the same time reducing system energy consumption.
A number of important issues arise when we add compute logic to DRAM. In particular, logic within DRAM does not have low-latency access to common CPU structures that are essential for modern application execution, such as the virtual memory mechanisms, e.g., the translation lookaside buffer (TLB) or the page table walker, and the cache coherence mechanisms, e.g., the coherence directory. To ease the widespread adoption of PIM, we ideally would like to maintain traditional virtual memory abstractions and the shared memory programming model. This requires efficient mechanisms that can provide logic in DRAM with access to virtual memory and cache coherence without having to communicate frequently with the CPU, as off-chip communication between the CPU and DRAM consumes much of the limited bandwidth that PIM aims to avoid using. To this end, we propose and evaluate two general-purpose solutions that can be used by PIM architectures to minimize unnecessary off-chip communication. The first, IMPICA, is an efficient in-memory accelerator for pointer chasing, which can handle address translation entirely within DRAM. The second, LazyPIM, provides coherence support without the need to continually communicate with the CPU. We show that both of these mechanisms provide a significant benefit for a number of important memory-intensive applications, thereby both improving performance and reducing energy consumption.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
We use the Intel® VTune™ profiling tool on a machine with a Xeon® W3550 processor (3 GHz, 8-core, 8 MB LLC) [73] and 18 GB memory. We profile each application for 10 min after it reaches steady state.
 
2
We sweep the size of the IMPICA cache from 32 to 128 kB, and find that it has negligible effect on our results.
 
3
See Sect. 5.4.5 for our experimental evaluation methodology.
 
4
A thorough treatment of memory consistency [106] is outside the scope of this work. Our goal is to deal with the coherence problem in PIM, not handle consistency issues.
 
5
The programmer should be conservative in identifying PIM data regions, and should not miss any possible data that may be touched by a PIM core. If any data not marked as PIM data is accessed by the PIM core, the program can produce incorrect results.
 
Literatur
1.
Zurück zum Zitat S. Aga, S. Jeloka, A. Subramaniyan, S. Narayanasamy, D. Blaauw, R. Das, Compute caches, in HPCA (2017) S. Aga, S. Jeloka, A. Subramaniyan, S. Narayanasamy, D. Blaauw, R. Das, Compute caches, in HPCA (2017)
2.
Zurück zum Zitat J. Ahn, S. Hong, S. Yoo, O. Mutlu, K. Choi, A scalable processing-in-memory accelerator for parallel graph processing, in ISCA (2015) J. Ahn, S. Hong, S. Yoo, O. Mutlu, K. Choi, A scalable processing-in-memory accelerator for parallel graph processing, in ISCA (2015)
3.
Zurück zum Zitat J. Ahn, S. Yoo, O. Mutlu, K. Choi, PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture, in ISCA (2015) J. Ahn, S. Yoo, O. Mutlu, K. Choi, PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture, in ISCA (2015)
4.
Zurück zum Zitat B. Akin, F. Franchetti, J.C. Hoe, Data reorganization in memory using 3D-stacked DRAM, in ISCA (2015) B. Akin, F. Franchetti, J.C. Hoe, Data reorganization in memory using 3D-stacked DRAM, in ISCA (2015)
5.
Zurück zum Zitat C. Alkan et al., Personalized copy number and segmental duplication maps using next-generation sequencing. Nat. Genet. 41, 1061 (2009) C. Alkan et al., Personalized copy number and segmental duplication maps using next-generation sequencing. Nat. Genet. 41, 1061 (2009)
6.
Zurück zum Zitat M. Alser, H. Hassan, H. Xin, O. Ergin, O. Mutlu, C. Alkan, GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping. Bioinformatics 33, 3355–3363 (2017) M. Alser, H. Hassan, H. Xin, O. Ergin, O. Mutlu, C. Alkan, GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping. Bioinformatics 33, 3355–3363 (2017)
9.
Zurück zum Zitat H. Asghari-Moghaddam, Y.H. Son, J.H. Ahn, N.S. Kim, Chameleon: versatile and practical near-DRAM acceleration architecture for large memory systems, in MICRO (2016) H. Asghari-Moghaddam, Y.H. Son, J.H. Ahn, N.S. Kim, Chameleon: versatile and practical near-DRAM acceleration architecture for large memory systems, in MICRO (2016)
10.
Zurück zum Zitat R. Ausavarungnirun, J. Landgraf, V. Miller, S. Ghose, J. Gandhi, C.J. Rossbach, O. Mutlu, Mosaic: a GPU memory manager with application-transparent support for multiple page sizes, in MICRO (2017) R. Ausavarungnirun, J. Landgraf, V. Miller, S. Ghose, J. Gandhi, C.J. Rossbach, O. Mutlu, Mosaic: a GPU memory manager with application-transparent support for multiple page sizes, in MICRO (2017)
11.
Zurück zum Zitat R. Ausavarungnirun, V. Miller, J. Landgraf, S. Ghose, J. Gandhi, A. Jog, C.J. Rossbach, O. Mutlu, MASK: redesigning the GPU memory hierarchy to support multi-application concurrency, in ASPLOS (2018) R. Ausavarungnirun, V. Miller, J. Landgraf, S. Ghose, J. Gandhi, A. Jog, C.J. Rossbach, O. Mutlu, MASK: redesigning the GPU memory hierarchy to support multi-application concurrency, in ASPLOS (2018)
12.
Zurück zum Zitat O.O. Babarinsa, S. Idreos, JAFAR: near-data processing for databases, in SIGMOD (2015) O.O. Babarinsa, S. Idreos, JAFAR: near-data processing for databases, in SIGMOD (2015)
13.
Zurück zum Zitat A. Basu, J. Gandhi, J. Chang, M.D. Hill, M.M. Swift, Efficient virtual memory for big memory servers, in ISCA (2013) A. Basu, J. Gandhi, J. Chang, M.D. Hill, M.M. Swift, Efficient virtual memory for big memory servers, in ISCA (2013)
14.
Zurück zum Zitat A. Bensoussan, C.T. Clingen, R.C. Daley, The Multics virtual memory: concepts and design, in CACM (1972) A. Bensoussan, C.T. Clingen, R.C. Daley, The Multics virtual memory: concepts and design, in CACM (1972)
15.
Zurück zum Zitat A. Bhattacharjee, Large-reach memory management unit caches, in MICRO (2013) A. Bhattacharjee, Large-reach memory management unit caches, in MICRO (2013)
16.
Zurück zum Zitat A. Bhattacharjee, M. Martonosi, Inter-core cooperative TLB for chip multiprocessors, in ASPLOS (2010) A. Bhattacharjee, M. Martonosi, Inter-core cooperative TLB for chip multiprocessors, in ASPLOS (2010)
17.
Zurück zum Zitat A. Bhattacharjee, D. Lustig, M. Martonosi, Shared last-level TLBs for chip multiprocessors, in HPCA (2011) A. Bhattacharjee, D. Lustig, M. Martonosi, Shared last-level TLBs for chip multiprocessors, in HPCA (2011)
18.
Zurück zum Zitat N. Binkert, B. Beckman, A. Saidi, G. Black, A. Basu, The gem5 Simulator, in CAN (2011) N. Binkert, B. Beckman, A. Saidi, G. Black, A. Basu, The gem5 Simulator, in CAN (2011)
19.
Zurück zum Zitat B.H. Bloom, Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 422–426 (1970) B.H. Bloom, Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 422–426 (1970)
20.
Zurück zum Zitat A. Boroumand, S. Ghose, M. Patel, H. Hassan, B. Lucia, K. Hsieh, K.T. Malladi, H. Zheng, O. Mutlu, LazyPIM: an efficient cache coherence mechanism for processing-in-memory, in CAL (2016) A. Boroumand, S. Ghose, M. Patel, H. Hassan, B. Lucia, K. Hsieh, K.T. Malladi, H. Zheng, O. Mutlu, LazyPIM: an efficient cache coherence mechanism for processing-in-memory, in CAL (2016)
21.
Zurück zum Zitat A. Boroumand, S. Ghose, M. Patel, H. Hassan, B. Lucia, N. Hajinazar, K. Hsieh, K.T. Malladi, H. Zheng, O. Mutlu, LazyPIM: efficient support for cache coherence in processing-in-memory architectures (2017). arXiv:1706.03162 [cs:AR] A. Boroumand, S. Ghose, M. Patel, H. Hassan, B. Lucia, N. Hajinazar, K. Hsieh, K.T. Malladi, H. Zheng, O. Mutlu, LazyPIM: efficient support for cache coherence in processing-in-memory architectures (2017). arXiv:1706.03162 [cs:AR]
22.
Zurück zum Zitat A. Boroumand, S. Ghose, Y. Kim, R. Ausavarungnirun, E. Shiu, R. Thakur, D. Kim, A. Kuusela, A. Knies, P. Ranganathan, O. Mutlu, Google workloads for consumer devices: mitigating data movement bottlenecks, in ASPLOS (2018) A. Boroumand, S. Ghose, Y. Kim, R. Ausavarungnirun, E. Shiu, R. Thakur, D. Kim, A. Kuusela, A. Knies, P. Ranganathan, O. Mutlu, Google workloads for consumer devices: mitigating data movement bottlenecks, in ASPLOS (2018)
23.
Zurück zum Zitat L.M. Censier, P. Feutrier, A new solution to coherence problems in multicache systems, in IEEE TC (1978) L.M. Censier, P. Feutrier, A new solution to coherence problems in multicache systems, in IEEE TC (1978)
24.
Zurück zum Zitat L. Ceze, J. Tuck, P. Montesinos, J. Torrellas, BulkSC: bulk enforcement of sequential consistency, in ISCA (2007) L. Ceze, J. Tuck, P. Montesinos, J. Torrellas, BulkSC: bulk enforcement of sequential consistency, in ISCA (2007)
25.
Zurück zum Zitat K.K. Chang, D. Lee, Z. Chishti, A.R. Alameldeen, C. Wilkerson, Y. Kim, O. Mutlu, Improving DRAM performance by parallelizing refreshes with accesses, in HPCA (2014) K.K. Chang, D. Lee, Z. Chishti, A.R. Alameldeen, C. Wilkerson, Y. Kim, O. Mutlu, Improving DRAM performance by parallelizing refreshes with accesses, in HPCA (2014)
26.
Zurück zum Zitat K.K. Chang, A. Kashyap, H. Hassan, S. Ghose, K. Hsieh, D. Lee, T. Li, G. Pekhimenko, S. Khan, O. Mutlu, Understanding latency variation in modern DRAM chips: experimental characterization, analysis, and optimization, in SIGMETRICS (2016) K.K. Chang, A. Kashyap, H. Hassan, S. Ghose, K. Hsieh, D. Lee, T. Li, G. Pekhimenko, S. Khan, O. Mutlu, Understanding latency variation in modern DRAM chips: experimental characterization, analysis, and optimization, in SIGMETRICS (2016)
27.
Zurück zum Zitat K.K. Chang, P.J. Nair, D. Lee, S. Ghose, M.K. Qureshi, O. Mutlu, Low-cost inter-linked subarrays (LISA): enabling fast inter-subarray data movement in DRAM, in HPCA (2016) K.K. Chang, P.J. Nair, D. Lee, S. Ghose, M.K. Qureshi, O. Mutlu, Low-cost inter-linked subarrays (LISA): enabling fast inter-subarray data movement in DRAM, in HPCA (2016)
28.
Zurück zum Zitat K.K. Chang, Understanding and improving the latency of DRAM-based memory systems. Ph.D. dissertation, Carnegie Mellon University, 2017 K.K. Chang, Understanding and improving the latency of DRAM-based memory systems. Ph.D. dissertation, Carnegie Mellon University, 2017
29.
Zurück zum Zitat K.K. Chang, A.G. Yağlıkçı, S. Ghose, A. Agrawal, N. Chatterjee, A. Kashyap, D. Lee, M. O’Connor, H. Hassan, O. Mutlu, Understanding reduced-voltage operation in modern DRAM devices: experimental characterization, analysis, and mechanisms, in SIGMETRICS (2017) K.K. Chang, A.G. Yağlıkçı, S. Ghose, A. Agrawal, N. Chatterjee, A. Kashyap, D. Lee, M. O’Connor, H. Hassan, O. Mutlu, Understanding reduced-voltage operation in modern DRAM devices: experimental characterization, analysis, and mechanisms, in SIGMETRICS (2017)
30.
Zurück zum Zitat P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, Y. Xie, PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory, in ISCA (2016) P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, Y. Xie, PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory, in ISCA (2016)
31.
Zurück zum Zitat L. Chua, Memristor—the missing circuit element, in IEEE TCT (1971) L. Chua, Memristor—the missing circuit element, in IEEE TCT (1971)
32.
Zurück zum Zitat E.S. Chung, J.D. Davis, J. Lee, LINQits: big data on little clients, in ISCA (2013) E.S. Chung, J.D. Davis, J. Lee, LINQits: big data on little clients, in ISCA (2013)
33.
Zurück zum Zitat J.D. Collins, H. Wang, D.M. Tullsen, C.J. Hughes, Y. Lee, D.M. Lavery, J.P. Shen, Speculative precomputation: long-range prefetching of delinquent loads, in ISCA (2001) J.D. Collins, H. Wang, D.M. Tullsen, C.J. Hughes, Y. Lee, D.M. Lavery, J.P. Shen, Speculative precomputation: long-range prefetching of delinquent loads, in ISCA (2001)
34.
Zurück zum Zitat J.D. Collins, S. Sair, B. Calder, D.M. Tullsen, Pointer cache assisted prefetching, in MICRO (2002) J.D. Collins, S. Sair, B. Calder, D.M. Tullsen, Pointer cache assisted prefetching, in MICRO (2002)
35.
Zurück zum Zitat R. Cooksey, S. Jourdan, D. Grunwald, A stateless, content-directed data prefetching mechanism, in ASPLOS (2002) R. Cooksey, S. Jourdan, D. Grunwald, A stateless, content-directed data prefetching mechanism, in ASPLOS (2002)
36.
Zurück zum Zitat N.C. Crago, S.J. Patel, OUTRIDER: efficient memory latency tolerance with decoupled strands, in ISCA (2011) N.C. Crago, S.J. Patel, OUTRIDER: efficient memory latency tolerance with decoupled strands, in ISCA (2011)
37.
Zurück zum Zitat J. Dean, L.A. Barroso, The tail at scale, in CACM (2013) J. Dean, L.A. Barroso, The tail at scale, in CACM (2013)
38.
Zurück zum Zitat J. Devietti, B. Lucia, L. Ceze, M. Oskin, DMP: deterministic shared memory multiprocessing, in ASPLOS (2009) J. Devietti, B. Lucia, L. Ceze, M. Oskin, DMP: deterministic shared memory multiprocessing, in ASPLOS (2009)
39.
Zurück zum Zitat J. Draper, J. Chame, M. Hall, C. Steele, T. Barrett, J. LaCoss, J. Granacki, J. Shin, C. Chen, C.W. Kang, I. Kim, G. Daglikoca, The architecture of the DIVA processing-in-memory chip, in SC (2002) J. Draper, J. Chame, M. Hall, C. Steele, T. Barrett, J. LaCoss, J. Granacki, J. Shin, C. Chen, C.W. Kang, I. Kim, G. Daglikoca, The architecture of the DIVA processing-in-memory chip, in SC (2002)
40.
Zurück zum Zitat E. Ebrahimi, O. Mutlu, Y. Patt, Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems, in HPCA (2009) E. Ebrahimi, O. Mutlu, Y. Patt, Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems, in HPCA (2009)
41.
Zurück zum Zitat E. Ebrahimi, O. Mutlu, C.J. Lee, Y.N. Patt, Coordinated control of multiple prefetchers in multi-core systems, in MICRO (2009) E. Ebrahimi, O. Mutlu, C.J. Lee, Y.N. Patt, Coordinated control of multiple prefetchers in multi-core systems, in MICRO (2009)
42.
Zurück zum Zitat E. Ebrahimi, C.J. Lee, O. Mutlu, Y.N. Patt, Prefetch-aware shared resource management for multi-core systems, in ISCA (2011) E. Ebrahimi, C.J. Lee, O. Mutlu, Y.N. Patt, Prefetch-aware shared resource management for multi-core systems, in ISCA (2011)
43.
Zurück zum Zitat Y. Eckert, N. Jayasena, G.H. Loh, Thermal feasibility of die-stacked processing in memory, in WoNDP (2014) Y. Eckert, N. Jayasena, G.H. Loh, Thermal feasibility of die-stacked processing in memory, in WoNDP (2014)
44.
Zurück zum Zitat D.G. Elliott, W.M. Snelgrove, M. Stumm, Computational RAM: a memory-SIMD hybrid and its application to DSP, in CICC (1992) D.G. Elliott, W.M. Snelgrove, M. Stumm, Computational RAM: a memory-SIMD hybrid and its application to DSP, in CICC (1992)
45.
Zurück zum Zitat D. Elliott, M. Stumm, W.M. Snelgrove, C. Cojocaru, R. McKenzie, Computational RAM: implementing processors in memory, in IEEE Design & Test (1999) D. Elliott, M. Stumm, W.M. Snelgrove, C. Cojocaru, R. McKenzie, Computational RAM: implementing processors in memory, in IEEE Design & Test (1999)
46.
Zurück zum Zitat R. Elmasri, Fundamentals of Database Systems (Pearson, Boston, 2007) R. Elmasri, Fundamentals of Database Systems (Pearson, Boston, 2007)
47.
Zurück zum Zitat A. Farmahini-Farahani, J.H. Ahn, K. Morrow, N.S. Kim, NDA: near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules, in HPCA (2015) A. Farmahini-Farahani, J.H. Ahn, K. Morrow, N.S. Kim, NDA: near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules, in HPCA (2015)
48.
Zurück zum Zitat M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A.D. Popescu, A. Ailamaki, B. Falsafi, Clearing the clouds: a study of emerging scale-out workloads on modern hardware, in ASPLOS (2012) M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A.D. Popescu, A. Ailamaki, B. Falsafi, Clearing the clouds: a study of emerging scale-out workloads on modern hardware, in ASPLOS (2012)
49.
Zurück zum Zitat M. Filippo, Technology preview: ARM next generation processing, in ARM TechCon (2012) M. Filippo, Technology preview: ARM next generation processing, in ARM TechCon (2012)
50.
Zurück zum Zitat B. Fitzpatrick, Distributed caching with memcached. Linux J. 2004, 5 (2004) B. Fitzpatrick, Distributed caching with memcached. Linux J. 2004, 5 (2004)
51.
Zurück zum Zitat M. Gao, C. Kozyrakis, HRL: efficient and flexible reconfigurable logic for near-data processing, in HPCA (2016) M. Gao, C. Kozyrakis, HRL: efficient and flexible reconfigurable logic for near-data processing, in HPCA (2016)
52.
Zurück zum Zitat M. Gao, G. Ayers, C. Kozyrakis, Practical near-data processing for in-memory analytics frameworks, in PACT (2015) M. Gao, G. Ayers, C. Kozyrakis, Practical near-data processing for in-memory analytics frameworks, in PACT (2015)
53.
Zurück zum Zitat S. Ghemawat, H. Gobioff, S.-T. Leung, The Google file system, in SOSP (2003) S. Ghemawat, H. Gobioff, S.-T. Leung, The Google file system, in SOSP (2003)
54.
Zurück zum Zitat D. Giampaolo, Practical File System Design with the BE File System (Morgan Kaufmann Publishers Inc., San Francisco, 1998) D. Giampaolo, Practical File System Design with the BE File System (Morgan Kaufmann Publishers Inc., San Francisco, 1998)
55.
Zurück zum Zitat A. Glew, MLP yes! ILP no!, in ASPLOS WACI (1998) A. Glew, MLP yes! ILP no!, in ASPLOS WACI (1998)
56.
Zurück zum Zitat M. Gokhale, B. Holmes, K. Iobst, Processing in memory: the Terasys massively parallel PIM array. IEEE Comput. 28, 23–31 (1995) M. Gokhale, B. Holmes, K. Iobst, Processing in memory: the Terasys massively parallel PIM array. IEEE Comput. 28, 23–31 (1995)
57.
Zurück zum Zitat J.R. Goodman, Using cache memory to reduce processor-memory traffic, in ISCA (1983) J.R. Goodman, Using cache memory to reduce processor-memory traffic, in ISCA (1983)
58.
Zurück zum Zitat B. Gu, A.S. Yoon, D.-H. Bae, I. Jo, J. Lee, J. Yoon, J.-U. Kang, M. Kwon, C. Yoon, S. Cho, J. Jeong, D. Chang, Biscuit: a framework for near-data processing of big data workloads, in ISCA (2016) B. Gu, A.S. Yoon, D.-H. Bae, I. Jo, J. Lee, J. Yoon, J.-U. Kang, M. Kwon, C. Yoon, S. Cho, J. Jeong, D. Chang, Biscuit: a framework for near-data processing of big data workloads, in ISCA (2016)
59.
Zurück zum Zitat Q. Guo, N. Alachiotis, B. Akin, F. Sadi, G. Xu, T.M. Low, L. Pileggi, J.C. Hoe, F. Franchetti, 3D-stacked memory-side acceleration: accelerator and system design, in WoNDP (2014) Q. Guo, N. Alachiotis, B. Akin, F. Sadi, G. Xu, T.M. Low, L. Pileggi, J.C. Hoe, F. Franchetti, 3D-stacked memory-side acceleration: accelerator and system design, in WoNDP (2014)
60.
Zurück zum Zitat A. Gutierrez, J. Pusdesris, R.G. Dreslinski, T. Mudge, C. Sudanthi, C.D. Emmons, M. Hayenga, N. Paver, Sources of error in full-system simulation, in ISPASS (2014) A. Gutierrez, J. Pusdesris, R.G. Dreslinski, T. Mudge, C. Sudanthi, C.D. Emmons, M. Hayenga, N. Paver, Sources of error in full-system simulation, in ISPASS (2014)
61.
Zurück zum Zitat L. Hammond, V. Wong, M. Chen, B.D. Carlstrom, J.D. Davis, B. Hertzberg, M.K. Prabhu, H. Wijaya, C. Kozyrakis, K. Olukotun, Transactional memory coherence and consistency, in ISCA (2004) L. Hammond, V. Wong, M. Chen, B.D. Carlstrom, J.D. Davis, B. Hertzberg, M.K. Prabhu, H. Wijaya, C. Kozyrakis, K. Olukotun, Transactional memory coherence and consistency, in ISCA (2004)
62.
Zurück zum Zitat M. Hashemi, O. Mutlu, Y.N. Patt, Continuous runahead: transparent hardware acceleration for memory intensive workloads, in MICRO (2016) M. Hashemi, O. Mutlu, Y.N. Patt, Continuous runahead: transparent hardware acceleration for memory intensive workloads, in MICRO (2016)
63.
Zurück zum Zitat M. Hashemi, Khubaib, E. Ebrahimi, O. Mutlu, Y.N. Patt, Accelerating dependent cache misses with an enhanced memory controller, in ISCA (2016) M. Hashemi, Khubaib, E. Ebrahimi, O. Mutlu, Y.N. Patt, Accelerating dependent cache misses with an enhanced memory controller, in ISCA (2016)
64.
Zurück zum Zitat S.M. Hassan, S. Yalamanchili, S. Mukhopadhyay, Near data processing: impact and optimization of 3D memory system architecture on the uncore, in MEMSYS (2015) S.M. Hassan, S. Yalamanchili, S. Mukhopadhyay, Near data processing: impact and optimization of 3D memory system architecture on the uncore, in MEMSYS (2015)
65.
Zurück zum Zitat H. Hassan, G. Pekhimenko, N. Vijaykumar, V. Seshadri, D. Lee, O. Ergin, O. Mutlu, ChargeCache: reducing DRAM latency by exploiting row access locality, in HPCA (2016) H. Hassan, G. Pekhimenko, N. Vijaykumar, V. Seshadri, D. Lee, O. Ergin, O. Mutlu, ChargeCache: reducing DRAM latency by exploiting row access locality, in HPCA (2016)
66.
Zurück zum Zitat H. Hassan, N. Vijaykumar, S. Khan, S. Ghose, K. Chang, G. Pekhimenko, D. Lee, O. Ergin, O. Mutlu, SoftMC: a flexible and practical open-source infrastructure for enabling experimental DRAM studies, in HPCA (2017) H. Hassan, N. Vijaykumar, S. Khan, S. Ghose, K. Chang, G. Pekhimenko, D. Lee, O. Ergin, O. Mutlu, SoftMC: a flexible and practical open-source infrastructure for enabling experimental DRAM studies, in HPCA (2017)
67.
Zurück zum Zitat K. Hsieh, S. Khan, N. Vijaykumar, K.K. Chang, A. Boroumand, S. Ghose, O. Mutlu, Accelerating pointer chasing in 3D-stacked memory: challenges, mechanisms, evaluation, in ICCD (2016) K. Hsieh, S. Khan, N. Vijaykumar, K.K. Chang, A. Boroumand, S. Ghose, O. Mutlu, Accelerating pointer chasing in 3D-stacked memory: challenges, mechanisms, evaluation, in ICCD (2016)
68.
Zurück zum Zitat K. Hsieh, E. Ebrahimi, G. Kim, N. Chatterjee, M. O’Conner, N. Vijaykumar, O. Mutlu, S. Keckler, Transparent offloading and mapping (TOM): enabling programmer-transparent near-data processing in GPU systems, in ISCA (2016) K. Hsieh, E. Ebrahimi, G. Kim, N. Chatterjee, M. O’Conner, N. Vijaykumar, O. Mutlu, S. Keckler, Transparent offloading and mapping (TOM): enabling programmer-transparent near-data processing in GPU systems, in ISCA (2016)
69.
Zurück zum Zitat Z. Hu, M. Martonosi, S. Kaxiras, TCP: tag correlating prefetchers, in HPCA (2003) Z. Hu, M. Martonosi, S. Kaxiras, TCP: tag correlating prefetchers, in HPCA (2003)
70.
Zurück zum Zitat C.J. Hughes, S.V. Adve, Memory-side prefetching for linked data structures for processor-in-memory systems, in JPDC (2005) C.J. Hughes, S.V. Adve, Memory-side prefetching for linked data structures for processor-in-memory systems, in JPDC (2005)
71.
Zurück zum Zitat Hybrid Memory Cube Consortium, HMC Specification 1.1 (2013) Hybrid Memory Cube Consortium, HMC Specification 1.1 (2013)
72.
Zurück zum Zitat Hybrid Memory Cube Consortium, HMC Specification 2.0 (2014) Hybrid Memory Cube Consortium, HMC Specification 2.0 (2014)
73.
Zurück zum Zitat Intel, Intel Xeon Processor W3550 (2009) Intel, Intel Xeon Processor W3550 (2009)
74.
Zurück zum Zitat J. Jeddeloh, B. Keeth, Hybrid memory cube: new DRAM architecture increases density and performance, in VLSIT (2012) J. Jeddeloh, B. Keeth, Hybrid memory cube: new DRAM architecture increases density and performance, in VLSIT (2012)
75.
Zurück zum Zitat JEDEC, High bandwidth memory (HBM) DRAM, Standard No. JESD235 (2013) JEDEC, High bandwidth memory (HBM) DRAM, Standard No. JESD235 (2013)
76.
Zurück zum Zitat J. Joao, O. Mutlu, Y.N. Patt, Flexible reference-counting-based hardware acceleration for garbage collection, in ISCA (2009) J. Joao, O. Mutlu, Y.N. Patt, Flexible reference-counting-based hardware acceleration for garbage collection, in ISCA (2009)
77.
Zurück zum Zitat R. Jones, R. Lins, Garbage Collection: Algorithms for Automatic Dynamic Memory Management (Wiley, New York, 1996) R. Jones, R. Lins, Garbage Collection: Algorithms for Automatic Dynamic Memory Management (Wiley, New York, 1996)
78.
Zurück zum Zitat D. Joseph, D. Grunwald, Prefetching using Markov predictors, in ISCA (1997) D. Joseph, D. Grunwald, Prefetching using Markov predictors, in ISCA (1997)
79.
Zurück zum Zitat S. Kanev, J.P. Darago, K. Hazelwood, P. Ranganathan, T. Moseley, G.-Y. Wei, D. Brooks, Profiling a warehouse-scale computer, in ISCA (2015) S. Kanev, J.P. Darago, K. Hazelwood, P. Ranganathan, T. Moseley, G.-Y. Wei, D. Brooks, Profiling a warehouse-scale computer, in ISCA (2015)
80.
Zurück zum Zitat Y. Kang, W. Huang, S.-M. Yoo, D. Keen, Z. Ge, V. Lam, P. Pattnaik, J. Torrellas, FlexRAM: toward an advanced intelligent memory system, in ICCD (1999) Y. Kang, W. Huang, S.-M. Yoo, D. Keen, Z. Ge, V. Lam, P. Pattnaik, J. Torrellas, FlexRAM: toward an advanced intelligent memory system, in ICCD (1999)
81.
Zurück zum Zitat M. Kang, M.-S. Keel, N.R. Shanbhag, S. Eilert, K. Curewitz, An energy-efficient VLSI architecture for pattern recognition via deep embedding of computation in SRAM, in ICASSP (2014) M. Kang, M.-S. Keel, N.R. Shanbhag, S. Eilert, K. Curewitz, An energy-efficient VLSI architecture for pattern recognition via deep embedding of computation in SRAM, in ICASSP (2014)
82.
Zurück zum Zitat U. Kang, H.-S. Yu, C. Park, H. Zheng, J. Halbert, K. Bains, S. Jang, J. Choi, Co-architecting controllers and DRAM to enhance DRAM process scaling, in The Memory Forum (2014) U. Kang, H.-S. Yu, C. Park, H. Zheng, J. Halbert, K. Bains, S. Jang, J. Choi, Co-architecting controllers and DRAM to enhance DRAM process scaling, in The Memory Forum (2014)
83.
Zurück zum Zitat M. Karlsson, F. Dahlgren, P. Stenström, A prefetching technique for irregular accesses to linked data structures, in HPCA (2000) M. Karlsson, F. Dahlgren, P. Stenström, A prefetching technique for irregular accesses to linked data structures, in HPCA (2000)
84.
Zurück zum Zitat S. Khan, D. Lee, Y. Kim, A.R. Alameldeen, C. Wilkerson, O. Mutlu, The efficacy of error mitigation techniques for DRAM retention failures: a comparative experimental study, in SIGMETRICS (2014) S. Khan, D. Lee, Y. Kim, A.R. Alameldeen, C. Wilkerson, O. Mutlu, The efficacy of error mitigation techniques for DRAM retention failures: a comparative experimental study, in SIGMETRICS (2014)
85.
Zurück zum Zitat S. Khan, D. Lee, O. Mutlu, PARBOR: an efficient system-level technique to detect data dependent failures in DRAM, in DSN (2016) S. Khan, D. Lee, O. Mutlu, PARBOR: an efficient system-level technique to detect data dependent failures in DRAM, in DSN (2016)
86.
Zurück zum Zitat S. Khan, C. Wilkerson, D. Lee, A.R. Alameldeen, O. Mutlu, A case for memory content-based detection and mitigation of data-dependent failures in DRAM, in CAL (2016) S. Khan, C. Wilkerson, D. Lee, A.R. Alameldeen, O. Mutlu, A case for memory content-based detection and mitigation of data-dependent failures in DRAM, in CAL (2016)
87.
Zurück zum Zitat S. Khan, C. Wilkerson, Z. Wang, A. Alameldeen, D. Lee, O. Mutlu, Detecting and mitigating data-dependent DRAM failures by exploiting current memory content, in MICRO (2017) S. Khan, C. Wilkerson, Z. Wang, A. Alameldeen, D. Lee, O. Mutlu, Detecting and mitigating data-dependent DRAM failures by exploiting current memory content, in MICRO (2017)
88.
Zurück zum Zitat T. Kilburn, D.B.G. Edwards, M.J. Lanigan, F.H. Sumner, One-level storage system. IRE Trans. Electron Comput. 2, 223–235 (1962) T. Kilburn, D.B.G. Edwards, M.J. Lanigan, F.H. Sumner, One-level storage system. IRE Trans. Electron Comput. 2, 223–235 (1962)
89.
Zurück zum Zitat Y. Kim, Architectural techniques to enhance DRAM scaling. Ph.D. dissertation, Carnegie Mellon University, 2015 Y. Kim, Architectural techniques to enhance DRAM scaling. Ph.D. dissertation, Carnegie Mellon University, 2015
90.
Zurück zum Zitat Y. Kim, V. Seshadri, D. Lee, J. Liu, O. Mutlu, A case for exploiting subarray-level parallelism (SALP) in DRAM, in ISCA (2012) Y. Kim, V. Seshadri, D. Lee, J. Liu, O. Mutlu, A case for exploiting subarray-level parallelism (SALP) in DRAM, in ISCA (2012)
91.
Zurück zum Zitat Y. Kim, R. Daly, J. Kim, C. Fallin, J.H. Lee, D. Lee, C. Wilkerson, K. Lai, O. Mutlu, Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors, in ISCA (2014) Y. Kim, R. Daly, J. Kim, C. Fallin, J.H. Lee, D. Lee, C. Wilkerson, K. Lai, O. Mutlu, Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors, in ISCA (2014)
92.
Zurück zum Zitat Y. Kim, W. Yang, O. Mutlu, Ramulator: a fast and extensible DRAM simulator, in CAL (2015) Y. Kim, W. Yang, O. Mutlu, Ramulator: a fast and extensible DRAM simulator, in CAL (2015)
93.
Zurück zum Zitat D. Kim, J. Kung, S. Chai, S. Yalamanchili, S. Mukhopadhyay, Neurocube: a programmable digital neuromorphic architecture with high-density 3D memory, in ISCA (2016) D. Kim, J. Kung, S. Chai, S. Yalamanchili, S. Mukhopadhyay, Neurocube: a programmable digital neuromorphic architecture with high-density 3D memory, in ISCA (2016)
94.
Zurück zum Zitat Y. Kim, R. Daly, J. Kim, C. Fallin, J.H. Lee, D. Lee, C. Wilkerson, K. Lai, O. Mutlu, RowHammer: reliability analysis and security implications (2016). arXiv:1603.00747 [cs:AR] Y. Kim, R. Daly, J. Kim, C. Fallin, J.H. Lee, D. Lee, C. Wilkerson, K. Lai, O. Mutlu, RowHammer: reliability analysis and security implications (2016). arXiv:1603.00747 [cs:AR]
95.
Zurück zum Zitat G. Kim, N. Chatterjee, M. O’Connor, K. Hsieh, Toward standardized near-data processing with unrestricted data placement for GPUs, in SC (2017) G. Kim, N. Chatterjee, M. O’Connor, K. Hsieh, Toward standardized near-data processing with unrestricted data placement for GPUs, in SC (2017)
96.
Zurück zum Zitat J.S. Kim, D. Senol, H. Xin, D. Lee, S. Ghose, M. Alser, H. Hassan, O. Ergin, C. Alkan, O. Mutlu, GRIM-Filter: fast seed filtering in read mapping using emerging memory technologies. arXiv:1708.04329 [q-bio.GN] (2017) J.S. Kim, D. Senol, H. Xin, D. Lee, S. Ghose, M. Alser, H. Hassan, O. Ergin, C. Alkan, O. Mutlu, GRIM-Filter: fast seed filtering in read mapping using emerging memory technologies. arXiv:1708.04329 [q-bio.GN] (2017)
97.
Zurück zum Zitat J. Kim, M. Patel, H. Hassan, O. Mutlu, The DRAM latency PUF: quickly evaluating physical unclonable functions by exploiting the latency–reliability tradeoff in modern DRAM devices, in HPCA (2018) J. Kim, M. Patel, H. Hassan, O. Mutlu, The DRAM latency PUF: quickly evaluating physical unclonable functions by exploiting the latency–reliability tradeoff in modern DRAM devices, in HPCA (2018)
98.
Zurück zum Zitat J.S. Kim, D. Senol, H. Xin, D. Lee, S. Ghose, M. Alser, H. Hassan, O. Ergin, C. Alkan, O. Mutlu, GRIM-Filter: fast seed location filtering in DNA read mapping using processing-in-memory technologies, in BMC Genomics (2018) J.S. Kim, D. Senol, H. Xin, D. Lee, S. Ghose, M. Alser, H. Hassan, O. Ergin, C. Alkan, O. Mutlu, GRIM-Filter: fast seed location filtering in DNA read mapping using processing-in-memory technologies, in BMC Genomics (2018)
99.
Zurück zum Zitat Y.O. Koçberber, B. Grot, J. Picorel, B. Falsafi, K.T. Lim, P. Ranganathan, Meet the walkers: accelerating index traversals for in-memory databases, in MICRO (2013) Y.O. Koçberber, B. Grot, J. Picorel, B. Falsafi, K.T. Lim, P. Ranganathan, Meet the walkers: accelerating index traversals for in-memory databases, in MICRO (2013)
100.
Zurück zum Zitat P.M. Kogge, EXECUBE-a new architecture for scaleable MPPs, in ICPP (1994) P.M. Kogge, EXECUBE-a new architecture for scaleable MPPs, in ICPP (1994)
101.
Zurück zum Zitat E. Kültürsay, M. Kandemir, A. Sivasubramaniam, O. Mutlu, Evaluating STT-RAM as an energy-efficient main memory alternative, in ISPASS (2013) E. Kültürsay, M. Kandemir, A. Sivasubramaniam, O. Mutlu, Evaluating STT-RAM as an energy-efficient main memory alternative, in ISPASS (2013)
102.
Zurück zum Zitat L. Kurian, P.T. Hulina, L.D. Coraor, Memory latency effects in decoupled architectures with a single data memory module, in ISCA (1992) L. Kurian, P.T. Hulina, L.D. Coraor, Memory latency effects in decoupled architectures with a single data memory module, in ISCA (1992)
103.
Zurück zum Zitat S. Kvatinsky, A. Kolodny, U.C. Weiser, E.G. Friedman, Memristor-based IMPLY logic design procedure, in ICCD (2011) S. Kvatinsky, A. Kolodny, U.C. Weiser, E.G. Friedman, Memristor-based IMPLY logic design procedure, in ICCD (2011)
104.
Zurück zum Zitat S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E.G. Friedman, A. Kolodny, U.C. Weiser, MAGIC—memristor-aided logic, in IEEE TCAS II: Express Briefs (2014) S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E.G. Friedman, A. Kolodny, U.C. Weiser, MAGIC—memristor-aided logic, in IEEE TCAS II: Express Briefs (2014)
105.
Zurück zum Zitat S. Kvatinsky, G. Satat, N. Wald, E.G. Friedman, A. Kolodny, U.C. Weiser, Memristor-based material implication (IMPLY) logic: design principles and methodologies, in TVLSI (2014) S. Kvatinsky, G. Satat, N. Wald, E.G. Friedman, A. Kolodny, U.C. Weiser, Memristor-based material implication (IMPLY) logic: design principles and methodologies, in TVLSI (2014)
106.
Zurück zum Zitat L. Lamport, How to make a multiprocessor computer that correctly executes multiprocess programs, in IEEE TC (1979) L. Lamport, How to make a multiprocessor computer that correctly executes multiprocess programs, in IEEE TC (1979)
107.
Zurück zum Zitat D. Lee, Reducing DRAM latency at low cost by exploiting heterogeneity. Ph.D. dissertation, Carnegie Mellon University, 2016 D. Lee, Reducing DRAM latency at low cost by exploiting heterogeneity. Ph.D. dissertation, Carnegie Mellon University, 2016
108.
Zurück zum Zitat J. Lee, Y. Solihin, J. Torrettas, Automatically mapping code on an intelligent memory architecture, in HPCA (2001) J. Lee, Y. Solihin, J. Torrettas, Automatically mapping code on an intelligent memory architecture, in HPCA (2001)
109.
Zurück zum Zitat C.J. Lee, O. Mutlu, V. Narasiman, Y.N. Patt, Prefetch-aware DRAM controllers, in MICRO (2008) C.J. Lee, O. Mutlu, V. Narasiman, Y.N. Patt, Prefetch-aware DRAM controllers, in MICRO (2008)
110.
Zurück zum Zitat B.C. Lee, E. Ipek, O. Mutlu, D. Burger, Architecting phase change memory as a scalable DRAM alternative, in ISCA (2009) B.C. Lee, E. Ipek, O. Mutlu, D. Burger, Architecting phase change memory as a scalable DRAM alternative, in ISCA (2009)
111.
Zurück zum Zitat B.C. Lee, E. Ipek, O. Mutlu, D. Burger, Phase change memory architecture and the quest for scalability, in CACM (2010) B.C. Lee, E. Ipek, O. Mutlu, D. Burger, Phase change memory architecture and the quest for scalability, in CACM (2010)
112.
Zurück zum Zitat B.C. Lee, P. Zhou, J. Yang, Y. Zhang, B. Zhao, E. Ipek, O. Mutlu, D. Burger, Phase-change technology and the future of main memory, in IEEE Micro (2010) B.C. Lee, P. Zhou, J. Yang, Y. Zhang, B. Zhao, E. Ipek, O. Mutlu, D. Burger, Phase-change technology and the future of main memory, in IEEE Micro (2010)
113.
Zurück zum Zitat C.J. Lee, O. Mutlu, V. Narasiman, Y.N. Patt, Prefetch-aware memory controllers, in IEEE TC (2011) C.J. Lee, O. Mutlu, V. Narasiman, Y.N. Patt, Prefetch-aware memory controllers, in IEEE TC (2011)
114.
Zurück zum Zitat D. Lee, Y. Kim, V. Seshadri, J. Liu, L. Subramanian, O. Mutlu, Tiered-latency DRAM: a low latency and low cost DRAM architecture, in HPCA (2013) D. Lee, Y. Kim, V. Seshadri, J. Liu, L. Subramanian, O. Mutlu, Tiered-latency DRAM: a low latency and low cost DRAM architecture, in HPCA (2013)
115.
Zurück zum Zitat D. Lee, F. Hormozdiari, H. Xin, F. Hach, O. Mutlu, C. Alkan, Fast and accurate mapping of complete genomics reads, in Methods (2014) D. Lee, F. Hormozdiari, H. Xin, F. Hach, O. Mutlu, C. Alkan, Fast and accurate mapping of complete genomics reads, in Methods (2014)
116.
Zurück zum Zitat D. Lee, Y. Kim, G. Pekhimenko, S. Khan, V. Seshadri, K. Chang, O. Mutlu, Adaptive-latency DRAM: optimizing DRAM timing for the common-case, in HPCA (2015) D. Lee, Y. Kim, G. Pekhimenko, S. Khan, V. Seshadri, K. Chang, O. Mutlu, Adaptive-latency DRAM: optimizing DRAM timing for the common-case, in HPCA (2015)
117.
Zurück zum Zitat D. Lee, L. Subramanian, R. Ausavarungnirun, J. Choi, O. Mutlu, Decoupled direct memory access: isolating CPU and IO traffic by leveraging a dual-data-port DRAM, in PACT (2015) D. Lee, L. Subramanian, R. Ausavarungnirun, J. Choi, O. Mutlu, Decoupled direct memory access: isolating CPU and IO traffic by leveraging a dual-data-port DRAM, in PACT (2015)
118.
Zurück zum Zitat J.H. Lee, J. Sim, H. Kim, BSSync: processing near memory for machine learning workloads with bounded staleness consistency models, in PACT (2015) J.H. Lee, J. Sim, H. Kim, BSSync: processing near memory for machine learning workloads with bounded staleness consistency models, in PACT (2015)
119.
Zurück zum Zitat D. Lee, S. Ghose, G. Pekhimenko, S. Khan, O. Mutlu, Simultaneous multi-layer access: improving 3D-stacked memory bandwidth at low cost, in TACO (2016) D. Lee, S. Ghose, G. Pekhimenko, S. Khan, O. Mutlu, Simultaneous multi-layer access: improving 3D-stacked memory bandwidth at low cost, in TACO (2016)
120.
Zurück zum Zitat D. Lee, S. Khan, L. Subramanian, S. Ghose, R. Ausavarungnirun, G. Pekhimenko, V. Seshadri, O. Mutlu, Design-induced latency variation in modern DRAM chips: characterization, analysis, and latency reduction mechanisms, in SIGMETRICS (2017) D. Lee, S. Khan, L. Subramanian, S. Ghose, R. Ausavarungnirun, G. Pekhimenko, V. Seshadri, O. Mutlu, Design-induced latency variation in modern DRAM chips: characterization, analysis, and latency reduction mechanisms, in SIGMETRICS (2017)
121.
Zurück zum Zitat Y. Levy, J. Bruck, Y. Cassuto, E.G. Friedman, A. Kolodny, E. Yaakobi, S. Kvatinsky, Logic operations in memory using a memristive Akers array. Microelectron. J. 45, 1429–1437 (2014) Y. Levy, J. Bruck, Y. Cassuto, E.G. Friedman, A. Kolodny, E. Yaakobi, S. Kvatinsky, Logic operations in memory using a memristive Akers array. Microelectron. J. 45, 1429–1437 (2014)
122.
Zurück zum Zitat S. Li, J.H. Ahn, R.D. Strong, J.B. Brockman, D.M. Tullsen, N.P. Jouppi, The McPAT framework for multicore and manycore architectures: simultaneously modeling power, area, and timing, in TACO (2013) S. Li, J.H. Ahn, R.D. Strong, J.B. Brockman, D.M. Tullsen, N.P. Jouppi, The McPAT framework for multicore and manycore architectures: simultaneously modeling power, area, and timing, in TACO (2013)
123.
Zurück zum Zitat S. Li, C. Xu, Q. Zou, J. Zhao, Y. Lu, Y. Xie, Pinatubo: a processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories, in DAC (2016) S. Li, C. Xu, Q. Zou, J. Zhao, Y. Lu, Y. Xie, Pinatubo: a processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories, in DAC (2016)
124.
Zurück zum Zitat S. Li, D. Niu, K.T. Malladi, H. Zheng, B. Brennan, Y. Xie, DRISA: a DRAM-based reconfigurable in-situ accelerator, in MICRO (2017) S. Li, D. Niu, K.T. Malladi, H. Zheng, B. Brennan, Y. Xie, DRISA: a DRAM-based reconfigurable in-situ accelerator, in MICRO (2017)
125.
Zurück zum Zitat K. Lim, J. Chang, T. Mudge, P. Ranganathan, S.K. Reinhardt, T.F. Wenisch, Disaggregated memory for expansion and sharing in blade servers, in ISCA (2009) K. Lim, J. Chang, T. Mudge, P. Ranganathan, S.K. Reinhardt, T.F. Wenisch, Disaggregated memory for expansion and sharing in blade servers, in ISCA (2009)
126.
Zurück zum Zitat K.T. Lim, D. Meisner, A.G. Saidi, P. Ranganathan, T.F. Wenisch, Thin servers with smart pipes: designing SoC accelerators for memcached, in ISCA (2013) K.T. Lim, D. Meisner, A.G. Saidi, P. Ranganathan, T.F. Wenisch, Thin servers with smart pipes: designing SoC accelerators for memcached, in ISCA (2013)
127.
Zurück zum Zitat Linaro, 64-Bit Linux Kernel for ARM (2014) Linaro, 64-Bit Linux Kernel for ARM (2014)
128.
Zurück zum Zitat M.H. Lipasti, W.J. Schmidt, S.R. Kunkel, R.R. Roediger, SPAID: software prefetching in pointer- and call-intensive environments, in MICRO (1995) M.H. Lipasti, W.J. Schmidt, S.R. Kunkel, R.R. Roediger, SPAID: software prefetching in pointer- and call-intensive environments, in MICRO (1995)
129.
Zurück zum Zitat J. Liu, B. Jaiyen, R. Veras, O. Mutlu, RAIDR: retention-aware intelligent DRAM refresh, in ISCA (2012) J. Liu, B. Jaiyen, R. Veras, O. Mutlu, RAIDR: retention-aware intelligent DRAM refresh, in ISCA (2012)
130.
Zurück zum Zitat J. Liu, B. Jaiyen, Y. Kim, C. Wilkerson, O. Mutlu, An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms, in ISCA (2013) J. Liu, B. Jaiyen, Y. Kim, C. Wilkerson, O. Mutlu, An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms, in ISCA (2013)
131.
Zurück zum Zitat Z. Liu, I. Calciu, M. Harlihy, O. Mutlu, Concurrent data structures for near-memory computing, in SPAA (2017) Z. Liu, I. Calciu, M. Harlihy, O. Mutlu, Concurrent data structures for near-memory computing, in SPAA (2017)
132.
Zurück zum Zitat G.H. Loh, 3D-stacked memory architectures for multi-core processors, in ISCA (2008) G.H. Loh, 3D-stacked memory architectures for multi-core processors, in ISCA (2008)
133.
Zurück zum Zitat G.H. Loh, N. Jayasena, M. Oskin, M. Nutter, D. Roberts, M. Meswani, D.P. Zhang, M. Ignatowski, A processing in memory taxonomy and a case for studying fixed-function PIM, in WoNDP (2013) G.H. Loh, N. Jayasena, M. Oskin, M. Nutter, D. Roberts, M. Meswani, D.P. Zhang, M. Ignatowski, A processing in memory taxonomy and a case for studying fixed-function PIM, in WoNDP (2013)
134.
Zurück zum Zitat P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, Y.O. Koçberber, J. Picorel, A. Adileh, D. Jevdjic, S. Idgunji, E. Özer, B. Falsafi, Scale-out processors, in ISCA (2012) P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, Y.O. Koçberber, J. Picorel, A. Adileh, D. Jevdjic, S. Idgunji, E. Özer, B. Falsafi, Scale-out processors, in ISCA (2012)
135.
Zurück zum Zitat C. Luk, Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors, in ISCA (2001) C. Luk, Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors, in ISCA (2001)
136.
Zurück zum Zitat C. Luk, T.C. Mowry, Compiler-based prefetching for recursive data structures, in ASPLOS (1996) C. Luk, T.C. Mowry, Compiler-based prefetching for recursive data structures, in ASPLOS (1996)
137.
Zurück zum Zitat Y. Luo, S. Govindan, B. Sharma, M. Santaniello, J. Meza, A. Kansal, J. Liu, B. Khessib, K. Vaid, O. Mutlu, Characterizing application memory error vulnerability to optimize datacenter cost via heterogeneous-reliability memory, in DSN (2014) Y. Luo, S. Govindan, B. Sharma, M. Santaniello, J. Meza, A. Kansal, J. Liu, B. Khessib, K. Vaid, O. Mutlu, Characterizing application memory error vulnerability to optimize datacenter cost via heterogeneous-reliability memory, in DSN (2014)
138.
Zurück zum Zitat Y. Luo, S. Ghose, T. Li, S. Govindan, B. Sharma, B. Kelly, A. Boroumand, O. Mutlu, Using ECC DRAM to adaptively increase memory capacity (2017). arXiv:1706.08870 [cs:AR] Y. Luo, S. Ghose, T. Li, S. Govindan, B. Sharma, B. Kelly, A. Boroumand, O. Mutlu, Using ECC DRAM to adaptively increase memory capacity (2017). arXiv:1706.08870 [cs:AR]
139.
Zurück zum Zitat D. Lustig, A. Bhattacharjee, M. Martonosi, TLB improvements for chip multiprocessors: inter-core cooperative prefetchers and shared last-level TLBs, in ACM TACO (2013) D. Lustig, A. Bhattacharjee, M. Martonosi, TLB improvements for chip multiprocessors: inter-core cooperative prefetchers and shared last-level TLBs, in ACM TACO (2013)
140.
Zurück zum Zitat K. Mai, T. Paaske, N. Jayasena, R. Ho, W.J. Dally, M. Horowitz, Smart memories: a modular reconfigurable architecture, in ISCA (2000) K. Mai, T. Paaske, N. Jayasena, R. Ho, W.J. Dally, M. Horowitz, Smart memories: a modular reconfigurable architecture, in ISCA (2000)
141.
Zurück zum Zitat J.A. Mandelman, R.H. Dennard, G.B. Bronner, J.K. DeBrosse, R. Divakaruni, Y. Li, C.J. Radens, Challenges and future directions for the scaling of dynamic random-access memory (DRAM), in IBM JRD (2002) J.A. Mandelman, R.H. Dennard, G.B. Bronner, J.K. DeBrosse, R. Divakaruni, Y. Li, C.J. Radens, Challenges and future directions for the scaling of dynamic random-access memory (DRAM), in IBM JRD (2002)
142.
Zurück zum Zitat Y. Mao, E. Kohler, R.T. Morris, Cache craftiness for fast multicore key-value storage, in EuroSys (2012) Y. Mao, E. Kohler, R.T. Morris, Cache craftiness for fast multicore key-value storage, in EuroSys (2012)
143.
Zurück zum Zitat S.A. McKee, Reflections on the memory wall, in CF (2004) S.A. McKee, Reflections on the memory wall, in CF (2004)
145.
Zurück zum Zitat M.R. Meswani, S. Blagodurov, D. Roberts, J. Slice, M. Ignatowski, G.H. Loh, Heterogeneous memory architectures: a HW/SW approach for mixing die-stacked and off-package memories, in HPCA (2015), pp. 126–136 M.R. Meswani, S. Blagodurov, D. Roberts, J. Slice, M. Ignatowski, G.H. Loh, Heterogeneous memory architectures: a HW/SW approach for mixing die-stacked and off-package memories, in HPCA (2015), pp. 126–136
146.
Zurück zum Zitat J. Meza, Y. Luo, S. Khan, J. Zhao, Y. Xie, O. Mutlu, A case for efficient hardware-software cooperative management of storage and memory, in WEED (2013) J. Meza, Y. Luo, S. Khan, J. Zhao, Y. Xie, O. Mutlu, A case for efficient hardware-software cooperative management of storage and memory, in WEED (2013)
147.
Zurück zum Zitat J. Meza, Q. Wu, S. Kumar, O. Mutlu, Revisiting memory errors in large-scale production data centers: analysis and modeling of new trends from the field, in DSN (2015) J. Meza, Q. Wu, S. Kumar, O. Mutlu, Revisiting memory errors in large-scale production data centers: analysis and modeling of new trends from the field, in DSN (2015)
148.
Zurück zum Zitat N. Mirzadeh, O. Kocberber, B. Falsafi, B. Grot, Sort vs. hash join revisited for near-memory execution, in ASBD (2007) N. Mirzadeh, O. Kocberber, B. Falsafi, B. Grot, Sort vs. hash join revisited for near-memory execution, in ASBD (2007)
149.
Zurück zum Zitat A. Morad, L. Yavits, R. Ginosar, GP-SIMD processing-in-memory, in ACM TACO (2015) A. Morad, L. Yavits, R. Ginosar, GP-SIMD processing-in-memory, in ACM TACO (2015)
150.
Zurück zum Zitat J. Mukundan, H. Hunter, K.H. Kim, J. Stuecheli, J.F. Martínez, Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems, in ISCA (2013) J. Mukundan, H. Hunter, K.H. Kim, J. Stuecheli, J.F. Martínez, Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems, in ISCA (2013)
151.
Zurück zum Zitat O. Mutlu, Memory scaling: a systems architecture perspective, in IMW (2013) O. Mutlu, Memory scaling: a systems architecture perspective, in IMW (2013)
152.
Zurück zum Zitat O. Mutlu, The RowHammer problem and other issues we may face as memory becomes denser, in DATE (2017) O. Mutlu, The RowHammer problem and other issues we may face as memory becomes denser, in DATE (2017)
153.
Zurück zum Zitat O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt, Runahead execution: an alternative to very large instruction windows for out-of-order processors, in HPCA (2003) O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt, Runahead execution: an alternative to very large instruction windows for out-of-order processors, in HPCA (2003)
154.
Zurück zum Zitat O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt, Runahead execution: an effective alternative to large instruction windows, in IEEE Micro (2003) O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt, Runahead execution: an effective alternative to large instruction windows, in IEEE Micro (2003)
155.
Zurück zum Zitat O. Mutlu, H. Kim, Y.N. Patt, Address-value delta (AVD) prediction: increasing the effectiveness of runahead execution by exploiting regular memory allocation patterns, in MICRO (2005) O. Mutlu, H. Kim, Y.N. Patt, Address-value delta (AVD) prediction: increasing the effectiveness of runahead execution by exploiting regular memory allocation patterns, in MICRO (2005)
156.
Zurück zum Zitat O. Mutlu, H. Kim, Y.N. Patt, Techniques for efficient processing in runahead execution engines, in ISCA (2005) O. Mutlu, H. Kim, Y.N. Patt, Techniques for efficient processing in runahead execution engines, in ISCA (2005)
157.
Zurück zum Zitat O. Mutlu, H. Kim, Y.N. Patt, Address-value delta (AVD) prediction: a hardware technique for efficiently parallelizing dependent cache misses, in TC (2006) O. Mutlu, H. Kim, Y.N. Patt, Address-value delta (AVD) prediction: a hardware technique for efficiently parallelizing dependent cache misses, in TC (2006)
158.
Zurück zum Zitat O. Mutlu, H. Kim, Y.N. Patt, Efficient runahead execution: power-efficient memory latency tolerance, in IEEE Micro (2006) O. Mutlu, H. Kim, Y.N. Patt, Efficient runahead execution: power-efficient memory latency tolerance, in IEEE Micro (2006)
159.
Zurück zum Zitat O. Mutlu, T. Moscibroda, Parallelism-aware batch scheduling: enhancing both performance and fairness of shared DRAM systems, in ISCA (2008) O. Mutlu, T. Moscibroda, Parallelism-aware batch scheduling: enhancing both performance and fairness of shared DRAM systems, in ISCA (2008)
160.
Zurück zum Zitat O. Mutlu, L. Subramanian, Research problems and opportunities in memory systems, in SUPERFRI (2014) O. Mutlu, L. Subramanian, Research problems and opportunities in memory systems, in SUPERFRI (2014)
161.
Zurück zum Zitat A. Muzahid, D. Suárez, S. Qi, J. Torrellas, SigRace: signature-based data race detection, in ISCA (2009) A. Muzahid, D. Suárez, S. Qi, J. Torrellas, SigRace: signature-based data race detection, in ISCA (2009)
162.
Zurück zum Zitat H. Naeimi, C. Augustine, A. Raychowdhury, S.-L. Lu, J. Tschanz, STT-RAM scaling and retention failure. Intel Technol. J. 17, 54–75 (2013) H. Naeimi, C. Augustine, A. Raychowdhury, S.-L. Lu, J. Tschanz, STT-RAM scaling and retention failure. Intel Technol. J. 17, 54–75 (2013)
163.
Zurück zum Zitat L. Nai, R. Hadidi, J. Sim, H. Kim, P. Kumar, H. Kim, GraphPIM: enabling instruction-level PIM offloading in graph computing frameworks, in HPCA (2017) L. Nai, R. Hadidi, J. Sim, H. Kim, P. Kumar, H. Kim, GraphPIM: enabling instruction-level PIM offloading in graph computing frameworks, in HPCA (2017)
164.
Zurück zum Zitat B. Naylor, J. Amanatides, W. Thibault, Merging BSP trees yields polyhedral set operations, in SIGGRAPH (1990) B. Naylor, J. Amanatides, W. Thibault, Merging BSP trees yields polyhedral set operations, in SIGGRAPH (1990)
166.
Zurück zum Zitat M. Oskin, F.T. Chong, T. Sherwood, Active pages: a computation model for intelligent memory, in ISCA (1998) M. Oskin, F.T. Chong, T. Sherwood, Active pages: a computation model for intelligent memory, in ISCA (1998)
167.
Zurück zum Zitat M.S. Papamarcos, J.H. Patel, A low-overhead coherence solution for multiprocessors with private. Cache memories, in ISCA (1984) M.S. Papamarcos, J.H. Patel, A low-overhead coherence solution for multiprocessors with private. Cache memories, in ISCA (1984)
168.
Zurück zum Zitat M. Patel, J. Kim, O. Mutlu, The reach profiler (REAPER): enabling the mitigation of DRAM retention failures via profiling at aggressive conditions, in ISCA (2017) M. Patel, J. Kim, O. Mutlu, The reach profiler (REAPER): enabling the mitigation of DRAM retention failures via profiling at aggressive conditions, in ISCA (2017)
169.
Zurück zum Zitat Y.N. Patt, W.-M. Hwu, M. Shebanow, HPS, a new microarchitecture: rationale and introduction, in MICRO (1985) Y.N. Patt, W.-M. Hwu, M. Shebanow, HPS, a new microarchitecture: rationale and introduction, in MICRO (1985)
170.
Zurück zum Zitat Y.N. Patt, S.W. Melvin, W.-M. Hwu, M.C. Shebanow, Critical issues regarding HPS, a high performance microarchitecture, in MICRO, (1985) Y.N. Patt, S.W. Melvin, W.-M. Hwu, M.C. Shebanow, Critical issues regarding HPS, a high performance microarchitecture, in MICRO, (1985)
171.
Zurück zum Zitat D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, K. Yelick, A case for intelligent RAM, in IEEE Micro (1997) D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, K. Yelick, A case for intelligent RAM, in IEEE Micro (1997)
172.
Zurück zum Zitat A. Pattnaik, X. Tang, A. Jog, O. Kayiran, A.K. Mishra, M.T. Kandemir, O. Mutlu, C.R. Das, Scheduling techniques for GPU architectures with processing-in-memory capabilities, in PACT (2016) A. Pattnaik, X. Tang, A. Jog, O. Kayiran, A.K. Mishra, M.T. Kandemir, O. Mutlu, C.R. Das, Scheduling techniques for GPU architectures with processing-in-memory capabilities, in PACT (2016)
173.
Zurück zum Zitat B. Pichai, L. Hsu, A. Bhattacharjee, Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces, in ASPLOS (2014) B. Pichai, L. Hsu, A. Bhattacharjee, Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces, in ASPLOS (2014)
174.
Zurück zum Zitat G. Pokam, C. Pereira, K. Danne, R. Kassa, A.-R. Adl-Tabatabai, Architecting a chunk-based memory race recorder in modern CMPs, in MICRO (2009) G. Pokam, C. Pereira, K. Danne, R. Kassa, A.-R. Adl-Tabatabai, Architecting a chunk-based memory race recorder in modern CMPs, in MICRO (2009)
175.
Zurück zum Zitat J. Power, M.D. Hill, D.A. Wood, Supporting x86-64 address translation for 100s of GPU lanes, in HPCA (2014) J. Power, M.D. Hill, D.A. Wood, Supporting x86-64 address translation for 100s of GPU lanes, in HPCA (2014)
176.
Zurück zum Zitat S.H. Pugsley, J. Jestes, H. Zhang, R. Balasubramonian, V. Srinivasan, A. Buyuktosunoglu, A. Davis, F. Li, NDC: analyzing the impact of 3D-stacked memory+logic devices on mapreduce workloads, in ISPASS (2014) S.H. Pugsley, J. Jestes, H. Zhang, R. Balasubramonian, V. Srinivasan, A. Buyuktosunoglu, A. Davis, F. Li, NDC: analyzing the impact of 3D-stacked memory+logic devices on mapreduce workloads, in ISPASS (2014)
177.
Zurück zum Zitat M.K. Qureshi, M.A. Suleman, Y.N. Patt, Line distillation: increasing cache capacity by filtering unused words in cache lines, in HPCA (2007) M.K. Qureshi, M.A. Suleman, Y.N. Patt, Line distillation: increasing cache capacity by filtering unused words in cache lines, in HPCA (2007)
178.
Zurück zum Zitat M.K. Qureshi, A. Jaleel, Y.N. Patt, S.C. Steely Jr., J. Emer, Adaptive insertion policies for high-performance caching, in ISCA (2007) M.K. Qureshi, A. Jaleel, Y.N. Patt, S.C. Steely Jr., J. Emer, Adaptive insertion policies for high-performance caching, in ISCA (2007)
179.
Zurück zum Zitat M.K. Qureshi, V. Srinivasan, J.A. Rivers, Scalable high performance main memory system using phase-change memory technology, in ISCA (2009) M.K. Qureshi, V. Srinivasan, J.A. Rivers, Scalable high performance main memory system using phase-change memory technology, in ISCA (2009)
180.
Zurück zum Zitat M.K. Qureshi, D.H. Kim, S. Khan, P.J. Nair, O. Mutlu, AVATAR: a variable-retention-time (VRT) aware refresh for DRAM systems, in DSN (2015) M.K. Qureshi, D.H. Kim, S. Khan, P.J. Nair, O. Mutlu, AVATAR: a variable-retention-time (VRT) aware refresh for DRAM systems, in DSN (2015)
181.
Zurück zum Zitat J. Ren, J. Zhao, S. Khan, J. Choi, Y. Wu, O. Mutlu, ThyNVM: enabling software-transparent crash consistency in persistent memory systems, in MICRO (2015) J. Ren, J. Zhao, S. Khan, J. Choi, Y. Wu, O. Mutlu, ThyNVM: enabling software-transparent crash consistency in persistent memory systems, in MICRO (2015)
182.
Zurück zum Zitat S. Rixner, W.J. Dally, U.J. Kapasi, P. Mattson, J.D. Owens, Memory access scheduling, in ISCA (2000) S. Rixner, W.J. Dally, U.J. Kapasi, P. Mattson, J.D. Owens, Memory access scheduling, in ISCA (2000)
183.
Zurück zum Zitat O. Rodeh, C. Mason, J. Bacik, BTRFS: the Linux B-tree filesystem, in TOS (2013) O. Rodeh, C. Mason, J. Bacik, BTRFS: the Linux B-tree filesystem, in TOS (2013)
184.
Zurück zum Zitat A. Rogers, M. C. Carlisle, J.H. Reppy, L.J. Hendren, Supporting dynamic data structures on distributed-memory machines, in TOPLAS (1995) A. Rogers, M. C. Carlisle, J.H. Reppy, L.J. Hendren, Supporting dynamic data structures on distributed-memory machines, in TOPLAS (1995)
185.
Zurück zum Zitat P. Rosenfeld, E. Cooper-Balis, B. Jacob, DRAMSim2: a cycle accurate memory system simulator, in CAL (2011) P. Rosenfeld, E. Cooper-Balis, B. Jacob, DRAMSim2: a cycle accurate memory system simulator, in CAL (2011)
186.
Zurück zum Zitat A. Roth, G.S. Sohi, Effective jump-pointer prefetching for linked data structures, in ISCA (1999) A. Roth, G.S. Sohi, Effective jump-pointer prefetching for linked data structures, in ISCA (1999)
187.
Zurück zum Zitat A. Roth, A. Moshovos, G.S. Sohi, Dependence based prefetching for linked data structures, in ASPLOS (1998) A. Roth, A. Moshovos, G.S. Sohi, Dependence based prefetching for linked data structures, in ASPLOS (1998)
192.
Zurück zum Zitat D. Sanchez, L. Yen, M.D. Hill, K. Sankaralingam, Implementing signatures for transactional memory, in MICRO (2007) D. Sanchez, L. Yen, M.D. Hill, K. Sankaralingam, Implementing signatures for transactional memory, in MICRO (2007)
194.
Zurück zum Zitat B. Schroeder, E. Pinheiro, W.-D. Weber, DRAM errors in the wild: a large-scale field study, in SIGMETRICS (2009) B. Schroeder, E. Pinheiro, W.-D. Weber, DRAM errors in the wild: a large-scale field study, in SIGMETRICS (2009)
195.
Zurück zum Zitat V. Seshadri, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M.A. Kozuch, O. Mutlu, P.B. Gibbons, T.C. Mowry, Buddy-RAM: improving the performance and efficiency of bulk bitwise operations using DRAM (2016). arXiv:1611.09988 [cs:AR] V. Seshadri, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M.A. Kozuch, O. Mutlu, P.B. Gibbons, T.C. Mowry, Buddy-RAM: improving the performance and efficiency of bulk bitwise operations using DRAM (2016). arXiv:1611.09988 [cs:AR]
196.
Zurück zum Zitat V. Seshadri, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M.A. Kozuch, O. Mutlu, P.B. Gibbons, T.C. Mowry, Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology, in MICRO (2017) V. Seshadri, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M.A. Kozuch, O. Mutlu, P.B. Gibbons, T.C. Mowry, Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology, in MICRO (2017)
197.
Zurück zum Zitat V. Seshadri, Simple DRAM and virtual memory abstractions to enable highly efficient memory systems. Ph.D. dissertation, Carnegie Mellon University, 2016 V. Seshadri, Simple DRAM and virtual memory abstractions to enable highly efficient memory systems. Ph.D. dissertation, Carnegie Mellon University, 2016
198.
Zurück zum Zitat V. Seshadri, O. Mutlu, The processing using memory paradigm: In-DRAM bulk copy, initialization, bitwise AND and OR (2016). arXiv:1610.09603 [cs:AR] V. Seshadri, O. Mutlu, The processing using memory paradigm: In-DRAM bulk copy, initialization, bitwise AND and OR (2016). arXiv:1610.09603 [cs:AR]
199.
Zurück zum Zitat V. Seshadri, O. Mutlu, Simple operations in memory to reduce data movement. Adv. Comput. 106, 107–166 (2017) V. Seshadri, O. Mutlu, Simple operations in memory to reduce data movement. Adv. Comput. 106, 107–166 (2017)
200.
Zurück zum Zitat V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, M.A. Kozuch, P.B. Gibbons, T.C. Mowry, RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization, in MICRO (2013) V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, M.A. Kozuch, P.B. Gibbons, T.C. Mowry, RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization, in MICRO (2013)
201.
Zurück zum Zitat V. Seshadri, A. Bhowmick, O. Mutlu, P.B. Gibbons, M.A. Kozuch, T.C. Mowry, The dirty-block index, in ISCA (2014) V. Seshadri, A. Bhowmick, O. Mutlu, P.B. Gibbons, M.A. Kozuch, T.C. Mowry, The dirty-block index, in ISCA (2014)
202.
Zurück zum Zitat V. Seshadri, K. Hsieh, A. Boroumand, D. Lee, M.A. Kozuch, O. Mutlu, P.B. Gibbons, T.C. Mowry, Fast bulk bitwise AND and OR in DRAM, CAL (2015) V. Seshadri, K. Hsieh, A. Boroumand, D. Lee, M.A. Kozuch, O. Mutlu, P.B. Gibbons, T.C. Mowry, Fast bulk bitwise AND and OR in DRAM, CAL (2015)
203.
Zurück zum Zitat V. Seshadri, T. Mullins, A. Boroumand, O. Mutli, P.B. Gibbons, M.A. Kozuch, T.C. Mowry, Gather-scatter DRAM: in-DRAM address translation to improve the spatial locality of non-unit strided accesses, in MICRO (2015) V. Seshadri, T. Mullins, A. Boroumand, O. Mutli, P.B. Gibbons, M.A. Kozuch, T.C. Mowry, Gather-scatter DRAM: in-DRAM address translation to improve the spatial locality of non-unit strided accesses, in MICRO (2015)
204.
Zurück zum Zitat V. Seshadri, S. Yedkar, H. Xin, O. Mutlu, P.B. Gibbons, M.A. Kozuch, T.C. Mowry, Mitigating prefetcher-caused pollution using informed caching policies for prefetched blocks. ACM TACO 11(4), 51:1–51:22 (2015) V. Seshadri, S. Yedkar, H. Xin, O. Mutlu, P.B. Gibbons, M.A. Kozuch, T.C. Mowry, Mitigating prefetcher-caused pollution using informed caching policies for prefetched blocks. ACM TACO 11(4), 51:1–51:22 (2015)
205.
Zurück zum Zitat A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J.P. Strachan, M. Hu, R.S. Williams, V. Srikumar, ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars, in ISCA (2016) A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J.P. Strachan, M. Hu, R.S. Williams, V. Srikumar, ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars, in ISCA (2016)
206.
Zurück zum Zitat J.S. Shapiro, J. Adams, Design evolution of the EROS single-level store, in USENIX ATC (2002) J.S. Shapiro, J. Adams, Design evolution of the EROS single-level store, in USENIX ATC (2002)
207.
Zurück zum Zitat J.S. Shapiro, J.M. Smith, D.J. Farber, EROS: a fast capability system, in SOSP (1999) J.S. Shapiro, J.M. Smith, D.J. Farber, EROS: a fast capability system, in SOSP (1999)
208.
Zurück zum Zitat D.E. Shaw, S.J. Stolfo, H. Ibrahim, B. Hillyer, G. Wiederhold, J. Andrews, The NON-VON database machine: a brief overview. IEEE Database Eng. Bull. 4, 41–52 (1981) D.E. Shaw, S.J. Stolfo, H. Ibrahim, B. Hillyer, G. Wiederhold, J. Andrews, The NON-VON database machine: a brief overview. IEEE Database Eng. Bull. 4, 41–52 (1981)
209.
Zurück zum Zitat J. Shun, G.E. Blelloch, Ligra: a lightweight graph processing framework for shared memory, in PPoPP (2013) J. Shun, G.E. Blelloch, Ligra: a lightweight graph processing framework for shared memory, in PPoPP (2013)
210.
Zurück zum Zitat J.E. Smith, Decoupled access/execute computer architectures, in ISCA (1982) J.E. Smith, Decoupled access/execute computer architectures, in ISCA (1982)
211.
Zurück zum Zitat J.E. Smith, Dynamic instruction scheduling and the astronautics ZS-1, in Computer (1986) J.E. Smith, Dynamic instruction scheduling and the astronautics ZS-1, in Computer (1986)
212.
Zurück zum Zitat J.E. Smith, S. Weiss, N.Y. Pang, A simulation study of decoupled architecture computers, in IEEE TC (1986) J.E. Smith, S. Weiss, N.Y. Pang, A simulation study of decoupled architecture computers, in IEEE TC (1986)
213.
Zurück zum Zitat Y. Solihin, J. Torrellas, J. Lee, Using a user-level memory thread for correlation prefetching, in ISCA (2002) Y. Solihin, J. Torrellas, J. Lee, Using a user-level memory thread for correlation prefetching, in ISCA (2002)
214.
Zurück zum Zitat V. Sridharan, N. DeBardeleben, S. Blanchard, K.B. Ferreira, J. Stearley, J. Shalf, S. Gurumurthi, Memory errors in modern systems: the good, the bad, and the ugly, in ASPLOS (2015) V. Sridharan, N. DeBardeleben, S. Blanchard, K.B. Ferreira, J. Stearley, J. Shalf, S. Gurumurthi, Memory errors in modern systems: the good, the bad, and the ugly, in ASPLOS (2015)
215.
Zurück zum Zitat S. Srikantaiah, M. Kandemir, Synergistic TLBs for high performance address translation in chip multiprocessors, in MICRO (2010) S. Srikantaiah, M. Kandemir, Synergistic TLBs for high performance address translation in chip multiprocessors, in MICRO (2010)
216.
Zurück zum Zitat S. Srinath, O. Mutlu, H. Kim, Y.N. Patt, Feedback directed prefetching: improving the performance and bandwidth-efficiency of hardware prefetchers, in HPCA (2007) S. Srinath, O. Mutlu, H. Kim, Y.N. Patt, Feedback directed prefetching: improving the performance and bandwidth-efficiency of hardware prefetchers, in HPCA (2007)
218.
Zurück zum Zitat H.S. Stone, A logic-in-memory computer, in TC (1970) H.S. Stone, A logic-in-memory computer, in TC (1970)
219.
Zurück zum Zitat M. Stonebraker, A. Weisberg, The VoltDB main memory DBMS. IEEE Data Eng. Bull. 36, 21–27 (2013) M. Stonebraker, A. Weisberg, The VoltDB main memory DBMS. IEEE Data Eng. Bull. 36, 21–27 (2013)
220.
Zurück zum Zitat D.B. Strukov, G.S. Snider, D.R. Stewart, R.S. Williams, The missing memristor found. Nature 453, 80 (2008) D.B. Strukov, G.S. Snider, D.R. Stewart, R.S. Williams, The missing memristor found. Nature 453, 80 (2008)
221.
Zurück zum Zitat Z. Sura, A. Jacob, T. Chen, B. Rosenburg, O. Sallenave, C. Bertolli, S. Antao, J. Brunheroto, Y. Park, K. O’Brien, R. Nair, Data access optimization in a processing-in-memory system, in CF (2015) Z. Sura, A. Jacob, T. Chen, B. Rosenburg, O. Sallenave, C. Bertolli, S. Antao, J. Brunheroto, Y. Park, K. O’Brien, R. Nair, Data access optimization in a processing-in-memory system, in CF (2015)
222.
Zurück zum Zitat R.M. Tomasulo, An efficient algorithm for exploiting multiple arithmetic units, in IBM JRD (1967) R.M. Tomasulo, An efficient algorithm for exploiting multiple arithmetic units, in IBM JRD (1967)
224.
Zurück zum Zitat M. Waldvogel, G. Varghese, J. Turner, B. Plattner, Scalable high speed IP routing lookups, in SIGCOMM (1997) M. Waldvogel, G. Varghese, J. Turner, B. Plattner, Scalable high speed IP routing lookups, in SIGCOMM (1997)
225.
Zurück zum Zitat L. Wang, J. Zhan, C. Luo, Y. Zhu, Q. Yang, Y. He, W. Gao, Z. Jia, Y. Shi, S. Zhang, C. Zheng, G. Lu, K. Zhan, X. Li, B. Qiu, BigDataBench: a big data benchmark suite from internet services, in HPCA (2014) L. Wang, J. Zhan, C. Luo, Y. Zhu, Q. Yang, Y. He, W. Gao, Z. Jia, Y. Shi, S. Zhang, C. Zheng, G. Lu, K. Zhan, X. Li, B. Qiu, BigDataBench: a big data benchmark suite from internet services, in HPCA (2014)
226.
Zurück zum Zitat M.V. Wilkes, The memory gap and the future of high performance memories, in CAN (2001) M.V. Wilkes, The memory gap and the future of high performance memories, in CAN (2001)
227.
Zurück zum Zitat P.R. Wilson, Uniprocessor garbage collection techniques, in IWMM (1992) P.R. Wilson, Uniprocessor garbage collection techniques, in IWMM (1992)
228.
Zurück zum Zitat H.-S.P. Wong, S. Raoux, S. Kim, J. Liang, J.P. Reifenberg, B. Rajendran, M. Asheghi, K.E. Goodson, Phase change memory. Proc. IEEE 98, 2201–2227 (2010) H.-S.P. Wong, S. Raoux, S. Kim, J. Liang, J.P. Reifenberg, B. Rajendran, M. Asheghi, K.E. Goodson, Phase change memory. Proc. IEEE 98, 2201–2227 (2010)
229.
Zurück zum Zitat H.-S.P. Wong, H.-Y. Lee, S. Yu, Y.-S. Chen, Y. Wu, P.-S. Chen, B. Lee, F.T. Chen, M.-J. Tsai, Metal-oxide RRAM. Proc. IEEE 100, 1951–1970 (2012) H.-S.P. Wong, H.-Y. Lee, S. Yu, Y.-S. Chen, Y. Wu, P.-S. Chen, B. Lee, F.T. Chen, M.-J. Tsai, Metal-oxide RRAM. Proc. IEEE 100, 1951–1970 (2012)
230.
Zurück zum Zitat L. Wu, R.J. Barker, M.A. Kim, K.A. Ross, Navigating big data with high-throughput, energy-efficient data partitioning, in ISCA (2013) L. Wu, R.J. Barker, M.A. Kim, K.A. Ross, Navigating big data with high-throughput, energy-efficient data partitioning, in ISCA (2013)
231.
Zurück zum Zitat L. Wu, A. Lottarini, T.K. Paine, M.A. Kim, K.A. Ross, Q100: the architecture and design of a database processing unit, in ASPLOS (2014) L. Wu, A. Lottarini, T.K. Paine, M.A. Kim, K.A. Ross, Q100: the architecture and design of a database processing unit, in ASPLOS (2014)
232.
Zurück zum Zitat Y. Wu, Efficient discovery of regular stride patterns in irregular programs, in PLDI (2002) Y. Wu, Efficient discovery of regular stride patterns in irregular programs, in PLDI (2002)
233.
Zurück zum Zitat W.A. Wulf, S.A. McKee, Hitting the memory wall: implications of the obvious, CAN (1995) W.A. Wulf, S.A. McKee, Hitting the memory wall: implications of the obvious, CAN (1995)
234.
Zurück zum Zitat S.L. Xi, O. Babarinsa, M. Athanassoulis, S. Idreos, Beyond the wall: near-data processing for databases, in DaMoN (2015) S.L. Xi, O. Babarinsa, M. Athanassoulis, S. Idreos, Beyond the wall: near-data processing for databases, in DaMoN (2015)
235.
Zurück zum Zitat H. Xin, D. Lee, F. Hormozdiari, S. Yedkar, O. Mutlu, C. Alkan, Accelerating read mapping with FastHASH, in BMC Genomics (2013) H. Xin, D. Lee, F. Hormozdiari, S. Yedkar, O. Mutlu, C. Alkan, Accelerating read mapping with FastHASH, in BMC Genomics (2013)
236.
Zurück zum Zitat H. Xin, J. Greth, J. Emmons, G. Pekhimenko, C. Kingsford, C. Alkan, O. Mutlu, Shifted hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping. Bioinformatics 31, 1553–1560 (2015) H. Xin, J. Greth, J. Emmons, G. Pekhimenko, C. Kingsford, C. Alkan, O. Mutlu, Shifted hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping. Bioinformatics 31, 1553–1560 (2015)
237.
Zurück zum Zitat J. Xue, Z. Yang, Z. Qu, S. Hou, Y. Dai, Seraph: an efficient, low-cost system for concurrent graph processing, in HPDC (2014) J. Xue, Z. Yang, Z. Qu, S. Hou, Y. Dai, Seraph: an efficient, low-cost system for concurrent graph processing, in HPDC (2014)
238.
Zurück zum Zitat C. Yang, A.R. Lebeck, Push vs. pull: data movement for linked data structures, in ICS (2000) C. Yang, A.R. Lebeck, Push vs. pull: data movement for linked data structures, in ICS (2000)
239.
Zurück zum Zitat H. Yoon, R.A.J. Meza, R. Harding, O. Mutlu, Row buffer locality aware caching policies for hybrid memories, in ICCD (2012) H. Yoon, R.A.J. Meza, R. Harding, O. Mutlu, Row buffer locality aware caching policies for hybrid memories, in ICCD (2012)
240.
Zurück zum Zitat H. Yoon, J. Meza, N. Muralimanohar, N.P. Jouppi, O. Mutlu, Efficient data mapping and buffering techniques for multilevel cell phase-change memories, in ACM TACO (2014) H. Yoon, J. Meza, N. Muralimanohar, N.P. Jouppi, O. Mutlu, Efficient data mapping and buffering techniques for multilevel cell phase-change memories, in ACM TACO (2014)
241.
Zurück zum Zitat X. Yu, G. Bezerra, A. Pavlo, S. Devadas, M. Stonebraker, Staring into the abyss: an evaluation of concurrency control with one thousand cores, in VLDB (2014) X. Yu, G. Bezerra, A. Pavlo, S. Devadas, M. Stonebraker, Staring into the abyss: an evaluation of concurrency control with one thousand cores, in VLDB (2014)
242.
Zurück zum Zitat X. Yu, C.J. Hughes, N. Satish, S. Devadas, IMP: indirect memory prefetcher, in MICRO (2015) X. Yu, C.J. Hughes, N. Satish, S. Devadas, IMP: indirect memory prefetcher, in MICRO (2015)
243.
Zurück zum Zitat D.P. Zhang, N. Jayasena, A. Lyashevsky, J.L. Greathouse, L. Xu, M. Ignatowski, TOP-PIM: throughput-oriented programmable processing in memory, in HPDC (2014) D.P. Zhang, N. Jayasena, A. Lyashevsky, J.L. Greathouse, L. Xu, M. Ignatowski, TOP-PIM: throughput-oriented programmable processing in memory, in HPDC (2014)
244.
Zurück zum Zitat J. Zhao, O. Mutlu, Y. Xie, FIRM: fair and high-performance memory control for persistent memory systems, in MICRO (2014) J. Zhao, O. Mutlu, Y. Xie, FIRM: fair and high-performance memory control for persistent memory systems, in MICRO (2014)
245.
Zurück zum Zitat P. Zhou, B. Zhao, J. Yang, Y. Zhang, A durable and energy efficient main memory using phase change memory technology, in ISCA (2009) P. Zhou, B. Zhao, J. Yang, Y. Zhang, A durable and energy efficient main memory using phase change memory technology, in ISCA (2009)
246.
Zurück zum Zitat Q. Zhu, T. Graf, H.E. Sumbul, L. Pileggi, F. Franchetti, Accelerating sparse matrix-matrix multiplication with 3D-stacked logic-in-memory hardware, in HPEC (2013) Q. Zhu, T. Graf, H.E. Sumbul, L. Pileggi, F. Franchetti, Accelerating sparse matrix-matrix multiplication with 3D-stacked logic-in-memory hardware, in HPEC (2013)
247.
Zurück zum Zitat C.B. Zilles, Benchmark health considered harmful, in CAN (2001) C.B. Zilles, Benchmark health considered harmful, in CAN (2001)
248.
Zurück zum Zitat C.B. Zilles, G.S. Sohi, Execution-based prediction using speculative slices, in ISCA (2001) C.B. Zilles, G.S. Sohi, Execution-based prediction using speculative slices, in ISCA (2001)
249.
Zurück zum Zitat W.K. Zuravleff, T. Robinson, Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order. US Patent No. 5,630,096 (1997) W.K. Zuravleff, T. Robinson, Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order. US Patent No. 5,630,096 (1997)
Metadaten
Titel
The Processing-in-Memory Paradigm: Mechanisms to Enable Adoption
verfasst von
Saugata Ghose
Kevin Hsieh
Amirali Boroumand
Rachata Ausavarungnirun
Onur Mutlu
Copyright-Jahr
2019
DOI
https://doi.org/10.1007/978-3-319-90385-9_5

Neuer Inhalt