Top

Published in:

2019 | OriginalPaper | Chapter

5. The Processing-in-Memory Paradigm: Mechanisms to Enable Adoption

Authors : Saugata Ghose, Kevin Hsieh, Amirali Boroumand, Rachata Ausavarungnirun, Onur Mutlu

Published in: Beyond-CMOS Technologies for Next Generation Computer Design

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Performance improvements from DRAM technology scaling have been lagging behind the improvements from logic technology scaling for many years. As application demand for main memory continues to grow, DRAM-based main memory is increasingly becoming a larger system bottleneck in terms of both performance and energy consumption. A major reason for poor memory performance and energy efficiency is memory’s inability to perform computation. Instead, data stored within DRAM memory must be moved into the CPU before any computation can take place. This data movement is costly, as it requires a high latency and consumes significant energy to transfer the data across the pin-limited memory channel. Moreover, the data moved to the CPU is often not reused, and thus does not benefit from being cached within the CPU, which makes it difficult to amortize the overhead of data movement.

Modern 3D-stacked DRAM architectures provide an opportunity to avoid unnecessary data movement between memory and the CPU. These multi-layer architectures include a logic layer, where compute logic can be integrated underneath multiple layers of DRAM cell arrays (i.e., the memory layers) within the same chip. Architects can take advantage of the logic layer to perform processing-in-memory (PIM), or near-data processing, where some of the computation is moved from the CPU to the logic layer underneath the memory layer. In a PIM architecture, the logic layer within DRAM has access to the high internal bandwidth available within 3D-stacked DRAM (which is much greater than the bandwidth available in the narrow memory channel between DRAM and the CPU). Thus, PIM architectures can effectively free up valuable bandwidth on the bandwidth-limited memory channel while at the same time reducing system energy consumption.

A number of important issues arise when we add compute logic to DRAM. In particular, logic within DRAM does not have low-latency access to common CPU structures that are essential for modern application execution, such as the virtual memory mechanisms, e.g., the translation lookaside buffer (TLB) or the page table walker, and the cache coherence mechanisms, e.g., the coherence directory. To ease the widespread adoption of PIM, we ideally would like to maintain traditional virtual memory abstractions and the shared memory programming model. This requires efficient mechanisms that can provide logic in DRAM with access to virtual memory and cache coherence without having to communicate frequently with the CPU, as off-chip communication between the CPU and DRAM consumes much of the limited bandwidth that PIM aims to avoid using. To this end, we propose and evaluate two general-purpose solutions that can be used by PIM architectures to minimize unnecessary off-chip communication. The first, IMPICA, is an efficient in-memory accelerator for pointer chasing, which can handle address translation entirely within DRAM. The second, LazyPIM, provides coherence support without the need to continually communicate with the CPU. We show that both of these mechanisms provide a significant benefit for a number of important memory-intensive applications, thereby both improving performance and reducing energy consumption.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Emerging NVM Circuit Techniques and Implementations for Energy-Efficient Systems

next chapter Emerging Steep-Slope Devices and Circuits: Opportunities and Challenges

We use the Intel® VTune™ profiling tool on a machine with a Xeon® W3550 processor (3 GHz, 8-core, 8 MB LLC) [73] and 18 GB memory. We profile each application for 10 min after it reaches steady state.

We sweep the size of the IMPICA cache from 32 to 128 kB, and find that it has negligible effect on our results.

See Sect. 5.4.5 for our experimental evaluation methodology.

A thorough treatment of memory consistency [106] is outside the scope of this work. Our goal is to deal with the coherence problem in PIM, not handle consistency issues.

The programmer should be conservative in identifying PIM data regions, and should not miss any possible data that may be touched by a PIM core. If any data not marked as PIM data is accessed by the PIM core, the program can produce incorrect results.

S. Aga, S. Jeloka, A. Subramaniyan, S. Narayanasamy, D. Blaauw, R. Das, Compute caches, in HPCA (2017)

J. Ahn, S. Hong, S. Yoo, O. Mutlu, K. Choi, A scalable processing-in-memory accelerator for parallel graph processing, in ISCA (2015)

J. Ahn, S. Yoo, O. Mutlu, K. Choi, PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture, in ISCA (2015)

B. Akin, F. Franchetti, J.C. Hoe, Data reorganization in memory using 3D-stacked DRAM, in ISCA (2015)

C. Alkan et al., Personalized copy number and segmental duplication maps using next-generation sequencing. Nat. Genet. 41, 1061 (2009)

M. Alser, H. Hassan, H. Xin, O. Ergin, O. Mutlu, C. Alkan, GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping. Bioinformatics 33, 3355–3363 (2017)

ARM Holdings, ARM Cortex-A57. http://www.arm.com/products/processors/cortex-a/cortex-a57-processor.php

ARM Holdings, ARM Cortex-R4. http://www.arm.com/products/processors/cortex-r/cortex-r4.php

H. Asghari-Moghaddam, Y.H. Son, J.H. Ahn, N.S. Kim, Chameleon: versatile and practical near-DRAM acceleration architecture for large memory systems, in MICRO (2016)

10.

R. Ausavarungnirun, J. Landgraf, V. Miller, S. Ghose, J. Gandhi, C.J. Rossbach, O. Mutlu, Mosaic: a GPU memory manager with application-transparent support for multiple page sizes, in MICRO (2017)

11.

R. Ausavarungnirun, V. Miller, J. Landgraf, S. Ghose, J. Gandhi, A. Jog, C.J. Rossbach, O. Mutlu, MASK: redesigning the GPU memory hierarchy to support multi-application concurrency, in ASPLOS (2018)

12.

O.O. Babarinsa, S. Idreos, JAFAR: near-data processing for databases, in SIGMOD (2015)

13.

A. Basu, J. Gandhi, J. Chang, M.D. Hill, M.M. Swift, Efficient virtual memory for big memory servers, in ISCA (2013)

14.

A. Bensoussan, C.T. Clingen, R.C. Daley, The Multics virtual memory: concepts and design, in CACM (1972)

15.

A. Bhattacharjee, Large-reach memory management unit caches, in MICRO (2013)

16.

A. Bhattacharjee, M. Martonosi, Inter-core cooperative TLB for chip multiprocessors, in ASPLOS (2010)

17.

A. Bhattacharjee, D. Lustig, M. Martonosi, Shared last-level TLBs for chip multiprocessors, in HPCA (2011)

18.

N. Binkert, B. Beckman, A. Saidi, G. Black, A. Basu, The gem5 Simulator, in CAN (2011)

19.

B.H. Bloom, Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 422–426 (1970)

20.

A. Boroumand, S. Ghose, M. Patel, H. Hassan, B. Lucia, K. Hsieh, K.T. Malladi, H. Zheng, O. Mutlu, LazyPIM: an efficient cache coherence mechanism for processing-in-memory, in CAL (2016)

21.

A. Boroumand, S. Ghose, M. Patel, H. Hassan, B. Lucia, N. Hajinazar, K. Hsieh, K.T. Malladi, H. Zheng, O. Mutlu, LazyPIM: efficient support for cache coherence in processing-in-memory architectures (2017). arXiv:1706.03162 [cs:AR]

22.

A. Boroumand, S. Ghose, Y. Kim, R. Ausavarungnirun, E. Shiu, R. Thakur, D. Kim, A. Kuusela, A. Knies, P. Ranganathan, O. Mutlu, Google workloads for consumer devices: mitigating data movement bottlenecks, in ASPLOS (2018)

23.

L.M. Censier, P. Feutrier, A new solution to coherence problems in multicache systems, in IEEE TC (1978)

24.

L. Ceze, J. Tuck, P. Montesinos, J. Torrellas, BulkSC: bulk enforcement of sequential consistency, in ISCA (2007)

25.

K.K. Chang, D. Lee, Z. Chishti, A.R. Alameldeen, C. Wilkerson, Y. Kim, O. Mutlu, Improving DRAM performance by parallelizing refreshes with accesses, in HPCA (2014)

26.

K.K. Chang, A. Kashyap, H. Hassan, S. Ghose, K. Hsieh, D. Lee, T. Li, G. Pekhimenko, S. Khan, O. Mutlu, Understanding latency variation in modern DRAM chips: experimental characterization, analysis, and optimization, in SIGMETRICS (2016)

27.

K.K. Chang, P.J. Nair, D. Lee, S. Ghose, M.K. Qureshi, O. Mutlu, Low-cost inter-linked subarrays (LISA): enabling fast inter-subarray data movement in DRAM, in HPCA (2016)

28.

K.K. Chang, Understanding and improving the latency of DRAM-based memory systems. Ph.D. dissertation, Carnegie Mellon University, 2017

29.

K.K. Chang, A.G. Yağlıkçı, S. Ghose, A. Agrawal, N. Chatterjee, A. Kashyap, D. Lee, M. O’Connor, H. Hassan, O. Mutlu, Understanding reduced-voltage operation in modern DRAM devices: experimental characterization, analysis, and mechanisms, in SIGMETRICS (2017)

30.

P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, Y. Xie, PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory, in ISCA (2016)

31.

L. Chua, Memristor—the missing circuit element, in IEEE TCT (1971)

32.

E.S. Chung, J.D. Davis, J. Lee, LINQits: big data on little clients, in ISCA (2013)

33.

J.D. Collins, H. Wang, D.M. Tullsen, C.J. Hughes, Y. Lee, D.M. Lavery, J.P. Shen, Speculative precomputation: long-range prefetching of delinquent loads, in ISCA (2001)

34.

J.D. Collins, S. Sair, B. Calder, D.M. Tullsen, Pointer cache assisted prefetching, in MICRO (2002)

35.

R. Cooksey, S. Jourdan, D. Grunwald, A stateless, content-directed data prefetching mechanism, in ASPLOS (2002)

36.

N.C. Crago, S.J. Patel, OUTRIDER: efficient memory latency tolerance with decoupled strands, in ISCA (2011)

37.

J. Dean, L.A. Barroso, The tail at scale, in CACM (2013)

38.

J. Devietti, B. Lucia, L. Ceze, M. Oskin, DMP: deterministic shared memory multiprocessing, in ASPLOS (2009)

39.

J. Draper, J. Chame, M. Hall, C. Steele, T. Barrett, J. LaCoss, J. Granacki, J. Shin, C. Chen, C.W. Kang, I. Kim, G. Daglikoca, The architecture of the DIVA processing-in-memory chip, in SC (2002)

40.

E. Ebrahimi, O. Mutlu, Y. Patt, Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems, in HPCA (2009)

41.

E. Ebrahimi, O. Mutlu, C.J. Lee, Y.N. Patt, Coordinated control of multiple prefetchers in multi-core systems, in MICRO (2009)

42.

E. Ebrahimi, C.J. Lee, O. Mutlu, Y.N. Patt, Prefetch-aware shared resource management for multi-core systems, in ISCA (2011)

43.

Y. Eckert, N. Jayasena, G.H. Loh, Thermal feasibility of die-stacked processing in memory, in WoNDP (2014)

44.

D.G. Elliott, W.M. Snelgrove, M. Stumm, Computational RAM: a memory-SIMD hybrid and its application to DSP, in CICC (1992)

45.

D. Elliott, M. Stumm, W.M. Snelgrove, C. Cojocaru, R. McKenzie, Computational RAM: implementing processors in memory, in IEEE Design & Test (1999)

46.

R. Elmasri, Fundamentals of Database Systems (Pearson, Boston, 2007)

47.

A. Farmahini-Farahani, J.H. Ahn, K. Morrow, N.S. Kim, NDA: near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules, in HPCA (2015)

48.

M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A.D. Popescu, A. Ailamaki, B. Falsafi, Clearing the clouds: a study of emerging scale-out workloads on modern hardware, in ASPLOS (2012)

49.

M. Filippo, Technology preview: ARM next generation processing, in ARM TechCon (2012)

50.

B. Fitzpatrick, Distributed caching with memcached. Linux J. 2004, 5 (2004)

51.

M. Gao, C. Kozyrakis, HRL: efficient and flexible reconfigurable logic for near-data processing, in HPCA (2016)

52.

M. Gao, G. Ayers, C. Kozyrakis, Practical near-data processing for in-memory analytics frameworks, in PACT (2015)

53.

S. Ghemawat, H. Gobioff, S.-T. Leung, The Google file system, in SOSP (2003)

54.

D. Giampaolo, Practical File System Design with the BE File System (Morgan Kaufmann Publishers Inc., San Francisco, 1998)

55.

A. Glew, MLP yes! ILP no!, in ASPLOS WACI (1998)

56.

M. Gokhale, B. Holmes, K. Iobst, Processing in memory: the Terasys massively parallel PIM array. IEEE Comput. 28, 23–31 (1995)

57.

J.R. Goodman, Using cache memory to reduce processor-memory traffic, in ISCA (1983)

58.

B. Gu, A.S. Yoon, D.-H. Bae, I. Jo, J. Lee, J. Yoon, J.-U. Kang, M. Kwon, C. Yoon, S. Cho, J. Jeong, D. Chang, Biscuit: a framework for near-data processing of big data workloads, in ISCA (2016)

59.

Q. Guo, N. Alachiotis, B. Akin, F. Sadi, G. Xu, T.M. Low, L. Pileggi, J.C. Hoe, F. Franchetti, 3D-stacked memory-side acceleration: accelerator and system design, in WoNDP (2014)

60.

A. Gutierrez, J. Pusdesris, R.G. Dreslinski, T. Mudge, C. Sudanthi, C.D. Emmons, M. Hayenga, N. Paver, Sources of error in full-system simulation, in ISPASS (2014)

61.

L. Hammond, V. Wong, M. Chen, B.D. Carlstrom, J.D. Davis, B. Hertzberg, M.K. Prabhu, H. Wijaya, C. Kozyrakis, K. Olukotun, Transactional memory coherence and consistency, in ISCA (2004)

62.

M. Hashemi, O. Mutlu, Y.N. Patt, Continuous runahead: transparent hardware acceleration for memory intensive workloads, in MICRO (2016)

63.

M. Hashemi, Khubaib, E. Ebrahimi, O. Mutlu, Y.N. Patt, Accelerating dependent cache misses with an enhanced memory controller, in ISCA (2016)

64.

S.M. Hassan, S. Yalamanchili, S. Mukhopadhyay, Near data processing: impact and optimization of 3D memory system architecture on the uncore, in MEMSYS (2015)

65.

H. Hassan, G. Pekhimenko, N. Vijaykumar, V. Seshadri, D. Lee, O. Ergin, O. Mutlu, ChargeCache: reducing DRAM latency by exploiting row access locality, in HPCA (2016)

66.

H. Hassan, N. Vijaykumar, S. Khan, S. Ghose, K. Chang, G. Pekhimenko, D. Lee, O. Ergin, O. Mutlu, SoftMC: a flexible and practical open-source infrastructure for enabling experimental DRAM studies, in HPCA (2017)

67.

K. Hsieh, S. Khan, N. Vijaykumar, K.K. Chang, A. Boroumand, S. Ghose, O. Mutlu, Accelerating pointer chasing in 3D-stacked memory: challenges, mechanisms, evaluation, in ICCD (2016)

68.

K. Hsieh, E. Ebrahimi, G. Kim, N. Chatterjee, M. O’Conner, N. Vijaykumar, O. Mutlu, S. Keckler, Transparent offloading and mapping (TOM): enabling programmer-transparent near-data processing in GPU systems, in ISCA (2016)

69.

Z. Hu, M. Martonosi, S. Kaxiras, TCP: tag correlating prefetchers, in HPCA (2003)

70.

C.J. Hughes, S.V. Adve, Memory-side prefetching for linked data structures for processor-in-memory systems, in JPDC (2005)

71.

Hybrid Memory Cube Consortium, HMC Specification 1.1 (2013)

72.

Hybrid Memory Cube Consortium, HMC Specification 2.0 (2014)

73.

Intel, Intel Xeon Processor W3550 (2009)

74.

J. Jeddeloh, B. Keeth, Hybrid memory cube: new DRAM architecture increases density and performance, in VLSIT (2012)

75.

JEDEC, High bandwidth memory (HBM) DRAM, Standard No. JESD235 (2013)

76.

J. Joao, O. Mutlu, Y.N. Patt, Flexible reference-counting-based hardware acceleration for garbage collection, in ISCA (2009)

77.

R. Jones, R. Lins, Garbage Collection: Algorithms for Automatic Dynamic Memory Management (Wiley, New York, 1996)

78.

D. Joseph, D. Grunwald, Prefetching using Markov predictors, in ISCA (1997)

79.

S. Kanev, J.P. Darago, K. Hazelwood, P. Ranganathan, T. Moseley, G.-Y. Wei, D. Brooks, Profiling a warehouse-scale computer, in ISCA (2015)

80.

Y. Kang, W. Huang, S.-M. Yoo, D. Keen, Z. Ge, V. Lam, P. Pattnaik, J. Torrellas, FlexRAM: toward an advanced intelligent memory system, in ICCD (1999)

81.

M. Kang, M.-S. Keel, N.R. Shanbhag, S. Eilert, K. Curewitz, An energy-efficient VLSI architecture for pattern recognition via deep embedding of computation in SRAM, in ICASSP (2014)

82.

U. Kang, H.-S. Yu, C. Park, H. Zheng, J. Halbert, K. Bains, S. Jang, J. Choi, Co-architecting controllers and DRAM to enhance DRAM process scaling, in The Memory Forum (2014)

83.

M. Karlsson, F. Dahlgren, P. Stenström, A prefetching technique for irregular accesses to linked data structures, in HPCA (2000)

84.

S. Khan, D. Lee, Y. Kim, A.R. Alameldeen, C. Wilkerson, O. Mutlu, The efficacy of error mitigation techniques for DRAM retention failures: a comparative experimental study, in SIGMETRICS (2014)

85.

S. Khan, D. Lee, O. Mutlu, PARBOR: an efficient system-level technique to detect data dependent failures in DRAM, in DSN (2016)

86.

S. Khan, C. Wilkerson, D. Lee, A.R. Alameldeen, O. Mutlu, A case for memory content-based detection and mitigation of data-dependent failures in DRAM, in CAL (2016)

87.

S. Khan, C. Wilkerson, Z. Wang, A. Alameldeen, D. Lee, O. Mutlu, Detecting and mitigating data-dependent DRAM failures by exploiting current memory content, in MICRO (2017)

88.

T. Kilburn, D.B.G. Edwards, M.J. Lanigan, F.H. Sumner, One-level storage system. IRE Trans. Electron Comput. 2, 223–235 (1962)

89.

Y. Kim, Architectural techniques to enhance DRAM scaling. Ph.D. dissertation, Carnegie Mellon University, 2015

90.

Y. Kim, V. Seshadri, D. Lee, J. Liu, O. Mutlu, A case for exploiting subarray-level parallelism (SALP) in DRAM, in ISCA (2012)

91.

Y. Kim, R. Daly, J. Kim, C. Fallin, J.H. Lee, D. Lee, C. Wilkerson, K. Lai, O. Mutlu, Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors, in ISCA (2014)

92.

Y. Kim, W. Yang, O. Mutlu, Ramulator: a fast and extensible DRAM simulator, in CAL (2015)

93.

D. Kim, J. Kung, S. Chai, S. Yalamanchili, S. Mukhopadhyay, Neurocube: a programmable digital neuromorphic architecture with high-density 3D memory, in ISCA (2016)

94.

Y. Kim, R. Daly, J. Kim, C. Fallin, J.H. Lee, D. Lee, C. Wilkerson, K. Lai, O. Mutlu, RowHammer: reliability analysis and security implications (2016). arXiv:1603.00747 [cs:AR]

95.

G. Kim, N. Chatterjee, M. O’Connor, K. Hsieh, Toward standardized near-data processing with unrestricted data placement for GPUs, in SC (2017)

96.

J.S. Kim, D. Senol, H. Xin, D. Lee, S. Ghose, M. Alser, H. Hassan, O. Ergin, C. Alkan, O. Mutlu, GRIM-Filter: fast seed filtering in read mapping using emerging memory technologies. arXiv:1708.04329 [q-bio.GN] (2017)

97.

J. Kim, M. Patel, H. Hassan, O. Mutlu, The DRAM latency PUF: quickly evaluating physical unclonable functions by exploiting the latency–reliability tradeoff in modern DRAM devices, in HPCA (2018)

98.

J.S. Kim, D. Senol, H. Xin, D. Lee, S. Ghose, M. Alser, H. Hassan, O. Ergin, C. Alkan, O. Mutlu, GRIM-Filter: fast seed location filtering in DNA read mapping using processing-in-memory technologies, in BMC Genomics (2018)

99.

Y.O. Koçberber, B. Grot, J. Picorel, B. Falsafi, K.T. Lim, P. Ranganathan, Meet the walkers: accelerating index traversals for in-memory databases, in MICRO (2013)

100.

P.M. Kogge, EXECUBE-a new architecture for scaleable MPPs, in ICPP (1994)

101.

E. Kültürsay, M. Kandemir, A. Sivasubramaniam, O. Mutlu, Evaluating STT-RAM as an energy-efficient main memory alternative, in ISPASS (2013)

102.

L. Kurian, P.T. Hulina, L.D. Coraor, Memory latency effects in decoupled architectures with a single data memory module, in ISCA (1992)

103.

S. Kvatinsky, A. Kolodny, U.C. Weiser, E.G. Friedman, Memristor-based IMPLY logic design procedure, in ICCD (2011)

104.

S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E.G. Friedman, A. Kolodny, U.C. Weiser, MAGIC—memristor-aided logic, in IEEE TCAS II: Express Briefs (2014)

105.

S. Kvatinsky, G. Satat, N. Wald, E.G. Friedman, A. Kolodny, U.C. Weiser, Memristor-based material implication (IMPLY) logic: design principles and methodologies, in TVLSI (2014)

106.

L. Lamport, How to make a multiprocessor computer that correctly executes multiprocess programs, in IEEE TC (1979)

107.

D. Lee, Reducing DRAM latency at low cost by exploiting heterogeneity. Ph.D. dissertation, Carnegie Mellon University, 2016

108.

J. Lee, Y. Solihin, J. Torrettas, Automatically mapping code on an intelligent memory architecture, in HPCA (2001)

109.

C.J. Lee, O. Mutlu, V. Narasiman, Y.N. Patt, Prefetch-aware DRAM controllers, in MICRO (2008)

110.

B.C. Lee, E. Ipek, O. Mutlu, D. Burger, Architecting phase change memory as a scalable DRAM alternative, in ISCA (2009)

111.

B.C. Lee, E. Ipek, O. Mutlu, D. Burger, Phase change memory architecture and the quest for scalability, in CACM (2010)

112.

B.C. Lee, P. Zhou, J. Yang, Y. Zhang, B. Zhao, E. Ipek, O. Mutlu, D. Burger, Phase-change technology and the future of main memory, in IEEE Micro (2010)

113.

C.J. Lee, O. Mutlu, V. Narasiman, Y.N. Patt, Prefetch-aware memory controllers, in IEEE TC (2011)

114.

D. Lee, Y. Kim, V. Seshadri, J. Liu, L. Subramanian, O. Mutlu, Tiered-latency DRAM: a low latency and low cost DRAM architecture, in HPCA (2013)

115.

D. Lee, F. Hormozdiari, H. Xin, F. Hach, O. Mutlu, C. Alkan, Fast and accurate mapping of complete genomics reads, in Methods (2014)

116.

D. Lee, Y. Kim, G. Pekhimenko, S. Khan, V. Seshadri, K. Chang, O. Mutlu, Adaptive-latency DRAM: optimizing DRAM timing for the common-case, in HPCA (2015)

117.

D. Lee, L. Subramanian, R. Ausavarungnirun, J. Choi, O. Mutlu, Decoupled direct memory access: isolating CPU and IO traffic by leveraging a dual-data-port DRAM, in PACT (2015)

118.

J.H. Lee, J. Sim, H. Kim, BSSync: processing near memory for machine learning workloads with bounded staleness consistency models, in PACT (2015)

119.

D. Lee, S. Ghose, G. Pekhimenko, S. Khan, O. Mutlu, Simultaneous multi-layer access: improving 3D-stacked memory bandwidth at low cost, in TACO (2016)

120.

D. Lee, S. Khan, L. Subramanian, S. Ghose, R. Ausavarungnirun, G. Pekhimenko, V. Seshadri, O. Mutlu, Design-induced latency variation in modern DRAM chips: characterization, analysis, and latency reduction mechanisms, in SIGMETRICS (2017)

121.

Y. Levy, J. Bruck, Y. Cassuto, E.G. Friedman, A. Kolodny, E. Yaakobi, S. Kvatinsky, Logic operations in memory using a memristive Akers array. Microelectron. J. 45, 1429–1437 (2014)

122.

S. Li, J.H. Ahn, R.D. Strong, J.B. Brockman, D.M. Tullsen, N.P. Jouppi, The McPAT framework for multicore and manycore architectures: simultaneously modeling power, area, and timing, in TACO (2013)

123.

S. Li, C. Xu, Q. Zou, J. Zhao, Y. Lu, Y. Xie, Pinatubo: a processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories, in DAC (2016)

124.

S. Li, D. Niu, K.T. Malladi, H. Zheng, B. Brennan, Y. Xie, DRISA: a DRAM-based reconfigurable in-situ accelerator, in MICRO (2017)

125.

K. Lim, J. Chang, T. Mudge, P. Ranganathan, S.K. Reinhardt, T.F. Wenisch, Disaggregated memory for expansion and sharing in blade servers, in ISCA (2009)

126.

K.T. Lim, D. Meisner, A.G. Saidi, P. Ranganathan, T.F. Wenisch, Thin servers with smart pipes: designing SoC accelerators for memcached, in ISCA (2013)

127.

Linaro, 64-Bit Linux Kernel for ARM (2014)

128.

M.H. Lipasti, W.J. Schmidt, S.R. Kunkel, R.R. Roediger, SPAID: software prefetching in pointer- and call-intensive environments, in MICRO (1995)

129.

J. Liu, B. Jaiyen, R. Veras, O. Mutlu, RAIDR: retention-aware intelligent DRAM refresh, in ISCA (2012)

130.

J. Liu, B. Jaiyen, Y. Kim, C. Wilkerson, O. Mutlu, An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms, in ISCA (2013)

131.

Z. Liu, I. Calciu, M. Harlihy, O. Mutlu, Concurrent data structures for near-memory computing, in SPAA (2017)

132.

G.H. Loh, 3D-stacked memory architectures for multi-core processors, in ISCA (2008)

133.

G.H. Loh, N. Jayasena, M. Oskin, M. Nutter, D. Roberts, M. Meswani, D.P. Zhang, M. Ignatowski, A processing in memory taxonomy and a case for studying fixed-function PIM, in WoNDP (2013)

134.

P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, Y.O. Koçberber, J. Picorel, A. Adileh, D. Jevdjic, S. Idgunji, E. Özer, B. Falsafi, Scale-out processors, in ISCA (2012)

135.

C. Luk, Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors, in ISCA (2001)

136.

C. Luk, T.C. Mowry, Compiler-based prefetching for recursive data structures, in ASPLOS (1996)

137.

Y. Luo, S. Govindan, B. Sharma, M. Santaniello, J. Meza, A. Kansal, J. Liu, B. Khessib, K. Vaid, O. Mutlu, Characterizing application memory error vulnerability to optimize datacenter cost via heterogeneous-reliability memory, in DSN (2014)

138.

Y. Luo, S. Ghose, T. Li, S. Govindan, B. Sharma, B. Kelly, A. Boroumand, O. Mutlu, Using ECC DRAM to adaptively increase memory capacity (2017). arXiv:1706.08870 [cs:AR]

139.

D. Lustig, A. Bhattacharjee, M. Martonosi, TLB improvements for chip multiprocessors: inter-core cooperative prefetchers and shared last-level TLBs, in ACM TACO (2013)

140.

K. Mai, T. Paaske, N. Jayasena, R. Ho, W.J. Dally, M. Horowitz, Smart memories: a modular reconfigurable architecture, in ISCA (2000)

141.

J.A. Mandelman, R.H. Dennard, G.B. Bronner, J.K. DeBrosse, R. Divakaruni, Y. Li, C.J. Radens, Challenges and future directions for the scaling of dynamic random-access memory (DRAM), in IBM JRD (2002)

142.

Y. Mao, E. Kohler, R.T. Morris, Cache craftiness for fast multicore key-value storage, in EuroSys (2012)

143.

S.A. McKee, Reflections on the memory wall, in CF (2004)

144.

MemSQL, Inc., MemSQL. http://www.memsql.com

145.

M.R. Meswani, S. Blagodurov, D. Roberts, J. Slice, M. Ignatowski, G.H. Loh, Heterogeneous memory architectures: a HW/SW approach for mixing die-stacked and off-package memories, in HPCA (2015), pp. 126–136

146.

J. Meza, Y. Luo, S. Khan, J. Zhao, Y. Xie, O. Mutlu, A case for efficient hardware-software cooperative management of storage and memory, in WEED (2013)

147.

J. Meza, Q. Wu, S. Kumar, O. Mutlu, Revisiting memory errors in large-scale production data centers: analysis and modeling of new trends from the field, in DSN (2015)

148.

N. Mirzadeh, O. Kocberber, B. Falsafi, B. Grot, Sort vs. hash join revisited for near-memory execution, in ASBD (2007)

149.

A. Morad, L. Yavits, R. Ginosar, GP-SIMD processing-in-memory, in ACM TACO (2015)

150.

J. Mukundan, H. Hunter, K.H. Kim, J. Stuecheli, J.F. Martínez, Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems, in ISCA (2013)

151.

O. Mutlu, Memory scaling: a systems architecture perspective, in IMW (2013)

152.

O. Mutlu, The RowHammer problem and other issues we may face as memory becomes denser, in DATE (2017)

153.

O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt, Runahead execution: an alternative to very large instruction windows for out-of-order processors, in HPCA (2003)

154.

O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt, Runahead execution: an effective alternative to large instruction windows, in IEEE Micro (2003)

155.

O. Mutlu, H. Kim, Y.N. Patt, Address-value delta (AVD) prediction: increasing the effectiveness of runahead execution by exploiting regular memory allocation patterns, in MICRO (2005)

156.

O. Mutlu, H. Kim, Y.N. Patt, Techniques for efficient processing in runahead execution engines, in ISCA (2005)

157.

O. Mutlu, H. Kim, Y.N. Patt, Address-value delta (AVD) prediction: a hardware technique for efficiently parallelizing dependent cache misses, in TC (2006)

158.

O. Mutlu, H. Kim, Y.N. Patt, Efficient runahead execution: power-efficient memory latency tolerance, in IEEE Micro (2006)

159.

O. Mutlu, T. Moscibroda, Parallelism-aware batch scheduling: enhancing both performance and fairness of shared DRAM systems, in ISCA (2008)

160.

O. Mutlu, L. Subramanian, Research problems and opportunities in memory systems, in SUPERFRI (2014)

161.

A. Muzahid, D. Suárez, S. Qi, J. Torrellas, SigRace: signature-based data race detection, in ISCA (2009)

162.

H. Naeimi, C. Augustine, A. Raychowdhury, S.-L. Lu, J. Tschanz, STT-RAM scaling and retention failure. Intel Technol. J. 17, 54–75 (2013)

163.

L. Nai, R. Hadidi, J. Sim, H. Kim, P. Kumar, H. Kim, GraphPIM: enabling instruction-level PIM offloading in graph computing frameworks, in HPCA (2017)

164.

B. Naylor, J. Amanatides, W. Thibault, Merging BSP trees yields polyhedral set operations, in SIGGRAPH (1990)

165.

OProfile, http://oprofile.sourceforge.net/

166.

M. Oskin, F.T. Chong, T. Sherwood, Active pages: a computation model for intelligent memory, in ISCA (1998)

167.

M.S. Papamarcos, J.H. Patel, A low-overhead coherence solution for multiprocessors with private. Cache memories, in ISCA (1984)

168.

M. Patel, J. Kim, O. Mutlu, The reach profiler (REAPER): enabling the mitigation of DRAM retention failures via profiling at aggressive conditions, in ISCA (2017)

169.

Y.N. Patt, W.-M. Hwu, M. Shebanow, HPS, a new microarchitecture: rationale and introduction, in MICRO (1985)

170.

Y.N. Patt, S.W. Melvin, W.-M. Hwu, M.C. Shebanow, Critical issues regarding HPS, a high performance microarchitecture, in MICRO, (1985)

171.

D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, K. Yelick, A case for intelligent RAM, in IEEE Micro (1997)

172.

A. Pattnaik, X. Tang, A. Jog, O. Kayiran, A.K. Mishra, M.T. Kandemir, O. Mutlu, C.R. Das, Scheduling techniques for GPU architectures with processing-in-memory capabilities, in PACT (2016)

173.

B. Pichai, L. Hsu, A. Bhattacharjee, Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces, in ASPLOS (2014)

174.

G. Pokam, C. Pereira, K. Danne, R. Kassa, A.-R. Adl-Tabatabai, Architecting a chunk-based memory race recorder in modern CMPs, in MICRO (2009)

175.

J. Power, M.D. Hill, D.A. Wood, Supporting x86-64 address translation for 100s of GPU lanes, in HPCA (2014)

176.

S.H. Pugsley, J. Jestes, H. Zhang, R. Balasubramonian, V. Srinivasan, A. Buyuktosunoglu, A. Davis, F. Li, NDC: analyzing the impact of 3D-stacked memory+logic devices on mapreduce workloads, in ISPASS (2014)

177.

M.K. Qureshi, M.A. Suleman, Y.N. Patt, Line distillation: increasing cache capacity by filtering unused words in cache lines, in HPCA (2007)

178.

M.K. Qureshi, A. Jaleel, Y.N. Patt, S.C. Steely Jr., J. Emer, Adaptive insertion policies for high-performance caching, in ISCA (2007)

179.

M.K. Qureshi, V. Srinivasan, J.A. Rivers, Scalable high performance main memory system using phase-change memory technology, in ISCA (2009)

180.

M.K. Qureshi, D.H. Kim, S. Khan, P.J. Nair, O. Mutlu, AVATAR: a variable-retention-time (VRT) aware refresh for DRAM systems, in DSN (2015)

181.

J. Ren, J. Zhao, S. Khan, J. Choi, Y. Wu, O. Mutlu, ThyNVM: enabling software-transparent crash consistency in persistent memory systems, in MICRO (2015)

182.

S. Rixner, W.J. Dally, U.J. Kapasi, P. Mattson, J.D. Owens, Memory access scheduling, in ISCA (2000)

183.

O. Rodeh, C. Mason, J. Bacik, BTRFS: the Linux B-tree filesystem, in TOS (2013)

184.

A. Rogers, M. C. Carlisle, J.H. Reppy, L.J. Hendren, Supporting dynamic data structures on distributed-memory machines, in TOPLAS (1995)

185.

P. Rosenfeld, E. Cooper-Balis, B. Jacob, DRAMSim2: a cycle accurate memory system simulator, in CAL (2011)

186.

A. Roth, G.S. Sohi, Effective jump-pointer prefetching for linked data structures, in ISCA (1999)

187.

A. Roth, A. Moshovos, G.S. Sohi, Dependence based prefetching for linked data structures, in ASPLOS (1998)

188.

SAFARI Research Group, IMPICA (in-memory pointer chasing accelerator) – GitHub repository. https://github.com/CMU-SAFARI/IMPICA/

189.

SAFARI Research Group, Ramulator: A DRAM simulator – GitHub repository. https://github.com/CMU-SAFARI/ramulator/

190.

SAFARI Research Group, SAFARI software tools – GitHub repository. https://github.com/CMU-SAFARI/

191.

SAFARI Research Group, SoftMC v1.0 – GitHub repository. https://github.com/CMU-SAFARI/SoftMC/

192.

D. Sanchez, L. Yen, M.D. Hill, K. Sankaralingam, Implementing signatures for transactional memory, in MICRO (2007)

193.

SAP SE, SAP HANA. http://www.hana.sap.com/

194.

B. Schroeder, E. Pinheiro, W.-D. Weber, DRAM errors in the wild: a large-scale field study, in SIGMETRICS (2009)

195.

V. Seshadri, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M.A. Kozuch, O. Mutlu, P.B. Gibbons, T.C. Mowry, Buddy-RAM: improving the performance and efficiency of bulk bitwise operations using DRAM (2016). arXiv:1611.09988 [cs:AR]

196.

V. Seshadri, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M.A. Kozuch, O. Mutlu, P.B. Gibbons, T.C. Mowry, Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology, in MICRO (2017)

197.

V. Seshadri, Simple DRAM and virtual memory abstractions to enable highly efficient memory systems. Ph.D. dissertation, Carnegie Mellon University, 2016

198.

V. Seshadri, O. Mutlu, The processing using memory paradigm: In-DRAM bulk copy, initialization, bitwise AND and OR (2016). arXiv:1610.09603 [cs:AR]

199.

V. Seshadri, O. Mutlu, Simple operations in memory to reduce data movement. Adv. Comput. 106, 107–166 (2017)

200.

V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, M.A. Kozuch, P.B. Gibbons, T.C. Mowry, RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization, in MICRO (2013)

201.

V. Seshadri, A. Bhowmick, O. Mutlu, P.B. Gibbons, M.A. Kozuch, T.C. Mowry, The dirty-block index, in ISCA (2014)

202.

V. Seshadri, K. Hsieh, A. Boroumand, D. Lee, M.A. Kozuch, O. Mutlu, P.B. Gibbons, T.C. Mowry, Fast bulk bitwise AND and OR in DRAM, CAL (2015)

203.

V. Seshadri, T. Mullins, A. Boroumand, O. Mutli, P.B. Gibbons, M.A. Kozuch, T.C. Mowry, Gather-scatter DRAM: in-DRAM address translation to improve the spatial locality of non-unit strided accesses, in MICRO (2015)

204.

V. Seshadri, S. Yedkar, H. Xin, O. Mutlu, P.B. Gibbons, M.A. Kozuch, T.C. Mowry, Mitigating prefetcher-caused pollution using informed caching policies for prefetched blocks. ACM TACO 11(4), 51:1–51:22 (2015)

205.

A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J.P. Strachan, M. Hu, R.S. Williams, V. Srikumar, ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars, in ISCA (2016)

206.

J.S. Shapiro, J. Adams, Design evolution of the EROS single-level store, in USENIX ATC (2002)

207.

J.S. Shapiro, J.M. Smith, D.J. Farber, EROS: a fast capability system, in SOSP (1999)

208.

D.E. Shaw, S.J. Stolfo, H. Ibrahim, B. Hillyer, G. Wiederhold, J. Andrews, The NON-VON database machine: a brief overview. IEEE Database Eng. Bull. 4, 41–52 (1981)

209.

J. Shun, G.E. Blelloch, Ligra: a lightweight graph processing framework for shared memory, in PPoPP (2013)

210.

J.E. Smith, Decoupled access/execute computer architectures, in ISCA (1982)

211.

J.E. Smith, Dynamic instruction scheduling and the astronautics ZS-1, in Computer (1986)

212.

J.E. Smith, S. Weiss, N.Y. Pang, A simulation study of decoupled architecture computers, in IEEE TC (1986)

213.

Y. Solihin, J. Torrellas, J. Lee, Using a user-level memory thread for correlation prefetching, in ISCA (2002)

214.

V. Sridharan, N. DeBardeleben, S. Blanchard, K.B. Ferreira, J. Stearley, J. Shalf, S. Gurumurthi, Memory errors in modern systems: the good, the bad, and the ugly, in ASPLOS (2015)

215.

S. Srikantaiah, M. Kandemir, Synergistic TLBs for high performance address translation in chip multiprocessors, in MICRO (2010)

216.

S. Srinath, O. Mutlu, H. Kim, Y.N. Patt, Feedback directed prefetching: improving the performance and bandwidth-efficiency of hardware prefetchers, in HPCA (2007)

217.

Stanford Network Analysis Project, http://snap.stanford.edu/

218.

H.S. Stone, A logic-in-memory computer, in TC (1970)

219.

M. Stonebraker, A. Weisberg, The VoltDB main memory DBMS. IEEE Data Eng. Bull. 36, 21–27 (2013)

220.

D.B. Strukov, G.S. Snider, D.R. Stewart, R.S. Williams, The missing memristor found. Nature 453, 80 (2008)

221.

Z. Sura, A. Jacob, T. Chen, B. Rosenburg, O. Sallenave, C. Bertolli, S. Antao, J. Brunheroto, Y. Park, K. O’Brien, R. Nair, Data access optimization in a processing-in-memory system, in CF (2015)

222.

R.M. Tomasulo, An efficient algorithm for exploiting multiple arithmetic units, in IBM JRD (1967)

223.

Transaction Processing Performance Council, TPC benchmarks. http://www.tpc.org

224.

M. Waldvogel, G. Varghese, J. Turner, B. Plattner, Scalable high speed IP routing lookups, in SIGCOMM (1997)

225.

L. Wang, J. Zhan, C. Luo, Y. Zhu, Q. Yang, Y. He, W. Gao, Z. Jia, Y. Shi, S. Zhang, C. Zheng, G. Lu, K. Zhan, X. Li, B. Qiu, BigDataBench: a big data benchmark suite from internet services, in HPCA (2014)

226.

M.V. Wilkes, The memory gap and the future of high performance memories, in CAN (2001)

227.

P.R. Wilson, Uniprocessor garbage collection techniques, in IWMM (1992)

228.

H.-S.P. Wong, S. Raoux, S. Kim, J. Liang, J.P. Reifenberg, B. Rajendran, M. Asheghi, K.E. Goodson, Phase change memory. Proc. IEEE 98, 2201–2227 (2010)

229.

H.-S.P. Wong, H.-Y. Lee, S. Yu, Y.-S. Chen, Y. Wu, P.-S. Chen, B. Lee, F.T. Chen, M.-J. Tsai, Metal-oxide RRAM. Proc. IEEE 100, 1951–1970 (2012)

230.

L. Wu, R.J. Barker, M.A. Kim, K.A. Ross, Navigating big data with high-throughput, energy-efficient data partitioning, in ISCA (2013)

231.

L. Wu, A. Lottarini, T.K. Paine, M.A. Kim, K.A. Ross, Q100: the architecture and design of a database processing unit, in ASPLOS (2014)

232.

Y. Wu, Efficient discovery of regular stride patterns in irregular programs, in PLDI (2002)

233.

W.A. Wulf, S.A. McKee, Hitting the memory wall: implications of the obvious, CAN (1995)

234.

S.L. Xi, O. Babarinsa, M. Athanassoulis, S. Idreos, Beyond the wall: near-data processing for databases, in DaMoN (2015)

235.

H. Xin, D. Lee, F. Hormozdiari, S. Yedkar, O. Mutlu, C. Alkan, Accelerating read mapping with FastHASH, in BMC Genomics (2013)

236.

H. Xin, J. Greth, J. Emmons, G. Pekhimenko, C. Kingsford, C. Alkan, O. Mutlu, Shifted hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping. Bioinformatics 31, 1553–1560 (2015)

237.

J. Xue, Z. Yang, Z. Qu, S. Hou, Y. Dai, Seraph: an efficient, low-cost system for concurrent graph processing, in HPDC (2014)

238.

C. Yang, A.R. Lebeck, Push vs. pull: data movement for linked data structures, in ICS (2000)

239.

H. Yoon, R.A.J. Meza, R. Harding, O. Mutlu, Row buffer locality aware caching policies for hybrid memories, in ICCD (2012)

240.

H. Yoon, J. Meza, N. Muralimanohar, N.P. Jouppi, O. Mutlu, Efficient data mapping and buffering techniques for multilevel cell phase-change memories, in ACM TACO (2014)

241.

X. Yu, G. Bezerra, A. Pavlo, S. Devadas, M. Stonebraker, Staring into the abyss: an evaluation of concurrency control with one thousand cores, in VLDB (2014)

242.

X. Yu, C.J. Hughes, N. Satish, S. Devadas, IMP: indirect memory prefetcher, in MICRO (2015)

243.

D.P. Zhang, N. Jayasena, A. Lyashevsky, J.L. Greathouse, L. Xu, M. Ignatowski, TOP-PIM: throughput-oriented programmable processing in memory, in HPDC (2014)

244.

J. Zhao, O. Mutlu, Y. Xie, FIRM: fair and high-performance memory control for persistent memory systems, in MICRO (2014)

245.

P. Zhou, B. Zhao, J. Yang, Y. Zhang, A durable and energy efficient main memory using phase change memory technology, in ISCA (2009)

246.

Q. Zhu, T. Graf, H.E. Sumbul, L. Pileggi, F. Franchetti, Accelerating sparse matrix-matrix multiplication with 3D-stacked logic-in-memory hardware, in HPEC (2013)

247.

C.B. Zilles, Benchmark health considered harmful, in CAN (2001)

248.

C.B. Zilles, G.S. Sohi, Execution-based prediction using speculative slices, in ISCA (2001)

249.

W.K. Zuravleff, T. Robinson, Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order. US Patent No. 5,630,096 (1997)

Title: The Processing-in-Memory Paradigm: Mechanisms to Enable Adoption
Authors: Saugata Ghose
Kevin Hsieh
Amirali Boroumand
Rachata Ausavarungnirun
Onur Mutlu
Publisher: Springer International Publishing
Book: Beyond-CMOS Technologies for Next Generation Computer Design
Print ISBN: 978-3-319-90384-2

Electronic ISBN: 978-3-319-90385-9

Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-319-90385-9_5

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"