Published in: International Journal of Parallel Programming 1/2015

01-02-2015

BADCO: Behavioral Application-Dependent Superscalar Core Models

Authors: Ricardo A. Velásquez, Pierre Michaud, André Seznec

Abstract

Microarchitecture research and development rely heavily on simulators. The ideal simulator would be simple, easy to develop, precise, accurate, and very fast. No such simulator exists, so microarchitects use different kinds of simulators at different stages of processor development, depending on whether accuracy or simulation speed matters most. Approximate microarchitecture models, which trade accuracy for simulation speed, are very useful for research and design space exploration, provided the loss of accuracy remains acceptable. Behavioral superscalar core modeling is one way to make this trade when the focus of the study is not the core itself. In this approach, a superscalar core is viewed as a black box that emits requests to the uncore at certain times. A behavioral core model can be connected to a detailed uncore model. Behavioral core models are built from detailed simulations, and once the time to build the model is amortized, substantial simulation speedups can be obtained. We describe and study a new method for defining behavioral models for modern superscalar cores. The proposed Behavioral Application-Dependent Superscalar Core model, BADCO, predicts the execution time of a thread running on a superscalar core with an error of less than 10% in most cases. We show that BADCO is qualitatively accurate: it predicts how performance changes when the uncore changes. The simulation speedups we obtained are typically between one and two orders of magnitude.
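
The black-box view described in the abstract can be illustrated with a toy sketch. This is not the BADCO algorithm, which is built from detailed simulation traces and is considerably more elaborate; it only shows the general idea of a core model that emits requests at certain times and lets the uncore decide how long each request takes. The `Request` fields, the `replay` function, and the constant-latency uncore are all hypothetical names and assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    """One uncore request, as it might be recorded from a detailed simulation."""
    name: str
    delay: int                 # cycles the core spends before emitting the request
    depends_on: Optional[str]  # earlier request that must complete first, if any


def replay(trace: List[Request], uncore_latency: Callable[[Request], int]) -> int:
    """Replay the trace against an uncore model and return the estimated cycle count."""
    completion: Dict[str, int] = {}
    finish = 0
    for req in trace:
        # A request becomes eligible once the request it depends on has completed.
        ready = completion[req.depends_on] if req.depends_on else 0
        issue = ready + req.delay
        completion[req.name] = issue + uncore_latency(req)
        finish = max(finish, completion[req.name])
    return finish


# Hypothetical three-request trace: B depends on A, C is independent.
trace = [
    Request("A", delay=10, depends_on=None),
    Request("B", delay=5, depends_on="A"),
    Request("C", delay=20, depends_on=None),
]

# The same core-side trace is reused with two different (assumed) uncore latencies.
fast = replay(trace, lambda r: 50)
slow = replay(trace, lambda r: 200)
print(fast, slow)  # 115 415
```

The point of the sketch is that the trace is generated once and then reused while only the uncore model changes, which is where the amortized speedup mentioned in the abstract would come from.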

Footnotes
1
Lee et al. used SimpleScalar sim-outorder [1] for their experiments.
 
2
If an L1 miss Y is data-dependent on a delayed L1 hit which is waiting for a cache line requested by a previous L1 miss X, then Y is considered data-dependent on X [2].
 
3
The only exception is an L1 store miss, because store requests are processed at commit on x86 architectures.
 
4
The fetch stall models the pipeline flush on real architectures.
 
5
Feedback mechanisms in the prefetcher throttle prefetch requests when needed. For example, the prefetch controller can monitor the occupancy of the MSHR and the utilization of the bus to decide whether to issue prefetch requests.
 
6
We attach a DL1 miss request to the first \(\mu\)op (load or store) accessing that cache line. We attach a DL1 prefetch to the \(\mu\)op triggering the prefetch. We attach a write-back request to the same \(\mu\)op to which the request causing the write-back is attached.
 
7
The Zesto model implements next-line prefetching for the instructions, but does not pipeline the instruction misses. Node fetching mimics this behavior.
 
8
For instance, the Zesto simulator allocates an MSHR entry for each delayed hit (i.e., a hit on a pending miss). Our BADCO machine does the same in our experiments. This is why we simulate an unlimited MSHR when generating trace TL: doing so captures all potential delayed hits in the trace.
 
9
Each request to the uncore is attached to a single \(\mu\)op.
 
10
D(X) is null or 0 when there is no request \(\mu\)op whose CT is less than the IT of X. This happens only at the beginning of the trace.
 
11
The uncore simulator was extracted from Zesto.
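
The dependence rule stated in footnote 2 can be sketched as a small lookup. This is only an illustration of that one rule under assumed data structures; the function name, the `kind` and `waits_on` dictionaries, and the access labels are hypothetical and do not come from the paper.

```python
from typing import Dict, Optional


def resolve_miss_dependency(
    producer: Optional[str],
    kind: Dict[str, str],
    waits_on: Dict[str, str],
) -> Optional[str]:
    """Return the L1 miss that a consumer is considered data-dependent on.

    producer: the access the consumer directly depends on (None if there is none)
    kind:     maps each access to "miss", "hit", or "delayed_hit"
    waits_on: maps each delayed hit to the earlier miss whose cache line it awaits
    """
    if producer is None:
        return None
    if kind[producer] == "miss":
        return producer
    if kind[producer] == "delayed_hit":
        # Footnote 2: dependence on a delayed hit counts as dependence on the
        # miss that requested the cache line the delayed hit is waiting for.
        return waits_on[producer]
    return None  # an ordinary L1 hit introduces no dependence on a miss


# Hypothetical accesses: X is a miss, H is a delayed hit on X's line, P is a plain hit.
kind = {"X": "miss", "H": "delayed_hit", "P": "hit"}
waits_on = {"H": "X"}

dep_of_y = resolve_miss_dependency("H", kind, waits_on)
print(dep_of_y)  # X
```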
 
Literature
2.
Chen, X.E., Aamodt, T.M.: Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs. In: Proceedings of the 41st International Symposium on Microarchitecture (2008)
3.
Cho, S., Demetriades, S., Evans, S., Jin, L., Lee, H., Lee, K., Moeng, M.: TPTS: a novel framework for very fast manycore processor architecture simulation. In: Proceedings of the 37th International Conference on Parallel Processing (2008)
4.
Durbhakula, M., Pai, V.S., Adve, S.: Improving the accuracy vs. speed tradeoff for simulating shared-memory multiprocessors with ILP processors. In: Proceedings of the 5th International Symposium on High-Performance Computer Architecture (1999)
5.
Eyerman, S., Eeckhout, L., Karkhanis, T., Smith, J.E.: A mechanistic performance model for superscalar out-of-order processors. ACM Trans. Comput. Syst. 27(2) (2009)
6.
Eyerman, S., Smith, J.E., Eeckhout, L.: Characterizing the branch misprediction penalty. In: Proceedings of the International Symposium on Performance Analysis of Systems and Software (2011)
7.
Fields, B.A., Bodik, R., Hill, M.D., Newburn, C.J.: Using interaction costs for microarchitectural bottleneck analysis. In: Proceedings of the 36th International Symposium on Microarchitecture (2003)
8.
Fields, B., Rubin, S., Bodik, R.: Focusing processor policies via critical-path prediction. In: Proceedings of the 28th International Symposium on Computer Architecture (2001)
9.
Genbrugge, D., Eyerman, S., Eeckhout, L.: Interval simulation: raising the level of abstraction in architectural simulation. In: Proceedings of the 16th International Symposium on High-Performance Computer Architecture (2010)
10.
Goldschmidt, S.R., Hennessy, J.L.: The accuracy of trace-driven simulations of multiprocessors. In: Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (1993)
11.
İpek, E., McKee, S., Caruana, R., de Supinski, B., Schulz, M.: Efficiently Exploring Architectural Design Spaces Via Predictive Modeling, vol. 40. ACM (2006)
12.
Joseph, P., Vaswani, K., Thazhuthaveetil, M.: Construction and use of linear regression models for processor performance analysis. In: High-Performance Computer Architecture, 2006. The Twelfth International Symposium on, pp. 99–108. IEEE (2006)
13.
Kanaujia, S., Papazian, I.E., Chamberlain, J., Baxter, J.: FastMP: a multi-core simulation methodology. In: Workshop on Modeling, Benchmarking and Simulation (2006)
14.
Karkhanis, T.S., Smith, J.E.: A first-order superscalar processor model. In: Proceedings of the 31st International Symposium on Computer Architecture (2004)
15.
Lee, K., Cho, S.: In-N-Out: reproducing out-of-order superscalar processor behavior from reduced in-order traces. In: Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) (2011)
16.
Lee, K., Evans, S., Cho, S.: Accurately approximating superscalar processor performance from traces. In: Proceedings of the International Symposium on Performance Analysis of Systems and Software (2009)
17.
Li, Y., Lee, B., Brooks, D., Hu, Z., Skadron, K.: CMP design space exploration subject to physical constraints. In: Proceedings of the 12th International Symposium on High Performance Computer Architecture (2006)
18.
Loh, G., Subramaniam, S., Xie, Y.: Zesto: a cycle-level simulator for highly detailed microarchitecture exploration. In: Proceedings of the International Symposium on Performance Analysis of Systems and Software (2009)
19.
Loh, G.: A time-stamping algorithm for efficient performance estimation of superscalar processors. In: Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (2001)
20.
Moses, J., Illikkal, R., Iyer, R., Huggahalli, R., Newell, D.: ASPEN: towards effective simulation of threads & engines in evolving platforms. In: Proceedings of the 12th IEEE/ACM International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (2004)
21.
Mutlu, O., Kim, H., Armstrong, D., Patt, Y.: Understanding the effects of wrong-path memory references on processor performance. In: Proceedings of the 3rd Workshop on Memory Performance Issues: In Conjunction with the 31st International Symposium on Computer Architecture, pp. 56–64. ACM (2004)
22.
Noonburg, D.B., Shen, J.P.: Theoretical modeling of superscalar processor performance. In: Proceedings of the 27th International Symposium on Microarchitecture (1994)
23.
Rico, A., Duran, A., Cabarcas, F., Etsion, Y., Ramirez, A., Valero, M.: Trace-driven simulation of multithreaded applications. In: Proceedings of the International Symposium on Performance Analysis of Systems and Software (2011)
24.
Ryckbosch, F., Polfliet, S., Eeckhout, L.: Fast, accurate, and validated full-system software simulation on x86 hardware. IEEE Micro 30(6), 46–56 (2010)
25.
Sendag, R., Yilmazer, A., Yi, J., Uht, A.: Quantifying and reducing the effects of wrong-path memory references in cache-coherent multiprocessor systems. In: Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International, pp. 10-pp. IEEE (2006)
26.
Sorin, D.J., Pai, V.S., Adve, S.V., Vernon, M.K., Wood, D.A.: Analytic evaluation of shared-memory systems with ILP processors. In: Proceedings of the 25th International Symposium on Computer Architecture (1998)
27.
Zhao, L., Iyer, R., Moses, J., Illikkal, R., Makineni, S., Newell, D.: Exploring large-scale CMP architectures using ManySim. IEEE Micro 27(4), 21–33 (2007)
Metadata
Title
BADCO: Behavioral Application-Dependent Superscalar Core Models
Authors
Ricardo A. Velásquez
Pierre Michaud
André Seznec
Publication date
01-02-2015
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 1/2015
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-013-0278-1
