
2011 | OriginalPaper | Book chapter

3. ECL/HPCC: A Unified Approach to Big Data

Authors: Anthony M. Middleton, David Alan Bayliss, Gavin Halliday

Published in: Handbook of Data Intensive Computing

Publisher: Springer New York


Abstract

As a result of the continuing information explosion, many organizations are experiencing what is now called the “Big Data” problem: they can no longer make effective use of the massive amounts of data they hold, because their datasets have grown too large to process in a timely manner. Data-intensive computing represents a new computing paradigm [26] that can address the Big Data problem. It relies on high-performance architectures that support scalable parallel processing, allowing government and commercial organizations and research environments to process massive amounts of data and to implement new applications previously thought to be impractical or infeasible.

References
1. Abbas, A. (2004). Grid computing: A practical guide to technology and applications. Hingham, MA: Charles River Media, Inc.
2. Agichtein, E. (2004). Scaling information extraction to large document collections: Microsoft Research.
3. Agichtein, E., & Ganti, V. (2004). Mining reference tables for automatic text segmentation. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 20–29.
4. Bayliss, D. A. (2010a). Aggregated data analysis: The paradigm shift (Whitepaper): LexisNexis.
5. Bayliss, D. A. (2010b). Enterprise control language overview (Whitepaper): LexisNexis.
6. Bayliss, D. A. (2010c). Thinking declaratively (Whitepaper).
7. Berman, F. (2008). Got data? A guide to data preservation in the information age. Communications of the ACM, 51(12), 50–56.
9. Buyya, R. (1999). High performance cluster computing. Upper Saddle River, NJ: Prentice Hall.
10. Buyya, R., Yeo, C. S., Venugopal, S., Broberg, J., & Brandic, I. (2009). Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6), 599–616.
11. Cerf, V. G. (2007). An information avalanche. IEEE Computer, 40(1), 104–105.
12. Chaiken, R., Jenkins, B., Larson, P.-A., Ramsey, B., Shakib, D., Weaver, S., et al. (2008). SCOPE: Easy and efficient parallel processing of massive data sets. Proceedings of the VLDB Endowment, 1, 1265–1276.
13. Dean, J., & Ghemawat, S. (2004). MapReduce: Simplified data processing on large clusters. Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI).
14. Dean, J., & Ghemawat, S. (2010). MapReduce: A flexible data processing tool. Communications of the ACM, 53(1), 72–77.
15. Dowd, K., & Severance, C. (1998). High performance computing. Sebastopol, CA: O’Reilly and Associates, Inc.
16. Gantz, J. F., Reinsel, D., Chute, C., Schlichting, W., McArthur, J., Minton, S., et al. (2007). The expanding digital universe (White Paper): IDC.
17. Gates, A. F., Natkovich, O., Chopra, S., Kamath, P., Narayanamurthy, S. M., Olston, C., et al. (2009, Aug 24–28). Building a high-level dataflow system on top of Map-Reduce: The Pig experience. Proceedings of the 35th International Conference on Very Large Databases (VLDB 2009), Lyon, France.
18. Gokhale, M., Cohen, J., Yoo, A., & Miller, W. M. (2008). Hardware technologies for high-performance data-intensive computing. IEEE Computer, 41(4), 60–68.
19. Gorton, I., Greenfield, P., Szalay, A., & Williams, R. (2008). Data-intensive computing in the 21st century. IEEE Computer, 41(4), 30–32.
20. Gray, J. (2008). Distributed computing economics. ACM Queue, 6(3), 63–68.
21. Grossman, R., & Gu, Y. (2008). Data mining using high performance data clouds: Experimental studies using Sector and Sphere. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA.
22. Grossman, R. L., Gu, Y., Sabala, M., & Zhang, W. (2009). Compute and storage clouds using wide area high performance networks. Future Generation Computer Systems, 25(2), 179–183.
23. Gu, Y., & Grossman, R. L. (2009). Lessons learned from a year’s worth of benchmarks of large data clouds. Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, Portland, Oregon.
24. Hellerstein, J. M. (2010). The declarative imperative. SIGMOD Record, 39(1), 5–19.
25. Johnston, W. E. (1998). High-speed, wide area, data intensive computing: A ten year retrospective, Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing: IEEE Computer Society.
26. Kouzes, R. T., Anderson, G. A., Elbert, S. T., Gorton, I., & Gracio, D. K. (2009). The changing paradigm of data-intensive computing. Computer, 42(1), 26–34.
27. Liu, H., & Orban, D. (2008). GridBatch: Cloud computing for large-scale data-intensive batch applications. Proceedings of the Eighth IEEE International Symposium on Cluster Computing and the Grid, 295–305.
28. Llorà, X., Ács, B., Auvil, L. S., Capitanu, B., Welge, M. E., & Goldberg, D. E. (2008). Meandre: Semantic-driven data-intensive flows in the clouds. Proceedings of the Fourth IEEE International Conference on eScience, 238–245.
29. Lyman, P., & Varian, H. R. (2003). How much information? 2003 (Research Report): School of Information Management and Systems, University of California at Berkeley.
30. Middleton, A. M. (2009). Data-intensive computing solutions (Whitepaper): LexisNexis.
32. Nyland, L. S., Prins, J. F., Goldberg, A., & Mills, P. H. (2000). A design methodology for data-parallel applications. IEEE Transactions on Software Engineering, 26(4), 293–314.
34. Olston, C., Reed, B., Srivastava, U., Kumar, R., & Tomkins, A. (2008, June 9–12). Pig Latin: A not-so-foreign language for data processing. Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data/Principles of Database Systems, Vancouver, BC, Canada, 1099–1110.
35. Pavlo, A., Paulson, E., Rasin, A., Abadi, D. J., DeWitt, D. J., Madden, S., et al. (2009, June 29–July 2). A comparison of approaches to large-scale data analysis. Proceedings of the 35th SIGMOD international conference on Management of data, Providence, RI, 165–168.
36. Pike, R., Dorward, S., Griesemer, R., & Quinlan, S. (2004). Interpreting the data: Parallel analysis with Sawzall. Scientific Programming Journal, 13(4), 227–298.
38. Ravichandran, D., Pantel, P., & Hovy, E. (2004). The terascale challenge. Proceedings of the KDD Workshop on Mining for and from the Semantic Web.
39. Rencuzogullari, U., & Dwarkadas, S. (2001). Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations. Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, Snowbird, UT, 72–81.
40. Skillicorn, D. B., & Talia, D. (1998). Models and languages for parallel computation. ACM Computing Surveys, 30(2), 123–169.
41. White, T. (2009). Hadoop: The definitive guide (First ed.). Sebastopol, CA: O’Reilly Media Inc.
42. Yu, Y., Gunda, P. K., & Isard, M. (2009). Distributed aggregation for data-parallel computing: Interfaces and implementations. Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, Big Sky, Montana, USA, 247–260.
Metadata
Title
ECL/HPCC: A Unified Approach to Big Data
Authors
Anthony M. Middleton
David Alan Bayliss
Gavin Halliday
Copyright year
2011
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4614-1415-5_3