Published in: International Journal of Machine Learning and Cybernetics 1/2018

23 May 2015 | Original Article

Universal knowledge discovery from big data using combined dual-cycle

Written by: Bin Shen



Abstract

Many people expect big data to yield big insights and to have a big impact in the future. However, how to turn big data into deep insights of tremendous value remains unclear. Here I highlight universal knowledge discovery (UKD) from big data. This new concept focuses on discovering universal knowledge, which emerges from statistical analyses of big data and provides valuable insights into it. Universal knowledge comes in different forms, e.g., universal patterns, rules, correlations, models and mechanisms. To accelerate big-data-assisted scientific discovery, a unified research paradigm should be built on techniques and paradigms from related research domains, especially big data mining and complex systems science. I therefore propose a dual-cycle methodology with three types of cycle-driven UKD process: big-data-cycle-driven, mechanism-cycle-driven and combined-dual-cycle-driven mining. A case study illustrates the effectiveness of the proposed processes. This paper lays a foundation for the future development of universal knowledge discovery and offers a pathway to discovering the “treasure trove” hidden in big data.

Metadata
Title
Universal knowledge discovery from big data using combined dual-cycle
Author
Bin Shen
Publication date
23 May 2015
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 1/2018
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-015-0376-z
