Published in: Wireless Personal Communications 4/2018

22.02.2018

Research on Attribute Dimension Partition Based on SVM Classifying and MapReduce

Authors: Wenbin Zhao, Tongrang Fan, Yongchuan Nie, Feng Wu, Hou Wen

Abstract

Data analysis depends closely on the attribute dimensions of the data. Traditional extraction and partitioning of attribute dimensions is largely manual and too inefficient to meet the needs of big data analysis. This paper proposes an attribute dimension partition scheme for big data analysis based on SVM classification and MapReduce. The scheme improves the traditional SVM classification method by combining it with Euclidean distance to overcome its disadvantages, and adopts a penalty coefficient to reduce the effect of imbalanced data distribution. Built on the improved SVM classifier, the implementation uses the Hadoop MapReduce model as its processing engine, stores the extracted attribute dimensions as TF–IDF vectors, and partitions them with the k-means clustering algorithm. The experimental results show that the proposed method improves execution efficiency and that, while the rationality of the partition is preserved, an increase in the number of data attributes does not significantly increase execution time.
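
The abstract does not include the authors' implementation, so the following is only a minimal sketch of the TF–IDF and k-means steps it names, written in plain Python with a map step and a reduce step standing in for Hadoop MapReduce. The SVM classification stage and the penalty coefficient are left out. All function names, the sample attribute descriptions, the natural-log IDF, and the choice to seed the centroids with the first k vectors are assumptions made for illustration, not details from the paper.

```python
import math
from collections import Counter, defaultdict

def map_phase(doc_id, text):
    """Map step: emit ((doc_id, term), 1) for every token of one record."""
    return [((doc_id, term), 1) for term in text.lower().split()]

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each (doc_id, term) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts

def tfidf_vectors(docs):
    """Build one TF-IDF vector per record over a shared, sorted vocabulary."""
    pairs = [p for doc_id, text in enumerate(docs) for p in map_phase(doc_id, text)]
    counts = reduce_phase(pairs)
    vocab = sorted({term for _, term in counts})
    doc_freq = Counter(term for _, term in counts)  # records containing each term
    doc_len = Counter()
    for (doc_id, _), n in counts.items():
        doc_len[doc_id] += n
    vectors = []
    for doc_id in range(len(docs)):
        length = doc_len[doc_id] or 1  # avoid dividing by zero on an empty record
        vectors.append([
            (counts.get((doc_id, term), 0) / length)
            * math.log(len(docs) / doc_freq[term])  # assumed IDF: ln(N / df)
            for term in vocab
        ])
    return vocab, vectors

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical attribute descriptions, one record per line.
docs = [
    "order price amount",
    "user name email",
    "order price discount",
    "user name phone",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

On this toy input the order-related and user-related records fall into separate clusters. In a real Hadoop job the map and reduce functions would run across nodes rather than in one process, and the seeding strategy would matter far more than it does here.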


Metadata
Title
Research on Attribute Dimension Partition Based on SVM Classifying and MapReduce
Authors
Wenbin Zhao
Tongrang Fan
Yongchuan Nie
Feng Wu
Hou Wen
Publication date
22.02.2018
Publisher
Springer US
Published in
Wireless Personal Communications, Issue 4/2018
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI
https://doi.org/10.1007/s11277-018-5301-9
