2019 | OriginalPaper | Book chapter

Streaming Massive Electric Power Data Analysis Based on Spark Streaming

Written by: Xudong Zhang, Zhongwen Qian, Siqi Shen, Jia Shi, Shujun Wang

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing

Abstract

Electric power user classification is one of the most important methods for achieving optimal allocation of power resources. By analyzing users' needs, behavior, and habits, countries and enterprises can offer different incentives to different users, making people more willing to use green, clean electric power resources. User clustering analysis requires real-time processing of massive, high-speed data. In this paper we propose “DStreamEPK”, a novel distributed clustering method for user data streams based on Spark Streaming, an improved CluStream algorithm, and an improved K-means algorithm. In the experimental evaluation, we first tested the clustering effectiveness of DStreamEPK on UCI datasets; the results show that DStreamEPK outperforms the traditional K-means clustering algorithm. Tests on real user datasets further show that DStreamEPK can cluster users' electricity data quickly and efficiently.
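The abstract does not describe how DStreamEPK works internally, so the following is only a minimal sketch of the general two-phase idea behind CluStream-style stream clustering, not the authors' method. It assumes a simplified online phase that keeps additive micro-cluster summaries (count, linear sum, squared sum) and absorbs a point into the nearest summary within a fixed radius, followed by an offline phase that runs K-means over the summaries weighted by their counts. The names `MicroCluster`, `update`, `weighted_kmeans`, and the `radius` parameter are hypothetical, and the sketch runs on a single machine rather than on Spark Streaming.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive summary of a group of points (hypothetical helper class)."""
    n: int              # number of points absorbed
    linear_sum: list    # per-dimension sum of coordinates
    squared_sum: list   # per-dimension sum of squared coordinates

    @classmethod
    def from_point(cls, point):
        return cls(1, list(point), [x * x for x in point])

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        # Summaries are additive, so absorbing a point is a constant-time update.
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.squared_sum[i] += x * x


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update(micro_clusters, point, radius):
    """Online phase (simplified): absorb into the nearest summary or open a new one."""
    if micro_clusters:
        nearest = min(micro_clusters, key=lambda mc: distance(mc.centroid(), point))
        if distance(nearest.centroid(), point) <= radius:
            nearest.absorb(point)
            return
    micro_clusters.append(MicroCluster.from_point(point))


def weighted_kmeans(micro_clusters, k, iterations=10):
    """Offline phase: K-means over micro-cluster centroids, weighted by point count."""
    # Assumption: the first k summaries serve as initial centers (naive seeding).
    centers = [mc.centroid() for mc in micro_clusters[:k]]
    for _ in range(iterations):
        sums = [[0.0] * len(centers[0]) for _ in centers]
        weights = [0] * len(centers)
        for mc in micro_clusters:
            j = min(range(len(centers)),
                    key=lambda c: distance(centers[c], mc.centroid()))
            weights[j] += mc.n
            for i, s in enumerate(mc.linear_sum):
                sums[j][i] += s
        # Keep the old center if no summary was assigned to it.
        centers = [
            [s / w for s in sums[j]] if w else centers[j]
            for j, w in enumerate(weights)
        ]
    return centers


if __name__ == "__main__":
    stream = [(0.0, 0.0), (0.2, 0.1), (10.0, 10.0),
              (10.1, 9.9), (0.1, 0.0), (9.9, 10.2)]
    summaries = []
    for reading in stream:
        update(summaries, reading, radius=1.0)
    print(len(summaries), weighted_kmeans(summaries, k=2))
```

Because the summaries are additive, partial summaries computed on separate partitions can be merged by summing their fields, which is what makes this family of algorithms a natural fit for a distributed micro-batch engine. The full CluStream algorithm additionally tracks timestamps and bounds the number of micro-clusters by merging or discarding old ones, which this sketch omits.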

Metadata
Title
Streaming Massive Electric Power Data Analysis Based on Spark Streaming
Written by
Xudong Zhang
Zhongwen Qian
Siqi Shen
Jia Shi
Shujun Wang
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-18590-9_14