Skip to main content
Top
Published in: The Journal of Supercomputing 1/2021

30-03-2020

DiffPageRank: an efficient differential PageRank approach in MapReduce

Authors: Maryam Nooraei Abadeh, Mansooreh Mirzaie

Published in: The Journal of Supercomputing | Issue 1/2021

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Unstructured big data processing requires efficient computational styles to rapidly analyze continuously changing data. Incremental processing is a promising technique to update search results without re-computing the whole process. One of the most important mining algorithms is PageRank which necessitates incremental processing and iterative computations to keep the mining results up-to-date. In this paper, a light and efficient PageRank algorithm is proposed using differential dataflow in the MapReduce programming style named DiffPageRank. The innovation of the proposed approach is investigated from two points of view: (1) the updating state is dependent on a partial order of changes in DiffPageRank computing, while the changes are applied based on complete order sequences in the standard incremental computing; and (2) in DiffPageRank, a set of updates rebuilds each version, while in the incremental systems, every single update continually bring up-to-date the current version state. DiffPageRank is compared with two other implementations of PageRank including standard and incremental PageRank in MapReduce in terms of processing time, CPU utilization and update speed of data mining results. The results show that if the changes on the input data are large, the efficiency of the differential method will considerably increase in terms of time than the standard and incremental PageRank approaches up to 61% and 86%, respectively. DiffPageRank is also compared with two state-of-the-art incremental graph processing benchmarks on Orkut and Twitter datasets in terms of execution time and the number of updates. The experimental comparisons on benchmark datasets show that our method reduces the number of updates and total execution time by using the differential MapReduce approach.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1–7):107–117CrossRef Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1–7):107–117CrossRef
2.
go back to reference White T (2012) Hadoop: the definitive guide. O’Reilly Media, Inc., Sebastopol White T (2012) Hadoop: the definitive guide. O’Reilly Media, Inc., Sebastopol
3.
go back to reference Shvachko K, Kuang H, Radia S, Chansler R (2010) The hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). IEEE, pp 1–10 Shvachko K, Kuang H, Radia S, Chansler R (2010) The hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). IEEE, pp 1–10
4.
go back to reference Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef
5.
go back to reference Maleki N, Rahmani AM, Conti M (2019) MapReduce: an infrastructure review and research insights. J Supercomput 75(10):6934–7002CrossRef Maleki N, Rahmani AM, Conti M (2019) MapReduce: an infrastructure review and research insights. J Supercomput 75(10):6934–7002CrossRef
6.
go back to reference Gupta D, Rani R (2018) A study of big data evolution and research challenges. J Inf Sci 45:322–340CrossRef Gupta D, Rani R (2018) A study of big data evolution and research challenges. J Inf Sci 45:322–340CrossRef
7.
go back to reference Talan PP, Sharma KU, Nawade PP, Talan KP (2019) An overview of Hadoop MapReduce, spark, and scalable graph processing architecture. In: Kalita J, Balas VE, Borah S, Pradhan R (eds) Recent developments in machine learning and data analytics. Springer, Berlin, pp 35–42CrossRef Talan PP, Sharma KU, Nawade PP, Talan KP (2019) An overview of Hadoop MapReduce, spark, and scalable graph processing architecture. In: Kalita J, Balas VE, Borah S, Pradhan R (eds) Recent developments in machine learning and data analytics. Springer, Berlin, pp 35–42CrossRef
8.
go back to reference Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web. ACM, pp 591–600 Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web. ACM, pp 591–600
9.
go back to reference Zhang Y, Chen S, Wang Q, Yu G (2016) i2MapReduce: incremental MapReduce for mining evolving big data. In: 2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, pp 1482–1483 Zhang Y, Chen S, Wang Q, Yu G (2016) i2MapReduce: incremental MapReduce for mining evolving big data. In: 2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, pp 1482–1483
10.
go back to reference McSherry F, Murray DG, Isaacs R, Isard M (2013) Differential dataflow. In: CIDR McSherry F, Murray DG, Isaacs R, Isard M (2013) Differential dataflow. In: CIDR
11.
go back to reference Bhawiyuga A, Kirana AP (2016) Implementation of page rank algorithm in Hadoop MapReduce framework. In: 2016 International Seminar on Intelligent Technology and Its Applications (ISITIA). IEEE, pp 231–236 Bhawiyuga A, Kirana AP (2016) Implementation of page rank algorithm in Hadoop MapReduce framework. In: 2016 International Seminar on Intelligent Technology and Its Applications (ISITIA). IEEE, pp 231–236
12.
go back to reference Murray DG, McSherry F, Isaacs R, Isard M, Barham P, Abadi M (2013) Naiad: a timely dataflow system. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, pp 439–455 Murray DG, McSherry F, Isaacs R, Isard M, Barham P, Abadi M (2013) Naiad: a timely dataflow system. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, pp 439–455
13.
go back to reference Murray DG, McSherry F, Isard M, Isaacs R, Barham P, Abadi M (2016) Incremental, iterative data processing with timely dataflow. Commun ACM 59(10):75–83CrossRef Murray DG, McSherry F, Isard M, Isaacs R, Barham P, Abadi M (2016) Incremental, iterative data processing with timely dataflow. Commun ACM 59(10):75–83CrossRef
14.
go back to reference Pasquinelli M (2009) Google’s PageRank algorithm: a diagram of cognitive capitalism and the rentier of the common intellect. In: Becker K, Stalder F (eds) Deep search: the politics of search beyond Google. Studien Verlag, Innsbruck, pp 152–163 Pasquinelli M (2009) Google’s PageRank algorithm: a diagram of cognitive capitalism and the rentier of the common intellect. In: Becker K, Stalder F (eds) Deep search: the politics of search beyond Google. Studien Verlag, Innsbruck, pp 152–163
15.
go back to reference Cauwenberghs G, Poggio T (2001) Incremental and decremental support vector machine learning. In: Advances in Neural Information Processing Systems, pp 409–415 Cauwenberghs G, Poggio T (2001) Incremental and decremental support vector machine learning. In: Advances in Neural Information Processing Systems, pp 409–415
16.
go back to reference Peng D, Dabek F (2010) Large-scale incremental processing using distributed transactions and notifications. In: OSDI, vol 10, pp 1–15 Peng D, Dabek F (2010) Large-scale incremental processing using distributed transactions and notifications. In: OSDI, vol 10, pp 1–15
17.
go back to reference Popa L, Budiu M, Yu Y, Isard M (2009) DryadInc: reusing work in large-scale computations. HotCloud 9:2–6 Popa L, Budiu M, Yu Y, Isard M (2009) DryadInc: reusing work in large-scale computations. HotCloud 9:2–6
18.
go back to reference Logothetis D, Olston C, Reed B, Webb KC, Yocum K (2010) Stateful bulk processing for incremental analytics. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp 51–62 Logothetis D, Olston C, Reed B, Webb KC, Yocum K (2010) Stateful bulk processing for incremental analytics. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp 51–62
19.
go back to reference Lee D, Kim J-S, Maeng S (2014) Large-scale incremental processing with MapReduce. Future Gener Comput Syst 36:66–79CrossRef Lee D, Kim J-S, Maeng S (2014) Large-scale incremental processing with MapReduce. Future Gener Comput Syst 36:66–79CrossRef
20.
go back to reference Zhang Y, Chen S (2013) i2MapReduce: incremental iterative MapReduce. In: Proceedings of the 2nd International Workshop on Cloud Intelligence, pp 1–4 Zhang Y, Chen S (2013) i2MapReduce: incremental iterative MapReduce. In: Proceedings of the 2nd International Workshop on Cloud Intelligence, pp 1–4
21.
go back to reference Jörg T, Parvizi R, Yong H, Dessloch S (2011) Incremental recomputations in mapreduce. In: Proceedings of the Third International Workshop on Cloud Data Management, pp 7–14 Jörg T, Parvizi R, Yong H, Dessloch S (2011) Incremental recomputations in mapreduce. In: Proceedings of the Third International Workshop on Cloud Data Management, pp 7–14
22.
go back to reference Saadon AGB, Mokhtar HM (2019) Survey on iterative and incremental approaches in distributed computing environment. Int J Data Sci 4(1):18–30CrossRef Saadon AGB, Mokhtar HM (2019) Survey on iterative and incremental approaches in distributed computing environment. Int J Data Sci 4(1):18–30CrossRef
23.
go back to reference Bhatotia P, Wieder A, Rodrigues R, Acar UA, Pasquin, R (2011) Incoop: MapReduce for incremental computations. In: Proceedings of the 2nd ACM Symposium on Cloud Computing. ACM, p 7 Bhatotia P, Wieder A, Rodrigues R, Acar UA, Pasquin, R (2011) Incoop: MapReduce for incremental computations. In: Proceedings of the 2nd ACM Symposium on Cloud Computing. ACM, p 7
25.
go back to reference McSherry FD, Isaacs R, Isard MA, Murray DG (2015) Differential dataflow, ed: Google Patents McSherry FD, Isaacs R, Isard MA, Murray DG (2015) Differential dataflow, ed: Google Patents
26.
go back to reference Cheng R et al (2012) Kineograph: taking the pulse of a fast-changing and connected world. In: Proceedings of the 7th ACM European Conference on Computer Systems, pp 85–98 Cheng R et al (2012) Kineograph: taking the pulse of a fast-changing and connected world. In: Proceedings of the 7th ACM European Conference on Computer Systems, pp 85–98
27.
go back to reference Yin J, Gao L (2016) Asynchronous distributed incremental computation on evolving graphs. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, pp 722–738 Yin J, Gao L (2016) Asynchronous distributed incremental computation on evolving graphs. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, pp 722–738
28.
go back to reference Lv X, Xiao W, Zhang Y, Liao X, Jin H, Hua Q (2019) An effective framework for asynchronous incremental graph processing. Front Comput Sci 13(3):539–551CrossRef Lv X, Xiao W, Zhang Y, Liao X, Jin H, Hua Q (2019) An effective framework for asynchronous incremental graph processing. Front Comput Sci 13(3):539–551CrossRef
29.
go back to reference Park S, Lee W, Choe B, Lee S-G (2019) A survey on personalized PageRank computation algorithms. IEEE Access 7:163049–163062CrossRef Park S, Lee W, Choe B, Lee S-G (2019) A survey on personalized PageRank computation algorithms. IEEE Access 7:163049–163062CrossRef
30.
go back to reference Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized pagerank. Proc VLDB Endow 4(3):173–184CrossRef Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized pagerank. Proc VLDB Endow 4(3):173–184CrossRef
31.
go back to reference Abdullah IB (2010) Incremental pagerank for twitter data using hadoop. Technical paper Abdullah IB (2010) Incremental pagerank for twitter data using hadoop. Technical paper
32.
go back to reference Desikan P, Pathak N, Srivastava J, Kumar V (2005) Incremental page rank computation on evolving graphs. In: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web. ACM, pp 1094–1095 Desikan P, Pathak N, Srivastava J, Kumar V (2005) Incremental page rank computation on evolving graphs. In: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web. ACM, pp 1094–1095
33.
go back to reference Kim KS, Choi YS (2015) Incremental iteration method for fast pagerank computation. In: Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication. ACM, p 80 Kim KS, Choi YS (2015) Incremental iteration method for fast pagerank computation. In: Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication. ACM, p 80
34.
go back to reference Lin W (2019) Distributed algorithms for fully personalized pagerank on large graphs. In: The World Wide Web Conference, pp 1084–1094 Lin W (2019) Distributed algorithms for fully personalized pagerank on large graphs. In: The World Wide Web Conference, pp 1084–1094
35.
go back to reference Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B (1998) Support vector machines. IEEE Intell Syst Appl 13(4):18–28CrossRef Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B (1998) Support vector machines. IEEE Intell Syst Appl 13(4):18–28CrossRef
Metadata
Title
DiffPageRank: an efficient differential PageRank approach in MapReduce
Authors
Maryam Nooraei Abadeh
Mansooreh Mirzaie
Publication date
30-03-2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 1/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03265-3

Other articles of this Issue 1/2021

The Journal of Supercomputing 1/2021 Go to the issue

Premium Partner