Published in: Cluster Computing 4/2016

01.12.2016

A fast parallel re-computation with redundancy mechanism for parallel digital terrain analysis

Authors: Wanfeng Dou, Shoushuai Miao

Abstract

According to much of the published literature, parallel computing is an efficient solution for digital terrain analysis (DTA) in geographic information systems. Stable and reliable services play an irreplaceable role in high-performance computing, especially when an error occurs during large-scale scientific computing. In this paper, a new fault-tolerant approach for parallel DTA is proposed: fast parallel re-computation (FPR). Compared with other fault-tolerant methods, FPR offers fast self-recovery based on redundancy mechanisms. Once errors are detected at the application layer, the data block containing the computation errors is further partitioned into several sub-blocks, which the surviving processes re-compute concurrently to speed up failure recovery. An overlapping strategy for error detection and re-computation is also presented, in which the data block is decomposed into several logical sub-blocks. As a result, as soon as a comparing thread detects an error in a logical sub-block of the data block, the re-computing process starts immediately to correct it. Compared with the traditional re-computation method, this strategy reduces the combined time of error detection and re-computation by overlapping the two. The experiments show that the proposed FPR method can achieve better performance efficiency with lower overhead.
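The recovery scheme described in the abstract can be sketched in Python as follows. This is a minimal, single-process illustration under stated assumptions, not the authors' implementation. Redundancy is assumed to be dual: each block's result exists as two copies (`primary` and `replica`), and any disagreement between them signals an error. The "surviving processes" are modeled by a `ThreadPoolExecutor` with `survivors` workers, and the DTA operation is a hypothetical 3x3 mean filter (`focal_mean`) standing in for a real terrain kernel such as slope. The names `focal_mean`, `split_rows`, and `fpr_block` are illustrative only. The comparing thread scans the logical sub-blocks and submits re-computation of a faulty sub-block, split across the surviving workers, as soon as it finds the mismatch, so re-computation overlaps with the remainder of the scan.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def focal_mean(dem, r0, r1):
    """Hypothetical DTA kernel: 3x3 mean with clamped edges for rows r0..r1-1.

    Reads the whole (read-only) DEM, so any row range can be computed
    independently without halo exchange.
    """
    rows, cols = len(dem), len(dem[0])
    out = []
    for r in range(r0, r1):
        row = []
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += dem[rr][cc]
            row.append(total / 9.0)
        out.append(row)
    return out


def split_rows(r0, r1, parts):
    """Split the row range [r0, r1) into up to `parts` contiguous ranges."""
    n = r1 - r0
    parts = max(1, min(parts, n))
    base, extra = divmod(n, parts)
    ranges, start = [], r0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def fpr_block(dem, r0, r1, primary, replica, n_sub, survivors):
    """Detect and repair errors in one data block covering rows r0..r1-1.

    primary, replica: redundant results for the block (lists of rows).
    n_sub: number of logical sub-blocks the comparing thread checks.
    survivors: number of surviving workers that share re-computation.
    Returns (corrected rows, list of faulty sub-block row ranges).
    """
    result = [list(row) for row in primary]
    faulty, pending = [], []
    with ThreadPoolExecutor(max_workers=survivors) as pool:
        def compare():
            # Comparing thread: check each logical sub-block in turn; on a
            # mismatch, partition that sub-block among the surviving workers
            # and start re-computing it while the scan continues.
            for s0, s1 in split_rows(r0, r1, n_sub):
                if primary[s0 - r0:s1 - r0] != replica[s0 - r0:s1 - r0]:
                    faulty.append((s0, s1))
                    for p0, p1 in split_rows(s0, s1, survivors):
                        fut = pool.submit(focal_mean, dem, p0, p1)
                        pending.append(((p0, p1), fut))

        checker = threading.Thread(target=compare)
        checker.start()
        checker.join()
        # Overwrite the faulty rows with the re-computed ones.
        for (p0, p1), fut in pending:
            result[p0 - r0:p1 - r0] = fut.result()
    return result, faulty


if __name__ == "__main__":
    dem = [[float(r * 4 + c) for c in range(4)] for r in range(8)]
    good = focal_mean(dem, 0, 8)
    primary = [list(row) for row in good]
    primary[5][2] += 100.0  # simulated silent computation error
    replica = [list(row) for row in good]
    fixed, bad = fpr_block(dem, 0, 8, primary, replica, n_sub=4, survivors=2)
    print(bad)  # [(4, 6)]
    print(fixed == good)  # True
```

Here only the faulty logical sub-block (rows 4 to 5) is re-computed, and it is shared between the two surviving workers instead of re-running the whole block on one. Because of CPython's global interpreter lock, the thread pool shows the structure of the overlap rather than real speedup; in an MPI setting, the workers would correspond to surviving ranks.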

Metadata
Title
A fast parallel re-computation with redundancy mechanism for parallel digital terrain analysis
Authors
Wanfeng Dou
Shoushuai Miao
Publication date
01.12.2016
Publisher
Springer US
Published in
Cluster Computing / Issue 4/2016
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-016-0644-z