Published in: Cluster Computing 2/2020

9 May 2019

A novel packet exchanging strategy for preventing HoL-blocking in fat-trees

Authors: Seyed Mehdi Mohtavipour, Morteza Mollajafari, Ali Naseri

Abstract

The fat-tree is one of the most popular and widely used interconnection networks in massively parallel processing systems. Characteristics such as deterministic routing, in-order delivery, and performance matching that of adaptive routing methods have made it an attractive choice. However, because of its deterministic routing and the simultaneous use of switch links, head-of-line (HoL) blocking may occur in buffers under high traffic load. To mitigate this problem, this paper proposes a novel strategy for switch buffers based on which paths are blocked. It shows that combining packets with different blocked paths and exchanging them can reduce congestion. To exchange packets, we use short- and medium-depth buffers and consider two exchanging states: consecutive and non-consecutive. The strategy provides a trade-off between performance improvement and reduction of buffer depth without changing the delivery order of the packets. Simulation results show that, compared with one buffer in each switch, average network latency improves by 22% and 33% with the consecutive and one-unit non-consecutive exchanging states, respectively, and that the depth of buffers in each switch is reduced by 43.75% and 37.5% compared with multiple buffers.
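
The general idea of HoL blocking and of letting a nearby packet pass a blocked head can be illustrated with a small sketch. This is not the paper's mechanism: the function name `pop_forwardable`, the `(flow, port)` packet layout, and the `lookahead` parameter are hypothetical, and the mapping of `lookahead` values to the "consecutive" and "non-consecutive" states is only an assumption made for illustration. The sketch also assumes that, under deterministic routing, all packets of one flow use the same output port, so a packet that passes a blocked one always belongs to a different flow and per-flow order is kept.

```python
from collections import deque


def pop_forwardable(buffer, blocked_ports, lookahead):
    """Remove and return the first packet, among the head and the next
    `lookahead` packets, whose output port is not blocked.

    Packets are hypothetical (flow, port) tuples. lookahead=0 models a
    plain FIFO, where a blocked head stalls everything behind it.
    Returns None if every inspected packet is blocked.
    """
    limit = min(lookahead + 1, len(buffer))
    for i in range(limit):
        packet = buffer[i]
        if packet[1] not in blocked_ports:
            del buffer[i]  # the packets that stay keep their relative order
            return packet
    return None


# Two packets for the blocked port 0 sit in front of one for the free port 1.
buffer = deque([("a", 0), ("b", 0), ("c", 1)])
blocked = {0}

stalled = pop_forwardable(buffer, blocked, lookahead=0)  # None: HoL blocking
passed = pop_forwardable(buffer, blocked, lookahead=2)   # ("c", 1) is forwarded
```

A larger `lookahead` finds a forwardable packet more often but needs more comparison logic per buffer, which is one simple way to picture the trade-off between performance and buffer cost that the abstract mentions.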

Metadata
Title
A novel packet exchanging strategy for preventing HoL-blocking in fat-trees
Authors
Seyed Mehdi Mohtavipour
Morteza Mollajafari
Ali Naseri
Publication date
9 May 2019
Publisher
Springer US
Published in
Cluster Computing / Issue 2/2020
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-019-02940-2
