
Cross-Modal Communication Technology

  • 2026
  • Book

About this Book

This book offers a systematic and in-depth investigation of cross-modal communication technology. In Chapter 1, the authors present the background and motivation of this field. In Chapter 2, the authors outline multi-modal services and a general cross-modal communication architecture, including typical service categories, communication requirements, architectural details, challenges, and design principles. In Chapters 3, 4, and 5, the authors propose three key techniques within the cross-modal communication architecture: cross-modal coding, cross-modal transmission, and cross-modal signal recovery, respectively. For each technique, the authors present representative algorithms and the associated experimental results in depth. Chapter 6 describes an established database and three developed cross-modal communication prototypes. Chapter 7 introduces extended techniques, such as cross-modal semantic communication and cross-modal communication in extremely low-resource scenarios. Finally, in Chapter 8 the authors summarize the entire book and discuss future research directions. Readers will gain a deep understanding of the development and technical details of this new communication paradigm for the emerging multi-modal services.

Currently, haptic information is gradually being integrated into traditional audio-visual-dominant multimedia services, forming multi-modal services. Multi-modal services have been regarded as killer applications in the B5G and 6G era, including remote industrial robotic grasping, tele-surgery and tele-diagnosis, online shopping and gaming, etc. To support these flourishing multi-modal services, the cross-modal communication paradigm is proposed.

Compared with conventional multimedia communication, the cross-modal communication paradigm is characterized by the collaborative transmission and processing of audio, visual, and haptic signals and streams. By fully exploring the potential correlations between audio-visual and haptic modalities, it enhances natural human-machine interaction and immersive experiences in multi-modal services while meeting the requirements of low latency, high reliability, and high throughput. This book is intended for students of wireless communications, computer science, and electrical engineering, as well as researchers working in this field. Professionals seeking multimedia communication solutions for the emerging multi-modal services will also find this book valuable.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
As the demand for comprehensive digitization and intelligence continues to grow, multi-modal services that integrate audio, visual, and haptic elements have emerged [1, 2]. The International Telecommunication Union (ITU) Network 2030 Focus Group has identified these services as killer applications for the B5G and 6G era due to their ability to enhance immersive experiences in scenarios such as industrial manufacturing, rehabilitation medicine, and intelligent education. Cisco’s white paper on future network trends projects a 100-fold increase in global mobile data traffic by 2030, with 81% attributed to multi-modal services [3]. The rapid growth of multi-modal services has led to a massive increase in multi-modal data, including audio, visual, and haptic signals, posing significant challenges to current multimedia communication systems. The IMT-2030 (6G) white paper published by the China Academy of Information and Communications Technology highlights that fully immersive real-time interaction requires a transmission delay under 1 ms, a data rate above 10 Gbps, and a reliability of 99.99999% [4]. However, existing 5G-based systems exhibit an end-to-end delay of 20 ms and a packet loss rate of around \(10^{-5}\), with high deployment costs. Therefore, communication technology should be redesigned to accommodate the growth in multi-modal data volume and meet the requirements of humans’ immersive experiences.
Xin Wei, Dan Wu, Liang Zhou
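The gap the abstract describes can be made concrete with a minimal sketch that checks the quoted 5G figures against the IMT-2030 targets for fully immersive real-time interaction. All numeric values below come from the abstract itself; the dictionary and function names are illustrative.

```python
# Minimal sketch: compare the quoted 5G figures against the IMT-2030 targets
# for fully immersive real-time interaction. Lower is better for both metrics.
TARGETS = {"delay_ms": 1.0, "loss_rate": 1e-7}   # 1 ms delay, 99.99999% reliability
FIVE_G = {"delay_ms": 20.0, "loss_rate": 1e-5}   # quoted 5G end-to-end figures

def meets_targets(system, targets):
    """Per-metric check: the system passes iff it is at or below the bound."""
    return {metric: system[metric] <= bound for metric, bound in targets.items()}

result = meets_targets(FIVE_G, TARGETS)
print(result)  # {'delay_ms': False, 'loss_rate': False}
```

Both metrics miss their targets by orders of magnitude, which is the motivation for redesigning the communication stack rather than incrementally tuning it.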
Chapter 2. Cross-Modal Communication Architecture
Abstract
In this chapter, the typical multi-modal services that have emerged in recent years are firstly described. These services can be categorized into three main types: haptic-dominant, audio-visual-dominant, and equally important multi-modal services. Following this, an analysis of the characteristics of different modalities is conducted. On the one hand, the heterogeneous communication requirements of audio, visual, and haptic signals are examined. On the other hand, the diverse communication requirements of the different categories of multi-modal services are also presented. Subsequently, to meet these requirements, a cross-modal communication architecture is constructed and the details of its components are described. Finally, the key techniques within this architecture are introduced. Additionally, the issues that need to be addressed as well as the core ideas behind the design of these key techniques are also discussed.
Xin Wei, Dan Wu, Liang Zhou
Chapter 3. Cross-Modal Coding
Abstract
In this chapter, a visual-haptic mutual redundancy elimination scheme is firstly provided by considering mutual assistance between these two modalities. Then, an audio-haptic redundancy elimination mechanism is explored from the perspective of time-frequency masking. Subsequently, by exploring the benefit of potential correlation, a visual-aided haptic coding scheme is proposed to support cross-modal communications. From a theoretical perspective, the minimum number of bits required to compress haptic signals under the rate conditions of visual streams is explored. From a technical perspective, AI-empowered cross-modal prediction and channel coding is designed to support this scheme. Finally, to further achieve the optimum rate of multi-modal streams, a general cross-modal coding strategy is presented, which concerns the mutual assistance between visual and haptic modalities.
Xin Wei, Dan Wu, Liang Zhou
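The core idea behind visual-aided haptic coding in the chapter above can be illustrated with a toy sketch. This is not the book's algorithm: the signals, the one-tap least-squares predictor, and the quantizer step are all invented for illustration. It only shows why exploiting cross-modal correlation shrinks the residual that must be encoded.

```python
import numpy as np

rng = np.random.default_rng(0)
visual = rng.normal(size=1000)                        # stand-in visual feature track
haptic = 0.8 * visual + 0.1 * rng.normal(size=1000)   # correlated haptic signal

# Encoder: fit a one-tap linear cross-modal predictor by least squares,
# then quantize and transmit only the prediction residual.
w = visual @ haptic / (visual @ visual)
residual = haptic - w * visual
step = 0.05                                           # uniform quantizer step (assumed)
q_residual = np.round(residual / step)

# Decoder: rebuild the haptic signal from the shared visual stream + residual.
haptic_hat = w * visual + q_residual * step

print(f"residual/signal variance ratio: {residual.var() / haptic.var():.3f}")
```

The stronger the inter-modal correlation, the smaller the residual variance relative to the raw haptic signal, and hence the fewer bits a real entropy coder would spend on it.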
Chapter 4. Cross-Modal Transmission
Abstract
In this chapter, the cross-modal transmission techniques are described in detail, which can be divided into two main categories. Schemes in the first category belong to the preemptive-resume scheduling pattern, where haptic streams have higher priority than audio-visual streams during transmission. On one hand, low-latency and high-reliability demands for haptic streaming should be guaranteed. Considering this, the intra-modal correlation-based haptic streaming scheme is proposed, which explores correlation within the haptic modality to employ efficient transmission. On the other hand, due to the occasional arrival of haptic streams, the throughput of audio-visual streams may well be reduced or even interrupted. The inter-modal correlation-based audio-visual streaming and content-aware stream scheduling schemes are respectively designed, handling the fluctuant audio-visual transmission quality issues with the aid of the haptic modality. Schemes in the second category belong to the non-preemptive-resume scheduling pattern. In other words, the transmission resources allocated to haptic and audio-visual modalities are under comprehensive and coordinated consideration. Among them, the modal-aware stream scheduling scheme flexibly settles the transmission priority of the modal stream instead of the data flow, achieving a tradeoff between various requirements. The adaptive prediction-aware scheme predicts and transmits the multi-modal signals in advance to reduce the whole latency during stream delivery. The dynamic transmission mode selection scheme tries to improve resource utilization efficacy by optimizing both the chosen transmission strategy and the allocated resource. Finally, the edge intelligence-empowered transmission decision scheme carefully considers factors such as communication, caching, computation, and control capacities, and realizes autonomous transmission decisions through AI technology.
Xin Wei, Dan Wu, Liang Zhou
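The preemptive-resume pattern described in the chapter above can be sketched as a strict-priority queue. This simplified model works at whole-packet granularity (so "preemption" happens at packet boundaries), with unit service time and invented packet names; it is an illustration of haptic-first scheduling, not the book's scheduler.

```python
import heapq

HAPTIC, AUDIO_VISUAL = 0, 1     # lower value = higher scheduling priority

def schedule(packets):
    """packets: list of (arrival_time, modality, name) tuples.
    Returns the service order under strict haptic-first priority,
    assuming one time unit of service per packet."""
    events = sorted(packets)
    order, queue, t, i = [], [], 0, 0
    while i < len(events) or queue:
        if not queue:                       # link idle: jump to the next arrival
            t = max(t, events[i][0])
        while i < len(events) and events[i][0] <= t:
            arrival, modality, name = events[i]
            heapq.heappush(queue, (modality, arrival, name))
            i += 1
        modality, arrival, name = heapq.heappop(queue)
        order.append(name)
        t += 1
    return order

pkts = [(0, AUDIO_VISUAL, "video0"), (0, AUDIO_VISUAL, "video1"),
        (1, HAPTIC, "haptic0")]
print(schedule(pkts))  # ['video0', 'haptic0', 'video1']
```

The haptic packet arriving at t=1 jumps ahead of the queued audio-visual packet, which is exactly the throughput-interruption effect on audio-visual streams that the non-preemptive schemes in the second category are designed to mitigate.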
Chapter 5. Cross-Modal Signal Recovery
Abstract
In order to handle the loss, damage, or delayed arrival of different modality signals after transmission, corresponding signal recovery schemes should be designed. Firstly, for haptic-dominant multi-modal services, an audio-visual-aided haptic signal recovery scheme is proposed, while for audio-visual-dominant multi-modal services, a haptic-aided visual signal recovery scheme is presented. The core idea is to utilize one modality’s signal (audio-visual or haptic) to recover the other desired modality’s signal (haptic or visual) by leveraging their potential correlations. Then, for equally important multi-modal services, a visual-haptic mutual signal recovery scheme is provided, which can be seen as a generalization of the two schemes mentioned above. Subsequently, besides these three representative cross-modal signal recovery solutions, information retrieval is another method with fast-response and high-reliability characteristics. For this purpose, an information retrieval-based signal recovery scheme is designed. Finally, in order to enhance the display quality at the receiver and guarantee the human experience, a cross-modal super-resolution reconstruction strategy is proposed, which reconstructs high-resolution visual signals from the received low-resolution visual signals and the related haptic signals.
Xin Wei, Dan Wu, Liang Zhou
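The cross-modal recovery idea from the chapter above can be sketched in a few lines. This is an assumed toy model, not the book's scheme: synthetic correlated signals, a random 20% loss pattern, and a linear predictor fitted only on the haptic samples that did arrive, which is then used to fill the gaps from the co-delivered visual modality.

```python
import numpy as np

rng = np.random.default_rng(1)
visual = rng.normal(size=500)                         # received visual feature
haptic = 1.5 * visual + 0.05 * rng.normal(size=500)   # correlated haptic signal

lost = rng.random(500) < 0.2      # pretend 20% of haptic samples were dropped
ok = ~lost

# Fit a linear cross-modal predictor on the haptic samples that did arrive,
# then re-estimate the missing ones from the visual modality.
w = visual[ok] @ haptic[ok] / (visual[ok] @ visual[ok])
recovered = np.where(lost, w * visual, haptic)

err = np.abs(recovered[lost] - haptic[lost]).mean()
print(f"mean absolute recovery error: {err:.4f}")
```

When the inter-modal correlation is strong, the recovery error stays close to the noise floor; retransmission is avoided entirely, which is what gives cross-modal recovery its low-latency appeal.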
Chapter 6. Cross-Modal Communication Prototype System
Abstract
Based on theoretical and technical investigations, this chapter describes several developed cross-modal communication prototype systems in the areas of industrial manipulation, smart education, and intelligent healthcare. Firstly, a large-scale multi-modal database named VisTouch has been established, which supports research on cross-modal communications. Secondly, the details of the design and implementation of a visual-haptic human-machine interactive system are provided, which effectively meets the demands of remote industrial robotic grasping applications. Thirdly, a virtual acupuncture skill training application is constructed to support the emerging demand for remote practical teaching. Finally, a remote throat swab sampling platform is developed, which is of great significance in protecting the lives of medical staff and breaking the chains of virus transmission during the COVID-19 pandemic.
Xin Wei, Dan Wu, Liang Zhou
Chapter 7. Extended Techniques
Abstract
In this chapter, two extended cross-modal communication techniques are presented. On the one hand, cross-modal communication technology has been deeply integrated with semantic communications, which has attracted widespread attention in recent years, forming a new communication paradigm called cross-modal semantic communications. This new communication paradigm fully explores the semantics both within each modality (intra-modal semantics) and among modalities (inter-modal semantics), and combines them organically. It can effectively solve the polysemy and ambiguity issues existing in current semantic communications and ensure the stability of semantic transmissions. On the other hand, in extremely low communication and computing resource scenarios such as underwater equipment troubleshooting, agricultural operations in remote mountainous areas, and emergency rescue missions responding to natural disasters, existing regular cross-modal communication techniques often fail, and the associated algorithms need to be re-designed. This chapter takes the cross-modal haptic signal recovery in an emergency rescue scenario as an example to illustrate the encountered challenges and available solutions in this direction.
Xin Wei, Dan Wu, Liang Zhou
Chapter 8. Conclusion
Abstract
To support the thriving multi-modal services, the cross-modal communication paradigm is proposed by the MultiMedia Communication (MMC) Research Group, which comprises researchers from Nanjing University of Posts and Telecommunications and Army Engineering University of PLA. This book offers a systematic and in-depth investigation into this highly promising research area. Several specific concluding remarks are given in the following. Firstly, this book provides a deep introduction to the characteristics of current and future multi-modal services. Based on these characteristics, a general cross-modal communication architecture is constructed. Three key techniques, the challenges they encounter, and their design principles are also analyzed. It can be concluded that both the construction of the architecture and the design of the key techniques should center on the core notion of potential correlations among modalities, or “semantics”. In other words, the focus should be on uncovering, analyzing, and utilizing semantic information that reflects the potential correlations among modalities.
Xin Wei, Dan Wu, Liang Zhou
Title
Cross-Modal Communication Technology
Written by
Xin Wei
Dan Wu
Liang Zhou
Copyright Year
2026
Electronic ISBN
978-3-032-00957-9
Print ISBN
978-3-032-00956-2
DOI
https://doi.org/10.1007/978-3-032-00957-9

The PDF files of this book were created in accordance with the PDF/UA-1 standard to improve accessibility. Features include screen-reader compatibility, described non-textual content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms, and searchable, selectable text. We recognize the importance of accessibility and welcome inquiries about the accessibility of our products. If you have questions or accessibility needs, please contact us at accessibilitysupport@springernature.com.