
2024 | Original Paper | Book Chapter

Image Captioning System for Movie Subtitling Using Neural Networks and LSTM

Authors: K. Vijay, Eashaan Manohar, B. Saiganesh, S. Sanjai, S. R. Deepak

Published in: Proceedings of Third International Conference on Computing and Communication Networks

Publisher: Springer Nature Singapore

Abstract

With the advent of the Internet, the multimedia business has experienced explosive growth and is now available to consumers worldwide. The dominance of over-the-top (OTT) platforms during the COVID-19 pandemic has been particularly important to this growth, as has the adoption of state-of-the-art machine learning (ML) methods. These algorithms can generate captions from video frames automatically, increasing a platform's accessibility. However, meeting the needs of users who speak different languages is difficult, since many streamed films are in languages inaccessible to viewers in other parts of the world. Because English is the most widely spoken language in the world, individuals who are not fluent in English may struggle to find videos they can follow. Machine learning helps remove this barrier by automatically creating English subtitles for any movie, regardless of the language of the original audio track. The key models employed here are neural networks, used to process each frame of the video, and LSTM models, used for caption synthesis. Once trained, the models can be integrated into a user interface (UI) using a programming language such as Python, where the generated caption is displayed alongside the uploaded image.
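The pipeline sketched in the abstract, a frame encoder feeding an LSTM decoder that emits a caption word by word, can be illustrated with a minimal greedy-decoding loop. This is a hedged sketch only: the function names (`extract_features`, `predict_next_word`) and the stubbed model internals are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the caption-generation loop described in the abstract:
# a CNN encoder turns a video frame into a feature vector, and an LSTM
# decoder emits the caption one word at a time until an end token.
# Model internals are stubbed with toy deterministic stand-ins.

START, END = "<start>", "<end>"

def extract_features(frame):
    """Stand-in for the CNN encoder: maps a frame to a feature vector."""
    return [float(sum(frame)) % 1.0]  # toy feature, not a real embedding

def predict_next_word(features, partial_caption):
    """Stand-in for one LSTM decoder step: given the frame features and
    the words generated so far, return the most likely next word."""
    canned = ["a", "man", "rides", "a", "horse", END]
    step = len(partial_caption) - 1  # exclude the start token
    return canned[min(step, len(canned) - 1)]

def generate_caption(frame, max_len=20):
    """Greedy decoding: feed the growing sequence back into the decoder
    until the end token (or a length cap) is reached."""
    features = extract_features(frame)
    words = [START]
    for _ in range(max_len):
        word = predict_next_word(features, words)
        if word == END:
            break
        words.append(word)
    return " ".join(words[1:])  # drop the start token

print(generate_caption([0, 1, 2]))  # → "a man rides a horse"
```

In a real system the two stubs would be replaced by a pretrained convolutional backbone and a trained LSTM language model, and the resulting string would be rendered in the UI next to the uploaded frame.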


Metadata
Title
Image Captioning System for Movie Subtitling Using Neural Networks and LSTM
Authors
K. Vijay
Eashaan Manohar
B. Saiganesh
S. Sanjai
S. R. Deepak
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-0892-5_43