
2024 | OriginalPaper | Chapter

Image Captioning System for Movie Subtitling Using Neural Networks and LSTM

Authors : K. Vijay, Eashaan Manohar, B. Saiganesh, S. Sanjai, S. R. Deepak

Published in: Proceedings of Third International Conference on Computing and Communication Networks

Publisher: Springer Nature Singapore

Abstract

With the advent of the Internet, the multimedia business has experienced explosive growth and is now available to consumers worldwide. The dominance of over-the-top (OTT) platforms during the COVID-19 pandemic has been particularly important to this growth, as has the adoption of state-of-the-art machine learning (ML) methods. These algorithms can generate captions from video frames automatically, increasing a platform's accessibility. However, it can be difficult to meet the needs of users who speak different languages, because many films are streamed in languages that may be inaccessible to consumers in other parts of the world. Even though English is the most widely spoken language in the world, individuals who are not fluent in English may have trouble finding videos in their native language. By automatically creating English subtitles for any movie, regardless of the language of the original audio track, machine learning plays a significant role in removing this barrier. The key models employed here are neural networks, used to process each frame of the video, and LSTM models, used for caption synthesis. Once trained, the models can be integrated into a user interface (UI) using a programming language such as Python, and the generated caption can be shown alongside the uploaded image.
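The pipeline the abstract outlines, a neural-network encoder that turns each video frame into a feature vector and an LSTM decoder that generates the caption word by word, can be sketched in Python roughly as follows. This is a minimal illustration only: the pretrained InceptionV3 encoder, the "merge"-style decoder, greedy decoding, and the vocabulary size, caption length, and <start>/<end> tokens are all assumptions for the sake of the example, not details reported in the chapter.

```python
# Illustrative sketch of a frame-encoder + LSTM caption decoder.
# Model choices and hyperparameters below are assumptions, not the authors' exact setup.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from tensorflow.keras.models import Model

VOCAB_SIZE = 5000   # assumed vocabulary size
MAX_LEN = 34        # assumed maximum caption length (in tokens)
EMBED_DIM = 256

# 1. Encoder: a pretrained CNN with its classifier removed maps each extracted
#    video frame to a 2048-dimensional feature vector.
cnn = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def encode_frame(frame: np.ndarray) -> np.ndarray:
    """Resize one RGB frame to 299x299 and return its CNN feature vector, shape (1, 2048)."""
    resized = tf.image.resize(frame, (299, 299)).numpy()
    batch = preprocess_input(np.expand_dims(resized, axis=0))
    return cnn.predict(batch, verbose=0)

# 2. Decoder: the frame feature and the partial caption are merged, and the
#    model predicts the next word (a standard "merge" captioning architecture).
img_in = Input(shape=(2048,))
img_vec = Dense(EMBED_DIM, activation="relu")(Dropout(0.5)(img_in))

seq_in = Input(shape=(MAX_LEN,))
seq_emb = Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(seq_in)
seq_vec = LSTM(EMBED_DIM)(Dropout(0.5)(seq_emb))

merged = Dense(EMBED_DIM, activation="relu")(add([img_vec, seq_vec]))
out = Dense(VOCAB_SIZE, activation="softmax")(merged)

caption_model = Model(inputs=[img_in, seq_in], outputs=out)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")

# 3. Greedy decoding: start from a <start> token and repeatedly feed the
#    growing sequence back in until <end> or MAX_LEN is reached.
#    word_index / index_word are the (hypothetical) tokenizer lookup tables.
def generate_caption(feature, word_index, index_word):
    tokens = [word_index["<start>"]]
    for _ in range(MAX_LEN - 1):
        padded = tf.keras.preprocessing.sequence.pad_sequences([tokens], maxlen=MAX_LEN)
        probs = caption_model.predict([feature, padded], verbose=0)[0]
        next_id = int(np.argmax(probs))
        if index_word.get(next_id) == "<end>":
            break
        tokens.append(next_id)
    return " ".join(index_word[t] for t in tokens[1:])
```

In a UI of the kind the abstract describes, a frame extracted from the uploaded video (for example with OpenCV) would be passed through encode_frame, and the string returned by generate_caption would be rendered next to the displayed image.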

Metadata
Title
Image Captioning System for Movie Subtitling Using Neural Networks and LSTM
Authors
K. Vijay
Eashaan Manohar
B. Saiganesh
S. Sanjai
S. R. Deepak
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-0892-5_43