Journal of Intelligent Information Systems

Published: 1 October 2014

Image understanding and the web: a state-of-the-art review

Authors: Fariza Fauzi, Mohammed Belkhatir

Published in: Journal of Intelligent Information Systems | Issue 2/2014

Abstract

The contextual information of Web images is investigated to address the issue of characterizing their content with semantic descriptors and thereby bridge the semantic gap, i.e., the gap between their automated low-level representation (in terms of colors, textures, shapes, etc.) and their semantic interpretation. Such characterization allows the image content to be understood and is crucial in important Web-based tasks such as image indexing and retrieval. Although we are highly motivated by the availability of rich knowledge on the Web and by the relative success of commercial search engines in automatically characterizing image content using contextual information in Web pages, we are aware that the unpredictable quality of this contextual information is a major limiting factor. Among the reasons that make the contextual information of an image difficult to leverage, some relate to the characterization and extraction of this information. The first issue is the lack of large-scale studies highlighting what is considered the relevant contextual information of an image, where it is located in a Web page, and whether it is consistent across Web pages of different types, content layouts and domains. The extraction of this contextual information is also a topical matter, as state-of-the-art automated extraction tools are unable to handle the heterogeneous Web. As far as the processing of the contextual information is concerned, the syntactic and semantic characterization of the textual components must be addressed in order to tackle the semantic gap. Furthermore, questions arise regarding the organization of these textual components into coherent structures that are usable in image indexing and retrieval frameworks.
To address these issues, we lay out the anatomy of a generic context-based Web image understanding framework and propose its stage-based decomposition, covering topical issues from information indexing and retrieval, image description models, natural language processing, Web page segmentation and automated information extraction. For each of the identified stages, we review state-of-the-art solutions from the literature, categorized and analyzed in light of the techniques used.

Title: Image understanding and the web: a state-of-the-art review
Authors: Fariza Fauzi, Mohammed Belkhatir
Publication date: 1 October 2014
Publisher: Springer US
Published in: Journal of Intelligent Information Systems, Issue 2/2014
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-014-0323-6
