Effective indexing of social media data is key to searching for information on the social Web. However, the characteristics of social media data make it a challenging task. The first challenge is the large-scale and streaming nature of the data, which requires the indexing algorithm to efficiently update the indexing structure as data streams arrive. The second challenge is utilizing the rich meta-information of social media data to better evaluate the similarity between data objects and to index the data in a more semantically meaningful way, which may allow users to search for the data using different types of queries. Existing approaches
based on either matrix operations or hashing usually cannot perform an online update of the indexing base to encode incoming data streams, and they have difficulty handling noisy data. This chapter presents a study on using the Online Multimodal Co-indexing Adaptive Resonance Theory (OMC-ART) for effective and efficient indexing and retrieval of social media data. More specifically, two types of social media data are considered: (1) the weakly supervised image data, which is associated with captions, tags
and descriptions given by the users; and (2) the e-commerce product data, which includes product images, titles, descriptions and user comments. These scenarios relate this study to multimodal web image indexing and retrieval. Compared with existing studies, OMC-ART has several distinct characteristics. First, OMC-ART is able to perform online learning of sequential data. Second, instead of a plain indexing structure, OMC-ART builds a two-layer one: the first layer co-indexes the images by the key visual and textual features based on the generalized distributions of the clusters they belong to, while in the second layer, the data objects are co-indexed by their own feature distributions. Third, OMC-ART enables flexible multimodal searching by using visual features, keywords, or a combination of both. Fourth, OMC-ART employs a ranking algorithm that does not need to go through the whole indexing system when only a limited number of images need to be retrieved. Experiments on two publicly accessible image datasets
and a real-world e-commerce dataset demonstrate the efficiency and effectiveness of OMC-ART. The content of this chapter is summarized and extended from [13] (https://doi.org/10.1145/2671188.2749362), and the Python code of OMC-ART, with examples of building an e-commerce product search engine, is available at https://github.com/Lei-Meng/OMC-ART-Build-a-toy-online-search-engine-.
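
To make the two-layer idea and the early-stopping ranking more concrete, the following is a minimal illustrative sketch and not the OMC-ART implementation from the repository above. It assumes a single feature vector per object with values in [0, 1] (for example, visual and textual features concatenated), uses the standard Fuzzy ART choice, match and learning rules without complement coding, and returns results cluster by cluster, which only approximates a global ranking. The class name TwoLayerIndex, its parameters and the toy data are hypothetical.

```python
import numpy as np


class TwoLayerIndex:
    """Toy two-layer index: cluster prototypes on top, member objects below.

    Illustrative only; names and parameter values are assumptions.
    """

    def __init__(self, vigilance=0.7, alpha=0.01, beta=0.6):
        self.vigilance = vigilance  # minimum match required to join a cluster
        self.alpha = alpha          # small constant in the choice function
        self.beta = beta            # learning rate for prototype updates
        self.prototypes = []        # layer 1: one weight vector per cluster
        self.members = []           # layer 2: (object_id, features) per cluster

    def add(self, object_id, x):
        """Insert one object online and return the index of its cluster."""
        x = np.asarray(x, dtype=float)
        # Fuzzy ART choice function: |x AND w| / (alpha + |w|).
        scores = [np.minimum(x, w).sum() / (self.alpha + w.sum())
                  for w in self.prototypes]
        for j in np.argsort(scores)[::-1]:
            w = self.prototypes[j]
            match = np.minimum(x, w).sum() / max(x.sum(), 1e-12)
            if match >= self.vigilance:
                # Resonance: move the prototype toward the fuzzy AND of x and w.
                self.prototypes[j] = (self.beta * np.minimum(x, w)
                                      + (1 - self.beta) * w)
                self.members[j].append((object_id, x))
                return int(j)
        # No cluster passed the vigilance test, so start a new one.
        self.prototypes.append(x.copy())
        self.members.append([(object_id, x)])
        return len(self.prototypes) - 1

    def search(self, q, k):
        """Return up to k (object_id, score) pairs and the clusters visited."""
        q = np.asarray(q, dtype=float)

        def sim(v):
            return np.minimum(q, v).sum() / max(q.sum(), 1e-12)

        # Layer 1: order clusters by similarity of their prototypes to q.
        order = sorted(range(len(self.prototypes)),
                       key=lambda j: sim(self.prototypes[j]), reverse=True)
        results, visited = [], 0
        for j in order:
            visited += 1
            # Layer 2: rank the objects inside the current cluster.
            ranked = sorted(self.members[j], key=lambda m: sim(m[1]),
                            reverse=True)
            results.extend((oid, sim(v)) for oid, v in ranked)
            if len(results) >= k:
                break  # enough results: remaining clusters are never scanned
        return results[:k], visited


# Toy usage with made-up four-dimensional features.
index = TwoLayerIndex()
for oid, feats in [("a", [0.9, 0.1, 0.0, 0.0]), ("b", [0.8, 0.2, 0.0, 0.0]),
                   ("c", [0.0, 0.0, 0.9, 0.1]), ("d", [0.0, 0.1, 0.8, 0.1])]:
    index.add(oid, feats)
top, visited = index.search([0.9, 0.1, 0.0, 0.0], k=1)
print(top[0][0], visited)
```

In this toy data the four objects fall into two clusters, and a top-1 query only needs to open the best-matching cluster. Under the stated assumptions, this is the kind of saving the ranking step aims for when few results are requested.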