Because of the semantic gap, i.e., the visual content of an image may not represent its semantics well, existing efforts on web image organization usually transform the task into clustering the surrounding text. However, because the surrounding text is usually short and the words therein usually appear only once, existing text clustering algorithms can hardly use statistical information for image representation, and they may suffer degraded performance and higher computational cost caused by learning from noisy tags. This chapter presents the use of the
Probabilistic ART with user preference architecture, as introduced in Sects. 3.5 and 3.4, for personalized web image organization. This fused algorithm
is named Probabilistic Fusion ART (PF-ART), which groups images of similar semantics together and simultaneously mines the key tags/topics of individual clusters. Moreover, it performs semi-supervised learning using user-provided tags for images to give users direct control of the generated clusters. An agglomerative merging strategy is further used to organize the clusters into a hierarchy, which has a multi-branch tree
structure rather than the binary tree generated by traditional hierarchical clustering algorithms. The entire two-step algorithm, for tag-based web image organization, is called Personalized Hierarchical Theme-based Clustering (PHTC). Two large-scale real-world web image collections, namely the NUS-WIDE
and the Flickr datasets, are used to evaluate PHTC and compare it with existing algorithms in terms of clustering performance and time cost. The content of this chapter is summarized and extended from the prior study [17] (©2012 IEEE. Reprinted, with permission, from [17]).
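
To make the difference between a multi-branch hierarchy and a binary one concrete, the following is a minimal sketch of an agglomerative merging step over flat clusters that are each summarized by a set of key tags. It is not the merging strategy of PHTC, whose criterion is not stated in this summary. The `Node` class, the Jaccard similarity on tag sets, the `threshold` parameter, and the function names are all assumptions made for illustration. The point is only that grouping every sufficiently similar cluster under one parent in a single round produces nodes with more than two children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster in the hierarchy; leaves are the flat clusters from step one."""
    tags: frozenset
    children: list = field(default_factory=list)


def jaccard(a, b):
    """Tag-set overlap, used here as a stand-in cluster similarity (assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_round(nodes, threshold):
    """Group each unassigned node with every later node similar to it."""
    merged, used = [], set()
    for i, seed in enumerate(nodes):
        if i in used:
            continue
        group = [seed]
        used.add(i)
        for j in range(i + 1, len(nodes)):
            if j not in used and jaccard(seed.tags, nodes[j].tags) >= threshold:
                group.append(nodes[j])
                used.add(j)
        if len(group) == 1:
            merged.append(seed)  # nothing similar enough; keep the node as-is
        else:
            # One parent for the whole group, so it may have many children.
            parent_tags = frozenset().union(*(n.tags for n in group))
            merged.append(Node(parent_tags, group))
    return merged


def build_hierarchy(leaves, threshold=0.3):
    """Repeat merge rounds until a round performs no merge."""
    nodes = list(leaves)
    while True:
        merged = merge_round(nodes, threshold)
        if len(merged) == len(nodes):
            return merged
        nodes = merged
```

For example, calling `build_hierarchy` on four leaf clusters tagged `{dog, pet}`, `{dog, puppy}`, `{dog, animal}`, and `{car, road}` with `threshold=0.3` places the three dog-related clusters under a single parent with three children, whereas a pairwise (binary) merge would need two nested levels to hold them.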