Abstract
Short text analysis is challenging because of the sparsity and limited semantics of short texts. Semantic extension approaches learn the meaning of a short text by introducing external knowledge. However, because short text descriptions in microblogs are highly random, traditional extension methods cannot accurately mine semantics that suit the microblog's theme. Therefore, we use the prominent, refined hashtag information in microblogs, together with complex social relationships, to provide implicit guidance for the semantic extension of short texts. Specifically, we design a deep hashing model based on social and conceptual semantic extension, which consists of two parts: dual semantic extension and deep hashing representation. In the extension step, the short text is first conceptualized so that a hashtag graph can be constructed in the conceptual space. Then, associated hashtags are generated through a correlation calculation that integrates social relationships and concepts, and these hashtags are used to extend the short text. In the deep hashing step, we use a semantic hashing model to encode the rich semantic features into a compact, meaningful binary code. Finally, extensive experiments demonstrate that our method learns and represents short texts well by exploiting more meaningful semantic signals. It can effectively enhance and guide the semantic analysis and understanding of short texts in microblogs.
Index Terms
- Short Text Analysis Based on Dual Semantic Extension and Deep Hashing in Microblog