research-article

Short Text Analysis Based on Dual Semantic Extension and Deep Hashing in Microblog

Published: 26 August 2019

Abstract

Short text analysis is challenging because short texts are sparse and carry limited semantics. Semantic extension approaches learn the meaning of a short text by introducing external knowledge. However, because short text descriptions in microblogs are highly random, traditional extension methods cannot accurately mine the semantics that fit the theme of a microblog. We therefore use the prominent, refined hashtag information in microblogs, together with complex social relationships, to provide implicit guidance for the semantic extension of short texts. Specifically, we design a deep hash model based on social and conceptual semantic extension, which consists of two parts: dual semantic extension and deep hashing representation. In the extension method, the short text is first conceptualized so that a hashtag graph can be constructed in the conceptual space. Associated hashtags are then generated by a correlation calculation that integrates social relationships and concepts, and these hashtags are used to extend the short text. In the deep hash model, we use a semantic hashing model to encode the enriched semantic features into a compact and meaningful binary code. Finally, extensive experiments demonstrate that our method learns and represents short texts well by using more meaningful semantic signals, and that it can effectively enhance and guide the semantic analysis and understanding of short texts in microblogs.
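To make the idea of a compact binary encoding concrete, the following is a minimal sketch of how binary codes can be compared once they exist. It is not the authors' model: the abstract does not specify the network or the binarization rule, so the fixed threshold of 0.5, the four-dimensional toy activations, and the helper names `binarize`, `hamming`, and `rank_by_hamming` are all assumptions made for illustration.

```python
from typing import Dict, List


def binarize(features: List[float], threshold: float = 0.5) -> List[int]:
    """Turn real-valued activations into a binary code.

    Assumption: a fixed threshold stands in for whatever rule a trained
    semantic hashing model would actually use.
    """
    return [1 if x >= threshold else 0 for x in features]


def hamming(a: List[int], b: List[int]) -> int:
    """Count the bit positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query: List[int], codes: Dict[str, List[int]]) -> List[str]:
    """Return item ids ordered by Hamming distance to the query code.

    Ties are broken by id so the ordering is deterministic.
    """
    return sorted(codes, key=lambda k: (hamming(query, codes[k]), k))


# Hypothetical activations for three extended short texts (toy values).
activations = {
    "t1": [0.9, 0.1, 0.8, 0.2],
    "t2": [0.2, 0.7, 0.1, 0.9],
    "t3": [0.8, 0.3, 0.6, 0.6],
}
codes = {name: binarize(vec) for name, vec in activations.items()}
query_code = binarize([0.7, 0.2, 0.9, 0.4])
ranking = rank_by_hamming(query_code, codes)
print(ranking)
```

Under these toy values, texts whose codes differ from the query in fewer bits rank first, which is the property that makes binary codes convenient for fast similarity search.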



Published in

ACM Transactions on Intelligent Systems and Technology, Volume 10, Issue 4
Survey Papers and Regular Papers
July 2019
327 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3344873

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 26 August 2019
• Accepted: 1 April 2019
• Revised: 1 February 2019
• Received: 1 October 2018


Qualifiers

• research-article
• Research
• Refereed
