research-article

Short Text Analysis Based on Dual Semantic Extension and Deep Hashing in Microblog

Authors:
Wanqiu Cui, Beijing University of Posts and Telecommunications, Beijing, China (ORCID: 0000-0003-2927-0787)
Junping Du, Beijing University of Posts and Telecommunications, Beijing, China (ORCID: 0000-0001-8590-3767)
Dawei Wang, Renmin University of China, Beijing, China
Xunpu Yuan, Beijing University of Posts and Telecommunications, Beijing, China
Feifei Kou, Beijing University of Posts and Telecommunications, Beijing, China
Liyan Zhou, Beijing University of Posts and Telecommunications, Beijing, China
Nan Zhou, Beijing University of Posts and Telecommunications, Beijing, China

ACM Transactions on Intelligent Systems and Technology, Volume 10, Issue 4, Article No. 38, pp. 1–24
https://doi.org/10.1145/3326166

Published: 26 August 2019

Abstract

Short text analysis is challenging because short texts are semantically sparse and limited. Semantic extension approaches learn the meaning of a short text by introducing external knowledge. However, because short text descriptions in microblogs are highly random, traditional extension methods cannot accurately mine semantics that fit the microblog theme. We therefore use the prominent, refined hashtag information in microblogs, together with complex social relationships, to provide implicit guidance for the semantic extension of short text. Specifically, we design a deep hash model based on social and conceptual semantic extension, which consists of two parts: dual semantic extension and deep hashing representation. In the extension method, the short text is first conceptualized so that a hashtag graph can be constructed in the conceptual space. Associated hashtags are then generated by a correlation calculation that integrates social relationships and concepts, and these hashtags are used to extend the short text. In the deep hash model, we use a semantic hashing model to encode the enriched semantic features into a compact and meaningful binary code. Extensive experiments demonstrate that our method learns and represents short texts well by using more meaningful semantic signals, and that it effectively enhances and guides the semantic analysis and understanding of short text in microblogs.
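
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' model. It assumes that each short text and hashtag already has a concept vector, that social relatedness is given as a score in [0, 1], and that the two signals are mixed with a hypothetical weight `alpha`. The learned deep hash encoder is replaced by a fixed linear projection thresholded at zero, only to show how binary codes and Hamming distance are used. All function and parameter names here are illustrative.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0


def related_hashtags(text_vec, tag_vecs, social, alpha=0.5, top_k=2):
    """Rank hashtags by a mix of conceptual and social relatedness.

    Assumption: a convex combination weighted by `alpha`; the paper's
    actual correlation calculation is not given in the abstract.
    """
    scores = {
        tag: alpha * cosine(text_vec, vec) + (1.0 - alpha) * social.get(tag, 0.0)
        for tag, vec in tag_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def binary_code(features, projection):
    """Stand-in for a learned hash encoder: project, then threshold at zero."""
    return (features @ projection > 0).astype(np.uint8)


def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return int(np.count_nonzero(a != b))


# Toy data (hypothetical): three concept dimensions, three candidate hashtags.
text_vec = np.array([1.0, 0.0, 0.0])
tag_vecs = {
    "#nba": np.array([1.0, 0.0, 0.0]),
    "#sports": np.array([1.0, 1.0, 0.0]),
    "#food": np.array([0.0, 1.0, 0.0]),
}
social = {"#nba": 0.2, "#sports": 0.9, "#food": 0.9}

extension = related_hashtags(text_vec, tag_vecs, social)  # ['#sports', '#nba']

# Fixed projection used only for illustration; a real model would learn it.
projection = np.array([[1.0, 0.0, 1.0],
                       [0.0, 1.0, 1.0]])
code_a = binary_code(np.array([1.0, -1.0]), projection)
code_b = binary_code(np.array([1.0, 1.0]), projection)
distance = hamming(code_a, code_b)
```

In this toy setup the socially strong but conceptually weaker `#sports` outranks `#nba`, which shows how the social signal can change the extension. Texts whose binary codes have a small Hamming distance are treated as semantically close.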

Index Terms

Short Text Analysis Based on Dual Semantic Extension and Deep Hashing in Microblog
1. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection
    2. Specialized information retrieval
      1. Environment-specific retrieval
        Web and social media search
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Algorithmic game theory and mechanism design
      1. Social networks
    2. Machine learning theory

Published in
ACM Transactions on Intelligent Systems and Technology Volume 10, Issue 4
Survey Papers and Regular Papers
July 2019
327 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3344873
Editor: Yu Zheng, JD Finance, China
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 26 August 2019
- Accepted: 1 April 2019
- Revised: 1 February 2019
- Received: 1 October 2018
Author Tags
Semantic extension
conceptual space
deep hash model
hashtag graph
social and conceptual semantics
Qualifiers
- research-article
- Research
- Refereed