Abstract
It has become increasingly important to support knowledge extraction for decision making and to deliver targeted information to analysts across a wide range of application domains. An estimated 90% of so-called "big data" is unstructured, which makes it difficult to tap and analyze with traditional tools. Text mining defines a process that transforms this unstructured data into a structured form from which knowledge can be discovered. The use of classification algorithms to mine text has been studied extensively in the literature. This study surveys the text classification algorithms used to mine unstructured data and reports a conclusive analysis of trends in their use in terms of their respective strengths, weaknesses, opportunities and threats (SWOT). The scope of these algorithms is then explored with respect to sentiment analysis, a typical text classification task. Finally, a mapping is proposed that identifies unexplored social media platforms and the extent to which each algorithm has been used on each platform, giving insight into how much work has been done on machine learning based sentiment analysis of social media.
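The two steps the abstract describes, turning unstructured text into a structured representation and then classifying it (here, by sentiment), can be illustrated with a minimal sketch. It assumes a bag-of-words representation and a multinomial Naive Bayes classifier with add-one smoothing, one common text classification algorithm chosen for brevity; it is not presented as the survey's own method. The class name `NaiveBayesTextClassifier`, the helper functions, and the four training sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and extract alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def to_bag_of_words(text):
    """Turn unstructured text into a structured term-frequency record."""
    return Counter(tokenize(text))


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)      # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> term frequencies
        self.vocabulary = set()
        for doc, label in zip(documents, labels):
            bag = to_bag_of_words(doc)
            self.word_counts[label].update(bag)
            self.vocabulary.update(bag)
        return self

    def log_posterior(self, text, label):
        # Unnormalized log P(label | text) = log P(label) + sum of log P(token | label)
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocabulary)
        for token, count in to_bag_of_words(text).items():
            if token not in self.vocabulary:
                continue  # skip words never seen during training
            p = (self.word_counts[label][token] + 1) / (total_words + vocab_size)
            score += count * math.log(p)
        return score

    def predict(self, text):
        # Pick the label with the highest (log) posterior score
        return max(self.class_counts, key=lambda label: self.log_posterior(text, label))


# Hypothetical toy training data for sentiment classification
train_texts = [
    "great phone, love the camera",
    "excellent service and friendly staff",
    "terrible battery, very disappointed",
    "awful support, never buying again",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("love the friendly staff"))               # positive
print(model.predict("disappointed with the awful battery"))   # negative
```

Naive Bayes is used here only because it fits in a few lines; the same structured bag-of-words input could feed any of the other classifiers a survey of this kind compares.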
Cite this article
Kumar, A., Dabas, V. & Hooda, P. Text classification algorithms for mining unstructured data: a SWOT analysis. Int. j. inf. tecnol. 12, 1159–1169 (2020). https://doi.org/10.1007/s41870-017-0072-1
DOI: https://doi.org/10.1007/s41870-017-0072-1