Text classification algorithms for mining unstructured data: a SWOT analysis

  • Original Research
International Journal of Information Technology

Abstract

Facilitating knowledge extraction for decision support, and delivering targeted information to analysts across a wide range of application domains, has become increasingly important. So-called “big data”, an estimated 90% of which is unstructured, is difficult to tap and analyze with traditional tools. Text mining defines a process that transforms this unstructured data into structured data from which knowledge can be discovered. The use of classification algorithms to mine text intelligently has been studied extensively in the literature. This study surveys the text classification algorithms employed in mining unstructured data and reports an analysis of the trends in their use in terms of their respective strengths, weaknesses, opportunities and threats (SWOT). The scope of these algorithms is then explored for the application area of sentiment analysis, a typical text classification task. Finally, a mapping is offered that identifies unexplored social media technologies and the extent to which these algorithms have been used within each social medium, giving insight into the amount of work done on machine learning based sentiment analysis on social media.

Author information

Corresponding author

Correspondence to Akshi Kumar.

About this article

Cite this article

Kumar, A., Dabas, V. & Hooda, P. Text classification algorithms for mining unstructured data: a SWOT analysis. Int. j. inf. tecnol. 12, 1159–1169 (2020). https://doi.org/10.1007/s41870-017-0072-1

  • DOI: https://doi.org/10.1007/s41870-017-0072-1
