Knowledge and Information Systems

31-07-2021 | Regular Paper

Tweet-scan-post: a system for analysis of sensitive private data disclosure in online social media

Authors: R. Geetha, S. Karthika, Ponnurangam Kumaraguru

Published in: Knowledge and Information Systems | Issue 9/2021

Abstract

Social media technologies are open to users who intend to build communities and publish their opinions on recent incidents. Participants in online social networking sites often remain unaware of how critical it is to disclose personal data to a public audience. Users' private data are therefore at high risk, which can lead to adverse effects such as cyberbullying, identity theft, and job loss. This research work defines user entities or data such as phone numbers, email addresses, family details, and health-related information as a user's sensitive private data (SPD) on a social media platform. The proposed system, Tweet-Scan-Post (TSP), focuses mainly on identifying the presence of SPD in users' posts in the personal, professional, and health domains. The TSP framework is built on the standards and privacy regulations established by social networking sites, by organizations such as NIST and DHS, and by regulations such as the GDPR. The TSP approach addresses prevailing challenges in detecting sensitive PII while keeping user privacy within the bounds of confidentiality and trustworthiness. The TSP framework uses a novel layered classification approach with various state-of-the-art machine learning models to classify tweets as sensitive or insensitive. The findings of the TSP system include 201 Sensitive Privacy Keywords identified using a boosting strategy, and a sensitivity scaling that measures the degree of sensitivity associated with a tweet. The experimental results revealed that personal tweets were strongly associated with mothers and children, professional tweets with apologies, and health tweets with concern over the father's health condition.
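The abstract does not say how the layers or the sensitivity scale are computed. As a rough illustration only, the Python sketch below shows one way a two-layer scan could flag a tweet: a first layer matches explicit identifiers (phone numbers, email addresses) with regular expressions, and a second layer matches domain keywords for the personal, professional, and health domains, then sums both into a simple integer score. The keyword sets, weights, regular expressions, threshold, and the names `scan_tweet` and `is_sensitive` are all hypothetical. This rule-based stand-in is not the authors' machine learning classifiers, their 201 Sensitive Privacy Keywords, or their boosting strategy.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL keyword sets per domain. A few words echo the abstract's findings
# (mother, children, apology, father); this is NOT the paper's keyword list.
DOMAIN_KEYWORDS = {
    "personal": {"mother", "father", "children", "wife", "husband"},
    "professional": {"boss", "office", "fired", "apology", "salary"},
    "health": {"hospital", "surgery", "diagnosed", "medicine", "fever"},
}

# Layer 1 (assumed): simple patterns for explicit identifiers; illustrative, not exhaustive.
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


@dataclass
class ScanResult:
    identifiers: list   # explicit identifiers found by layer 1
    domain_hits: dict   # domain -> sorted matching keywords from layer 2
    score: int          # hypothetical graded sensitivity score


def scan_tweet(text: str) -> ScanResult:
    """Run both hypothetical layers over one tweet and compute a graded score."""
    identifiers = PHONE_RE.findall(text) + EMAIL_RE.findall(text)
    # Layer 2: lowercase word tokens intersected with each domain's keyword set.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    domain_hits = {d: sorted(tokens & kws) for d, kws in DOMAIN_KEYWORDS.items()}
    # Assumed weights: an explicit identifier counts 2, each domain keyword counts 1.
    score = 2 * len(identifiers) + sum(len(hits) for hits in domain_hits.values())
    return ScanResult(identifiers, domain_hits, score)


def is_sensitive(text: str, threshold: int = 2) -> bool:
    """Binary sensitive/insensitive label from the score; the threshold is an assumption."""
    return scan_tweet(text).score >= threshold
```

For example, `scan_tweet("My mother is in hospital, call me at 98765 43210")` finds one phone-number match plus the keywords `mother` (personal) and `hospital` (health), giving a score of 4 under these assumed weights. The fixed weights only make the idea of a graded sensitivity score concrete; a system like the one described would rely on trained models rather than hand-written word sets.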

Title: Tweet-scan-post: a system for analysis of sensitive private data disclosure in online social media
Authors: R. Geetha, S. Karthika, Ponnurangam Kumaraguru
Publication date: 31-07-2021
Publisher: Springer London
Published in: Knowledge and Information Systems / Issue 9/2021
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-021-01592-2
