Published in: Knowledge and Information Systems 9/2021

31-07-2021 | Regular Paper

Tweet-scan-post: a system for analysis of sensitive private data disclosure in online social media

Authors: R. Geetha, S. Karthika, Ponnurangam Kumaraguru

Abstract

Social media technologies are open to users who intend to create a community and publish their opinions on recent incidents. Participants in online social networking sites often remain unaware of how critical it is to disclose personal data to a public audience. Users' private data are at high risk, which can lead to adverse effects such as cyberbullying, identity theft, and job loss. This research work defines user entities or data such as phone number, email address, family details, and health-related information as a user's sensitive private data (SPD) on a social media platform. The proposed system, Tweet-Scan-Post (TSP), focuses mainly on identifying the presence of SPD in users' posts under the personal, professional, and health domains. The TSP framework is built on the standards and privacy regulations established by social networking sites and by organizations and regulations such as NIST, DHS, and GDPR. The TSP approach addresses the prevailing challenges in determining the presence of sensitive PII and in preserving user privacy within the bounds of confidentiality and trustworthiness. The framework uses a novel layered classification approach with various state-of-the-art machine learning models to classify tweets as sensitive or insensitive. The findings of the TSP system include 201 Sensitive Privacy Keywords obtained using a boosting strategy, and a sensitivity scaling that measures the degree of sensitivity associated with a tweet. The experimental results revealed that personal tweets were highly related to mother and children, professional tweets to apology, and health tweets to concern over the father's health condition.
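To make the idea of scanning a post for SPD concrete, the following is a minimal sketch and not the TSP method: the abstract describes layered machine learning classifiers, whereas this example only uses simple rules. The regular expressions, the tiny keyword lists, and the function names `scan_tweet` and `is_sensitive` are all assumptions introduced for illustration; they are not the 201 Sensitive Privacy Keywords or any component from the paper.

```python
import re

# Hypothetical patterns and keyword lists, for illustration only. They are
# NOT the paper's keyword set or classifiers, just a rule-based stand-in.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

DOMAIN_KEYWORDS = {
    "personal": {"mother", "children", "family"},
    "professional": {"boss", "job", "office"},
    "health": {"diagnosed", "surgery", "hospital"},
}


def scan_tweet(text):
    """Return the SPD-like items found in a tweet (first-pass rule check)."""
    # Lowercased word set, used for keyword lookups per domain.
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
        "domains": sorted(
            domain for domain, kws in DOMAIN_KEYWORDS.items() if words & kws
        ),
    }


def is_sensitive(text):
    """Flag a tweet when any pattern or domain keyword matches."""
    found = scan_tweet(text)
    return bool(found["emails"] or found["phones"] or found["domains"])
```

A check like this could run before a post is published, with `scan_tweet` reporting which kind of data triggered the flag. A rule-based pass of this sort would miss paraphrased or implicit disclosures, which is presumably why the paper relies on trained classifiers instead.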

Metadata
Title
Tweet-scan-post: a system for analysis of sensitive private data disclosure in online social media
Authors
R. Geetha
S. Karthika
Ponnurangam Kumaraguru
Publication date
31-07-2021
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 9/2021
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-021-01592-2
