Published in: Empirical Software Engineering 3/2024

01.05.2024

VioDroid-Finder: automated evaluation of compliance and consistency for Android apps

Written by: Junren Chen, Cheng Huang, Jiaxuan Han

Abstract

Rapid growth in the variety and quantity of apps makes it difficult for users to protect their privacy. Although regulations have been introduced and the Android ecosystem is constantly being improved, violations persist: privacy policies may not fully comply with regulations, and app behavior may not be fully consistent with privacy policies. To address these issues, this paper proposes VioDroid-Finder, an automated method for evaluating the compliance and consistency of Android apps. First, we study common existing regulations and group privacy policy content into 7 aspects (i.e., privacy categories); in each privacy category, a privacy policy must satisfy a different set of compliance rules. Second, we present a policy structure parser model based on a structure extraction/rebuilding method (which converts unstructured text into an XML tree) and a subtitle similarity calculation algorithm. Third, we propose a violation analyzer that uses the BERT model to classify each sentence in the privacy policy. We collect existing issues and combine them with manual observations to define 6 types of violations, which are detected from the classification results. Fourth, we propose an inconsistency analyzer that uses static analysis to convert permissions, APIs, and GUI into a set of personal information; inconsistencies are detected by comparing that set with the personal information declared in the privacy policy. Finally, we evaluate 600 Chinese apps using the proposed method and detect many violations and inconsistencies, reflecting how widespread privacy violations currently are.
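The set comparison behind the inconsistency analyzer can be illustrated with a minimal Python sketch. This is not the authors' implementation: the mapping table, the category labels, and the function names `info_from_permissions` and `find_inconsistencies` are hypothetical, and only permissions are handled here, whereas the paper also draws on APIs and GUI elements. The abstract does not say which direction of mismatch counts as an inconsistency, so the sketch reports both.

```python
# Hypothetical mapping from Android permission names to personal-information
# categories. The labels on the right are illustrative, not the paper's taxonomy.
PERMISSION_TO_INFO = {
    "android.permission.ACCESS_FINE_LOCATION": "location",
    "android.permission.READ_CONTACTS": "contacts",
    "android.permission.CAMERA": "camera",
    "android.permission.RECORD_AUDIO": "audio",
}


def info_from_permissions(permissions):
    """Convert requested permissions into a set of personal-information types.

    Permissions with no entry in the mapping (e.g. INTERNET) are ignored.
    """
    return {PERMISSION_TO_INFO[p] for p in permissions if p in PERMISSION_TO_INFO}


def find_inconsistencies(behavior_info, declared_info):
    """Compare information implied by app behavior with the policy's declarations.

    Returns both directions of mismatch as sorted lists:
    - "undeclared": implied by the app but absent from the policy
    - "declared_only": stated in the policy but not observed in the app
    """
    behavior_info = set(behavior_info)
    declared_info = set(declared_info)
    return {
        "undeclared": sorted(behavior_info - declared_info),
        "declared_only": sorted(declared_info - behavior_info),
    }


if __name__ == "__main__":
    requested = [
        "android.permission.ACCESS_FINE_LOCATION",
        "android.permission.READ_CONTACTS",
        "android.permission.INTERNET",
    ]
    declared = {"location", "phone number"}
    print(find_inconsistencies(info_from_permissions(requested), declared))
```

In this sketch the "undeclared" entries are the ones that would matter most for a consistency check, since they correspond to data the app can access without saying so in its policy.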

Metadata
Title
VioDroid-Finder: automated evaluation of compliance and consistency for Android apps
Written by
Junren Chen
Cheng Huang
Jiaxuan Han
Publication date
01.05.2024
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 3/2024
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-024-10470-8