Published in: Soft Computing 17/2019

12 January 2019 | Foundations

SimAndro: an effective method to compute similarity of Android applications

Authors: Masoud Reyhani Hamedani, Gyoosik Kim, Seong-je Cho

Abstract

As the number of Android applications (apps) grows dramatically, users find it increasingly difficult to locate apps relevant to their needs. There is therefore strong demand for app search engines and recommendation services, for which developing an accurate similarity method is a challenging issue. In contrast to malware detection, far less effort has been devoted to computing the similarity of apps. Furthermore, all existing methods use only features obtained from app stores, such as descriptions and ratings, which can be inaccurate, vary across stores, and be affected by language barriers; they entirely neglect useful information that clearly captures an app’s functionalities and behaviors and that can be mined from the apps themselves, such as API calls and manifest information. In this paper, we propose an effective method called SimAndro to compute the similarity of apps. It extracts features based only on information obtained from the apps themselves and the Android platform, without using information from third-party sources such as app stores. SimAndro performs both feature extraction and similarity computation, using API calls, manifest information, package names, and strings as features. To compute the similarity score of an app pair, a separate similarity score is computed for each feature, and a weighted linear combination of these four scores is taken as the final similarity score, with the weights determined by an automatic weighting scheme based on TreeRankSVM. The results of extensive experiments with three real-world datasets and a dataset constructed by human experts demonstrate the effectiveness of SimAndro.
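
The weighted linear combination described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not say which per-feature similarity measure SimAndro uses, so Jaccard similarity over token sets stands in for it; the feature keys, the toy apps, and the weights are hypothetical; and the weights are fixed by hand rather than learned with TreeRankSVM.

```python
from typing import Dict, Set

# Hypothetical keys for the four feature types named in the abstract.
FEATURES = ("api_calls", "manifest", "package_name", "strings")


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets (assumed stand-in measure); 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(
    app_a: Dict[str, Set[str]],
    app_b: Dict[str, Set[str]],
    weights: Dict[str, float],
) -> float:
    """Weighted linear combination of the four per-feature similarity scores."""
    return sum(weights[f] * jaccard(app_a[f], app_b[f]) for f in FEATURES)


# Toy apps with made-up feature sets.
app_a = {
    "api_calls": {"sendTextMessage", "getDeviceId", "openConnection"},
    "manifest": {"INTERNET", "SEND_SMS"},
    "package_name": {"com", "example", "chat"},
    "strings": {"hello", "send"},
}
app_b = {
    "api_calls": {"sendTextMessage", "openConnection", "getLastKnownLocation"},
    "manifest": {"INTERNET", "SEND_SMS"},
    "package_name": {"com", "example", "talk"},
    "strings": {"send", "inbox", "hello", "settings"},
}

# Hand-picked weights for illustration only; the paper learns them automatically.
weights = {"api_calls": 0.4, "manifest": 0.3, "package_name": 0.1, "strings": 0.2}

score = combined_similarity(app_a, app_b, weights)
print(round(score, 2))  # 0.65
```

With these toy inputs the per-feature scores are 0.5, 1.0, 0.5, and 0.5, so the combined score is 0.4·0.5 + 0.3·1.0 + 0.1·0.5 + 0.2·0.5 = 0.65. Keeping the weights in a separate mapping means a learned weighting can replace the hand-picked one without touching the per-feature computation.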

Footnotes

4. If the total number of methods referenced in an app exceeds 65,536, the app is split into multiple DEX files (Android developers site 2018).

7. We note that app recommendation, app clustering, and malware detection are outside the scope of this paper. As part of our future work, we plan to study and evaluate extensively the effectiveness of applying SimAndro to these topics.

12. Except for the size and developer, which are used as features for similarity computation in Chen et al. (2015) and Chen et al. (2016).

13. The translation of the app name is “thehyundai.com.”

14. As observed in our experimental results in Sect. 5, all similarity methods show lower accuracy with the Amazon dataset than with the other datasets, since it contains more duplicate apps.

15. Similarity is normally defined as a concept between two objects (Lin et al. 2012); therefore, we use only a single app as the query.

16. The BBM free calls and messaging app is provided by BlackBerry Limited.

17. The Hike messenger is a free messaging app provided by Hike Ltd.

References
Android developers site. developer.android.com/studio/build/multidex.html (December 2018)
Aafer Y, Du W, Yin H (2013) Droidapiminer: mining api-level features for robust malware detection in android. In: Proceedings of international conference on security and privacy in communication systems, pp 86–103
Airola A, Pahikkala T, Salakoski T (2011) Training linear ranking svms in linearithmic time using red-black trees. Pattern Recognit Lett 32(9):1328–1336
Arp D, Spreitzenbarth M, Gascon H, Rieck K (2014) Drebin: effective and explainable detection of android malware in your pocket. In: Proceedings of the 14th international conference on network and distributed system security symposium, pp 1–12
Backurs A, Indyk P (2015) Edit distance cannot be computed in strongly subquadratic time (unless seth is false). In: Proceedings of the 47th annual ACM symposium on theory of computing, pp 51–58
Bhandari U, Sugiyama K, Datta A, Jindal R (2013) Serendipitous recommendation for mobile apps using item-item similarity graph. In: Proceedings of the 10th Asia information retrieval societies conference, pp 440–451
Chae D-K, Kim S-W, Cho S-J, Kim Y (2015) Effective and efficient detection of software theft via dynamic API authority vectors. J Syst Softw 110:1–9
Chen N, Hoi S, Li S, Xiao X (2015) Simapp: a framework for detecting similar mobile applications by online kernel learning. In: Proceedings of the 8th ACM international conference on web search and data mining, pp 305–314
Chen N, Hoi S, Li S, Xiao X (2016) Mobile app tagging. In: Proceedings of the 9th ACM international conference on web search and data mining, pp 63–72
Chiki NF, Rothenburger B, Gilles N (2008) Combining link and content information for scientific topics discovery. In: Proceedings of 20th IEEE international conference on tools with artificial intelligence, ICTAI, pp 211–214
Crussell J, Gibler C, Chen H (2012) Attack of the clones: detecting cloned applications on android markets. In: Proceedings of the European symposium on research in computer security, pp 37–54
Crussell J, Gibler C, Chen H (2016) Andarwin: scalable detection of android application clones based on semantics. IEEE Trans Mobile Comput 14(10):2007–2019
Do Q, Martini B, Choo K-K (2015) Exfiltrating data from android devices. Comput Secur 48(C):74–91
Dutta B, Shinde JV (2017) Intuitionistic fuzzy clustering based segmentation of spine mr image. Int Res J Eng Technol 4(7):790–794
Faruki P, Bharmal A, Laxmi V, Ganmoor V, Gaur M (2015) Android security: a survey of issues, malware penetration, and defenses. IEEE Commun Surv Tutor 17(2):998–1022
Faruki P, Laxmi V, Bharmal A, Gaur MS, Ganmoor V (2015) Androsimilar: robust signature for detecting variants of android malware. Inf Secur Appl 22:66–80
Feizollah A, Anuar NB, Salleh R, Abdul Wahab A (2015) A review on feature selection in mobile malware detection. Digit Investig 13(C):22–37
Hamedani MR, Kim S-W (2016) Simcc-at: a method to compute similarity of scientific papers with automatic parameter tuning. In: Proceedings of the 39th international ACM SIGIR conference on research and development in information retrieval, pp 1005–1008
Hamedani MR, Kim S (2017) Jacsim: an accurate and efficient link-based similarity measure in graphs. Inf Sci 414:203–224
Hamedani MR, Kim S-W, Kim D-J (2016) Simcc: a novel method to consider both content and citations for computing similarity of scientific papers. Inf Sci 334–335(C):273–292
Jang J-W, Kang H, Woo J, Aziz M, Kim HK (2015) Andro-autopsy: anti-malware system based on similarity matching of malware and malware creator-centric information. Digit Investig 14:17–35
Kim Y, Cho S-J, Han S, You I (2018) A software classification scheme using binary level characteristics for efficient software filtering. Soft Comput 22(2):595–606
Ko J, Shim H, Kim D, Jeong Y-S, Cho S-j, Park M, Han S, Kim SB (2013) Measuring similarity of android applications via reversing and k-gram birthmarking. In: Proceedings of research in adaptive and convergent systems, pp 336–341
Lee K, Ban Y, Lee S (2017) Efficient depth enhancement using a combination of color and depth information. Sensors 17(7):1–27
Lee S, Dolby J, Ryu S (2016) Hybridroid: static analysis framework for android hybrid applications. In: Proceedings of the 31st IEEE/ACM international conference on automated software engineering, pp 250–261
Levin J (2015) Android internals—a confectioner’s cookbook. vol I. Cambridge, MA, USA
Li M, Li Q, Long Y (2017) Representation learning of multiword expressions with compositionality constraint. In: Proceedings of the international conference on knowledge science, engineering and management, pp 507–519
Lin Z, Lyu MR, King I (2012) Matchsim: a novel similarity measure based on maximum neighborhood matching. Knowl Inf Syst 32(1):141–166
Magdy W, Jones GJF (2010) Pres: a score metric for evaluating recall-oriented information retrieval applications. In: Proceedings of the 33rd international ACM SIGIR conference on research and development in information retrieval, pp 611–618
Manning CD, Raghavan P, Schutze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge
Motta JM, Ladouceur J (2017) A CRF machine learning model reinforced by ontological knowledge for document summarization. In: Proceedings of the international conference artificial intelligence, pp 127–135
Narudin F, Feizollah A, Anuar N, Gani A (2016) Evaluation of machine learning classifiers for mobile malware detection. Soft Comput Fusion Found Methodol Appl 20(1):343–357
Ng T (2016) Prefix distance between regular languages. In: Proceedings of the international conference on implementation and application of automata, pp 224–235
Rastogi V, Chen Y, Jiang X (2014) Catch me if you can: evaluating android anti-malware against transformation attacks. IEEE Trans Inf Forensics Secur 9(1):99–108
Sanz B, Santos I, Laorden C, Ugarte-Pedrero X, Bringas PGa (2012) On the automatic categorisation of android applications. In: Proceedings of the 9th annual IEEE consumer communications and networking conference-security and content protection, pp 149–153
Sarma B, Li N, Gates C, Potharaju R, Nita-Rotaru C, Molloy I (2012) Android permissions: a perspective combining risks and benefits. In: Proceedings of the 17th ACM symposium on access control models and technologies, pp 13–22
Sugiyama K, Kan M-Y (2013) Exploiting potential citation papers in scholarly paper recommendation. In: Proceedings of the 13th ACM/IEEE joint conference on digital libraries, pp 153–162
Wei J, He J, Kai C, Zhou Y, Tang Z (2017) Collaborative filtering and deep learning based recommendation system for cold start items. Expert Syst Appl 69(1):29–39
Wei T-E, Tyan H-R, Jeng A, Lee H-M, Liao H-Y, Wang J-C (2015) Droidexec: root exploit malware recognition against wide variability via folding redundant function-relation graph. In: Proceedings of the 17th international conference on advanced communication technology, pp 161–169
Wu D-J, Mao C-H, Wei T-E, Lee H-M, Wu K-P (2012) Droidmat: android malware detection through manifest and API calls tracing. In: Proceedings of the 7th Asia joint conference on information security, pp 62–96
Yerima S, Sezer S, McWilliams G, Igor M (2013) A new android malware detection approach using bayesian classification. In: Proceedings of the 27th IEEE international conference on advanced information networking and applications, pp 121–128
Yin P, Luo P, Lee W-C, Wang M (2013) App recommendation: a contest between satisfaction and temptation. In: Proceedings of the 6th ACM international conference on web search and data mining, pp 395–404
Zhang M, Duan Y, Yin H, Zhao Z (2014) Semantics-aware android malware classification using weighted contextual API dependency graphs. In: Proceedings of the ACM SIGSAC conference on computer and communications security, pp 1105–1116
Zheng M, Sun M, Lui J (2013) Droid analytics: a signature based analytic system to collect, extract, analyze and associate android malware. In: Proceedings of the 12th IEEE international conference on trust, security and privacy in computing and communications, pp 163–171
Zhou W, Zhou Y, Grace M, Jian X, Zou S (2013) Fast, scalable detection of piggybacked mobile applications. In: Proceedings of the 3rd ACM conference on data and application security and privacy, pp 185–196
Metadata
Title
SimAndro: an effective method to compute similarity of Android applications
Authors
Masoud Reyhani Hamedani
Gyoosik Kim
Seong-je Cho
Publication date
12 January 2019
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 17/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-03755-4
