DOI: 10.1145/2961111.2962600
research-article

Clustering Mobile Apps Based on Mined Textual Features

Published: 08 September 2016

ABSTRACT

Context: Categorising software systems according to their functionality benefits both users and developers. Goal: To uncover the latent clustering of mobile apps in app stores, we propose a novel technique that measures app similarity based on claimed behaviour. Method: Features are extracted using information retrieval augmented with ontological analysis, and are used as attributes to characterise apps. These attributes are then used to cluster the apps with agglomerative hierarchical clustering. We empirically evaluate our approach on 17,877 apps mined from the BlackBerry and Google app stores in 2014. Results: Our approach substantially improves on the existing categorisation quality for both the BlackBerry store (from 0.02 to 0.41 on average) and the Google store (from 0.03 to 0.21 on average). We also find a strong Spearman rank correlation (ρ = 0.96 for Google and ρ = 0.99 for BlackBerry) between the number of apps and the ideal granularity within each category, indicating that ideal granularity increases with category size, as expected. Conclusions: The current categorisation in the app stores studied does not exhibit good classification quality in terms of the claimed feature space. However, better quality can be achieved by combining a good feature extraction technique with a traditional clustering method.
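The clustering step described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vocabulary, the app names, the binary feature vectors, the cosine distance, and the average-linkage criterion are all assumptions chosen for illustration, since the abstract states only that mined features characterise apps and that agglomerative hierarchical clustering is applied.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: an app with no features is maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_clusters(vectors, n_clusters):
    """Bottom-up clustering: start with one cluster per app and repeatedly
    merge the two closest clusters until n_clusters remain.
    Average linkage is an assumed choice, not taken from the paper."""
    clusters = [[name] for name in vectors]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        dists = [cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical claimed features; each app vector marks which ones it claims.
FEATURES = ["send message", "share photo", "track workout", "count steps"]
apps = {
    "chat_a": [1, 1, 0, 0],
    "chat_b": [1, 0, 0, 0],
    "fit_a": [0, 0, 1, 1],
    "fit_b": [0, 0, 0, 1],
}

result = agglomerative_clusters(apps, n_clusters=2)
print(result)  # [['chat_a', 'chat_b'], ['fit_a', 'fit_b']]
```

In this toy example the two messaging apps and the two fitness apps end up in separate clusters because they share claimed features. Choosing `n_clusters` corresponds to picking a granularity, which the abstract reports as growing with category size.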

        Published in

        ESEM '16: Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
        September 2016
        457 pages
        ISBN: 9781450344272
        DOI: 10.1145/2961111

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        ESEM '16 paper acceptance rate: 27 of 122 submissions, 22%
        Overall acceptance rate: 130 of 594 submissions, 22%
