research-article

Personal insights for altering decisions of tree-based ensembles over time

Published: 01 February 2020

Abstract

Machine Learning models are prevalent in critical decisions that affect people, such as resume filtering and loan applications. Refused individuals naturally ask what could change the decision should they reapply. This question is hard for the model owner to answer: first, the model is typically complex and not easily interpretable; second, models may be updated periodically; and last, the attributes of the individual seeking approval are apt to change over time. While each of these challenges has been extensively studied in isolation, their conjunction has not.

To this end, we propose a novel framework that allows users to devise a plan of action for individuals in the presence of Machine Learning classification, where both the ML model and the properties of the individual are expected to change over time. Our technical solution is currently confined to a particular yet important class of models, namely tree-based ensembles (Random Forests, Gradient Boosted trees). In this setting, it uniquely combines state-of-the-art solutions for single-model interpretation, domain adaptation techniques for predicting future models, and constraint databases to represent and query the space of possible actions. We devise efficient algorithms that leverage these foundations in a novel solution, and experimentally show that they are effective in proposing useful and actionable steps leading to the desired classification.
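The abstract does not spell out the algorithm, but one ingredient can be pictured concretely: in a tree ensemble, every root-to-leaf path is a conjunction of interval constraints on the features, which is the kind of linear constraint a constraint database stores. The minimal Python sketch below (our own illustration, not the paper's method) enumerates those constraint boxes for a tiny ensemble and brute-forces the cheapest feature change that turns a refusal into an approval. It assumes a hypothetical `Leaf`/`Split` tree encoding, a plain majority vote, an L1 change cost, and invented thresholds over two features (income, debt). It ignores the temporal side of the problem (predicting future models, drifting attributes) entirely, and its running time is exponential in the number of trees.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple, Union


@dataclass
class Leaf:
    label: int  # class predicted by this leaf: 1 = approved, 0 = refused


@dataclass
class Split:
    feature: int      # index of the feature tested at this node
    threshold: float  # go left if x[feature] <= threshold, else right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]
Box = List[Tuple[float, float]]  # per feature (lo, hi], i.e. lo < x_f <= hi


def predict_tree(node: Node, x: List[float]) -> int:
    while isinstance(node, Split):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def predict(trees: List[Node], x: List[float]) -> int:
    """Majority vote of the ensemble; a tie counts as refusal."""
    return int(2 * sum(predict_tree(t, x) for t in trees) > len(trees))


def leaf_boxes(node: Node, box: Box) -> List[Tuple[Box, int]]:
    """Turn every root-to-leaf path into a conjunction of interval constraints."""
    if isinstance(node, Leaf):
        return [(box, node.label)]
    lo, hi = box[node.feature]
    left, right = list(box), list(box)
    left[node.feature] = (lo, min(hi, node.threshold))
    right[node.feature] = (max(lo, node.threshold), hi)
    return leaf_boxes(node.left, left) + leaf_boxes(node.right, right)


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Intersect two boxes; None if some feature interval becomes empty."""
    out = [(max(l1, l2), min(h1, h2)) for (l1, h1), (l2, h2) in zip(a, b)]
    return out if all(lo < hi for lo, hi in out) else None


def closest_point(x: List[float], box: Box, eps: float = 1e-6) -> List[float]:
    """Project x onto the box feature by feature (L1-optimal up to eps)."""
    y = []
    for xf, (lo, hi) in zip(x, box):
        if xf <= lo:  # lower bound is strict, so step just above it
            y.append(min(lo + eps, (lo + hi) / 2))
        elif xf > hi:
            y.append(hi)
        else:
            y.append(xf)
    return y


def cheapest_flip(trees: List[Node], x: List[float]):
    """Try one leaf per tree; return (cost, new_x) of the cheapest approval."""
    full: Box = [(-math.inf, math.inf)] * len(x)
    per_tree = [leaf_boxes(t, full) for t in trees]
    best = None
    for combo in product(*per_tree):
        if 2 * sum(label for _, label in combo) <= len(trees):
            continue  # this region would not give approval a strict majority
        region: Optional[Box] = full
        for box, _ in combo:
            region = intersect(region, box)
            if region is None:
                break  # these leaves can never be reached together
        if region is None:
            continue
        y = closest_point(x, region)
        cost = sum(abs(a - b) for a, b in zip(x, y))
        if best is None or cost < best[0]:
            best = (cost, y)
    return best


# Hypothetical ensemble over features [income, debt].
trees = [
    Split(0, 50, Leaf(0), Leaf(1)),
    Split(1, 20, Leaf(1), Leaf(0)),
    Split(0, 40, Leaf(0), Split(1, 30, Leaf(1), Leaf(0))),
]
x = [45, 25]
print(predict(trees, x), cheapest_flip(trees, x))
```

For the refused individual `[45, 25]`, the sketch reports a cost of 5: lowering debt from 25 to 20 flips the second tree and yields a 2-of-3 majority, which is slightly cheaper than raising income just above 50. A practical system would replace the brute-force product with a smarter search and would apply it to predicted future models rather than only the current one.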



  • Published in

    Proceedings of the VLDB Endowment, Volume 13, Issue 6
    February 2020
    170 pages
    ISSN: 2150-8097

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 February 2020
    Published in pvldb Volume 13, Issue 6

    Qualifiers

    • research-article
