
2016 | OriginalPaper | Book chapter

P2P Lending Analysis Using the Most Relevant Graph-Based Features

Authors: Lixin Cui, Lu Bai, Yue Wang, Xiao Bai, Zhihong Zhang, Edwin R. Hancock

Published in: Structural, Syntactic, and Statistical Pattern Recognition

Publisher: Springer International Publishing

Abstract

Peer-to-Peer (P2P) lending platforms are online services that facilitate borrowing and investment transactions. A central problem for these platforms is how to identify the factors most closely related to credit risk. This problem is inherently complex because of the various forms of risk and the many influencing factors involved. Moreover, raw P2P lending data are often high-dimensional, highly correlated, and unstable, which makes the problem even less tractable for traditional statistical and machine learning approaches. To address these problems, we develop a novel filter-based feature selection method for P2P lending analysis. Unlike most traditional feature selection methods, which use vectorial features, the proposed method uses graph-based features and thus incorporates the relationships between pairs of feature samples into the feature selection process. Since the graph-based features are by nature complete weighted graphs, we use the steady-state random walk to encapsulate their main characteristics. Specifically, for each graph we compute the probability distribution of the walk visiting the vertices. We then measure the discriminant power of each graph-based feature with respect to the target feature through the Jensen-Shannon divergence between the probability distributions from the random walks, and we select an optimal subset of features based on the most relevant graph-based features under this measure. Unlike most existing state-of-the-art feature selection methods, the proposed method can accommodate both continuous and discrete target features. Experiments on data from P2P lending platforms in China demonstrate the effectiveness and usefulness of the proposed feature selection algorithm.
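
The pipeline outlined in the abstract (graph-based feature, steady-state random walk, Jensen-Shannon divergence, ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the edge weight between two samples is taken to be the absolute difference of their feature values, a constant feature is mapped to a uniform distribution, and a smaller divergence from the target is read as higher relevance. All function names are hypothetical. The one standard fact used is that the stationary distribution of a random walk on an undirected weighted graph is proportional to the weighted vertex degrees.

```python
import math


def complete_graph_weights(values):
    """Complete weighted graph over the samples of one feature.

    Assumption: the edge weight is the absolute difference |x_i - x_j|.
    """
    return [[abs(a - b) for b in values] for a in values]


def steady_state(weights):
    """Steady-state visiting probabilities: weighted degree / total degree."""
    degrees = [sum(row) for row in weights]
    total = sum(degrees)
    n = len(degrees)
    if total == 0:
        # Assumption: a constant feature yields a uniform distribution.
        return [1.0 / n] * n
    return [d / total for d in degrees]


def shannon_entropy(p):
    """Shannon entropy (natural log); zero-probability entries contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: H((p+q)/2) - (H(p) + H(q)) / 2."""
    m = [0.5 * (a + b) for a, b in zip(p, q)]
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))


def rank_features(columns, target):
    """Rank feature columns by divergence from the target's distribution.

    Assumption: smaller divergence is treated as higher relevance.
    Returns (feature indices in ranked order, per-feature divergences).
    """
    t = steady_state(complete_graph_weights(target))
    scores = [
        js_divergence(steady_state(complete_graph_weights(c)), t)
        for c in columns
    ]
    order = sorted(range(len(columns)), key=lambda j: scores[j])
    return order, scores


if __name__ == "__main__":
    # Toy data: four samples, two candidate features, one target.
    target = [1.0, 2.0, 3.0, 4.0]
    columns = [[5.0, 1.0, 9.0, 2.0], [1.0, 2.0, 3.0, 4.0]]
    order, scores = rank_features(columns, target)
    print(order, scores)
```

Because the distributions are built from pairwise differences rather than from class labels, the same code path works whether the target is continuous or discrete, which mirrors the property claimed in the abstract.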


Metadata
Title
P2P Lending Analysis Using the Most Relevant Graph-Based Features
Authors
Lixin Cui
Lu Bai
Yue Wang
Xiao Bai
Zhihong Zhang
Edwin R. Hancock
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-49055-7_1
