Published in: World Wide Web 4/2016

01-07-2016

An effective contrast sequential pattern mining approach to taxpayer behavior analysis

Authors: Zhigang Zheng, Wei Wei, Chunming Liu, Wei Cao, Longbing Cao, Maninder Bhatia


Abstract

Data mining for client behavior analysis has become increasingly important in business. Further analysis of transactions and sequential behaviors would be of even greater value, especially in financial services such as banking and insurance, and in government. In a real-world business application of taxation debt collection, understanding the relationship between taxpayers’ sequential behaviors (payment, lodgment and actions) and their compliance with debt obligations requires finding the contrast sequential behavior patterns between compliant and non-compliant taxpayers. Contrast Patterns (CP) are defined as itemsets that show the difference or discrimination between two classes or datasets (Dong and Li, 1999). However, existing CP mining methods can only mine itemset patterns and are not suitable for mining sequential patterns, such as the time-ordered transactions in taxpayer sequential behaviors. Little work has been conducted on Contrast Sequential Pattern (CSP) mining so far. To address this issue, we develop a CSP mining approach, eCSP, that uses an effective CSP-tree structure, which improves the PrefixSpan tree (Pei et al., 2001) for mining contrast patterns. We propose heuristics and interestingness filtering criteria and integrate them seamlessly into the CSP-tree to reduce the search space and to find business-interesting patterns. The performance of the proposed approach is evaluated on three real-world datasets. In addition, a case study shows how the approach can be applied to analyse taxpayer behavior. The results show very promising performance and convincing business value.
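
The notion of a contrast sequential pattern can be illustrated with a minimal sketch. It is not the eCSP algorithm or the CSP-tree from the paper; it only scores one candidate pattern by brute force. The function names (`is_subsequence`, `support`, `contrast_rate`), the toy event sequences, and the use of a simple support ratio as the contrast measure are all assumptions made for illustration.

```python
from typing import List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so later items must appear later.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], dataset: List[Sequence[str]]) -> float:
    """Fraction of sequences in `dataset` that contain `pattern`."""
    if not dataset:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in dataset) / len(dataset)


def contrast_rate(pattern: Sequence[str],
                  target: List[Sequence[str]],
                  background: List[Sequence[str]]) -> float:
    """Ratio of support in `target` to support in `background` (assumed measure)."""
    sup_t = support(pattern, target)
    sup_b = support(pattern, background)
    if sup_b == 0.0:
        return float("inf") if sup_t > 0 else 0.0
    return sup_t / sup_b


# Hypothetical toy data: each list is one taxpayer's time-ordered events.
compliant = [
    ["lodgment", "payment", "payment"],
    ["action", "lodgment", "payment"],
    ["lodgment", "payment"],
    ["payment", "action"],
]
non_compliant = [
    ["action", "action", "lodgment"],
    ["lodgment", "action"],
    ["action", "payment", "action"],
    ["lodgment", "action", "payment"],
]

pattern = ["lodgment", "payment"]
print(support(pattern, compliant))        # support among compliant taxpayers
print(support(pattern, non_compliant))    # support among non-compliant taxpayers
print(contrast_rate(pattern, compliant, non_compliant))
```

A pattern whose ratio is well above 1 is more typical of the first class than of the second. Enumerating and scoring every candidate this way is exponential in pattern length, which is the search-space problem that a tree structure with pruning heuristics is meant to address.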

Literature
1. Agichtein, E., Zheng, Z.: Identifying best bet web search results by mining past user behavior. In: KDD 2006, pp. 902–908. ACM (2006)
2. Agrawal, R., Srikant, R.: Mining sequential patterns. In: ICDE, pp. 3–14 (1995)
3. Attenberg, J., Pandey, S., Suel, T.: Modeling and predicting user behavior in sponsored search. In: KDD 2009, pp. 1067–1076. ACM (2009)
4. Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential pattern mining using a bitmap representation. In: KDD 2002, pp. 429–435 (2002)
5. Bailey, J., Manoukian, T., Ramamohanarao, K.: Fast algorithms for mining emerging patterns. Prin. Data Min. Knowl. Disc. 2431, 187–208 (2002)
6. Bayardo, R.J.: Efficiently mining long patterns from databases. SIGMOD (1998)
7. Chan, S., Kao, B., Yip, C., Tang, M.: Mining emerging substrings. In: DASFAA 2003, pp. 119–126 (2003)
8. Cao, L.: Behavior informatics and analytics: Let behavior talk. In: ICDM 2008 Workshops, pp. 87–96 (2008)
9. Cao, L., Zhang, H., Zhao, Y., Luo, D., Zhang, C.: Combined mining: Discovering informative knowledge in complex data. IEEE Trans. Syst. Man. Cybern. B. Cybern. 41(3), 699–712 (2011)
10. Dong, G., Li, J.: Efficient mining of emerging patterns: discovering trends and differences. In: KDD 1999, pp. 43–52 (1999)
11. Dong, G., Li, J., Zhang, X.: Discovering jumping emerging patterns and experiments on real datasets. (IDC99) (1999)
12. Dong, G., Zhang, X., Wong, L., Li, J.: CAEP: Classification by aggregating emerging patterns. In: Discovery Science, vol. 1721, pp. 737–737 (1999)
13. Fan, H., Ramamohanarao, K.: Efficiently mining interesting emerging patterns. In: WAIM 2003, pp. 189–201 (2003)
14. Fan, H., Ramamohanarao, K.: Fast discovery and the generalization of strong jumping emerging patterns for building compact and accurate classifiers. TKDE 18(6), 721–737 (2006)
15. Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.-C.: FreeSpan: Frequent pattern-projected sequential pattern mining. In: KDD, pp. 355–359 (2000)
16. Ji, X., Bailey, J., Dong, G.: Mining minimal distinguishing subsequence patterns with gap constraints. Knowl. Inf. Syst. 11, 259–286 (2007)
17. Li, W., Han, J., Pei, J.: CMAR: Accurate and efficient classification based on multiple class-association rules. In: ICDM 2001, pp. 369–376 (2001)
18. Loekito, E., Bailey, J.: Fast mining of high dimensional expressive contrast patterns using binary decision diagrams. In: SIGKDD 2006, pp. 307–316 (2006)
19. Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Disc. 1(3), 41 (1997)
20. Mozer, M., Wolniewicz, R., Grimes, D., Johnson, E., Kaushansky, H.: Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry. IEEE Trans. Neural Netw. 11(3), 690–696 (2000)
21. Pasquier, N., Bastide, R., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. Information Systems 24(1) (1999)
22. Pei, J., Han, J., Asl, M.B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.C.: PrefixSpan: Mining sequential patterns efficiently by prefix projected pattern growth. In: ICDE, pp. 215–226 (2001)
23. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, Los Altos (1993)
24. Ramamohanarao, K., Bailey, J.: Emerging patterns: mining and applications. In: ICISIP 2004, pp. 409–414 (2004)
25. Wang, X., Duan, L., Dong, G., Yu, Z., Tang, C.: Efficient mining of density-aware distinguishing sequential patterns with gap constraints. DASFAA 372–387 (2014)
26. Zaki, M.J.: SPADE: An efficient algorithm for mining frequent sequence. Mach. Learn. 42, 31–60 (2001)
27. Zhao, Y., Zhang, H., Cao, L., Zhang, C., Bohlscheid, H.: Combined pattern mining: From learned rules to actionable knowledge. AI 393–403 (2008)

Metadata
Title: An effective contrast sequential pattern mining approach to taxpayer behavior analysis
Authors: Zhigang Zheng, Wei Wei, Chunming Liu, Wei Cao, Longbing Cao, Maninder Bhatia
Publication date: 01-07-2016
Publisher: Springer US
Published in: World Wide Web, Issue 4/2016
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI: https://doi.org/10.1007/s11280-015-0350-4