
2017 | OriginalPaper | Book chapter

Interactive Pattern Sampling for Characterizing Unlabeled Data

Authors: Arnaud Giacometti, Arnaud Soulet

Published in: Advances in Intelligent Data Analysis XVI

Publisher: Springer International Publishing


Abstract

Many data exploration tasks require a target class, but the data is not always labeled with respect to that class. Rather than relying on unsupervised methods or a labeling pre-processing step, this paper proposes an interactive system that discovers the target class and characterizes it at the same time. More precisely, we introduce a new interactive pattern mining method that learns which part of the dataset is of interest to the user. By integrating user feedback about patterns, the method aims to sample patterns with a probability proportional to their frequency in the interesting transactions. We demonstrate that it accurately identifies the target class if user feedback is consistent. Experiments also show that the method achieves good true and false positive rates, which makes it possible to present relevant patterns to the user.
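
As a rough illustration of the sampling goal stated in the abstract, the Python sketch below draws an itemset with probability proportional to its weighted frequency, using a generic two-step procedure: pick a transaction, then pick a uniform random subset of it. It is not the authors' algorithm. The function name `sample_pattern`, the toy transactions, and the per-transaction weights (standing in for how interesting each transaction is believed to be) are assumptions made for illustration, and the feedback-driven update of the weights is not shown.

```python
import random


def sample_pattern(transactions, weights, rng=random):
    """Draw one itemset X with P(X) proportional to the sum of weights[i]
    over all transactions[i] that contain X (weighted frequency).

    transactions: list of frozensets of items (hypothetical toy format)
    weights:      list of non-negative floats, one per transaction
    """
    # Step 1: pick a transaction t with probability proportional to
    # w(t) * 2^|t|, i.e. its weight times the number of its subsets.
    scores = [w * 2 ** len(t) for t, w in zip(transactions, weights)]
    t = rng.choices(transactions, weights=scores, k=1)[0]

    # Step 2: pick a subset of t uniformly at random by keeping each item
    # independently with probability 1/2. The 2^|t| factor from step 1
    # cancels, leaving P(X) proportional to the total weight of the
    # transactions that contain X. The empty itemset can be returned.
    return frozenset(item for item in sorted(t) if rng.random() < 0.5)


if __name__ == "__main__":
    data = [frozenset("abc"), frozenset("cd"), frozenset("abd")]
    interest = [1.0, 0.2, 0.9]  # assumed interest weights, not from the paper
    rng = random.Random(42)
    print([sorted(sample_pattern(data, interest, rng)) for _ in range(5)])
```

Under this sketch, a transaction with weight 0 can never contribute a pattern, while a weight of 1 makes it count fully, so known labels could be encoded directly in the weights.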


Footnotes
1
It is also possible to set weights to 0 or 1 if the labels of some transactions are already known.
 
Metadata
Title
Interactive Pattern Sampling for Characterizing Unlabeled Data
Authors
Arnaud Giacometti
Arnaud Soulet
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-68765-0_9
