Skip to main content
Erschienen in:
Buchtitelbild

2020 | OriginalPaper | Buchkapitel

Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data

verfasst von : Paweł Teisseyre, Jan Mielniczuk, Małgorzata Łazęcka

Erschienen in: Computational Science – ICCS 2020

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In the paper we revisit the problem of fitting logistic regression to positive and unlabelled data. There are two key contributions. First, a new light is shed on the properties of frequently used naive method (in which unlabelled examples are treated as negative). In particular we show that naive method is related to incorrect specification of the logistic model and consequently the parameters in naive method are shrunk towards zero. An interesting relationship between shrinkage parameter and label frequency is established. Second, we introduce a novel method of fitting logistic model based on simultaneous estimation of vector of coefficients and label frequency. Importantly, the proposed method does not require prior estimation, which is a major obstacle in positive unlabelled learning. The method is superior in predicting posterior probability to both naive method and weighted likelihood method for several benchmark data sets. Moreover, it yields consistently better estimator of label frequency than other two known methods. We also introduce simple but powerful representation of positive and unlabelled data under Selected Completely at Random assumption which yields straightforwardly most properties of such model.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bekker, J., Davis, J.: Learning from positive and unlabeled data: a survey (2018) Bekker, J., Davis, J.: Learning from positive and unlabeled data: a survey (2018)
2.
Zurück zum Zitat Sechidis, K., Sperrin, M., Petherick, E.S., Lująn, M., Brown, G.: Dealing with under-reported variables: an information theoretic solution. Int. J. Approx. Reason. 85, 159–177 (2017)MathSciNetCrossRef Sechidis, K., Sperrin, M., Petherick, E.S., Lująn, M., Brown, G.: Dealing with under-reported variables: an information theoretic solution. Int. J. Approx. Reason. 85, 159–177 (2017)MathSciNetCrossRef
3.
Zurück zum Zitat Onur, I., Velamuri, M.: The gap between self-reported and objective measures of disease status in India. PLOS ONE 13(8), 1–18 (2018)CrossRef Onur, I., Velamuri, M.: The gap between self-reported and objective measures of disease status in India. PLOS ONE 13(8), 1–18 (2018)CrossRef
4.
Zurück zum Zitat Liu, B., Dai, Y., Li, X., Lee, W.S., Yu, P.S.: Building text classifiers using positive and unlabeled examples. In: Proceedings of the Third IEEE International Conference on Data Mining, ICDM 2003, p. 179 (2003) Liu, B., Dai, Y., Li, X., Lee, W.S., Yu, P.S.: Building text classifiers using positive and unlabeled examples. In: Proceedings of the Third IEEE International Conference on Data Mining, ICDM 2003, p. 179 (2003)
5.
Zurück zum Zitat Fung, G.P.C., Yu, J.X., Lu, H., Yu, P.S.: Text classification without negative examples revisit. IEEE Trans. Knowl. Data Eng. 18(1), 6–20 (2006)CrossRef Fung, G.P.C., Yu, J.X., Lu, H., Yu, P.S.: Text classification without negative examples revisit. IEEE Trans. Knowl. Data Eng. 18(1), 6–20 (2006)CrossRef
6.
Zurück zum Zitat Li, X., Liu, B.: Learning to classify texts using positive and unlabeled data. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence, pp. 587–592 (2003) Li, X., Liu, B.: Learning to classify texts using positive and unlabeled data. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence, pp. 587–592 (2003)
7.
Zurück zum Zitat Mordelet, F., Vert, J.-P.: ProDiGe: prioritization of disease genes with multitask machine learning from positive and unlabeled examples. BMC Bioinformatics 12(1), 389 (2011)CrossRef Mordelet, F., Vert, J.-P.: ProDiGe: prioritization of disease genes with multitask machine learning from positive and unlabeled examples. BMC Bioinformatics 12(1), 389 (2011)CrossRef
8.
Zurück zum Zitat Cerulo, L., Elkan, C., Ceccarelli, M.: Learning gene regulatory networks from only positive and unlabeled data. BMC Bioinformatics 11, 228 (2010)CrossRef Cerulo, L., Elkan, C., Ceccarelli, M.: Learning gene regulatory networks from only positive and unlabeled data. BMC Bioinformatics 11, 228 (2010)CrossRef
9.
Zurück zum Zitat Elkan, C., Noto, K.: Learning classifiers from only positive and unlabeled data. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, pp. 213–220 (2008) Elkan, C., Noto, K.: Learning classifiers from only positive and unlabeled data. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, pp. 213–220 (2008)
11.
Zurück zum Zitat Bekker, J., Davis, J.: Estimating the class prior in positive and unlabeled data through decision tree induction. In: Proceedings of the 32th AAAI Conference on Artificial Intelligence, February 2018 Bekker, J., Davis, J.: Estimating the class prior in positive and unlabeled data through decision tree induction. In: Proceedings of the 32th AAAI Conference on Artificial Intelligence, February 2018
12.
Zurück zum Zitat Steinberg, D., Cardell, N.S.: Estimating logistic regression models when the dependent variable has no variance. Commun. Stat. Theory Methods 21(2), 423–450 (1992)CrossRef Steinberg, D., Cardell, N.S.: Estimating logistic regression models when the dependent variable has no variance. Commun. Stat. Theory Methods 21(2), 423–450 (1992)CrossRef
13.
Zurück zum Zitat Lancaster, T., Imbens, G.: Case-control studies with contaminated controls. J. Econom. 71(1), 145–160 (1996)MathSciNetCrossRef Lancaster, T., Imbens, G.: Case-control studies with contaminated controls. J. Econom. 71(1), 145–160 (1996)MathSciNetCrossRef
14.
Zurück zum Zitat Kiryo, R., Niu, G., du Plessis, M.C., Sugiyama, M.: Positive-unlabeled learning with non-negative risk estimator. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS 2017, pp. 1674–1684 (2017) Kiryo, R., Niu, G., du Plessis, M.C., Sugiyama, M.: Positive-unlabeled learning with non-negative risk estimator. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS 2017, pp. 1674–1684 (2017)
15.
Zurück zum Zitat Denis, F., Gilleron, R., Letouzey, F.: Learning from positive and unlabeled examples. Theoret. Comput. Sci. 348(1), 70–83 (2005)MathSciNetCrossRef Denis, F., Gilleron, R., Letouzey, F.: Learning from positive and unlabeled examples. Theoret. Comput. Sci. 348(1), 70–83 (2005)MathSciNetCrossRef
16.
Zurück zum Zitat Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. The MIT Press, Cambridge (2010) Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. The MIT Press, Cambridge (2010)
17.
Zurück zum Zitat Candès, E., Fan, Y., Janson, L., Lv, J.: Panning for gold: model-x knockoffs for high-dimensional controlled variable selection. Manuscript (2018) Candès, E., Fan, Y., Janson, L., Lv, J.: Panning for gold: model-x knockoffs for high-dimensional controlled variable selection. Manuscript (2018)
18.
Zurück zum Zitat Gottschalk, P.G., Dunn, J.R.: The five-parameter logistic: a characterization and comparison with the four-parameter logistic. Anal. Biochem. 343(1), 54–65 (2005)CrossRef Gottschalk, P.G., Dunn, J.R.: The five-parameter logistic: a characterization and comparison with the four-parameter logistic. Anal. Biochem. 343(1), 54–65 (2005)CrossRef
19.
Zurück zum Zitat Mielniczuk, J., Teisseyre, P.: What do we choose when we err? Model selection and testing for misspecified logistic regression revisited. In: Matwin, S., Mielniczuk, J. (eds.) Challenges in Computational Statistics and Data Mining. SCI, vol. 605, pp. 271–296. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-18781-5_15CrossRef Mielniczuk, J., Teisseyre, P.: What do we choose when we err? Model selection and testing for misspecified logistic regression revisited. In: Matwin, S., Mielniczuk, J. (eds.) Challenges in Computational Statistics and Data Mining. SCI, vol. 605, pp. 271–296. Springer, Cham (2016). https://​doi.​org/​10.​1007/​978-3-319-18781-5_​15CrossRef
20.
Zurück zum Zitat Kubkowski, M., Mielniczuk, J.: Active set of predictors for misspecified logistic regression. Statistics 51, 1023–1045 (2017)MathSciNetCrossRef Kubkowski, M., Mielniczuk, J.: Active set of predictors for misspecified logistic regression. Statistics 51, 1023–1045 (2017)MathSciNetCrossRef
Metadaten
Titel
Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data
verfasst von
Paweł Teisseyre
Jan Mielniczuk
Małgorzata Łazęcka
Copyright-Jahr
2020
DOI
https://doi.org/10.1007/978-3-030-50423-6_1