
2018 | OriginalPaper | Book chapter

An Effective Method for Identifying Unknown Unknowns with Noisy Oracle

Authors: Bo Zheng, Xin Lin, Yanghua Xiao, Jing Yang, Liang He

Published in: Case-Based Reasoning Research and Development

Publisher: Springer International Publishing


Abstract

Unknown Unknowns (UUs) are erroneous predictions that a model makes with high confidence. Identifying UUs is important for understanding the limitations of predictive models. Several existing methods identify UUs effectively, but all of them assume a perfect Oracle that returns the correct labels of the UUs. This assumption is impractical, since no perfect Oracle exists in the real world: even experts make mistakes when labelling UUs. Such errors can have serious consequences, because fake UUs mislead the existing algorithms and degrade their performance. In this paper, we identify the impact of a noisy Oracle and propose a UU identification algorithm that adapts to the noisy-Oracle setting. Experimental results demonstrate the effectiveness of the proposed method.
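The problem the abstract describes can be made concrete with a small simulation. The sketch below assumes every prediction passed in is already high-confidence, and models the noisy Oracle as returning a uniformly random wrong label with probability `error_rate`. This noise model and the names `noisy_oracle` and `query_uus` are hypothetical illustrations, not the method or noise model from the paper. Indices the Oracle reports that are not real UUs are fake UUs; real UUs it fails to report are missed.

```python
import random

def noisy_oracle(true_label, num_classes, error_rate, rng):
    """Hypothetical noise model: return the true label with probability
    1 - error_rate, otherwise a uniformly chosen wrong label."""
    if rng.random() < error_rate:
        wrong_labels = [c for c in range(num_classes) if c != true_label]
        return rng.choice(wrong_labels)
    return true_label

def query_uus(predictions, true_labels, num_classes, error_rate, seed=0):
    """Ask the (noisy) oracle about each high-confidence prediction.

    Returns (reported, actual): the indices the oracle flags as UUs and the
    indices that really are UUs (prediction differs from the ground truth).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    reported, actual = set(), set()
    for i, (pred, truth) in enumerate(zip(predictions, true_labels)):
        if pred != truth:
            actual.add(i)  # a true UU: confident but wrong
        if noisy_oracle(truth, num_classes, error_rate, rng) != pred:
            reported.add(i)  # the oracle's label disagrees with the model
    return reported, actual

if __name__ == "__main__":
    preds = [0, 1, 1, 0, 1, 0]
    truth = [0, 1, 0, 0, 1, 1]
    reported, actual = query_uus(preds, truth, num_classes=2, error_rate=0.3)
    print("fake UUs:", sorted(reported - actual))    # reported but not real
    print("missed UUs:", sorted(actual - reported))  # real but not reported
```

With `error_rate=0.0` the reported and actual sets coincide. Any positive rate can introduce both fake and missed UUs, and an algorithm that trusts the reported set inherits those errors.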


Footnotes
4
The threshold \(\tau \) deserves discussion, since different thresholds yield different search spaces. However, we tried several candidate values, such as 0.70 and 0.75, in our experiments and the results were essentially consistent, so we adopt the value used in previous work [3, 14] without further discussion.
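To illustrate why the choice of \(\tau \) changes the search space, the sketch below assumes the search space is simply the set of instances whose predicted confidence is at least \(\tau \). That definition, the function name `build_search_space`, and the confidence values are hypothetical illustrations, not taken from the paper.

```python
def build_search_space(confidences, tau):
    """Return indices of instances whose model confidence is at least tau.

    Assumption for illustration: the search space is the set of
    high-confidence predictions, so a larger tau gives a smaller space.
    """
    return [i for i, c in enumerate(confidences) if c >= tau]

if __name__ == "__main__":
    # Hypothetical confidence scores for six test instances.
    confidences = [0.55, 0.68, 0.71, 0.74, 0.80, 0.93]
    print(build_search_space(confidences, 0.70))  # [2, 3, 4, 5]
    print(build_search_space(confidences, 0.75))  # [4, 5]
```

Raising \(\tau \) from 0.70 to 0.75 drops the instances at 0.71 and 0.74, so the two thresholds define different candidate sets even though both keep the most confident predictions.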
 
References
1. Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety (2016)
2. Attenberg, J., Ipeirotis, P., Provost, F.: Beat the machine: challenging humans to find a predictive model's unknown unknowns. J. Data Inf. Qual. (JDIQ) 6(1), 1 (2015)
3. Bansal, G., Weld, D.S.: A coverage-based utility model for identifying unknown unknowns. In: AAAI (2018)
4. Blitzer, J., Dredze, M., Pereira, F.: Biographies, Bollywood, boom-boxes and blenders: domain adaptation for sentiment classification. In: ACL, pp. 187–205 (2007)
5. Chandola, V., Banerjee, A., Kumar, V.: Outlier detection: a survey. ACM Comput. Surv. (2007)
6. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. (CSUR) 41(3), 15 (2009)
7. Elkan, C.: The foundations of cost-sensitive learning. In: IJCAI, vol. 17, pp. 973–978. Lawrence Erlbaum Associates Ltd. (2001)
8. Glorot, X., Bordes, A., Bengio, Y.: Domain adaptation for large-scale sentiment classification: a deep learning approach. In: ICML, pp. 513–520 (2011)
9. Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Burlington (2011)
10. Hinton, G., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
11. Hsueh, P.Y., Melville, P., Sindhwani, V.: Data quality from crowdsourcing: a study of annotation selection criteria. In: NAACL HLT Workshop on Active Learning and NLP, pp. 27–35. Association for Computational Linguistics (2009)
12. Hu, R., Delany, S., MacNamee, B.: Sampling with confidence: using K-NN confidence measures in active learning. In: ICCBR, p. 50 (2009)
13. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)
14. Lakkaraju, H., Kamar, E., Caruana, R., Horvitz, E.: Identifying unknown unknowns in the open world: representations and policies for guided exploration. In: AAAI, pp. 2124–2132 (2017)
16. McAuley, J., Pandey, R., Leskovec, J.: Inferring networks of substitutable and complementary products. In: KDD, pp. 785–794. ACM (2015)
17. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
18. Pan, S.J., Yang, Q.: A survey on transfer learning. TKDE 22(10), 1345–1359 (2010)
19. Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: ACL, p. 271. Association for Computational Linguistics (2004)
20. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: ACL, pp. 115–124. Association for Computational Linguistics (2005)
21. Ribeiro, M.T., Singh, S., Guestrin, C.: Why should I trust you? Explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM (2016)
23. Settles, B.: Active learning literature survey. University of Wisconsin, Madison 52(55–66), 11 (2010)
24. Settles, B., Craven, M., Ray, S.: Multiple-instance active learning. In: NIPS, pp. 1289–1296 (2008)
25. Seung, H.S., Opper, M., Sompolinsky, H.: Query by committee. In: COLT, pp. 287–294. ACM (1992)
26. Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? Improving data quality and data mining using multiple, noisy labelers. In: KDD, pp. 614–622. ACM (2008)
27. Shimodaira, H.: Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Stat. Plan. Inference 90(2), 227–244 (2000)
28. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484 (2016)
29. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, vol. 1. MIT Press, Cambridge (1998)
30. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: lessons learned from the 2015 MSCOCO image captioning challenge. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 652–663 (2017)
31. Zhang, T., Oles, F.: The value of unlabeled data for classification problems. In: ICML, pp. 1191–1198. Citeseer (2000)
32. Zhu, X., Lafferty, J., Ghahramani, Z.: Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. In: ICML 2003 Workshop on the Continuum From Labeled to Unlabeled Data in Machine Learning and Data Mining, vol. 3 (2003)
Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-01081-2_32