
2018 | Original Paper | Book Chapter

An Effective Method for Identifying Unknown Unknowns with Noisy Oracle

Written by: Bo Zheng, Xin Lin, Yanghua Xiao, Jing Yang, Liang He

Published in: Case-Based Reasoning Research and Development

Publisher: Springer International Publishing


Abstract

Unknown Unknowns (UUs) are erroneous predictions made with high confidence. Identifying UUs is important for understanding the limitations of predictive models. Several proposed solutions identify UUs effectively, but all of them assume a perfect Oracle that returns the correct labels of the UUs. This assumption is impractical, since no perfect Oracle exists in the real world: even experts make mistakes when labelling UUs. Such errors have serious consequences, because fake UUs mislead existing algorithms and reduce their performance. In this paper, we identify the impact of a noisy Oracle and propose a UU identification algorithm that adapts to the noisy-Oracle setting. Experimental results demonstrate the effectiveness of the proposed method.
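
The definition of a UU and the effect of a noisy Oracle can be illustrated with a minimal sketch. It assumes a binary classifier that returns a label together with a confidence, and it models the noisy Oracle as one that flips the true label with a fixed probability. The names `is_unknown_unknown`, `noisy_oracle`, `reported_uus`, and `flip_prob`, the threshold of 0.7, and the five instances are illustrative assumptions, not the paper's notation or data.

```python
import random

def is_unknown_unknown(predicted, confidence, label, tau=0.7):
    """A UU is a wrong prediction made with confidence of at least tau."""
    return confidence >= tau and predicted != label

def noisy_oracle(true_label, flip_prob, rng):
    """Return the true binary label, flipped with probability flip_prob."""
    return 1 - true_label if rng.random() < flip_prob else true_label

def reported_uus(instances, flip_prob, rng, tau=0.7):
    """Indices that the Oracle's (possibly wrong) labels mark as UUs."""
    return [
        i for i, (pred, conf, truth) in enumerate(instances)
        if is_unknown_unknown(pred, conf, noisy_oracle(truth, flip_prob, rng), tau)
    ]

# Hypothetical (predicted label, confidence, true label) triples.
instances = [
    (1, 0.95, 1),  # confident and correct
    (1, 0.90, 0),  # confident and wrong: a real UU
    (0, 0.55, 1),  # wrong, but below the confidence threshold
    (1, 0.80, 1),  # confident and correct
    (0, 0.85, 0),  # confident and correct
]

rng = random.Random(0)
# A perfect Oracle never flips a label, so only the real UU is reported.
perfect = reported_uus(instances, flip_prob=0.0, rng=rng)
# An Oracle that always errs turns every confident, correct prediction
# into a fake UU and hides the real one.
always_wrong = reported_uus(instances, flip_prob=1.0, rng=rng)
print(perfect, always_wrong)
```

Intermediate values of `flip_prob` mix real and fake UUs in the reported set, which is the situation the abstract describes as misleading existing algorithms.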


http://aiweb.cs.washington.edu/ai/unkunk18/.

https://www.kaggle.com/c/dogs-vs-cats/data.

http://scikit-learn.org/.

Actually, \(\tau\) is a parameter worth discussing, since different thresholds construct different search spaces. However, we tried several candidate values, such as 0.70 and 0.75, in our experiments, and the results were basically consistent, so we use the value from previous works [3, 14] without further discussion.
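
How the threshold shapes the search space can be sketched as follows. The sketch assumes that the search space is the set of instances whose predicted confidence is at least \(\tau\); this reading, the function name `search_space`, and the confidence values are assumptions made for illustration and are not taken from the paper.

```python
def search_space(confidences, tau):
    """Indices of instances predicted with confidence of at least tau."""
    return [i for i, c in enumerate(confidences) if c >= tau]

# Hypothetical classifier confidences for seven instances.
confidences = [0.62, 0.68, 0.72, 0.74, 0.77, 0.91, 0.99]

# The two candidate thresholds named in the footnote.
space_070 = search_space(confidences, 0.70)
space_075 = search_space(confidences, 0.75)
print(len(space_070), len(space_075))
```

A higher threshold yields a subset of the search space obtained with a lower one, so the candidate values differ only in the borderline instances they admit.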

https://en.wikipedia.org/wiki/Elbow_method_(clustering).

1. Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in ai safety (2016)

2. Attenberg, J., Ipeirotis, P., Provost, F.: Beat the machine: challenging humans to find a predictive model’s unknown unknowns. J. Data Inf. Qual. (JDIQ) 6(1), 1 (2015)

3. Bansal, G., Weld, D.S.: A coverage-based utility model for identifying unknown unknowns. In: AAAI (2018)

4. Blitzer, J., Dredze, M., Pereira, F.: Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification. In: ACL, pp. 187–205 (2007)

5. Chandola, V., Banerjee, A., Kumar, V.: Outlier detection: a survey. ACM Comput. Surv. (2007)

6. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. (CSUR) 41(3), 15 (2009)

7. Elkan, C.: The foundations of cost-sensitive learning. In: IJCAI, vol. 17, pp. 973–978. Lawrence Erlbaum Associates Ltd. (2001)

8. Glorot, X., Bordes, A., Bengio, Y.: Domain adaptation for large-scale sentiment classification: a deep learning approach. In: ICML, pp. 513–520 (2011)

9. Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Burlington (2011)

10. Hinton, G., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)

11. Hsueh, P.Y., Melville, P., Sindhwani, V.: Data quality from crowdsourcing: a study of annotation selection criteria. In: NAACL HLT Workshop on Active Learning and NLP, pp. 27–35. Association for Computational Linguistics (2009)

12. Hu, R., Delany, S., MacNamee, B.: Sampling with confidence: using K-NN confidence measures in active learning. In: ICCBR, p. 50 (2009)

13. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)

14. Lakkaraju, H., Kamar, E., Caruana, R., Horvitz, E.: Identifying unknown unknowns in the open world: representations and policies for guided exploration. In: AAAI, pp. 2124–2132 (2017)

15. Lewis, D.D., Gale, W.A.: A sequential algorithm for training text classifiers. In: Croft, B.W., van Rijsbergen, C.J. (eds.) SIGIR, pp. 3–12. Springer, New York (1994). https://doi.org/10.1007/978-1-4471-2099-5_1

16. McAuley, J., Pandey, R., Leskovec, J.: Inferring networks of substitutable and complementary products. In: KDD, pp. 785–794. ACM (2015)

17. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

18. Pan, S.J., Yang, Q.: A survey on transfer learning. TKDE 22(10), 1345–1359 (2010)

19. Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: ACL, p. 271. Association for Computational Linguistics (2004)

20. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: ACL, pp. 115–124. Association for Computational Linguistics (2005)

21. Ribeiro, M.T., Singh, S., Guestrin, C.: Why should i trust you? Explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM (2016)

22. Sani, S., Wiratunga, N., Massie, S., Cooper, K.: kNN sampling for personalised human activity recognition. In: Aha, D.W., Lieber, J. (eds.) ICCBR 2017. LNCS (LNAI), vol. 10339, pp. 330–344. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61030-6_23

23. Settles, B.: Active learning literature survey. University of Wisconsin, Madison 52(55–66), 11 (2010)

24. Settles, B., Craven, M., Ray, S.: Multiple-instance active learning. In: NIPS, pp. 1289–1296 (2008)

25. Seung, H.S., Opper, M., Sompolinsky, H.: Query by committee. In: COLT, pp. 287–294. ACM (1992)

26. Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? Improving data quality and data mining using multiple, noisy labelers. In: KDD, pp. 614–622. ACM (2008)

27. Shimodaira, H.: Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Stat. Plan. Inference 90(2), 227–244 (2000)

28. Silver, D., et al.: Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484 (2016)

29. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, vol. 1. MIT Press, Cambridge (1998)

30. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: lessons learned from the 2015 mscoco image captioning challenge. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 652–663 (2017)

31. Zhang, T., Oles, F.: The value of unlabeled data for classification problems. In: ICML, pp. 1191–1198. Citeseer (2000)

32. Zhu, X., Lafferty, J., Ghahramani, Z.: Combining active learning and semi-supervised learning using gaussian fields and harmonic functions. In: ICML 2003 Workshop on the Continuum From Labeled to Unlabeled Data in Machine Learning and Data Mining, vol. 3 (2003)

Title: An Effective Method for Identifying Unknown Unknowns with Noisy Oracle
Written by: Bo Zheng, Xin Lin, Yanghua Xiao, Jing Yang, Liang He
Publisher: Springer International Publishing
Book: Case-Based Reasoning Research and Development
Print ISBN: 978-3-030-01080-5
Electronic ISBN: 978-3-030-01081-2
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-030-01081-2_32
