International Journal of Machine Learning and Cybernetics

01.12.2010 | Original Article

Margin-based active learning for structured predictions

Published in: International Journal of Machine Learning and Cybernetics | Issue 1-4/2010

Abstract

Margin-based active learning remains the most widely used active learning paradigm due to its simplicity and empirical successes. However, most works are limited to binary or multiclass prediction problems, thus restricting the applicability of these approaches to many complex prediction problems where active learning would be most useful. For example, machine learning techniques for natural language processing applications often require combining multiple interdependent prediction problems, generally referred to as learning in structured output spaces. In many such application domains, complexity is further managed by decomposing a complex prediction into a sequence of predictions where earlier predictions are used as input to later predictions, commonly referred to as a pipeline model. This work describes methods for extending existing margin-based active learning techniques to these two settings, thus increasing the scope of problems to which active learning can be applied. We empirically validate these proposed active learning techniques by reducing the annotated data requirements on multiple instances of synthetic data, a semantic role labeling task, and a named entity and relation extraction system.
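To make the basic idea concrete, the following is a minimal sketch of margin-based instance selection in the plain multiclass setting that the abstract takes as its starting point. It is not the authors' structured or pipeline method. It assumes a classifier that produces one score per class, defines the margin as the gap between the two highest scores, and queries the pool instances with the smallest margins. The function names `multiclass_margin` and `select_queries`, the `batch_size` parameter, and the example scores are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def multiclass_margin(scores: Sequence[float]) -> float:
    """Gap between the highest and second-highest class score (assumed margin definition)."""
    if len(scores) < 2:
        raise ValueError("need at least two class scores")
    top, runner_up = sorted(scores, reverse=True)[:2]
    return top - runner_up


def select_queries(pool_scores: List[Sequence[float]], batch_size: int) -> List[int]:
    """Return indices of the batch_size unlabeled instances with the smallest margin."""
    ranked = sorted(
        range(len(pool_scores)),
        key=lambda i: multiclass_margin(pool_scores[i]),
    )
    return ranked[:batch_size]


# Hypothetical per-class scores for four unlabeled instances.
pool = [
    [2.0, 0.1, -1.0],   # confident prediction, large margin
    [0.6, 0.5, -0.3],   # two classes nearly tied, small margin
    [1.2, -0.4, 0.9],   # moderately uncertain
    [3.0, -2.0, -2.5],  # very confident
]

# The two least certain instances are the ones sent to the annotator.
queries = select_queries(pool, batch_size=2)
print(queries)
```

In a full active learning loop, the selected instances would be labeled, added to the training set, and the classifier retrained before the next round of selection.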

\(I[\![ p ]\!]\) is an indicator function such that \(I[\![ p ]\!] = 1\) if \(p\) is true and 0 otherwise.

Empirical discrepancies between the performance reported in this work and that of [54] are accounted for by the use of the averaged Perceptron and smaller batch sizes during instance selection.

1.

Abney S (2002) Bootstrapping. In: Proceedings of the annual meeting of the association for computational linguistics (ACL), pp 360–367

2.

Allwein EL, Schapire RE, Singer Y (2000) Reducing multiclass to binary: a unifying approach for margin classifiers. J Mach Learn Res 1:113–141

3.

Anderson B, Moore A (2005) Active learning for hidden Markov models: objective functions and algorithms. In: Proceedings of the international conference on machine learning (ICML), pp 9–16

4.

Angluin D (1988) Queries and concept learning. Mach Learn 2(4):319–342

5.

Balcan M-F, Beygelzimer A, Langford J (2006) Agnostic active learning. In: Proceedings of the international conference on machine learning (ICML), pp 65–72

6.

Balcan M-F, Broder A, Zhang T (2007) Margin-based active learning. In: Proceedings of the annual ACM workshop on computational learning theory (COLT), pp 35–50

7.

Balcan M-F, Hanneke S, Wortman J (2008) The true sample complexity of active learning. In: Proceedings of the annual ACM workshop on computational learning theory (COLT), pp 45–56

8.

Baldridge J, Osborne M (2004) Active learning and the total cost of annotation. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 9–16

9.

Baram Y, El-Yaniv R, Luz K (2004) Online choice of active learning algorithms. J Mach Learn Res 5:255–291

10.

Becker M (2008) Active learning: an explicit treatment of unreliable parameters. PhD thesis, University of Edinburgh

11.

Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: Proceedings of the annual ACM workshop on computational learning theory (COLT), pp 92–100

12.

Brinker K (2004) Active learning of label ranking functions. In: Proceedings of the international conference on machine learning (ICML), pp 129–136

13.

Bunescu RC (2008) Learning with probabilistic features for improved pipeline models. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 670–679

14.

Campbell C, Cristianini N, Smola A (2000) Query learning with large margin classifiers. In: Proceedings of the international conference on machine learning (ICML), pp 111–118

15.

Carreras X, Màrquez L (2004) Introduction to the CoNLL-2004 shared task: semantic role labeling. In: Proceedings of the annual conference on computational natural language learning (CoNLL)

16.

Castro RM, Nowak RD (2007) Minimax bounds for active learning. In: Proceedings of the annual ACM workshop on computational learning theory (COLT), pp 5–19

17.

Chan YS, Ng HT (2007) Domain adaptation with active learning for word sense disambiguation. In: Proceedings of the annual meeting of the association for computational linguistics (ACL), pp 49–56

18.

Chang M-W, Do Q, Roth D (2006) Multilingual dependency parsing: a pipeline approach. In: Recent advances in natural language processing. Springer, Berlin, pp 195–204

19.

Chang M-W, Ratinov L, Rizzolo N, Roth D (2008) Learning and inference with constraints. In: Proceedings of the national conference on artificial intelligence (AAAI), pp 1513–1518

20.

Cohn D, Atlas L, Ladner R (1994) Improving generalization with active learning. Mach Learn 15(2):201–222

21.

Cohn DA, Ghahramani Z, Jordan MI (1996) Active learning with statistical models. J Artif Intell Res 4:129–145

22.

Collins M (2002) Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 1–8

23.

Culotta A, McCallum A (2005) Reducing labeling effort for structured prediction tasks. In: Proceedings of the national conference on artificial intelligence (AAAI), pp 746–751

24.

Dagan I, Engelson SP (1995) Committee-based sampling for training probabilistic classifiers. In: Proceedings of the international conference on machine learning (ICML), pp 150–157

25.

Dasgupta S (2004) Analysis of a greedy active learning strategy. In: The conference on advances in neural information processing systems (NIPS), pp 337–344

26.

Dasgupta S, Hsu D, Monteleoni C (2007) A general agnostic active learning algorithm. In: The conference on advances in neural information processing systems (NIPS), vol 20, pp 353–360

27.

Dasgupta S, Kalai AT, Monteleoni C (2005) Analysis of perceptron-based active learning. In: Proceedings of the annual ACM workshop on computational learning theory (COLT), pp 249–263

28.

Daumé III H, Langford J, Marcu D (2009) Search-based structured prediction. Mach Learn 75(3):297–325

29.

Davis PC (2002) Stone soup translation: the linked automata model. PhD thesis, Ohio State University

30.

Donmez P, Carbonell J (2008) Optimizing estimated loss reduction for active sampling in rank learning. In: Proceedings of the international conference on machine learning (ICML), pp 248–255

31.

Donmez P, Carbonell JG, Bennett PN (2007) Dual strategy active learning. In: Proceedings of the European conference on machine learning (ECML), pp 116–127

32.

Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley-Interscience, New York

33.

Finkel JR, Manning CD, Ng AY (2006) Solving the problem of cascading errors: approximate Bayesian inference for linguistic annotation pipelines. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 618–626

34.

Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139

35.

Freund Y, Schapire RE (1999) Large margin classification using the perceptron algorithm. Mach Learn 37(3):277–296

36.

Godbole S, Harpale A, Sarawagi S, Chakrabarti S (2004) Document classification through interactive supervision of document and term labels. In: Proceedings of the European conference on principles and practice of knowledge discovery in databases (PKDD), pp 185–196

37.

Hanneke S (2007) A bound on the label complexity of agnostic active learning. In: Proceedings of the international conference on machine learning (ICML), pp 353–360

38.

Hanneke S (2007) Teaching dimension and the complexity of active learning. In: Proceedings of the annual ACM workshop on computational learning theory (COLT), pp 66–81

39.

Har-Peled S, Roth D, Zimak D (2002) Constraint classification for multiclass classification and ranking. In: The conference on advances in neural information processing systems (NIPS), pp 785–792

40.

Hinton G, Sejnowski TJ (1999) Unsupervised learning: foundations of neural computation. MIT Press, Cambridge

41.

Hwa R (2004) Sample selection for statistical parsing. Comput Linguist 30(3):253–276

42.

Kearns MJ, Schapire RE, Sellie LM (1994) Toward efficient agnostic learning. Mach Learn 17(2–3):115–141

43.

Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the international conference on machine learning (ICML), pp 282–289

44.

Laws F, Schütze H (2008) Stopping criteria for active learning of named entity recognition. In: Proceedings of the international conference on computational linguistics (COLING), pp 465–472

45.

Luo T, Kramer K, Goldgof DB, Hall LO, Samson S, Remsen A, Hopkins T (2005) Active learning to recognize multiple types of plankton. J Mach Learn Res 6:589–613

46.

Nguyen HT, Smeulders A (2004) Active learning using pre-clustering. In: Proceedings of the international conference on machine learning (ICML), pp 623–630

47.

Och FJ, Tillmann C, Ney H (1999) Improved alignment models for statistical machine translation. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 20–28

48.

Olsson F (2009) A literature survey of active machine learning in the context of natural language processing. Technical report, Swedish Institute of Computer Science

49.

Punyakanok V, Roth D, Yih W, Zimak D (2005) Learning and inference over constrained output. In: Proceedings of the international joint conference on artificial intelligence (IJCAI), pp 1124–1129

50.

Punyakanok V, Roth D, Yih W, Zimak D (2004) Semantic role labeling via integer linear programming inference. In: Proceedings of the international conference on computational linguistics (COLING)

51.

Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Francisco

52.

Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286

53.

Rai P, Saha A, Daumé III H, Venkatasubramanian S (2010) Domain adaptation meets active learning. In: NAACL workshop on active learning for NLP (ALNLP)

54.

Roth D, Small K (2006) Margin-based active learning for structured output spaces. In: Proceedings of the European conference on machine learning (ECML), pp 413–424

55.

Roth D, Small K (2008) Active learning for pipeline models. In: Proceedings of the national conference on artificial intelligence (AAAI), pp 683–688

56.

Roth D, Small K, Titov I (2009) Sequential learning of classifiers for structured prediction problems. In: Proceedings of the international conference on artificial intelligence and statistics (AISTATS), pp 440–447

57.

Roth D, Yih W-T (2004) A linear programming formulation for global inference in natural language tasks. In: Proceedings of the annual conference on computational natural language learning (CoNLL), pp 1–8

58.

Roth D, Yih W-T (2005) Integer linear programming inference for conditional random fields. In: Proceedings of the international conference on machine learning (ICML), pp 737–744

59.

Roth D, Yih W-T (2007) Global inference for entity and relation identification via a linear programming formulation. In: Introduction to statistical relational learning

60.

Scheffer T, Wrobel S (2001) Active learning of partially hidden Markov models. In: Proceedings of the ECML/PKDD workshop on instance selection

61.

Schohn G, Cohn D (2000) Less is more: active learning with support vector machines. In: Proceedings of the international conference on machine learning (ICML), pp 839–846

62.

Sekine S, Sudo K, Nobata C (2002) Extended named entity hierarchy. In: Proceedings of the international conference on language resources and evaluation (LREC), pp 1818–1824

63.

Settles B (2009) Active learning literature survey. Technical Report 1648, University of Wisconsin-Madison

64.

Settles B, Craven M (2008) An analysis of active learning strategies for sequence labeling tasks. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 1069–1078

65.

Shen D, Zhang J, Su J, Zhou G, Tan C-L (2004) Multi-criteria-based active learning for named entity recognition. In: Proceedings of the annual meeting of the association for computational linguistics (ACL), pp 589–596

66.

Small K (2005) Interactive learning protocols for natural language applications. PhD thesis, University of Illinois at Urbana-Champaign

67.

Tang M, Luo X, Roukos S (2002) Active learning for statistical natural language parsing. In: Proceedings of the annual meeting of the association for computational linguistics (ACL), pp 120–127

68.

Taskar B, Guestrin C, Koller D (2003) Max-margin Markov networks. In: The conference on advances in neural information processing systems (NIPS)

69.

Thompson CA, Califf ME, Mooney RJ (1999) Active learning for natural language parsing and information extraction. In: Proceedings of the international conference on machine learning (ICML), pp 406–414

70.

Tomanek K, Hahn U (2009) Semi-supervised active learning for sequence labeling. In: Proceedings of the annual meeting of the association for computational linguistics (ACL), pp 1039–1047

71.

Tomanek K, Wermter J, Hahn U (2007) An approach to text corpus construction which cuts annotation costs and maintains reusability of annotated data. In: Proceedings of the conference on empirical methods for natural language processing (EMNLP), pp 486–495

72.

Tong S, Koller D (2001) Support vector machine active learning with applications to text classification. J Mach Learn Res 2:45–66

73.

Tsochantaridis I, Hofmann T, Joachims T, Altun Y (2004) Support vector machine learning for interdependent and structured output spaces. In: Proceedings of the international conference on machine learning (ICML), pp 823–830

74.

Valiant LG (1984) A theory of the learnable. Commun ACM, pp 1134–1142

75.

Vapnik VN (1999) The nature of statistical learning theory, 2nd edn. Springer, Berlin

76.

Vlachos A (2008) A stopping criterion for active learning. Comput Speech Lang 22(3):295–312

77.

Waterman DA (1986) A guide to expert systems. Addison-Wesley, Reading

78.

Yan R, Yang J, Hauptmann A (2003) Automatically labeling video data using multiclass active learning. In: Proceedings of the international conference on computer vision (ICCV), pp 516–523

79.

Zhu J, Wang H, Hovy EH (2008) Learning a stopping criterion for active learning for word sense disambiguation and text classification. In: Proceedings of the international joint conference on natural language processing (IJCNLP), pp 366–372

80.

Zhu J, Wang H, Hovy EH (2008) Multi-criteria-based strategy to stop active learning for data annotation. In: Proceedings of the international conference on computational linguistics (COLING), pp 1129–1136

81.

Zhu X (2005) Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison

Title: Margin-based active learning for structured predictions
Publication date: 01.12.2010
Published in: International Journal of Machine Learning and Cybernetics / Issue 1-4/2010
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-010-0003-y
