2017 | Original Paper | Book Chapter

MOOCon: A Framework for Semi-supervised Concept Extraction from MOOC Content

Authors: Zhuoxuan Jiang, Yan Zhang, Xiaoming Li

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing

Abstract

Recent years have witnessed the rapid development of Massive Open Online Courses (MOOCs). MOOC platforms not only offer a one-stop learning setting, but also aggregate a large number of courses with various kinds of textual content, e.g. video subtitles, quizzes and forum posts. MOOCs can therefore be regarded as a large-scale "knowledge base" covering many domains. However, all the content generated by instructors and learners is unstructured. To structure this data for further knowledge management and mining, a natural first step is concept extraction. In this paper, we exploit human knowledge in the form of labeled data and propose a framework for concept extraction based on machine learning methods. The framework is flexible enough to support semi-supervised learning, which reduces the human effort needed to label training data. In addition, course-agnostic features are designed for modeling cross-domain data. Experimental results demonstrate that labeling only 10% of the data can lead to acceptable performance, and that the semi-supervised learning method is comparable to the supervised version within the same framework. We find that textual content in different forms, i.e. subtitles, PPTs and questions, should be processed separately because of differences in form. Finally, we evaluate a new task: identifying needs for concept comprehension. Our framework performs well at this identification on forum content when the model is learned from subtitles.
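
The abstract does not spell out the learning algorithm, so the following is only a minimal sketch of one common semi-supervised scheme, self-training, under stated assumptions. The nearest-centroid classifier, the two-dimensional toy feature vectors, the labels "concept" and "other", the confidence threshold and all function names are hypothetical illustrations, not the authors' method or features.

```python
from math import sqrt

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def centroids(examples):
    """Mean feature vector per label, from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in examples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(model, x):
    """Return (label, confidence) for x; needs at least two labels.

    Confidence lies in [0.5, 1]: 0.5 means x is equally far from the
    two nearest centroids, 1 means x sits exactly on the best centroid.
    """
    ranked = sorted(model, key=lambda y: distance(model[y], x))
    d_best = distance(model[ranked[0]], x)
    d_next = distance(model[ranked[1]], x)
    total = d_best + d_next
    confidence = 1.0 - d_best / total if total else 0.5
    return ranked[0], confidence

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Self-training: repeatedly promote confident predictions to labels."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = centroids(labeled)
        confident, rest = [], []
        for x in pool:
            y, c = predict(model, x)
            (confident if c >= threshold else rest).append((x, y))
        if not confident:
            break  # nothing new to learn from; stop early
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return centroids(labeled)

# Toy data (assumed): a small labeled seed plus a larger unlabeled pool.
labeled = [((0.9, 0.8), "concept"), ((0.1, 0.2), "other")]
unlabeled = [(0.8, 0.9), (0.2, 0.1), (0.5, 0.5)]
model = self_train(labeled, unlabeled)
```

In this toy run, the two unlabeled points close to a seed example are absorbed into the training set, while the ambiguous point at (0.5, 0.5) stays unlabeled because its confidence never reaches the threshold.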

Metadata
Title
MOOCon: A Framework for Semi-supervised Concept Extraction from MOOC Content
Authors
Zhuoxuan Jiang
Yan Zhang
Xiaoming Li
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-55705-2_24