Published in: Cluster Computing 4/2017

25 September 2017

Using hybrid algorithmic-crowdsourcing methods for academic knowledge acquisition

Authors: Zhaoan Dong, Jiaheng Lu, Tok Wang Ling, Ju Fan, Yueguo Chen

Abstract

Scientific literature contains many meaningful objects, such as Figures, Tables, Definitions, and Algorithms, which we call Knowledge Cells hereafter. An advanced academic search engine is expected to exploit Knowledge Cells and their relationships to return more accurate search results. It is further expected to support fine-grained search over Knowledge Cells for deep-level information discovery and exploration. It is therefore important to identify and extract Knowledge Cells and their relationships, which are often intrinsic and implicit in articles. With the exponential growth of scientific publications, discovering and acquiring such academic knowledge poses practical challenges. For example, existing algorithmic methods can hardly be extended to handle the diverse layouts of journals, nor scaled up to process massive numbers of documents. Because crowdsourcing has become a powerful paradigm for large-scale problem solving, especially for tasks that are difficult for computers but easy for humans, we treat academic knowledge discovery and acquisition as a crowd-sourced database problem and present a hybrid framework that combines the accuracy of crowdsourcing workers with the speed of automatic algorithms. In this paper, we introduce our current system implementation, a platform for academic knowledge discovery and acquisition (PANDA), along with some interesting observations and promising future directions.
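
The abstract does not spell out how work is divided between algorithms and crowd workers. As a rough illustration only, one common hybrid pattern is confidence-based routing: an automatic extractor labels each candidate Knowledge Cell, and only low-confidence candidates are sent to workers, whose answers are combined by majority vote. Everything in the sketch below (`Candidate`, `hybrid_label`, the `ask_crowd` callback, the `0.8` threshold) is a hypothetical name or value chosen for illustration, not part of PANDA.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """A Knowledge Cell candidate proposed by an automatic extractor (hypothetical type)."""
    text: str
    label: str         # e.g. "Figure", "Table", "Definition", "Algorithm"
    confidence: float  # extractor's confidence in [0, 1]


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent crowd answer."""
    return Counter(answers).most_common(1)[0][0]


def hybrid_label(
    candidates: List[Candidate],
    ask_crowd: Callable[[Candidate], List[str]],
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> List[Tuple[str, str, str]]:
    """Keep confident algorithmic labels; route the rest to crowd workers."""
    results = []
    for c in candidates:
        if c.confidence >= threshold:
            # Fast path: trust the automatic extractor.
            results.append((c.text, c.label, "algorithm"))
        else:
            # Slow path: collect worker answers and aggregate them.
            results.append((c.text, majority_vote(ask_crowd(c)), "crowd"))
    return results
```

In such a design, the threshold trades cost against accuracy: raising it sends more candidates to workers, lowering it relies more on the extractor.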

Metadata
Title
Using hybrid algorithmic-crowdsourcing methods for academic knowledge acquisition
Authors
Zhaoan Dong
Jiaheng Lu
Tok Wang Ling
Ju Fan
Yueguo Chen
Publication date
25 September 2017
Publisher
Springer US
Published in
Cluster Computing, Issue 4/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1089-8