Published in: Quality & Quantity 3/2022

Publication date: 24 May 2021

Algorithmic thinking in the public interest: navigating technical, legal, and ethical hurdles to web scraping in the social sciences

Authors: Alex Luscombe, Kevin Dick, Kevin Walby

Abstract

Web scraping, defined as the automated extraction of information online, is an increasingly important means of producing data in the social sciences. We contribute to the emerging social science literature on computational methods by elaborating on web scraping as a means of automated access to information. We begin by situating the practice of web scraping in context, providing an overview of how it works and how it compares to other methods in the social sciences. Next, we assess the benefits and challenges of scraping as a technique of information production. In terms of benefits, we highlight how scraping can help researchers answer new questions, go beyond the limits of official data, overcome access hurdles, and reinvigorate the values of sharing, openness, and trust in the social sciences. In terms of challenges, we discuss three: technical, legal, and ethical. By adopting "algorithmic thinking in the public interest" as a way of navigating these hurdles, researchers can improve the state of access to information on the Internet while also contributing to scholarly discussions about the legality and ethics of web scraping. Example software accompanying this article is available in the supplementary materials.
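Here is a minimal illustration of "automated extraction of information online." The sketch first checks a site's robots.txt rules and then pulls specific fields out of an HTML page, using only Python's standard library. It is not the article's supplementary software.

Checking robots.txt is one widely used convention relevant to the technical and ethical hurdles mentioned above. It is shown here as an assumption, not as the authors' prescribed method. The following are all hypothetical:

- the `data-field` attribute;
- the `ResearchBot` user-agent string;
- the example.org URLs;
- the function names.

The HTML and robots.txt are passed in as strings so the example runs offline.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class FieldExtractor(HTMLParser):
    """Collect the text of elements carrying a (hypothetical) data-field attribute.

    Assumes each tagged element holds plain text with no nested tags.
    """

    def __init__(self):
        super().__init__()
        self.records = {}
        self._current = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        field = dict(attrs).get("data-field")
        if field is not None:
            self._current = field

    def handle_endtag(self, tag):
        # Any closing tag ends the current field (see the no-nesting assumption).
        self._current = None

    def handle_data(self, data):
        if self._current is not None and data.strip():
            self.records[self._current] = data.strip()


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def scrape(html: str) -> dict:
    """Extract the tagged fields from one page of HTML."""
    extractor = FieldExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.records


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/"
    page = ('<div><span data-field="name">Example Agency</span>'
            '<span data-field="budget">1,200,000</span></div>')
    url = "https://example.org/public/agency.html"
    if allowed(robots, "ResearchBot", url):
        print(scrape(page))  # {'name': 'Example Agency', 'budget': '1,200,000'}
```

In a live scraper, the page and robots.txt would be fetched over HTTP, for example with `urllib.request` or `RobotFileParser.read()`, and requests would be spaced out to avoid burdening the host.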

Footnotes
1
Other common names for data scraping include web scraping, screen scraping, web data extraction, web harvesting, and data harvesting. Data "scraping" and website "crawling" differ technically. A crawler is a bot that visits a website to index it (i.e., to record keywords and metadata) and then navigates to other websites via the links on that page. A scraper is a bot designed explicitly to navigate to and extract specific information from one or more target websites. For simplicity, we conflate the two concepts here; where the differences matter, we contrast them in the text.
 
2
In the Supplementary Materials, we demonstrate how several scraping libraries can be combined to perform increasingly complex automated data extraction.
 
3
We thank the anonymous reviewer for this point.
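Footnote 1 distinguishes two kinds of bot:

- a crawler, which indexes pages and follows their links;
- a scraper, which extracts specific information from target pages.

The sketch below contrasts the two over a small in-memory "site", a dict mapping hypothetical URLs to HTML, so it runs without network access. The example.org URLs, the word-set keyword index, and the price-matching regular expression are illustrative assumptions. This is not code from the article's supplementary materials. A real crawler would fetch pages over HTTP and should also respect robots.txt and rate limits.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Gather outgoing links and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def parse(html):
    """Return (links, text) for one page."""
    p = LinkAndTextParser()
    p.feed(html)
    p.close()
    return p.links, " ".join(p.text)


def crawl(site, start, max_pages=10):
    """Crawler: breadth-first traversal that indexes keywords and follows links."""
    index = {}
    queue = deque([start])
    seen = {start}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = site.get(url)
        if html is None:  # link points outside the toy site
            continue
        links, text = parse(html)
        index[url] = set(re.findall(r"[a-z]+", text.lower()))  # crude keyword index
        for href in links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index


def scrape_prices(site, targets):
    """Scraper: visit known target pages and extract one specific datum (a price)."""
    prices = {}
    for url in targets:
        _, text = parse(site[url])
        match = re.search(r"\$(\d+(?:\.\d{2})?)", text)
        if match:
            prices[url] = float(match.group(1))
    return prices


SITE = {  # hypothetical pages
    "https://example.org/": '<a href="/a">A</a> <a href="/b">B</a> Welcome',
    "https://example.org/a": '<p>Unleaded gas $3.49</p><a href="/">home</a>',
    "https://example.org/b": '<p>Diesel $4.05</p>',
}

if __name__ == "__main__":
    print(sorted(crawl(SITE, "https://example.org/")))
    print(scrape_prices(SITE, ["https://example.org/a", "https://example.org/b"]))
```

The crawler discovers pages it was not told about by following links. The scraper only visits the pages it is given and ignores everything except the datum it targets.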
 
Metadata
Publisher: Springer Netherlands
Print ISSN: 0033-5177
Electronic ISSN: 1573-7845
DOI: https://doi.org/10.1007/s11135-021-01164-0
