Designing Ground Truth and the Social Life of Labels

Authors:
Michael Muller (AI Interactions IBM Research, United States)
Christine T. Wolf (Independent Consultant, United States)
Josh Andres (IBM Research Australia, Australia)
Michael Desmond (IBM Research, United States)
Narendra Nath Joshi (IBM Research, United States)
Zahra Ashktorab (Thomas J. Watson Center IBM Research, United States)
Aabhas Sharma (IBM Research, United States)
Kristina Brimijoin (IBM Research, United States)
Qian Pan (IBM Research, United States)
Evelyn Duesterwald (IBM Research, United States)
Casey Dugan (IBM Research, United States)

CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, May 2021, Article No.: 94, Pages 1–16, https://doi.org/10.1145/3411764.3445402

Published: 07 May 2021

ABSTRACT

Ground-truth labeling is an important activity in machine learning. Many studies have examined how crowdworkers apply labels to records in machine learning datasets. However, there have been few studies that have examined the work of domain experts when their knowledge and expertise are needed to apply labels.

We provide a grounded account of the work of labeling teams with domain experts, including the experiences of labeling, collaborative configurations and work practices, and quality issues. We show three major patterns in the social design of ground truth data: Principled design, Iterative design, and Improvisational design. We interpret our results through theories from Human Centered Data Science, particularly work on human interventions in data science through the design and creation of data.

Index Terms

1. Human-centered computing
  1. Collaborative and social computing

Published in
CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
May 2021
10862 pages
ISBN: 9781450380966
DOI: 10.1145/3411764
General Chairs: Yoshifumi Kitamura (Tohoku University, Japan), Aaron Quigley (University of New South Wales, Australia)
Program Chairs: Katherine Isbister (University of California Santa Cruz, USA), Takeo Igarashi (The University of Tokyo, Japan)
Publications Chairs: Pernille Bjørn (University of Copenhagen, Denmark), Steven Drucker (Microsoft Research, USA)
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Critical computing
Human centered data science
Human centered machine learning
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%