RES.TLL-008 | Spring 2023 | Non-Credit

Social and Ethical Responsibilities of Computing (SERC)

AI and Algorithms

6.864 Quantitative Methods for Natural Language Processing

> Related Topics: AI and Algorithms, Ethical Computing and Practice

Authors: Jacob Andreas, Catherine D’Ignazio, Harini Suresh

Keywords: data annotation; natural language processing; machine learning; content moderation

Topics addressed:

  • Critically question how and by whom the data was created
  • Determine what its limitations might be
  • Discuss what the data should and should not be used for

Resources:

Assignment “Dataset Creation” Description (PDF) (DOCX)

Part 0: Short-answer reflections to a few hypothetical scenarios.

Part 1: This part is done in groups. You will take the role of a researcher in charge of creating a dataset. You’ll receive a hypothetical task, decide which labels you want to collect, and write instructions for a group of annotators.

Part 2: This part is done individually. You’ll now be an annotator. First, take the instructions you wrote and annotate a new set of examples according to them. Then, you will receive instructions from a different group, for a different task, and will be asked to annotate a set of examples by following their instructions.

Part 3: Pick one of the listed readings to read and respond to. See the assignment description for titles.

Additional Reading:

Paullada, Amandalynne, Inioluwa Deborah Raji, et al. “Data and Its (Dis)Contents: A Survey of Dataset Development and Use in Machine Learning Research.” arXiv preprint arXiv:2012.05345 (2020).

D’Ignazio, Catherine and Lauren Klein. “What Gets Counted Counts.” Chapter 4 in Data Feminism. March 16, 2020. 

Gebru, Timnit, Jamie Morgenstern, et al. “Datasheets for Datasets.” arXiv preprint arXiv:1803.09010 (2018).

Bhuiyan, M. Momen, Amy X. Zhang, Connie Moon Sehat, and Tanushree Mitra. “Investigating Differences in Crowdsourced News Credibility Assessment: Raters, Tasks, and Expert Criteria.” Proceedings of the ACM on Human-Computer Interaction 4, no. CSCW2 (2020): 1-26.

Metz, Cade. “A.I. is Learning From Humans. Many Humans.” The New York Times. Aug. 16, 2019.

Kaye, Kate. “These Companies Claim to Provide ‘Fair-Trade’ Data Work. Do They?” Technology Review. Aug. 7, 2019.

Course Info

As Taught In: Spring 2023

Learning Resource Types: Lecture Notes, Instructor Insights, Multiple Assignment Types