Decoding Inequality 2025

Session 8

Application (ChatGPT)

Author: Moritz Mähr
Affiliations: University of Bern, University of Basel

Published: April 11, 2025
Modified: May 31, 2025

TLDR

  • Recap
  • Review the reading assignment together
  • Discuss applications

Recap

Discussed orally in class.

Reading assignment: “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”

This influential 2021 paper by Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell critically examines the rapid development of increasingly large language models (LMs) such as GPT-3 and Google’s Switch-C. It questions the assumption that “bigger is always better” in natural language processing (NLP) and highlights key risks that should concern technologists and humanists alike.

1. What Are Language Models Doing?

Large LMs are trained to predict and generate text based on statistical patterns in massive datasets. While they can produce impressively fluent text, the authors argue that this does not mean they understand language. Instead, they are “stochastic parrots”: systems that generate plausible-sounding output without actual comprehension or intent.
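
To see what “statistical patterns” means in practice, here is a minimal, purely illustrative sketch (a toy bigram model, nothing like the scale or architecture of GPT-3): it counts which words follow which in a tiny corpus and then samples continuations from those counts. The output can look fluent while the program has no representation of meaning or intent.

```python
import random
from collections import defaultdict

# Toy corpus; a real LM is trained on billions of words, but the principle is the same:
# count which tokens tend to follow which, then sample from those statistics.
corpus = "the model predicts the next word the model repeats patterns in the data".split()

# Count bigram transitions: for each word, which words follow it and how often.
transitions = defaultdict(lambda: defaultdict(int))
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word][next_word] += 1

def generate(start: str, length: int = 8) -> str:
    """Generate text by repeatedly sampling the next word from observed frequencies."""
    word, output = start, [start]
    for _ in range(length):
        followers = transitions.get(word)
        if not followers:  # dead end: this word never appeared mid-corpus
            break
        candidates = list(followers.keys())
        weights = list(followers.values())
        word = random.choices(candidates, weights=weights)[0]
        output.append(word)
    return " ".join(output)

print(generate("the"))  # fluent-looking output, but no understanding or intent behind it
```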

2. Environmental and Financial Costs

Training large LMs consumes enormous amounts of energy, contributing significantly to CO₂ emissions. This is a justice issue: the environmental impact disproportionately affects marginalized communities, while the benefits of these technologies accrue mostly to wealthy, English-speaking users and corporations.
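
For a sense of how such footprints are estimated, here is a back-of-envelope sketch. All numbers are illustrative assumptions for the example, not figures from the paper: emissions grow with the number of accelerators, their power draw, training time, data-centre overhead, and the carbon intensity of the local grid.

```python
# Back-of-envelope CO2 estimate for a hypothetical training run.
# Every value below is an illustrative assumption, not a measurement from the paper.
gpu_count = 512                  # accelerators running in parallel
power_per_gpu_kw = 0.4           # average draw per accelerator, in kW
training_hours = 24 * 14         # two weeks of continuous training
pue = 1.5                        # data-centre overhead (cooling, networking)
grid_intensity_kg_per_kwh = 0.4  # kg CO2 per kWh; varies widely by region

energy_kwh = gpu_count * power_per_gpu_kw * training_hours * pue
co2_tonnes = energy_kwh * grid_intensity_kg_per_kwh / 1000

print(f"Estimated energy: {energy_kwh:,.0f} kWh")
print(f"Estimated emissions: {co2_tonnes:,.1f} t CO2")
```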

3. Bias and Harm in Training Data

Most LMs are trained on huge, uncurated datasets scraped from the internet. These datasets overrepresent hegemonic, often discriminatory viewpoints and exclude marginalized voices. The models inherit and reproduce these biases, including racism, sexism, ableism, and more—creating risks of psychological harm and systemic discrimination.

4. Illusions of Understanding

Because these systems can produce text that appears coherent, people may falsely assume the content is meaningful, factual, or generated by a human. This creates dangers of automation bias, misinformation, and manipulation (e.g. through fake news, extremist content, or abusive language).

5. Accountability and Documentation

A core critique is that datasets are often undocumented or under-documented, making it impossible to understand or audit how LMs behave. The authors argue for a “documentation budget” and recommend curating smaller, well-understood datasets over massive opaque ones.
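
As a lightweight exercise in this spirit, a dataset can ship with a short structured record answering the questions the authors raise. The fields below are a hypothetical sketch loosely inspired by the data statement and datasheet proposals the paper builds on; they are not a standard schema.

```python
# Hypothetical sketch of a minimal dataset documentation record.
# Field names are illustrative, loosely inspired by data statements / datasheets;
# they do not reproduce a schema from the paper.
dataset_documentation = {
    "name": "course-forum-corpus",  # hypothetical dataset
    "curation_rationale": "Why was this data collected, and by whom?",
    "language_varieties": ["German (Swiss Standard)", "English"],
    "speaker_demographics": "Who is represented, and who is missing?",
    "collection_process": "Scraped? Donated? Collected with consent?",
    "known_biases": "Overrepresented viewpoints, offensive content, gaps",
    "intended_uses": ["classroom analysis"],
    "prohibited_uses": ["individual profiling"],
}

for field, value in dataset_documentation.items():
    print(f"{field}: {value}")
```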

6. Displacement of Research Goals

Focusing on ever-larger LMs draws attention and resources away from alternative, potentially more equitable paths in language technology, such as:

  • Smaller, task-specific models
  • Multilingual or low-resource language research
  • Approaches centered on human linguistic meaning, not just surface-level form

7. Recommendations for Responsible Development

The authors propose shifting toward:

  • Pre-mortem analysis: anticipate harms before development begins
  • Value-sensitive design: include affected stakeholders in the design process
  • Environmental benchmarking: consider carbon and energy efficiency as research metrics
  • Research redirection: focus less on leaderboard metrics and more on social impact and inclusivity

Why It Matters for Digital Humanities

This paper bridges critical perspectives from linguistics, ethics, and STS (science and technology studies). For DH scholars, it encourages skepticism toward “black-box” AI systems and stresses the importance of:

  • Interrogating data sources
  • Understanding power and representation in digital systems
  • Advocating for inclusive, sustainable, and human-centered computational research

Discuss applications

Discussed orally in class.
