ALG Blog Post 2: How GenAI Works And How It Fails

Published on:

Exploring how generative AI models are trained, how they produce outputs, and the ethical questions that come with their use.

Case Study: “How Generative AI Works and How It Fails” by Arvind Narayanan and Sayash Kapoor

Purpose Of The Case Study

The case study explains how generative AI models like ChatGPT work at a high level. It covers training data, how models predict the next word, their creative abilities, and the broader impacts (like environmental costs). The goal is to give readers a foundation for asking ethical questions about the tradeoffs of these systems.

Learning

I tried using a chatbot to learn a topic I’ve been meaning to get better at: SQL queries. What worked well was how quickly it broke things down; it felt like tutoring on demand. What didn’t work so well was depth. If I didn’t already know enough to spot mistakes, I could have been misled. For instance, it suggested syntax that didn’t actually run in the database I use. My rule now: no copy-pasting without double-checking. Would I keep using it? Yes, but as a first pass or a sparring partner, not as my only teacher. I think the real value is speed: it gets me unstuck, but the responsibility to practice and verify is still on me.
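The dialect mismatch I ran into is easy to catch if you actually run the suggestion before trusting it. Here is a minimal sketch of that habit, assuming a SQLite database; the `orders` table, the `try_query` helper, and the MySQL-style `IF()` example are my own illustrations, not from the case study.

```python
import sqlite3

# A toy table to test chatbot-suggested queries against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

def try_query(sql):
    """Run a suggested query; report whether this dialect accepts it."""
    try:
        return ("ok", conn.execute(sql).fetchall())
    except sqlite3.OperationalError as e:
        return ("error", str(e))

# Portable SQL runs fine in SQLite:
print(try_query("SELECT id FROM orders WHERE total > 10"))
# MySQL's IF() function does not exist in SQLite, so this fails:
print(try_query("SELECT IF(total > 10, 'big', 'small') FROM orders"))
```

Thirty seconds of this kind of checking is what keeps “tutoring on demand” from quietly teaching you syntax your own database will reject.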

The Use Of Creative Work For Training

The ethics here are complicated. Generative AI is trained on the work of artists, journalists, writers, and photographers: people who never consented or got paid. It feels a bit like someone borrowing your homework, remixing it, and then selling it back to you.

There are some ways the system could change. Licensing agreements could give publishers and creators a cut when their work is used. Copyright law could also be updated to address training data directly; right now, it’s not clear whether “scraping” counts as fair use. Another idea is watermarking or tagging creative work so it’s harder to use without permission. I think it comes down to fairness: creators deserve recognition and compensation. Otherwise, the tech benefits big companies while hollowing out the creative industries it depends on.

Next-Word Prediction

At first, it seems surprising that a model trained just to predict the “next word” can do things like summarize essays, write code, or explain a math problem. But the reason is that predicting words is actually predicting patterns of thought. Over billions of examples, the model picks up structure: grammar, logic, even problem-solving strategies. So “next-word prediction” isn’t really simple; it’s the foundation for complex reasoning. The model doesn’t “understand” like a human, but it’s good at imitating the patterns that make understanding look real.

Environmental Impact

Generative AI isn’t weightless; it eats up massive resources: data center energy, water for cooling, rare earth mining for chips. The carbon footprint of training a single large model can rival flying a plane across the globe. The big question is: will AI get greener or dirtier as it grows? Cleaner energy and better chips could cut the footprint, but unchecked scaling may push it higher. Some argue AI might even save energy by optimizing supply chains or predicting disasters, but efficiency often leads to more use, not less. Unlike streaming video, this tech runs at a heavier, global scale. If AI is going to keep growing, transparency and renewable infrastructure need to grow with it.
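The footprint claim can be reasoned about with simple arithmetic: energy is roughly accelerator count times power times hours (plus data-center overhead), and emissions are energy times grid carbon intensity. Every number below is a purely illustrative placeholder I chose for the sketch, not a real measurement of any model.

```python
# Back-of-envelope training footprint; all values are hypothetical.
gpus = 1000                 # placeholder accelerator count
watts_per_gpu = 400         # placeholder average draw per device (W)
hours = 30 * 24             # placeholder one-month training run
pue = 1.2                   # placeholder data-center overhead factor
grid_kg_co2_per_kwh = 0.4   # placeholder grid carbon intensity

energy_kwh = gpus * watts_per_gpu * hours * pue / 1000
co2_tonnes = energy_kwh * grid_kg_co2_per_kwh / 1000

print(f"~{energy_kwh:,.0f} kWh, roughly {co2_tonnes:,.0f} t CO2")
```

The point of the exercise isn’t the specific output; it’s that the footprint scales linearly with hardware, time, and grid intensity, which is why cleaner energy and better chips pull in one direction while unchecked scaling pulls in the other.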

New Question

How should compensation for creators be handled if their work is used in AI training datasets?

I chose this question because the case study kept pointing back to the idea of inputs and ownership. To me, it’s not enough to say “everyone’s work is part of the internet, so it’s free.” That ignores the fact that creative work takes time, effort, and skill. My question invites discussion about whether we need new licensing systems, collective bargaining for creators, or technical solutions like data tracking.

Reflection

This case study reminded me that generative AI isn’t just impressive outputs; it’s also trade-offs. Next-word prediction works because models learn patterns at scale, but those same patterns can overlook nuance and originality. Training data represents real people’s work, and running models requires real energy and resources. It made me realize that behind every “smart” system are choices about fairness, cost, and responsibility. Going forward, I want to stay excited about what AI can do while also asking who benefits, who pays, and what gets left out.