Summarizing Books as Podcasts

By Mike Loukides
December 16, 2024
Objects detected with OpenCV's Deep Neural Network module (DNN). (source: MTheiler on Wikimedia Commons)

Like just about everyone, we were impressed by the ability of NotebookLM to generate podcasts: Two virtual people holding a discussion. You can give it some links, and it will generate a podcast based on the links. The podcasts were interesting and engaging. But they also had some limitations.

The problem with NotebookLM is that, while you can give it a prompt, it largely does what it’s going to do. It generates a podcast with two voices—one male, one female—and gives you little control over the result. There’s an optional prompt to customize the conversation, but that single prompt doesn’t allow you to do much. Specifically, you can’t tell it which topics to discuss or in what order to discuss them. You can try, but it won’t listen. It also isn’t conversational, which is something of a surprise now that we’ve all gotten used to chatting with AIs. You can’t tell it to iterate by saying “That was good, but please generate a new version changing these details” like you can with ChatGPT or Gemini.

Can we do better? Can we integrate our knowledge of books and technology with AI’s ability to summarize? We’ve argued (and will continue to argue) that simply learning how to use AI isn’t enough; you need to learn how to do something with AI that’s better than what the AI could do on its own. You need to integrate artificial intelligence with human intelligence. To see what that would look like in practice, we built our own toolchain that gives us much more control over the results. It’s a multistage pipeline:

  • We use AI to generate a summary for each chapter of a book, making sure that all the important topics are covered.
  • We use AI to assemble the chapter summaries into a single summary. This step essentially gives us an extended outline.
  • We use AI to generate a two-person dialogue that becomes the podcast script.
  • We edit the script by hand, again making sure that the summaries cover the right topics in the right order. This is also an opportunity to correct errors and hallucinations.
  • We use Google’s text-to-speech multispeaker API (still in preview) to generate a summary podcast with two participants.
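The stages above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the toolchain we built: `llm` is a hypothetical stand-in for whatever chat model you call (any function that takes a prompt string and returns text), the prompts are placeholders, and the `HOST_A:`/`HOST_B:` line format is an assumption. The hand edit happens between `write_dialogue` and `parse_turns`, and the final text-to-speech call is left out rather than guessing at the details of a preview API.

```python
from typing import Callable, List, Tuple

# Hypothetical type: any function that sends a prompt to a language model
# and returns its text response. No specific vendor API is assumed.
LLM = Callable[[str], str]


def summarize_chapters(llm: LLM, chapters: List[str]) -> List[str]:
    """Stage 1: produce one summary per chapter."""
    return [
        llm(f"Summarize this chapter, covering every important topic:\n\n{text}")
        for text in chapters
    ]


def combine_summaries(llm: LLM, summaries: List[str]) -> str:
    """Stage 2: merge the chapter summaries into one extended outline."""
    joined = "\n\n".join(summaries)
    return llm(f"Combine these chapter summaries into a single outline, in order:\n\n{joined}")


def write_dialogue(llm: LLM, outline: str) -> str:
    """Stage 3: turn the outline into a two-person script, one turn per line."""
    return llm(
        "Write a podcast dialogue between HOST_A and HOST_B that follows this "
        "outline in order. Prefix each line with the speaker's name and a colon.\n\n"
        + outline
    )


def parse_turns(script: str) -> List[Tuple[str, str]]:
    """Split a hand-edited script into (speaker, text) turns for a TTS step.

    Lines without a "SPEAKER:" prefix (blank lines, stage notes) are skipped.
    Only the first colon separates speaker from text.
    """
    turns = []
    for line in script.splitlines():
        speaker, sep, text = line.partition(":")
        if sep and text.strip():
            turns.append((speaker.strip(), text.strip()))
    return turns
```

Keeping each stage a separate function reflects the main advantage of a multistage pipeline over a one-shot tool: every intermediate result (chapter summaries, outline, script) can be inspected and corrected before the next stage runs.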

Why are we focusing on summaries? Summaries interest us for several reasons. First, let’s face it: Having two nonexistent people discuss something you wrote is fascinating, especially since they sound genuinely interested and excited. Hearing the voices of nonexistent cyberpeople discuss your work makes you feel like you’re living in a sci-fi fantasy. More practically: Generative AI is unquestionably good at summarization. There are few errors and almost no outright hallucinations. Finally, our users want summarization. On O’Reilly Answers, our customers frequently ask for summaries: summarize this book, summarize this chapter. They want to find the information they need. They want to find out whether they really need to read the book and, if so, which parts. A summary helps them do that while saving time. It lets them discover quickly whether the book will be helpful, and does so better than the back cover copy or a blurb on Amazon.

With that in mind, we had to think through what the most useful summary would be for our members. Should there be a single speaker or two? When a single synthesized voice summarized the book, my eyes (ears?) glazed over quickly. It was much easier to listen to a podcast-style summary where the virtual participants were excited and enthusiastic, like the ones on NotebookLM, than to a lecture. The give and take of a discussion, even if simulated, gave the podcasts energy that a single speaker didn’t have.

How long should the summary be? That’s an important question. At some point, the listener loses interest. We could feed a book’s entire text into a speech synthesis model and get an audio version—we may yet do that; it’s a product some people want. But on the whole, we expect summaries to be minutes long rather than hours. I might listen for 10 minutes, maybe 30 if it’s a topic or a speaker that I find fascinating. But I’m notably impatient when I listen to podcasts, and I don’t have a commute or other downtime for listening. Your preferences and your situation may be much different.
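To turn "minutes rather than hours" into something a script writer can aim for, it helps to convert a target length into a word budget. This sketch assumes a speaking rate of about 150 words per minute, a common rule of thumb rather than a measured property of any particular synthetic voice; at that rate a 10-minute summary is roughly 1,500 words and a 30-minute one roughly 4,500. The function names are hypothetical.

```python
# Assumption: conversational speech runs at roughly 150 words per minute.
# This is a rule of thumb, not a measured rate for any specific TTS voice.
WORDS_PER_MINUTE = 150


def word_budget(minutes: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Approximate script length, in words, for a podcast of the given length."""
    return round(minutes * wpm)


def estimated_minutes(script: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Approximate listening time for a script, based on its word count."""
    return len(script.split()) / wpm


print(word_budget(6))   # 900
print(word_budget(30))  # 4500
```

Because the estimate scales linearly with the assumed rate, a faster or slower voice only changes the constant.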

What exactly do listeners expect from these podcasts? Do users expect to learn, or do they only want to find out whether the book has what they’re looking for? That depends on the topic. I can’t see someone learning Go from a summary; maybe more to the point, I don’t see someone who’s fluent in Go learning how to program with AI. Summaries are useful for presenting a book’s key ideas. For example, the summaries of Cloud Native Go gave a good overview of how Go could be used to address the issues faced by people writing software that runs in the cloud. But really learning this material requires looking at examples, writing code, and practicing, which is out of bounds in a medium that’s limited to audio. I’ve heard AIs read out source code listings in Python; it’s awful and useless. Learning is more likely with a book like Facilitating Software Architecture, which is more about concepts and ideas than code. Someone could come away from the discussion with some useful ideas and possibly put them into practice. But again, the podcast summary is only an overview. To get all the value and detail, you need the book. In a recent article, Ethan Mollick writes, “Asking for a summary is not the same as reading for yourself. Asking AI to solve a problem for you is not an effective way to learn, even if it feels like it should be. To learn something new, you are going to have to do the reading and thinking yourself.”

Another difference between the NotebookLM podcasts and ours may be more important. The podcasts we generated from our toolchain are all about six minutes long. The podcasts generated by NotebookLM are in the 10- to 25-minute range. The longer length could allow the NotebookLM podcasts to be more detailed, but in reality that’s not what happens. Rather than discussing the book itself, NotebookLM tends to use the book as a jumping-off point for a broader discussion. The O’Reilly-generated podcasts are more directed. They follow the book’s structure because we provided a plan, an outline, for the AI to follow. The virtual podcasters still express enthusiasm, still bring in ideas from other sources, but they’re headed in a direction. The longer NotebookLM podcasts, in contrast, can seem aimless, looping back around to pick up ideas they’ve already covered. To me, at least, that feels like an important point. Granted, using the book as the jumping-off point for a broader discussion is also useful, and there’s a balance that needs to be maintained. You don’t want it to feel like you’re listening to the table of contents. But you also don’t want it to feel unfocused. And if you want a discussion of a book, you should get a discussion of the book.
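Keeping a podcast directed is largely a matter of checking the script against the outline. A rough automated version of that check might look like the following sketch. The function name `follows_outline` is hypothetical, and the matching is a naive case-insensitive substring search, so it will miss topics the script paraphrases; it can flag a script for a closer look, but it can’t replace a human reading it.

```python
from typing import List


def follows_outline(script: str, topics: List[str]) -> bool:
    """Return True if every topic appears in the script, in outline order.

    Each topic is searched for only after the end of the previous match,
    so topics that appear out of order cause the check to fail.
    """
    text = script.lower()
    position = 0
    for topic in topics:
        needle = topic.lower()
        found = text.find(needle, position)
        if found == -1:
            return False  # topic missing, or only present before an earlier topic
        position = found + len(needle)
    return True
```

For example, `follows_outline("First goroutines, then channels.", ["goroutines", "channels"])` returns `True`, while swapping the two topics returns `False`.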

None of these AI-generated podcasts are without limitations. An AI-generated summary isn’t good at detecting and reflecting on nuances in the original writing. With NotebookLM, that clearly wasn’t under our control. With our own toolchain, we could certainly edit the script to reflect whatever we wanted, but the voices themselves weren’t under our control and wouldn’t necessarily follow the text’s lead. (It’s arguable that reflecting the nuances of a 250-page book in a six-minute podcast is a losing proposition.) Bias—a kind of implied nuance—is a bigger issue. Our first experiments with NotebookLM tended to have the female voice asking the questions, with the male voice providing the answers, though that seemed to improve over time. Our toolchain gave us control, because we provided the script. We won’t claim that we were unbiased—nobody should make claims like that—but at least we controlled how our virtual people presented themselves.

Our experiments are finished; it’s time to show you what we created. We’ve taken five books, generated short podcasts summarizing each with both NotebookLM and our toolchain, and posted both sets on oreilly.com. We’ll be adding more books in 2025. Listen to them—see what works for you. And please let us know what you think!

Post topics: AI & ML, Artificial Intelligence
Post tags: Research