FOD#69: Why NotebookLM is blowing everyone’s minds – after a year since launch
we discuss the evolution of NotebookLM, from a mere AI assistant to a breakthrough; the tech behind it and how to use it + a carefully curated list of the best news and papers
This Week in Turing Post:
Tuesday, Guest Post: Your infrastructure shouldn’t live in a “black box”
Wednesday, AI 101: What is DoRA, QLoRA, QDoRA?
Friday, Agentic Workflows series: The History of Agents
If you like Turing Post, consider becoming a paid subscriber, checking out our partner’s amazing free book about Mastering RAG, or sharing it with a friend. It helps us keep Monday digests free →
The main topic
These last few days, everyone has been marveling at a not-so-novel AI assistant, NotebookLM. It’s been around since July 2023, but chances are you hadn’t heard much about it until recently. Since it’s intriguing from both technological and user-experience angles, let’s explore what NotebookLM is about, where it comes from, and why it’s gaining traction.
Tailwind becomes NotebookLM
Initially developed in Google Labs under the name Tailwind, the project was renamed NotebookLM to better reflect its goal: helping users manage large volumes of information by organizing, summarizing, and generating insights from user-uploaded documents. You can feed it Google Docs, PDFs, and, more recently, YouTube links and audio files, and it will provide grounded responses complete with citations and relevant quotes. While this isn’t completely groundbreaking in the AI world, its seamless execution has caught the attention of many who deal with information overload.
To try it out, I uploaded about 50 files from my book project on Citizen Diplomacy. These included audio interviews in two languages, articles in PDFs, annual reports in docs, and links to Google Docs with drafts. I’m currently working on the seventh chapter, and since the narrative spans over 40 years, it’s crucial to have a concise overview of how the ideas connect and flow. Within seconds, NotebookLM generated a perfect brief, even helping me rediscover a point I wanted to include in this chapter but had forgotten. There's still plenty to explore, but that was already quite impressive.
Okay, that’s convenient but not mind-blowing.
What is mind-blowing about NotebookLM?
It actually is amazing. The feature that’s turning heads lately is its ability to generate AI-driven podcasts called Deep Dives. It doesn’t just read the text aloud: NotebookLM creates a conversation between two AI hosts. They discuss the material, banter, laugh, and make sense. This offers a fresh, passive way to consume information, a welcome alternative to reading dense material.
Examples
Thomas Wolf suggested a self-care hack: download your LinkedIn profile and let the AI hosts dive deep into how amazing you are. Andrej Karpathy turned the C code that trains GPT-2 into a podcast, and while he noted he might have framed and emphasized some things differently, he found the podcast entertaining and surprisingly coherent. I uploaded Alan Turing’s article "Computing Machinery and Intelligence," and you can listen to the result. It’s super interesting and makes the information easier to digest. However, it does make it sound as if Ada Lovelace and Alan Turing were contemporaries, so as always, fact-checking is essential with GenAI.
Tech behind NotebookLM
The tool is powered by Google’s long-context Gemini 1.5 Pro, a Transformer model utilizing a sparse Mixture-of-Experts (we explain MoE here) architecture, which ensures efficiency by activating only the relevant parts of the model. This allows NotebookLM to process up to 1,500 pages of information at once, making it suitable for those tackling large datasets or complex topics. It digests an enormous amount of information and so far doesn’t seem to get lost in it.
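The sparse-MoE routing idea can be sketched in a few lines: a router scores every expert for each input, and only the top-k experts are actually evaluated, which is where the efficiency comes from. Below is a toy illustration in plain Python; the scalar "experts" and router here are invented for demonstration, and Gemini's actual router is, of course, not public:

```python
import math

def softmax(scores):
    """Convert raw router scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, experts, router, top_k=2):
    """Route an input to only the top_k experts and mix their outputs.

    `experts` is a list of callables; `router` scores each expert for the
    input. Only top_k experts are ever evaluated -- that is the sparsity.
    """
    probs = softmax(router(token))
    # Rank experts by router probability and keep the top_k.
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize weights over the chosen experts and combine their outputs.
    weight_sum = sum(probs[i] for i in chosen)
    return sum(probs[i] / weight_sum * experts[i](token) for i in chosen)

# Toy setup: four "experts" that just scale the input, and a router that
# prefers the expert whose scale is closest to the input value.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router = lambda x: [-abs(x - s) for s in (1.0, 2.0, 3.0, 4.0)]
out = moe_forward(2.0, experts, router, top_k=2)
```

With top_k=2, only two of the four experts run for each input; a real MoE layer does the same per token, with neural-network experts instead of scalar functions.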
NotebookLM uses:
Retrieval-Augmented Generation (RAG) to ground responses in content from multiple sources.
Text-to-Speech (TTS) to generate the voices of the AI podcast hosts, creating a convincing conversational experience.
SoundStorm to generate realistic audio conversations, converting scripts into natural dialogue with high-quality, engaging output.
Disfluency Injection to add human-like pauses, filler words, and natural speech patterns, making the dialogue sound more realistic.
Prompt Engineering to structure AI interactions and ensure the hosts maintain a natural, conversational tone.
Compelling UI/UX exploration and evolving ways it’s being used
As Karpathy puts it: “That's what I think is ultimately so compelling about the 2-person podcast format as a UIUX exploration. It lifts two major "barriers to enjoyment" of LLMs. 1 Chat is hard. You don't know what to say or ask. In the 2-person podcast format, the question asking is also delegated to an AI so you get a lot more chill experience instead of being a synchronous constraint in the generating process. 2 Reading is hard and it's much easier to just lean back and listen.”
Who can use it?
It offers useful features for all audiences, both tech and non-tech, and can be immediately useful for students, researchers, and writers. It balances practicality with experimentation, offering a novel way to interact with personal data.
It’s possible that NotebookLM podcast episode generation is touching on a whole new territory of highly compelling LLM product formats. Feels reminiscent of ChatGPT. Maybe I’m overreacting.
— Andrej Karpathy (@karpathy)
9:11 PM • Sep 28, 2024
Maybe we are all overreacting, and it’s certainly not perfect, as none of the AI tools are. But if we’re being practical, tools like ChatGPT and now NotebookLM are like a lift to a different dimension of productivity. It’s like having an inflated external brain that doesn’t necessarily think but certainly processes.
💎 We recommend - a free ebook about Mastering RAG
Galileo just released a new free eBook: Mastering RAG - A Developer's Guide to Enterprise-Grade RAG Systems
Download Mastering RAG now to access 200 pages of technical content covering:
Chunking strategies
Embedding and reranking model selection
Vector database comparisons
RAG architecture best practices
Testing and evaluation methods
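The first of those topics, chunking, is easy to illustrate. A minimal word-based chunker with overlap between neighboring chunks (a generic sketch of the idea, not an excerpt from the book) might look like this:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks that overlap by `overlap` words.

    Overlap keeps context that straddles a chunk boundary retrievable
    from both neighboring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# A synthetic 500-word document yields three chunks of up to 200 words,
# each sharing 50 words with its neighbor.
doc = " ".join(f"word{i}" for i in range(500))
chunks = chunk_text(doc, chunk_size=200, overlap=50)
```

Production RAG systems usually chunk on semantic boundaries (sentences, headings) rather than raw word counts, which is exactly the kind of trade-off such guides cover.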
Twitter library
Weekly recommendation from an AI practitioner 👍🏼:
Crawl4AI – an open-source web crawler and scraper. Think of it as the go-to engine for automating your web scraping while scaling up those AI-driven projects with minimal setup.
News from The Usual Suspects ©
California’s AI Bill Hits a Wall
Governor Gavin Newsom vetoed California’s landmark AI safety bill SB 1047, citing concerns over stifling innovation and prompting AI firms to relocate. The bill, aimed at regulating powerful AI models with mandatory safety tests and "kill switch" mechanisms, faced strong opposition from tech giants like OpenAI and Google. Supporters argue it’s essential to prevent unchecked AI risks.
OpenAI’s Revolving Door: Who’s Next?
OpenAI faces another leadership shake-up as Chief Research Officer Bob McGrew and VP Barret Zoph exit, following CTO Mira Murati’s abrupt departure. CEO Sam Altman downplays the resignations and says he will be the one now focusing more on technical matters (me! me! me!). But with whispers of a $150B valuation and a potential 7% stake for Altman himself, the real question is whether the company’s shift toward a for-profit model is driving the talent exodus.
Also, according to The New York Times, OpenAI is projecting a $5 billion loss for 2024 despite 1,700% revenue growth since the beginning of 2023. The company is targeting $11.6 billion in revenue next year and is raising $7 billion in a funding round that could value it at $150 billion. Rising computational and operational costs are contributing to its financial challenges. Thrive Capital leads this round, with Microsoft involved. Apple just exited talks to invest.
Meanwhile, a new, mysterious image-generation model named "Blueberry" has surfaced on the leaderboards, beating FLUX.1. We don’t usually speculate, but it sounds like OpenAI to us.
🥳 Hugging Face Hits 1 Million Models!
Hugging Face now hosts over 1 million public models, from big names like LLaMA to countless specialized, custom AI models. With a new repository created every 10 seconds, the platform is proving that tailored AI is the future.
Meta’s Orion Glasses: The Future is Holographic
At Meta Connect 2024, Project Orion stole the show last week. These futuristic AR glasses feature holographic displays and a neural interface that responds to wrist gestures, bringing sci-fi tech into reality. While Orion is still in development, the potential for blending the digital and physical worlds promises to push augmented reality to new heights. The best description of the demo experience is in Stratechery.
Nvidia Gobbles Up OctoAI: Acquisition Fever Continues
OctoAI is the fifth startup Nvidia has acquired in 2024. Before it, Nvidia brought under its roof Run:ai, Deci AI, Shoreline, and Brev.dev. As Nvidia tightens its grip on the AI infrastructure market, concerns about regulatory scrutiny and competition intensify.
Microsoft's AI Trust Plan: Locking It Down
Microsoft unveils its latest push for "Trustworthy AI," emphasizing robust security, safety, and privacy. New capabilities like confidential inferencing and content-safety measures keep AI outputs clean and compliant. A key player in AI, Microsoft is going all-in on responsible AI, aiming to protect users while unlocking the full potential of AI-driven innovations.
AI adoption: Insights
The NBER Working Paper on generative AI adoption reveals rapid growth in the U.S., with 39.4% of adults aged 18-64 using the technology by August 2024. AI adoption is particularly high among younger, educated, and higher-income individuals, with men using it more than women. Usage is widespread across occupations, especially in management and tech roles, though notable adoption exists even among blue-collar workers. Generative AI primarily assists with writing, administrative tasks, and data interpretation. An estimated 0.5-3.5% of work hours are now supported by AI, suggesting its growing influence on productivity and economic impact.
The freshest research papers, categorized for your convenience
Our TOP
"Imagine yourself" is a new tuning-free model by @AIatMeta. It tackles image generation issues like lack of diversity and copying of reference, using:
- Synthetic paired data
- Fully parallel attention architecture
- Multi-stage finetuning

Let's see how well this approach works
— Ksenia Se (@Kseniase_)
9:46 PM • Sep 28, 2024
Making Text Embedders Few-Shot Learners
This paper is important because it introduces a novel approach to text embeddings by leveraging in-context learning (ICL) capabilities of LLMs. By integrating few-shot examples, the method significantly improves the performance of embeddings across multiple tasks, enhancing generalization and task relevance. This approach achieves state-of-the-art results on widely-used benchmarks (MTEB and AIR-Bench) without requiring complex model changes, making it a practical and efficient solution for advancing natural language processing tasks →read the paper
Applications in Specialized Domains
Prithvi WxC: Foundation Model for Weather and Climate addresses weather forecasting and climate modeling, outperforming traditional methods in tasks like hurricane tracking and extreme event prediction. Read the paper
TIME-MOE: Billion-Scale Time Series Foundation Models with Mixture of Experts scales time series forecasting with a mixture of experts architecture, optimizing computational efficiency and improving forecasting accuracy. Read the paper
Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments applies large language models to network security tasks, offering adaptable performance for red teaming in complex scenarios. Read the paper
An adapted large language model facilitates multiple medical tasks in diabetes care specializes in diabetes-related medical tasks, achieving superior performance in clinical evaluations and personalized healthcare. Read the paper
Boosting Healthcare LLMs Through Retrieved Context improves factual accuracy in healthcare-specific models by integrating context retrieval systems, bridging the gap between open and proprietary models. Read the paper
Zero-shot Cross-lingual Voice Transfer for TTS enables voice transfer across languages using only one speech sample, significantly improving voice similarity and application to dysarthric speech. Read the paper
Multimodal and Vision Models
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models enhances vision-language models using human-annotated datasets, achieving state-of-the-art performance on multimodal benchmarks. Read the paper
MONOFORMER: One Transformer for Both Diffusion and Autoregression simplifies architecture for text and image generation by using one transformer for both autoregressive and diffusion-based tasks, showing competitive performance across benchmarks. Read the paper
Phantom of Latent for Large Language and Vision Models enhances vision-language learning by expanding latent dimensions temporarily, boosting performance in resource-constrained environments. Read the paper
EMOVA: Empowering Language Models to See, Hear, and Speak with Vivid Emotions integrates speech, vision, and text capabilities to improve emotional understanding in spoken dialogues and multimodal tasks. Read the paper
Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections improves mirror reflection generation in diffusion models, ensuring accurate geometric reflections for image editing and AR. Read the paper
Optimized Efficiency
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models reduces inference costs by applying a learnable pruning method, improving both efficiency and performance in large language models. Read the paper
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction optimizes long-context language models by reducing input tokens, achieving faster processing and lower memory usage without sacrificing performance. Read the paper
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models enhances LLMs' reasoning by integrating logical prompts, boosting performance across various reasoning tasks. Read the paper
Robust Reward and Reinforcement Learning
Reward-Robust RLHF in LLMs improves reinforcement learning from human feedback by introducing robust reward models that account for uncertainty, increasing learning stability. Read the paper
RRM: Robust Reward Model Training Mitigates Reward Hacking enhances reward model training to prevent reward hacking, improving preference alignment in large language models. Read the paper
Tools and Frameworks for Non-AI Experts
Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts automates prompt generation using multi-agent collaboration, improving ease of use for non-AI experts in generating high-quality prompts for LLMs. Read the paper
NoTeeline: Supporting Real-Time Notetaking from Keypoints with Large Language Models aids users in real-time notetaking by expanding micronotes into full-length notes, improving writing efficiency and quality. Read the paper
Leave a review!
Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!