FOD#96: State of AI 2025 – it's got real for business
we discuss Stanford's monumental AI report, Google's amazing announcements, OpenAI's suggestions to Europe, and more, more, more
This Week in Turing Post:
Wednesday, AI 101, Concept: What Causal AI is and why you need to know about it
Friday, Agentic Workflow: we will discuss Multi-Agent Collaboration – how multiple AI agents can communicate, coordinate, and delegate tasks in complex environments
Last week was packed – make sure to check every section of today’s newsletter. The first part is about trends, the second more tech.
This is a free edition. Upgrade if you want to receive our deep dives directly in your inbox. If you want to support us without getting a subscription – do it here.
The Year AI Got Real: Acceleration, Reckoning, and the Resource Race
Today, I’d like to go over Stanford’s AI Index Report 2025. Brace yourself – it’s 456 pages. My overview will help you understand the undercurrents of AI, but I do encourage you to check the report yourself.
There are a few ways to deal with this ginormous report:
Study the table of contents and stick to the topics that actually interest you. The report is click-through, so you can jump around easily.
The classic Ctrl+F (Cmd+F for Mac). Search for words or phrases like “out of data” or “hallucinations” and skim based on curiosity.
But here’s what we recommend: upload the PDF to Gemini 2.5 (it has an enormous context window, so you actually can upload a 456-page report) and start asking it questions. This report is one of the best documents to test Gemini 2.5 on – it’s dense, diverse, and surprisingly rewarding. I found it fascinating to prompt Gemini for specifics and get more insight than I would’ve by reading it cover to cover (this report isn’t literature – you don’t need to savor every word).
So – what’s happening with AI, as chronicled by the exhaustive Stanford AI Index Report 2025? If previous years were about AI's dazzling potential, 2024 marked the year AI truly started showing up for work, getting deeply embedded in our economy and daily lives – but its arrival is far from smooth. Forget the futuristic hype (please!); the most interesting story is the collision between AI's breakneck acceleration, its staggering resource appetite, and a shift in public awareness.
Perhaps the most striking shift is AI's tangible economic integration. Businesses have moved past experimenting – adoption surged dramatically, with 78% of organizations reporting AI use and generative AI adoption more than doubling in a single year. AI moved from a niche tool to a core business driver. Studies now increasingly confirm real productivity boosts – often, fascinatingly, lifting lower-skilled workers the most – complicating simple narratives of automation doom. AI is optimizing supply chains, drafting marketing copy, and even assisting in scientific discovery at a pace that led to AI-related work sharing Nobel Prizes.
But this rapid integration and capability leap fuels an equally interesting dynamic: the race at the frontier is becoming paradoxically both more competitive and more concentrated. Performance gaps between the very top models are shrinking, open-weight models are rapidly closing in on proprietary giants, and China is demonstrating remarkable speed in catching up on performance benchmarks. Yet, achieving these frontier results demands resources escalating at an exponential rate. Training compute needs double every five months, power consumption doubles annually, and training costs are hitting astronomical figures – think $100 million for GPT-4, with $1 billion runs already underway. This immense cost concentrates cutting-edge development power, primarily in US industry, even as the use of AI becomes cheaper via falling inference costs. We’re not talking about faster chips alone – this is a resource gap that’s growing wider by the minute.
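To put those doubling rates in perspective, here is a back-of-the-envelope sketch. It assumes clean exponential doubling, which the report states only as a trend, so treat the outputs as rough orders of magnitude:

```python
# Back-of-the-envelope growth from the report's doubling rates.
# Assumes clean exponential doubling; real curves are noisier.

def growth_factor(doubling_period_months: float, horizon_months: float) -> float:
    """Multiplicative growth over a horizon, given a doubling period."""
    return 2 ** (horizon_months / doubling_period_months)

# Training compute doubles every 5 months -> roughly 5.3x per year
compute_per_year = growth_factor(5, 12)

# Power consumption doubles annually -> 2x per year
power_per_year = growth_factor(12, 12)

print(f"compute: ~{compute_per_year:.1f}x per year")
print(f"power:   ~{power_per_year:.1f}x per year")
```

Compounded over just two years, that compute rate alone is a ~28x increase – which is why the resource gap widens so quickly.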
One of the most under-discussed threads in the report is sustainability. The carbon cost of training frontier models is enormous. More urgent, though, is the data shortage. The “data commons” of the open web is shrinking fast – scraping restrictions, locked content, disappearing sources. Some forecasts suggest we’ll hit a wall by 2026. Relying on scale is starting to look fragile. Synthetic data? Better methods? That’s the open question.
Then there’s the push and pull between progress and caution. Scientific wins and commercial rollouts continue at pace, but so do incidents, biases, and public distrust. Companies know the risks, yet they’re not acting fast enough to address them. Governments are finally investing and coordinating, but regulation still lags the pace of deployment. Sentiment is mixed: optimism, regional divides, job anxiety, low confidence in data practices.
The most important takeaway from the 2025 AI Index isn’t acceleration. It’s how that acceleration demands near-limitless inputs from a finite world. How AI adoption continues alongside a complex trust landscape. How we now face a global test of whether we can match innovation with restraint, and speed with responsibility. Those tensions – resources, safety, trust – may shape AI’s legacy more than any leaderboard ever will.
That’s a handful already, but also check this report: “Anthropic Education Report: How University Students Use Claude”

Image Credit: Anthropic
Welcome to Monday. I bet the only thing accelerating faster than AI right now is your unread tabs.
Curated Collections
We are reading/watching
Are AI Agents Sustainable? It depends – by Sasha Luccioni, Brigitte Tousignant, and Yacine Jernite
News from The Usual Suspects ©
A few very interesting updates and releases this week from the main players!
Congrats to our friends at Hugging Face! Robotics is one of the most interesting areas for AI in the next few years.
Let's make AI robotics open-source!
— clem 🤗 (@ClementDelangue)
Google Cloud Next 2025 – Iron, Flash & Protocols: The AI Empire Strikes Back
Sundar Pichai kicked off Cloud Next 2025 reminding everyone: Google is an AI company. The message was quiet, confident – and backed with serious silicon, software, and standards.
Let’s start with TPU v7 “Ironwood”
Google’s 7th-generation AI chip launches later this year with 42.5 exaflops in full config – 24x faster than the world’s top supercomputer. Each chip offers 4,614 teraflops, 192GB high-bandwidth memory, and 7.2 Tbps throughput. Built for AI inference at super scale.
Gemini 2.5 Pro and Flash Models
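As a sanity check on those figures (my arithmetic, not Google's), dividing the full-config throughput by the per-chip throughput gives the approximate number of chips in a full configuration:

```python
# Sanity check on the quoted Ironwood numbers:
# 42.5 exaflops full config / 4,614 teraflops per chip.
EXAFLOP_IN_TERAFLOPS = 1e6  # 1 exaflop = 10^6 teraflops

full_config_tflops = 42.5 * EXAFLOP_IN_TERAFLOPS
per_chip_tflops = 4614

chips = full_config_tflops / per_chip_tflops
print(f"~{chips:,.0f} chips per full configuration")
```

That works out to roughly nine thousand chips per full configuration – the quoted numbers are consistent with each other.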
The new Gemini family pushes further into reasoning, code, and math. Gemini 2.5 Pro supports a 1M-token context window – tailored for complex workflows. Flash models offer leaner, faster variants optimized for deployment and responsiveness.
Firebase Studio
A new developer tool positioned as a competitor to platforms like Cursor. It’s built for full-stack AI app development. It combines features from Project IDX, Genkit, and Gemini, and offers a workflow where you can prototype with natural language, write and preview code across devices, and deploy – all from a single browser tab. It also integrates Gemini for conversational assistance throughout the process.
Vertex AI Expands
Google’s generative suite now spans image, video, music, and speech – Google says it’s the only platform with generative coverage across all major media types. New tools include Imagen 3, Veo 2, Lyria, and Chirp 3. Safety, watermarking, and IP protections are baked in.
Who made some noise? Meet the Agent2Agent Protocol (A2A) (in every newsletter this week). What is important to understand: Google’s Agent2Agent (A2A) and Anthropic’s Model Context Protocol (MCP) are complementary but distinct standards.
Image Credit: Google (How A2A works)
A2A standardizes communication among autonomous AI agents through structured messaging, capability sharing, and collaborative workflows. MCP, meanwhile, connects AI models directly to external data and tools via standardized interfaces, enhancing contextual awareness. Together, these protocols enable AI systems to seamlessly interact, collaborate, and utilize external resources for more intelligent and integrated operations.
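To make the A2A side concrete, here is a toy sketch of the pattern: agents advertise capabilities (A2A calls these "agent cards") and exchange structured task messages. The field names and routing logic below are hypothetical illustrations, not the actual A2A wire format:

```python
# Toy sketch of the A2A pattern: capability advertisement plus
# structured agent-to-agent task messages. Field names are
# hypothetical, not the real A2A schema.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class AgentCard:
    """Capability advertisement, in the spirit of A2A's agent cards."""
    name: str
    skills: list = field(default_factory=list)

@dataclass
class TaskMessage:
    """A structured request one agent sends to another."""
    sender: str
    recipient: str
    skill: str
    payload: dict

def route(task: TaskMessage, registry: dict) -> str:
    """Serialize a task for delivery if the recipient offers the skill."""
    card = registry[task.recipient]
    if task.skill not in card.skills:
        raise ValueError(f"{card.name} does not offer {task.skill!r}")
    return json.dumps(asdict(task))

registry = {"summarizer": AgentCard("summarizer", skills=["summarize"])}
msg = TaskMessage("planner", "summarizer", "summarize",
                  {"text": "456-page report"})
wire = route(msg, registry)
print(wire)
```

The point of the standard is exactly this separation: discovery (what can you do?) and messaging (here is a task) are decoupled, so agents from different vendors can interoperate.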

OpenAI
ChatGPT Gets a Better Memory – Yours. The latest update lets ChatGPT reference all your past chats to sharpen its responses. That means smarter help, better advice, and smoother conversations. Plus and Pro users (outside the EEA and friends) get it first. Don’t want it? You can still opt out with a click.
Besides that, they published the EU Economic Blueprint, in which the company proposes a four-pillar strategy – compute, data, energy, and talent – to drive AI-led prosperity, while cautioning that a maze of over 270 regulatory bodies might bury Europe's ambitions under its own paperwork. A more agile, harmonized framework, OpenAI argues, could turn Europe into an AI powerhouse.
They also introduced OpenAI’s Pioneers Program that helps startups build custom fine-tuned models for three high-impact industry use cases. It offers support for creating domain-specific evaluation benchmarks and expert models via reinforcement fine-tuning, aiming to boost AI performance in sectors like legal, healthcare, and finance through close collaboration with OpenAI researchers. Right after, they published BrowseComp: a benchmark for browsing agents.
Microsoft [Copilot+ Gets a Memory Upgrade]
Microsoft just dropped Windows 11 Build 26100.3902 into the Release Preview Channel, and it’s a Copilot+ showcase. They reintroduce the Recall function – their version of memory, but for your whole PC. Meanwhile, Click to Do lets you act directly on on-screen content, and search gets semantically smarter. Not every feature lands everywhere yet, but the AI future is clearly baked into Windows.
Models to pay attention to:
🌟SmolVLM (Hugging Face and Stanford University) introduces a family of compact VLMs with efficient tokenization and architecture, outperforming much larger models using under 1GB of GPU memory → read the paper
🌟Cogito v1 presents a full family of open LLMs trained with IDA alignment, outperforming other open models at every size from 3B to 70B → read the paper
DeepCoder – Together.ai releases a 14B code model trained via distributed RL, competitive with o3-mini and scoring high on LiveCodeBench and HumanEval+ → read the paper
Kimi-VL releases an open 2.8B MoE VLM with strong agent capabilities and long-context understanding, plus a “Thinking” variant tuned for chain-of-thought reasoning → read the paper
Skywork R1V adapts a pretrained LLM to visual inputs with lightweight projection and hybrid training, achieving strong scores on multimodal reasoning benchmarks → read the paper
The freshest research papers, categorized for your convenience
There were quite a few TOP research papers this week; we mark them with 🌟 in each section.
Agentic Discovery & Research Tools
🌟The AI Scientist v2 (Sakana AI, UBC, Vector Institute, and Oxford) is an autonomous LLM-based agent that formulates hypotheses, runs experiments, analyzes data, and writes papers. It uses agentic tree search and VLM feedback for iterative refinement, removing human-authored code templates. Of three papers submitted to ICLR 2025 workshops, one passed peer review with a 6.33 score → read the paper
🌟 Debug-gym (Microsoft) provides an interactive sandboxed coding environment for LLMs to learn step-by-step debugging using tools like pdb. It supports repository-level reasoning and includes benchmarks (Aider, Mini-nightmare, SWE-bench) to assess debugging agents. Initial experiments show LLMs with tool access significantly outperform those without on SWE-bench Lite → read the paper
Training Data & Model Auditing
🌟 OLMoTrace tracks generated outputs back to the original training data in real time, using substring matching at scale → read the paper
Are You Getting What You Pay For? audits LLM APIs for covert model substitutions and tests detection strategies across realistic attack scenarios → read the paper
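The core idea behind OLMoTrace – finding long spans of a model's output that appear verbatim in training text – can be sketched naively. The real system uses efficient suffix-array-style indexes over a corpus of trillions of tokens; this toy scan over a tiny in-memory corpus is only illustrative:

```python
# Toy version of OLMoTrace's idea: find the longest word spans of a
# model output that appear verbatim in training documents. The real
# system uses scalable indexes; this naive scan is illustrative.

def trace_spans(output: str, corpus: list, min_words: int = 3):
    """Return (span, doc_index) pairs for the longest verbatim matches."""
    words = output.split()
    hits = []
    for n in range(len(words), min_words - 1, -1):  # longest spans first
        for i in range(len(words) - n + 1):
            span = " ".join(words[i:i + n])
            for doc_idx, doc in enumerate(corpus):
                if span in doc:
                    hits.append((span, doc_idx))
        if hits:
            break  # keep only the longest matching spans
    return hits

corpus = ["training compute doubles every five months according to estimates"]
out = "analysts say training compute doubles every five months"
print(trace_spans(out, corpus))
```

Running this traces the six-word span back to document 0 – the same kind of provenance signal OLMoTrace surfaces in real time.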
Multimodal Systems & Architectures
MM-IFEngine builds a training and evaluation pipeline for precise multimodal instruction-following, with high-quality data and rule-based + model-based metrics → read the paper
🌟 Scaling Laws for Native Multimodal Models analyzes 457 native multimodal models, finding that early-fusion architectures often outperform late-fusion ones in efficiency and scaling → read the paper
Serving & Optimizing MoE Models
🌟MegaScale-Infer (ByteDance) disaggregates attention and FFN layers in MoE models, introducing ping-pong parallelism and custom M2N communication for faster, cheaper inference → read the paper
🌟 Hogwild! Inference (Yandex) runs multiple LLM instances in parallel over a shared attention cache, improving throughput without needing fine-tuning → read the paper
HybriMoE optimizes MoE inference on mixed CPU-GPU setups with dynamic scheduling and caching to handle expert instability → read the paper
C3PO enhances test-time accuracy in MoE LLMs by reweighting expert mixing in key layers using similarity-based surrogates → read the paper
Quantization Hurts Reasoning? studies how quantization affects LLM reasoning and identifies settings where lower bit-widths degrade performance → read the paper
Reasoning, Reinforcement & CoT Techniques
🌟 Self-Steering Language Models (MIT and Yale) introduces DisCIPL, where a planning LM writes recursive inference programs for smaller LMs to follow, boosting efficiency and control → read the paper
🌟 VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks (ByteDance) uses value-based reinforcement learning to train long-CoT reasoning models more reliably and sample-efficiently → read the paper
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning introduces AdaRFT that speeds up RL finetuning by adaptively adjusting the difficulty of math reasoning problems based on reward feedback → read the paper
Concise Reasoning via RL demonstrates that longer chain-of-thoughts are not always better and introduces post-hoc RL to train for brevity without losing accuracy → read the paper
Missing Premise exacerbates Overthinking shows how LLMs overthink when premises are missing and proposes new benchmarks to test critical thinking → read the paper
Diffusion & Constrained Generation
DDT decouples semantic encoding and detail decoding in diffusion transformers to boost both sample quality and training speed → read the paper
Adaptive Weighted Rejection Sampling improves constrained generation by selectively rejecting samples and estimating importance weights for unbiased generation → read the paper
That’s all for today. Thank you for reading! Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.
Leave a review!