Gemini Is Rising While Anthropic Works on Opening the Black Box of AI
Plus, an amazing timeline of Anthropic’s work peeling back the black box of AI, news from the usual suspects, and a few good reads
This Week in Turing Post:
Wednesday, AI 101, Concept: Let’s talk about Inference
Friday, Agentic Workflow: the evolving human–AI communication and how it shapes our experience and the world
Upgrade if you want to receive these articles directly in your inbox, complete with detailed explanations and a curated reading list. Or support us via “Buy Me a Coffee.”
Main: The case for Gemini, and Anthropic’s path to opening the black box of AI
A few changes in this edition. First, on the main stage today we have two topics: both are grounded in recent news, but we zoom out to connect the dots and show the bigger picture from different angles. Second, I’m ditching the Top Research section. Use the poll below if you disagree – your vote will decide whether it comes back.
Shall we keep the Top Research section?
Now to the main topics
Gemini Is Rising
I’ve always loved Gmail and Google Docs – they’re intuitive, reliable, and have been part of my daily workflow forever. Yet, whenever I tried to “like” Google's larger AI models, like Gemini, they just didn't click with me. While ChatGPT, Claude, Midjourney, and occasionally Grok fit smoothly into my routine, Google's AI offerings often felt distant and less precise, with their Deep Research frequently hallucinating – so much so that I stopped trying new Google products altogether.
Then suddenly, last week, Gemini 2.5 Pro began dominating headlines: an impressive 40-point Elo leap in the Chatbot Arena, announcements about bringing personal API keys to Cursor, and then my husband – a software developer and AI practitioner – began passionately praising Gemini and Gemma, painting a compelling picture of their power and finesse. Hmm, I thought. I still don’t see it!
So – to avoid being constrained by my own AI habits – I asked my husband why he believes Google might be reclaiming its edge in AI, and whether Gemma could genuinely become part of my routine as well.
Here’s what he wrote back:
"It’s funny how quickly narratives form in tech. One of the more common ones right now is that Google is “behind” in AI – playing catch-up to OpenAI, Anthropic, or even to scrappy open-source players. But that story doesn’t quite hold up once you actually use Google’s models directly.
Let’s not forget: Google literally invented the transformer architecture that underpins everything from GPT-4 to Claude to the open-source LLMs lighting up Hugging Face. And yet, public perception has turned, partly because Google has been unusually conservative in how it releases and positions its AI tools. The consumer-facing Gemini app might not feel as magical or responsive as ChatGPT. But when you drop into the raw model — running Gemma 3 on your own machine, or hitting the Gemini 2.5 API – the story changes.
There’s a tightness and focus to these models that’s hard to ignore. Interacting at the model level strips away the product decisions, the guardrails, the UI polish – and lets you feel the actual engineering. And Gemini feels fast, serious, and precise. It’s like driving a high-performance machine: the output quality might match something like Claude or GPT-4, but the experience is different. Less latency, fewer hallucinations, more substance with fewer words.
Does Google Deep Research produce insightful research reports? Not really. But if you want something to use as a foundational model for your own product, I consistently start with Gemini or Gemma 3 in my work.
Gemma 3 is especially striking. It’s open-source and tiny compared to the giants, yet it punches way above its weight. Compared to other models of the same class, it feels more like a distilled essence of what a model should be – no flashy tricks, no quantization hacks, just really strong engineering. It seems like it captures the essence in a way that the others don't. My gut says that we’re heading toward a future where these small, well-tuned models outperform larger ones simply by being more efficient and better trained.
Contrast that with DeepSeek, which has its own kind of charm. It feels clever and scrappy, like a model cobbled together by brilliant engineers who aren’t afraid to cut corners in smart ways. The outputs are often good – sometimes great – but the sensation is different. Where Gemini feels grounded in theory and craftsmanship, DeepSeek feels like it’s hacking its way to good results. (And it’s significantly slower.)
And then there’s the multimodal magic. Gemini 2.5 is the first time I’ve genuinely felt like a model understood not just text, but image, audio, and the sensory world. It could reason about voice timbre, edit an image seamlessly, and do it all with speed and clarity. It wasn’t just responding – it was perceiving. That felt new.
So no, I don’t think Google is behind. I think they’re cautious. But under the hood, their models are among the best I’ve used (here he compares how deeply each model understands things). The moment they decide to really ship – to stop slow-rolling every release – it’s going to be hard for anyone else to keep up."
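If you want to try the raw-model experience he describes, here’s a minimal sketch using Google’s google-genai Python SDK – the model ID and prompt are illustrative, and you’d want to check the docs for the current experimental endpoint name:

```python
import os
from google import genai  # pip install google-genai

# Minimal "raw model" call, skipping the consumer app entirely.
# GEMINI_API_KEY and the model ID below are assumptions: this ID was
# the experimental 2.5 Pro endpoint around the time of writing.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents="In two sentences, what makes a good unit test?",
)
print(response.text)
```

Gemma 3 can be tried locally in the same spirit, e.g. via Ollama (`ollama run gemma3`), though exact model tags vary by release.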
Gemini 2.5 Pro sets SOTA on the aider polyglot leaderboard with a score of 73%.
This is well ahead of thinking/reasoning models. A huge jump from prior Gemini models. The first Gemini model to effectively use efficient diff-like editing formats.
aider.chat/docs/leaderboa…
— Paul Gauthier (@paulgauthier)
8:53 PM • Mar 25, 2025
Anthropic’s Neuroscience of Language Models
Few labs are peeling back the layers of large language models with as much consistency as Anthropic. Since May 2024, their interpretability team has been on a mission to map the inner life of Claude – starting with “dictionary learning,” a technique that helped uncover millions of neuron patterns or “features” and match them to human concepts. Think of it as building a rough glossary for the brain of an LLM.
By October, they were zooming in on monosemanticity – the idea that a single neuron pattern might align with a single meaning. A clean signal, not a messy blend. That work helped decompose Claude’s tangled representations into parts we can actually reason about.
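To make those ideas concrete, here’s a toy sketch of dictionary learning with a sparse autoencoder – not Anthropic’s actual code, and the dimensions and penalty are illustrative. The L1 term is what pushes features toward firing rarely, i.e. toward monosemanticity:

```python
import torch
import torch.nn as nn

# Toy dictionary learning on LLM activations: a sparse autoencoder
# whose hidden units (the "features") are pushed to fire sparsely,
# so each tends to align with a single human-interpretable concept.
d_model, n_features = 512, 4096   # overcomplete dictionary (illustrative sizes)

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, acts):
        feats = torch.relu(self.encoder(acts))  # sparse feature activations
        return self.decoder(feats), feats       # reconstruction + features

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3                    # sparsity pressure on the features

acts = torch.randn(64, d_model)    # stand-in for real residual-stream activations
recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
opt.zero_grad(); loss.backward(); opt.step()
```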
Then came March 2025. In On the Biology of a Large Language Model, they used attribution graphs to trace how Claude 3.5 Haiku reasons across multiple steps – writing poems, diagnosing patients, even planning ahead. The findings make one thing clear: these models aren’t just completing sentences. They’re constructing thoughts.
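For intuition only, here’s a toy version of an attribution graph: score each upstream feature’s contribution to a downstream feature as activation × weight (a crude linear approximation – the real method is far more involved), then keep the strongest edges. All values here are made up:

```python
import numpy as np

# Toy attribution graph: upstream feature i contributes roughly
# activation_i * weight[i, j] to downstream feature j.
rng = np.random.default_rng(0)
upstream_acts = rng.random(6)        # activations of 6 upstream features
weights = rng.normal(size=(6, 4))    # linear map to 4 downstream features

edges = []
for i, act in enumerate(upstream_acts):
    for j in range(weights.shape[1]):
        contrib = act * weights[i, j]
        if abs(contrib) > 0.5:       # keep only strong edges (arbitrary cutoff)
            edges.append((f"up_{i}", f"down_{j}", round(float(contrib), 2)))

for src, dst, w in sorted(edges, key=lambda e: -abs(e[2])):
    print(f"{src} -> {dst}  (attribution {w})")
```

Chain enough of these edges across layers and you get a graph you can read as a circuit – which is roughly what makes the multi-step reasoning traces in the paper legible.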
Together, these three studies form a kind of field guide to the model’s mind – showing how interpretability can move from theory to toolkit, and from safety mandate to creative lens.

Welcome to Monday! It already feels like Friday in the AI world…
Curated Collections
From an AI practitioner
Trying the new Reve Image 1.0 this week. So far, it’s been really good with text and photorealistic images. It can serve as a more affordable alternative to Flux.
We are reading/watching:
No elephants: Breakthroughs in image generation – not so much a read as a few good ideas on how to use OpenAI’s new image generation, by Ethan Mollick
When to use and when not to use fine-tuning, by Andrew Ng
News from The Usual Suspects ©
Gemini (we told you, it’s rising!)
DeepMind puts Gemini to work
Google DeepMind’s Gemini Robotics shifts AI out of the lab and into the world. Built on Gemini 2.0, it powers robots with a Vision-Language-Action model that grasps, points, packs, and even folds origami. With zero- and few-shot learning, it adapts to new tasks and robot bodies on the fly – no retraining required. It's a quiet but bold stride toward truly general-purpose, physically capable AI.
Google has made Gemini 2.5 Pro (experimental) free for all
Formerly a $19.99/month perk, the top-tier model now comes with file uploads, app integration, and the new Canvas tool. It’s a calculated move to flood the market with its best AI – currently top of the charts in reasoning and STEM. Democratization or bait-and-switch? Time (and pricing) will tell.
OpenAI launches Academy, but you only heard about Images
OpenAI has debuted OpenAI Academy – which not many people noticed – a global learning hub designed to turn AI curiosity into capability. It offers courses such as “AI for Older Adults: Introduction to AI” and “Getting Started with AI for Nonprofits”. Great news for AI literacy.
OpenAI unveiled “Images in ChatGPT”, but you probably know it already from the flood of Ghibli-style images sweeping the internet. While Google also introduced image generation via Gemini 2.0 Flash, it was tucked behind developer tools and constrained by rigid safety filters. OpenAI, meanwhile, dropped image gen straight into ChatGPT for millions to try – no instructions needed.
the chatgpt launch 26 months ago was one of the craziest viral moments i'd ever seen, and we added one million users in five days.
we added one million users in the last hour.
— Sam Altman (@sama)
6:11 PM • Mar 31, 2025
Here is the model system card. Wonder why the image appears out of a blur? It’s just a feature.
After hacking GPT-4o's frontend, I made amazing discoveries:
đź’ˇThe line-by-line image generation effect users see is just a browser-side animation (pure frontend trick)
🔦OpenAI's server sends only 5 intermediate images per generation, captured at different stages
🎾Patch size=8
— Jie Liu (@jie_liu1)
11:19 PM • Mar 28, 2025
Also in the mix: OpenAI adopting Anthropic’s Model Context Protocol, proving even fierce rivals agree on data plumbing. Or it’s just desperately trying to catch up.
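For a sense of what that plumbing looks like, here’s a minimal MCP server using the FastMCP helper from the official Python SDK – the server name and tool are made-up examples:

```python
from mcp.server.fastmcp import FastMCP  # pip install mcp

# A minimal Model Context Protocol server exposing one tool that any
# MCP-aware client can discover and call. The tool is a toy example.
mcp = FastMCP("demo-server")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```

The point of the protocol is exactly this uniformity: any client that speaks MCP can list and call word_count without bespoke glue code.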
Musk fuses X and xAI: One empire to rule them all
Elon Musk has folded his social platform X into xAI in an all-stock deal valuing the AI startup at $80B and X at $33B (after trimming off $12B in debt). It’s a strategic loop-closing – AI models get training data, users get an AI-native playground, and Musk gets a firmer grip on both.
AI safety watchdog picks up what governments dropped
As AI giants quietly dilute their ethics pledges and the Trump administration tears down safety frameworks, one small nonprofit – the Midas Project – is doing the unglamorous work of tracking it all. With its “AI Safety Watchtower,” it monitors policy changes at 16 major firms. In an era where trillion-dollar companies have no changelogs, Tyler Johnston is keeping the receipts.

No top models, no top research this week – vote above if you want these sections back.
That’s all for today. Thank you for reading! Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.
Leave a review!