#77: Amid Big Model Chaos: Small Models and Embeddings Steal the Spotlight
plus our usual collection of interesting articles, relevant news, and research papers. Dive in!
This Week in Turing Post:
Wednesday, AI 101, Model: What's so cool about Llava-o1?
Friday/Saturday after Thanksgiving is a holiday in the US. So:
What would you like to receive on that day?
Thank you for voting!
The main topic
Today's editorial is a little more technical than usual.
The AI model race has spiraled into chaos. OpenAI's GPT-4o-2024-11-20, suspected to be faster but less capable than its predecessor, briefly edged out Gemini Exp 1114, until Gemini Exp 1121 reclaimed the lead. Versioning has been abandoned in favor of date-stamped releases, leaving us with an endless cycle of incremental updates masquerading as breakthroughs.
This unrelenting push for dominance has left developers and researchers frustrated. The anticipated "GPT-5" and "Claude 4" remain elusive, while the industry obsesses over achieving benchmarks. Labs prioritize leaderboard rankings over delivering meaningful progress, creating a landscape where clarity and innovation are often sacrificed for speed.
There is one area, though, where the real work is happening: this week gave us a few meaningful small models and research on embeddings.
Take Hymba, NVIDIA's hybrid-head architecture combining transformer attention with state space models (SSMs). Hymba-1.5B not only outperforms larger counterparts like Llama-3.2-3B but does so with drastically reduced memory usage and boosted throughput. Its sliding window attention and learnable meta tokens are innovations worth celebrating.
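To make the "hybrid-head" idea concrete, here is a toy sketch of a layer that runs an attention branch and an SSM branch in parallel on the same input and fuses them. It is a minimal illustration under our own assumptions (full attention instead of Hymba's sliding window, no meta tokens, a toy diagonal recurrence), not NVIDIA's implementation:

```python
import torch
import torch.nn as nn

class ToyHybridHead(nn.Module):
    """Toy parallel attention + SSM block, loosely inspired by Hymba.

    Illustrative sketch only: the real model uses sliding-window attention,
    learnable meta tokens, and a tuned fusion scheme.
    """

    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Toy diagonal state-space branch: per-channel decaying recurrence.
        self.log_decay = nn.Parameter(torch.zeros(dim))
        self.in_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(2 * dim, dim)  # fuse both branches

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        # Branch 1: self-attention (Hymba restricts this to a sliding window).
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        # Branch 2: toy SSM scan h_t = a * h_{t-1} + u_t, elementwise per channel.
        a = torch.sigmoid(self.log_decay)  # decay in (0, 1)
        u = self.in_proj(x)
        h = torch.zeros_like(u[:, 0])
        states = []
        for t in range(u.shape[1]):  # sequential loop for clarity, not speed
            h = a * h + u[:, t]
            states.append(h)
        ssm_out = torch.stack(states, dim=1)
        # Fuse the two branches into one output.
        return self.out_proj(torch.cat([attn_out, ssm_out], dim=-1))

x = torch.randn(2, 16, 64)
print(ToyHybridHead(64)(x).shape)  # torch.Size([2, 16, 64])
```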
Similarly, SlimLM redefines on-device AI. Built for smartphones, this compact language model balances privacy and efficiency, handling document assistance tasks directly on devices like the Samsung Galaxy S24. Small models are quietly making transformative contributions.
Multimodality also sees breakthroughs. BlueLM-V-3B is a large multimodal model, but built for mobile devices. It excels in multilingual OCR and image-to-text tasks, leveraging embeddings to optimize mobile efficiency.
Meanwhile, Jina CLIP v2 delivers multilingual, multimodal embeddings for text and images, combining power with compactness through Matryoshka representations.
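Matryoshka representations nest usable smaller embeddings inside the full vector, so downstream users can truncate to a prefix and re-normalize instead of re-encoding. A minimal sketch of that consumer-side trick (the 1024- and 256-dimension choices are illustrative assumptions):

```python
import numpy as np

def truncate_embedding(v: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates and re-normalize for cosine similarity.

    Matryoshka-trained models pack the most important information into the
    leading dimensions, so the prefix remains a usable embedding.
    """
    prefix = v[:dim]
    return prefix / np.linalg.norm(prefix)

full = np.random.randn(1024)           # stand-in for a full-size embedding
small = truncate_embedding(full, 256)  # 4x smaller index, modest quality loss
print(small.shape)                     # (256,)
```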
While "frontier labs" chase benchmarks, these compact models quietly redefine efficiency and usability, highlighting how smaller, more focused developments might be more meaningful than incremental improvements in large models.
If you like Turing Post, consider becoming a paid subscriber or sharing this digest with a friend. It helps us keep Monday digests free.
Twitter library
Weekly recommendation from an AI practitioner
Anthropic caught our AI practitioner's attention today with the launch of the Model Context Protocol (MCP), an open standard for connecting AI systems to various data sources. Designed to simplify integrations, it offers secure, scalable connections to tools like Google Drive and GitHub. Developers can access pre-built servers, SDKs, and an open-source repository to build smarter, context-aware AI. It's a much-needed solution, and if others adopt it beyond Claude, it could be a game-changer.
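For a sense of how lightweight the integration is, here's a sketch of registering one of the pre-built servers in a Claude Desktop-style config. The `mcpServers` layout follows Anthropic's published examples, but treat the exact file path, package name, and env key as assumptions:

```python
import json
import pathlib

# Hedged sketch: register the pre-built GitHub MCP server in a Claude
# Desktop-style config file. Path, package name, and env key are assumptions
# based on Anthropic's published examples.
config = {
    "mcpServers": {
        "github": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>"},
        }
    }
}
pathlib.Path("claude_desktop_config.json").write_text(json.dumps(config, indent=2))
```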
Not a subscriber yet? Subscribe to receive our digests and articles:
Top Research: it's all about models today
Tülu 3: Open Models, Closed Gaps
The Allen Institute for AI's Tülu 3 raises the bar for open post-training. With curated prompts, synthetic fine-tuning, and a trailblazing RLVR (reinforcement learning with verifiable rewards) framework, it bests Llama 3.1-Instruct and nips at the heels of proprietary systems on GSM8K and IFEval. Open-source innovation with competitive precision → read more
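The RLVR idea is easy to state: wherever correctness can be checked programmatically (math answers, constraint following), replace a learned reward model with a verifier. A minimal sketch of such a verifiable reward, assuming GSM8K-style "#### answer" completions; Tülu 3's actual verifiers are more elaborate:

```python
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Binary reward: 1.0 if the model's final answer matches the reference.

    Illustrative only: the extraction regex and binary reward are assumptions,
    not Tulu 3's exact setup.
    """
    # Assume GSM8K-style completions that end with "#### <answer>".
    match = re.search(r"####\s*(-?[\d,]+)", completion)
    if match is None:
        return 0.0
    predicted = match.group(1).replace(",", "")
    return 1.0 if predicted == gold_answer else 0.0

print(verifiable_reward("... so the total is 42.\n#### 42", "42"))  # 1.0
```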
Marco-o1: Alibaba's Path to Open Reasoning
Alibaba's Marco-o1 embraces Chain-of-Thought tuning and Monte Carlo Tree Search to tackle open-ended challenges. With +6% MGSM gains and outperforming Google Translate in nuanced tasks, it redefines reasoning's potential. Self-correcting, confident, and cutting-edge → read more
DeepSeek-R1-Lite: Another take on OpenAI
DeepSeek introduces R1-Lite-Preview, a reasoning AI designed for logical inference, math-heavy tasks, and real-time problem-solving. Leveraging chain-of-thought reasoning, it matches or surpasses OpenAI's o1-preview on benchmarks like AIME and MATH. With plans to open-source its R1 models and APIs, DeepSeek aims to energize AI innovation worldwide. Chinese companies are catching up → read more
Bi-Mamba: Binary Brilliance
Bi-Mamba, a joint effort from MBZUAI and CMU, makes 1-bit modeling a reality. It cuts storage by 80%, saves energy, and rivals full-precision models like Mamba-2. Tailored for low-bit hardware, it proves efficiency can shine without compromise → read more
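The core move in 1-bit modeling is to store only the sign of each weight plus a small scale chosen to minimize reconstruction error. A minimal sketch of that quantization step (per-row scales and the XNOR-Net-style mean-absolute scale are our assumptions, not Bi-Mamba's exact recipe):

```python
import torch

def binarize(W: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Binarize weights to {-1, +1} with a per-row scale.

    The scale alpha = mean(|W|) minimizes the L2 error of alpha * sign(W),
    the classic BinaryConnect/XNOR-Net-style recipe. Storage drops from
    16 bits to ~1 bit per weight plus one scale per row.
    """
    alpha = W.abs().mean(dim=1, keepdim=True)  # per-row scale
    W_bin = torch.sign(W)
    W_bin[W_bin == 0] = 1.0                    # break ties away from zero
    return W_bin, alpha

W = torch.randn(4, 8)
W_bin, alpha = binarize(W)
print(((alpha * W_bin) - W).abs().mean())      # mean reconstruction error
```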
Pixtral Large: Mistral's Multimodal Giant
Mistral AI's Pixtral Large, with 124B parameters, redefines multimodal AI. From documents to high-res images, it handles enterprise challenges effortlessly, outpacing GPT-4o and Claude-3.5 Sonnet on key tests. A new multimodal champion emerges → read more
You can find the rest of the curated research at the end of the newsletter.
We are reading
Which countries are leading in AI? by Stanford HAI
Kai-Fu Lee on U.S. AGI Hegemony by Recode China AI
How Mark Zuckerberg has fully rebuilt Meta around Llama by Fortune (paywall)
The History of RAND and Commentary from Its CEO by ChinaTalk
News from The Usual Suspects ©
AlphaQubit: Sharpening Quantum's Edge
Google DeepMind's AlphaQubit tackles quantum computing's Achilles' heel: error correction. By leveraging advanced neural networks, it boosts accuracy by 30% on Google's Sycamore processor, surpassing traditional decoders. While too slow for real-time use, it's a promising leap toward scalable quantum systems.
Qwen2.5-Turbo: 1 million token context!
Alibaba's Qwen2.5-Turbo shatters token limits with 1M-token contexts, slicing processing time by 4.3x thanks to sparse attention. Capable of analyzing vast texts or codebases, it outperforms GPT-4 and redefines cost efficiency at ¥0.3 per 1M tokens. Practical adoption remains a challenge, but the potential is game-changing.
H Dives Into the 'Agentic' Era with Runner H
Paris-based AI startup H unveils Runner H, its first product after raising $220 million in funding. With a compact 2-billion-parameter LLM, the platform targets businesses with agentic tools for robotic process automation (RPA), quality assurance, and outsourcing. Runner H claims efficiency and performance beyond bigger rivals like Anthropic. A bold entry in AI's second era? Time will tell. We haven't tried it, but we joined the waitlist.
Microsoft Copilot: The Future of Workflow
At Ignite 2024, Microsoft unveiled new Copilot capabilities, automating tasks and enhancing collaboration. With features like Copilot Actions, Pages, and Teams' Interpreter agent, productivity soars across global teams. The Copilot Control System ensures secure adoption, solidifying Microsoft's AI leadership. There is much more detail about the updates on their blog.
Cerebras: Breaking the Speed Barrier
Cerebras' LLM inference processor now delivers molecular dynamics simulations 700 times faster than the Frontier supercomputer. What once took two years of computation can now be achieved in a day, redefining scientific research timelines. So cool.
Anthropic: $4B More for Responsible AI
Amazon doubles down on Anthropic with a fresh $4B investment, as the company highlights progress on voluntary AI safety commitments. Building AI responsibly while competing at the highest levels.
Read more
xAI: Elon Musk's AI Ambitions Soar
It's kind of crazy, but Elon Musk's xAI raised another $5B, doubling its valuation to $50B. With backing from heavyweights like Qatar Investment Authority and Andreessen Horowitz, Musk's vision for AI dominance accelerates.
We support
More interesting research papers from last week
Multimodal and Vision-Language Models
Multimodal Autoregressive Pre-training of Large Vision Encoders: Sets a new benchmark in image-text understanding with scalable architectures and autoregressive reconstruction for multimodal reasoning.
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization: Optimizes multimodal reasoning with innovative fine-tuning techniques, achieving superior performance across reasoning tasks.
SAMURAI: Adapting Segment Anything Model For Zero-Shot Visual Tracking: Adapts Segment Anything for visual tracking with motion-aware memory, achieving high performance in challenging environments.
Reinforcement Learning and Transfer Learning
Natural Language Reinforcement Learning: Redefines reinforcement learning components using natural language for interpretable and knowledge-rich decision-making.
Model-Based Transfer Learning For Contextual Reinforcement Learning: Improves reinforcement learning efficiency across tasks using a transfer learning approach for contextual environments.
Retrieval and Knowledge Synthesis
Openscholar: Synthesizing Scientific Literature With Retrieval-Augmented LMs: Builds a retrieval-augmented LLM, outperforming GPT-4o in synthesizing scientific queries from a large literature dataset.
Drowning in Documents: Consequences of Scaling Reranker Inference: Analyzes challenges of scaling rerankers for large datasets, showing limitations and proposing robust listwise reranking alternatives.
Memory, Efficiency, and Scaling Innovations
Ultra-Sparse Memory Network: Introduces ultra-sparse architectures that reduce latency and improve memory efficiency, rivaling larger dense models.
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training: Optimizes long-context processing by resolving precision issues, accelerating training while preserving performance (see the quick demonstration after this list).
Loss-to-Loss Prediction: Scaling Laws for All Datasets: Develops predictive scaling laws for model performance across tasks and datasets, enhancing efficiency and planning.
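On the BFloat16/RoPE paper above: bf16 keeps only 8 significant bits, so at long-context positions adjacent token indices round to the same representable value, and rotary angles computed in bf16 can no longer tell neighbors apart. A quick toy demonstration (our own, not the paper's code):

```python
import torch

# bf16 has 8 significant bits, so integers above 2**8 = 256 are no longer all
# representable; near position 100_000 the spacing between representable
# values is 512, collapsing whole runs of adjacent indices.
positions = torch.arange(100_000, 100_008, dtype=torch.float32)
print(positions.to(torch.bfloat16).tolist())
# All eight consecutive positions map to the same bf16 value, so RoPE angles
# computed in bf16 cannot distinguish neighboring tokens at long range.
```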
Agent Architectures and Robotics
Generative World Explorer: Develops Genex for imaginative exploration in 3D environments, enabling decision-making and belief revision without physical movement.
One to Rule Them All: Natural Language to Bind Communication, Perception, and Action: Introduces a robotic architecture integrating LLMs for task execution and dynamic environmental adaptation.
Alignment, Verification, Guardrails, and Safety Frameworks
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering: Proposes a post-training paradigm for foundation models, enhancing scalability and alignment with verification techniques.
Building Trust: Foundations of Security, Safety, and Transparency in AI: Establishes safety and security frameworks for responsible AI development and standardized risk management.
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models: Identifies directions in model representations to mitigate hallucinations and refine entity recognition.
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection: Proposes a guardrail methodology for detecting off-topic prompts, utilizing synthetic datasets and fine-tuning for robust performance.
Leave a review!
Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!