
šŸŒ#77: Amid Big Model Chaos: Small Models and Embeddings Steal the Spotlight

plus our usual collection of interesting articles, relevant news, and research papers. Dive in!

This Week in Turing Post:

  • Wednesday, AI 101, Model: What's so cool about LLaVA-o1?

  • Friday/Saturday after Thanksgiving, it's a holiday in the US. So →

🦃 What would you like to receive on that day? 🦃


The main topic

Today's editorial is a little more technical than usual.

The AI model race has spiraled into chaos. OpenAI's GPT-4o-2024-11-20, suspected to be faster but less capable than its predecessor, briefly edged out Gemini Exp 1114 – until Gemini Exp 1121 reclaimed the lead. Versioning has been abandoned in favor of date-stamped releases, leaving us with an endless cycle of incremental updates masquerading as breakthroughs.

This unrelenting push for dominance has left developers and researchers frustrated. The anticipated "GPT-5" and "Claude 4" remain elusive, while the industry obsesses over achieving benchmarks. Labs prioritize leaderboard rankings over delivering meaningful progress, creating a landscape where clarity and innovation are often sacrificed for speed.

There is one area, though, where the real work is happening: this week gave us a few meaningful small models and research on embeddings.

Take Hymba, NVIDIA's hybrid-head architecture combining transformer attention with state space models (SSMs). Hymba-1.5B not only outperforms larger counterparts like Llama-3.2-3B but does so with drastically reduced memory usage and boosted throughput. Its sliding window attention and learnable meta tokens are innovations worth celebrating.
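To make "sliding window attention" concrete, here is a minimal NumPy sketch of a causal sliding-window mask with the meta-token columns kept globally visible – a simplified illustration of the idea, not Hymba's actual implementation (the window size and meta-token count below are made up):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal sliding-window mask: token i may attend to tokens j
    with i - window < j <= i. Returns a boolean (seq_len, seq_len) array."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def hymba_style_mask(seq_len: int, window: int, n_meta: int) -> np.ndarray:
    """Sliding-window mask whose first n_meta columns (the meta tokens)
    stay visible to every position -- our simplification of the idea."""
    mask = sliding_window_mask(seq_len, window)
    mask[:, :n_meta] = True
    return mask

mask = hymba_style_mask(seq_len=8, window=3, n_meta=2)
```

The payoff is that attention cost grows with the window size rather than the full sequence length, which is where the memory and throughput savings come from.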

Similarly, SlimLM redefines on-device AI. Built for smartphones, this compact language model balances privacy and efficiency, handling document-assistance tasks directly on devices like the Samsung Galaxy S24. Small models are quietly making transformative contributions.

Multimodality also sees breakthroughs. BlueLM-V-3B brings large-multimodal capability to mobile devices, excelling in multilingual OCR and image-to-text tasks while leveraging embeddings to optimize mobile efficiency.

Meanwhile, Jina CLIP v2 delivers multilingual, multimodal embeddings for text and images, combining power with compactness through Matryoshka representations.
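For readers new to Matryoshka representations: the training objective front-loads information into the leading coordinates, so an embedding can simply be truncated to a shorter prefix and re-normalized, trading a little accuracy for a much smaller footprint. A minimal sketch (the dimensions are illustrative, not Jina CLIP v2's actual sizes):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of a unit-norm embedding and
    re-normalize, as Matryoshka-style training is designed to allow."""
    small = vec[:dim]
    return small / np.linalg.norm(small)

# Hypothetical 1024-d embedding, truncated to 256 dims for cheap storage.
rng = np.random.default_rng(0)
full = rng.normal(size=1024)
full /= np.linalg.norm(full)
small = truncate_embedding(full, 256)
```

In practice this means one indexed corpus can serve several cost/quality tiers without re-embedding anything.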

While "frontier labs" chase benchmarks, these compact models quietly redefine efficiency and usability, highlighting how smaller, more focused developments might be more meaningful than incremental improvements in large models.

If you like Turing Post, consider becoming a paid subscriber or sharing this digest with a friend. It helps us keep Monday digests free →

Twitter library

Weekly recommendation from an AI practitioner 👍🏼

Anthropic caught our AI practitioner's attention today with the launch of the Model Context Protocol (MCP), an open standard for connecting AI systems to various data sources. Designed to simplify integrations, it offers secure, scalable connections to tools like Google Drive and GitHub. Developers can access pre-built servers, SDKs, and an open-source repository to build smarter, context-aware AI. It's a much-needed solution, and if others adopt it beyond Claude, it could be a game-changer.

Not a subscriber yet? Subscribe to receive our digests and articles:

Top Research – it's all about models today

Tülu 3: Open Models, Closed Gaps
The Allen Institute for AI's Tülu 3 raises the bar for open post-training. With curated prompts, synthetic fine-tuning, and a trailblazing RLVR framework, it bests Llama 3.1-Instruct and nips at the heels of proprietary systems on GSM8K and IFEval. Open-source innovation with competitive precision → read more

Marco-o1: Alibaba's Path to Open Reasoning
Alibaba's Marco-o1 embraces Chain-of-Thought tuning and Monte Carlo Tree Search to tackle open-ended challenges. Posting +6% gains on MGSM and outperforming Google Translate on nuanced tasks, it redefines reasoning's potential. Self-correcting, confident, and cutting-edge → read more
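For context on the tree-search half of that recipe: MCTS variants typically decide which reasoning branch to expand next with a score like UCB1, which balances a node's average value against how rarely it has been visited. A minimal sketch of the generic rule (our illustration, not Alibaba's code; the child statistics are made up):

```python
import math

def ucb1(value_sum: float, visits: int, parent_visits: int,
         c: float = 1.41) -> float:
    """UCB1 score: exploitation (mean value) plus an exploration bonus."""
    if visits == 0:
        return float("inf")  # always try unvisited children first
    return value_sum / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children: list[dict], parent_visits: int) -> dict:
    """Pick the child reasoning step with the highest UCB1 score."""
    return max(children,
               key=lambda ch: ucb1(ch["value"], ch["visits"], parent_visits))

# Hypothetical stats: a well-explored step vs. a barely tried one.
children = [
    {"name": "step_a", "value": 3.0, "visits": 5},
    {"name": "step_b", "value": 1.0, "visits": 1},
]
best = select_child(children, parent_visits=6)  # exploration favors step_b
```

The exploration constant c tunes how aggressively the search revisits under-explored reasoning paths.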

DeepSeek-R1-Lite: Another Take on OpenAI
DeepSeek introduces R1-Lite-Preview, a reasoning AI designed for logical inference, math-heavy tasks, and real-time problem-solving. Leveraging chain-of-thought reasoning, it matches or surpasses OpenAI's o1-preview on benchmarks like AIME and MATH. With plans to open-source its R1 models and APIs, DeepSeek aims to energize AI innovation worldwide. Chinese companies are catching up → read more

Bi-Mamba: Binary Brilliance
Bi-Mamba, a joint effort from MBZUAI and CMU, makes 1-bit modeling a reality. It cuts storage by 80%, saves energy, and rivals full-precision models like Mamba-2. Tailored for low-bit hardware, it proves efficiency can shine without compromise → read more
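As a rough picture of what 1-bit weights look like: classic binarization schemes store only the sign of each weight plus a scale chosen to minimize reconstruction error. A sketch in that spirit (XNOR-Net-style per-tensor scaling is our assumption here, not necessarily Bi-Mamba's exact recipe):

```python
import numpy as np

def binarize_weights(w: np.ndarray) -> tuple[np.ndarray, float]:
    """1-bit weight scheme: keep sign(w) plus one per-tensor scale
    (mean absolute value), the classic least-error choice. This is a
    generic illustration, not Bi-Mamba's published method."""
    scale = float(np.mean(np.abs(w)))
    return np.sign(w), scale

rng = np.random.default_rng(1)
w = rng.normal(size=(4, 4))           # stand-in for a weight matrix
signs, scale = binarize_weights(w)
w_hat = signs * scale                 # dequantized approximation of w
```

Storing one sign bit instead of a 16-bit float per weight is where storage reductions on the order of 80% and the energy savings come from.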

Pixtral Large: Mistral's Multimodal Giant
Mistral AI's Pixtral Large, with 124B parameters, redefines multimodal AI. From documents to high-res images, it handles enterprise challenges effortlessly, outpacing GPT-4o and Claude-3.5 Sonnet on key tests. A new multimodal champion emerges → read more

You can find the rest of the curated research at the end of the newsletter.

We are reading

News from The Usual Suspects ©

  • AlphaQubit: Sharpening Quantum's Edge
    Google DeepMind's AlphaQubit tackles quantum computing's Achilles' heel: error correction. By leveraging advanced neural networks, it boosts accuracy by 30% on Google's Sycamore processor, surpassing traditional decoders. While too slow for real-time use, it's a promising leap toward scalable quantum systems.

  • Qwen2.5-Turbo: 1 million token context!
    Alibaba's Qwen2.5-Turbo shatters token limits with 1M-token contexts, slicing processing time by 4.3x thanks to sparse attention. Capable of analyzing vast texts or codebases, it outperforms GPT-4 and redefines cost efficiency at ¥0.3/1M tokens. Practical adoption remains a challenge, but the potential is game-changing.

  • H Dives Into the 'Agentic' Era with Runner H
    Paris-based AI startup H unveils Runner H, its first product after raising $220 million in funding. With a compact 2-billion-parameter LLM, the platform targets businesses with agentic tools for robotic process automation (RPA), quality assurance, and outsourcing. Runner H claims efficiency and performance beyond bigger rivals like Anthropic. A bold entry in AI's second era? Time will tell. We haven't tried it, but we joined the waitlist.

  • Microsoft Copilot: The Future of Workflow
    At Ignite 2024, Microsoft unveiled new Copilot capabilities, automating tasks and enhancing collaboration. With features like Copilot Actions, Pages, and Teams' Interpreter agent, productivity soars across global teams. The Copilot Control System ensures secure adoption, solidifying Microsoft's AI leadership. Much more detail about these updates is available on their blog.

  • Cerebras: Breaking the Speed Barrier
    Cerebras' LLM inference processor now delivers molecular dynamics simulations 700 times faster than the Frontier supercomputer. What once took two years of computation can now be achieved in a day, redefining scientific research timelines. So cool.

  • Anthropic: $4B More for Responsible AI
    Amazon doubles down on Anthropic with a fresh $4B investment, as the company highlights progress on voluntary AI safety commitments. Building AI responsibly while competing at the highest levels.
    Read more

  • xAI: Elon Musk's AI Ambitions Soar
    It's kinda crazy, but Elon Musk's xAI raised another $5B, doubling its valuation to $50B. With backing from heavyweights like Qatar Investment Authority and Andreessen Horowitz, Musk's vision for AI dominance accelerates.

šŸŒ We support šŸŒ

More interesting research papers from last week

Multimodal and Vision-Language Models

Reinforcement Learning and Transfer Learning

Retrieval and Knowledge Synthesis

Memory, Efficiency, and Scaling Innovations

Agent Architectures and Robotics

Alignment, Verification, Guardrails, and Safety Frameworks

Leave a review!


Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!
