FOD#61: AI Fall: Time to Build

Let's get even more practical

Next Week in Turing Post:

  • Thursday, a guest post: Your infrastructure shouldn't live in a black box

  • Friday, AI Infra Unicorns: A Deep Dive into Graphcore

The main topic: AI Fall 🍂

This Monday saw significant market drops across stocks, cryptocurrencies, and oil due to growing concerns over a rapidly slowing U.S. economy. Criticism of the Fed's pace on rate adjustments is intensifying, fueling fears of a potential recession. Investors are on edge, closely watching for what's next.

Is it the right time to talk about the AI bubble/winter?
A lot of people think so. But this topic has been surfacing for the last year and a half. Exactly a year ago, we already discussed the AI hype, likening it to historical bubbles such as the dot-com and ICO crazes. With massive investments in generative AI (GenAI), some experts back then warned of an impending bubble that could lead to another AI winter. However, others argued that AI's tangible benefits and established industry presence might prevent such a crash. Last week, Ben Thompson also drew parallels with the 1990s tech boom, driven not by the necessity of building, but by the fear of missing out. This fear is pushing investors to focus on the risks of underbuilding rather than the potential dangers of excess.

This frenetic pace of development actually begs for an AI Fall (and a few will fall) – a period of reflection and sustainable growth. A moment to gather crops, see what bore fruit, and what failed to pass the sprout stage. The industry is transitioning from hype to building practical tools that will integrate AI more deeply into our lives. The next phase will determine whether we're on the brink of an AI winter or at the dawn of a transformative era.

What Are We Really Building?

The question we must ask ourselves is: What are we truly aiming to achieve by pumping trillions of dollars into AI, particularly large language models (LLMs) and multimodal foundation models? Are we blindly chasing bigger models and more data, even when the internet itself may not provide enough raw material for meaningful expansion?
How much more capable will GPT-5 or 6 be? They might be better at answering questions, but that doesn't answer the question: What are we building at the end of the day? Even Sam Altman himself, in a recent interview with Joe Rogan, shared that when he started OpenAI, he believed AI would take on the heavy lifting for him. But what are we really automating? Are we addressing genuine needs, or are we caught in a loop of creating increasingly complex systems without a clear purpose?

Challenges

Yes, indeed, despite ongoing investments, the industry faces significant hurdles: imbalanced growth, unproven revenue models, and increasing skepticism from financial heavyweights like Goldman Sachs and Sequoia Capital.

As the AI arms race intensifies, so does the debate over capital expenditures. David Cahn recently argued that the current debate isn't just about whether AI CapEx is too high, but whether the speed and necessity of infrastructure buildout are warranted. The competition among major cloud providers like Microsoft, Amazon, and Google is driving rapid expansion, but at what cost? Smaller players are being squeezed, and today's investments could become obsolete if AI progress outpaces the physical infrastructure being built.

The Shift from AGI Dreams to Practical AI Tools

But again, what is it that we are building? AI has already achieved a lot. Despite concerns, AI is delivering real value. It's an amazingly useful tool. There's still much potential. In this context, Nicholas Carlini's reflections on the value of LLMs are telling. Despite their limitations, these models are already making a tangible impact on productivity – Carlini himself reports a 50% improvement in his work. This suggests that while we may not yet be at the AGI level, the benefits of AI are very real and growing.

Mass adoption doesn't happen overnight, but generative AI is already democratizing the use of AI tools, saving time, and improving productivity. A new wave of practitioners is on the rise, poised to build more tools and help corporations integrate AI into their operations. We're in a building phase, not just a training or bubbling phase.

I don't believe in AI Winter, the same way I don't believe in reaching AGI (anytime soon). For the first, we've already built too many useful tools across industries, from medicine to journalism. As for the second, we haven't gotten closer to understanding what intelligence is.
It's a time for careful consideration, strategic investments, and perhaps most importantly, a clear-eyed understanding of what we truly want AI to achieve.
Even if some question whether we need ever-larger models right now, the industry has made tangible progress. It's time to roll up our sleeves and start developing those case studies that will push progress further. It will not be AGI; it will be us equipped with our super cool AI tools.

Cheers, to the AI Fall.

In partnership with

SciSpace is a next-gen AI platform for researchers where you can browse 280 million+ papers, conduct effortless literature reviews, chat with, understand, and summarize PDFs with its AI copilot, and so much more.

If you love it, get 40% off an annual subscription with code TP40 or 20% off a monthly subscription with code TP20.

Announcements

We're back with a few announcements:

  • Starting this week, in every FOD, we include 'Weekly recommendations from an AI practitioner 👍🏼' – 2-3 links from someone who builds with AI. Never sponsored, just what works.

  • We're announcing Turing Post Korea! Yes, Turing Post is now available in Korean, thanks to our initial reader and now collaborator, Byoungchan (Ben) Eum. Read the full announcement here.

  • In the latest FOD, we asked you what we should cover. Two topics are the absolute leaders: AGI and Agents. We've decided to highlight Superintelligence/AGI (what we prefer to refer to as human-level intelligence) on Fridays on our Twitter. But Agents, oh Agents, that's a truly amazing topic to tackle from both historical and practical perspectives. In August, we will publish fewer articles because we are actively working on a series about AI agents. Stay tuned.

NEW! Weekly recommendations from an AI practitioner 👍🏼:

  • Cursor and Aider – they are similar (Aider is free, Cursor is paid – that's what Copilot should have been)

  • Superwhisper – if you get it into your flow, then ah, that's what a voice assistant was supposed to be!

Twitter Library

Check our latest collections:

News from The Usual Suspects ©

  • Google crushing it this week:

    1. Gemini 1.5 Pro outperforms competitors: GPT-4 and Claude 3.5 are behind in benchmarks.

    2. Gemma 2 2B – a smaller, safer, more transparent model. With ShieldGemma for content moderation and Gemma Scope for model interpretability, Google's open-source push could redefine how we trust and understand AI. And it's all wrapped in a neat, research-friendly package – talk about a gem!

    3. Google hires talent from Character.AI: Google's acquisition of top talent from Character.AI follows the trend. InflectionAI, AdeptAI, Stability (in a sense), CharacterAI – who's next?

    4. Google Cloud expands database portfolio with new AI features, including graph and vector search in Spanner SQL.

    5. Google unveils three AI features for Chrome: Google Lens integration, natural-language search across your browsing history, and Tab Compare. Looks useful!

  • GitHub Plays Hardball with AI

    • GitHub's new beta, GitHub Models, brings AI experimentation directly to developers' fingertips. With Meta's Llama 3.1 and OpenAI's GPT-4o on tap, it's a one-stop shop for AI model comparisons. By embedding AI tools seamlessly into its ecosystem, GitHub is aiming to outshine platforms like Hugging Face, making AI development as smooth as a single commit.

  • Contamination Crisis in NLP

    • The 2024 CONDA Data Contamination Report uncovers a major issue: AI models like GPT-4 and PaLM-2 unknowingly feasting on evaluation data, leading to misleadingly high scores. With 91 sources contaminated, this report pushes for transparency and stricter evaluation methods in the NLP community. Consider it the AI world's version of a doping scandal.

  • Nvidia in the Hot Seat

    • Nvidia is in a bind, juggling antitrust probes and chip delays while secretly training robots with Apple's Vision Pro. Their latest AI chip design flub might slow them down, but Nvidia's influence in the tech world keeps growing – just not without some bumps along the way.

  • Groq – congrats!

  • From Stability to Flux

    • Former Stability.ai developers have founded Black Forest Labs and announced the Flux.1 suite of image models, which is free and on par with Midjourney and DALL-E 3. The startup has secured $31 million in seed funding led by Andreessen Horowitz and plans to release text-to-video models next.

In other newsletters:

  1. Chips for Peace: how the U.S. and its allies can lead on safe and beneficial AI by Cullen O'Keefe

  2. Llama 3.1 launched and it is gooooood! by MLOps

The freshest research papers, categorized for your convenience

Our top (all from Meta this week!)

  • The Llama 3 Herd of Models

    This research paper introduces the "Llama 3 Herd," a collection of models based on the Llama 3 architecture. Meta's approach to making high-performance AI models accessible for fine-tuning by others represents a significant shift in the AI landscape. This move lowers the barrier to entry in the AI race, challenging the dominance of large tech companies by making it possible for smaller players to participate without the need for enormous financial resources.

  • SAM 2: Segment Anything in Images and Videos

    Researchers from Meta FAIR introduced SAM 2, a unified model for segmenting objects in both images and videos. SAM 2 employs a transformer-based architecture with a streaming memory to efficiently process video frames, providing improved accuracy and reduced user interaction compared to previous models. The model is trained on the SA-V dataset, the largest video segmentation dataset to date. SAM 2 outperforms its predecessor, Segment Anything Model (SAM), in both speed and accuracy across various benchmarks.

  • MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

    Researchers from Meta FAIR introduced MoMa, a novel modality-aware mixture-of-experts (MoE) architecture for pre-training mixed-modal, early-fusion language models. MoMa utilizes modality-specific expert groups for text and image processing, improving efficiency with significant FLOPs savings compared to dense baselines. The model demonstrates superior pre-training efficiency and performance, particularly when combined with mixture-of-depths (MoD), though it faces challenges in causal inference. This approach paves the way for more resource-efficient multimodal AI systems.

  • Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

    Researchers from Meta FAIR, UC Berkeley, and NYU propose "Meta-Rewarding," a method to enhance LLMs by allowing them to judge and refine their own judgments, improving their alignment and instruction-following abilities. The model's accuracy improved significantly on benchmarks like AlpacaEval 2 and Arena-Hard, achieving notable gains without relying on human supervision. This approach shows promise in developing self-improving models that can autonomously enhance their judgment and response quality.
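
The modality-aware routing at the heart of MoMa can be sketched in a few lines: tokens are partitioned by modality, and each group's router picks an expert from its own dedicated pool. The toy below is plain NumPy and purely illustrative (all names, sizes, and the top-1 gating choice are our assumptions, not Meta's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
D, E = 8, 2  # hidden size, experts per modality group

# One tiny expert pool per modality: text tokens can only be routed
# to text experts, image tokens only to image experts.
experts = {
    "text":  [rng.standard_normal((D, D)) for _ in range(E)],
    "image": [rng.standard_normal((D, D)) for _ in range(E)],
}
gates = {m: rng.standard_normal((D, E)) for m in experts}  # per-group routers

def moma_layer(tokens, modalities):
    """Route each token to the top-1 expert inside its own modality group."""
    out = np.zeros_like(tokens)
    for m in experts:
        idx = [i for i, mod in enumerate(modalities) if mod == m]
        if not idx:
            continue
        x = tokens[idx]                 # tokens of this modality
        scores = x @ gates[m]           # router logits, shape (n, E)
        choice = scores.argmax(axis=1)  # top-1 expert per token
        for j, i in enumerate(idx):
            out[i] = x[j] @ experts[m][choice[j]]
    return out

tokens = rng.standard_normal((5, D))
modalities = ["text", "image", "text", "text", "image"]
y = moma_layer(tokens, modalities)
print(y.shape)  # (5, 8)
```

Because each token only touches one small expert from its own group, the FLOPs per token stay constant while total capacity grows with the number of experts, which is where the reported efficiency gains come from.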

Last week brought a lot of research papers about non-English languages (including chemical language)

  • JACOLBERTV2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources improves Japanese language retrieval by optimizing multi-vector retrievers with constrained computational resources → read the paper

  • A Large Encoder-Decoder Family of Foundation Models For Chemical Language introduces a family of transformer-based models for chemical tasks, achieving state-of-the-art performance in molecular property prediction and classification → read the paper

  • SeaLLMs 3: Open Foundation and Chat Multilingual LLMs for Southeast Asian Languages introduces a multilingual model tailored for Southeast Asian languages, improving performance and efficiency in various tasks → read the paper

  • ATHAR: A High-Quality and Diverse Dataset for Classical Arabic to English Translation provides a dataset for improving Classical Arabic to English translation, addressing gaps in existing resources → read the paper

  • Sentiment Analysis of Lithuanian Online Reviews Using Large Language Models explores sentiment analysis for Lithuanian using fine-tuned transformer models, highlighting challenges in less-resourced languages → read the paper

  • Meltemi: The first open Large Language Model for Greek develops Meltemi 7B, the first open-source LLM for Greek, showing strong performance on Greek language benchmarks → read the paper

  • Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework creates a Safe-for-Work classifier for Malay text to detect unsafe content, improving safety in AI applications for Malaysia → read the paper

  • Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings fine-tunes a Hebrew language model on parliamentary texts for enhanced analysis of political discourse → read the paper

  • Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses assesses LLMs' performance on Italian rebuses, revealing limitations in sequential reasoning despite fine-tuning → read the paper

Language Model Improvements and Optimization

  • Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning enhances text embeddings using contrastive fine-tuning for better performance in semantic similarity tasks → read the paper

  • ThinK: Thinner Key Cache by Query-Driven Pruning optimizes memory usage in LLMs during inference by pruning less important cache channels based on query criteria → read the paper
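
The contrastive objective behind that kind of embedding fine-tuning is compact enough to sketch. Below is a toy in-batch InfoNCE loss in plain NumPy, an illustration of the general technique rather than the paper's exact recipe (the temperature value and variable names are our assumptions):

```python
import numpy as np

def info_nce(a, b, temperature=0.05):
    """Toy InfoNCE: row i of `a` should match row i of `b`;
    every other row in the batch acts as an in-batch negative."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)  # L2-normalize embeddings
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                    # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # cross-entropy on the diagonal

rng = np.random.default_rng(1)
queries = rng.standard_normal((4, 16))
positives = queries + 0.01 * rng.standard_normal((4, 16))  # near-duplicate pairs
loss_matched = info_nce(queries, positives)
loss_mismatched = info_nce(queries, rng.standard_normal((4, 16)))
print(f"matched loss: {loss_matched:.4f}")
```

With matched pairs the diagonal of the similarity matrix dominates, so the loss is near zero; with unrelated rows it will typically be much larger. Fine-tuning drives an encoder's weights to minimize this loss over (query, positive) pairs.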

Vision and Multimodal Models

  • OmniParser for Pure Vision Based GUI Agent develops a vision-based method for parsing UI screenshots into structured elements, improving model performance across various applications → read the paper

  • Mixture of Nested Experts: Adaptive Processing of Visual Tokens proposes a framework to enhance Vision Transformers by dynamically routing visual tokens to reduce computational costs while maintaining accuracy → read the paper

  • Visual Riddles: A Commonsense and World Knowledge Challenge for Vision and Language Models tests vision-language models on a benchmark of riddles requiring commonsense and world knowledge, highlighting the challenge in integrating these aspects → read the paper

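
The nested-experts idea in the Mixture of Nested Experts paper can also be sketched: the experts are nested slices of one shared weight matrix, and a router sends each visual token through a slice whose width matches the token's importance. This is a plain-NumPy toy under our own assumptions (widths, ranking-based routing), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
D = 16
W = rng.standard_normal((D, D))  # one full expert; nested experts are its slices
widths = [4, 8, 16]              # nested expert capacities (columns of W used)

def nested_expert_layer(tokens, router_scores):
    """Send higher-scoring tokens through wider slices of the shared weights."""
    order = np.argsort(-router_scores)            # rank tokens by importance
    buckets = np.array_split(order, len(widths))  # one bucket per capacity
    out = np.zeros_like(tokens)
    for bucket, w in zip(buckets, reversed(widths)):  # best bucket -> width 16
        sub = W[:, :w]                     # nested sub-expert (a prefix slice)
        out[bucket, :w] = tokens[bucket] @ sub  # narrower slice = fewer FLOPs
    return out

tokens = rng.standard_normal((6, D))
scores = rng.standard_normal(6)
y = nested_expert_layer(tokens, scores)
print(y.shape)  # (6, 16)
```

Since all experts share one weight matrix, there is no parameter duplication; compute adapts per token while the layer's interface stays fixed.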
If you like Turing Post, consider becoming a paid subscriber. You'll immediately get full access to all our articles, investigations, and tech series →
