FOD#61: AI Fall: Time to Build
Let's get even more practical
Next Week in Turing Post:
Thursday, a guest post: Your infrastructure shouldn't live in a black box
Friday, AI Infra Unicorns: A Deep Dive into Graphcore
The main topic: AI Fall
This Monday saw significant market drops across stocks, cryptocurrencies, and oil due to growing concerns over a rapidly slowing U.S. economy. Criticisms of the Fed's pace on rate adjustments are intensifying, fueling fears of a potential recession. Investors are on edge, closely watching for what's next.
Is it the right time to talk about the AI bubble/winter?
A lot of people think so. But this topic has been surfacing for the last year and a half. Exactly a year ago, we already discussed the AI hype, likening it to historical bubbles such as the dot-com and ICO crazes. With massive investments in generative AI (GenAI), some experts back then warned of an impending bubble that could lead to another AI winter. However, others argued that AI's tangible benefits and established industry presence might prevent such a crash. Last week, Ben Thompson also drew parallels with the 1990s tech boom, driven not by the necessity of building, but by the fear of missing out. This fear is pushing investors to focus on the risks of underbuilding rather than the potential dangers of excess.
This frenetic pace of development actually begs for an AI Fall (and a few will fall) – a period of reflection and sustainable growth. A moment to gather crops, see what bore fruit, and what failed to pass the sprout stage. The industry is transitioning from hype to building practical tools that will integrate AI more deeply into our lives. The next phase will determine whether we're on the brink of an AI winter or at the dawn of a transformative era.
What Are We Really Building?
The question we must ask ourselves is: What are we truly aiming to achieve by pumping trillions of dollars into AI, particularly large language models (LLMs) and multimodal foundation models? Are we blindly chasing bigger models and more data, even when the internet itself may not provide enough raw material for meaningful expansion?
How much more capable will GPT-5 or 6 be? They might be better at answering questions, but that doesn't answer the question: what are we building at the end of the day? Even Sam Altman himself, in a recent interview with Joe Rogan, shared that when he started OpenAI, he believed AI would take on the heavy lifting for him. But what are we really automating? Are we addressing genuine needs, or are we caught in a loop of creating increasingly complex systems without a clear purpose?
Challenges
Yes, indeed, despite ongoing investments, the industry faces significant hurdles: imbalanced growth, unproven revenue models, and increasing skepticism from financial heavyweights like Goldman Sachs and Sequoia Capital.
As the AI arms race intensifies, so does the debate over capital expenditures. David Cahn recently argued that the current debate isn't just about whether AI CapEx is too high, but whether the speed and necessity of infrastructure buildout are warranted. The competition among major cloud providers like Microsoft, Amazon, and Google is driving rapid expansion, but at what cost? Smaller players are being squeezed, and today's investments could become obsolete if AI progress outpaces the physical infrastructure being built.
The Shift from AGI Dreams to Practical AI Tools
But again, what is it that we are building? AI has already achieved a lot. Despite the concerns, it is delivering real value as an amazingly useful tool, with much potential still untapped. In this context, Nicholas Carlini's reflections on the value of LLMs are telling. Despite their limitations, these models are already making a tangible impact on productivity – Carlini himself reports a 50% improvement in his work. This suggests that while we may not yet be at the AGI level, the benefits of AI are very real and growing.
Mass adoption doesn't happen overnight, but generative AI is already democratizing the use of AI tools, saving time, and improving productivity. A new wave of practitioners is on the rise, poised to build more tools and help corporations integrate AI into their operations. Weāre in a building phase, not just a training or bubbling phase.
I don't believe in AI Winter, the same way I don't believe in reaching AGI (anytime soon). For the first, we've already built too many useful tools across industries, from medicine to journalism. As for the second, we haven't gotten closer to understanding what intelligence is.
It's a time for careful consideration, strategic investments, and perhaps most importantly, a clear-eyed understanding of what we truly want AI to achieve.
Even if some question whether we need ever-larger models right now, the industry has made tangible progress. It's time to roll up our sleeves and start developing those case studies that will push progress further. It will not be AGI; it will be us equipped with our super cool AI tools.
Cheers, to the AI Fall.
In partnership with
SciSpace is a next-gen AI platform for researchers: browse 280 million+ papers, conduct effortless literature reviews, chat with, understand, and summarize PDFs with its AI copilot, and so much more.
If you love it, get 40% off an annual subscription with code TP40 or 20% off a monthly subscription with code TP20.
Announcements
We're back with a few announcements:
Starting this week, every FOD includes "Weekly recommendations from an AI practitioner" – 2-3 links from someone who builds with AI. Never sponsored, just what works.
We're announcing Turing Post Korea! Yes, Turing Post is now available in Korean, thanks to our initial reader and now collaborator, Byoungchan (Ben) Eum. Read the full announcement here.
In the latest FOD, we asked you what we should cover. Two topics are the absolute leaders: AGI and Agents. We've decided to highlight Superintelligence/AGI (what we prefer to refer to as human-level intelligence) on Fridays on our Twitter. But Agents, oh Agents, that's a truly amazing topic to tackle from both historical and practical perspectives. In August, we will publish fewer articles because we are actively working on a series about AI agents. Stay tuned.
NEW! Weekly recommendations from an AI practitioner:
Cursor and Aider – they are similar (Aider is free, Cursor is paid – that's what Copilot would have been)
Superwhisper – if you get it into your flow, then ah, that's what a voice assistant was supposed to be!
Twitter Library
Check our latest collections:
News from The Usual Suspects ©
Google crushing it this week:
Gemini 1.5 Pro outperforms competitors: GPT-4 and Claude 3.5 are behind in benchmarks.
Gemma 2 2B – a smaller, safer, more transparent model. With ShieldGemma for content moderation and Gemma Scope for model interpretability, Google's open-source push could redefine how we trust and understand AI. And it's all wrapped in a neat, research-friendly package – talk about a gem!
Google hires talent from Character.AI: Google's acquisition of top talent from Character.AI follows the trend. InflectionAI, AdeptAI, Stability (in a sense), CharacterAI – who's next?
Google Cloud expands database portfolio with new AI features, including graph and vector search in Spanner SQL.
Google unveils three AI features for Chrome: Google Lens, natural language search in your history, and Tab Compare. Looks useful!
GitHub Plays Hardball with AI
GitHub's new beta, GitHub Models, brings AI experimentation directly to developers' fingertips. With Meta's Llama 3.1 and OpenAI's GPT-4o on tap, it's a one-stop shop for AI model comparisons. By embedding AI tools seamlessly into its ecosystem, GitHub is aiming to outshine platforms like Hugging Face, making AI development as smooth as a single commit.
Contamination Crisis in NLP
The 2024 CONDA Data Contamination Report uncovers a major issue: AI models like GPT-4 and PaLM-2 unknowingly feasting on evaluation data, leading to misleadingly high scores. With 91 sources contaminated, this report pushes for transparency and stricter evaluation methods in the NLP community. Consider it the AI world's version of a doping scandal.
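At its core, contamination detection comes down to checking for overlap between training corpora and benchmark data. Here is a minimal, hypothetical sketch of that idea – not the CONDA methodology, just a simple word n-gram containment check with made-up example data:

```python
def ngram_contamination(train_docs, eval_examples, n=8):
    """Toy contamination check: flag eval examples whose text shares
    any n-gram (n consecutive words) with the training corpus."""
    def ngrams(text, n):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    # collect all n-grams seen in training documents
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)

    # an eval example is "contaminated" if any of its n-grams was seen in training
    return [ex for ex in eval_examples if ngrams(ex, n) & train_grams]

train = ["the quick brown fox jumps over the lazy dog near the river bank"]
evals = [
    "the quick brown fox jumps over the lazy dog today",            # overlaps
    "a completely different sentence with no shared phrasing at all",  # clean
]
flagged = ngram_contamination(train, evals, n=8)
print(len(flagged))  # 1
```

Real pipelines work at much larger scale (hashed n-grams, fuzzy matching), but the principle is the same: if benchmark text appears in the training set, the score is suspect.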
Nvidia in the Hot Seat
Nvidia is in a bind, juggling antitrust probes and chip delays while secretly training robots with Apple's Vision Pro. Their latest AI chip design flub might slow them down, but Nvidia's influence in the tech world keeps growing – just not without some bumps along the way.
Groq - congrats!
We've raised some more dosh in a Series D to help us carry on hacking. Also, @ylecun joins us at @GroqInc as a technical advisor.
ā Satnam Singh (@satnam6502)
4:05 PM · Aug 5, 2024
From Stability to Flux
Former Stability.ai developers have founded Black Forest Labs and announced the Flux.1 suite, which is free and on par with Midjourney and DALL-E 3. The startup has secured $31 million in seed funding led by Andreessen Horowitz and plans to release text-to-video models next.
The freshest research papers, categorized for your convenience
Our top picks (all from Meta this week!)
The Llama 3 Herd of Models
This research paper introduces the "Llama 3 Herd," a collection of models based on the Llama 3 architecture. Meta's approach to making high-performance AI models accessible for fine-tuning by others represents a significant shift in the AI landscape. This move lowers the barrier to entry in the AI race, challenging the dominance of large tech companies by making it possible for smaller players to participate without the need for enormous financial resources.
SAM 2: Segment Anything in Images and Videos
Researchers from Meta FAIR introduced SAM 2, a unified model for segmenting objects in both images and videos. SAM 2 employs a transformer-based architecture with a streaming memory to efficiently process video frames, providing improved accuracy and reduced user interaction compared to previous models. The model is trained on the SA-V dataset, the largest video segmentation dataset to date. SAM 2 outperforms its predecessor, Segment Anything Model (SAM), in both speed and accuracy across various benchmarks.
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Researchers from Meta FAIR introduced MoMa, a novel modality-aware mixture-of-experts (MoE) architecture for pre-training mixed-modal, early-fusion language models. MoMa utilizes modality-specific expert groups for text and image processing, improving efficiency with significant FLOPs savings compared to dense baselines. The model demonstrates superior pre-training efficiency and performance, particularly when combined with mixture-of-depths (MoD), though it faces challenges in causal inference. This approach paves the way for more resource-efficient multimodal AI systems.
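To make the routing idea concrete, here is a toy, hypothetical sketch of modality-aware expert selection – not Meta's implementation, just the core notion that text and image tokens are gated only within their own expert group (expert parameters and dimensions below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def moma_layer(tokens, modalities, text_experts, image_experts):
    """Toy modality-aware MoE: each token is routed only to experts of
    its own modality, and the best-scoring expert (top-1) is applied."""
    out = np.empty_like(tokens)
    for i, (tok, mod) in enumerate(zip(tokens, modalities)):
        experts = text_experts if mod == "text" else image_experts
        # gating: score the token against each expert's gating vector
        scores = [gate @ tok for gate, _ in experts]
        _, weight = experts[int(np.argmax(scores))]
        out[i] = weight @ tok  # apply only the selected expert
    return out

d = 4
# each expert = (gating vector, weight matrix); random toy parameters
text_experts = [(rng.standard_normal(d), rng.standard_normal((d, d))) for _ in range(2)]
image_experts = [(rng.standard_normal(d), rng.standard_normal((d, d))) for _ in range(2)]

tokens = rng.standard_normal((3, d))
modalities = ["text", "image", "text"]
y = moma_layer(tokens, modalities, text_experts, image_experts)
print(y.shape)  # (3, 4)
```

The FLOPs savings come from the same place as in any MoE: each token touches only one small expert instead of a dense layer sized for all modalities at once.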
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Researchers from Meta FAIR, UC Berkeley, and NYU propose "Meta-Rewarding," a method to enhance LLMs by allowing them to judge and refine their own judgments, improving their alignment and instruction-following abilities. The model's accuracy improved significantly on benchmarks like AlpacaEval 2 and Arena-Hard, achieving notable gains without relying on human supervision. This approach shows promise in developing self-improving models that can autonomously enhance their judgment and response quality.
A lot of research papers last week focused on non-English languages (including chemical language)
JACOLBERTV2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources improves Japanese language retrieval by optimizing multi-vector retrievers with constrained computational resources → read the paper
A Large Encoder-Decoder Family of Foundation Models For Chemical Language introduces a family of transformer-based models for chemical tasks, achieving state-of-the-art performance in molecular property prediction and classification → read the paper
SeaLLMs 3: Open Foundation and Chat Multilingual LLMs for Southeast Asian Languages introduces a multilingual model tailored for Southeast Asian languages, improving performance and efficiency in various tasks → read the paper
ATHAR: A High-Quality and Diverse Dataset for Classical Arabic to English Translation provides a dataset for improving Classical Arabic to English translation, addressing gaps in existing resources → read the paper
Sentiment Analysis of Lithuanian Online Reviews Using Large Language Models explores sentiment analysis for Lithuanian using fine-tuned transformer models, highlighting challenges in less-resourced languages → read the paper
Meltemi: The first open Large Language Model for Greek develops Meltemi 7B, the first open-source LLM for Greek, showing strong performance on Greek language benchmarks → read the paper
Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework creates a Safe-for-Work classifier for Malay text to detect unsafe content, improving safety in AI applications for Malaysia → read the paper
Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings fine-tunes a Hebrew language model on parliamentary texts for enhanced analysis of political discourse → read the paper
Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses assesses LLMs' performance on Italian rebuses, revealing limitations in sequential reasoning despite fine-tuning → read the paper
Language Model Improvements and Optimization
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning enhances text embeddings using contrastive fine-tuning for better performance in semantic similarity tasks āread the paper
ThinK: Thinner Key Cache by Query-Driven Pruning optimizes memory usage in LLMs during inference by pruning less important key-cache channels based on query criteria → read the paper
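As a rough illustration of query-driven cache pruning (not the paper's actual algorithm), one can sketch the idea like this, where the channel score `|q_c| * ||K[:, c]||` is an assumed, simplified importance criterion:

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_key_cache(keys, query, keep_ratio=0.5):
    """Toy query-driven channel pruning in the spirit of ThinK:
    score each key-cache channel by |q_c| * ||K[:, c]|| and keep
    only the highest-scoring channels (a simplified criterion)."""
    seq_len, d = keys.shape
    # one importance score per channel, driven by the current query
    scores = np.abs(query) * np.linalg.norm(keys, axis=0)
    k = max(1, int(d * keep_ratio))
    kept = np.sort(np.argsort(scores)[-k:])  # indices of channels to keep
    return keys[:, kept], kept

keys = rng.standard_normal((8, 16))   # cached keys: 8 tokens, 16 channels
query = rng.standard_normal(16)
pruned, kept = prune_key_cache(keys, query, keep_ratio=0.25)
print(pruned.shape, len(kept))  # (8, 4) 4
```

Dropping low-scoring channels shrinks the cache's memory footprint per token; the paper's contribution is showing this can be done with little accuracy loss.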
Vision and Multimodal Models
OmniParser for Pure Vision Based GUI Agent develops a vision-based method for parsing UI screenshots into structured elements, improving model performance across various applications āread the paper
Mixture of Nested Experts: Adaptive Processing of Visual Tokens proposes a framework to enhance Vision Transformers by dynamically routing visual tokens to reduce computational costs while maintaining accuracy āread the paper
Visual Riddles: A Commonsense and World Knowledge Challenge for Vision and Language Models tests vision-language models on a benchmark of riddles requiring commonsense and world knowledge, exposing gaps in model performance → read the paper