FOD#60: A breather

+ the curated list of the most relevant research papers of the week, as well as your favorite news from The Usual Suspects.

Next Week in Turing Post:

  • Wednesday, AI 101: What is Long Short-Term Memory and the recent xLSTM?

  • Friday, a deep dive into another AI Unicorn.

If you like Turing Post, consider becoming a paid subscriber. You’ll immediately get full access to all our articles, investigations, and tech series →

A breather

It’s time to take a breather before the crazy end of summer and an even crazier September. The AI industry is shipping more and more models, tools, methods, frameworks, libraries, benchmarks, evals, heated discussions, and promises to end the world every day. Even more interesting changes and achievements are coming soon! We are going to step back and reevaluate what content will bring the most value to you, our readers.

I’m also working on a book. I can hear you sigh, “Another book about AI?” But no, it’s actually about citizen diplomacy. In the AI world, we are seriously concerned about machine dominance, but no worries – humans might end themselves first if the current separation driven by crazy politics prevails. Citizen diplomacy tackles that very problem. It is about the difference that each of us can make on a bigger scale. It's about uniting and making the impossible possible. So this is my plan for the end of July: to reevaluate the best content about ML for the best audience (that’s you) and to dive into, or even delve into, the fascinating story of citizen diplomacy to learn from its history. These lessons are the most valuable and the most overlooked.

Next week is still full of content (xLSTM and an AI unicorn investigation!), but there will be no emails from July 22 to August 5. Please reach out if you have feedback, ideas for what we should cover, or just words of support. We love you.

And here is a question, starting August 5 →

In our weekly analysis, what should we dedicate more time to? (Vote in the poll.)

Please click the link below. Our partners offer a great free webinar →

Free Webinar: Fine-tune and evaluate LLMs on your data with SuperAnnotate and Databricks

Date: July 30th, 10 am PST / 7 pm CET

While most companies have large amounts of data, they are often unprepared for LLM fine-tuning. During fine-tuning, LLMs learn to mimic the style and content of the data used, and it is therefore important that this data represents the intended model behavior exactly. 

During this webinar, we will explore how you can leverage SuperAnnotate with Databricks to refine your data, fine-tune large language models, and thoroughly evaluate the results to make well-informed deployment decisions.
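
For readers who want to get a feel for this before the webinar, below is a minimal, generic supervised fine-tuning sketch using Hugging Face transformers and datasets. The model name, the two toy records, and the hyperparameters are placeholder assumptions for illustration only; this is not the SuperAnnotate or Databricks workflow, just the basic loop of curating records that exemplify the intended behavior and training on them.

```python
# Minimal, generic SFT sketch (Hugging Face transformers/datasets).
# Model name, toy records, and hyperparameters are placeholders.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL = "distilgpt2"  # assumption: any small causal LM works for this demo

# The training records should exemplify the exact behavior you want the
# fine-tuned model to mimic (style, format, and content).
records = [
    {"text": "Instruction: Summarize the ticket.\nResponse: Customer reports a billing error."},
    {"text": "Instruction: Summarize the ticket.\nResponse: User cannot reset their password."},
]

tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style models have no pad token
model = AutoModelForCausalLM.from_pretrained(MODEL)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

train_ds = Dataset.from_list(records).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sft-demo",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        report_to="none",
    ),
    train_dataset=train_ds,
    # mlm=False -> standard causal language-modeling loss on the records
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The evaluation step the webinar covers (checking the fine-tuned model against held-out examples) would sit on top of a loop like this.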

News from The Usual Suspects ©

Hedge Funds Go Ga-Ga for South Korea

  • Hedge funds like Britain's Man Group and Singapore's FengHe Fund Management are betting on South Korea's chipmakers, driven by AI demand for high-end memory chips. Their investments in SK Hynix and Samsung Electronics have boosted South Korea's KOSPI index, as the funds anticipate growth fueled by government backing and rising chip prices.

  • BTW, an interesting announcement related to South Korea will be waiting for you when we are back from the break in August. Stay tuned.

A16z initiates operation “oxygen”

  • The famous VC firm is amassing over 20,000 GPUs, including Nvidia H100s, to attract AI startups. Through its "oxygen" initiative, the firm plans to rent these chips to startups that need high-end computing power, in exchange for equity. If AI is the new electricity, the GPU, apparently, is the new oxygen 🤔.

OpenAI: Levels towards AGI with Strawberry on top

  • OpenAI has defined five levels to measure progress towards artificial general intelligence (AGI). They are:

    • Level 1 (Chatbots): Current AI that can interact conversationally with people.

    • Level 2 (Reasoners): AI systems that can solve basic problems like a human with a doctorate-level education.

    • Level 3 (Agents): AI systems capable of taking actions on a user’s behalf over several days.

    • Level 4 (Innovators): AI that can innovate and come up with new ideas.

    • Level 5 (Organizations): AI that can perform the work of entire organizations.

  • OpenAI is also developing project Strawberry for advanced reasoning: models capable of planning and navigating the internet autonomously, aiming to improve AI's reasoning skills and move closer to human-like intelligence.

Additional reading on the topic: researchers from Tsinghua University and Shanghai AI Laboratory propose Specialized Generalist AI (SGI), blending intuitive and analytical processing to approach AGI. SGI excels in specific tasks while preserving general capabilities, offering a roadmap for bridging current AI to AGI →read the paper

The freshest research papers, categorized for your convenience

Our top:

FlashAttention-3 (→read the paper)

  • The latest breakthrough in AI efficiency, and a very important one: it provides a 2-4x speedup while maintaining accuracy. Developed by researchers from Colfax, Meta, NVIDIA, Georgia Tech, Princeton, and Together AI, it uses asynchronous Tensor Cores and FP8 low-precision support to improve GPU utilization. This allows for better handling of long contexts and optimized performance on modern GPUs, paving the way for new applications and more efficient transformer-based AI systems.
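
To see the computation FlashAttention accelerates, here is a tiny PyTorch sanity check: the textbook softmax(QKᵀ/√d)V attention versus PyTorch's fused scaled_dot_product_attention, which can dispatch to FlashAttention kernels on supported GPUs. It is a sketch of the underlying math with made-up tensor sizes, not the FlashAttention-3 Hopper/FP8 kernel itself.

```python
# Illustration only: the math FlashAttention-3 accelerates, not its kernel.
# Naive attention materializes the full (L x L) score matrix; fused kernels
# such as FlashAttention compute the same result tile by tile.
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, H, L, D = 2, 4, 128, 64          # batch, heads, sequence length, head dim
q, k, v = (torch.randn(B, H, L, D) for _ in range(3))

# Textbook attention: softmax(Q K^T / sqrt(D)) V, with a causal mask.
scores = q @ k.transpose(-2, -1) / math.sqrt(D)
mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))
naive = scores.softmax(dim=-1) @ v

# Fused attention; on supported NVIDIA GPUs this can dispatch to a
# FlashAttention kernel. Here it just verifies the outputs match.
fused = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(torch.allclose(naive, fused, atol=1e-5))  # True: same math, less memory
```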

Agents-related

  • Internet of Agents: Proposes a collaborative framework integrating diverse autonomous agents to overcome limitations in multi-agent systems, enhancing intelligence and interaction →read the paper

  • AgentInstruct: Develops an agentic framework that autonomously generates synthetic data to teach language models new skills, significantly improving model performance →read the paper

  • GTA: Introduces a benchmark to evaluate language model agents in real-world scenarios, highlighting existing models' limitations in tool-use capabilities →read the paper

Important in VLMs

  • Mobility VLA: Combines Vision-Language Models and topological graphs for effective multimodal instruction navigation in complex environments →read the paper

  • MambaVision: Develops a hybrid architecture that integrates Transformer self-attention into the Mamba model, enhancing performance in various vision tasks →read the paper

  • PaliGemma: Combines a vision encoder and a language model to effectively transfer knowledge across diverse vision-language tasks →read the paper

  • Vision Language Models are Blind: Reveals significant perceptual limitations of Vision Language Models in basic visual tasks, highlighting a gap in their visual processing capabilities →read the paper

  • MJ-BENCH: Introduces a benchmark for evaluating multimodal judges in text-to-image generation, assessing their performance on various criteria including safety and bias →read the paper

Language Model Infrastructure and Optimization

  • Unified Database: Integrates vector and scalar indices to enhance query performance for large language models →read the paper

  • H2O-Danube3 Technical Report: Presents small LLMs optimized for mobile devices, highlighting efficient operation and accessibility →read the paper

  • SPREADSHEETLLM: Enhances LLMs' ability to handle complex spreadsheet data through advanced serialization and compression techniques →read the paper

  • Q-GaLore: Combines quantization and adaptive low-rank projections to reduce memory usage during LLM training →read the paper

  • Inference Performance Optimization for Large Language Models on CPUs: Optimizes LLM inference on CPUs using techniques like SlimAttention and an INT8 KV cache approach →read the paper
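
As a companion to the INT8 KV cache point above, here is a toy per-token symmetric INT8 quantization round-trip for a key/value cache tensor in PyTorch. It is a generic sketch of the idea (roughly 2x memory savings versus FP16 at the cost of small quantization error), not the exact scheme from the paper; the tensor shapes and scale granularity are assumptions for illustration.

```python
# Toy per-token symmetric INT8 quantization of a KV-cache tensor.
# A generic sketch of the idea behind INT8 KV caches, not the exact
# scheme from the CPU-inference paper above.
import torch

torch.manual_seed(0)
# [batch, heads, seq_len, head_dim] key (or value) cache in fp16
kv = torch.randn(1, 8, 512, 64, dtype=torch.float16)

def quantize_int8(x: torch.Tensor):
    # One scale per (batch, head, position): max |x| over head_dim maps to 127.
    scale = x.abs().amax(dim=-1, keepdim=True).float().clamp_min(1e-8) / 127.0
    q = torch.clamp(torch.round(x.float() / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return (q.float() * scale).to(torch.float16)

q, scale = quantize_int8(kv)
kv_hat = dequantize(q, scale)

bytes_fp16 = kv.numel() * kv.element_size()
bytes_int8 = q.numel() * q.element_size() + scale.numel() * scale.element_size()
print(f"memory: {bytes_fp16} -> {bytes_int8} bytes")            # roughly 2x smaller
print(f"max abs error: {(kv - kv_hat).abs().max().item():.4f}")  # small quantization noise
```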

Language Model Applications and Enhancements

  • Toto: Introduces a foundation model for time-series forecasting optimized for observability metrics →read the paper

  • Gradient Boosting Reinforcement Learning: Extends gradient boosting techniques to reinforcement learning for improved performance on structured tasks →read the paper

  • Autoregressive Speech Synthesis without Vector Quantization: Proposes an autoregressive TTS model that enhances output diversity and robustness →read the paper

  • LETS-C: Utilizes language embeddings for time-series classification, demonstrating high performance with reduced computational costs →read the paper

Evaluating and Improving Model Reliability

  • Evaluating Language Model Context Windows: Benchmarks long context models and introduces techniques to improve accuracy in QA tasks →read the paper

  • Lynx: Develops an open-source model for detecting hallucinations in Retrieval-Augmented Generation systems →read the paper

  • Speculative RAG: Enhances Retrieval-Augmented Generation by verifying drafts generated by specialized models, improving performance and reducing latency →read the paper

  • Lookback Lens: Detects contextual hallucinations in LLMs using attention maps, providing a tool to reduce hallucinations →read the paper

Cognitive and Memory Enhancements in Models

  • Associative Recurrent Memory Transformer: Develops a new architecture for processing long sequences efficiently using associative memory →read the paper

  • Human-Like Episodic Memory for Infinite Context LLMs: Integrates features of human episodic memory into LLMs to manage infinite context lengths →read the paper

Enhancing Model Training and Updating

  • MUSCLE: Introduces a model update strategy that minimizes negative flips during LLM updates, ensuring consistent task performance →read the paper

  • Characterizing Prompt Compression Methods for Long Context Inference: Evaluates methods for prompt compression in LLMs, identifying best practices for accuracy and efficiency →read the paper

  • PAS: Develops a plug-and-play system for augmenting LLM prompts, improving performance with minimal human intervention →read the paper

  • InverseCoder: Enhances code LLMs by generating natural language instructions from code, improving model diversity and performance →read the paper

Model Oversight and Evaluation

  • On scalable oversight with weak LLMs judging strong LLMs: Investigates scalable oversight methods like debate and consultancy for supervising advanced AI, recommending debate for effectiveness →read the paper

  • On Leakage of Code Generation Evaluation Datasets: Identifies contamination sources in code generation datasets and introduces a cleaner benchmark for evaluating LLMs →read the paper

  • An Accurate Detection is Not All You Need to Combat Label Noise in Web-Noisy Datasets: Proposes a hybrid approach to improve classification performance in noisy datasets by combining unsupervised learning with noise detection methods →read the paper

Understanding Model Behaviors and Limitations

  • Self-Recognition in Language Models: Investigates whether LLMs can recognize their own outputs, revealing insights into model decision-making processes →read the paper

  • From Loops to Oops: Studies fallback behaviors of LLMs under uncertainty, detailing how advanced models handle errors and uncertainties →read the paper

  • Understanding Visual Feature Reliance through the Lens of Complexity: Analyzes how deep learning models prioritize features based on complexity, impacting model decisions →read the paper

Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!
