FOD#63: Open-Ended Exploration

or how we become AI-augmented PLUS the news from the usual suspects, and the hands-down best-curated list of last week's papers and models

The main topic

A few FODs ago, we asked which topics you would most like to read about. One of the most common requests was open-endedness in AI – the ability to explore and generate novel outcomes without predefined constraints or fixed goals. As it happens, amazing developments in this field are now arriving from all directions. Today, we look at four papers from the last week that push it even further.

The AI Scientist: Open-Ended Scientific Discovery

The concept of open-endedness is central to "The AI Scientist," which seeks to autonomously generate research ideas, execute experiments, and write papers. Although the current implementation is limited and the quality of its outputs is mixed, the framework embodies open-ended discovery by letting the system explore novel research directions without a predetermined path. This aligns with the broader goal of open-endedness in AI, where a system's capacity for innovation and discovery is not restricted by predefined rules or boundaries.

Image Credit: The original paper

Cosine Genie: Automated Design in Software Engineering

Cosine's Genie model, while not explicitly framed as an open-ended system, demonstrates characteristics that contribute to open-ended discovery in software engineering:

  • Autonomous task completion: Genie's ability to perform a wide range of programming tasks autonomously suggests it can explore solution spaces without constant human guidance.

  • Human-like reasoning: By training on datasets that capture the decision-making processes of real software engineers, Genie may be able to approach problems with a more open-ended, creative mindset.

  • Collaborative potential: The model's ability to work alongside human developers opens up possibilities for human-AI collaborative open-ended discovery in software development.

This autonomy in problem-solving and design is a key aspect of open-endedness, where the model’s output isn’t just a repetition of learned patterns but rather a product of creative exploration within the constraints of software engineering.

Automated Design of Agentic Systems (ADAS): Evolving Agentic Systems

ADAS introduces a new dimension to open-endedness by focusing on the automated design and evolution of agentic systems. The use of a meta agent to iteratively design, test, and refine agents within a code-defined space exemplifies open-endedness in a dynamic, evolving context. The meta agent’s ability to discover novel building blocks and combine them in innovative ways aligns with the broader goal of open-ended AI research – creating systems that can autonomously evolve and adapt to new challenges and environments.
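The core ADAS loop can be sketched in a few lines. Everything below is a hedged toy, not the paper's implementation: `propose` stands in for the foundation-model meta agent that writes new agent code, and the "agents" are just threshold classifiers, but the archive-driven propose–evaluate–keep cycle has the same shape:

```python
def evaluate(agent, tasks):
    """Score an agent (here: any callable) by accuracy on a task set."""
    return sum(agent(x) == y for x, y in tasks) / len(tasks)

def meta_agent_search(propose, tasks, iterations=15):
    """Minimal ADAS-style loop: a meta-level `propose` function looks at the
    archive of previously tried agents and emits a new candidate; every
    candidate is evaluated and archived, and the best one is returned."""
    archive = []  # list of (agent, score) pairs – the growing design history
    for _ in range(iterations):
        candidate = propose(archive)        # in ADAS, an FM writing agent code
        score = evaluate(candidate, tasks)  # run the candidate on held-out tasks
        archive.append((candidate, score))
    return max(archive, key=lambda pair: pair[1])

# Toy stand-ins: "agents" are threshold classifiers, and `propose`
# sweeps thresholds instead of asking a foundation model for new code.
def propose(archive):
    t = len(archive) - 5
    return lambda x, t=t: x > t

tasks = [(x, x > 3) for x in range(-5, 10)]
best_agent, best_score = meta_agent_search(propose, tasks)
print(best_score)  # → 1.0
```

The interesting part in the real system is that the archive feeds back into `propose`, so earlier discoveries become building blocks for later ones.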

LONGWRITER: Ultra-Long Text Generation

While seemingly less related, LongWriter addresses the open-endedness of language generation itself. By enabling the creation of coherent, ultra-long texts, it expands the potential for AI to assist in creative writing, technical documentation, and other applications where generating large amounts of text is necessary. The ability to generate 10,000+ words from long contexts pushes the boundaries of what language models can achieve, allowing them to create detailed and nuanced narratives or documents without strict adherence to a predefined structure.

Potential and Challenges

While these systems showcase significant advancements in open-endedness, they also highlight the challenges, such as the risk of generating low-quality or unjustified conclusions (as seen in The AI Scientist) or the lack of transparency in the development process (as with Genie).

Conclusion

The open-ended capabilities of these systems offer the potential to advance scientific discovery, improve software development, and extend AI-generated content. However, they also bring up critical questions about the necessity of human oversight, the extent to which AI can make independent discoveries, and the broader implications for various fields as these technologies progress. Despite the challenges, we are living in times when AI is increasingly contributing to the expansion of human knowledge and creativity. We are becoming AI-augmented. And that’s thrilling.

If you like Turing Post, consider becoming a paid subscriber. You’ll immediately get full access to all our articles, investigations, and tech series →

One of the main roadblocks on the path to human-level intelligence – or superintelligence – is machines' ability to reason. Last week, researchers from Microsoft Research Asia and Harvard University introduced rStar, a self-play mutual reasoning method that significantly enhances the problem-solving abilities of small language models (SLMs) without fine-tuning. Using Monte Carlo Tree Search (MCTS) to generate reasoning trajectories and a second SLM to verify those paths, rStar raises GSM8K accuracy from 12.51% to 63.91% for LLaMA2-7B and from 36.46% to 81.88% for Mistral-7B.
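A drastically simplified sketch of the generate-and-verify idea behind rStar (this omits the actual MCTS machinery; `generate` and `verify` are hypothetical stand-ins for the two SLMs):

```python
import itertools
from collections import Counter

def self_play_mutual_reasoning(generate, verify, question, rollouts=8):
    """Toy version of the rStar recipe: one model samples many candidate
    reasoning rollouts (reduced here to final answers); a second model
    re-checks each rollout, and only mutually agreed answers are kept.
    The most frequent agreed answer is returned."""
    agreed = [a for a in (generate(question) for _ in range(rollouts))
              if verify(question, a)]
    return Counter(agreed).most_common(1)[0][0] if agreed else None

# Hypothetical stand-ins for the two SLMs: a generator that is right
# most of the time, and a checker that re-derives the answer exactly.
noise = itertools.cycle([1, 0, -1, 0])          # deterministic "sampling" noise
generate = lambda q: q[0] + q[1] + next(noise)  # noisy addition rollouts
verify = lambda q, a: a == q[0] + q[1]          # independent re-check

print(self_play_mutual_reasoning(generate, verify, (17, 25)))  # → 42
```

The point of the mutual setup is that neither small model needs to be reliable alone; agreement between an imperfect generator and an imperfect verifier filters out most bad trajectories.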

Twitter Library (all about IMAGES!)

News from The Usual Suspects ©

StackOverflow developer survey 2024: AI

Google Bets on Reality with Gemini

  • At its "Made by Google" event, Google shifted gears from AI hype to practical applications, showcasing its Gemini model integrated deeply into Android. Senior VP Rick Osterloh made it clear: no more empty promises—it's time for AI to deliver. Yet, despite this, Pixel's market impact remains minimal. Meanwhile, the DOJ's antitrust scrutiny looms, potentially threatening Google's integrated AI strategy.

Snowflake vs. Databricks: The AI Showdown

  • Snowflake and Databricks are locked in a fierce battle for AI supremacy, with Databricks outbidding Snowflake for Tabular. This rivalry has heated up with aggressive moves like Databricks' "SnowMelt" campaign. But with tech giants like Microsoft entering the fray, both companies might face an even tougher fight ahead.

Anthropic's Claude Gets Clever with Caching

  • Anthropic's latest innovation, prompt caching for its Claude models, slashes costs by up to 90% and cuts latency by 85%. Available in public beta, this feature is a game-changer for extended AI conversations and complex tasks, with Notion already onboard to optimize its AI assistant.

MIT's AI Risk Repository: Navigating the Unknown

  • MIT has launched an AI Risk Repository, a detailed catalog of over 700 risks associated with AI. With categories ranging from causal to domain-specific risks, this tool is invaluable for developers, researchers, and policymakers navigating the increasingly complex AI landscape.

Midjourney's All-in-One Image Editor

  • Midjourney has rolled out a unified AI image editor, bringing together inpainting, outpainting, and more under one roof. Despite facing a class-action lawsuit, Midjourney pushes forward with innovations like a virtual "brush" tool and seamless message mirroring between web and Discord platforms.

Hugging Face

In other newsletters:

The freshest research papers, categorized for your convenience

Models and Their Enhancements

  • Falcon Mamba 7B – an open-source State Space Language Model (SSLM) that outperforms traditional transformer models, offering efficient processing for long text generation →read TII blog

  • Hermes 3 – a versatile open-source model that excels in multi-turn conversations and roleplaying, available in multiple sizes, and sets a new benchmark in its class →read the paper

  • Grok-2 – excels in code, math, and reasoning tasks, outperforming major competitors and improving instruction following and factuality →read the paper

  • Imagen 3 – a text-to-image model that surpasses competitors in quality and accuracy, with robust safety measures to prevent misuse →read the paper

  • xGen-MM (BLIP-3) – an advanced multimodal model framework that excels in visual-language tasks and supports both single and multi-image inputs →read the paper

  • JPEG-LM – an LLM that generates images as compressed JPEG files, simplifying visual generation and improving image quality, especially for complex elements →read the paper

  • Aquila2 Technical Report – introduces the Aquila2 series, bilingual models that outperform competitors with efficient training and strong performance, even after quantization →read the paper

Our top of other research papers

  • Towards Flexible Perception with Visual Memory

    Researchers from Google DeepMind propose a new visual memory model combining deep neural networks with a flexible database to enhance image classification. This model allows for easy addition and removal of data, enabling scalability from individual samples to billion-scale datasets without retraining. It introduces RankVoting, which outperforms previous aggregation methods, achieving 88.5% top-1 accuracy on ImageNet. The system demonstrates capabilities in lifelong learning, machine unlearning, and interpretable decision-making, showcasing the benefits of an explicit visual memory in deep learning →read the paper

  • Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability

    Researchers from Google DeepMind studied hallucinations in LLMs by training models on knowledge graphs, where factual content is fully controlled. They found that larger and longer-trained models hallucinate less on seen data but still struggle with unseen data, requiring significantly more compute than previously thought optimal. Despite this, detecting hallucinations becomes harder as models scale up, showing a trade-off between model size, training duration, and hallucination detectability →read the paper

  • Automated Design of Agentic Systems

    Researchers from the University of British Columbia and the Vector Institute propose the Automated Design of Agentic Systems (ADAS) to autonomously create and improve agentic systems using Foundation Models (FMs). Their method, Meta Agent Search, allows a "meta" agent to program new agents iteratively in code. Experiments show these automatically discovered agents outperform state-of-the-art, manually designed systems in diverse domains, including math and reading comprehension, and demonstrate strong cross-domain generalization and robustness →read the paper
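To make the visual-memory entry above concrete, here is a hedged sketch of rank-weighted nearest-neighbor voting over an explicit memory. The exact RankVoting weighting in the paper may differ; the 1/rank weights and the 2-D "embeddings" below are illustrative assumptions:

```python
import numpy as np
from collections import defaultdict

def rank_voting(query, memory_vecs, memory_labels, k=10):
    """Nearest-neighbor classification over an explicit visual memory.
    Each of the k closest stored items votes for its label with weight
    1/rank (rank 1 = closest), so adding or deleting memory entries
    changes predictions without any retraining."""
    dists = np.linalg.norm(memory_vecs - query, axis=1)  # distance to every memory item
    order = np.argsort(dists)[:k]                        # indices of the k nearest
    votes = defaultdict(float)
    for rank, idx in enumerate(order, start=1):
        votes[memory_labels[idx]] += 1.0 / rank          # closer neighbors count more
    return max(votes, key=votes.get)

# Toy 2-D "embeddings": two clusters standing in for image features.
memory_vecs = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                        [5.0, 5.0], [5.1, 5.0]])
memory_labels = ["cat", "cat", "cat", "dog", "dog"]
print(rank_voting(np.array([0.05, 0.05]), memory_vecs, memory_labels, k=3))  # → cat
```

Because the classifier is just a lookup plus a vote, "unlearning" an image is literally deleting its row from the memory – which is the flexibility the paper is after.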

Innovative Techniques in Model Design and Application

  • Layerwise Recurrent Router for Mixture-of-Experts introduces a new approach to enhance routing in large models by sharing routing information across layers, improving both efficiency and performance →read the paper

  • Solving a Rubik’s Cube Using Its Local Graph Structure proposes a novel method for solving the Rubik’s Cube by modeling it as a graph, enhancing search efficiency while reducing the solution length →read the paper

Innovations in Model Training and Efficiency

  • How to Prune and Distill Llama-3.1 8B to an NVIDIA Llama-3.1-Minitron 4B Model demonstrates a method to effectively reduce LLM sizes through pruning and distillation, improving performance while cutting down compute costs →read the paper

  • I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm introduces a novel iterative self-enhancement approach for LLMs, enabling continuous self-alignment and significant performance improvements using minimal external signals →read the paper
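On the pruning-and-distillation entry above: the distillation half of that recipe typically minimizes a temperature-softened KL divergence between teacher and student logits. A minimal numpy sketch of that classic objective – not NVIDIA's exact loss – looks like:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled
    by T^2 so gradient magnitudes stay comparable across temperatures –
    the Hinton-style objective used when a pruned "student" is trained
    to mimic the original "teacher" model."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student's softened predictions
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0
```

A higher temperature spreads probability mass over the "wrong" classes, exposing the teacher's dark knowledge about which mistakes are more plausible – which is what makes a distilled 4B model punch above its parameter count.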

Understanding Model Training and Tuning

  • Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models investigates the interaction between pre-training and fine-tuning in LLMs, revealing insights into how these processes impact performance and task retention →read the paper

  • Can Large Language Models Understand Symbolic Graphics Programs? evaluates LLMs' ability to understand symbolic graphics programs, introducing a new benchmark and technique to improve comprehension of these programs →read the paper

  • Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents proposes a framework that integrates the diverse strengths of software engineering agents, significantly improving problem-solving capabilities →read the paper

Leave a review!


Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!
