Turing Post
🌁#75: What is Metacognitive AI

We discuss questions of cognition, consciousness, and eventually treating AI as something possessing morality, plus the usual collection of interesting articles, relevant news, and research papers. Dive in!

Ksenia Se
November 11, 2024

This Week in Turing Post:

Wednesday, AI 101, Technique: Mixture of Depth
Friday, AI Unicorns: Perplexity (we apologize for the delay with this article – the common cold has hit us hard.)

The main topic – the next level of anthropomorphizing AI

On one side, there are heated discussions about OpenAI's scaling challenges and reports that the latest GPT models may be underperforming. On the other, Sam Altman claims AGI is near, possibly arriving in 2025. Against that backdrop, last week’s papers on AI metacognition and welfare are a reminder that AI development is not just about speed and power but also about taking a thoughtful, measured approach. In The Centrality of AI Metacognition, the authors (a very impressive list!) point out a key shortfall: while AI systems are getting better at specific tasks, they lack the ability to recognize their own limits and adapt accordingly. This self-monitoring, or metacognition, is what allows humans to notice when they are venturing into the unknown or making assumptions that need a second look. For AI, a similar capacity could mean the difference between reliably handling new scenarios and running into errors when faced with something outside its training data.

Metacognition in AI is a stabilizer. If an AI can understand when it doesn’t have enough context or when it needs to adapt its approach, it becomes a more reliable tool in unpredictable situations. Building these capacities might seem less urgent than achieving top-notch performance on specific tasks, but the long-term benefits of a more resilient, adaptable system are hard to ignore. Metacognitive AI is one of the next important research directions.

On a different note, Taking AI Welfare Seriously raises a broader question: could we reach a point where we need to consider the welfare of AI itself? This isn’t to say AI will need protection anytime soon, but as systems grow more autonomous, we might eventually face ethical questions about how they’re treated or deployed. The paper encourages us to think proactively about this, suggesting that establishing basic ethical guidelines now could prevent dilemmas later.

Both papers, in their own way, highlight that AI development isn’t just about building systems that are faster or smarter – it’s about building systems that can operate responsibly in the world we’re creating. Metacognition and ethical awareness may not be the most immediate priorities (or maybe they are!) but they represent a more cautious and reflective path forward. These are small steps toward creating AI that isn’t just capable but also thoughtful in how it approaches challenges and potential risks.

The tricky part here is that we might not know what metacognition is for machines. We might need to abandon human-centric thinking and be open to new ways of understanding intelligence. Rather than modeling metacognition as a human trait, we may need to explore forms of self-assessment uniquely suited to machines. This could mean designing AI that develops its own kind of introspection – perhaps by continuously evaluating the reliability of its outputs or adjusting its approach based on feedback loops that don’t rely on human-like awareness. As we inch closer to advanced AGI claims, perhaps what’s truly on the horizon is not just intelligence (which we still need to define!) but a form of machine introspection that transforms how AI systems learn, interact, and evolve.
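To make "continuously evaluating the reliability of its outputs" a bit more concrete, here is a minimal sketch of one possible machine-style self-check: sample several answers to the same question and abstain when they disagree too much. This is our illustration, not a method from either paper. The names `answer_with_self_check`, `make_stub`, and the 0.6 agreement threshold are all assumptions, and the stub stands in for a real model call.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple


def answer_with_self_check(
    sample_answer: Callable[[str], str],
    question: str,
    n_samples: int = 5,
    min_agreement: float = 0.6,  # assumed threshold, tune per task
) -> Tuple[Optional[str], float]:
    """Return (answer, agreement). The answer is None when the system abstains."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Ask the same question several times and collect the answers.
    answers: List[str] = [sample_answer(question) for _ in range(n_samples)]
    # Agreement = share of samples that match the most common answer.
    top_answer, top_count = Counter(answers).most_common(1)[0]
    agreement = top_count / n_samples
    if agreement < min_agreement:
        return None, agreement  # "I am not sure": abstain instead of guessing
    return top_answer, agreement


def make_stub(responses: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: replays canned answers in order."""
    it = iter(responses)
    return lambda _question: next(it)


confident = answer_with_self_check(make_stub(["4", "4", "4", "5", "4"]), "2+2?")
unsure = answer_with_self_check(make_stub(["a", "b", "c", "a", "d"]), "Capital of X?")
```

In the first call four of five samples agree, so the answer is returned; in the second only two of five agree, so the function abstains. Agreement between samples is only a rough proxy for reliability, since a model can be consistently wrong, which is part of why machine metacognition remains an open research question.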

Twitter library

10 xLSTM Models

Explore enhanced xLSTM models for various tasks

www.turingpost.com/p/xlstm-options

Weekly recommendation from AI practitioner👍🏼

We built a GPT-4o-powered cleaning robot.
- $250 for the robot arms
- 4 days to build
Open source is truly democratizing the field of robotics.
@KasparJanssen
— Jannik Grothusen (@JannikGrothusen)
7:10 PM • Nov 2, 2024

Top Research

Mixture-of-Transformers (MoT): A Sparse and Scalable Architecture for Multi-Modal Foundation Models was proposed by researchers from Meta and Stanford. The MoT architecture is important because it addresses the high computational costs and inefficiencies involved in training large, multi-modal models. Traditional dense models process multiple data types (text, images, speech) in a unified way, which demands significant resources, limits scalability, and complicates training. MoT’s approach introduces sparsity by activating only relevant model components per modality, reducing FLOPs and computational load while maintaining model performance →read the paper
Agent K v1.0: Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level comes from researchers at Huawei Noah’s Ark and UCL, who developed Agent K v1.0, an autonomous data science agent that manages the entire data science lifecycle by learning from experience. Agent K v1.0 is important because it automates complex data science tasks, achieving expert-level performance on Kaggle, which shows that LLMs can autonomously handle workflows that typically require skilled human data scientists. This kind of automation enhances productivity and serves as a benchmark for using AI in high-level problem-solving, demonstrating AI’s potential to learn, adapt, and improve with experience →read the paper
Decoding Dark Matter: Specialized Sparse Autoencoders (SSAEs) for Interpreting Rare Concepts in Foundation Models was introduced by researchers from Carnegie Mellon. This research matters because it improves our ability to interpret foundation models (FMs) by capturing rare, domain-specific features that are usually overlooked. These “dark matter” concepts are important for AI safety and fairness, as they can include subtle biases or unintentional behaviors that may otherwise go unnoticed. SSAEs help isolate and control these features, which could lead to fairer models, safer use in specific fields like healthcare, and a clearer understanding of how FMs function →read the paper
Artificial Intelligence, Scientific Discovery, and Product Innovation by Aidan Toner-Rodgers. The key findings reveal that AI-assisted scientists discovered 44% more materials, which led to a 39% increase in patent filings and a 17% rise in downstream product innovation. These discoveries also resulted in novel compounds and radical innovations, with the largest effects among high-ability scientists, whose output nearly doubled. Lower-ability researchers, however, saw little benefit, which widened productivity disparities →read the paper
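For readers who want a feel for the "activate only the relevant components per modality" idea from the Mixture-of-Transformers item above, here is a toy sketch. It is not the paper's architecture: each "expert" is just a scaling function standing in for a modality-specific block, and the names `make_scaler` and `modality_sparse_layer` are hypothetical. The only point it illustrates is that each token touches the parameters for its own modality and nothing else.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Token = Tuple[str, Vector]  # (modality tag, hidden state)


def make_scaler(factor: float) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a modality-specific block of parameters."""
    return lambda hidden: [factor * x for x in hidden]


def modality_sparse_layer(
    tokens: List[Token],
    experts: Dict[str, Callable[[Vector], Vector]],
) -> List[Vector]:
    """Send each token through the block that matches its modality only."""
    outputs: List[Vector] = []
    for modality, hidden in tokens:
        if modality not in experts:
            raise KeyError(f"no block registered for modality {modality!r}")
        # Only one block runs per token; the others stay inactive.
        outputs.append(experts[modality](hidden))
    return outputs


experts = {"text": make_scaler(2.0), "image": make_scaler(0.5)}
tokens = [("text", [1.0, 2.0]), ("image", [4.0, 8.0]), ("text", [3.0, 0.0])]
result = modality_sparse_layer(tokens, experts)
```

Because the routing is decided by the modality tag rather than by a learned gate, the compute per token stays the same as a single block no matter how many modalities are registered, which is the intuition behind the FLOP savings mentioned above.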

1/10 Today we're launching FrontierMath, a benchmark for evaluating advanced mathematical reasoning in AI. We collaborated with 60+ leading mathematicians to create hundreds of original, exceptionally challenging math problems, of which current AI systems solve less than 2%.
— Epoch AI (@EpochAIResearch)
9:05 PM • Nov 8, 2024

You can find the rest of the curated research at the end of the newsletter.

We are reading

Is it really over for LLMS? A balanced thought piece by Devansh

News from The Usual Suspects ©

Microsoft
- Microsoft’s Magentic-One introduces a coordinated team of AI agents like WebSurfer and FileSurfer, handling complex web and file workflows with a safety-first approach →their GitHub
Microsoft and OpenAI
- Medprompt by Microsoft and OpenAI enhances diagnostic accuracy with chain-of-thought reasoning, elevating medical model performance without traditional prompt tuning →read the paper
OpenAI
- Facing slower improvements, OpenAI shifts Orion training to synthetic data, indicating a potential slowing in the industry’s AGI ambitions →The Information
- Meanwhile, Sam Altman says AGI arrives in 2025 🚂 →on YouTube
- Good news for OpenAI: a federal judge dismissed a lawsuit alleging copyright misuse, a notable moment for copyright in generative AI that could influence future disputes →Reuters
- OpenAI’s “Predicted Outputs” feature reduces GPT-4o latency, allowing for quicker responses in fast-paced applications and an overall smoother experience →read their blog
Google
- Gemini is now accessible from the OpenAI Library →read their blog
Defense Llama: Scale AI’s National Security Specialist
- Scale AI’s Defense Llama, a secure Llama 3 variant, supports U.S. defense operations, with capabilities for mission planning and intelligence analysis in high-security settings →read their blog
Department of Defense shows more and more interest
- Jericho Security wins the Pentagon’s first AI contract, using adaptive simulations to combat phishing and deepfake threats – an AI milestone in national defense →VentureBeat
Mistral API Adds Precision to Content Moderation
- Mistral’s Ministral 8B model brings nuanced content moderation, covering nine sensitive categories and diverse languages for a global audience →check their blog
NVIDIA
- NVIDIA expands NeMo with NeMo Curator and Cosmos tokenizers, boosting generative AI development across video, image, and text. Faster data processing and high-quality tokenization mean efficient, high-fidelity visuals for industries like robotics and automotive. Cosmos tokenizers’ 12x speed gain sets a new standard →read their blog

More interesting research papers from last week (categorized for your convenience)

Language Model Alignment & Optimization

The Semantic Hub Hypothesis proposes a unified semantic processing hub across languages and data types in LLMs, enhancing versatility but embedding potential biases.
Self-Consistency Preference Optimization improves reasoning by preferring consistent responses, boosting zero-shot accuracy without needing labeled data.
Sample-Efficient Alignment For LLMs introduces a sampling-based alignment approach, improving efficiency under limited feedback.
SALSA: Soup-Based Alignment Learning enhances model stability in reinforcement learning through a "model soup" of averaged weights.
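The "model soup" mentioned in the SALSA item boils down to averaging the weights of several checkpoints, parameter by parameter. Below is a minimal sketch of uniform averaging over plain Python lists. It shows the general idea only, not SALSA's training procedure, and the function name `average_weights` and the dictionary layout are assumptions for illustration.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flat list of values


def average_weights(checkpoints: List[Weights]) -> Weights:
    """Uniformly average matching parameters across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    reference = checkpoints[0]
    for ckpt in checkpoints:
        # All checkpoints must share parameter names and shapes.
        if ckpt.keys() != reference.keys():
            raise ValueError("checkpoints have different parameter names")
        for name in reference:
            if len(ckpt[name]) != len(reference[name]):
                raise ValueError(f"shape mismatch for parameter {name!r}")
    soup: Weights = {}
    for name in reference:
        # zip(*...) groups the i-th value of this parameter from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        soup[name] = [sum(col) / len(checkpoints) for col in columns]
    return soup


ckpt_a = {"w": [1.0, 3.0], "b": [0.0]}
ckpt_b = {"w": [3.0, 5.0], "b": [2.0]}
soup = average_weights([ckpt_a, ckpt_b])
```

Here the averaged model ends up halfway between the two checkpoints for every parameter. Real implementations do the same thing over framework tensors and sometimes weight the checkpoints unequally.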

Efficient Model Compression & Quantization

Give Me BF16 Or Give Me Death? analyzes quantization formats, balancing accuracy and cost for efficient model deployment.
BitNet a4.8: 4-Bit Activations For 1-Bit LLMs reduces inference cost by using 4-bit activations, supporting fast, large-scale deployment.
SPARSING LAW examines neuron sparsity in LLMs, identifying efficient patterns for activation reduction.

Multimodal Processing & Vision-Language Models

Inference Optimal VLMs Need Only One Visual Token shows that using fewer visual tokens with a larger language model can improve VLM inference efficiency.
A Systematic Analysis Of Multimodal LLM Data Contamination detects data contamination in multimodal models, highlighting the need for clean datasets.
LLM2CLIP: Language Models Unlock Richer Visual Representation integrates LLMs to enhance multimodal learning, improving cross-lingual retrieval.

Adaptive & Dynamic Action Models

WEBRL: Training LLM Web Agents trains web agents with a curriculum that evolves through agent learning, improving task success rates.
DynaSaur: Large Language Agents Beyond Predefined Actions allows agents to create actions on-the-fly, handling unforeseen tasks with Python-based adaptability.
THANOS: Skill-Of-Mind-Infused Agents enhances conversational agents with social skills, improving response accuracy and empathy.

Data Efficiency & Retrieval-Optimized Systems

DELIFT: Data Efficient Language Model Instruction Fine-Tuning optimizes fine-tuning by selecting the most informative data, cutting dataset size significantly.
HtmlRAG: HTML Is Better Than Plain Text improves RAG systems by preserving HTML structure in retrieved documents, enhancing retrieval quality and document comprehension.
M3DOCRAG: Multi-Modal Retrieval For Document Understanding introduces a multimodal RAG framework to handle multi-page and document QA tasks with visual data.
Needle Threading: LLMs For Long-Context Retrieval examines LLMs’ retrieval capabilities, identifying limits in handling extended contexts.

Surveys & Foundational Studies

Survey Of Cultural Awareness In Language Models reviews cultural inclusivity in LLMs, emphasizing diverse and ethically sound datasets.
OPENCODER: The Open Cookbook For Code Models provides a comprehensive open-source guide for building high-performance code LLMs.

Transformer Innovations & Architectural Optimization

Polynomial Composition Activations enhances model expressivity using polynomial activations, optimizing parameter efficiency.
Hunyuan-Large: An Open-Source MoE Model presents a large-scale MoE model, excelling across language, math, and coding tasks.
Balancing Pipeline Parallelism With Vocabulary Parallelism improves transformer training efficiency by balancing the computation and memory of vocabulary layers across pipeline stages.
