FOD#44: How Far Are We?
Let's discuss the recent papers that offer a few insights on the potential of human-machine collaboration
Though the question "How far are we from achieving human-level intelligence in machines (or AGI, or ASI)?" predates the term "artificial intelligence" itself, it saw a significant resurgence on Twitter last week, prompted by the Musk vs. OpenAI lawsuit (Musk accuses OpenAI of abandoning open-source principles and prioritizing profit over safety, hindering the safe development of AGI). But far more interesting were the two papers and the article that came out last week tackling this question. Today, we will discuss "How Far Are We from Intelligent Visual Deductive Reasoning?", "Design2Code: How Far Are We From Automating Front-End Engineering?", and Stephen Wolfram’s article “Can AI Solve Science?” These works offer fascinating explorations of the differences between human and artificial intelligence.
In "How Far Are We from Intelligent Visual Deductive Reasoning?", researchers from Apple explore Vision-Language Models (VLMs), like GPT-4V, in visual-based deductive reasoning, a complex yet less studied area, using Raven’s Progressive Matrices (RPMs)*.
*Raven's Progressive Matrices are a nonverbal intelligence test measuring abstract reasoning, using patterns to assess cognitive functioning without language.
What caught my attention was the finding that AI systems like VLMs struggle with tasks requiring abstract pattern recognition and deduction. The paper notes, "VLMs struggle to solve these tasks mainly because they are unable to perceive and comprehend multiple, confounding abstract patterns in RPM examples." This inability to deal with abstract concepts marks a fundamental difference between computational processing and human cognitive abilities. Being a sophisticated pattern recognizer doesn’t equate to sentience.
Another intriguing point was the models' overconfidence. The observation that "all the tested models never express any level of uncertainty" highlights the importance of doubt and uncertainty in human cognition, suggesting a nuanced aspect of intelligence that current AI lacks.
In "Design2Code: How Far Are We From Automating Front-End Engineering?", researchers from Stanford University, Georgia Tech, Microsoft, and Google DeepMind developed a benchmark for Design2Code, aiming to evaluate how well multimodal LLMs convert visual designs into code. Here, replacing humans looks closer. Despite some limitations, there were considerable advancements in using generative AI for converting designs into front-end code. It’s remarkable that "annotators think GPT-4V generated webpages can replace the original reference webpages in 49% of cases in terms of visual appearance and content; and in 64% of cases, GPT-4V generated webpages are considered better." This finding challenges traditional notions of artistic and creative value, questioning whether creativity is uniquely human or can be algorithmically reproduced – or even surpassed.
However, significant limitations persist. VLMs struggle with "recalling visual elements from the input webpages and generating correct layout designs," raising questions about how well these models understand and interpret their inputs.
So, the important question is actually not how far we are from AGI (whatever it is), but how we embrace human-AI collaboration most effectively.
In that sense, Stephen Wolfram's blog post “Can AI Solve Science?” serves as an excellent example. At the very beginning, he plainly states that AI cannot solve all scientific questions. However, there is significant value in AI assisting scientific progress. He discusses how LLMs can serve as a new kind of linguistic interface to computational capabilities, providing high-level "autocomplete" for scientific work. As he usually does, he emphasizes the transformative potential of representing the world computationally and suggests that AI can also help find pockets of computational reducibility*.
*A pocket of computational reducibility – a fascinating concept introduced by Wolfram – is a situation or problem within a complex system where, despite the system's overall unpredictability, predictable patterns or simplified behaviors emerge, allowing for easier understanding or calculation.
Wolfram argues that AI can significantly aid scientific discovery by providing new tools for analysis and exploration, but its ability to completely "solve" science is limited by fundamental principles such as computational irreducibility. The future of AI in science lies in its integration with human creativity and understanding, leveraging its strengths to uncover new knowledge within the constraints of what is computationally possible.
We might be able to survive without front-end developers (no offense intended), but scientists remain indispensable!
To summarize:
AGI is always 3-8 years away.
— Pedro Domingos (@pmddomingos)
9:51 PM • Mar 10, 2024
News from The Usual Suspects ©
Cohere and its commitment to the research community
Today, we’re excited to release Command-R, a new RAG-optimized LLM aimed at large-scale production workloads.
Command-R fits into the emerging “scalable” category of models that balance high efficiency with strong accuracy, enabling companies to move beyond proof of concept, and… twitter.com/i/web/status/1…
— cohere (@cohere)
7:43 PM • Mar 11, 2024
Hugging Face
Starting an ambitious open robotics project and hiring Remi Cadene, a former Tesla scientist, to expand beyond software into robotics, including work on humanoid robots like Tesla's Optimus.
Offers an amazing tool:
♾️AutoMerger
I made a neat little tool to automatically merge models on @huggingface.
It already created a few competitive models during the weekend. Here's how it works. 🧵
🪟 Space: huggingface.co/spaces/mlabonn…
🤗 Models: huggingface.co/automerger twitter.com/i/web/status/1…
— Maxime Labonne (@maximelabonne)
9:44 AM • Mar 11, 2024
Russia’s talent is invisible
According to the analysis “The Global AI Talent Tracker 2.0,” Russia has no chance in the AI race.
Inflection enhances its Pi
Inflection launches Inflection-2.5, enhancing its personal AI, Pi, with IQ capabilities alongside its empathetic EQ. The company claims the model competes with GPT-4 and delivers significant efficiency and performance improvements, especially in STEM areas, while using fewer computational resources.
Chips
TSMC (based in Taiwan), the world's largest contract chipmaker, is expected to receive over $5 billion in U.S. federal grants for an Arizona chip plant. This funding, part of the CHIPS and Science Act of 2022, aims to boost domestic semiconductor production. TSMC's $40 billion investment in the plant marks one of the largest foreign investments in U.S. history.
OpenAI: new members on the board
Sam Altman has been reinstated as a member of the OpenAI board. He is joined by three women. Sue Desmond-Hellmann, former CEO of the Bill and Melinda Gates Foundation, brings extensive experience in healthcare and philanthropy. Nicole Seligman, a lawyer and seasoned executive who held senior roles at Sony Entertainment, offers legal and entertainment industry insights. Fidji Simo, who leads Instacart and previously headed the Facebook app at Meta Platforms, contributes expertise in technology and e-commerce. This diverse trio enhances OpenAI's governance with their wide-ranging expertise and perspectives.
Elon’s Grok
This week, @xai will open source Grok
— Elon Musk (@elonmusk)
8:41 AM • Mar 11, 2024
Anthropic
Offers a great collection of prompts in its Prompt library
Nathan Lambert calls for a conversation to clarify what an open-source LLM is and suggests new terms to make it clearer.
A blog post about a new open-source system that lets you train a 70B language model at home.
Training great LLMs entirely from ground up in the wilderness as a startup by Yi Tay.
2024 generative AI predictions by CBInsights
The freshest research papers, categorized for your convenience
Enhancements in Language Models and Multimodal Understanding
ChatMusician: Showcases an LLM's intrinsic ability to understand and generate music, expanding LLMs' applications beyond text.
Gemini 1.5: Demonstrates advanced multimodal understanding by processing extensive contexts of text, video, and audio.
NaturalSpeech 3: Enhances TTS synthesis with a factorized diffusion model for zero-shot natural speech generation.
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters: Utilizes fine-tuned MLMs for superior image-text data filtering, improving dataset quality for MLM training.
SaulLM-7B: Introduces the first LLM designed for the legal domain, demonstrating proficiency in legal text understanding and generation.
Novel Training and Evaluation Techniques
MathScale: Proposes a method for scaling instruction tuning in mathematical reasoning, significantly enhancing LLMs' problem-solving abilities.
ShortGPT: Reveals redundancy in LLM layers and proposes a pruning strategy, maintaining performance while reducing model size.
GaLore: Enhances memory efficiency in LLM training without sacrificing performance by applying gradient low-rank projection.
Learning to Decode Collaboratively with Multiple Language Models: Develops a method for collaborative decoding among multiple LLMs, improving performance across various tasks.
Stop Regressing: Advocates for training RL value functions through classification instead of regression, boosting performance and scalability.
Advances in Generative Models and Data Synthesis
MAGID: Introduces a framework for generating synthetic multimodal datasets, overcoming limitations in data privacy and diversity.
Genie: Trains a generative model to create interactive virtual worlds from text, images, or sketches, advancing interactive environment simulation.
Scalability and Efficiency in AI Systems
MegaScale: Details a system for training LLMs on over 10,000 GPUs, tackling challenges in training efficiency and stability at large scales.
DenseMamba: Improves State Space Models with dense hidden connections for efficient large language models, enhancing performance with minimal parameter increase.
Exploring New Frontiers in AI and Machine Learning
Inference via Interpolation: Illustrates that planning and prediction in time series can be simplified through learned contrastive representations.
LLMs in the Imaginarium: Employs simulated trial and error for tool learning, significantly enhancing LLMs' practical application capabilities.
Resonance RoPE: Aims to improve context length generalization in LLMs by refining interpolation techniques for out-of-distribution token positions.
AtP: Enhances the localization of behaviors in LLMs to specific components, improving diagnostic capabilities and model understanding.
Learning and Leveraging World Models in Visual Representation Learning: Investigates using world models for enhancing visual representation learning, beyond reinforcement learning applications.
Platforms and Tools for Model Evaluation and Interaction
Chatbot Arena: Provides a platform for evaluating LLMs based on human preferences, offering insights into model rankings through pairwise comparisons.
Teaching Large Language Models to Reason with Reinforcement Learning: Explores enhancing LLMs' reasoning capabilities using RLHF, comparing different algorithms and reward strategies.
If you decide to become a Premium subscriber, you can expense this subscription through your company! Join our community of forward-thinking professionals. Please also send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. 🤍 Thank you for reading.