FOD#54: Relatively quiet week in AI is still full of storms
Explore what it means + the best weekly curated list of research papers
Next Week in Turing Post:
Wednesday, AI 101: In the second episode, we discuss Mamba, a deep learning architecture focused on sequence modeling, and how it rivals Transformers;
Friday: A new investigation into the world of AI Unicorns.
If you like Turing Post, consider becoming a paid subscriber. You’ll immediately get full access to all our articles, investigations, and tech series →
Last week, aside from the significant Microsoft Build event – which we covered here and here with exclusive insights from Microsoft's CTO – the tech world was relatively quiet, buzzing softly with rumors and notable struggles.
Rumors: Elon Musk's AI startup, xAI, is reportedly planning to construct a "gigafactory of compute," utilizing 100,000 Nvidia H100 GPUs. This super (duper-giga-mega) computer aims to be four times larger than current AI clusters and operational by fall 2025. xAI is raising $6 billion and may partner with Oracle for $10 billion in cloud servers. The project, requiring 100 megawatts of power, aims to compete with similar large-scale supercomputers from Microsoft and OpenAI. Scaling laws in action.
Rumors ahead of Apple’s WWDC 2024 suggest a focus on Apple's "applied intelligence" initiatives. The event is expected to highlight practical AI features like transcribing voice memos, auto-generated emojis, and a potential partnership with OpenAI for deeper chatbot integration. Also worth noting: according to CBInsights, Apple leads in AI acquisitions.
Struggles: Snowflake's attempt to acquire Reka AI for over $1 billion has fallen through. Reka, a startup specializing in LLMs, was seen as a strategic addition to enhance Snowflake's generative AI capabilities. Meanwhile, Adept AI, Humane, and Stability AI are all exploring sales due to challenges such as competition from tech giants, product issues, and mismanagement. This trend follows the recent acqui-hire of the Inflection AI team by Microsoft. The overheated AI market is experiencing a reality check. As the AI hype cycle cools, consolidation is accelerating. Companies with strong fundamentals may find soft landings, but many others are likely to face devaluations or shutdowns. Interesting and rough times!
Click the link below so we can make some money on this ad 🙂 You might also like what they offer →
Are You Ready for the AI Age?
The age of artificial intelligence has arrived, and businesses must adapt or risk falling behind. Discover how AI technologies like machine learning and natural language processing could impact your business and how you can harness them to grow your competitive advantage.
In the MIT Artificial Intelligence: Implications for Business Strategy online short course you’ll gain:
Practical knowledge and a foundational understanding of AI's current state
The ability to identify and leverage AI opportunities for organizational growth
A focus on the managerial rather than technical aspects of AI to prepare you for strategic decision making
News from The Usual Suspects ©
Anthropic: Mapping the Mind of an LLM
Anthropic has made a significant advancement in AI safety by mapping the "mind" of Claude Sonnet, their large language model. This breakthrough, detailed in their latest paper, uses "dictionary learning" techniques to identify how millions of concepts are represented within the model. The research enhances interpretability, potentially allowing better monitoring and manipulation of AI behavior, and highlights the importance of ongoing safety research.
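For intuition, here is a minimal sketch of the sparse-autoencoder flavor of dictionary learning used in this line of interpretability work: a one-layer autoencoder trained to reconstruct model activations under an L1 sparsity penalty, so each learned dictionary direction tends to capture a single interpretable "feature." Sizes, data, and the training loop below are illustrative, not Anthropic's actual setup.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy dictionary learner: activations -> sparse codes -> reconstruction."""
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)  # feature coefficients
        self.decoder = nn.Linear(n_features, d_model)  # dictionary directions

    def forward(self, acts: torch.Tensor):
        codes = torch.relu(self.encoder(acts))  # non-negative, mostly zero
        return self.decoder(codes), codes

# Illustrative sizes; real runs use residual-stream activations from an LLM.
sae = SparseAutoencoder(d_model=512, n_features=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(1024, 512)  # stand-in for captured activations

for _ in range(100):
    recon, codes = sae(acts)
    # Reconstruction error plus an L1 penalty that pushes codes toward sparsity.
    loss = ((recon - acts) ** 2).mean() + 1e-3 * codes.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, each decoder column is a candidate "concept" direction whose activations can be inspected, monitored, or steered.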
OpenAI in turmoil
By comparison, OpenAI doesn't offer much insight into what it does or how its models work, though it surely uses a lot of manipulation of human behavior. Most of its superalignment team has left, and a scandal over vested equity has emerged:
When I left @OpenAI a little over a year ago, I signed a non-disparagement agreement, with non-disclosure about the agreement itself, for no other reason than to avoid losing my vested equity. (Thread)
— Jacob Hilton (@JacobHHilton)
7:38 PM • May 24, 2024
Additionally, Scarlett Johansson accused OpenAI of using her voice in the new GPT-4o, though The Washington Post later shared records showing that OpenAI didn't copy her voice for ChatGPT. But I keep thinking about Sam Altman's recent appearance at Microsoft Build. Satya Nadella, of course, made a show of supporting his partners: for a few minutes, the huge screens at the Seattle Convention Center were covered with 'Microsoft 💜 OpenAI.' Yet Satya didn't talk to Sam Altman. Altman appeared only at the end of the show for a quick fireside chat with Kevin Scott, Microsoft's CTO. Cheerful and relaxed, Scott dominated the stage, while OpenAI's CEO moved like a robot and kept repeating that the models would get smarter; little else coherent came out. As my husband noted, Sam Altman has talked so much that he's got nothing left to say. Maybe that's why he keeps getting into trouble.
On the positive side, ScAIrlett has approved it all after all.
Jerky, 7-Fingered Scarlett Johansson Appears In Video To Express Full-Fledged Approval Of OpenAI
— The Onion (@TheOnion)
9:00 PM • May 23, 2024
Meta: New AI and Technology Advisory Group
Mark Zuckerberg has established a new advisory group to assist Meta with its AI and technology strategies. The group includes heavyweights such as the CEOs of Stripe and Shopify, a former CEO of GitHub, and a former Microsoft executive. Successful white male industry leaders will advise Zuckerberg on how to better utilize AI in Meta's hardware and software products.
Google: Ads in AI-Generated Search Answers
Google is integrating ads into its AI-generated search answers, termed AI Overviews, to blend its primary revenue stream with the new AI format. That by itself sounds like a shitty proposition, but considering that Google's AI Overviews faced backlash last week for generating inaccurate and nonsensical responses, it's just laughable. Just imagine: Google not only suggests adding glue to make cheese stick to pizza better, but also offers you a non-toxic variant for sale right away. All for users' convenience.
The freshest research papers, categorized for your convenience
Our top
Cohere: Aya 23 Model Release
Cohere For AI has launched Aya 23, a new family of multilingual generative language models supporting 23 languages. Available in 8-billion and 35-billion parameter models with open weights, Aya 23 builds on the Aya initiative with a focus on multilingual performance and global AI research support. The models are accessible on Hugging Face, and more information can be found on Cohere's research page.
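If you want to try it, here's a minimal sketch of loading the 8B checkpoint with Hugging Face transformers. It assumes the model id CohereForAI/aya-23-8B and a recent transformers release that includes Cohere model support; check the model card for exact requirements.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model id assumed from Cohere For AI's Hugging Face org.
model_id = "CohereForAI/aya-23-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Aya 23 is instruction-tuned; format prompts with the chat template.
messages = [{"role": "user", "content": "Translate 'good morning' into Turkish."}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```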
The Foundation Model Transparency Index v1.1
Researchers from Stanford University, Princeton University, and MIT analyzed transparency among leading foundation model developers. The Foundation Model Transparency Index v1.1 evaluated 14 developers based on 100 indicators, revealing an average score increase from 37 to 58 over six months. The improvement was driven by direct reporting of previously undisclosed information. Persistent opacity issues include data access and labor transparency. These findings highlight the potential for policy interventions to further enhance transparency → read the paper
2024 Generative AI Red Teaming Challenge: Transparency Report
Researchers from Humane Intelligence, Seed AI, and AI Village organized the first public red teaming event for closed-source API models at DEF CON 31, evaluating eight LLMs. This event highlighted the potential for public red teaming to enhance AI oversight and policy development by identifying biases, misdirections, and cybersecurity vulnerabilities in AI models → read the report
Enhancing Model Efficiency and Performance:
Layer-Condensed KV Cache for Efficient Inference of Large Language Models - Introduces a method to reduce memory usage in LLMs by condensing the key-value cache to only a few layers, enhancing inference efficiency read the paper.
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization - Enhances transformer efficiency by simplifying linear attention and using progressively re-parameterized batch normalization read the paper.
Your Transformer is Secretly Linear - Discovers that transformer decoders exhibit a near-perfect linear relationship between sequential layers, providing insights into transformer architectures read the paper.
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning - Proposes a fine-tuning method for LLMs using high-rank updating to preserve parameter efficiency read the paper.
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention - Further reduces the key-value cache size in transformers, extending efficiency improvements read the paper. For intuition, see the toy KV-cache sketch after this list.
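Both KV-cache papers above shrink the same structure: the per-layer cache of keys and values that grows with every decoded token. Here's a toy sketch of that baseline cache; shapes and projections are stand-ins, not either paper's method.

```python
import torch

# Toy per-layer KV cache: during autoregressive decoding, each layer appends
# its key/value tensors so attention can reuse them on later steps.
n_layers, n_heads, d_head = 4, 8, 64
cache = [{"k": [], "v": []} for _ in range(n_layers)]

def decode_step(hidden: torch.Tensor) -> None:
    for slot in cache:
        # Stand-in projections; a real model uses learned per-layer weights.
        k = hidden.view(n_heads, d_head)
        v = hidden.view(n_heads, d_head)
        slot["k"].append(k)  # grows linearly with sequence length
        slot["v"].append(v)
        # Layer-condensed / cross-layer variants keep keys and values for only
        # a few layers (or a shared set), cutting this memory substantially.

for _ in range(16):
    decode_step(torch.randn(n_heads * d_head))

print(len(cache[0]["k"]))  # 16 cached key tensors per layer, per token decoded
```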
Advancing Multimodal and Robotics Capabilities:
Grounded 3D-LLM with Referent Tokens - Integrates 3D vision and language models using referent tokens, unifying 3D tasks under a generative framework read the paper.
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models - Leverages a Mamba architecture to enhance comprehension and response capabilities in vision-language tasks read the paper.
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability - Develops a multimodal LLM that enhances alignment capabilities between text and image pairs read the paper.
Octo: An Open-Source Generalist Robot Policy - Trains a transformer-based policy for robotic manipulation, adapting to diverse sensors and action spaces read the paper.
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation - A lightweight transformer model for generating audio from images and vice versa read the paper.
Novel Methodologies and Tools in AI:
Observational Scaling Laws and the Predictability of Language Model Performance - Proposes a generalized scaling law to predict LLM performance variations with scale read the paper.
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework - Introduces a framework for scalable reinforcement learning from human feedback in LLMs read the paper.
The Road Less Scheduled - Proposes a novel optimization approach that eliminates the need for learning rate schedules by unifying scheduling and iterate averaging read the paper.
Cross-Language and Data Sampling Techniques:
Dynamic Data Sampler for Cross-Language Transfer Learning in Large Language Models - Introduces a cross-language transfer learning framework using a dynamic data sampler read the paper.
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach - Proposes a clustering-based method for automatic data curation in self-supervised learning read the paper.
Advanced Applications and Theoretical Insights:
INDUS: Effective and Efficient Language Models for Scientific Applications - Develops a suite of LLMs tailored for scientific applications, introducing new benchmarks and outperforming general-purpose models read the paper.
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization - Explores whether transformers can implicitly reason over parametric knowledge through extended training read the paper.
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data - Enhances LLM theorem-proving capabilities with a model fine-tuned on synthetic data read the paper.
Diffusion Models:
FIFO-Diffusion: Generating Infinite Videos from Text without Training - Presents an innovative method for text-based infinite video generation using pretrained diffusion models read the article.
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis - Combines State Space Models with diffusion models for efficient high-resolution image synthesis read the paper.
Privacy-Preserving Diffusion Models Using Homomorphic Encryption - Trains diffusion models on encrypted data, ensuring privacy while maintaining performance read the paper.
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance - Customizes diffusion models to generate identity-preserving images from user-provided references read the paper.
Semantica: An Adaptable Image-Conditioned Diffusion Model - Develops a diffusion model for generating images based on the semantics of a conditioning image read the paper.
Leave a review!
Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!