FOD#27: "Now And Then"

We want to pay tribute to the beauty of what technology can (re)create.

While governments compete over who can impose the best AI restrictions, all the while hoping to stimulate innovation and win the race,

…and companies vie to outdo GPT-4 on benchmark after benchmark,

…and AI researchers, psychologists, effective altruism activists, and casual bystanders argue over AGI,

let's pause for 4 minutes and 35 seconds to appreciate what AI has helped achieve after more than 28 years of human effort falling short:

The Beatles’ “Now and Then”: over 20 million views and an immeasurable volume of tears shed in joyful recognition. It was wonderful.

Additional info: a short film with Peter Jackson’s remarks on how it was made possible.

You are currently on the free list. Join Premium members from top companies like Datadog, FICO, and UbiOps, research labs such as MIT and Berkeley, readers from .gov domains, and many VCs, to learn and start understanding AI →

News from The Usual Suspects ©

Google DeepMind delivers more positive AI news

  • Google DeepMind and Isomorphic Labs introduced an update to their groundbreaking AlphaFold model. The latest iteration is a transformative advance for scientific discovery, particularly drug development. The model can now predict the structures of a wide range of biomolecules, including nucleic acids and ligands, with near-atomic accuracy. This expansion beyond proteins could significantly cut drug discovery times and inform early trials. AlphaFold exemplifies the growing influence of AI in expediting scientific research and innovation, potentially leading to faster breakthroughs in domains from therapeutics to environmental sustainability. This is truly remarkable.

AI Summit in the UK – a new pact signed

  • This pact, known as the Bletchley Declaration, saw signatories from the US, EU, China, and 25 other nations agreeing on a unified strategy for preempting and curtailing AI risks. While the declaration acknowledges the perilous implications of AI and suggests frameworks for risk reduction, it stops short of enacting specific regulations. The declaration, spanning about 1,300 words, emphasizes the urgency of international collaboration to address the challenges posed by cutting-edge AI technologies. All very important words are properly used.

Additional reading: “The case for a little AI regulation” by Platformer and “AI Summit a start but global agreement a distant hope” by Reuters

Cohere’s update

  • Cohere introduced Embed v3, an advanced model for generating document embeddings that boasts top performance on several benchmarks. It excels at matching documents to queries on both topic and content quality, improving search applications and retrieval-augmented generation (RAG) systems. The new version offers models with 1024 or 384 dimensions, supports over 100 languages, and is optimized for cost-effective scalability. The models are especially good at handling noisy datasets and multi-hop queries, which is vital for real-world applications and RAG systems.
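For a sense of what this looks like in practice, here is a minimal sketch of calling Embed v3, assuming the Cohere Python SDK; the API key and example texts are placeholders, not code from Cohere’s announcement:

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Embed documents for indexing; v3 models require an input_type,
# which lets the model specialize for the storage vs. query side.
doc_emb = co.embed(
    texts=["AlphaFold now covers nucleic acids and ligands."],
    model="embed-english-v3.0",   # the 1024-dimension English model
    input_type="search_document",
).embeddings

# Embed the user query with the matching query-side input_type.
query_emb = co.embed(
    texts=["What molecules can AlphaFold predict?"],
    model="embed-english-v3.0",
    input_type="search_query",
).embeddings
```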

Microsoft develops smaller models

  • Microsoft introduced Phi 1.5 – a compact AI model with multimodal capabilities, meaning it can process images as well as text. Despite having only 1.3 billion parameters, a fraction of OpenAI’s GPT-4, it demonstrates advanced features like those found in much larger models. Phi 1.5 is open-source, underscoring the trend toward efficient AI that is accessible and less demanding on computational resources. It offers a glimpse into economical AI deployment and also contributes to fundamental research into how AI models learn, with the potential to democratize AI technology. Microsoft’s progress with Phi 1.5 points to a broader movement toward powerful yet smaller models that could proliferate outside of big tech companies, transforming industries with their efficiency and capability. Meta set a good example by open-sourcing its models!
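If you want to poke at the text side yourself, here is a minimal generation sketch, assuming the Hugging Face checkpoint microsoft/phi-1_5 and the transformers library; the prompt is an arbitrary example, and this is not an official Microsoft snippet:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The 1.3B-parameter model is small enough to run on a laptop CPU.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1_5",
    torch_dtype=torch.float32,
    trust_remote_code=True,
)

inputs = tokenizer("Write a short poem about small models:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```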

Kai-Fu Lee’s 01.AI

  • 01.AI, a Chinese startup, soared to unicorn status with a valuation of over $1 billion, buoyed by a funding round that included Alibaba’s cloud unit. It recently launched Yi-34B, an open-source AI model that outperforms existing models like Meta’s Llama 2 on key benchmarks. The model caters to both English and Chinese developers and marks a significant milestone in China’s AI landscape, amid growing competition and political tension with the US over AI advancements.

Speaking of open source

Elon Musk and his ‘snarky’ Grok

Elon Musk’s xAI announced Grok, an AI model with a few interesting features:

It’s not open-source; it’s not yet available (it will be offered only to Twitter’s Premium+ subscribers at $16/month); it’s not state-of-the-art (it can’t compete with GPT-4); and it doesn’t address the AI risks Musk was talking about earlier. But it is “snarky” and built on Twitter data.

What does it remind me of? Microsoft’s chatbot Tay, and this lovely article title from 2016: “Twitter taught Microsoft’s AI chatbot to be a racist asshole in less than a day.”

Additional reading: “The problem with Elon Musk’s ‘first principles thinking’” by Untangled

The question is: how many models do we really need?

Twitter Library

Other news, categorized for your convenience

Video and Vision Understanding

  • MM-VID - Enhances video understanding by using GPT-4V to transcribe videos into detailed scripts, facilitating comprehension of long-form content and character identification →the paper

  • Battle of the Backbones - Evaluates different pre-trained models in computer vision tasks to guide selection for practitioners, comparing CNNs, ViTs, and others →the paper

  • LLaVA-Interactive - Demonstrates a multimodal interaction system capable of dialogues and image-related tasks, leveraging pre-built AI models →the paper

Large Language Models (LLMs) Enhancements and Evaluation

  • ChipNeMo - Applies domain-adapted LLMs to chip design, utilizing techniques like custom tokenizers and fine-tuning for specific tasks such as script generation and bug analysis →the paper 

  • Evaluating LLMs - A survey categorizing LLM evaluations into knowledge, alignment, safety, and more, serving as a comprehensive overview of LLM performance →the paper

  • The Alignment Ceiling - Addresses the objective mismatch in reinforcement learning from human feedback (RLHF), proposing solutions for aligning LLMs with user expectations for safety and performance →the paper

  • Learning From Mistakes (LeMa) - Improves reasoning in LLMs by fine-tuning them on mistake-correction data pairs, enhancing their math problem-solving abilities →the paper
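To make the LeMa idea concrete, here is a hypothetical sketch of what one mistake-correction training pair might look like; the field names and example are ours, not from the paper:

```python
# The model is fine-tuned to spot the error and produce the correction,
# rather than only to reproduce a correct solution from scratch.
example = {
    "question": "Natalia sold clips to 48 friends in April, and half as many "
                "in May. How many clips did she sell altogether?",
    "incorrect_solution": "April: 48. May: 48 * 2 = 96. Total: 48 + 96 = 144.",
    "error_explanation": "May should be half of April: 48 / 2 = 24, not double.",
    "corrected_solution": "April: 48. May: 48 / 2 = 24. Total: 48 + 24 = 72.",
}
```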

Model Efficiency and Distillation

  • Distil-Whisper - Focuses on distilling large speech recognition models into smaller, faster, and more robust versions without significant performance loss (see the usage sketch after this list) →the paper

  • FlashDecoding++ - Presents a method for faster inference of LLMs on GPUs, aiming to maintain performance while increasing efficiency →the paper
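As promised, a minimal Distil-Whisper transcription sketch, assuming the distil-whisper/distil-large-v2 checkpoint on Hugging Face and the transformers ASR pipeline; "audio.wav" is a placeholder path:

```python
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
    torch_dtype=torch.float16,  # drop this and device to run on CPU
    device=0,                   # GPU index
)

# Chunking enables transcription of long-form audio.
result = asr("audio.wav", chunk_length_s=15)
print(result["text"])
```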

Innovative Prompting and Cross-Modal Interfaces

  • Zero-shot Adaptive Prompting - Improves zero-shot performance of LLMs by introducing self-adaptive prompting methods that generate pseudo-demonstrations for better task handling (see the schematic sketch after this list) →the paper

  • De-Diffusion - Transforms images into text representations, enhancing cross-modal tasks by enabling standard text-to-image diffusion models and off-the-shelf LLMs to work with images through text →the paper
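Here is a schematic sketch of the pseudo-demonstration idea behind zero-shot adaptive prompting; the llm() call is a placeholder, and the confidence heuristic is a deliberately crude stand-in for the paper’s selection criterion:

```python
def llm(prompt: str) -> str:
    return "placeholder answer"  # stand-in; plug in any real LLM call

def self_adaptive_prompt(unlabeled: list[str], new_question: str) -> str:
    # Step 1: answer a pool of unlabeled questions zero-shot.
    candidates = [(q, llm(f"Q: {q}\nA:")) for q in unlabeled]
    # Step 2: keep the most "confident" answers as pseudo-demonstrations.
    # The paper scores confidence via consistency across samples; sorting
    # by answer length here is only a crude, hypothetical stand-in.
    demos = sorted(candidates, key=lambda qa: len(qa[1]))[:3]
    # Step 3: prepend the pseudo-demos to the new query, few-shot style.
    prefix = "\n".join(f"Q: {q}\nA: {a}" for q, a in demos)
    return llm(f"{prefix}\nQ: {new_question}\nA:")
```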

Coding

  • Phind has introduced its 7th-gen model, surpassing GPT-4 in coding proficiency while offering a 5x speed increase. The model, fine-tuned on over 70 billion tokens, achieved a 74.7% HumanEval pass rate. Despite HumanEval’s limited real-world applicability, user feedback suggests Phind’s model is as helpful as, or more helpful than, GPT-4 for practical queries. It also boasts a 16k-token context window and uses NVIDIA’s TensorRT-LLM for rapid processing, although some consistency issues remain to be ironed out →read more

  • DeepSeek Coder is a suite of open-source code language models, varying in size from 1B to 33B parameters, trained on a mix of code and natural language in both English and Chinese. The training utilized a large 2T token dataset and specialized tasks to create base and instruction-tuned versions. These models, with their project-level code understanding, offer state-of-the-art code completion and are free for both research and commercial use →read more
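Since the weights are open, here is a minimal code-completion sketch, assuming the deepseek-ai/deepseek-coder-1.3b-base checkpoint on Hugging Face; the prompt is an arbitrary example, not an official DeepSeek snippet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-1.3b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Ask the base model to complete a function from a comment.
prompt = "# Python function that checks whether a number is prime\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```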

3D Content Generation Innovations – what a rich 3D week!

  • Stable 3D by Stability AI streamlines 3D creation, allowing rapid generation of textured 3D models from images or text, simplifying the design process for professionals and amateurs alike, with output ready for further refinement in standard tools →read more

  • Luma AI introduced Genie, a Discord-integrated text-to-3D model generator, now available as a research preview →read more

  • DreamCraft3D unveils a hierarchical approach for creating detailed and consistent 3D objects, using a view-dependent diffusion model and Bootstrapped Score Distillation for geometric precision and texture refinement →read more

  • Rodin Gen-1 by Deemos offers a GenAI model that is capable of producing complex 3D shapes and photorealistic PBR textures based on textual inputs. This represents a significant leap in text-to-3D synthesis, particularly in creating textures that mimic real-world lighting and reflection properties →read more

In other newsletters

Thank you for reading! Please feel free to share this with your friends and colleagues. In the next couple of weeks, we will be announcing our referral program 🤍

Another week with fascinating innovations! We call this overview “Froth on the Daydream” – or simply, FOD. It’s a reference to the surrealistic and experimental novel by Boris Vian – after all, AI is experimental and feels quite surrealistic, and a lot of writing on this topic is just froth on the daydream.

How was today's FOD?

Please give us some constructive feedback
