FOD#52: OpenAI's new GPT-4o – what it can and cannot do

Trying out the new GPT-4o and, as always, bringing you the most relevant news, research papers, and must-reads

Ksenia Se
May 13, 2024

Next Week in Turing Post:

Wednesday, Computer Vision History Series: we discuss ImageNet, a dataset that enabled AlexNet, which in turn led to the emergence of GenAI;
Friday: An interview with Eli Hooten, Director of Engineering at Sentry, about AI and coding

If you like Turing Post, consider becoming a paid subscriber. You’ll immediately get full access to all our articles, investigations, and tech series →

I really enjoyed seeing Mira Murati, OpenAI’s CTO, as the presenter for their Spring updates. This is a smart move considering Sam Altman’s almost constant occupation of all media to the point that you can't tell if he is real or a GPT hallucination. Mira Murati seems to be more matter-of-fact and down to earth. Today, she revealed the new flagship model, GPT-4o.

Smartly, the day before Google I/O, they didn't risk revealing GPT-5 or search capabilities; instead, they demonstrated the new GPT-4o with impressive real-time conversational speech capabilities. This could be part of a deal with Apple, which apparently wants to replace Siri with OpenAI technology.

Apple would certainly benefit from implementing this new model.

The demo was truly impressive: ChatGPT read a bedtime story in a normal voice, then more dramatically, then sang it in a surprisingly pleasant voice. It also perceived emotions, laughed, even flirted. It recognized video in real time and helped solve a math problem, guiding the user with hints and equations while “looking” at what he was writing down.

While a few newsletters burst out with over-the-top praise, exaggerating an already impressive demo, it's important to remember that all demos are staged. I was able to try the model out, and it doesn't deliver the same results, which is perfectly normal. GPT-4o's actual capabilities are valuable as they are, and it will only get better with updates.

GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot 🙂. Here’s how it’s been doing.
— William Fedus (@LiamFedus)
5:01 PM • May 13, 2024

It's just not at the movie level, it’s not HER, not at all.

During the presentation, both Murati and Brockman asked ChatGPT to sing multiple times. For them, it did. I tried really hard, but it doesn't sing. After many attempts, I asked if it could do a singing or robotic voice, and it said it couldn't. This part of the demo was unnecessarily misleading. No flirting, no singing, no sighs.

UPDATE:

hope you enjoyed!
the new voice mode will be live in the coming weeks for plus users.
we'll have more stuff to share soon :)
— Sam Altman (@sama)
5:29 PM • May 13, 2024

There is also no option to stream video to ChatGPT, or at least there wasn't one when I tried; apparently, they're rolling things out slowly.

Still – I want to see and try everything for myself :)

It is certainly nice to converse with AI and potentially use it hands-free. What truly amazed me was the real-time translation. I checked the English to Russian translation, and though the model has an accent when speaking Russian (it made me laugh), it translates very well and speaks with decent expression.

Another smart move from OpenAI is making GPT-4o free for everyone. OpenAI's initial pricing strategy of charging for GPT-4 while offering GPT-3.5 for free limited user adoption of the superior model. Making GPT-4o free allows users to experience the advancements firsthand, potentially driving conversions to paid subscriptions and providing OpenAI with more data.

That might really boost mass adoption. With this update, AI is truly introduced to our phones. Now, with the conversational option and great (as always) UX, it becomes just a regular app on our phones. Can you imagine being born with that AI capability at hand? Well, our kids are. And they will use it in ways we can’t yet imagine.

Twitter Library

Get Started with RAG: A Comprehensive Guide with Webinars & Workshops

Master Retrieval-Augmented Generation (RAG) for Your AI Projects

www.turingpost.com/p/rag-tutorials-webinars-workshops

News from The Usual Suspects ©

OpenAI, again

As if GPT-4o were not enough, OpenAI has introduced the Model Spec, a new initiative aimed at guiding AI behavior to ensure transparency and public engagement in shaping AI responses. This adaptive document will evolve with community feedback and research, balancing ethical considerations and practical uses. The Model Spec also integrates aspects of RLHF transparency and personalization.

Microsoft vs prompt engineers

If you were considering becoming a prompt engineer, maybe don’t. Microsoft’s latest Copilot AI update transforms everyday users into prompt engineers, helping them craft better prompts.

Anthropic vs prompt engineers

Anthropic has released Metaprompt, a tool designed to enhance performance in Claude-powered applications by converting brief task descriptions into optimized prompt templates. It utilizes a few-shot prompt method with variables like subject, length, and tone.
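
To make the idea of a prompt template with variables concrete, here is a minimal sketch in Python. It is not Anthropic's Metaprompt: the template wording, the `fill_template` helper, and the variable names are all hypothetical and only illustrate how values such as subject, length, and tone might be slotted into a reusable template.

```python
from string import Template

# Hypothetical template: the wording and variable names are illustrative only,
# not taken from Anthropic's tool.
PROMPT_TEMPLATE = Template(
    "Write a $tone piece about $subject in roughly $length words."
)


def fill_template(subject: str, length: int, tone: str) -> str:
    """Substitute the three variables into the template text."""
    return PROMPT_TEMPLATE.substitute(subject=subject, length=length, tone=tone)


prompt = fill_template("vector databases", 200, "friendly")
print(prompt)
```

The same template can then be reused across many inputs by changing only the variables.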

Databricks and FMOps

Databricks is eating more of the FMOps space. They just launched Vector Search, integrating it into their platform with governance tools. This feature optimizes embedding storage and retrieval, supporting real-time similarity searches using the HNSW algorithm and managing sync with Delta tables. Academics rule!
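
For readers new to similarity search, the sketch below shows the underlying idea in plain Python. It is an assumption-laden illustration, not Databricks' API: it does an exact, brute-force scan with cosine similarity, whereas HNSW is an approximate index that avoids comparing the query against every stored vector. The `index` contents and the `top_k` helper are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Exact search: score every stored vector, return the k closest ids."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

print(top_k([2.0, 0.0, 0.0], index))
```

A brute-force scan like this costs time proportional to the number of stored vectors per query, which is the cost an index such as HNSW is meant to reduce.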

IBM wants to keep up

It just unveiled eight open-source Granite code models, ranging from 3 to 34 billion parameters, trained on 116 programming languages. Available on Hugging Face in both base and instruct modes, these models excel in code generation, bug fixes, and documentation, even translating COBOL to modern languages. Despite their excellence, a limitation is their smaller context lengths (2k to 8k tokens), which may hinder detailed tasks in large codebases.

KAN and the buzzing ML community

Kolmogorov–Arnold Networks (KAN) made a splash last week, and the ML community has been discussing them nonstop. This blog explains why KANs are a potential alternative to multi-layer perceptrons (MLPs) and where they fit in the current machine learning landscape. This blog argues that KAN is just MLP.

We are watching

Google DeepMind CEO on Drug Discovery, Hype, Isomorphic | Bloomberg
Google CEO Sundar Pichai and the Future of AI | Bloomberg
A Conversation with Elon Musk | Milken Institute

In other newsletters

How Diffusion Models are Improving AI by Artificial Intelligence Made Simple
The Illustrated Word2vec by Jay Alammar
How Good Are the Latest Open LLMs? And Is DPO Better Than PPO? by Sebastian Raschka
ChatBotArena: The peoples’ LLM evaluation, the future of evaluation, the incentives of evaluation, and gpt2chatbot by interconnects.ai

The freshest research papers, categorized for your convenience

Our top-3

xLSTM: Extended Long Short-Term Memory

Researchers from ELLIS Unit, LIT AI Lab, Institute for Machine Learning, JKU Linz, Austria introduced the xLSTM, enhancing traditional LSTMs by integrating exponential gating and novel memory structures. xLSTM improves over standard LSTM and offers competitive performance to state-of-the-art Transformers in various language modeling tasks. Modifications include scalar and matrix memory models, which allow better parallelization and efficiency, particularly on longer sequences and larger datasets →read the paper

AlphaFold 3

Researchers from Google DeepMind have developed AlphaFold 3, a new model that significantly advances the prediction of complex biomolecular structures. This version introduces a diffusion-based architecture capable of accurately modeling interactions across proteins, nucleic acids, small molecules, and more. AlphaFold 3 outperforms specialized tools in predicting protein-ligand, protein-nucleic acid, and antibody-antigen interactions, demonstrating its effectiveness across a broad spectrum of biomolecular research within a unified deep learning framework →read the paper

DrEureka: Language Model Guided Sim-To-Real Transfer

Researchers from the University of Pennsylvania and NVIDIA have developed DrEureka, an algorithm that leverages LLMs to automate reward design and domain randomization for robot training. DrEureka enhances the sim-to-real transfer by automating tasks that were manually intensive, showing competitive results in tasks like quadruped locomotion and novel challenges such as balancing a quadruped on a yoga ball. This approach reduces human labor and could accelerate the development of robotic skills adaptable to dynamic real-world applications →read the paper

BONUS: International Conference on Learning Representations (ICLR) Awards: 5 Outstanding Paper winners

Generalization in Diffusion Models Arises from Geometry-Adaptive Harmonic Representations by Zahra Kadkhodaie, Florentin Guth, Eero P Simoncelli, Stéphane Mallat →read the paper
ICLR Presentation
Learning Interactive Real-World Simulators by Sherry Yang, Yilun Du, Seyed Kamyar Seyed Ghasemipour, Jonathan Tompson, Leslie Pack Kaelbling, Dale Schuurmans, Pieter Abbeel →read the paper
ICLR Presentation
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors by Ido Amos, Jonathan Berant, Ankit Gupta →read the paper
ICLR Presentation
Protein Discovery with Discrete Walk-Jump Sampling by Nathan C. Frey, Dan Berenberg, Karina Zadorozhny, Joseph Kleinhenz, Julien Lafrance-Vanasse, Isidro Hotzel, Yan Wu, Stephen Ra, Richard Bonneau, Kyunghyun Cho, Andreas Loukas, Vladimir Gligorijevic, Saeed Saremi →read the paper
ICLR Presentation
Vision Transformers Need Registers by Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski →read the paper
ICLR Presentation

Enhancements in Language Model Architecture and Efficiency

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model: Introduces an advanced Mixture-of-Experts language model that significantly reduces training costs and improves computational efficiency while maintaining high performance →read the paper
Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models: Explores the balance between model compression and performance, demonstrating that Tucker decomposition can effectively reduce model size with minimal accuracy loss →read the paper
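
As a rough illustration of why low-rank decomposition shrinks a model, the sketch below counts parameters for a single weight matrix. This is a simplifying assumption: it uses a plain two-factor matrix factorization rather than the Tucker decomposition the paper studies, and the layer size and rank are hypothetical numbers chosen only for the arithmetic.

```python
def dense_params(m: int, n: int) -> int:
    """Parameters in a full m x n weight matrix."""
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    """Parameters when the matrix is stored as (m x r) times (r x n)."""
    return r * (m + n)


# Hypothetical layer size and rank, chosen only to illustrate the arithmetic.
m, n, r = 4096, 4096, 256
full = dense_params(m, n)
compressed = low_rank_params(m, n, r)

print(full, compressed, full / compressed)
```

The saving grows as the rank r shrinks, which is where the trade-off against accuracy comes in.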

Security / RAG Survey / Innovative Strategies

Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent: Examines vulnerabilities in LLMs' ability to detect complex malicious queries, introducing a method that successfully evades security mechanisms →read the paper
A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models: Reviews the integration of Retrieval-Augmented Generation techniques with LLMs, addressing challenges such as hallucinations and enhancing model robustness and relevance →read the paper
Position: Leverage Foundational Models for Black-Box Optimization: Proposes the integration of LLMs with black-box optimization strategies, suggesting potential enhancements in diverse fields through improved strategy engineering →read the paper

If you decide to become a Premium subscriber, you can expense this subscription through your company. Please also send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.

Thank you for reading! We appreciate you. 🤍
