FOD#43: How do you Prompt a Black Box?

We explore the updated black box problem and offer the best curated list of the freshest ML news and papers

Next Week in Turing Post:

  • Wednesday, Token 1.23: Detecting and Mitigating Bias

  • Friday, AI Infra Unicorns: CoreWeave

If you like Turing Post, please consider supporting us. You will also get full access to our most interesting articles and investigations.

In today's world, for many people, conversing with AI has become as routine as discussing one's coffee preferences with a barista (or it soon will be!). Yet, here lies an irony: the more we interact with AI, the more elusive our understanding of these conversations becomes. This irony essentially represents a modern twist on the "black box" dilemma, which has perplexed the ML community for years.

The "black box" problem refers to the opaque decision-making processes of ML models, including large language models (LLMs), where the rationale behind any given response is shrouded in complexity. Despite advances in technology, the inner workings of these models, governed by billions or soon trillions of parameters, remain largely inscrutable. Their decision-making is a puzzle, complicated by nonlinear interactions that defy straightforward interpretation.

Prompting – the buzzword of 2023 – doesn't make it any better: we now have obscured layers of communication activated with every prompt. What we see – the prompt we type – is merely the surface. Beneath lies a hidden dialogue, an augmented system prompt, which is a complex, coded conversation the model conducts with itself, beyond our understanding. And who knows what a model whispers to itself?

So, if you were confused about prompting amid the avalanche of articles, blogs, and tutorials about it – you should be. As Ethan Mollick’s research reveals, contrary to intuition, the most effective prompts involve imaginative scenarios, such as pretending to navigate a Star Trek episode or a political thriller, demonstrating that traditional logical or direct prompts may not always yield the best responses from AI.

But his research also shows that these results are not consistent and might change with a new version of the model. He points to the futility of seeking a universal "magic phrase" for AI interaction, the effectiveness of specific prompting techniques like adding context, few-shot learning, and Chain of Thought, and the significant impact that prompts can have on AI performance.
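To make the three techniques named above concrete, here is a minimal sketch of how they show up in the text of a prompt. The `build_prompt` helper, its parameters, and the Q/A layout are illustrative assumptions rather than anything taken from Mollick's research or from a particular model's API. The phrase "Let's think step by step" is one commonly used Chain of Thought cue, and no model is called here.

```python
def build_prompt(question, context=None, examples=None, chain_of_thought=False):
    """Assemble a prompt string from optional parts (hypothetical helper)."""
    parts = []
    # Adding context: tell the model who it is or what situation it is in.
    if context:
        parts.append(f"Context: {context}")
    # Few-shot learning: show worked question/answer pairs before the real question.
    for example_question, example_answer in examples or []:
        parts.append(f"Q: {example_question}\nA: {example_answer}")
    # Chain of Thought: cue the model to reason before it answers.
    answer_cue = "A: Let's think step by step." if chain_of_thought else "A:"
    parts.append(f"Q: {question}\n{answer_cue}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    examples = [("What is 2 + 3?", "5"), ("What is 7 - 4?", "3")]
    print(build_prompt(
        "What is 6 + 9?",
        context="You are a careful arithmetic tutor.",
        examples=examples,
        chain_of_thought=True,
    ))
```

Calling `build_prompt("What is 6 + 9?")` with no extras gives the plain, direct prompt, so the same helper can be used to compare a straightforward prompt against its more elaborate variants on the same model.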

But for me – and I’ve been using AI a lot – many times the most straightforward prompts, or "magic words," can be surprisingly effective.

How to explain it? A few years back, Explainable AI (XAI) was heralded as a solution to the "black box" issue, with entities like DARPA leading the charge (they created the XAI toolkit, which has not been updated since 2021). However, the buzz around XAI seems to have dimmed, overtaken by a broader focus on Responsible AI. Is Responsible AI the solution? Let me know if you want to share your insights and write a guest post about it.

So, that’s what we end up with:

How do machines make decisions? – We don’t know!

How to talk (prompt) to them? – We don’t know either!

But, please, keep shipping us new, larger (though we will also take smaller) models! Why? – We don’t know! But we can’t stop.

News from The Usual Suspects ©

Elon Musk vs OpenAI

  • The narrative that we discussed in the editorial gains another layer with Elon Musk’s lawsuit against OpenAI over a breach of contract (it’s not open anymore, which makes Elon unhappy). This legal battle could potentially unveil some of OpenAI's internal operations, offering a rare glimpse into the workings of advanced AI models.

Anthropic and three Claudes

  • Meet Opus, Sonnet, and the upcoming Haiku, the new Claude models, which excel in deep processing, efficiency, and speed, respectively. Opus surpasses GPT-4's performance in benchmarks, supports text and images, and is priced at $15 per million input tokens. Anthropic promises enterprise-grade security, including SOC 2 and HIPAA compliance, with AWS/GCP compatibility, and features like a 200K context window, multimodality, low hallucination rates, and high accuracy on long documents. The models cover undergraduate to graduate-level knowledge and basic mathematics. Sonnet is free and approximately twice as fast as GPT-4.

Groq – a new player with big plans

  • Groq, which recently became famous for its Language Processing Unit (LPU) that makes inference much faster (read our explanation of what it is here), acquired Definitive Intelligence to enhance its AI solutions and cloud platform. The move advances its goal of providing high-speed inference for generative AI and establishing a foothold in the competitive custom AI chip market.

Lightricks – the oldest GenAI Unicorn

  • Lightricks (read their profile here), known for apps like Facetune, announced LTX Studio, an AI-powered filmmaking tool. It aids creators from ideation to generating AI-powered clips and understanding storylines. It's web-based, free initially, and invites waitlist sign-ups. The tool crafts scripts, storyboards, and characters, allowing scene customization and character editing.

Google – from Gemini to Genie

  • Google first surprised the internet with the extra-woke Gemini, then melted everybody's hearts with Genie. Google DeepMind's Genie is a groundbreaking generative model capable of creating playable 2D video games from text, sketches, or photos. Uniquely, it learns detailed controls from unlabeled videos, understanding actions and their variations within environments. Although in early development, Genie's potential spans simulations, gaming, and robotics, marking a new frontier in generative AI.

  • Stack Overflow and Google Cloud have partnered to deliver new AI-powered features to developers through the Stack Overflow platform, Google Cloud Console, and Gemini for Google Cloud. It's a good move for Google to get access to trusted, accurate knowledge and code from the Stack Overflow community.

Patterns of giants

  • Microsoft, known for its substantial investment in OpenAI, has expanded its AI ecosystem by investing €15 million ($16.3 million) in Paris-based Mistral AI and forming partnerships with AI startups Cohere and Mistral, integrating their models into Azure's offerings.

  • Following suit, Alibaba has recently made strategic investments in several Chinese generative AI startups, including Moonshot AI, Baichuan AI, Zhipu AI, and 01.AI (founded by Kai-Fu Lee). These moves aim to diversify Alibaba's stakes in China's AI sector and foster early ties with emerging leaders in the field. In addition to these investments, Alibaba Cloud has launched Model Studio to aid AI development and announced significant price reductions to boost AI innovation in China.

To OpenAI, to close the circle

  • So many things are happening at OpenAI: the Sora demo receives mixed feedback for its impressive visuals but questionable physics and biology; OpenAI researcher Andrej Karpathy, who has never been involved in any scandal and has the highest authority among researchers, leaves the company; ChatGPT experiences a significant issue that causes it to malfunction for several hours; Microsoft invests in others; Claude Opus beats GPT-4 across the benchmarks; and Elon Musk wants justice.

The freshest research papers, categorized for your convenience

Special category: Definitely Worth Reading:

  • The Era of 1-bit LLMs: Discusses the development and advantages of 1-bit LLMs, promising significant cost reductions and efficiency improvements. Read the paper

  • Beyond Language Models: Introduces bGPT, a model that simulates the digital world beyond traditional modalities, predicting and diagnosing algorithms or hardware behavior. Read the paper

Language Models in Specialized Domains

  • ChatMusician: Integrates music understanding and generation capabilities into LLMs, demonstrating LLMs' potential in music composition. Read the paper

  • StructLM: Aims to bridge LLMs' gap in interpreting structured data, enhancing their ability to ground knowledge in tables, graphs, and databases. Read the paper

  • StarCoder2 and The Stack v2: Focuses on responsibly creating Code LLMs, contributing to advancements in coding benchmarks and emphasizing model openness. Read the paper

  • Video as the New Language for Real-World Decision Making: Discusses video generation's potential as a unified interface for diverse tasks, outlining challenges and future directions. Read the paper

Enhancing and Merging Language Model Capabilities

  • FUSECHAT: Proposes a method to fuse knowledge from multiple chat models, improving chat model performance through a novel merging technique. Read the paper

  • Nemotron-4 15B Technical Report: Details a multilingual language model that showcases superior performance in coding tasks and multilingual capabilities. Read the paper

  • Do Large Language Models Latently Perform Multi-Hop Reasoning?: Explores latent multi-hop reasoning in LLMs, revealing their inherent capabilities and limitations in complex reasoning tasks. Read the paper

Scaling and Efficiency in Model Training

  • MegaScale: Discusses a system for training LLMs on over 10,000 GPUs, tackling efficiency and stability challenges in large-scale model training. Read the paper

  • Towards Optimal Learning of Language Models: Proposes a theory for optimizing LLM learning, aiming for reduced training steps and improved performance. Read the paper

  • Griffin: Introduces a model combining gated linear recurrences with local attention, offering an efficient alternative for language processing tasks. Read the paper

  • When scaling meets LLM finetuning: Investigates the effects of scaling on fine-tuning LLMs, providing insights into data, model, and method impacts on bilingual tasks. Read the paper

Improving Robustness and Diversity in AI

  • Rainbow Teaming: Generates diverse adversarial prompts to enhance LLM robustness, employing an open-ended search method for prompt discovery. Read the paper

  • Priority Sampling of Large Language Models for Compilers: Proposes a deterministic sampling technique for code generation, improving sample diversity and model performance in compiler optimization. Read the paper

Generative Models and Interactive Environments

  • Genie: Trains a generative model to create interactive virtual worlds from various inputs, advancing generative AI and simulation capabilities. Read the paper

In other newsletters

  • We don’t miss any papers, creating a weekly roundup of the freshest papers for you. However, if you feel you've missed a few weeks and need to catch up, Sebastian Raschka’s monthly summary is the best way to do it.

  • One can only admire how Gary Marcus turns every news piece into one about *him*.

  • An Interview with Nat Friedman and Daniel Gross Reasoning About AI in Stratechery by Ben Thompson.

We are watching

No AI was involved! Please enjoy this collaboration between two dear friends who met during the conference initiated by TrackTwo: an institute for citizen diplomacy (another passion of mine).

If you decide to become a Premium subscriber, remember that in most cases you can expense this subscription through your company! Join our community of forward-thinking professionals. Please also send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. 🤍 Thank you for reading
