FOD#37: Trust, but Verify

Questions about the trustworthiness of LLMs and of the key AI actors, plus the best-curated list of research papers

Next Week in Turing Post:

  • Wednesday, Token 1.17: Deploying ML Models: Best practices feat. LLMs

  • Friday, Foreign AI Affairs: In-depth piece on South Korea's AI sector.

Turing Post is a reader-supported publication. To have full access to our most interesting articles and investigations, become a paid subscriber →

The recent surge in the sophistication of Large Language Models (LLMs) – both proprietary and open-source – presents a paradox of potential and perplexity. These systems, characterized by their remarkable natural language processing capabilities, have propelled us into a new era of technological marvels. Yet, they also bring many challenges, chiefly in the realm of trustworthiness.

Last week's paper, “TrustLLM: Trustworthiness in LLMs” – a joint work of almost 70 researchers – underscores the multifaceted nature of trustworthiness in LLMs. It highlights how these models, while excelling in tasks like stereotype rejection and natural language inference, still grapple with issues of truthfulness, safety, fairness, and privacy. These findings echo the complexities of ensuring AI that is both effective and ethically sound.

The paper also poses a question: “To what extent can we genuinely trust LLMs?”

Our answer: we can’t.

Much better would be to adopt the principle of 'trust, but verify.' This approach, reminiscent of Cold War-era diplomacy, is increasingly relevant in the digital age, especially with advancements in AI. It suggests a balanced strategy: embracing the utility and potential of these models while stringently scrutinizing their mechanisms and outcomes.

When working with LLMs, you can trust your own expertise to verify the work the LLM automates or accelerates for you, but you can't genuinely trust the model itself. I even think that, alongside the new role of AI engineer, we should have a new in-house position of AI Verifier, akin to a fact-checker at a media publication.
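To make the pattern concrete, here is a minimal "trust, but verify" sketch in Python. It assumes a hypothetical generate() call standing in for whatever LLM client you use, plus a couple of cheap, deterministic checks an in-house verifier might codify; it illustrates the workflow, not any particular vendor's API.

```python
# A minimal "trust, but verify" loop: treat the model's output as a draft and
# gate it behind explicit checks before accepting it. `generate()` and the
# example checks are hypothetical placeholders, not any vendor's actual API.
from typing import Callable, List

def generate(prompt: str) -> str:
    """Stand-in for a call to whatever LLM client you use."""
    raise NotImplementedError("plug in your model client here")

def verified_answer(prompt: str,
                    checks: List[Callable[[str], bool]],
                    max_tries: int = 3) -> str:
    """Return the first draft that passes every check, or escalate to a human."""
    for _ in range(max_tries):
        draft = generate(prompt)
        if all(check(draft) for check in checks):
            return draft
    raise ValueError("No draft passed verification; route to a human reviewer.")

# Example checks an 'AI Verifier' might codify: cheap, deterministic assertions.
checks = [
    lambda text: len(text.strip()) > 0,                          # non-empty answer
    lambda text: "as an ai language model" not in text.lower(),  # no boilerplate
]
```

Real verification is, of course, harder than this (fact-checking, citation checks, domain review), but the point stands: the "verify" step should be an explicit process, not a vibe.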

The other news from last week ‘complements’ the insights from the paper. Anthropic's research reveals a startling possibility: deceptive 'sleeper agents' hiding inside LLMs. The paper studies threat models in which a model is trained, deliberately or through emergent deception, to behave safely during training but unsafely in deployment, and it shows that such hidden, hazardous behaviors can evade standard safety training. That is a critical vulnerability.

Meanwhile, the nuanced shift in OpenAI's policy, discreetly lifting the prohibition on military applications, adds another layer to the debate. The move, which brings OpenAI closer to the U.S. Defense Department, prompts a critical examination of the ethical and safety implications of AI in high-stakes domains like defense and intelligence. Here, the trustworthiness of the people who build LLMs also comes into question.

On a more commercial and, so to speak, physical note, the launch of the Rabbit R1, a standalone AI device, exemplifies the rapid integration of AI in consumer technology. Its innovative use of a Large Action Model (LAM)* signals a shift towards more intuitive, seamless interactions between humans and AI-powered devices. However, it also raises concerns about the trustworthiness and security of such pervasive AI integration in everyday life.

*Many publications mistakenly attribute the coining of LAM to the Rabbit R1 team, when in fact it was Salesforce Chief Scientist Silvio Savarese who coined the term in June 2023 in his blog post “Towards Actionable Generative AI.” Trust, but verify ;)

Adding to the global perspective: in its "Global Risks Report 2024," the World Economic Forum ranked AI-generated misinformation and disinformation, along with the resulting societal polarization, among its top 10 risks for the next two years, ahead of concerns such as climate change, war, and economic instability.

As we navigate this era of groundbreaking AI advancements, the "trust, but verify" principle remains a beacon. We need to balance the excitement of AI's potential with rigorous, ongoing scrutiny of its trustworthiness, safety, and ethical implications.

Become an AI & ChatGPT genius for free (Holiday Sale) - Join a 3-hour ChatGPT & AI workshop (worth $99) at a 100% discount! 

Register Here: (FREE for the First 100 people) 

The freshest research papers, categorized for your convenience

Efficient Model Architectures

  • MoE-Mamba: Efficient Selective State Space Models with MoE. Researchers from the University of Warsaw developed MoE-Mamba, interleaving Mamba, a State Space Model (SSM), with a Mixture-of-Experts (MoE) layer. The model outperforms both Mamba and Transformer-MoE in efficiency and performance, matching Mamba's results with fewer training steps (a toy sketch of the idea follows this list) →read the paper

  • Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM. Researchers from the University of Cambridge and University College London introduced "Blending," a method combining smaller AI models to match or exceed the performance of larger models like ChatGPT →read the paper

  • Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon. Researchers from the Beijing Academy of AI and Gaoling School of AI developed Activation Beacon, a module enhancing LLMs' context window length →read the paper
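For readers who want to see the MoE-Mamba idea in code, here is a toy PyTorch sketch of the general pattern: a sequence-mixing block alternated with a sparse Mixture-of-Experts feed-forward layer. The sequence mixer below is a plain GRU stand-in rather than an actual Mamba block, and all class names and sizes are illustrative assumptions, not the paper's implementation.

```python
# Toy sketch of the MoE-Mamba pattern: alternate a sequence-mixing block
# (Mamba in the paper; a GRU placeholder here) with a sparse MoE feed-forward
# layer. Names and dimensions are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Top-1 switch-style MoE layer: each token is routed to one expert MLP."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.size(-1))             # flatten tokens for routing
        gate = F.softmax(self.router(tokens), dim=-1)  # (num_tokens, num_experts)
        weight, expert_idx = gate.max(dim=-1)          # top-1 routing decision
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = weight[mask, None] * expert(tokens[mask])
        return out.reshape_as(x)

class MoEMambaBlock(nn.Module):
    """One layer pair: sequence mixer (stand-in for Mamba) followed by MoE FFN."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.mixer = nn.GRU(d_model, d_model, batch_first=True)  # Mamba placeholder
        self.moe = MoEFeedForward(d_model, d_ff, num_experts)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.mixer(self.norm1(x))[0]  # residual around the sequence mixer
        x = x + self.moe(self.norm2(x))       # residual around the sparse MoE FFN
        return x

if __name__ == "__main__":
    block = MoEMambaBlock(d_model=64, d_ff=256, num_experts=4)
    print(block(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```

The appeal of the sparse-MoE half is that only one expert runs per token, so parameter count grows with the number of experts while per-token compute stays roughly flat.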

Benchmark and Evaluation

  • CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution. Researchers from MIT CSAIL and Meta AI developed CRUXEval, a benchmark comprising 800 Python functions for evaluating code models' reasoning and execution skills (a toy check in its spirit follows this list) →read the paper

  • Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers. Researchers from Google Research and Tel Aviv University introduced GRANOLA QA, an evaluation setting for open-domain question answering (QA) that considers multi-granularity answers →read the paper

  • TOFU: A Task of Fictitious Unlearning for LLMs. Researchers from Carnegie Mellon University introduced TOFU, a benchmark for evaluating unlearning in LLMs using synthetic author profiles →read the paper
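To give a flavor of what an output-prediction check of the CRUXEval kind involves, here is a hedged toy harness: execute the function to get the ground-truth output, then compare it with what a code model predicts. predict_output() is a hypothetical stand-in for your model client, and the example item is invented, not drawn from the benchmark.

```python
# Toy CRUXEval-style "output prediction" check: run the function to get the
# true output, then grade the model's predicted literal against it.
# `predict_output()` is a hypothetical stand-in for your code-model client.
import ast

def predict_output(function_src: str, call: str) -> str:
    """Prompt a code LLM to predict the value of `call`; returns a Python literal."""
    raise NotImplementedError("plug in your model client here")

def check_example(function_src: str, call: str) -> bool:
    """True if the model's predicted value matches the real execution result."""
    namespace: dict = {}
    exec(function_src, namespace)           # define the function under test
    ground_truth = eval(call, namespace)    # run it to get the true output
    prediction = predict_output(function_src, call)
    try:
        return ast.literal_eval(prediction) == ground_truth
    except (ValueError, SyntaxError):       # model answer wasn't a clean literal
        return False

# Invented example item in the spirit of the benchmark:
# check_example("def f(xs):\n    return sorted(set(xs))", "f([3, 1, 3, 2])")
```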

Attention Mechanisms and Model Efficiency

  • Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in LLMs. Researchers from OpenNLPLab developed Lightning Attention-2, an advanced linear attention mechanism for LLMs that efficiently handles unlimited sequence lengths without increased memory usage or decreased speed (a plain linear-attention sketch follows this list) →read the paper

  • Transformers are Multi-State RNNs. Researchers from The Hebrew University of Jerusalem and FAIR AI at Meta redefined decoder-only transformers as a variant of Recurrent Neural Networks (RNNs) called infinite Multi-State RNNs (MSRNNs) →read the paper

  • Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models. Researchers from Google Research and Tel Aviv University introduced "Patchscopes," a new framework for analyzing hidden representations in LLMs →read the paper
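For context on the Lightning Attention-2 item above, here is a plain causal linear-attention sketch in PyTorch: the recurrence that keeps per-step memory independent of sequence length. It is not the paper's tiled, IO-aware algorithm, just the underlying mechanism, with the elu(x)+1 feature map as an assumed (but common) choice.

```python
# Plain causal linear attention: a running (key * value) state replaces the
# full attention matrix, so memory per step does not grow with sequence length.
# This is the general family Lightning Attention-2 builds on, not its algorithm.
import torch
import torch.nn.functional as F

def causal_linear_attention(q, k, v):
    # q, k, v: (batch, seq, dim); elu(x) + 1 keeps the feature map positive.
    q = F.elu(q) + 1
    k = F.elu(k) + 1
    batch, seq, dim = q.shape
    kv_state = q.new_zeros(batch, dim, v.size(-1))  # running sum of k_t (outer) v_t
    k_state = q.new_zeros(batch, dim)               # running sum of k_t
    out = []
    for t in range(seq):
        kv_state = kv_state + k[:, t].unsqueeze(-1) * v[:, t].unsqueeze(1)
        k_state = k_state + k[:, t]
        num = torch.einsum("bd,bde->be", q[:, t], kv_state)
        den = torch.einsum("bd,bd->b", q[:, t], k_state).clamp(min=1e-6)
        out.append(num / den.unsqueeze(-1))
    return torch.stack(out, dim=1)

if __name__ == "__main__":
    q, k, v = (torch.randn(2, 8, 16) for _ in range(3))
    print(causal_linear_attention(q, k, v).shape)  # torch.Size([2, 8, 16])
```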

Enhancing Model Performance

  • Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding. Researchers from the University of California, San Diego, and Google introduced "Chain-of-Table," a framework that enhances table-based reasoning in LLMs for tasks like table-based question answering and fact verification →read the paper

  • Secrets of RLHF in LLMs Part II: Reward Modeling. Researchers from Fudan NLP Lab & Fudan Vision and Learning Lab investigated Reinforcement Learning from Human Feedback (RLHF) in LLMs, focusing on improving the reward models used for alignment →read the paper

  • DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models. Researchers from DeepSeek-AI introduced DeepSeekMoE, an innovative Mixture-of-Experts (MoE) architecture for LLMs →read the paper

Machine Translation and Cross-Lingual Applications

  • Tuning LLMs with Contrastive Alignment Instructions for Machine Translation (MT) in Unseen, Low-resource Languages. Researchers from Apple introduced "contrastive alignment instructions" (AlignInstruct) to enhance MT in LLMs for unseen, low-resource languages →read the paper

Efficient Model Inference

  • Efficient LLM Inference Solution on Intel GPU. Researchers from Intel developed an efficient inference solution for LLMs on Intel GPUs, focused on reducing latency and increasing throughput →read the paper

In other newsletters

  1. Is this what will replace Transformers? Long-Context Retrieval Models with Monarch Mixer · Hazy Research (stanford.edu)

  2. If you are interested in reportage from CES, Hardcore Software is the one to go to (a very long read!)

  3. A wonderful overview of Self-Attention, Multi-Head Attention, Cross-Attention, and Causal-Attention in LLMs by Sebastian Raschka

  4. 12 techniques to reduce your LLM API bill and launch blazingly fast products by AI Tidbits

  5. DPO praise by Andrew Ng – a very interesting read

We are watching

Part of “trust, but verify” is education. We are watching this video and thinking about how we should prepare our kids for a world of AI agents and large models with unimaginable capabilities. The animation is relatively good, but I can’t imagine a kid watching it and understanding all the words; it’s more for adults desperately trying to make sense of AI. The question remains: how do we teach our kids about AI so that they trust it but can also verify it?

If you decide to become a Premium subscriber, remember that in most cases you can expense this subscription through your company! Join our community of forward-thinking professionals. Please also share this newsletter with your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. 🤍 Thank you for reading!

How was today's FOD?

Please give us some constructive feedback
