
FOD#35: The last week of 2023 in Research

Gems! Let’s keep the momentum going by shedding light on these fascinating studies

Next Week in Turing Post:

  • Wednesday, Token 1.15: We start the year with Hallucinations! Exploring hallucinations in foundation models: causes, detection, mitigation, and why hallucinations, though generally problematic, can sometimes be beneficial.

  • Friday, Unicorn Chronicles: We uncover Mistral AI’s story – a French AI startup that stunned everyone last year with several significant achievements.

Only 1 week left to upgrade with 40% OFF

It’s tough to be published in the last week of the year – nobody seems to care, as everyone is focused on their own achievements, predictions for the coming year, or year-in-review reflections. Yet this final week of 2023 was also rich in research papers that deserve attention and might spark an idea for a new project, research direction, or company in 2024.

So today we concentrate on the papers that came last in 2023 but are certainly not the least. Let’s keep the momentum going by shedding light on these fascinating studies.

We are very grateful that you stayed with us this year, exploring together the history of ML and the exciting new developments in AI. Your engagement makes our journey at Turing Post incredibly rewarding. We hope you get enough rest to make 2024 the year your brightest ideas come to fruition. We promise to provide insights to help you with that. Happy 2024!

Research papers, categorized for your convenience

Model Enhancement and Instruction Tuning

  • WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation

    • Researchers from Microsoft introduced WaveCoder, a model that enhances instruction tuning for Code Language Models (Code LLMs). It's quite impressive how they developed a novel LLM-based Generator-Discriminator framework to produce high-quality, diverse instruction data from open-source code, addressing data duplication and quality control issues in instruction data generation. Their dataset, CodeOcean, comprises 20,000 instances across four code-related tasks and significantly improves model generalization. Experiments showed that WaveCoder outperforms other models in generalization across different code-related tasks while maintaining high efficiency in code generation →read the paper

  • SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

    • Researchers from Upstage AI, South Korea, developed a novel technique called Depth Up-Scaling (DUS) to efficiently and effectively upscale base LLMs. DUS simplifies the scaling process without needing complex changes for training and inference. It’s quite striking that, using DUS, they created SOLAR 10.7B, an LLM with 10.7 billion parameters that outperforms existing models like Llama 2 and Mistral 7B on various NLP tasks. They also introduced SOLAR 10.7B-Instruct, fine-tuned for following complex instructions, surpassing Mixtral-8x7B (a toy sketch of the layer-stacking step follows this list) →read the paper

  • Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4

    • If you haven’t yet figured out what exactly it is that you need, researchers from Mohamed bin Zayed University of AI offer 26 principles for enhancing how you query and prompt LLMs. These principles aim to improve the quality of LLM responses by guiding how models understand and react to prompts. Tested on LLaMA-1/2 and GPT-3.5/4 models, the principles showed significant improvements in response quality and accuracy, especially with larger models →read the paper
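Since SOLAR’s Depth Up-Scaling is easy to picture, here is a toy sketch of the layer-stacking step as we understand it from the paper: duplicate the base model’s transformer layers, trim the overlap at the seam, and stack the rest before continued pretraining. The modules and layer counts below are illustrative placeholders, not Upstage’s actual code.

```python
# Toy sketch of Depth Up-Scaling (DUS): duplicate a base model's transformer
# layers, drop m layers from each copy at the seam, and stack the remainder
# into a deeper model that is then continually pretrained. Illustrative only.
import copy
import torch.nn as nn

def depth_up_scale(base_layers: nn.ModuleList, m: int) -> nn.ModuleList:
    """Stack two copies of the base layers, removing m layers from each copy at the seam."""
    n = len(base_layers)
    first_copy = [copy.deepcopy(base_layers[i]) for i in range(n - m)]   # layers 1 .. n-m
    second_copy = [copy.deepcopy(base_layers[i]) for i in range(m, n)]   # layers m+1 .. n
    return nn.ModuleList(first_copy + second_copy)                       # 2*(n - m) layers total

# Example with stand-in layers: a 32-layer base with m=8 becomes 48 layers,
# roughly how a ~7B-parameter backbone grows toward ~10.7B before further pretraining.
base = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True) for _ in range(32)]
)
scaled = depth_up_scale(base, m=8)
print(len(scaled))  # 48
```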

Alignment and Content Moderation

  • Reasons to Reject?

    • Exciting research from Tencent AI Lab & The Chinese University of Hong Kong on a new method to align LLMs using language feedback, specifically judgments, instead of rewards or demonstrations. They introduced Contrastive Unlikelihood Training (CUT) for this purpose, contrasting generation probabilities under different contexts to detect and correct inappropriate content in LLMs. Their results revealed that CUT, applied to LLaMA2-13b with just 1,317 judgment examples, significantly outperformed the 175B DaVinci003 model and improved alignment in both cold-start and warm-start scenarios, as well as in generalist and specialist applications →read the paper

  • The LLM Surgeon

    • Researchers from Imperial College London, Qualcomm AI Research, and the University of Amsterdam developed LLM Surgeon, the first method to successfully perform structured pruning on LLMs, offering a trade-off between compute and accuracy during compression. Their approach scales Kronecker-factored curvature approximations* and dynamically allocates which structures to remove, alongside updates of the remaining weights. It allows for up to 30% pruning of models like OPT and Llamav2-7B with minimal performance loss and provides a single framework for unstructured, semi-structured, and structured pruning, improving weight updates by considering more correlations between weights while remaining computationally efficient →read the paper

*Kronecker-factored curvature approximation is a technique in ML that approximates the curvature of the loss function (e.g., the Fisher information matrix or Hessian) as a Kronecker product of two much smaller matrices, making second-order information cheap enough to compute and use during training and compression.
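To make the footnote concrete, here is a minimal, illustrative sketch of a Kronecker-factored curvature approximation for a single linear layer (not the LLM Surgeon implementation): the curvature block for the layer’s weights is approximated by the Kronecker product of an input-activation covariance and an output-gradient covariance, so only two small matrices need to be stored.

```python
# Minimal, illustrative sketch of a Kronecker-factored curvature approximation
# for one linear layer y = W a, with W of shape [d_out, d_in].
# Not the LLM Surgeon code; shapes and data are placeholders.
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out = 256, 64, 32

a = rng.standard_normal((batch, d_in))    # layer inputs (activations)
g = rng.standard_normal((batch, d_out))   # gradients w.r.t. the layer outputs

# Kronecker factors: input covariance A and output-gradient covariance G
A = a.T @ a / batch                       # [d_in, d_in]
G = g.T @ g / batch                       # [d_out, d_out]

# The exact curvature block over vec(W) has (d_in * d_out)^2 entries;
# the Kronecker-factored approximation stores only A and G and treats
# the block as kron(A, G) (up to the vectorization convention).
F_kfac = np.kron(A, G)                    # materialized here only for illustration

print(F_kfac.shape)                       # (2048, 2048), represented by a 64x64 and a 32x32 matrix
```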

Generative AI and System Implications

  • Generative AI Beyond LLMs: System Implications of Multi-Modal Generation

    • Researchers from FAIR at Meta and Harvard University conducted a study on the system implications of multi-modal Generative AI models, particularly text-to-image (TTI) and text-to-video (TTV) generation. They found that diffusion-based TTI models, after optimization, still had Convolution as a major execution time component, whereas Transformer-based models were more affected by Linear layers. These models showed variable sequence lengths during inference and unique challenges in processing spatial and temporal information. Optimizations for LLMs did not directly apply to TTI/TTV models, indicating a need for new approaches. The study also highlighted the importance of considering the temporal dimension in TTV models, as Temporal Attention presented unique system bottlenecks →read the paper

One of the topics that has captured our attention, and that we want to cover more in 2024, is foundation models for time series forecasting! Stay tuned.

  • TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

    • Researchers introduced TinyGPT-V, a multimodal large language model (MLLM) optimized for high performance with minimal computational resources. Unlike existing MLLMs such as LLaVA and MiniGPT-4, TinyGPT-V requires only a 24GB GPU for training and an 8GB GPU or CPU for inference, making it more accessible and efficient. It combines a language backbone with pre-trained vision modules from BLIP-2 or CLIP, totaling 2.8B parameters (a minimal wiring sketch follows at the end of this section). TinyGPT-V's unique quantization process allows for local deployment and inference on various 8GB devices. This one is interesting as it offers an affordable, efficient, and powerful solution for applying MLLMs in practical settings →read the paper

  • Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models

    • Researchers from Stanford and Meta AI explored the commonsense reasoning abilities of the Gemini MLLM across 12 datasets. Their study compared Gemini with four LLMs and another MLLM. Results showed that Gemini's performance was comparable to GPT-3.5 Turbo and slightly lagged behind GPT-4 Turbo. Gemini struggled with temporal and social reasoning tasks and with understanding emotions in images. Further development of commonsense reasoning in LLMs and MLLMs is underway →read the paper
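For readers curious how compact multimodal models like TinyGPT-V are typically wired, here is a minimal, hypothetical sketch of the general pattern (a frozen pre-trained vision encoder feeding a small language backbone through a trainable projection). Module names, dimensions, and the training setup are placeholders, not the paper’s implementation.

```python
# Hypothetical sketch of a common MLLM wiring pattern: frozen vision encoder
# -> trainable projection -> small language backbone. Placeholders only,
# not TinyGPT-V's actual code.
import torch
import torch.nn as nn

class TinyMultimodalLM(nn.Module):
    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vision_dim: int, text_dim: int):
        super().__init__()
        self.vision_encoder = vision_encoder
        for p in self.vision_encoder.parameters():          # keep the vision tower frozen
            p.requires_grad = False
        self.projection = nn.Linear(vision_dim, text_dim)   # the lightweight trainable bridge
        self.language_model = language_model

    def forward(self, image: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            visual_tokens = self.vision_encoder(image)           # [B, num_patches, vision_dim]
        visual_embeds = self.projection(visual_tokens)           # map into the LLM's embedding space
        inputs = torch.cat([visual_embeds, text_embeds], dim=1)  # prepend image tokens to the prompt
        return self.language_model(inputs)                       # the language backbone does the reasoning
```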

Specialized Applications

  • Generative AI for Math: Part I – MATHPILE: A Billion-Token-Scale Pretraining Corpus for Math

    • Researchers introduced MATHPILE, a math-focused pretraining corpus of about 9.5 billion tokens. The corpus, prioritizing data quality over quantity, includes textbooks, lecture notes, scientific papers, and web content related to mathematics. It underwent extensive preprocessing, cleaning, filtering, and deduplication to ensure high quality (a toy deduplication snippet follows at the end of this section). MATHPILE aims to enhance mathematical reasoning in language models and will be open-sourced to benefit future research in this domain →read the paper

  • LARP: Language-Agent Role Play for Open-World Games

    • Researchers presented a framework for enhancing interactions in open-world games using language agents. LARP integrates a cognitive architecture for memory processing and decision-making, an environment interaction module for dynamic action space learning, and a postprocessing method for personality alignment. It aims to provide a more immersive gaming experience by enabling agents to adapt to complex environments, maintain coherent long-term memory, and exhibit unique backgrounds and personalities →read the paper
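The corpus-cleaning steps mentioned in the MATHPILE summary are easy to illustrate. Below is a generic, hedged sketch of exact deduplication by document hashing, one common ingredient of such pipelines; it is our own illustration, not MATHPILE’s actual pipeline, which also involves filtering and near-duplicate detection.

```python
# Generic illustration of one common corpus-cleaning step: exact deduplication
# by hashing normalized documents. Not MATHPILE's actual pipeline.
import hashlib

def dedup_exact(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each document, comparing normalized text."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in documents:
        normalized = " ".join(doc.lower().split())            # collapse whitespace, lowercase
        digest = hashlib.sha256(normalized.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

corpus = ["Let x be an integer.", "let  x be an Integer.", "Define f(x) = x^2."]
print(dedup_exact(corpus))  # the second document is dropped as a duplicate after normalization
```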

Turing Post is a reader-supported publication. To receive new articles, access the archive, and support our work, become a free or paid subscriber →

We are watching

If you're into real AI retro this holiday season, watch 'Metropolis.' Made in 1927 by Fritz Lang, it's a seminal sci-fi film that, while not explicitly about AI, prefigures its themes. The film depicts a futuristic city with advanced technology, including a lifelike robot, reflecting early visions of artificial intelligence and its potential impact on society, class, and human identity. Its portrayal of technology as both a marvel and a threat resonates deeply in today's AI-driven world. It’s free on YouTube →

Twitter Library

Thank you for reading; please feel free to share this with your friends and colleagues. In the next couple of weeks, we are announcing our referral program 🤍

Another week with fascinating innovations! We call this overview “Froth on the Daydream” – or simply, FOD. It’s a reference to the surrealistic and experimental novel by Boris Vian – after all, AI is experimental and feels quite surrealistic, and a lot of writing on this topic is just a froth on the daydream.

How was today's FOD?

Please give us some constructive feedback
