15+ papers to understand foundation models

Familiarize yourself with transformers, diffusion-based models, and generative AI in general

Before moving to the list, we recommend reading our explainer of transformer- and diffusion-based foundation model architectures. Also, if you want to understand where large language models (LLMs) come from, check out our historical series. Now, to the resources!

Transformers and large language models

  1. Efficient Transformers: A Survey

    This paper traces the evolution of efficiency-focused Transformer models in natural language processing and other domains. It aims to guide researchers through the plethora of Transformer variants (X-formers) with a single comprehensive, cross-domain overview. → Read here

  2. A Survey of Transformers

    The survey provides a comprehensive review of Transformer variants (X-formers) in AI fields. It introduces the vanilla Transformer and a new taxonomy of X-formers, covering architectural modifications, pre-training, and applications. → Read here

  3. Vision Language Transformers: A Survey

    This paper discusses the adaptation of transformers to vision-language modeling, their performance, transfer-learning approaches, and implications for vision-language tasks. → Read here

  4. Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work

    This paper offers a comprehensive overview of Vision Transformers (ViTs) in computer vision, comparing them to CNNs. A GitHub repository compiles related papers. → Read here

  5. A Survey on Visual Transformer

    This review categorizes and analyzes vision transformer models across various tasks in computer vision. It covers applications such as backbone networks, high/mid-level vision, and video processing. → Read here

  6. Transformers in Vision: A Survey

    The survey provides a comprehensive overview of Transformer models in computer vision. It covers applications like image classification, object detection, multi-modal tasks, and more. → Read here

  7. Transformers in Time Series: A Survey

    This paper systematically reviews Transformer applications in time series modeling. It discusses network structure adaptations, applications in forecasting, anomaly detection, and classification. → Read here

  8. A Survey of Large Language Models

    This survey reviews the evolution and impact of Large Language Models (LLMs) in language understanding and generation. It discusses the scaling effect and the emergent abilities of enlarged models, focusing on pre-training, adaptation tuning, and utilization. → Read here

  9. Efficient Large Language Models: A Survey

    This survey reviews efficient Large Language Models (LLMs), essential for tasks like language understanding, generation, and reasoning. The literature is organized into model-centric, data-centric, and framework-centric perspectives. A GitHub repository compiles and maintains the surveyed papers, serving as a resource for understanding efficient LLM developments. → Read here

  10. Formal Algorithms for Transformers

    This document provides a mathematically precise overview of transformer architectures and algorithms, focusing on their design and training rather than on specific results. It is intended for readers familiar with basic ML concepts and neural network architectures. The paper covers the fundamentals of transformers and their key components, and previews some prominent models; the formula below gives a taste of this formal style. → Read here
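
    To give a flavor of the kind of formal statement the paper works with, here is the standard scaled dot-product attention at the core of every transformer. The formula is the familiar one from the original Transformer paper, reproduced as an illustration rather than quoted from this survey; $Q$, $K$, $V$ denote the query, key, and value matrices, and $d_k$ the key dimension:

    $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$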

  11. Multimodal Learning with Transformers: A Survey

    This survey presents a comprehensive overview of Transformer techniques in multimodal learning. It covers theoretical aspects, applications, challenges, and future research directions in multimodal Transformer models. → Read here

Diffusion Models

  1. Diffusion Models: A Comprehensive Survey of Methods and Applications

    This overview situates diffusion models within the family of deep generative models, categorizing research into efficient sampling, improved likelihood estimation, and handling data with special structures. It reviews applications across various fields and suggests areas for future exploration. A GitHub repository compiles and maintains the surveyed papers. → Read here

  2. Diffusion Models in Vision: A Survey

    This paper presents a comprehensive survey of denoising diffusion models in computer vision. It covers the two-stage process behind these models: a forward diffusion stage that gradually adds Gaussian noise to the input data, and a reverse diffusion stage that learns to recover the original data (sketched in code below). Despite their computational demands, diffusion models are praised for the quality and diversity of the samples they generate. The paper reviews three diffusion modeling frameworks: denoising diffusion probabilistic models, noise-conditioned score networks, and stochastic differential equations. It compares diffusion models with other deep generative models, categorizes diffusion models used in computer vision, and discusses current limitations and future research directions. → Read here
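
    To make the forward stage concrete, here is a minimal NumPy sketch of DDPM-style noising under a linear variance schedule. The step count, schedule endpoints, and the helper name `forward_diffuse` are illustrative assumptions, not code from the paper:

    ```python
    import numpy as np

    T = 1000                              # number of diffusion steps (assumed)
    betas = np.linspace(1e-4, 0.02, T)    # linear variance schedule (assumed values)
    alpha_bars = np.cumprod(1.0 - betas)  # cumulative signal-retention factors

    def forward_diffuse(x0, t, rng=np.random.default_rng(0)):
        """Sample x_t ~ q(x_t | x_0) in closed form:
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I).
        """
        eps = rng.standard_normal(x0.shape)
        xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
        return xt, eps  # the reverse stage trains a network to predict eps from xt

    x0 = np.ones((8, 8))                  # toy "image"
    xt, eps = forward_diffuse(x0, t=500)  # sample halfway through the schedule
    ```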

  3. A Survey on Generative Diffusion Model

    This survey explores advancements in diffusion models, a subset of deep generative models known for high-quality data generation. The paper highlights challenges such as the iterative generation process and the models' restriction to high-dimensional Euclidean spaces. It presents advanced techniques for enhancing diffusion models, including faster sampling and new diffusion processes, along with strategies for applying them on manifolds and in discrete spaces. The paper also discusses maximum likelihood training and methods for bridging arbitrary distributions. It reviews applications in various fields and concludes with a summary of the field's limitations and future directions. A GitHub repository compiles related papers. → Read here

  4. Text-to-image Diffusion Models in Generative AI: A Survey

    The paper reviews the application of text-to-image diffusion models in generative AI. Starting with a basic introduction to diffusion models for image synthesis, it progresses to discuss how text conditioning improves these models. The survey includes a review of state-of-the-art text-to-image synthesis methods, applications in creative generation and image editing, and the challenges faced in this domain. It concludes with a discussion on potential future developments in text-to-image diffusion models. → Read here

Catalogs of models

  1. A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

    This survey reviews Pretrained Foundation Models (PFMs) like BERT and ChatGPT. It covers basics, pretraining methods, and applications across data modalities. The review discusses model efficiency, security, privacy, and future research challenges in PFMs. → Read here

  2. Transformer models: an introduction and catalog

    This paper aims to catalog and classify the most popular Transformer models, known for their wide-ranging applications and often unique names. The catalog includes models trained through self-supervised learning and those further trained with human-in-the-loop approaches, covering both foundational models like BERT or GPT-3 and interactive models like InstructGPT. → Read here

Every day we post helpful lists and bite-sized explanations on our Twitter. Please join us there!

 
