# Model Collection
⚠️
This section is under heavy development.
This section consists of a collection and summary of notable and foundational LLMs. (Data adapted from Papers with Code and the recent work by Zhao et al. (2023).)
## Models
| Model | Release Date | Description | 
|---|---|---|
| BERT | 2018 | Bidirectional Encoder Representations from Transformers |
| GPT | 2018 | Improving Language Understanding by Generative Pre-Training |
| RoBERTa | 2019 | A Robustly Optimized BERT Pretraining Approach |
| GPT-2 | 2019 | Language Models are Unsupervised Multitask Learners |
| T5 | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| BART | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| ALBERT | 2019 | A Lite BERT for Self-supervised Learning of Language Representations |
| XLNet | 2019 | Generalized Autoregressive Pretraining for Language Understanding |
| CTRL | 2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| ERNIE | 2019 | ERNIE: Enhanced Representation through Knowledge Integration |
| GShard | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| GPT-3 | 2020 | Language Models are Few-Shot Learners |
| LaMDA | 2021 | LaMDA: Language Models for Dialog Applications |
| PanGu-α | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| mT5 | 2021 | mT5: A massively multilingual pre-trained text-to-text transformer |
| CPM-2 | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| T0 | 2021 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| HyperCLOVA | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| Codex | 2021 | Evaluating Large Language Models Trained on Code |
| ERNIE 3.0 | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Jurassic-1 | 2021 | Jurassic-1: Technical Details and Evaluation |
| FLAN | 2021 | Finetuned Language Models Are Zero-Shot Learners |
| MT-NLG | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| Yuan 1.0 | 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| WebGPT | 2021 | WebGPT: Browser-assisted question-answering with human feedback |
| Gopher | 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| ERNIE 3.0 Titan | 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| GLaM | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| InstructGPT | 2022 | Training language models to follow instructions with human feedback |
| GPT-NeoX-20B | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| AlphaCode | 2022 | Competition-Level Code Generation with AlphaCode |
| CodeGen | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| Chinchilla | 2022 | Shows that, for a given compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. |
| Tk-Instruct | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| UL2 | 2022 | UL2: Unifying Language Learning Paradigms |
| PaLM | 2022 | PaLM: Scaling Language Modeling with Pathways |
| OPT | 2022 | OPT: Open Pre-trained Transformer Language Models |
| BLOOM | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| GLM-130B | 2022 | GLM-130B: An Open Bilingual Pre-trained Model |
| AlexaTM | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| Flan-T5 | 2022 | Scaling Instruction-Finetuned Language Models |
| Sparrow | 2022 | Improving alignment of dialogue agents via targeted human judgements |
| U-PaLM | 2022 | Transcending Scaling Laws with 0.1% Extra Compute |
| mT0 | 2022 | Crosslingual Generalization through Multitask Finetuning |
| Galactica | 2022 | Galactica: A Large Language Model for Science |
| OPT-IML | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| LLaMA | 2023 | LLaMA: Open and Efficient Foundation Language Models |
| GPT-4 | 2023 | GPT-4 Technical Report |
| PanGu-Σ | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| BloombergGPT | 2023 | BloombergGPT: A Large Language Model for Finance |
| PaLM 2 | 2023 | A language model with better multilingual and reasoning capabilities that is more compute-efficient than its predecessor, PaLM. |
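
Several of the models in this collection are released as open pretrained checkpoints (for example GPT-2, OPT, BLOOM, and GPT-NeoX-20B). As a minimal sketch, assuming the Hugging Face `transformers` library and illustrative checkpoint names (not part of the table above), the snippet below loads one of the smaller decoder-only models and generates a short completion:

```python
# Minimal sketch: loading one of the open decoder-only models listed above
# with the Hugging Face `transformers` library. Checkpoint names are
# illustrative; verify exact model IDs and licenses on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative; e.g. "facebook/opt-1.3b" or "bigscience/bloom-560m" would also work

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode a prompt and sample a short continuation.
inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Encoder-decoder models in the table, such as T5 or Flan-T5, would instead be loaded with `AutoModelForSeq2SeqLM`.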