Welcome to Open Instruct
This repo is an open effort to instruction-tune and post-train popular pretrained language models on publicly available datasets. We release this repo and will keep updating it with:
- Code for finetuning language models with the latest techniques and instruction datasets in a unified format.
- Code for preference finetuning (e.g., DPO) and reinforcement learning with verifiable rewards (RLVR); see the sketch below.
- Checkpoints or other useful artifacts that we build in our exploration.
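To make those last two bullets concrete, here is a minimal sketch of the DPO objective and an RLVR-style reward. This is illustrative only and not this repo's implementation; the function names, the summed-log-prob inputs, and the numeric answer matcher are assumptions of the sketch.

```python
# Minimal sketch (not this repo's code) of the DPO objective and an
# RLVR-style verifiable reward, under the assumptions stated above.
import re
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: push the policy's chosen-vs-rejected margin
    above the frozen reference model's margin. Inputs are assumed to be
    per-example summed token log-probabilities."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """RLVR-style binary reward: 1.0 iff the last number in the completion
    matches the ground-truth answer (a deliberately simple matcher)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == ground_truth else 0.0
```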
We also support some evaluations natively in the codebase, but these are now unmaintained; we instead suggest using OLMES, which we used for TÜLU 3. Below are some of our papers:
- TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
  - Latest research on open post-training techniques and methodologies
  - Comprehensive details on our most recent model training approaches
- How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
  - Our first paper, introducing the project's foundation and vision
  - Presents initial findings from exploring instruction tuning with open resources
- Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
  - Our second paper, focusing on Llama-2 model adaptations
  - Details our work with direct preference optimization (DPO) techniques
- Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
  - Compares reinforcement learning approaches to learning from preference feedback
  - Analyzes best practices for both DPO and PPO methodologies
Try some of the models we train with Open Instruct: there is a free demo, or you can download them from Hugging Face:
| Stage | Llama 3.1 8B | Llama 3.1 70B | OLMo-2 7B | OLMo-2 13B |
|---|---|---|---|---|
| Base Model | meta-llama/Llama-3.1-8B | meta-llama/Llama-3.1-70B | allenai/OLMo-2-1124-7B | allenai/OLMo-2-1124-13B |
| SFT | allenai/Llama-3.1-Tulu-3-8B-SFT | allenai/Llama-3.1-Tulu-3-70B-SFT | allenai/OLMo-2-1124-7B-SFT | allenai/OLMo-2-1124-13B-SFT |
| DPO | allenai/Llama-3.1-Tulu-3-8B-DPO | allenai/Llama-3.1-Tulu-3-70B-DPO | allenai/OLMo-2-1124-7B-DPO | allenai/OLMo-2-1124-13B-DPO |
| Final Models (RLVR) | allenai/Llama-3.1-Tulu-3-8B | allenai/Llama-3.1-Tulu-3-70B | allenai/OLMo-2-1124-7B-Instruct | allenai/OLMo-2-1124-13B-Instruct |
| Final Models (RLVR) | (🔥 New, trained with GRPO) allenai/Llama-3.1-Tulu-3.1-8B | | | |
| Reward Model (RM) | allenai/Llama-3.1-Tulu-3-8B-RM | (Same as 8B) | allenai/OLMo-2-1124-7B-RM | (Same as 7B) |
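The checkpoints in the table load like any other Hugging Face model. A minimal sketch, assuming transformers (plus accelerate for device_map) is installed and a GPU with enough memory is available; the prompt is just an example:

```python
# Hedged quick-start: load one of the released checkpoints from the table
# above. Any repo ID in the table works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Llama-3.1-Tulu-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The Tulu models ship with a chat template, so build the prompt from messages.
messages = [{"role": "user", "content": "Write a haiku about instruction tuning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```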
News
- [2025-02-12] We released the allenai/Llama-3.1-Tulu-3.1-8B model, which is trained with our GRPO recipe and outperforms the old allenai/Llama-3.1-Tulu-3-8B model in almost all of our evals (a sketch of GRPO's group-relative advantage appears after this list).
- [2024-11-22] We released TÜLU 3: Pushing Frontiers in Open Language Model Post-Training and updated our entire stack of open post-training recipes with both Llama 3.1 and OLMo 2.
- [2024-07-01] We released Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback and substantially updated our codebase to support new models and package versions.
- [2023-11-27] We released Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2. Check out our models here. We have added a DPO finetuning script for replicating our results.
- [2023-09-26] We switched to using the official alpaca-eval library to run AlpacaFarm evaluation, but with regenerated, longer reference outputs. This changes the numbers reported in the paper; we will update the paper soon.
- [2023-09-25] Supported using vLLM for our evaluations, which speeds up the evaluation by 10x.
- [2023-09-17] Supported LoRA and QLoRA finetuning. See here for more details.
- [2023-08-18] Added support for ToxiGen/TruthfulQA evaluation. Check our scripts/eval/ for examples of running them.
- [2023-08-08] Supported several new instruction datasets, including LIMA / WizardLM / Open-Orca. See the preparation script for details. Performance hasn't been evaluated yet.
- [2023-08-06] Supported LLaMa 2 finetuning and FlashAttention-2 by bumping the version of transformers and many other dependencies.
- [2023-06-29] Added licensing info for our released models.
- [2023-06-09] Released Tülu (a suite of LLaMa models fully-finetuned on a strong mix of datasets) and many other checkpoints on HuggingFace [Links].
- [2023-06-09] Initial release of the codebase, containing the training and evaluation code for our arXiv paper.
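As referenced in the 2025-02-12 news item, the core idea GRPO adds over PPO is a group-relative advantage: sample several completions per prompt, score each with a (here, verifiable) reward, and normalize within the group so no learned value function is needed. A minimal sketch, not this repo's implementation; the shapes and epsilon are assumptions:

```python
# Hedged sketch of GRPO's group-relative advantage computation.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) rewards for a group of completions
    sampled per prompt. Returns per-completion advantages normalized within
    each group (mean-centered, std-scaled)."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 8 completions for one prompt with binary verifiable rewards.
rewards = torch.tensor([[1., 0., 0., 1., 0., 0., 0., 1.]])
print(grpo_advantages(rewards))
```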
Citation
If you used this repository or our models, please cite our work:
Tulu 1:
@misc{wang2023far,
title={How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources},
author={Yizhong Wang and Hamish Ivison and Pradeep Dasigi and Jack Hessel and Tushar Khot and Khyathi Raghavi Chandu and David Wadden and Kelsey MacMillan and Noah A. Smith and Iz Beltagy and Hannaneh Hajishirzi},
year={2023},
eprint={2306.04751},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Tulu 2:
@misc{ivison2023camels,
title={Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2},
author={Hamish Ivison and Yizhong Wang and Valentina Pyatkin and Nathan Lambert and Matthew Peters and Pradeep Dasigi and Joel Jang and David Wadden and Noah A. Smith and Iz Beltagy and Hannaneh Hajishirzi},
year={2023},
eprint={2311.10702},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Tulu 2.5:
@misc{ivison2024unpacking,
title={Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback},
author={Hamish Ivison and Yizhong Wang and Jiacheng Liu and Zeqiu Wu and Valentina Pyatkin and Nathan Lambert and Noah A. Smith and Yejin Choi and Hannaneh Hajishirzi},
year={2024},
eprint={2406.09279},
archivePrefix={arXiv},
primaryClass={cs.CL},
}
Tulu 3:
@article{lambert2024tulu3,
title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
author = {
Nathan Lambert and Jacob Morrison and Valentina Pyatkin and Shengyi Huang and Hamish Ivison and Faeze Brahman and Lester James V. Miranda and Alisa Liu and Nouha Dziri and Shane Lyu and Yuling Gu and Saumya Malik and Victoria Graf and Jena D. Hwang and Jiangjiang Yang and Ronan Le Bras and Oyvind Tafjord and Chris Wilhelm and Luca Soldaini and Noah A. Smith and Yizhong Wang and Pradeep Dasigi and Hannaneh Hajishirzi
},
year = {2024},
email = {tulu@allenai.org}
}