# Trained Model Location
When running our training scripts, the model is uploaded to several places for redundancy (when applicable), depending on the cluster environment:
- Hugging Face
- Google Cloud Storage
- Ai2's internal beaker dataset
- Local storage
## Hugging Face
Let's use https://wandb.ai/ai2-llm/open_instruct_public/runs/tyfe1095 as an example. If you go to the run's wandb Overview page -> Config and search for `hf`, you can find its `hf_repo_url`.
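
If you prefer to pull these fields programmatically, here is a minimal sketch using the wandb public API. The entity/project/run ID come from the URL above, and `run_name` / `hf_repo_url` are simply the config keys this doc refers to; you need wandb access to the project for this to work.

```python
import wandb

# Look up the run from the URL: https://wandb.ai/<entity>/<project>/runs/<run_id>
api = wandb.Api()
run = api.run("ai2-llm/open_instruct_public/tyfe1095")

# Print the config fields used below to locate and download the model.
print("run_name:   ", run.config.get("run_name"))
print("hf_repo_url:", run.config.get("hf_repo_url"))
```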

To download, note that the `run_name` for this run is `tulu3_8b_dpo__1__1742613782`; it is used as the Hugging Face revision below. You can use the following commands to download the model:
```bash
exp_name=tulu3_8b_dpo__1__1742613782
# first download the model
huggingface-cli download --revision $exp_name allenai/open_instruct_dev
# get the cache directory
exp_cache_dir=$(huggingface-cli download --revision $exp_name allenai/open_instruct_dev)
ls $exp_cache_dir
```
Example output:

```
Downloading 'config.json' to '/weka/oe-adapt-default/allennlp/.cache/hub/models--allenai--open_instruct_dev/blobs/c0ed34722856586c3fa9ccb27bd52fb8e1d759a1.incomplete'
config.json: 100%|████████████████████████████████████████████████████████████| 984/984 [00:00<00:00, 5.84MB/s]
Download complete. Moving file to /weka/oe-adapt-default/allennlp/.cache/hub/models--allenai--open_instruct_dev/blobs/c0ed34722856586c3fa9ccb27bd52fb8e1d759a1
Downloading 'pytorch_model-00001-of-00004.bin' to '/weka/oe-adapt-default/allennlp/.cache/hub/models--allenai--open_instruct_dev/blobs/9da6b1637575b207617b84e84a5a974e8ee2a9fab55bd7e0343d6edf2a9f9f28.incomplete'
pytorch_model-00001-of-00004.bin: 100%|███████████████████████████████████▉| 4.98G/4.98G [00:07<00:00, 662MB/s]
Download complete. Moving file to /weka/oe-adapt-default/allennlp/.cache/hub/models--allenai--open_instruct_dev/blobs/9da6b1637575b207617b84e84a5a974e8ee2a9fab55bd7e0343d6edf2a9f9f28
Downloading 'pytorch_model-00002-of-00004.bin' to '/weka/oe-adapt-default/allennlp/.cache/hub/models--allenai--open_instruct_dev/blobs/667937dac38f3df4ffe7f5be637b54bed58c40b78c39550b639d12f6d57461b7.incomplete'
pytorch_model-00002-of-00004.bin: 100%|███████████████████████████████████▉| 5.00G/5.00G [00:07<00:00, 657MB/s]
Download complete. Moving file to /weka/oe-adapt-default/allennlp/.cache/hub/models--allenai--open_instruct_dev/blobs/667937dac38f3df4ffe7f5be637b54bed58c40b78c39550b639d12f6d57461b7
Downloading 'pytorch_model-00003-of-00004.bin' to '/weka/oe-adapt-default/allennlp/.cache/hub/models--allenai--open_instruct_dev/blobs/7d0471f489239e21a2063568974d4b118f294b5d1a381f306fe165729b6e88d3.incomplete'
pytorch_model-00003-of-00004.bin: 100%|███████████████████████████████████▉| 4.92G/4.92G [00:07<00:00, 678MB/s]
Download complete. Moving file to /weka/oe-adapt-default/allennlp/.cache/hub/models--allenai--open_instruct_dev/blobs/7d0471f489239e21a2063568974d4b118f294b5d1a381f306fe165729b6e88d3
Downloading 'pytorch_model-00004-of-00004.bin' to '/weka/oe-adapt-default/allennlp/.cache/hub/models--allenai--open_instruct_dev/blobs/ff53f644b12798a5e81c6c8072169a29b6a318a251d7d939687e2af333efe51e.incomplete'
pytorch_model-00004-of-00004.bin: 100%|███████████████████████████████████▉| 1.17G/1.17G [00:03<00:00, 337MB/s]
Download complete. Moving file to /weka/oe-adapt-default/allennlp/.cache/hub/models--allenai--open_instruct_dev/blobs/ff53f644b12798a5e81c6c8072169a29b6a318a251d7d939687e2af333efe51e
/weka/oe-adapt-default/allennlp/.cache/hub/models--allenai--open_instruct_dev/snapshots/40227c36fb1b5b714a71f9d635ead5a79c23507f
config.json                       pytorch_model-00003-of-00004.bin  special_tokens_map.json
generation_config.json            pytorch_model-00004-of-00004.bin  tokenizer_config.json
pytorch_model-00001-of-00004.bin  pytorch_model.bin.index.json      tokenizer.json
pytorch_model-00002-of-00004.bin  README.md
```
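
If you prefer to do the same thing from Python, here is a minimal sketch using `huggingface_hub.snapshot_download` with the same repo and revision as above. This is just an alternative to the CLI, not something the training scripts do for you.

```python
from huggingface_hub import snapshot_download

# Download the revision named after the run and get the local snapshot path,
# equivalent to the `huggingface-cli download --revision ...` calls above.
exp_name = "tulu3_8b_dpo__1__1742613782"
exp_cache_dir = snapshot_download(
    repo_id="allenai/open_instruct_dev",
    revision=exp_name,
)
print(exp_cache_dir)
```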
## Google Cloud Storage
Let's use https://wandb.ai/ai2-llm/open_instruct_public/runs/tyfe1095 as an example again. Because this run was conducted on the ai2/augusta cluster, `mason.py` automatically appends `--gs_bucket_path gs://ai2-llm/post-training/` to the training args (external users can try appending the same flag). The model was then automatically uploaded to:
```
gs://ai2-llm/post-training//costah/output/tulu3_8b_dpo__1__1742613782
```

To download, you can run the following command:
```bash
gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" \
    -o "GSUtil:parallel_thread_count=1" \
    -m \
    cp -r gs://ai2-llm/post-training//costah/output/tulu3_8b_dpo__1__1742613782 .
ls tulu3_8b_dpo__1__1742613782
```
Example output:

```
Copying gs://ai2-llm/post-training//costah/output/tulu3_8b_dpo__1__1742613782/config.json...
Copying gs://ai2-llm/post-training//costah/output/tulu3_8b_dpo__1__1742613782/generation_config.json...
Copying gs://ai2-llm/post-training//costah/output/tulu3_8b_dpo__1__1742613782/pytorch_model-00001-of-00004.bin...
Copying gs://ai2-llm/post-training//costah/output/tulu3_8b_dpo__1__1742613782/pytorch_model-00002-of-00004.bin...
Copying gs://ai2-llm/post-training//costah/output/tulu3_8b_dpo__1__1742613782/pytorch_model-00003-of-00004.bin...
Copying gs://ai2-llm/post-training//costah/output/tulu3_8b_dpo__1__1742613782/pytorch_model-00004-of-00004.bin...
Copying gs://ai2-llm/post-training//costah/output/tulu3_8b_dpo__1__1742613782/pytorch_model.bin.index.json...
Copying gs://ai2-llm/post-training//costah/output/tulu3_8b_dpo__1__1742613782/special_tokens_map.json...
Copying gs://ai2-llm/post-training//costah/output/tulu3_8b_dpo__1__1742613782/tokenizer.json...
Copying gs://ai2-llm/post-training//costah/output/tulu3_8b_dpo__1__1742613782/tokenizer_config.json...
Resuming download for ./tulu3_8b_dpo__1__1742613782/pytorch_model-00003-of-00004.bin component 0
Resuming download for ./tulu3_8b_dpo__1__1742613782/pytorch_model-00003-of-00004.bin component 1
Resuming download for ./tulu3_8b_dpo__1__1742613782/pytorch_model-00003-of-00004.bin component 2
Resuming download for ./tulu3_8b_dpo__1__1742613782/pytorch_model-00003-of-00004.bin component 3
| [10/10 files][ 15.0 GiB/ 15.0 GiB] 100% Done  52.1 MiB/s ETA 00:00:00
Operation completed over 10 objects/15.0 GiB.
config.json                       pytorch_model-00003-of-00004.bin  tokenizer_config.json
generation_config.json            pytorch_model-00004-of-00004.bin  tokenizer.json
pytorch_model-00001-of-00004.bin  pytorch_model.bin.index.json
pytorch_model-00002-of-00004.bin  special_tokens_map.json
```
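
If you would rather script the download instead of using gsutil, here is a minimal sketch using the `google-cloud-storage` Python client, assuming you have GCP credentials with read access to the `ai2-llm` bucket. The bucket and prefix are the same path as above (note the literal double slash).

```python
import os
from google.cloud import storage

# Same bucket/prefix as the gsutil command above.
bucket_name = "ai2-llm"
prefix = "post-training//costah/output/tulu3_8b_dpo__1__1742613782"
local_dir = "tulu3_8b_dpo__1__1742613782"

client = storage.Client()
for blob in client.list_blobs(bucket_name, prefix=prefix):
    # Recreate the checkpoint files locally under the run directory name.
    local_path = os.path.join(local_dir, os.path.basename(blob.name))
    os.makedirs(local_dir, exist_ok=True)
    blob.download_to_filename(local_path)
    print(f"downloaded {blob.name} -> {local_path}")
```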
## Local storage / NFS
Local storage is quite ephemeral, so we sometimes assign `output_dir` to a particular shared directory instead. For example, when launching with `mason.py`, we automatically overwrite `output_dir` to be `/weka/oe-adapt-default/allennlp/deletable_checkpoint/$beaker_user/`, so you can find the model in a directory like the following:

ls "/weka/oe-adapt-default/allennlp/deletable_checkpoint/valpy/tulu3_8b_sft_no_IF__8__1745534652"
config.json pytorch_model-00003-of-00004.bin tokenizer_config.json
generation_config.json pytorch_model-00004-of-00004.bin tokenizer.json
pytorch_model-00001-of-00004.bin pytorch_model.bin.index.json
pytorch_model-00002-of-00004.bin special_tokens_map.json
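
Since these are regular Hugging Face-format checkpoint directories (as the file listings above show), you can point `transformers` directly at any of the local paths. A minimal sketch, assuming the directory above still exists and you have enough memory for an 8B model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any of the local/NFS checkpoint directories shown above works here.
model_path = "/weka/oe-adapt-default/allennlp/deletable_checkpoint/valpy/tulu3_8b_sft_no_IF__8__1745534652"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
print(f"loaded model with {model.num_parameters():,} parameters")
```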
## Beaker Dataset (Ai2's internal storage)
When possible, we also try to upload the model to a Beaker dataset. You can find the model in the corresponding Beaker experiment by looking up `beaker_experiment_url` in the tracked wandb run.

You can download the model by running the following command:
```bash
exp_name=tulu3_8b_dpo__1__1742613782
dataset_id=01JPXXXKZPACGK5AZ1XSD5V54F
mkdir $exp_name
beaker dataset fetch "$dataset_id" -o $exp_name --concurrency 64
```
Example output:

```
Downloading dataset 01JPXXXKZPACGK5AZ1XSD5V54F to tulu3_8b_dpo__1__1742613782
Files: 6            4 in progress ⠸
Bytes: 16.49 MiB    14.96 GiB in progress ⠸
```
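
Whichever download path you use, it can be worth checking that all weight shards actually arrived before launching anything expensive. Here is a minimal sketch that reads `pytorch_model.bin.index.json` (present in every listing above) and verifies each referenced shard exists locally; the directory name is just the example run from this doc.

```python
import json
import os

# Directory produced by any of the download methods above.
checkpoint_dir = "tulu3_8b_dpo__1__1742613782"

# The index file maps each weight tensor to the shard file that contains it.
with open(os.path.join(checkpoint_dir, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)

shards = sorted(set(index["weight_map"].values()))
missing = [s for s in shards if not os.path.exists(os.path.join(checkpoint_dir, s))]
print(f"{len(shards)} shards referenced, {len(missing)} missing: {missing}")
```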