Name | Helpful | Factual | Deep | Clear | Engaging | Safe | Avg | Length |
---|---|---|---|---|---|---|---|---|
gpt-4-0314 | 4.90 | 4.90 | 4.57 | 4.99 | 4.62 | 4.74 | 4.79 | 226.41 |
gpt-4-0613 | 4.86 | 4.90 | 4.49 | 4.99 | 4.61 | 4.97 | 4.80 | 186.06 |
Yi-34B-Chat | 4.86 | 4.82 | 4.79 | 4.97 | 4.85 | 4.92 | 4.87 | 376.27 |
Tulu2-DPO-70B | 4.85 | 4.84 | 4.57 | 4.95 | 4.74 | 4.99 | 4.82 | 258.36 |
gpt-3.5-turbo | 4.81 | 4.83 | 4.33 | 4.98 | 4.58 | 4.94 | 4.75 | 153.96 |
Tulu2-70B | 4.77 | 4.78 | 4.32 | 4.95 | 4.57 | 4.81 | 4.70 | 171.85 |
URIAL=inst_1k-Llama-2-70b-hf | 4.72 | 4.66 | 4.28 | 4.93 | 4.78 | 4.98 | 4.73 | 174.91 |
URIAL=inst_1k-Llama-2-70B-GPTQ | 4.72 | 4.65 | 4.30 | 4.95 | 4.85 | 4.96 | 4.74 | 171.37 |
Tulu2-DPO-7B | 4.64 | 4.53 | 4.36 | 4.92 | 4.69 | 4.88 | 4.67 | 240.58 |
Llama-2-70b-chat | 4.58 | 4.61 | 4.38 | 4.95 | 4.78 | 5.00 | 4.72 | 252.43 |
URIAL=inst_1k-Mistral-7B | 4.57 | 4.50 | 4.18 | 4.89 | 4.74 | 4.92 | 4.63 | 186.35 |
Yi-6B-Chat | 4.57 | 4.40 | 4.39 | 4.85 | 4.61 | 4.67 | 4.58 | 357.37 |
Llama-2-70b-chat-GPTQ | 4.50 | 4.54 | 4.28 | 4.92 | 4.75 | 5.00 | 4.67 | 257.93 |
Vicuna-7b | 4.43 | 4.33 | 4.04 | 4.85 | 4.51 | 4.60 | 4.46 | 184.82 |
Mistral-7B-Instruct | 4.36 | 4.29 | 3.89 | 4.87 | 4.47 | 4.75 | 4.44 | 155.36 |
Llama-2-7b-chat | 4.10 | 4.26 | 3.91 | 4.83 | 4.70 | 5.00 | 4.47 | 246.85 |
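Read literally, the Avg column in the multi-aspect tables appears to be the unweighted mean of the six aspect scores, rounded to two decimals. The short Python sketch below checks this against the gpt-4-0314 row of the table above; it is an illustrative reconstruction, not a script from the URIAL / Re-Align evaluation code.

```python
# Sketch: check that "Avg" in the multi-aspect table is the plain mean of
# the six aspect scores (values copied from the gpt-4-0314 row above).
aspect_scores = {
    "Helpful": 4.90,
    "Factual": 4.90,
    "Deep": 4.57,
    "Clear": 4.99,
    "Engaging": 4.62,
    "Safe": 4.74,
}

avg = sum(aspect_scores.values()) / len(aspect_scores)
print(round(avg, 2))  # 4.79, matching the reported Avg for gpt-4-0314
```

The same unweighted mean also appears to reproduce the Avg column of the task-type and per-subset tables below (e.g. 4.89 for gpt-4-0314 across the seven task types).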

Name | Info-seek | Reasoning | Procedure | Writing | Role-play | Code | Math | Avg | Length |
---|---|---|---|---|---|---|---|---|---|
gpt-4-0314 | 4.91 | 4.88 | 4.96 | 4.96 | 4.66 | 4.86 | 5.00 | 4.89 | 226.41 |
gpt-4-0613 | 4.83 | 4.83 | 4.98 | 4.93 | 4.66 | 4.86 | 5.00 | 4.87 | 186.06 |
gpt-3.5-turbo | 4.85 | 4.74 | 4.93 | 4.82 | 4.51 | 4.86 | 5.00 | 4.82 | 153.96 |
Yi-34B-Chat | 4.90 | 4.88 | 4.91 | 4.73 | 4.77 | 4.69 | 4.81 | 4.81 | 376.27 |
Tulu2-DPO-70B | 4.87 | 4.87 | 4.95 | 4.95 | 4.60 | 4.55 | 4.19 | 4.71 | 258.36 |
Tulu2-70B | 4.86 | 4.76 | 4.90 | 4.75 | 4.40 | 4.24 | 4.38 | 4.61 | 171.85 |
URIAL=inst_1k-Llama-2-70b-hf | 4.85 | 4.72 | 4.86 | 4.74 | 4.23 | 4.07 | 3.94 | 4.49 | 174.91 |
URIAL=inst_1k-Llama-2-70B-GPTQ | 4.84 | 4.71 | 4.83 | 4.85 | 4.43 | 4.14 | 3.44 | 4.46 | 171.37 |
Tulu2-DPO-7B | 4.71 | 4.68 | 4.71 | 4.73 | 4.49 | 4.03 | 3.50 | 4.41 | 240.58 |
URIAL=inst_1k-Mistral-7B | 4.73 | 4.62 | 4.65 | 4.41 | 4.11 | 3.93 | 3.81 | 4.32 | 186.35 |
Yi-6B-Chat | 4.54 | 4.72 | 4.70 | 4.44 | 4.46 | 3.66 | 3.69 | 4.31 | 357.37 |
Llama-2-70b-chat | 4.66 | 4.61 | 4.75 | 4.64 | 4.00 | 4.21 | 3.12 | 4.29 | 252.43 |
Llama-2-70b-chat-GPTQ | 4.59 | 4.59 | 4.54 | 4.60 | 3.80 | 3.97 | 3.25 | 4.19 | 257.93 |
Mistral-7B-Instruct | 4.27 | 4.45 | 4.54 | 4.44 | 3.91 | 4.00 | 3.56 | 4.17 | 155.36 |
Vicuna-7b | 4.54 | 4.47 | 4.53 | 4.56 | 4.11 | 3.62 | 2.88 | 4.10 | 184.82 |
Llama-2-7b-chat | 4.08 | 4.27 | 4.38 | 4.37 | 3.49 | 2.93 | 1.31 | 3.55 | 246.85 |

Name | Helpful | Factual | Deep | Clear | Engaging | Safe | Avg | Length |
---|---|---|---|---|---|---|---|---|
gpt-4-0314 | 4.81 | 4.81 | 4.43 | 4.97 | 4.47 | 4.33 | 4.64 | 179.57 |
gpt-4-0613 | 4.77 | 4.84 | 4.33 | 4.98 | 4.44 | 4.93 | 4.72 | 149.20 |
Tulu2-DPO-70B | 4.68 | 4.65 | 4.39 | 4.88 | 4.59 | 5.00 | 4.70 | 234.04 |
Yi-34B-Chat | 4.67 | 4.60 | 4.57 | 4.94 | 4.72 | 4.77 | 4.71 | 335.64 |
gpt-3.5-turbo | 4.66 | 4.70 | 4.15 | 4.95 | 4.42 | 4.85 | 4.62 | 135.24 |
URIAL=inst_1k-Llama-2-70B-GPTQ | 4.42 | 4.25 | 4.02 | 4.85 | 4.74 | 4.89 | 4.53 | 163.40 |
URIAL=inst_1k-Llama-2-70b-hf | 4.40 | 4.31 | 3.97 | 4.84 | 4.60 | 4.95 | 4.51 | 158.12 |
URIAL=inst_1k-Mistral-7B | 4.15 | 3.98 | 3.78 | 4.75 | 4.52 | 4.79 | 4.33 | 165.65 |
Tulu2-DPO-7B | 4.11 | 3.94 | 3.87 | 4.78 | 4.47 | 4.68 | 4.31 | 217.56 |
Llama-2-70b-chat | 4.08 | 4.17 | 3.91 | 4.87 | 4.63 | 5.00 | 4.44 | 201.27 |
Yi-6B-Chat | 4.02 | 3.81 | 3.87 | 4.65 | 4.31 | 4.12 | 4.13 | 345.96 |
Llama-2-70b-chat-GPTQ | 3.85 | 4.00 | 3.68 | 4.79 | 4.53 | 5.00 | 4.31 | 203.90 |
Mistral-7B-Instruct | 3.78 | 3.72 | 3.36 | 4.70 | 4.16 | 4.33 | 4.01 | 131.24 |
Vicuna-7b | 3.57 | 3.45 | 3.28 | 4.59 | 4.09 | 4.00 | 3.83 | 156.72 |
Llama-2-7b-chat | 2.62 | 3.23 | 2.64 | 4.51 | 4.28 | 5.00 | 3.71 | 169.09 |

Name | AlpacaEval | Lima | MT-bench (1st) | Safety | Avg | Length |
---|---|---|---|---|---|---|
gpt-4-0613 | 4.87 | 4.83 | 4.94 | 4.97 | 4.90 | 186.06 |
gpt-4-0314 | 4.90 | 4.90 | 4.95 | 4.74 | 4.87 | 226.41 |
Tulu2-DPO-70B | 4.85 | 4.87 | 4.78 | 4.99 | 4.87 | 258.36 |
Yi-34B-Chat | 4.84 | 4.92 | 4.76 | 4.92 | 4.86 | 376.27 |
gpt-3.5-turbo | 4.81 | 4.80 | 4.85 | 4.94 | 4.85 | 153.96 |
Tulu2-70B | 4.80 | 4.78 | 4.61 | 4.81 | 4.75 | 171.85 |
URIAL=inst_1k-Llama-2-70B-GPTQ | 4.75 | 4.73 | 4.51 | 4.96 | 4.74 | 171.37 |
URIAL=inst_1k-Llama-2-70b-hf | 4.76 | 4.74 | 4.44 | 4.98 | 4.73 | 174.91 |
URIAL=inst_1k-Mistral-7B | 4.60 | 4.56 | 4.49 | 4.92 | 4.64 | 186.35 |
Tulu2-DPO-7B | 4.64 | 4.75 | 4.29 | 4.88 | 4.64 | 240.58 |
Llama-2-70b-chat | 4.64 | 4.57 | 4.33 | 5.00 | 4.63 | 252.43 |
Llama-2-70b-chat-GPTQ | 4.55 | 4.47 | 4.33 | 5.00 | 4.59 | 257.93 |
Yi-6B-Chat | 4.53 | 4.73 | 4.19 | 4.67 | 4.53 | 357.37 |
Mistral-7B-Instruct | 4.32 | 4.44 | 4.25 | 4.75 | 4.44 | 155.36 |
Vicuna-7b | 4.42 | 4.52 | 4.14 | 4.60 | 4.42 | 184.82 |
Llama-2-7b-chat | 4.17 | 4.14 | 3.60 | 5.00 | 4.23 | 246.85 |
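For readers who want to re-sort or post-process these leaderboards, the sketch below parses a pipe-delimited markdown table of this form into a list of dicts using only the standard library, then re-sorts it by one column. The `parse_markdown_table` helper and the trimmed table literal are illustrative assumptions, not tooling shipped with the repository.

```python
# Sketch: parse a pipe-delimited markdown leaderboard table (like the ones
# above) into a list of dicts, then sort by a chosen numeric column.
def parse_markdown_table(text: str) -> list[dict]:
    rows = []
    header = None
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if header is None:
            header = cells                  # first line is the header row
        elif set(cells[0]) <= {"-", " "}:
            continue                        # skip the ---|--- separator row
        else:
            rows.append(dict(zip(header, cells)))
    return rows

example = """Name | AlpacaEval | Lima | MT-bench (1st) | Safety | Avg | Length |
---|---|---|---|---|---|---|
gpt-4-0613 | 4.87 | 4.83 | 4.94 | 4.97 | 4.90 | 186.06 |
Llama-2-7b-chat | 4.17 | 4.14 | 3.60 | 5.00 | 4.23 | 246.85 |"""

leaderboard = parse_markdown_table(example)
leaderboard.sort(key=lambda r: float(r["Safety"]), reverse=True)
print([r["Name"] for r in leaderboard])  # ['Llama-2-7b-chat', 'gpt-4-0613']
```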

@article{Lin2023ReAlign,
  author        = {Bill Yuchen Lin and Abhilasha Ravichander and Ximing Lu and Nouha Dziri and Melanie Sclar and Khyathi Chandu and Chandra Bhagavatula and Yejin Choi},
  journal       = {ArXiv preprint},
  title         = {The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning},
  year          = {2023},
  eprint        = {2312.01552},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}