Notes from the Wired

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Published: February 1, 2024

Paper Title: Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Link to Paper: https://arxiv.org/abs/2204.07705
Date: April 16, 2022
Paper Type: NLP, LLM, Instruction
Short Abstract:
In this paper, the authors investigate how well NLP models generalize to unseen tasks.
It is one of the first papers to use instruction fine-tuning.

1. Introduction

This paper constructs a meta-dataset called SUPER-NATURALINSTRUCTIONS, a large benchmark comprising 1616 NLP tasks along with their natural language instructions. Each task in the dataset is accompanied by an instruction that describes the task for the model.

Furthermore, they train their own transformer model called Tk-Instruct, which has 11 billion parameters and is evaluated on the SUPER-NATURALINSTRUCTIONS dataset.

2. SUPER-NATURALINSTRUCTIONS

SUPER-NATURALINSTRUCTIONS is a meta-dataset consisting of various NLP tasks and plain-language instructions that describe them.

All tasks in the dataset follow the same schema: each task consists of a natural language definition of the task, positive and negative demonstration examples, and a set of task instances (input-output pairs).

Given the instruction and demonstration examples for a task, the model's job is to solve the task's instances.

The benchmark was collected through a large community effort on GitHub, and quality control was conducted through automated tests and the review of GitHub pull requests.
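The shared schema can be sketched as a small data structure. The field names below are my own approximation of the benchmark's format, not the exact JSON keys used in the repository, and the sample task is invented for illustration:

```python
# Sketch of a single SUPER-NATURALINSTRUCTIONS-style task.
# Field names approximate the paper's schema; they are not the
# literal keys from the benchmark's task files.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Example:
    input: str        # task input text
    output: str       # expected output text
    explanation: str = ""  # optional rationale (used in demonstrations)


@dataclass
class Task:
    definition: str   # the natural language instruction describing the task
    positive_examples: List[Example] = field(default_factory=list)
    negative_examples: List[Example] = field(default_factory=list)
    instances: List[Example] = field(default_factory=list)  # data to solve


task = Task(
    definition="Given a sentence, classify its sentiment as 'positive' or 'negative'.",
    positive_examples=[Example("I loved this movie!", "positive")],
    instances=[Example("The plot was dull.", "negative")],
)
```

Every one of the 1616 tasks fits this shape, which is what makes it possible to train and evaluate a single model across all of them.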

3. Tk-INSTRUCT Model

Their model, Tk-INSTRUCT, is based on the pre-trained T5 models and trained on SUPER-NATURALINSTRUCTIONS. Each task is defined by a natural language instruction and a few demonstration examples, and the model is expected to solve the task's instances based on them. In short, they take a T5 model as the base and instruction-fine-tune it on their dataset.
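Concretely, instruction tuning here means serializing the instruction, a few demonstration examples, and the instance input into a single text prompt that the encoder-decoder model maps to the instance output. The template below is a sketch of that idea, not the paper's verbatim format:

```python
# Sketch: assemble an encoder input for an instruction-tuned
# seq2seq model. The task definition and k positive examples are
# prepended to each instance; the model is trained to emit the
# instance's output. The exact wording of the template is an
# assumption, not the paper's literal prompt.
def build_prompt(definition, positive_examples, instance_input, k=2):
    parts = [f"Definition: {definition}"]
    for i, (ex_in, ex_out) in enumerate(positive_examples[:k], start=1):
        parts.append(f"Positive Example {i}\nInput: {ex_in}\nOutput: {ex_out}")
    parts.append(f"Now complete the following example\nInput: {instance_input}\nOutput:")
    return "\n\n".join(parts)


prompt = build_prompt(
    "Classify the sentiment of the sentence as 'positive' or 'negative'.",
    [("I loved this movie!", "positive")],
    "The plot was dull.",
)
print(prompt)
```

During fine-tuning, this prompt becomes the encoder input and the gold output becomes the decoder target, so the same model can be steered to any of the 1616 tasks purely through the instruction text.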

4. Benchmarks and Results

The authors compare their Tk-Instruct model with other models such as InstructGPT.

Overall, they find that instruction tuning enables stronger generalization to unseen tasks: instruction-tuned models outperform their untuned LLM counterparts. Furthermore, they find that the more distinct tasks a model observes during training, the better it generalizes, while adding more training instances per task helps less.

5. Conclusion

This paper, together with InstructGPT, demonstrates the power of instruction fine-tuning and how it can significantly increase the performance of LLMs on NLP tasks. Furthermore, it provides a large dataset that allows models to be tested on a broad range of tasks.