---
license: apache-2.0
---

**Paper**: [Adapting Language Models to Compress Contexts](https://arxiv.org/abs/2305.14788)

**Code**: https://github.com/princeton-nlp/AutoCompressors

**Models**:
- Llama-2-7b fine-tuned models: [AutoCompressor-Llama-2-7b-6k](https://huggingface.co/princeton-nlp/AutoCompressor-Llama-2-7b-6k/), [FullAttention-Llama-2-7b-6k](https://huggingface.co/princeton-nlp/FullAttention-Llama-2-7b-6k)
- OPT-2.7b fine-tuned models: [AutoCompressor-2.7b-6k](https://huggingface.co/princeton-nlp/AutoCompressor-2.7b-6k), [AutoCompressor-2.7b-30k](https://huggingface.co/princeton-nlp/AutoCompressor-2.7b-30k), [RMT-2.7b-8k](https://huggingface.co/princeton-nlp/RMT-2.7b-8k)
- OPT-1.3b fine-tuned models: [AutoCompressor-1.3b-30k](https://huggingface.co/princeton-nlp/AutoCompressor-1.3b-30k), [RMT-1.3b-30k](https://huggingface.co/princeton-nlp/RMT-1.3b-30k)

---

AutoCompressor-Llama-2-7b-6k is a model fine-tuned from [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) following the AutoCompressor method in [Adapting Language Models to Compress Contexts](https://arxiv.org/abs/2305.14788). The model is fine-tuned on 15B tokens from the [RedPajama dataset](https://github.com/togethercomputer/RedPajama-Data). The pre-trained Llama-2 model is fine-tuned on sequences of 6,144 tokens with 50 summary vectors, summary accumulation, randomized segmenting, and stop-gradients.

To get started, download the [`AutoCompressors`](https://github.com/princeton-nlp/AutoCompressors) repository and load the model as follows (a more complete usage sketch is given at the end of this card):

```
from auto_compressor_llama import LlamaAutoCompressorModel

model = LlamaAutoCompressorModel.from_pretrained("princeton-nlp/AutoCompressor-Llama-2-7b-6k")
```

**Evaluation**

We record the perplexity achieved by our Llama-2-7b models on segments of 2,048 tokens, conditioned on different amounts of context. FullAttention-Llama-2-7b-6k attends to the full uncompressed context, whereas AutoCompressor-Llama-2-7b-6k compresses each 2,048-token context segment into 50 summary vectors.

| Context Tokens               | 0    | 512  | 2048 | 4096 | 6144 |
| ---------------------------- | ---- | ---- | ---- | ---- | ---- |
| Pre-trained Llama-2-7b       | 5.52 | 5.15 | 4.98 | -    | -    |
| FullAttention-Llama-2-7b-6k  | 5.40 | 5.06 | 4.88 | 4.80 | 4.76 |
| AutoCompressor-Llama-2-7b-6k | 5.40 | 5.16 | 5.11 | 5.08 | 5.07 |

See [Adapting Language Models to Compress Contexts](https://arxiv.org/abs/2305.14788) for more evaluations, including evaluation on 11 in-context learning tasks.

## Bibtex

```
@misc{chevalier2023adapting,
      title={Adapting Language Models to Compress Contexts},
      author={Alexis Chevalier and Alexander Wettig and Anirudh Ajith and Danqi Chen},
      year={2023},
      eprint={2305.14788},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
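
**Usage sketch**

The snippet below is a minimal sketch of the workflow described above: compressing a context segment into summary vectors and conditioning generation on them instead of the raw tokens. It assumes the `output_softprompt` flag, the `.softprompt` output field, and the `softprompt` keyword of `generate` exposed by `LlamaAutoCompressorModel` in the AutoCompressors repository; the context and prompt strings are placeholders.

```
import torch
from transformers import AutoTokenizer
from auto_compressor_llama import LlamaAutoCompressorModel  # from the AutoCompressors repository

tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/AutoCompressor-Llama-2-7b-6k")
model = LlamaAutoCompressorModel.from_pretrained(
    "princeton-nlp/AutoCompressor-Llama-2-7b-6k", torch_dtype=torch.bfloat16
).eval().cuda()

# Placeholder text; in practice the context would be a long document.
context = "AutoCompressors compress long documents into compact summary vectors."
prompt = "AutoCompressors are"

context_tokens = tokenizer(context, add_special_tokens=False, return_tensors="pt").input_ids.cuda()
prompt_tokens = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").input_ids.cuda()

# Compress the context into 50 summary vectors
# (assumes the repository's `output_softprompt` / `.softprompt` interface).
summary_vectors = model(context_tokens, output_softprompt=True).softprompt
print(f"Compressed {context_tokens.size(1)} tokens into {summary_vectors.size(1)} summary vectors")

# Condition generation on the summary vectors rather than the uncompressed context
# (assumes the `softprompt` keyword of `generate`).
output = model.generate(prompt_tokens, softprompt=summary_vectors, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0]))
```

Because each 2,048-token segment is replaced by 50 summary vectors, the effective prompt passed to the model is far shorter than the original context, which is what enables the long-context perplexity results reported in the table above.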