Llama-2-7B finetuned in three stages:
- 1B tokens of CulturaX (75% Estonian, 25% English)
- 1M English->Estonian sentence pairs from CCMatrix (500,000), WikiMatrix (400,000), Europarl (50,000), and OpenSubtitles (50,000), formatted as Alpaca-style translation instructions
- Alpaca-cleaned and Alpaca-est (~50,000 instructions each)
Alpaca-est is an instruction dataset generated for Estonian with gpt-3.5-turbo-0613, following Alpaca.
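
As a rough illustration of what "Alpaca-style translation instructions" could look like, the sketch below wraps a sentence pair in the prompt template published with Stanford Alpaca. The instruction wording (`TRANSLATE_INSTRUCTION`) and the helper names are hypothetical, and this card does not confirm that this exact template or phrasing was used for training. The per-corpus counts are taken from the list above.

```python
# Prompt template (with-input variant) as published by the Stanford Alpaca project.
# Assumption: the card only says "Alpaca-style"; the exact template is not stated.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

# Hypothetical instruction text; the real wording is not given in this card.
TRANSLATE_INSTRUCTION = "Translate the following text into Estonian."

# Sentence-pair counts for the translation stage, as listed in this card.
STAGE2_MIX = {
    "CCMatrix": 500_000,
    "WikiMatrix": 400_000,
    "Europarl": 50_000,
    "OpenSubtitles": 50_000,
}


def to_alpaca_record(english: str, estonian: str) -> dict:
    """Turn one English->Estonian sentence pair into an Alpaca-style record."""
    return {
        "instruction": TRANSLATE_INSTRUCTION,
        "input": english,
        "output": estonian,
    }


def render_prompt(record: dict) -> str:
    """Fill the template with the record's instruction and input (no output)."""
    return ALPACA_TEMPLATE.format(
        instruction=record["instruction"], input=record["input"]
    )


if __name__ == "__main__":
    record = to_alpaca_record("Good morning!", "Tere hommikust!")
    print(render_prompt(record) + record["output"])
    print("Total sentence pairs:", sum(STAGE2_MIX.values()))
```

At inference time, the same `render_prompt` output (without the target sentence appended) would serve as the prompt, provided the model was in fact trained on this template.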