🌐 NoMIRACL Dataset [EMNLP'24]

nthakur 's Collections

updated 9 days ago

A collection of multilingual relevance assessment datasets. We also have SFT fine-tuned models (Mistral-7B & Llama-3 8B)

NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation

Paper • 2312.11361 • Published Dec 18, 2023 • 1
miracl/nomiracl

Updated 9 days ago • 286 • 11

Note This is the NoMIRACL evaluation dataset (contains both relevant and non-relevant subsets); used for relevance assessment of multilingual LLMs.
miracl/nomiracl-instruct

Viewer • Updated 9 days ago • 23.9k • 77

Note This is the instruct version of NoMIRACL dataset -- can be used for finetuning LLMs for multilingual relevance.
miracl/miracl-corpus

Viewer • Updated Jan 5, 2023 • 77.2M • 2.67k • 44
nthakur/Mistral-7B-Instruct-v0.2-nomiracl-sft

Updated Jul 26

Note Fine-tuned Mistral-7B-Instruct-v0.2 version on the NoMIRACL instruct dataset -- More robust than Llama-3 & Mistral-7B Instruct v0.3.
nthakur/Meta-Llama-3-8B-Instruct-nomiracl-sft

Updated Jul 26 • 8

Note Fine-tuned Llama-3-8B-Instruct version on the NoMIRACL instruct dataset.
nthakur/Mistral-7B-Instruct-v0.3-nomiracl-sft

Updated Jul 26 • 6

Note Fine-tuned Mistral-7B-Instruct-v0.3 version on the NoMIRACL instruct dataset.