metadata
license: mit
datasets:
- gbharti/finance-alpaca
- lavita/ChatDoctor-HealthCareMagic-100k
- laion/OIG
- openai/webgpt_comparisons
- taskydata/GPT4Tools
- DataProvenanceInitiative/cot_submix_original
- 0x70DA/stackoverflow-chat-data
language:
- en
library_name: adapter-transformers
pipeline_tag: text-classification
Attempt to reproduce the Mixture-of-LoRAs classifier
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
https://arxiv.org/pdf/2403.03432
Datasets
We sample about 10k training examples and 2k validation examples from each dataset, so that every dataset contributes roughly equally.
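The per-dataset sampling can be sketched roughly as follows. This is only an illustration, not the script used for this model: the helper name `sample_split`, the fixed seed, and the proportional fallback for datasets smaller than 12k examples are all assumptions.

```python
import random


def sample_split(examples, n_train=10_000, n_val=2_000, seed=0):
    """Draw disjoint train/validation subsets from one dataset.

    `examples` is any sequence (list of texts, list of dicts, ...).
    Hypothetical helper; not taken from the original training code.
    """
    rng = random.Random(seed)
    wanted = n_train + n_val
    if len(examples) < wanted:
        # Assumed fallback: keep the same train/validation ratio
        # when the dataset is too small for the full sample.
        scale = len(examples) / wanted
        n_train, n_val = int(n_train * scale), int(n_val * scale)
    # Sample without replacement so the two subsets never overlap.
    indices = rng.sample(range(len(examples)), n_train + n_val)
    train = [examples[i] for i in indices[:n_train]]
    val = [examples[i] for i in indices[n_train:]]
    return train, val


# Toy usage with a small stand-in dataset.
toy = [f"example {i}" for i in range(100)]
train, val = sample_split(toy, n_train=10, n_val=2)
```

Applying the same function with the same sizes to every dataset is what keeps the classes balanced; the seed only makes the draw repeatable.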
From laion/OIG, only the following files were used:
- unified_merged_code_xp3.jsonl
- unified_grade_school_math_instructions.jsonl
- unified_mathqa_flanv2_kojma_cot.jsonl