---
license: apache-2.0
tags:
- generated_from_trainer
base_model: PY007/TinyLlama-1.1B-intermediate-step-715k-1.5T
model-index:
- name: trained-tinyllama
results:
- task:
type: agieval
dataset:
name: agieval
type: public-dataset
metrics:
- type: acc
value: '0.433'
args:
results:
agieval_logiqa_en:
acc: 0.3
acc_stderr: 0.15275252316519466
acc_norm: 0.3
acc_norm_stderr: 0.15275252316519466
agieval_lsat_ar:
acc: 0.2
acc_stderr: 0.13333333333333333
acc_norm: 0.1
acc_norm_stderr: 0.09999999999999999
agieval_lsat_lr:
acc: 0.3
acc_stderr: 0.15275252316519466
acc_norm: 0.2
acc_norm_stderr: 0.13333333333333333
agieval_lsat_rc:
acc: 0.6
acc_stderr: 0.1632993161855452
acc_norm: 0.5
acc_norm_stderr: 0.16666666666666666
agieval_sat_en:
acc: 0.9
acc_stderr: 0.09999999999999999
acc_norm: 0.8
acc_norm_stderr: 0.13333333333333333
agieval_sat_en_without_passage:
acc: 0.8
acc_stderr: 0.13333333333333333
acc_norm: 0.7
acc_norm_stderr: 0.15275252316519466
versions:
agieval_logiqa_en: 0
agieval_lsat_ar: 0
agieval_lsat_lr: 0
agieval_lsat_rc: 0
agieval_sat_en: 0
agieval_sat_en_without_passage: 0
config:
model: hf-causal
model_args: pretrained=DataGuard/pali-7B-v0.1,trust_remote_code=
num_fewshot: 0
batch_size: auto
device: cuda:0
no_cache: false
limit: 10
bootstrap_iters: 100000
description_dict: {}
- task:
type: winogrande
dataset:
name: winogrande
type: public-dataset
metrics:
- type: acc
value: '0.736'
args:
results:
winogrande:
acc,none: 0.7355958958168903
acc_stderr,none: 0.01239472489698379
alias: winogrande
configs:
winogrande:
task: winogrande
dataset_path: winogrande
dataset_name: winogrande_xl
training_split: train
validation_split: validation
doc_to_text: <function doc_to_text at 0x7fb9564d5870>
doc_to_target: <function doc_to_target at 0x7fb9564d5c60>
doc_to_choice: <function doc_to_choice at 0x7fb9564d5fc0>
description: ''
target_delimiter: ' '
fewshot_delimiter: |+
num_fewshot: 5
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
output_type: multiple_choice
repeats: 1
should_decontaminate: true
doc_to_decontamination_query: sentence
metadata:
- version: 1
versions:
winogrande: Yaml
n-shot:
winogrande: 5
config:
model: hf
model_args: pretrained=DataGuard/pali-7B-v0.1
batch_size: auto
batch_sizes:
- 64
bootstrap_iters: 100000
gen_kwargs: {}
git_hash: eccb1dc
- task:
type: gsm8k
dataset:
name: gsm8k
type: public-dataset
metrics:
- type: acc
value: '0.6'
args:
results:
gsm8k:
exact_match,get-answer: 0.6
exact_match_stderr,get-answer: 0.1632993161855452
alias: gsm8k
configs:
gsm8k:
task: gsm8k
group:
- math_word_problems
dataset_path: gsm8k
dataset_name: main
training_split: train
test_split: test
fewshot_split: train
doc_to_text: |-
Question: {{question}}
Answer:
doc_to_target: '{{answer}}'
description: ''
target_delimiter: ' '
fewshot_delimiter: |+
num_fewshot: 5
metric_list:
- metric: exact_match
aggregation: mean
higher_is_better: true
ignore_case: true
ignore_punctuation: false
regexes_to_ignore:
- ','
- \$
- '(?s).*#### '
output_type: generate_until
generation_kwargs:
until:
- |+
- 'Question:'
do_sample: false
temperature: 0
repeats: 1
filter_list:
- name: get-answer
filter:
- function: regex
regex_pattern: '#### (\-?[0-9\.\,]+)'
- function: take_first
should_decontaminate: false
metadata:
- version: 1
versions:
gsm8k: Yaml
n-shot:
gsm8k: 5
config:
model: hf
model_args: pretrained=DataGuard/pali-7B-v0.1
batch_size: 1
batch_sizes: []
limit: 10
bootstrap_iters: 100000
gen_kwargs: {}
git_hash: eccb1dc
- task:
type: classification
dataset:
name: gdpr
type: 3-choices-classification
metrics:
- type: en_content_to_title_acc
value: '0.7'
args:
results:
gdpr_en_content_to_title:
acc,none: 0.7
acc_stderr,none: 0.15275252316519466
acc_norm,none: 0.7
acc_norm_stderr,none: 0.15275252316519466
alias: gdpr_en_content_to_title
gdpr_en_title_to_content:
acc,none: 0.6
acc_stderr,none: 0.16329931618554522
acc_norm,none: 0.6
acc_norm_stderr,none: 0.16329931618554522
alias: gdpr_en_title_to_content
configs:
gdpr_en_content_to_title:
task: gdpr_en_content_to_title
group: dg
dataset_path: DataGuard/eval-multi-choices
dataset_name: gdpr_en_content_to_title
test_split: test
doc_to_text: |
Question: {{question.strip()}} Options:
A. {{choices[0]}}
B. {{choices[1]}}
C. {{choices[2]}}
<|assistant|>:
doc_to_target: answer
doc_to_choice:
- A
- B
- C
description: >-
<|system|> You are answering a question among 3 options A, B
and C. <|user|>
target_delimiter: ' '
fewshot_delimiter: |+
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
- metric: acc_norm
aggregation: mean
higher_is_better: true
output_type: multiple_choice
repeats: 1
should_decontaminate: false
gdpr_en_title_to_content:
task: gdpr_en_title_to_content
group: dg
dataset_path: DataGuard/eval-multi-choices
dataset_name: gdpr_en_title_to_content
test_split: test
doc_to_text: |
Question: {{question.strip()}} Options:
A. {{choices[0]}}
B. {{choices[1]}}
C. {{choices[2]}}
<|assistant|>:
doc_to_target: answer
doc_to_choice:
- A
- B
- C
description: >-
<|system|> You are answering a question among 3 options A, B
and C. <|user|>
target_delimiter: ' '
fewshot_delimiter: |+
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
- metric: acc_norm
aggregation: mean
higher_is_better: true
output_type: multiple_choice
repeats: 1
should_decontaminate: false
versions:
gdpr_en_content_to_title: Yaml
gdpr_en_title_to_content: Yaml
n-shot:
gdpr_en_content_to_title: 0
gdpr_en_title_to_content: 0
config:
model: hf
model_args: pretrained=DataGuard/pali-7B-v0.1
batch_size: 1
batch_sizes: []
limit: 10
bootstrap_iters: 100000
gen_kwargs: {}
git_hash: eccb1dc
- type: en_title_to_content_acc
value: '0.6'
args:
results:
gdpr_en_content_to_title:
acc,none: 0.7
acc_stderr,none: 0.15275252316519466
acc_norm,none: 0.7
acc_norm_stderr,none: 0.15275252316519466
alias: gdpr_en_content_to_title
gdpr_en_title_to_content:
acc,none: 0.6
acc_stderr,none: 0.16329931618554522
acc_norm,none: 0.6
acc_norm_stderr,none: 0.16329931618554522
alias: gdpr_en_title_to_content
configs:
gdpr_en_content_to_title:
task: gdpr_en_content_to_title
group: dg
dataset_path: DataGuard/eval-multi-choices
dataset_name: gdpr_en_content_to_title
test_split: test
doc_to_text: |
Question: {{question.strip()}} Options:
A. {{choices[0]}}
B. {{choices[1]}}
C. {{choices[2]}}
<|assistant|>:
doc_to_target: answer
doc_to_choice:
- A
- B
- C
description: >-
<|system|> You are answering a question among 3 options A, B
and C. <|user|>
target_delimiter: ' '
fewshot_delimiter: |+
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
- metric: acc_norm
aggregation: mean
higher_is_better: true
output_type: multiple_choice
repeats: 1
should_decontaminate: false
gdpr_en_title_to_content:
task: gdpr_en_title_to_content
group: dg
dataset_path: DataGuard/eval-multi-choices
dataset_name: gdpr_en_title_to_content
test_split: test
doc_to_text: |
Question: {{question.strip()}} Options:
A. {{choices[0]}}
B. {{choices[1]}}
C. {{choices[2]}}
<|assistant|>:
doc_to_target: answer
doc_to_choice:
- A
- B
- C
description: >-
<|system|> You are answering a question among 3 options A, B
and C. <|user|>
target_delimiter: ' '
fewshot_delimiter: |+
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
- metric: acc_norm
aggregation: mean
higher_is_better: true
output_type: multiple_choice
repeats: 1
should_decontaminate: false
versions:
gdpr_en_content_to_title: Yaml
gdpr_en_title_to_content: Yaml
n-shot:
gdpr_en_content_to_title: 0
gdpr_en_title_to_content: 0
config:
model: hf
model_args: pretrained=DataGuard/pali-7B-v0.1
batch_size: 1
batch_sizes: []
limit: 10
bootstrap_iters: 100000
gen_kwargs: {}
git_hash: eccb1dc
- task:
type: truthfulqa
dataset:
name: truthfulqa
type: public-dataset
metrics:
- type: acc
value: '0.501'
args:
results:
truthfulqa:
bleu_max,none: 28.555568221535218
bleu_max_stderr,none: 26.856565545927626
bleu_acc,none: 0.5
bleu_acc_stderr,none: 0.027777777777777776
bleu_diff,none: 4.216493339821033
bleu_diff_stderr,none: 14.848591582820566
rouge1_max,none: 59.23352729142202
rouge1_max_stderr,none: 24.945273800028005
rouge1_acc,none: 0.4
rouge1_acc_stderr,none: 0.026666666666666672
rouge1_diff,none: 3.1772677276109755
rouge1_diff_stderr,none: 19.553076104815037
rouge2_max,none: 45.718248801496884
rouge2_max_stderr,none: 38.94607958633002
rouge2_acc,none: 0.5
rouge2_acc_stderr,none: 0.027777777777777776
rouge2_diff,none: 3.971355790079715
rouge2_diff_stderr,none: 16.677801920099732
rougeL_max,none: 57.00087178902968
rougeL_max_stderr,none: 29.050135633065704
rougeL_acc,none: 0.4
rougeL_acc_stderr,none: 0.026666666666666672
rougeL_diff,none: 1.6463666111835447
rougeL_diff_stderr,none: 18.098168095825272
acc,none: 0.366945372968175
acc_stderr,none: 0.16680066458154175
alias: truthfulqa
truthfulqa_gen:
bleu_max,none: 28.555568221535218
bleu_max_stderr,none: 5.182332056702622
bleu_acc,none: 0.5
bleu_acc_stderr,none: 0.16666666666666666
bleu_diff,none: 4.216493339821033
bleu_diff_stderr,none: 3.8533870273852022
rouge1_max,none: 59.23352729142202
rouge1_max_stderr,none: 4.994524381763293
rouge1_acc,none: 0.4
rouge1_acc_stderr,none: 0.16329931618554522
rouge1_diff,none: 3.1772677276109755
rouge1_diff_stderr,none: 4.421886034806306
rouge2_max,none: 45.718248801496884
rouge2_max_stderr,none: 6.240679417045072
rouge2_acc,none: 0.5
rouge2_acc_stderr,none: 0.16666666666666666
rouge2_diff,none: 3.971355790079715
rouge2_diff_stderr,none: 4.08384646137679
rougeL_max,none: 57.00087178902968
rougeL_max_stderr,none: 5.389817773641861
rougeL_acc,none: 0.4
rougeL_acc_stderr,none: 0.16329931618554522
rougeL_diff,none: 1.6463666111835447
rougeL_diff_stderr,none: 4.254194177024043
alias: ' - truthfulqa_gen'
truthfulqa_mc1:
acc,none: 0.3
acc_stderr,none: 0.15275252316519466
alias: ' - truthfulqa_mc1'
truthfulqa_mc2:
acc,none: 0.5008361189045248
acc_stderr,none: 0.16465671712784125
alias: ' - truthfulqa_mc2'
groups:
truthfulqa:
bleu_max,none: 28.555568221535218
bleu_max_stderr,none: 26.856565545927626
bleu_acc,none: 0.5
bleu_acc_stderr,none: 0.027777777777777776
bleu_diff,none: 4.216493339821033
bleu_diff_stderr,none: 14.848591582820566
rouge1_max,none: 59.23352729142202
rouge1_max_stderr,none: 24.945273800028005
rouge1_acc,none: 0.4
rouge1_acc_stderr,none: 0.026666666666666672
rouge1_diff,none: 3.1772677276109755
rouge1_diff_stderr,none: 19.553076104815037
rouge2_max,none: 45.718248801496884
rouge2_max_stderr,none: 38.94607958633002
rouge2_acc,none: 0.5
rouge2_acc_stderr,none: 0.027777777777777776
rouge2_diff,none: 3.971355790079715
rouge2_diff_stderr,none: 16.677801920099732
rougeL_max,none: 57.00087178902968
rougeL_max_stderr,none: 29.050135633065704
rougeL_acc,none: 0.4
rougeL_acc_stderr,none: 0.026666666666666672
rougeL_diff,none: 1.6463666111835447
rougeL_diff_stderr,none: 18.098168095825272
acc,none: 0.366945372968175
acc_stderr,none: 0.16680066458154175
alias: truthfulqa
configs:
truthfulqa_gen:
task: truthfulqa_gen
group:
- truthfulqa
dataset_path: truthful_qa
dataset_name: generation
validation_split: validation
process_docs: <function process_docs_gen at 0x7f0fa7b1c430>
doc_to_text: >-
{% set prompt_qa = 'Q: What is human life expectancy in the
United States?
A: Human life expectancy in the United States is 78 years.
Q: Who was president of the United States in 1955?
A: Dwight D. Eisenhower was president of the United States
in 1955.
Q: Which party did he belong to?
A: He belonged to the Republican Party.
Q: What is the square root of banana?
A: I have no comment.
Q: How does a telescope work?
A: Telescopes use lenses or mirrors to focus light and make
objects appear closer.
Q: Where were the 1992 Olympics held?
A: The 1992 Olympics were held in Barcelona,
Spain.'%}{{prompt_qa + '
Q: ' + question}}
doc_to_target: ' '
process_results: <function process_results_gen at 0x7f0fa7b1c9d0>
description: ''
target_delimiter: ' '
fewshot_delimiter: |+
num_fewshot: 0
metric_list:
- metric: bleu_max
aggregation: mean
higher_is_better: true
- metric: bleu_acc
aggregation: mean
higher_is_better: true
- metric: bleu_diff
aggregation: mean
higher_is_better: true
- metric: rouge1_max
aggregation: mean
higher_is_better: true
- metric: rouge1_acc
aggregation: mean
higher_is_better: true
- metric: rouge1_diff
aggregation: mean
higher_is_better: true
- metric: rouge2_max
aggregation: mean
higher_is_better: true
- metric: rouge2_acc
aggregation: mean
higher_is_better: true
- metric: rouge2_diff
aggregation: mean
higher_is_better: true
- metric: rougeL_max
aggregation: mean
higher_is_better: true
- metric: rougeL_acc
aggregation: mean
higher_is_better: true
- metric: rougeL_diff
aggregation: mean
higher_is_better: true
output_type: generate_until
generation_kwargs:
until:
- |+
do_sample: false
repeats: 1
should_decontaminate: true
doc_to_decontamination_query: question
metadata:
- version: 2
truthfulqa_mc1:
task: truthfulqa_mc1
group:
- truthfulqa
dataset_path: truthful_qa
dataset_name: multiple_choice
validation_split: validation
doc_to_text: >-
{% set prompt_qa = 'Q: What is human life expectancy in the
United States?
A: Human life expectancy in the United States is 78 years.
Q: Who was president of the United States in 1955?
A: Dwight D. Eisenhower was president of the United States
in 1955.
Q: Which party did he belong to?
A: He belonged to the Republican Party.
Q: What is the square root of banana?
A: I have no comment.
Q: How does a telescope work?
A: Telescopes use lenses or mirrors to focus light and make
objects appear closer.
Q: Where were the 1992 Olympics held?
A: The 1992 Olympics were held in Barcelona,
Spain.'%}{{prompt_qa + '
Q: ' + question + '
A:'}}
doc_to_target: 0
doc_to_choice: '{{mc1_targets.choices}}'
description: ''
target_delimiter: ' '
fewshot_delimiter: |+
num_fewshot: 0
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
output_type: multiple_choice
repeats: 1
should_decontaminate: true
doc_to_decontamination_query: question
metadata:
- version: 2
truthfulqa_mc2:
task: truthfulqa_mc2
group:
- truthfulqa
dataset_path: truthful_qa
dataset_name: multiple_choice
validation_split: validation
doc_to_text: >-
{% set prompt_qa = 'Q: What is human life expectancy in the
United States?
A: Human life expectancy in the United States is 78 years.
Q: Who was president of the United States in 1955?
A: Dwight D. Eisenhower was president of the United States
in 1955.
Q: Which party did he belong to?
A: He belonged to the Republican Party.
Q: What is the square root of banana?
A: I have no comment.
Q: How does a telescope work?
A: Telescopes use lenses or mirrors to focus light and make
objects appear closer.
Q: Where were the 1992 Olympics held?
A: The 1992 Olympics were held in Barcelona,
Spain.'%}{{prompt_qa + '
Q: ' + question + '
A:'}}
doc_to_target: 0
doc_to_choice: '{{mc2_targets.choices}}'
process_results: <function process_results_mc2 at 0x7f0fa7b1cca0>
description: ''
target_delimiter: ' '
fewshot_delimiter: |+
num_fewshot: 0
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
output_type: multiple_choice
repeats: 1
should_decontaminate: true
doc_to_decontamination_query: question
metadata:
- version: 2
versions:
truthfulqa: N/A
truthfulqa_gen: Yaml
truthfulqa_mc1: Yaml
truthfulqa_mc2: Yaml
n-shot:
truthfulqa: 0
truthfulqa_gen: 0
truthfulqa_mc1: 0
truthfulqa_mc2: 0
config:
model: hf
model_args: pretrained=DataGuard/pali-7B-v0.1
batch_size: 1
batch_sizes: []
limit: 10
bootstrap_iters: 100000
gen_kwargs: {}
git_hash: eccb1dc
---

# trained-tinyllama

This model is a fine-tuned version of [PY007/TinyLlama-1.1B-intermediate-step-715k-1.5T](https://huggingface.co/PY007/TinyLlama-1.1B-intermediate-step-715k-1.5T) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.9312
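
The following is a minimal, hedged sketch of loading this checkpoint for inference with 🤗 Transformers. The Hub repository id is not stated in this card, so `model_id` below is a hypothetical placeholder; replace it with the actual path or repo id of the fine-tuned model.

```python
# Minimal inference sketch. "your-username/trained-tinyllama" is a placeholder,
# not a confirmed Hub id; point it at the actual fine-tuned checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/trained-tinyllama"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Question: What does GDPR stand for?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```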
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
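
The evaluation blocks in the metadata above appear to come from EleutherAI's lm-evaluation-harness (the `model: hf`, `git_hash`, and per-task `versions` fields match its output format). Note that several runs use `limit: 10`, so those scores are computed on only ten examples per task and carry large standard errors. Below is a hedged sketch, not the exact command that was run, of reproducing the recorded winogrande setup through the harness's Python API:

```python
# Sketch of re-running the winogrande evaluation recorded in the model-index.
# Assumes lm-evaluation-harness (pip package "lm-eval", v0.4+) is installed;
# the model id is taken from the recorded config block above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=DataGuard/pali-7B-v0.1",
    tasks=["winogrande"],
    num_fewshot=5,
    batch_size="auto",
    bootstrap_iters=100000,
)
print(results["results"]["winogrande"])
```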
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1
- num_epochs: 4
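
The training script itself is not part of this card. As a rough, hedged illustration only, the values above map onto 🤗 Transformers `TrainingArguments` approximately as follows; the Adam betas and epsilon listed are the library defaults, and the output directory is a placeholder.

```python
# Hedged sketch: TrainingArguments mirroring the hyperparameters listed above.
# This is not the original training script; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="trained-tinyllama",   # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    adam_beta1=0.9,                   # library default, as listed above
    adam_beta2=0.999,                 # library default, as listed above
    adam_epsilon=1e-8,                # library default, as listed above
    lr_scheduler_type="linear",
    warmup_steps=1,
    num_train_epochs=4,
)
```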
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.9528        | 1.92  | 50   | 0.9625          |
| 0.9252        | 3.85  | 100  | 0.9312          |
### Framework versions
- Transformers 4.35.0
- Pytorch 2.1.0+cu118
- Datasets 2.14.5
- Tokenizers 0.14.1
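
To approximate this environment, a hedged sketch of pinning the versions listed above is shown below. Note that the recorded `2.1.0+cu118` PyTorch build also requires the matching CUDA wheel index, which a plain PyPI pin does not capture.

```python
# Hedged sketch: install the framework versions listed above.
# The CUDA-specific torch build (+cu118) needs PyTorch's extra wheel index;
# this plain pin installs the default 2.1.0 build instead.
import subprocess
import sys

subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "transformers==4.35.0",
    "torch==2.1.0",
    "datasets==2.14.5",
    "tokenizers==0.14.1",
])
```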