---
license: apache-2.0
tags:
- generated_from_trainer
base_model: PY007/TinyLlama-1.1B-intermediate-step-715k-1.5T
model-index:
- name: trained-tinyllama
  results:
  - task:
      type: agieval
    dataset:
      name: agieval
      type: public-dataset
    metrics:
    - type: acc
      value: '0.433'
      args:
        results:
          agieval_logiqa_en:
            acc: 0.3
            acc_stderr: 0.15275252316519466
            acc_norm: 0.3
            acc_norm_stderr: 0.15275252316519466
          agieval_lsat_ar:
            acc: 0.2
            acc_stderr: 0.13333333333333333
            acc_norm: 0.1
            acc_norm_stderr: 0.09999999999999999
          agieval_lsat_lr:
            acc: 0.3
            acc_stderr: 0.15275252316519466
            acc_norm: 0.2
            acc_norm_stderr: 0.13333333333333333
          agieval_lsat_rc:
            acc: 0.6
            acc_stderr: 0.1632993161855452
            acc_norm: 0.5
            acc_norm_stderr: 0.16666666666666666
          agieval_sat_en:
            acc: 0.9
            acc_stderr: 0.09999999999999999
            acc_norm: 0.8
            acc_norm_stderr: 0.13333333333333333
          agieval_sat_en_without_passage:
            acc: 0.8
            acc_stderr: 0.13333333333333333
            acc_norm: 0.7
            acc_norm_stderr: 0.15275252316519466
        versions:
          agieval_logiqa_en: 0
          agieval_lsat_ar: 0
          agieval_lsat_lr: 0
          agieval_lsat_rc: 0
          agieval_sat_en: 0
          agieval_sat_en_without_passage: 0
        config:
          model: hf-causal
          model_args: pretrained=DataGuard/pali-7B-v0.1,trust_remote_code=
          num_fewshot: 0
          batch_size: auto
          device: cuda:0
          no_cache: false
          limit: 10.0
          bootstrap_iters: 100000
          description_dict: {}
  - task:
      type: winogrande
    dataset:
      name: winogrande
      type: public-dataset
    metrics:
    - type: acc
      value: '0.736'
      args:
        results:
          winogrande:
            acc,none: 0.7355958958168903
            acc_stderr,none: 0.01239472489698379
            alias: winogrande
        configs:
          winogrande:
            task: winogrande
            dataset_path: winogrande
            dataset_name: winogrande_xl
            training_split: train
            validation_split: validation
            doc_to_text:
            doc_to_target:
            doc_to_choice:
            description: ''
            target_delimiter: ' '
            fewshot_delimiter: ' '
            num_fewshot: 5
            metric_list:
            - metric: acc
              aggregation: mean
              higher_is_better: true
            output_type: multiple_choice
            repeats: 1
            should_decontaminate: true
            doc_to_decontamination_query: sentence
            metadata:
            - version: 1.0
        versions:
          winogrande: Yaml
        n-shot:
          winogrande: 5
        config:
          model: hf
          model_args: pretrained=DataGuard/pali-7B-v0.1
          batch_size: auto
          batch_sizes:
          - 64
          bootstrap_iters: 100000
          gen_kwargs: {}
          git_hash: eccb1dc
  - task:
      type: gsm8k
    dataset:
      name: gsm8k
      type: public-dataset
    metrics:
    - type: acc
      value: '0.6'
      args:
        results:
          gsm8k:
            exact_match,get-answer: 0.6
            exact_match_stderr,get-answer: 0.1632993161855452
            alias: gsm8k
        configs:
          gsm8k:
            task: gsm8k
            group:
            - math_word_problems
            dataset_path: gsm8k
            dataset_name: main
            training_split: train
            test_split: test
            fewshot_split: train
            doc_to_text: 'Question: {{question}} Answer:'
            doc_to_target: '{{answer}}'
            description: ''
            target_delimiter: ' '
            fewshot_delimiter: ' '
            num_fewshot: 5
            metric_list:
            - metric: exact_match
              aggregation: mean
              higher_is_better: true
              ignore_case: true
              ignore_punctuation: false
              regexes_to_ignore:
              - ','
              - \$
              - '(?s).*#### '
            output_type: generate_until
            generation_kwargs:
              until:
              - ' '
              - 'Question:'
              do_sample: false
              temperature: 0.0
            repeats: 1
            filter_list:
            - name: get-answer
              filter:
              - function: regex
                regex_pattern: '#### (\-?[0-9\.\,]+)'
              - function: take_first
            should_decontaminate: false
            metadata:
            - version: 1.0
        versions:
          gsm8k: Yaml
        n-shot:
          gsm8k: 5
        config:
          model: hf
          model_args: pretrained=DataGuard/pali-7B-v0.1
          batch_size: 1
          batch_sizes: []
          limit: 10.0
          bootstrap_iters: 100000
          gen_kwargs: {}
          git_hash: eccb1dc
  - task:
      type: classification
    dataset:
      name: gdpr
      type: 3-choices-classification
    metrics:
    - type: en_content_to_title_acc
      value: '0.7'
      args:
        results:
          gdpr_en_content_to_title:
            acc,none: 0.7
            acc_stderr,none: 0.15275252316519466
            acc_norm,none: 0.7
            acc_norm_stderr,none: 0.15275252316519466
            alias: gdpr_en_content_to_title
          gdpr_en_title_to_content:
            acc,none: 0.6
            acc_stderr,none: 0.16329931618554522
            acc_norm,none: 0.6
            acc_norm_stderr,none: 0.16329931618554522
            alias: gdpr_en_title_to_content
        configs:
          gdpr_en_content_to_title:
            task: gdpr_en_content_to_title
            group: dg
            dataset_path: DataGuard/eval-multi-choices
            dataset_name: gdpr_en_content_to_title
            test_split: test
            doc_to_text: 'Question: {{question.strip()}} Options: A. {{choices[0]}} B. {{choices[1]}} C. {{choices[2]}} <|assisstant|>: '
            doc_to_target: answer
            doc_to_choice:
            - A
            - B
            - C
            description: '<|system|> You are answering a question among 3 options A, B and C. <|user|> '
            target_delimiter: ' '
            fewshot_delimiter: ' '
            metric_list:
            - metric: acc
              aggregation: mean
              higher_is_better: true
            - metric: acc_norm
              aggregation: mean
              higher_is_better: true
            output_type: multiple_choice
            repeats: 1
            should_decontaminate: false
          gdpr_en_title_to_content:
            task: gdpr_en_title_to_content
            group: dg
            dataset_path: DataGuard/eval-multi-choices
            dataset_name: gdpr_en_title_to_content
            test_split: test
            doc_to_text: 'Question: {{question.strip()}} Options: A. {{choices[0]}} B. {{choices[1]}} C. {{choices[2]}} <|assisstant|>: '
            doc_to_target: answer
            doc_to_choice:
            - A
            - B
            - C
            description: '<|system|> You are answering a question among 3 options A, B and C. <|user|> '
            target_delimiter: ' '
            fewshot_delimiter: ' '
            metric_list:
            - metric: acc
              aggregation: mean
              higher_is_better: true
            - metric: acc_norm
              aggregation: mean
              higher_is_better: true
            output_type: multiple_choice
            repeats: 1
            should_decontaminate: false
        versions:
          gdpr_en_content_to_title: Yaml
          gdpr_en_title_to_content: Yaml
        n-shot:
          gdpr_en_content_to_title: 0
          gdpr_en_title_to_content: 0
        config:
          model: hf
          model_args: pretrained=DataGuard/pali-7B-v0.1
          batch_size: 1
          batch_sizes: []
          limit: 10.0
          bootstrap_iters: 100000
          gen_kwargs: {}
          git_hash: eccb1dc
    - type: en_title_to_content_acc
      value: '0.6'
      args:
        results:
          gdpr_en_content_to_title:
            acc,none: 0.7
            acc_stderr,none: 0.15275252316519466
            acc_norm,none: 0.7
            acc_norm_stderr,none: 0.15275252316519466
            alias: gdpr_en_content_to_title
          gdpr_en_title_to_content:
            acc,none: 0.6
            acc_stderr,none: 0.16329931618554522
            acc_norm,none: 0.6
            acc_norm_stderr,none: 0.16329931618554522
            alias: gdpr_en_title_to_content
        configs:
          gdpr_en_content_to_title:
            task: gdpr_en_content_to_title
            group: dg
            dataset_path: DataGuard/eval-multi-choices
            dataset_name: gdpr_en_content_to_title
            test_split: test
            doc_to_text: 'Question: {{question.strip()}} Options: A. {{choices[0]}} B. {{choices[1]}} C. {{choices[2]}} <|assisstant|>: '
            doc_to_target: answer
            doc_to_choice:
            - A
            - B
            - C
            description: '<|system|> You are answering a question among 3 options A, B and C. <|user|> '
            target_delimiter: ' '
            fewshot_delimiter: ' '
            metric_list:
            - metric: acc
              aggregation: mean
              higher_is_better: true
            - metric: acc_norm
              aggregation: mean
              higher_is_better: true
            output_type: multiple_choice
            repeats: 1
            should_decontaminate: false
          gdpr_en_title_to_content:
            task: gdpr_en_title_to_content
            group: dg
            dataset_path: DataGuard/eval-multi-choices
            dataset_name: gdpr_en_title_to_content
            test_split: test
            doc_to_text: 'Question: {{question.strip()}} Options: A. {{choices[0]}} B. {{choices[1]}} C. {{choices[2]}} <|assisstant|>: '
            doc_to_target: answer
            doc_to_choice:
            - A
            - B
            - C
            description: '<|system|> You are answering a question among 3 options A, B and C. <|user|> '
            target_delimiter: ' '
            fewshot_delimiter: ' '
            metric_list:
            - metric: acc
              aggregation: mean
              higher_is_better: true
            - metric: acc_norm
              aggregation: mean
              higher_is_better: true
            output_type: multiple_choice
            repeats: 1
            should_decontaminate: false
        versions:
          gdpr_en_content_to_title: Yaml
          gdpr_en_title_to_content: Yaml
        n-shot:
          gdpr_en_content_to_title: 0
          gdpr_en_title_to_content: 0
        config:
          model: hf
          model_args: pretrained=DataGuard/pali-7B-v0.1
          batch_size: 1
          batch_sizes: []
          limit: 10.0
          bootstrap_iters: 100000
          gen_kwargs: {}
          git_hash: eccb1dc
  - task:
      type: truthfulqa
    dataset:
      name: truthfulqa
      type: public-dataset
    metrics:
    - type: acc
      value: '0.501'
      args:
        results:
          truthfulqa:
            bleu_max,none: 28.555568221535218
            bleu_max_stderr,none: 26.856565545927626
            bleu_acc,none: 0.5
            bleu_acc_stderr,none: 0.027777777777777776
            bleu_diff,none: 4.216493339821033
            bleu_diff_stderr,none: 14.848591582820566
            rouge1_max,none: 59.23352729142202
            rouge1_max_stderr,none: 24.945273800028005
            rouge1_acc,none: 0.4
            rouge1_acc_stderr,none: 0.026666666666666672
            rouge1_diff,none: 3.1772677276109755
            rouge1_diff_stderr,none: 19.553076104815037
            rouge2_max,none: 45.718248801496884
            rouge2_max_stderr,none: 38.94607958633002
            rouge2_acc,none: 0.5
            rouge2_acc_stderr,none: 0.027777777777777776
            rouge2_diff,none: 3.971355790079715
            rouge2_diff_stderr,none: 16.677801920099732
            rougeL_max,none: 57.00087178902968
            rougeL_max_stderr,none: 29.050135633065704
            rougeL_acc,none: 0.4
            rougeL_acc_stderr,none: 0.026666666666666672
            rougeL_diff,none: 1.6463666111835447
            rougeL_diff_stderr,none: 18.098168095825272
            acc,none: 0.366945372968175
            acc_stderr,none: 0.16680066458154175
            alias: truthfulqa
          truthfulqa_gen:
            bleu_max,none: 28.555568221535218
            bleu_max_stderr,none: 5.182332056702622
            bleu_acc,none: 0.5
            bleu_acc_stderr,none: 0.16666666666666666
            bleu_diff,none: 4.216493339821033
            bleu_diff_stderr,none: 3.8533870273852022
            rouge1_max,none: 59.23352729142202
            rouge1_max_stderr,none: 4.994524381763293
            rouge1_acc,none: 0.4
            rouge1_acc_stderr,none: 0.16329931618554522
            rouge1_diff,none: 3.1772677276109755
            rouge1_diff_stderr,none: 4.421886034806306
            rouge2_max,none: 45.718248801496884
            rouge2_max_stderr,none: 6.240679417045072
            rouge2_acc,none: 0.5
            rouge2_acc_stderr,none: 0.16666666666666666
            rouge2_diff,none: 3.971355790079715
            rouge2_diff_stderr,none: 4.08384646137679
            rougeL_max,none: 57.00087178902968
            rougeL_max_stderr,none: 5.389817773641861
            rougeL_acc,none: 0.4
            rougeL_acc_stderr,none: 0.16329931618554522
            rougeL_diff,none: 1.6463666111835447
            rougeL_diff_stderr,none: 4.254194177024043
            alias: ' - truthfulqa_gen'
          truthfulqa_mc1:
            acc,none: 0.3
            acc_stderr,none: 0.15275252316519466
            alias: ' - truthfulqa_mc1'
          truthfulqa_mc2:
            acc,none: 0.5008361189045248
            acc_stderr,none: 0.16465671712784125
            alias: ' - truthfulqa_mc2'
        groups:
          truthfulqa:
            bleu_max,none: 28.555568221535218
            bleu_max_stderr,none: 26.856565545927626
            bleu_acc,none: 0.5
            bleu_acc_stderr,none: 0.027777777777777776
            bleu_diff,none: 4.216493339821033
            bleu_diff_stderr,none: 14.848591582820566
            rouge1_max,none: 59.23352729142202
            rouge1_max_stderr,none: 24.945273800028005
            rouge1_acc,none: 0.4
            rouge1_acc_stderr,none: 0.026666666666666672
            rouge1_diff,none: 3.1772677276109755
            rouge1_diff_stderr,none: 19.553076104815037
            rouge2_max,none: 45.718248801496884
            rouge2_max_stderr,none: 38.94607958633002
            rouge2_acc,none: 0.5
            rouge2_acc_stderr,none: 0.027777777777777776
            rouge2_diff,none: 3.971355790079715
            rouge2_diff_stderr,none: 16.677801920099732
            rougeL_max,none: 57.00087178902968
            rougeL_max_stderr,none: 29.050135633065704
            rougeL_acc,none: 0.4
            rougeL_acc_stderr,none: 0.026666666666666672
            rougeL_diff,none: 1.6463666111835447
            rougeL_diff_stderr,none: 18.098168095825272
            acc,none: 0.366945372968175
            acc_stderr,none: 0.16680066458154175
            alias: truthfulqa
        configs:
          truthfulqa_gen:
            task: truthfulqa_gen
            group:
            - truthfulqa
            dataset_path: truthful_qa
            dataset_name: generation
            validation_split: validation
            process_docs:
            doc_to_text: '{% set prompt_qa = ''Q: What is human life expectancy in the United
              States? A: Human life expectancy in the United States is 78 years. Q: Who was
              president of the United States in 1955? A: Dwight D. Eisenhower was president of
              the United States in 1955. Q: Which party did he belong to? A: He belonged to
              the Republican Party. Q: What is the square root of banana? A: I have no comment.
              Q: How does a telescope work? A: Telescopes use lenses or mirrors to focus light
              and make objects appear closer. Q: Where were the 1992 Olympics held? A: The 1992
              Olympics were held in Barcelona, Spain.''%}{{prompt_qa + '' Q: '' + question}}'
            doc_to_target: ' '
            process_results:
            description: ''
            target_delimiter: ' '
            fewshot_delimiter: ' '
            num_fewshot: 0
            metric_list:
            - metric: bleu_max
              aggregation: mean
              higher_is_better: true
            - metric: bleu_acc
              aggregation: mean
              higher_is_better: true
            - metric: bleu_diff
              aggregation: mean
              higher_is_better: true
            - metric: rouge1_max
              aggregation: mean
              higher_is_better: true
            - metric: rouge1_acc
              aggregation: mean
              higher_is_better: true
            - metric: rouge1_diff
              aggregation: mean
              higher_is_better: true
            - metric: rouge2_max
              aggregation: mean
              higher_is_better: true
            - metric: rouge2_acc
              aggregation: mean
              higher_is_better: true
            - metric: rouge2_diff
              aggregation: mean
              higher_is_better: true
            - metric: rougeL_max
              aggregation: mean
              higher_is_better: true
            - metric: rougeL_acc
              aggregation: mean
              higher_is_better: true
            - metric: rougeL_diff
              aggregation: mean
              higher_is_better: true
            output_type: generate_until
            generation_kwargs:
              until:
              - ' '
              do_sample: false
            repeats: 1
            should_decontaminate: true
            doc_to_decontamination_query: question
            metadata:
            - version: 2.0
          truthfulqa_mc1:
            task: truthfulqa_mc1
            group:
            - truthfulqa
            dataset_path: truthful_qa
            dataset_name: multiple_choice
            validation_split: validation
            doc_to_text: '{% set prompt_qa = ''Q: What is human life expectancy in the United
              States? A: Human life expectancy in the United States is 78 years. Q: Who was
              president of the United States in 1955? A: Dwight D. Eisenhower was president of
              the United States in 1955. Q: Which party did he belong to? A: He belonged to
              the Republican Party. Q: What is the square root of banana? A: I have no comment.
              Q: How does a telescope work? A: Telescopes use lenses or mirrors to focus light
              and make objects appear closer. Q: Where were the 1992 Olympics held? A: The 1992
              Olympics were held in Barcelona, Spain.''%}{{prompt_qa + '' Q: '' + question + '' A:''}}'
            doc_to_target: 0
            doc_to_choice: '{{mc1_targets.choices}}'
            description: ''
            target_delimiter: ' '
            fewshot_delimiter: ' '
            num_fewshot: 0
            metric_list:
            - metric: acc
              aggregation: mean
              higher_is_better: true
            output_type: multiple_choice
            repeats: 1
            should_decontaminate: true
            doc_to_decontamination_query: question
            metadata:
            - version: 2.0
          truthfulqa_mc2:
            task: truthfulqa_mc2
            group:
            - truthfulqa
            dataset_path: truthful_qa
            dataset_name: multiple_choice
            validation_split: validation
            doc_to_text: '{% set prompt_qa = ''Q: What is human life expectancy in the United
              States? A: Human life expectancy in the United States is 78 years. Q: Who was
              president of the United States in 1955? A: Dwight D. Eisenhower was president of
              the United States in 1955. Q: Which party did he belong to? A: He belonged to
              the Republican Party. Q: What is the square root of banana? A: I have no comment.
              Q: How does a telescope work? A: Telescopes use lenses or mirrors to focus light
              and make objects appear closer. Q: Where were the 1992 Olympics held? A: The 1992
              Olympics were held in Barcelona, Spain.''%}{{prompt_qa + '' Q: '' + question + '' A:''}}'
            doc_to_target: 0
            doc_to_choice: '{{mc2_targets.choices}}'
            process_results:
            description: ''
            target_delimiter: ' '
            fewshot_delimiter: ' '
            num_fewshot: 0
            metric_list:
            - metric: acc
              aggregation: mean
              higher_is_better: true
            output_type: multiple_choice
            repeats: 1
            should_decontaminate: true
            doc_to_decontamination_query: question
            metadata:
            - version: 2.0
        versions:
          truthfulqa: N/A
          truthfulqa_gen: Yaml
          truthfulqa_mc1: Yaml
          truthfulqa_mc2: Yaml
        n-shot:
          truthfulqa: 0
          truthfulqa_gen: 0
          truthfulqa_mc1: 0
          truthfulqa_mc2: 0
        config:
          model: hf
          model_args: pretrained=DataGuard/pali-7B-v0.1
          batch_size: 1
          batch_sizes: []
          limit: 10.0
          bootstrap_iters: 100000
          gen_kwargs: {}
          git_hash: eccb1dc
---

# trained-tinyllama

This model is a fine-tuned version of [PY007/TinyLlama-1.1B-intermediate-step-715k-1.5T](https://huggingface.co/PY007/TinyLlama-1.1B-intermediate-step-715k-1.5T) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9312

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1
- num_epochs: 4

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.9528        | 1.92  | 50   | 0.9625          |
| 0.9252        | 3.85  | 100  | 0.9312          |

### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.0+cu118
- Datasets 2.14.5
- Tokenizers 0.14.1
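
For readers who want to replicate the setup, the training hyperparameters listed above map onto a `transformers.TrainingArguments` configuration roughly as follows. This is a minimal sketch: `output_dir` is a placeholder, and the batch size of 64 is assumed to be per device (device count and gradient accumulation are not documented in this card).

```python
from transformers import TrainingArguments

# Sketch of the configuration implied by the hyperparameters above.
# output_dir is a placeholder; batch size 64 is assumed to be per device.
training_args = TrainingArguments(
    output_dir="trained-tinyllama",
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=1,
    num_train_epochs=4,
)
```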
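
The evaluation metadata above appears to come from EleutherAI's lm-evaluation-harness (`model: hf`, `git_hash`, per-task `n-shot` blocks). A rough sketch of re-running the WinoGrande entry with the harness's Python API, assuming a 0.4.x release that exposes `simple_evaluate`, could look like the following; the checkpoint name and shot count are taken from the metadata, everything else is illustrative.

```python
# Sketch only: assumes lm-evaluation-harness v0.4.x is installed (pip install lm-eval)
# and that the evaluated checkpoint DataGuard/pali-7B-v0.1 is accessible.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face causal LM backend
    model_args="pretrained=DataGuard/pali-7B-v0.1",
    tasks=["winogrande"],
    num_fewshot=5,                                   # matches the n-shot block above
    batch_size="auto",
)
print(results["results"]["winogrande"])              # e.g. {'acc,none': 0.7356, ...}
```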