---
license: apache-2.0
pipeline_tag: text-generation
tags:
- code
datasets:
- semiotic/SynQL-KaggleDBQA-Train
language:
- en
base_model:
- google-t5/t5-3b
---

# Model Card for T5-3B/SynQL-KaggleDBQA-Train-Run-01

- Developed by: Semiotic Labs
- Model type: Text-to-SQL
- License: Apache-2.0
- Finetuned from model: [google-t5/t5-3b](https://huggingface.co/google-t5/t5-3b)
- Dataset used for finetuning: [semiotic/SynQL-KaggleDBQA-Train](https://huggingface.co/datasets/semiotic/SynQL-KaggleDBQA-Train/blob/main/README.md)

## Model Context

The example below shows the metadata for one record; the `context` field is the prompt presented to the model. Database schemas are serialized using the encoding proposed by [Shaw et al. (2020)](https://arxiv.org/pdf/2010.12725).

```
{
  "query": "SELECT count(*) FROM singer",
  "question": "How many singers do we have?",
  "context": "How many singers do we have? | concert_singer | stadium : stadium_id, location, name, capacity, highest, lowest, average | singer : singer_id, name, country, song_name, song_release_year, age, is_male | concert : concert_id, concert_name, theme, stadium_id, year | singer_in_concert : concert_id, singer_id",
  "db_id": "concert_singer"
}
```
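
The `context` string in the example appears to follow the pattern `question | db_id | table : column, column | ...`. The sketch below rebuilds that string from a question and a schema, and shows one possible way to pass it to a seq2seq checkpoint with the `transformers` library. The function names `build_context` and `generate_sql`, the `model_id` argument, and the exact separator layout are assumptions inferred from the single example above, not a confirmed description of the training pipeline.

```python
from typing import Mapping, Sequence


def build_context(question: str, db_id: str, schema: Mapping[str, Sequence[str]]) -> str:
    """Serialize a question and schema as 'question | db_id | table : col, col | ...'.

    Assumption: the layout is inferred from the single example in this card.
    """
    tables = [f"{table} : {', '.join(columns)}" for table, columns in schema.items()]
    return " | ".join([question, db_id, *tables])


def generate_sql(context: str, model_id: str, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: run a seq2seq checkpoint on one context string.

    `model_id` must be supplied by the caller; this card does not state the
    repository id, so none is hard-coded here.
    """
    # Imported lazily so build_context can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(context, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Schema copied from the concert_singer example above (table order matters).
CONCERT_SINGER = {
    "stadium": ["stadium_id", "location", "name", "capacity", "highest", "lowest", "average"],
    "singer": ["singer_id", "name", "country", "song_name", "song_release_year", "age", "is_male"],
    "concert": ["concert_id", "concert_name", "theme", "stadium_id", "year"],
    "singer_in_concert": ["concert_id", "singer_id"],
}

context = build_context("How many singers do we have?", "concert_singer", CONCERT_SINGER)
print(context)
```

Calling `generate_sql(context, model_id=...)` with the repository id of this checkpoint would then return the predicted SQL string; loading a 3B-parameter model requires substantial memory, so that call is left out of the snippet.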

## Model Results

Evaluation set: [KaggleDBQA/test](https://github.com/Chia-Hsuan-Lee/KaggleDBQA)

Evaluation metric: Execution Accuracy

| Model | Data | Run | Execution Accuracy |
|-------|------|-----|--------------------|
| T5-3B | semiotic/SynQL-KaggleDBQA | 00 | 0.3514 |
| T5-3B | semiotic/SynQL-KaggleDBQA | 01 | 0.3514 |
| T5-3B | semiotic/SynQL-KaggleDBQA | 02 | 0.3514 |
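
Execution accuracy is generally understood as the fraction of predicted queries whose execution result matches that of the gold query. The following is a minimal sketch of that idea using Python's built-in `sqlite3` module on a toy in-memory table. It is not the evaluation script used to produce the numbers above: the helper names, the order-insensitive row comparison, and the use of a single database connection are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter
from typing import Iterable, Optional, Tuple


def run_query(conn: sqlite3.Connection, sql: str) -> Optional[Counter]:
    """Execute a query and return its rows as a multiset, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose result sets match.

    Assumption: rows are compared without regard to order.
    """
    pairs = list(pairs)
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        predicted_rows = run_query(conn, predicted)
        # A prediction that fails to execute counts as incorrect.
        if gold_rows is not None and predicted_rows == gold_rows:
            correct += 1
    return correct / len(pairs) if pairs else 0.0


# Toy database, invented for this example only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (singer_id INTEGER, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?, ?)", [(1, "A", 30), (2, "B", 45)])

gold = "SELECT count(*) FROM singer"
pairs = [
    ("SELECT count(*) FROM singer", gold),                 # identical query
    ("SELECT count(*) FROM singer WHERE age > 40", gold),  # different result
    ("SELECT count(*) FROM singers", gold),                # unknown table, fails
    ("SELECT count(singer_id) FROM singer", gold),         # different SQL, same result
]
score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.4f}")
```

Comparing results rather than SQL text is what lets the fourth pair count as correct even though the query strings differ. A full evaluation over KaggleDBQA would open one connection per `db_id` rather than a single shared one.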