Upload README.md with huggingface_hub
README.md
@@ -46,7 +46,6 @@ accross various devices, can be found [here](https://aihub.qualcomm.com/models/q
 - Supported languages: English, Chinese, German, French, Spanish, Portuguese, Italian, Dutch, Russian, Czech, Polish, Arabic, Persian, Hebrew, Turkish, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay, Lao, Burmese, Cebuano, Khmer, Tagalog, Hindi, Bengali, Urdu.
 - TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
 - Response Rate: Rate of response generation after the first response token.
-- Tiny MMLU: Tiny MMLU (Massive Multitask Language Understanding) is an English language benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings. This makes the benchmark more challenging and more similar to how we evaluate humans.
 
 | Model | Device | Chipset | Target Runtime | Response Rate (tokens per second) | Time To First Token (range, seconds)
 |---|---|---|---|---|---|
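The TTFT and Response Rate definitions in this hunk can be illustrated with a minimal Python sketch. It assumes, as the README text states, that the prompt processor consumes up to 128 tokens per iteration and that the full context length is 4096 tokens, so the prompt-processing work behind the TTFT range spans 1 to 32 iterations. The function names and the timestamps in the usage lines are hypothetical illustrations, not part of Qualcomm AI Hub or any benchmarking tool.

```python
import math
from typing import Sequence, Tuple

# Values taken from the README text: 128 tokens per prompt-processor
# iteration and a 4096-token context length.
PROMPT_CHUNK = 128
CONTEXT_LENGTH = 4096


def prompt_iterations(prompt_tokens: int, chunk: int = PROMPT_CHUNK) -> int:
    """Number of prompt-processor iterations before the first response token.

    Assumes the prompt is consumed in fixed-size chunks of `chunk` tokens.
    """
    if not 1 <= prompt_tokens <= CONTEXT_LENGTH:
        raise ValueError("prompt length must be between 1 and the context length")
    return math.ceil(prompt_tokens / chunk)


def ttft_and_rate(request_time: float, token_times: Sequence[float]) -> Tuple[float, float]:
    """Compute (TTFT, Response Rate) from hypothetical timestamps in seconds.

    `token_times` holds the time at which each response token was emitted.
    The rate counts only tokens after the first one, matching the README's
    definition of Response Rate.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two response tokens to compute a rate")
    ttft = token_times[0] - request_time
    # (n - 1) tokens generated over the span from the first to the last token.
    rate = (len(token_times) - 1) / (token_times[-1] - token_times[0])
    return ttft, rate


if __name__ == "__main__":
    print(prompt_iterations(100))   # 1: short prompt, lower bound of TTFT
    print(prompt_iterations(4096))  # 32: full context, upper bound of TTFT
    print(ttft_and_rate(0.0, [0.5, 0.75, 1.0, 1.25, 1.5]))  # (0.5, 4.0)
```

Excluding the first token from the rate keeps prompt processing, which dominates TTFT, from skewing the tokens-per-second figure.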