qaihm-bot commited on
Commit
d15b2d0
·
verified ·
1 Parent(s): 7c9c2f5

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +0 -1
README.md CHANGED
@@ -46,7 +46,6 @@ accross various devices, can be found [here](https://aihub.qualcomm.com/models/q
46
  - Supported languages: English, Chinese, German, French, Spanish, Portuguese, Italian, Dutch, Russian, Czech, Polish, Arabic, Persian, Hebrew, Turkish, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay, Lao, Burmese, Cebuano, Khmer, Tagalog, Hindi, Bengali, Urdu.
47
  - TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
48
  - Response Rate: Rate of response generation after the first response token.
49
- - Tiny MMLU: Tiny MMLU (Massive Multitask Language Understanding) is an English language benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings. This makes the benchmark more challenging and more similar to how we evaluate humans.
50
 
51
  | Model | Device | Chipset | Target Runtime | Response Rate (tokens per second) | Time To First Token (range, seconds)
52
  |---|---|---|---|---|---|
 
46
  - Supported languages: English, Chinese, German, French, Spanish, Portuguese, Italian, Dutch, Russian, Czech, Polish, Arabic, Persian, Hebrew, Turkish, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay, Lao, Burmese, Cebuano, Khmer, Tagalog, Hindi, Bengali, Urdu.
47
  - TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
48
  - Response Rate: Rate of response generation after the first response token.
 
49
 
50
  | Model | Device | Chipset | Target Runtime | Response Rate (tokens per second) | Time To First Token (range, seconds)
51
  |---|---|---|---|---|---|