@appvoid on Hugging Face: "700m parameters are the sweet spot for cpu usage, please let's make more of…"

How long did it take to reply and what are your context window limits? Model type?

it takes 3-5 seconds to reply when the prompt is longer than 30-50 words on average but it increases linearly with number of tokens in the prompt, the one on the picture is llama 3 1b but the one i'm using right now is arco 2 which is a llama model, cannot keep any kind of general knowledge, i noticed with qwen 2 (and later confirmed with meta's model) that you don't need a lot of parameters to get general knowledge, you just need tons of data

Join the conversation