Is `top_k` arbitrarily discarding high-quality continuations? Or is `top_p` forgetting to exclude low-probability tokens, derailing your generation? Try out the new `min_p` flag in `generate`, fresh from a PR merged today! 🥬

`min_p` takes a base probability (the value of the `min_p` flag) and multiplies it by the probability of the most likely token in the distribution for the next token. All tokens less likely than the resulting value are filtered. What happens with this strategy? When the model is confident and one token dominates, the threshold is high and filtering is aggressive; when the distribution is flat, the threshold drops and many plausible continuations survive.
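As a rough sketch of that filtering rule (plain Python over a toy probability list, independent of the actual `transformers` implementation):

```python
def min_p_filter(probs: list[float], min_p: float = 0.1) -> list[float]:
    """Zero out tokens below min_p * p(most likely token), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# Peaked distribution: high threshold, only the dominant token survives.
print(min_p_filter([0.9, 0.05, 0.03, 0.02]))  # [1.0, 0.0, 0.0, 0.0]

# Flat distribution: low threshold, every plausible candidate survives.
print(min_p_filter([0.3, 0.28, 0.22, 0.2]))
```

In `transformers` itself you just pass the flag, e.g. `model.generate(**inputs, do_sample=True, min_p=0.08)`.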
You should set `min_p` to a low value, between 0.05 and 0.1. It behaves particularly well for creative text generation when paired up with temperature > 1.

Meanwhile, Sentence Transformers gained ONNX and OpenVINO backends: just load a model with `SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")`. Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later!
And if you want tiny, blazing-fast embedding models, create a static embedding module with `from_model2vec` (loading a pre-distilled Model2Vec model) or with `from_distillation`, where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.