smcleod/phi-4

@@ -26,8 +26,8 @@ library_name: transformers
 ... OK tokenizer seems a bit off
-OK, tokenizer seems a bit off 😂
-(llama.cpp) root at nas in /mnt/llm/models llama-cli -m phi-4.etf16-Q6_K.gguf -p "Tell me a joke." -n 256 -t 8 -c 2048 --temp 0.8 -ngl 99
 ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 2 CUDA devices:
@@ -177,7 +177,7 @@ llama_perf_context_print:        load time =    1693.08 ms
 llama_perf_context_print: prompt eval time =      26.42 ms /     7 tokens (    3.77 ms per token,   264.96 tokens per second)
 llama_perf_context_print:        eval time =    3993.62 ms /   238 runs   (   16.78 ms per token,    59.60 tokens per second)
 llama_perf_context_print:       total time =    4034.65 ms /   245 tokens
 ----

 ... OK tokenizer seems a bit off
+```
+llama-cli -m phi-4.etf16-Q6_K.gguf -p "Tell me a joke." -n 256 -t 8 -c 2048 --temp 0.8 -ngl 99
 ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 2 CUDA devices:
 llama_perf_context_print: prompt eval time =      26.42 ms /     7 tokens (    3.77 ms per token,   264.96 tokens per second)
 llama_perf_context_print:        eval time =    3993.62 ms /   238 runs   (   16.78 ms per token,    59.60 tokens per second)
 llama_perf_context_print:       total time =    4034.65 ms /   245 tokens
+```
 ----
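The `llama_perf_context_print` lines in the log report throughput for each phase. For example, the generation phase ran 238 times in 3993.62 ms, which is about 59.6 tokens per second. The sketch below pulls those numbers out of a saved log. It assumes the line format stays exactly as printed here. The regex and the `parse_perf` helper are hypothetical illustrations and are not part of llama.cpp.

```python
import re
from typing import Dict

# Hypothetical pattern, assuming the "prompt eval time" / "eval time" line format shown in the log:
# label = total ms / count tokens|runs ( ms per token, tokens per second )
PERF_RE = re.compile(
    r"llama_perf_context_print:\s+(?P<label>prompt eval time|eval time)\s*=\s*"
    r"(?P<ms>[\d.]+) ms /\s*(?P<count>\d+) (?:tokens|runs)\s*\(\s*"
    r"(?P<ms_per_tok>[\d.]+) ms per token,\s*(?P<tps>[\d.]+) tokens per second\)"
)


def parse_perf(log: str) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: extract per-phase timing stats from llama.cpp perf output."""
    stats: Dict[str, Dict[str, float]] = {}
    for m in PERF_RE.finditer(log):
        total_ms = float(m["ms"])
        count = int(m["count"])
        stats[m["label"]] = {
            "total_ms": total_ms,
            "count": count,
            "ms_per_token": float(m["ms_per_tok"]),
            "reported_tps": float(m["tps"]),
            # Recomputed from the rounded total, so it may differ from the
            # reported value in the last printed digit.
            "recomputed_tps": count / (total_ms / 1000.0),
        }
    return stats


LOG = """\
llama_perf_context_print: prompt eval time =      26.42 ms /     7 tokens (    3.77 ms per token,   264.96 tokens per second)
llama_perf_context_print:        eval time =    3993.62 ms /   238 runs   (   16.78 ms per token,    59.60 tokens per second)
llama_perf_context_print:       total time =    4034.65 ms /   245 tokens
"""

stats = parse_perf(LOG)
print(stats["eval time"]["reported_tps"])  # 59.6
```

The "total time" line has no per-token breakdown, so the pattern deliberately skips it.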