t-tech
/

T-pro-it-1.0-Q5_K_M-GGUF

Inference Endpoints

Model card Files Files and versions Community

T-pro-it-1.0-Q5_K_M-GGUF / README.md

germanjke's picture

Create README.md

dd86373 verified 4 days ago

|

3.33 kB

	---
	language:
	- ru
	base_model: t-tech/T-pro-it-1.0
	tags:
	- llama-cpp
	---

	# T-pro-it-1.0-Q5_K_M-GGUF

	🚨 T-pro is designed for further fine-tuning and is not intended as a ready-to-use conversational assistant. Users are advised to exercise caution and are responsible for any additional training and oversight required to ensure the model's responses meet acceptable ethical and safety standards. The responsibility for incorporating this model into industrial or commercial solutions lies entirely with those who choose to deploy it.

	## Description

	This repository contains the [`T-pro-it-1.0`](https://huggingface.co/t-tech/T-pro-it-1.0/) model, which has been quantized into the GGUF format using the [`llama.cpp`](https://github.com/ggerganov/llama.cpp) repository.

	## 📊 Benchmarks

	Proprietary models:

	\| Benchmark \| T-pro-it-1.0 \| T-pro-it-1.0-Q4_K_M \|T-pro-it-1.0-Q5_K_M \|T-pro-it-1.0-Q6_K \|T-pro-it-1.0-Q8_0 \|GPT-4o \| GPT-4o-mini \| GigaChat Max 1.0.26.20 \|
	\|------------------------------------------------\|-----------------------\|------------------------\|-----------------------\|------------------\|------------------\|------------------------------\|-----------------------\|---------------------\|
	\| Arena-Hard-Ru \| 90.17 \| 89.0 \|89.29 \|88.5 \|89.35 \| <u>84.87</u> \| 81 \| - \|

	Open-source models:

	\| Benchmark \| T-pro-it-1.0 \| T-pro-it-1.0-Q4_K_M \|T-pro-it-1.0-Q5_K_M \|T-pro-it-1.0-Q6_K \|T-pro-it-1.0-Q8_0 \| Qwen-2.5-32B-Instruct \| T-pro-it-1.0 \| gemma-2-27b-it \| Llama-3.3-70B-Instruct \|
	\|------------------------------------------------\|---------------------------\|------------------------\|-----------------------\|------------------\|------------------\|-------------------------------\|------------------------------\|------------------------\|------------------------\|
	\| Arena-Hard-Ru \| 90.17 \| 89.0 \|89.29 \|88.5 \|89.35 \| 74.54 \| <u>80.23</u> \| 66.4 \| 76.51 \|

	## Llama.cpp usage

	### Server

	From HF:

	```bash
	llama-server --hf-repo t-tech/T-pro-it-1.0-Q5_K_M-GGUF --hf-file t-pro-it-1.0-q5_k_m.gguf -c 8192
	```

	Or locally:

	```bash
	./build/bin/llama-server -m t-pro-it-1.0-q5_k_m.gguf -c 8192
	```

	### POST

	```bash
	curl --request POST \
	--url http://localhost:8080/completion \
	--header "Content-Type: application/json" \
	--data '{
	"prompt": "<\|im_start\|>user\nРасскажи мне чем отличается Python от C++?\n<\|im_end\|>\n<\|im_start\|>assistant\n",
	"n_predict": 256
	}'

	```


	## ollama usage

	### Serve

	```bash
	ollama serve
	```

	### Run

	From HF:

	```bash
	ollama run hf.co/t-tech/T-pro-it-1.0-Q5_K_M-GGUF/
	```

	Or locally:

	```bash
	ollama create example -f Modelfile
	ollama run example "Расскажи мне про отличия C++ и Python"
	```

	where `Modelfile` is

	```bash
	FROM ./t-pro-it-1.0-q5_k_m.gguf
	```