cognitivecomputations/DeepSeek-V3-AWQ

Text Generation

4-bit precision


	---
	license: mit
	language:
	- en
	- zh
	base_model:
	- deepseek-ai/DeepSeek-V3
	pipeline_tag: text-generation
	library_name: transformers
	---
	# DeepSeek V3 AWQ
	AWQ quantization of the DeepSeek V3 chat model.

	This quant modifies some of the model code to fix an overflow issue that occurs when running in float16.
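For context on why float16 can overflow: the largest finite half-precision value is 65504, so any intermediate value beyond that range cannot be represented. The sketch below only illustrates that limit using Python's standard library; `fits_in_float16` is a hypothetical helper and is not part of the modified model code, whose actual fix is not described here.

```python
import math
import struct

# Largest finite value representable in IEEE 754 half precision (float16).
FP16_MAX = 65504.0


def fits_in_float16(x: float) -> bool:
    """Return True if x is finite and can be packed as a finite float16."""
    if not math.isfinite(x):
        return False
    try:
        # The "e" format code is IEEE 754 binary16; packing a value that is
        # too large for it raises OverflowError.
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


if __name__ == "__main__":
    for value in (1.0, FP16_MAX, 70000.0):
        print(value, fits_in_float16(value))
```

Values such as large activations that pass this check in float32 or bfloat16 may still fail it in float16, which is the general class of problem the note above refers to.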

	Tested on vLLM with 8x H100; inference speed was 5 tokens/s at batch size 1 with short prompts.
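A minimal sketch of how such a setup might be loaded through vLLM's Python API. The exact options used for the test above are not stated, so every value here is an assumption: tensor parallelism across 8 GPUs, float16 weights, AWQ quantization, and a short context length. `build_engine_kwargs` and `load_llm` are hypothetical helper names.

```python
def build_engine_kwargs() -> dict:
    """Assumed engine settings for an 8-GPU float16 AWQ run (not confirmed by the author)."""
    return {
        "model": "cognitivecomputations/DeepSeek-V3-AWQ",
        "tensor_parallel_size": 8,   # assumption: one shard per H100
        "dtype": "float16",          # the quant targets float16
        "quantization": "awq",
        "trust_remote_code": True,   # the repo ships modified model code
        "max_model_len": 4096,       # assumption: short prompts only
    }


def load_llm():
    """Create the vLLM engine; imported lazily because it needs vLLM and GPUs."""
    from vllm import LLM

    return LLM(**build_engine_kwargs())


if __name__ == "__main__":
    from vllm import SamplingParams

    llm = load_llm()
    outputs = llm.generate(["Hello, who are you?"], SamplingParams(max_tokens=64))
    print(outputs[0].outputs[0].text)
```

Throughput depends heavily on these settings, so figures obtained with a different configuration may not match the number reported above.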