ytzfhqs/Qwen2.5-med-book-main-classification

ytzfhqs committed on Oct 2

Commit f26ca2d • 1 Parent(s): d822c70

Update README.md

Files changed (1)

README.md +1 -0

@@ -8,6 +8,7 @@ metrics:
 base_model:
 - Qwen/Qwen2.5-0.5B
 ---
+# Qwen2.5-med-book-main-classification
 The model is an intermediate product of the [EPCD (Easy-Data-Clean-Pipeline)](https://github.com/ytzfhqs/EDCP) project, primarily used to distinguish between the main content and non-content (such as book introductions, publisher information, writing standards, revision notes) of **medical textbooks** after performing OCR using [MinerU](https://github.com/opendatalab/MinerU). The base model uses [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B), avoiding the length limitation of the Bert Tokenizer while providing higher accuracy.
 # Data Composition