Update README.md
README.md
base_model:
- Qwen/Qwen2.5-0.5B
---
# Qwen2.5-med-book-main-classification
The model is an intermediate product of the [EDCP (Easy-Data-Clean-Pipeline)](https://github.com/ytzfhqs/EDCP) project. It is primarily used to separate the main content of **medical textbooks** from non-content material (such as book introductions, publisher information, writing standards, and revision notes) after the textbooks have been processed by OCR with [MinerU](https://github.com/opendatalab/MinerU). The base model is [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B), which avoids the input-length limit of the BERT tokenizer (typically 512 tokens) while providing higher accuracy than a BERT-based classifier.
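A minimal sketch of how a paragraph-level classifier like this might slot into a post-OCR cleaning step. It assumes the checkpoint can be loaded with the Hugging Face `transformers` `text-classification` pipeline; the Hub id `ytzfhqs/Qwen2.5-med-book-main-classification`, the label name `main`, and the helpers `split_paragraphs`, `keep_main_content`, and `make_hf_classifier` are all hypothetical and not confirmed by this README. Check the model's actual label mapping before relying on it.

```python
from typing import Callable, Iterable, List

# Hypothetical Hub id and label name: assumptions, not confirmed by this README.
MODEL_ID = "ytzfhqs/Qwen2.5-med-book-main-classification"
MAIN_LABEL = "main"


def split_paragraphs(markdown_text: str) -> List[str]:
    """Split OCR'd Markdown text into non-empty blocks separated by blank lines."""
    blocks = [block.strip() for block in markdown_text.split("\n\n")]
    return [block for block in blocks if block]


def keep_main_content(
    paragraphs: Iterable[str],
    classify: Callable[[str], str],
    main_label: str = MAIN_LABEL,
) -> List[str]:
    """Keep only the blocks whose predicted label equals `main_label`."""
    return [p for p in paragraphs if classify(p) == main_label]


def make_hf_classifier(model_id: str = MODEL_ID) -> Callable[[str], str]:
    """Wrap a transformers text-classification pipeline as a text -> label callable.

    transformers is imported lazily, so the filtering logic above can run
    (for example, with a stub classifier) without it being installed.
    """
    from transformers import pipeline

    clf = pipeline("text-classification", model=model_id)
    # For a single string input, the pipeline returns [{"label": ..., "score": ...}].
    return lambda text: clf(text)[0]["label"]
```

Passing the classifier in as a callable keeps the filtering step independent of how the model is loaded, so it can be tested offline with a stand-in function and switched to `make_hf_classifier()` once the real checkpoint id and label names are confirmed.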
# Data Composition