|
--- |
|
library_name: transformers |
|
tags: [] |
|
--- |
|
|
|
# Model Card for TRPaliGemma |
|
|
|
This model is a PaliGemma model fine-tuned for the table recognition task. |
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
Table recognition is a branch of Document AI. |
|
Existing table recognition approaches predict the table structure and the OCR results separately and then combine them. |
|
As a result, unnecessary predictions (e.g., bounding boxes) are sometimes made while parsing the table. |
|
With a VLM, the structure and the text of the table are predicted at the same time, which eliminates these unnecessary predictions and merges the two tasks into one. |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
|
|
|
- **Developed by:** Seokhyun Choi |
|
- **Funded by [optional]:** [More Information Needed] |
|
- **Shared by [optional]:** [More Information Needed] |
|
- **Model type:** Vision Language Model |
|
- **Language(s) (NLP):** English |
|
- **License:** [More Information Needed] |
|
- **Finetuned from model [optional]:** PaliGemma |
|
|
|
### Model Sources [optional] |
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** [More Information Needed] |
|
- **Paper [optional]:** [More Information Needed] |
|
- **Demo [optional]:** [More Information Needed] |
|
|
|
## Uses |
|
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
### Direct Use |
|
|
|
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> |
|
|
|
This model converts table images into HTML. |
|
|
|
### Downstream Use [optional] |
|
|
|
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> |
|
|
|
It can be used as a component of Document AI systems for document automation. |
|
|
|
### Out-of-Scope Use |
|
|
|
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> |
|
|
|
This model was fine-tuned only on table images taken from PDF documents, so it is not expected to perform well on table images in the wild. |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
<!-- This section is meant to convey both technical and sociotechnical limitations. --> |
|
|
|
This model only converts table images into HTML. |
|
For further analysis or knowledge extraction, you need to train a separate NLP model that works on the HTML output, or fine-tune a new PaliGemma model on newly constructed data. |
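
As a first step toward downstream analysis of the HTML output, the predicted markup can be parsed into rows of cell text. The following is a minimal sketch using only the Python standard library. The names `TableRows` and `html_table_to_rows` and the sample string are made up for illustration; the sample is not real model output, and the sketch ignores `rowspan`/`colspan` and nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Parse an HTML table string into a list of rows of cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical example input, not actual model output.
sample = (
    "<table><tr><th>Item</th><th>Qty</th></tr>"
    "<tr><td>Apple</td><td>3</td></tr></table>"
)
rows = html_table_to_rows(sample)
print(rows)
```

The resulting list of rows can then be passed to whatever analysis step follows, for example written to CSV or loaded into a data frame.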
|
|
|
## How to Get Started with the Model |
|
|
|
Inference notebook: https://www.kaggle.com/code/mldlchoidh/tr-inference |
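
For readers who prefer a local script, the following is a rough sketch of PaliGemma inference with 🤗 transformers. It is not taken from the linked notebook. This card does not state the Hub repository id or the prompt used during fine-tuning, so both are left as required arguments (`model_id`, `prompt`) rather than guessed. The function name `recognize_table` and the `max_new_tokens` default are assumptions for illustration.

```python
def recognize_table(image_path, model_id, prompt, max_new_tokens=1024):
    """Return the text generated by a PaliGemma checkpoint for one table image.

    model_id and prompt are NOT specified in this model card; the caller must
    supply the values that match the fine-tuned checkpoint.
    """
    # Imported lazily so this file can be loaded without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    # Greedy decoding; max_new_tokens is an assumed budget for the HTML output.
    with torch.inference_mode():
        output = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # generate() returns the prompt tokens followed by the new tokens;
    # keep only the newly generated part.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Calling `recognize_table("table.png", model_id=..., prompt=...)` with the correct checkpoint id and prompt should return the predicted HTML as a string. Large tables may need a higher `max_new_tokens`.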
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> |
|
|
|
PubTables-1M |
|
|