---
library_name: transformers
tags: []
---

# Model Card for TRPaliGemma

This model is a PaliGemma model fine-tuned for the table recognition task.


## Model Details

### Model Description

Table recognition is a subfield of Document AI.
Conventional table recognition predicts the table structure and runs OCR separately, then combines the two results.
Because of this, the parsing process sometimes produces unnecessary intermediate predictions (e.g., bounding boxes).
By using a vision language model (VLM), this model predicts the table structure and its text at the same time, which eliminates those unnecessary predictions and merges the two tasks into one.
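To make the contrast concrete, here is a minimal sketch of the merge step that a conventional two-stage pipeline needs: a structure model predicts cell boxes, an OCR engine predicts word boxes, and the words are assigned to cells by geometry before any HTML can be written. The `Cell`/`Word` types, the center-point matching rule, and the sample coordinates are hypothetical illustrations, not part of TRPaliGemma, which generates the HTML directly.

```python
from dataclasses import dataclass
from html import escape

# Hypothetical box convention: (x0, y0, x1, y1) in pixel coordinates.
Box = tuple[float, float, float, float]


@dataclass
class Cell:
    row: int
    col: int
    box: Box  # predicted by a (hypothetical) table-structure model


@dataclass
class Word:
    text: str
    box: Box  # predicted by a (hypothetical) OCR engine


def center_inside(word: Box, cell: Box) -> bool:
    """Return True if the center of the word box lies inside the cell box."""
    cx = (word[0] + word[2]) / 2
    cy = (word[1] + word[3]) / 2
    return cell[0] <= cx <= cell[2] and cell[1] <= cy <= cell[3]


def merge_to_html(cells: list[Cell], words: list[Word]) -> str:
    """Assign OCR words to structure cells by box geometry and emit an HTML table."""
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        # Words keep the order in which the OCR engine returned them.
        texts = [w.text for w in words if center_inside(w.box, cell.box)]
        grid[cell.row][cell.col] = " ".join(texts)
    rows = "".join(
        "<tr>" + "".join(f"<td>{escape(text)}</td>" for text in row) + "</tr>"
        for row in grid
    )
    return f"<table>{rows}</table>"


if __name__ == "__main__":
    cells = [Cell(0, 0, (0, 0, 50, 20)), Cell(0, 1, (50, 0, 100, 20))]
    words = [Word("Name", (5, 5, 40, 15)), Word("Age", (60, 5, 90, 15))]
    print(merge_to_html(cells, words))
    # <table><tr><td>Name</td><td>Age</td></tr></table>
```

If either set of boxes is slightly off, a word lands in the wrong cell; these intermediate boxes are the kind of prediction that the single-pass VLM approach avoids. Spanning cells are ignored here for brevity.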


This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- Developed by: Seokhyun Choi
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: Vision Language Model
- Language(s) (NLP): English
- License: [More Information Needed]
- Finetuned from model: PaliGemma

### Model Sources [optional]


- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

## Uses


### Direct Use


This model converts images of tables into HTML.

### Downstream Use [optional]


It can be used as a component of Document AI-based document automation systems.
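In a document automation system, the generated HTML usually has to be turned into structured rows before further processing. A minimal sketch using only Python's standard-library `html.parser`, assuming the output uses plain `<tr>`, `<th>`, and `<td>` tags (the exact markup the model emits is not documented in this card); `colspan`/`rowspan` are ignored, and `TableRowParser`/`html_table_to_rows` are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html: str) -> list[list[str]]:
    """Parse an HTML table string into a list of rows of cell texts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    html = "<table><tr><th>Name</th><th>Age</th></tr><tr><td>Bob</td><td>42</td></tr></table>"
    print(html_table_to_rows(html))
    # [['Name', 'Age'], ['Bob', '42']]
```

Treating the first row as a header, `dict(zip(rows[0], row))` then yields one record per data row. Cells whose closing tag is missing from the generated text are dropped by this parser, so malformed output should be checked separately.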

### Out-of-Scope Use


The model was fine-tuned only on table images extracted from PDF documents, so it is unlikely to perform well on table images in the wild.

## Bias, Risks, and Limitations


This model only converts table images into HTML; it does not analyze the table contents.
To gain further analysis or knowledge from a table,
you need to either train a separate NLP model that works on the generated HTML, or build a new dataset and fine-tune a new PaliGemma model on it.

## How to Get Started with the Model

An example inference notebook is available on Kaggle: https://www.kaggle.com/code/mldlchoidh/tr-inference
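For running the model locally with 🤗 transformers, here is a minimal sketch using the standard PaliGemma classes (`AutoProcessor`, `PaliGemmaForConditionalGeneration`). It assumes that the Hub repository id is `qhfmshal/TRPaliGemma`, that the repository also contains the processor files, and that greedy decoding is acceptable. The task prompt used during fine-tuning is not documented in this card, so `TASK_PROMPT` is a hypothetical placeholder; take the real prompt from the Kaggle notebook. `extract_table_html` and `table_image_to_html` are hypothetical helper names.

```python
import re
from functools import lru_cache

# Assumed Hub repository id (taken from the model page).
MODEL_ID = "qhfmshal/TRPaliGemma"
# Hypothetical placeholder: replace with the prompt used in the Kaggle notebook.
TASK_PROMPT = "REPLACE_WITH_TRAINING_PROMPT"


def extract_table_html(decoded: str) -> str:
    """Return the first <table>...</table> span, or the stripped text if none is found."""
    match = re.search(r"<table\b.*?</table>", decoded, flags=re.DOTALL | re.IGNORECASE)
    return match.group(0) if match else decoded.strip()


@lru_cache(maxsize=1)
def _load(model_id: str):
    """Load the processor and model once and reuse them across calls."""
    # Imported lazily so extract_table_html works without these dependencies.
    import torch
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).to(device).eval()
    return processor, model


def table_image_to_html(image_path: str, max_new_tokens: int = 1024) -> str:
    """Run greedy generation on one table image and return the decoded HTML."""
    import torch
    from PIL import Image

    processor, model = _load(MODEL_ID)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=TASK_PROMPT, images=image, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, not the echoed prompt.
    decoded = processor.decode(output[0][prompt_len:], skip_special_tokens=True)
    return extract_table_html(decoded)
```

Usage would be `html = table_image_to_html("table.png")`. The model loads in float32 here for simplicity; on a GPU, loading with `torch_dtype=torch.bfloat16` reduces memory, but the floating-point inputs must then be cast to the same dtype.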

## Training Details

### Training Data


PubTables-1M