Papers
arxiv:2408.04015

Image-to-LaTeX Converter for Mathematical Formulas and Text

Published on Aug 7
Authors:

Abstract

In this project, we train a vision encoder-decoder model to generate LaTeX code from images of mathematical formulas and text. Utilizing a diverse collection of image-to-LaTeX data, we build two models: a base model with a Swin Transformer encoder and a GPT-2 decoder, trained on machine-generated images, and a fine-tuned version enhanced with Low-Rank Adaptation (LoRA) trained on handwritten formulas. We then compare the BLEU performance of our specialized model on a handwritten test set with other similar models, such as Pix2Text, TexTeller, and Sumen. Through this project, we contribute open-source models for converting images to LaTeX and provide from-scratch code for building these models with distributed training and GPU optimizations.

Community

Sign up or log in to comment

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2408.04015 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2408.04015 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.