---
library_name: transformers
tags: []
---

# Model Card: Falconsai/florence-2-invoice

- **Developed by:** Michael Stattelman for Falcons.ai
- **Funded by:** Falcons.ai

### Model Sources

- **Repository:** https://github.com/Falcons-ai/florence2_invoice_finetuning

## Model Overview

`Falconsai/florence-2-invoice` is a fine-tuned version of the `microsoft/Florence-2-base-ft` model. This model has been specifically trained to identify and extract key fields from invoice images. The fine-tuning process utilized a curated dataset of invoices annotated to recognize the following fields:

- Billing address
- Discount percentage
- Due date
- Email client
- Header
- Invoice date
- Invoice number
- Name client
- Products
- Remise
- Shipping address
- Subtotal
- Tax
- Tax percentage
- Tel client
- Total


### Base Model
The base model used for fine-tuning is `microsoft/Florence-2-base-ft`, a vision foundation model developed by Microsoft.

### Fine-tuning Configuration
The fine-tuning process was carried out using a Low-Rank Adaptation (LoRA) configuration with the following parameters:

```python
from peft import LoraConfig

LoraConfig(
    r=8,
    lora_alpha=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "linear", "Conv2d", "lm_head", "fc2"],
    task_type="CAUSAL_LM",
    lora_dropout=0.05,
    bias="none",
    inference_mode=False,
    use_rslora=True,
    init_lora_weights="gaussian",
    revision=REVISION  # base-model revision, defined elsewhere in the training script
)
```
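
For context, the sketch below shows how such a configuration is typically attached to the base model with the PEFT library before training. It is illustrative only; the full training script in the linked repository may differ in detail.

```python
# Sketch (not the repository's exact training code): attach the LoRA adapters
# described above to the Florence-2 base model using PEFT.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "linear", "Conv2d", "lm_head", "fc2"],
    task_type="CAUSAL_LM",
    lora_dropout=0.05,
    bias="none",
    use_rslora=True,
    init_lora_weights="gaussian",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```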

### Hardware Used
Fine-tuning was performed locally on an Alienware workstation.

## Dataset
The model was trained on a curated dataset of invoice images. Each invoice was annotated to identify the specific fields listed above. This dataset ensured that the model learned to accurately detect and extract key information from various invoice formats.
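
Florence-2 represents detection-style targets as text with quantized location tokens, so each annotated field is typically serialized as `label<loc_x1><loc_y1><loc_x2><loc_y2>`, with coordinates mapped to a 0-999 grid. The snippet below is a minimal, hypothetical illustration of that convention; the helper name and the example box are made up and not taken from the training data.

```python
# Hypothetical helper: convert a pixel-space box to Florence-2's <loc_*> token
# format (coordinates quantized to a 0-999 grid). Illustrative only.
def box_to_loc_tokens(label, box, image_width, image_height):
    x1, y1, x2, y2 = box
    bins = [
        int(round(x1 / image_width * 999)),
        int(round(y1 / image_height * 999)),
        int(round(x2 / image_width * 999)),
        int(round(y2 / image_height * 999)),
    ]
    return label + "".join(f"<loc_{b}>" for b in bins)

# e.g. an "invoice number" box on a 1000x1400 px page
print(box_to_loc_tokens("invoice number", (620, 90, 940, 130), 1000, 1400))
# -> invoice number<loc_619><loc_64><loc_939><loc_93>
```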

## Usage

### Inference
To use this model for inference, you can load it via the Hugging Face Transformers library:

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor


def run_florence_invoice(img, task_prompt, text_input=None):
    image = Image.open(img)

    # Ensure the image is in RGB format
    if image.mode != "RGB":
        image = image.convert("RGB")

    # Load the fine-tuned model and its processor
    model_id = "Falconsai/florence-2-invoice"
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).eval().cuda()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    # Build the prompt: the task token alone, or task token plus grounding text
    prompt = task_prompt if text_input is None else task_prompt + text_input

    with torch.no_grad():
        inputs = processor(text=prompt, images=image, return_tensors="pt")
        generated_ids = model.generate(
            input_ids=inputs["input_ids"].cuda(),
            pixel_values=inputs["pixel_values"].cuda(),
            max_new_tokens=1024,
            num_beams=3,
        )
        generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
        parsed_answer = processor.post_process_generation(
            generated_text, task=task_prompt, image_size=(image.width, image.height)
        )

    # Free the GPU memory held by the model and processor
    del model
    del processor

    return parsed_answer
```
```python
# Call the function as follows:

# Return all detected fields
img = './invoice.png'
results = run_florence_invoice(img, '<OD>')

# Return a specific field
img = './invoice.png'
results = run_florence_invoice(img, "<CAPTION_TO_PHRASE_GROUNDING>", text_input="invoice date")
```
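
The parsed output is a dictionary keyed by the task token. For the `<OD>` task, the processor's `post_process_generation` typically returns a structure of the form `{'<OD>': {'bboxes': [...], 'labels': [...]}}` with boxes in pixel coordinates; the exact keys may vary between processor versions. A minimal sketch of reading it, under that assumption:

```python
# Assumes the usual Florence-2 '<OD>' result shape:
# {'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': [...]}}
detections = run_florence_invoice('./invoice.png', '<OD>')
for box, label in zip(detections['<OD>']['bboxes'], detections['<OD>']['labels']):
    x1, y1, x2, y2 = box
    print(f"{label}: ({x1:.0f}, {y1:.0f}) -> ({x2:.0f}, {y2:.0f})")
```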

### Applications
This model is ideal for automating the extraction of key information from invoices in various business and financial applications. It can significantly reduce the manual effort required for data entry and validation in accounting and bookkeeping processes.

## Evaluation
The model has been evaluated on a held-out set of annotated invoice images. The evaluation metrics used included precision, recall, and F1-score for each of the identified fields. Detailed evaluation results and visualizations are available in the `results` directory of the repository.
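
For readers who want to reproduce field-level metrics, the sketch below shows one way such scores can be computed: predicted boxes are matched greedily to ground-truth boxes of the same field by IoU, and precision, recall, and F1 are derived from the match counts. The helper names are illustrative and not taken from the repository's evaluation scripts.

```python
# Illustrative only: per-field precision/recall/F1 via greedy IoU matching.
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def field_scores(pred_boxes, true_boxes, iou_threshold=0.5):
    """Precision, recall, and F1 for one field over a set of invoices."""
    unmatched = list(true_boxes)
    tp = 0
    for p in pred_boxes:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= iou_threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(unmatched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```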

## Limitations
- The model's performance is dependent on the quality and variability of the training dataset. It may not perform as well on invoices that significantly differ from those seen during training.
- Fine-tuning was conducted with a specific LoRA configuration, which may need to be adjusted for different use cases or datasets.

## Contact
For more information or questions about this model, please contact the developers at [your-email@example.com].

## License
This model is licensed under the MIT License. See the `LICENSE` file for more details.

## Acknowledgments
We would like to thank Microsoft for the development of the Florence-2 vision model and the broader machine learning community for their contributions and support.