Commit 19c4633 by kheopss · verified · 1 Parent(s): 3d0e7c9

Update app.py

Files changed (1)
  1. app.py +23 -28
app.py CHANGED
@@ -132,39 +132,34 @@ def pdf_to_images(pdf_path, dpi=300, output_format='JPEG'):
  # -- START -- set up run variables

  system_msg = """
- You are a data extraction assistant specialized in French real estate documents. Your task is to extract key information and ensure the validity of the data according to the following rules:
- 1. Extract the following values: "Total_a_payer", "Fond_travaux_alur", "Part_charges_previsionnelles", and "Part_autres_travaux".
- 2. Ensure that "Total_a_payer" is equal to the sum of "Fond_travaux_alur", "Part_charges_previsionnelles", and "Part_autres_travaux". If the amounts do not match, return a JSON with an error message.
- 3. Additionally, extract the property owner’s name, address or lot number, and the period of the document.
- 4. Return the extracted data in JSON format.
+ You are an intelligent assistant tasked with extracting and validating information from French real estate syndic documents (*appel de fonds*). These documents contain financial details, property information, and owner details. Your job is to extract and ensure the correctness of the following information:
+
+ ### Task Overview:
+ You need to extract and validate the following fields:
+ 1. **Total à payer**: The total amount the owner must pay for the period.
+ 2. **Fond travaux alur**: The amount allocated to the ALUR works fund.
+ 3. **Total Part charges prévisionnelles**: The forecasted portion of charges the owner must pay for general building maintenance, collective services, etc.
+ 4. **Part autres travaux**: Any additional expenses related to specific works or repairs.
+ 5. **le solde précédent**: The previous balance from past transactions (can be positive or negative).
+ 6. **Propriétaire**: The name of the property owner.
+ 7. **Adresse du propriétaire**: The postal address of the owner.
+ 8. **Adresse du bien**: The location of the property (address of the unit or building).
+ 9. **Référence**: The reference number of the document or account related to the property.
+ 10. **Date du document**: The date when the document was issued.
+ 11. **Date limite du paiement**: The deadline by which the payment must be made.
+ 12. **Montant total solde en notre faveur**: The total balance in favor of the syndic (if applicable).

  ### Validation Rule:
- - Total_a_payer = Fond_travaux_alur + Part_charges_previsionnelles + Part_autres_travaux.
+ The following validation rules must be respected:
+ - **Total à payer** = **Fond travaux alur** + **Total Part charges prévisionnelles** + **Part autres travaux**.
+ - The amounts should be taken from the "débit" column, not the "crédit" column, to ensure accuracy. Verify that the **Total à payer** is from the correct column (débit).
+ - Additionally, both **Total à payer** and **Montant total solde en notre faveur** should be extracted for a cross-check to ensure that the final amounts are accurate and reflect the correct financial state.

- ### Expected output:
- - A JSON containing all the extracted values if the validation passes.
- - A JSON with an error message if the validation fails.
+ ### Format for Output:
+ Return the extracted information in JSON format. If there is a discrepancy (such as a mismatch between amounts or amounts found in the wrong column), return an error message in JSON format explaining the issue.
  """.strip()


- """
- Sachant que Total à payer doit etre egal à Fond travaux alur + Part charges prévisionnelles+ Part autres travaux - le solde précédent"""
- # The user message
- user_msg = """
- fournit les informations suivante sous format json uniquement:
- -Total à payer
- -Fond travaux loi alur et non pas le fond de participation
- -Total Part charges prévisionnelles
- -Part autres travaux
- -le solde précédent
- -identifier le propriétaire
- - l’adresse du propriétaire ou le numéro du lot du propriétaire si l'adresse n'est pas trouvé
- - nom du locataire
- -l'adresse de copropriété : et non pas l'alresse de l'agence immobiere
- -la reference:
- - date du document
- - date limit du payement
- """.strip()



@@ -178,7 +173,7 @@ def process(pdf):
      image_paths = pdf_to_images(pdf)
      system = set_system_message(system_msg)
      chat_hist = [] # list of more user/assistant items
-     user = set_user_message(user_msg, image_paths, max_size)
+     user = set_user_message(image_paths, max_size)

      params = { # dictionary format for ** unpacking
          "model": "gpt-4o",
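
The validation rule in the new system prompt (Total à payer = Fond travaux alur + Total Part charges prévisionnelles + Part autres travaux) is only asked of the model, so it may be worth re-checking on the application side. The following is a minimal sketch of such a check, not part of this commit. It assumes the model returns a flat JSON object whose keys match the field names in the prompt. The helper names `to_decimal` and `validate_appel_de_fonds`, the one-cent tolerance, and the shape of the error object are all hypothetical.

```python
import json
from decimal import Decimal, InvalidOperation

# Assumption: the JSON keys returned by the model match the prompt's field names.
TOTAL = "Total à payer"
COMPONENTS = (
    "Fond travaux alur",
    "Total Part charges prévisionnelles",
    "Part autres travaux",
)


def to_decimal(value):
    """Convert a number or a French-formatted amount such as '1 234,56 €'."""
    if isinstance(value, (int, float)):
        return Decimal(str(value))
    cleaned = str(value).replace("€", "")
    # Drop regular, non-breaking, and narrow non-breaking spaces used as separators.
    for space in (" ", "\u00a0", "\u202f"):
        cleaned = cleaned.replace(space, "")
    # French documents use a comma as the decimal separator.
    return Decimal(cleaned.replace(",", "."))


def validate_appel_de_fonds(raw_json, tolerance=Decimal("0.01")):
    """Return the parsed data if the sum rule holds, else a dict with an 'error' key."""
    data = json.loads(raw_json)
    try:
        total = to_decimal(data[TOTAL])
        parts = sum((to_decimal(data[key]) for key in COMPONENTS), Decimal("0"))
    except (KeyError, InvalidOperation) as exc:
        return {"error": f"missing or unparsable field: {exc}"}
    if abs(total - parts) > tolerance:
        return {"error": f"{TOTAL} ({total}) differs from the sum of its components ({parts})"}
    return data
```

`Decimal` is used instead of `float` so that amounts in cents compare exactly. The débit/crédit column rule cannot be checked from the JSON alone, so it stays the model's responsibility in this sketch.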