---
license: apache-2.0
language:
- grc
datasets:
- Ericu950/Inscriptions_1
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: transformers
tags:
- epigraphy
- textual criticism
- philology
- Ancient Greek
- merge
- mergekit
---

# Epigr_2_Llama-3.1-8B-Instruct_text

This is a finetuned version of Llama-3.1-8B-Instruct specialized in reconstructing spans of 1–20 missing characters in ancient Greek inscriptions. On spans of 1–10 missing characters it achieved a character error rate (CER) of 20.5%, a top-1 accuracy of 63.7%, and a top-20 accuracy of 83.0% on a test set of 7,811 unseen editions of inscriptions. See https://arxiv.org/abs/2409.13870.

## Usage

To run the model on a GPU with large memory capacity, follow these steps:

### 1. Download and load the model

```python
import warnings

import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoTokenizer, LlamaForCausalLM, pipeline

warnings.filterwarnings("ignore", message=".*copying from a non-meta parameter in the checkpoint.*")

model_id = "Ericu950/Epigr_2_Llama-3.1-8B-Instruct_text"

# Instantiate the model on the meta device, then stream in the checkpoint,
# dispatching it across the available devices and offloading the rest to disk.
with init_empty_weights():
    model = LlamaForCausalLM.from_pretrained(model_id)

model = load_checkpoint_and_dispatch(
    model,
    model_id,
    device_map="auto",
    offload_folder="offload",
    offload_state_dict=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)
```

### 2. Run inference on an inscription of your choice

```python
# This is https://inscriptions.packhum.org/text/359280?bookid=879&location=1678,
# Cos and Calymna IG XII,4 5:4043.
inscription_edition = "----εκτηι ισταμενου· ευμολποσ μολπου επεστατει· πρυτανεων γνωμη μεσσηνεωσ του διονοσ κατασταθεντοσ υπο --—ου του μυγαλου ερμωνοσ του μυιστρου κατασταθεντοσ υπο --—ρατου του προμαχου μολπου του μολπου λεοντοσ του -—ιππου κατασταθεντοσ υπο αριστοφανου του νουμηνιου του στησιοχου ηρακλειτου του αρτεμιδωρου δημοφωντοσ του πρυτανιοσ δαμωνοσ του ονφαλιωνοσ· επειδη οι δικασται οι αποσταλεντεσ εισ καλυμναν κομιζουσιν ψηφισμα παρα του δημου του καλυμνιων εν ωι γεγραπται οτι ο δημοσ ο καλυμνιων στεφανοι τον δημον χρυσωι στεφανωι αρετησ ενεκεν και ευνοιασ τησ εισ αυτον στεφανοι δε και τουσ δικαστασ τουσ αποσταλεντασ χρυσωι στεφανωι καλοκαγαθιασ ενεκεν κλεανδρον διοδωρου λεοντα ευβουλου κεφαλον δρακοντοσ θεοδωρον νουμηνιου λεοντα δρακοντιδου και περι τουτων οιεται δειν επιμελειαν ποιησασθαι τον δημον οπωσ ο τησ πολεωσ στεφανοσ αναγορευθηι και ο των δικαστων εν τωι θεατρωι διονυσιοισ δεδοχθαι τωι δημωι· τον μεν αγωνοθετην αναγγειλαι τον τησ πολεωσ στεφανον και τον των δικαστων κυκλιων τηι πρωτηι· επηιν[7 missing letters] και τουσ δικαστασ τουσ αποσταλεντασ επειδη αξιοι γενομενοι του δημου τιμασ περιεποιησαν τηι πολει·"

system_prompt = "Fill in the missing letters in this inscription!"

input_messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": inscription_edition},
]

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = generation_pipeline(
    input_messages,
    max_new_tokens=10,
    num_beams=30,  # Set this as high as your memory will allow!
    num_return_sequences=10,
    eos_token_id=terminators,
    early_stopping=True,
)

beam_contents = []
for output in outputs:
    generated_text = output.get("generated_text", [])
    for item in generated_text:
        if item.get("role") == "assistant":
            beam_contents.append(item.get("content"))

real_response = "ησθαι δε"
print(f"The masked sequence: {real_response}")
for i, content in enumerate(beam_contents, start=1):
    print(f"Suggestion {i}: {content}")
```

### Expected Output:

```
The masked sequence: ησθαι δε
Suggestion 1: ησθαι δε
Suggestion 2: εσθαι δε
Suggestion 3: εισθαι δ
Suggestion 4: εσαι δε ο
Suggestion 5: εκεν δε ο
Suggestion 6: εισθαι ο
Suggestion 7: εσ ο δημος
Suggestion 8: εσεν δε ο
Suggestion 9: εισθαι δε
Suggestion 10: εσ δε και
```

## Usage on the free tier in Google Colab

If you don’t have access to a larger GPU but want to try the model out, you can run it in a quantized format in Google Colab. **The quality of the responses will deteriorate significantly!** Follow these steps:

### Step 1: Connect to a free GPU

1. Click the **Connect** dropdown near the top right of the notebook.
2. Select **Change runtime type**.
3. In the modal window, select **T4 GPU** as your hardware accelerator.
4. Click **Save**.
5. Click the **Connect** button to connect to your runtime. After some time, the button will show a green checkmark, along with RAM and disk usage graphs, indicating that a server with the required hardware has been created.
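Before moving on to the quantized setup: suggestions like those in the expected output above can be scored with the character error rate metric reported at the top of this card. A minimal standard-library sketch (the `cer` helper is our illustration, not part of the released tooling):

```python
def cer(prediction: str, reference: str) -> float:
    """Character error rate: Levenshtein edit distance divided by reference length."""
    m, n = len(prediction), len(reference)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if prediction[i - 1] == reference[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / n

gold = "ησθαι δε"
for suggestion in ["ησθαι δε", "εσθαι δε", "εσ δε και"]:
    print(f"{suggestion}: CER = {cer(suggestion, gold):.3f}")
```

An exact match scores 0.0; a single substituted character in the eight-character gold span scores 0.125.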
### Step 2: Install Dependencies

```python
!pip install -U bitsandbytes
import os
os._exit(00)  # Restart the Colab runtime so the freshly installed package is picked up
```

### Step 3: Download and quantize the model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Ericu950/Epigr_2_Llama-3.1-8B-Instruct_text",
    device_map="auto",
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained("Ericu950/Epigr_2_Llama-3.1-8B-Instruct_text")

generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)
```

### Step 4: Run inference on an inscription of your choice

```python
inscription_edition = "----εκτηι ισταμενου· ευμολποσ μολπου επεστατει· πρυτανεων γνωμη μεσσηνεωσ του διονοσ κατασταθεντοσ υπο --—ου του μυγαλου ερμωνοσ του μυιστρου κατασταθεντοσ υπο --—ρατου του προμαχου μολπου του μολπου λεοντοσ του -—ιππου κατασταθεντοσ υπο αριστοφανου του νουμηνιου του στησιοχου ηρακλειτου του αρτεμιδωρου δημοφωντοσ του πρυτανιοσ δαμωνοσ του ονφαλιωνοσ· επειδη οι δικασται οι αποσταλεντεσ εισ καλυμναν κομιζουσιν ψηφισμα παρα του δημου του καλυμνιων εν ωι γεγραπται οτι ο δημοσ ο καλυμνιων στεφανοι τον δημον χρυσωι στεφανωι αρετησ ενεκεν και ευνοιασ τησ εισ αυτον στεφανοι δε και τουσ δικαστασ τουσ αποσταλεντασ χρυσωι στεφανωι καλοκαγαθιασ ενεκεν κλεανδρον διοδωρου λεοντα ευβουλου κεφαλον δρακοντοσ θεοδωρον νουμηνιου λεοντα δρακοντιδου και περι τουτων οιεται δειν επιμελειαν ποιησασθαι τον δημον οπωσ ο τησ πολεωσ στεφανοσ αναγορευθηι και ο των δικαστων εν τωι θεατρωι διονυσιοισ δεδοχθαι τωι δημωι· τον μεν αγωνοθετην αναγγειλαι τον τησ πολεωσ στεφανον και τον των δικαστων κυκλιων τηι πρωτηι· επηιν[7 missing letters] και τουσ δικαστασ τουσ αποσταλεντασ επειδη αξιοι γενομενοι του δημου τιμασ περιεποιησαν τηι πολει·"

system_prompt = "Fill in the missing letters in this inscription!"
input_messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": inscription_edition},
]

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = generation_pipeline(
    input_messages,
    max_new_tokens=10,
    num_beams=25,  # Set this as high as your memory will allow!
    num_return_sequences=10,
    eos_token_id=terminators,
    early_stopping=True,
)

beam_contents = []
for output in outputs:
    generated_text = output.get("generated_text", [])
    for item in generated_text:
        if item.get("role") == "assistant":
            beam_contents.append(item.get("content"))

real_response = "ησθαι δε"
print(f"The masked sequence: {real_response}")
for i, content in enumerate(beam_contents, start=1):
    print(f"Suggestion {i}: {content}")
```

### Expected Output:

```
The masked sequence: ησθαι δε
Suggestion 1: ησαμενοσ·
Suggestion 2: ησμενοσ·
Suggestion 3: ησασθαι·
Suggestion 4: ημενουν 0·
Suggestion 5: ησται δε 0
Suggestion 6: ησθαι δε
Suggestion 7: ησαμεθα·
Suggestion 8: ημεν δε 00·
Suggestion 9: ησθαι δε·
Suggestion 10: ησατω δε 0
```

Observe that performance declines! If we change

```python
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16
```

in the second cell to

```python
load_in_8bit=True,
```

we get

```
The masked sequence: ησθαι δε
Suggestion 1: ησθαι δε
Suggestion 2: εσθαι δε
Suggestion 3: εσαι δε ο
Suggestion 4: εισθαι δ
Suggestion 5: εσ ο δημος
Suggestion 6: εσεν δε ο
Suggestion 7: εσ ο δημο
Suggestion 8: εκεν δε ο
Suggestion 9: εσαι δε σ
Suggestion 10: εισθαι ο
```

## Information about the merge configuration

The finetuned model was remerged with Llama-3.1-8B-Instruct using the [TIES](https://arxiv.org/abs/2306.01708) merge method. This did not affect CER or top-1 accuracy, but it improved top-20 accuracy.
The following YAML configuration was used:

```yaml
models:
  - model: original # Llama 3.1
  - model: DDbDP_reconstructer_5 # A model finetuned on 95% of the DDbDP for 11 epochs
    parameters:
      density: 0.5
      weight: 1
merge_method: ties
base_model: original # Llama 3.1
parameters:
  normalize: true
dtype: bfloat16
```
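For intuition, the TIES procedure referenced above (trim each task vector to its largest-magnitude entries, elect a per-parameter sign, then average the agreeing entries onto the base weights) can be sketched per tensor as follows. This is an illustrative simplification, not the mergekit implementation, and `trim`/`ties` are our own names:

```python
import torch

def trim(tv: torch.Tensor, density: float) -> torch.Tensor:
    """Keep roughly the top-`density` fraction of entries by magnitude; zero the rest."""
    k = int(density * tv.numel())
    if k == 0:
        return torch.zeros_like(tv)
    threshold = tv.abs().flatten().kthvalue(tv.numel() - k + 1).values
    return torch.where(tv.abs() >= threshold, tv, torch.zeros_like(tv))

def ties(base: torch.Tensor, tuned: list[torch.Tensor], density: float = 0.5) -> torch.Tensor:
    # Task vectors: finetuned weights minus base weights, trimmed to the densest entries
    stacked = torch.stack([trim(t - base, density) for t in tuned])
    # Elect a per-parameter sign from the summed trimmed vectors
    sign = torch.sign(stacked.sum(dim=0))
    # Average only the surviving entries that agree with the elected sign
    agree = (torch.sign(stacked) == sign) & (stacked != 0)
    merged_tv = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
    return base + merged_tv
```

With a single finetuned model, as in the configuration above, sign election is trivial and the method reduces to adding the trimmed task vector (here, half its entries, per `density: 0.5`) back onto the base weights.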