---
language:
- sr
tags:
  - Srpski
  - Serbian
  - GPT2
  - generisanje
  - generation
name:
  - Serbian-GPT-2
---

# GPT-2 Model Trained on a Serbian Corpus

![flag.png](https://cdn-uploads.huggingface.co/production/uploads/64fc6ba4e0dc35986bc3b6ee/gCUs3UIix41opzOu1mkD7.png)

By sharing this model, we aim to foster further research and applications in Serbian language processing.

### Introduction:

This GPT-2 model was trained on a Serbian corpus of 43 million tokens. It is designed to generate fluent, natural-sounding text in Serbian.

### Dataset Details: 

The dataset covers a diverse range of topics, representing various aspects of the Serbian language and culture. Size: 43 million tokens.

### Model Usage:

This model can be used for NLP tasks such as text generation, summarization, and translation. Because it was trained on a Serbian corpus, it is intended to produce contextually relevant output for Serbian-language tasks in particular.
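
As a rough sketch of the text-generation use case, the snippet below assumes the model has already been downloaded and decrypted into a local directory, as described in the download section. The helper names `load_model_and_tokenizer` and `generate_text`, and the sampling settings (`top_k=50`, `top_p=0.95`), are illustrative assumptions rather than values recommended by the author.

```python
def load_model_and_tokenizer(model_dir):
    """Load the decrypted model and its tokenizer from a local directory."""
    # Imported lazily so generate_text() can be reused or tested on its own.
    from transformers import AutoTokenizer, GPT2LMHeadModel

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = GPT2LMHeadModel.from_pretrained(model_dir)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer


def generate_text(model, tokenizer, prompt, max_new_tokens=50):
    """Continue `prompt` with sampled text and return the decoded string."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,   # sample instead of greedy decoding (assumed setting)
        top_k=50,         # assumed value, tune as needed
        top_p=0.95,       # assumed value, tune as needed
        pad_token_id=tokenizer.eos_token_id,
    )
    # The first (and only) sequence contains the prompt plus its continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call might look like `model, tokenizer = load_model_and_tokenizer(cache_dir)` followed by `print(generate_text(model, tokenizer, "Beograd je"))`, where the prompt is only an example.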


### Download and Decrypt the Model:

    import os

    # Silence TensorFlow logging; set before importing transformers so it takes effect
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

    import requests
    from cryptography.fernet import Fernet, InvalidToken
    from transformers import GPT2LMHeadModel

    print("\nDownloading the Serbian-GPT-2 model...")

    # Files that make up the Serbian-GPT-2 model repository
    model_name = 'edukom/Serbian-GPT-2'
    base_url = f'https://huggingface.co/{model_name}/resolve/main/'
    files_to_download = ['added_tokens.json', 'config.json', 'generation_config.json', 'merges.txt', 'pytorch_model.bin', 'special_tokens_map.json', 'tokenizer.json', 'tokenizer_config.json', 'vocab.json']

    cache_dir = 'path/to/where/you/want/to/store/the/model'
    os.makedirs(cache_dir, exist_ok=True)  # create the target directory if it is missing

    for file_name in files_to_download:
        response = requests.get(base_url + file_name)
        response.raise_for_status()  # stop on a failed download instead of saving an error page
        with open(os.path.join(cache_dir, file_name), 'wb') as f:
            f.write(response.content)

    # Decrypt pytorch_model.bin in place
    key = input("\nEnter the decryption key: ").encode()
    model_path = os.path.join(cache_dir, 'pytorch_model.bin')

    try:
        cipher_suite = Fernet(key)  # raises ValueError if the key is malformed

        with open(model_path, 'rb') as f:
            encrypted_data = f.read()

        # Raises InvalidToken if the key does not match the encrypted file
        decrypted_data = cipher_suite.decrypt(encrypted_data)

        with open(model_path, 'wb') as f:
            f.write(decrypted_data)

    except (ValueError, InvalidToken) as e:
        print(f"\nError during decryption: {e}")
        print("\nTo obtain a decryption key, contact the author of this model, email: info@edukom.rs")

    else:
        # Load the decrypted Serbian-GPT-2 model
        model = GPT2LMHeadModel.from_pretrained(cache_dir)
        print("\nCongratulations, the Serbian-GPT-2 model is ready for use ヅ\n")

    # Now you can use the Serbian-GPT-2 model for further operations...
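
Before running the decryption step above, it can be useful to check that the key you received is at least well-formed. A Fernet key is 32 bytes encoded as URL-safe base64, so a malformed key can be rejected without touching the model file. The sketch below uses only the standard library; the helper name `is_well_formed_fernet_key` is hypothetical, and passing this check does not mean the key is the correct one for this model.

```python
import base64
import binascii


def is_well_formed_fernet_key(key):
    """Return True if `key` (bytes or ASCII str) decodes to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key)
    except (binascii.Error, ValueError):
        # Bad padding, or a str containing non-ASCII characters
        return False
    return len(raw) == 32
```

If the check fails, the most likely causes are a truncated key or stray whitespace picked up when copying it.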

### Licensing:

The author of this model is the company **Edukom AI**. The model is protected by encryption, and using it requires a decryption key. Please check the licensing terms if you intend to use the model for commercial purposes. For any questions, or to request a decryption key, contact us at **info@edukom.rs**.

![Screenshot.png](https://cdn-uploads.huggingface.co/production/uploads/64fc6ba4e0dc35986bc3b6ee/UoIvwAez4ZoiEsHyx-vn6.png)