Lukashenko Generator
This is a text-to-text generative AI. The model generates phrases resembling what the dictator of Belarus, Aliaksandr Lukashenko, could say.
Documentation
Description
The model was trained on the dataset NebulasBellum/pizdziuk_luka (the parquet version), which was collected from the Telegram channel Pul Pervogo. Only with the help of this channel could we do this great job of recreating the speech of the dictator Lukashenko :)
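If you want to inspect the raw texts yourself, the parquet files can be loaded with the datasets library. This is only a sketch; the split and column names are not listed here, so check the actual schema on the dataset page:

from datasets import load_dataset

# Load the parquet dataset from the Hugging Face Hub and inspect its splits and columns
dataset = load_dataset("NebulasBellum/pizdziuk_luka")
print(dataset)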
The model was trained for 340 epochs and produces good results:
loss: 0.2963 - accuracy: 0.9159
The model is continuously improved on an extended dataset that keeps growing as new speeches of Lukashenko are added. All information is collected from public sources, and not only :) (thanks to our partisans).
Right now the model repository NebulasBellum/Lukashenko_tarakan contains all the files necessary to download and use the model with the TensorFlow library, together with the trained weights weights_lukash.h5.
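As an alternative to cloning the repository by hand (see Quick Start below), the files can be fetched with the huggingface_hub client. This is only a sketch, assuming you prefer the Hub client over a manual git clone:

from huggingface_hub import snapshot_download

# Download all files of the model repository into the local HF cache
# and get back the path to the downloaded snapshot.
local_dir = snapshot_download(repo_id="NebulasBellum/Lukashenko_tarakan")
print(f"Model files are in: {local_dir}")

The returned path can then be used as model_path in the script below.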
Quick Start
To use this model with the TensorFlow library you need to:
- Download the model:
mkdir Luka_Pizdziuk
cd Luka_Pizdziuk
git clone https://huggingface.co/NebulasBellum/Lukashenko_tarakan
- Create the Python script:
import tensorflow as tf
import numpy as np

# Seed phrase to start the generation of the Lukashenko speech
seed_text = 'я не глядя поддержу'
weights_path = 'weights_lukash.h5'
model_path = 'Lukashenko_tarakan'

# Load the saved Keras model and its trained weights
model = tf.keras.models.load_model(model_path)
model.load_weights(weights_path)

# Show the model summary
model.summary()

# Load the source texts used to build the tokenizer
with open('source_text_lukash.txt', 'r', encoding='utf-8') as source_text_file:
    data = source_text_file.read().splitlines()

# Drop very short lines and count the words of the remaining ones
filtered_data = []
sent_length = 0
for line in data:
    if len(line) < 5:
        continue
    filtered_data.append(line)
    sent_length += len(line.split())
data = filtered_data

# Average number of words per line; used as the generation length below
lstm_length = sent_length // len(data)

# Tokenize the dataset
token = tf.keras.preprocessing.text.Tokenizer()
token.fit_on_texts(data)
encoded_text = token.texts_to_sequences(data)

# Vocabulary size (+1 for the padding index)
vocab_size = len(token.word_counts) + 1

# Recreate the training sequences (only needed here to recover the input length)
datalist = []
for d in encoded_text:
    if len(d) > 1:
        for i in range(2, len(d)):
            datalist.append(d[:i])
max_length = 20
sequences = tf.keras.preprocessing.sequence.pad_sequences(datalist, maxlen=max_length, padding='pre')

# X - input data, y - target data (kept for reference; only seq_length is used below)
X = sequences[:, :-1]
y = sequences[:, -1]
y = tf.keras.utils.to_categorical(y, num_classes=vocab_size)
seq_length = X.shape[1]

# Generate the Lukashenko speech from the seed, one predicted word at a time
generated_text = ''
number_lines = 3
for i in range(number_lines):
    text_word_list = []
    for _ in range(lstm_length * 2):
        encoded = token.texts_to_sequences([seed_text])
        encoded = tf.keras.preprocessing.sequence.pad_sequences(encoded, maxlen=seq_length, padding='pre')
        y_pred = int(np.argmax(model.predict(encoded, verbose=0), axis=-1)[0])
        predicted_word = token.index_word.get(y_pred, '')
        seed_text = seed_text + ' ' + predicted_word
        text_word_list.append(predicted_word)
    # Start the next line from the last predicted word
    seed_text = text_word_list[-1]
    generated_text += ' '.join(text_word_list) + '\n'

print(f"Lukashenko is saying:\n{generated_text}")
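The script above always picks the argmax of the model's prediction, which tends to loop on the same phrases. A common alternative is temperature sampling; the helper below is only a sketch (the name sample_next_word is mine), reusing the model, token and seq_length objects defined above and assuming the model outputs a softmax distribution over the vocabulary:

def sample_next_word(model, token, seed_text, seq_length, temperature=0.8):
    # Encode and pad the current seed exactly as in the script above
    encoded = token.texts_to_sequences([seed_text])
    encoded = tf.keras.preprocessing.sequence.pad_sequences(
        encoded, maxlen=seq_length, padding='pre')
    # Predicted probability distribution over the vocabulary
    probs = model.predict(encoded, verbose=0)[0]
    # Rescale the distribution by the temperature and sample one word id
    logits = np.log(probs + 1e-9) / temperature
    probs = np.exp(logits) / np.sum(np.exp(logits))
    word_id = int(np.random.choice(len(probs), p=probs))
    return token.index_word.get(word_id, '')

Lower temperatures stay close to the argmax behaviour; higher temperatures make the generated speech more varied.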
Try it in an HF Space
A ready-to-check Space with the working model is available here:
To contribute to the project, you can donate:
to TRX: TDqjSX6dB6eaFbpHRhX8CCZUSYDmMVvvmb
to BNB: 0x107119102c2EC84099cDce3D5eFDE2dcbf4DEB2a
25% goes to Help Ukraine to Win.