Lukashenko Generator
This is a text-to-text generative AI. The model generates phrases resembling what the dictator of Belarus, Aliaksandr Lukashenko, could say.
Documentation
Description
The model was trained on the dataset NebulasBellum/pizdziuk_luka (the parquet version), which was collected from the Telegram channel Pul Pervogo. Only with the help of this channel could we do this great job of recreating the speech of the dictator Lukashenko :)
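If you want to inspect the raw texts yourself, the parquet files can be loaded with the datasets library. This is only a sketch; the split and column names are not listed here, so check the actual schema on the dataset page:

from datasets import load_dataset

# Load the parquet dataset from the Hugging Face Hub and inspect its splits and columns
dataset = load_dataset("NebulasBellum/pizdziuk_luka")
print(dataset)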
The model was trained for 340 epochs and produces good results:
loss: 0.2963 - accuracy: 0.9159
The model is continuously improved on an extended dataset that keeps growing as new speeches of Lukashenko are added. All information is collected from public sources, and not only :) (thanks to our partisans).
Right now the model repository NebulasBellum/Lukashenko_tarakan contains all the files necessary to download and use the model with the TensorFlow library, together with the trained weights weights_lukash.h5.
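As an alternative to cloning the repository by hand (see Quick Start below), the files can be fetched with the huggingface_hub client. This is only a sketch, assuming you prefer the Hub client over a manual git clone:

from huggingface_hub import snapshot_download

# Download all files of the model repository into the local HF cache
# and get back the path to the downloaded snapshot.
local_dir = snapshot_download(repo_id="NebulasBellum/Lukashenko_tarakan")
print(f"Model files are in: {local_dir}")

The returned path can then be used as model_path in the script below.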
Quick Start
To use this model with the TensorFlow library you need to:
- Download the model:
mkdir Luka_Pizdziuk
cd Luka_Pizdziuk
git clone https://huggingface.co/NebulasBellum/Lukashenko_tarakan
- Create the Python script:
import tensorflow as tf
import numpy as np

# Seed phrase to start the generation of the Lukashenko speech
seed_text = 'я не глядя поддержу'
weights_path = 'weights_lukash.h5'
model_path = 'Lukashenko_tarakan'

# Load the saved Keras model and its trained weights
model = tf.keras.models.load_model(model_path)
model.load_weights(weights_path)

# Show the model summary
model.summary()

# Load the source texts used to build the tokenizer
with open('source_text_lukash.txt', 'r', encoding='utf-8') as source_text_file:
    data = source_text_file.read().splitlines()

# Drop very short lines and count the words of the remaining ones
filtered_data = []
sent_length = 0
for line in data:
    if len(line) < 5:
        continue
    filtered_data.append(line)
    sent_length += len(line.split())
data = filtered_data

# Average number of words per line; used as the generation length below
lstm_length = sent_length // len(data)

# Tokenize the dataset
token = tf.keras.preprocessing.text.Tokenizer()
token.fit_on_texts(data)
encoded_text = token.texts_to_sequences(data)

# Vocabulary size (+1 for the padding index)
vocab_size = len(token.word_counts) + 1

# Recreate the training sequences (only needed here to recover the input length)
datalist = []
for d in encoded_text:
    if len(d) > 1:
        for i in range(2, len(d)):
            datalist.append(d[:i])
max_length = 20
sequences = tf.keras.preprocessing.sequence.pad_sequences(datalist, maxlen=max_length, padding='pre')

# X - input data, y - target data (kept for reference; only seq_length is used below)
X = sequences[:, :-1]
y = sequences[:, -1]
y = tf.keras.utils.to_categorical(y, num_classes=vocab_size)
seq_length = X.shape[1]

# Generate the Lukashenko speech from the seed, one predicted word at a time
generated_text = ''
number_lines = 3
for i in range(number_lines):
    text_word_list = []
    for _ in range(lstm_length * 2):
        encoded = token.texts_to_sequences([seed_text])
        encoded = tf.keras.preprocessing.sequence.pad_sequences(encoded, maxlen=seq_length, padding='pre')
        y_pred = int(np.argmax(model.predict(encoded, verbose=0), axis=-1)[0])
        predicted_word = token.index_word.get(y_pred, '')
        seed_text = seed_text + ' ' + predicted_word
        text_word_list.append(predicted_word)
    # Start the next line from the last predicted word
    seed_text = text_word_list[-1]
    generated_text += ' '.join(text_word_list) + '\n'

print(f"Lukashenko is saying:\n{generated_text}")
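The script above always picks the argmax of the model's prediction, which tends to loop on the same phrases. A common alternative is temperature sampling; the helper below is only a sketch (the name sample_next_word is mine), reusing the model, token and seq_length objects defined above and assuming the model outputs a softmax distribution over the vocabulary:

def sample_next_word(model, token, seed_text, seq_length, temperature=0.8):
    # Encode and pad the current seed exactly as in the script above
    encoded = token.texts_to_sequences([seed_text])
    encoded = tf.keras.preprocessing.sequence.pad_sequences(
        encoded, maxlen=seq_length, padding='pre')
    # Predicted probability distribution over the vocabulary
    probs = model.predict(encoded, verbose=0)[0]
    # Rescale the distribution by the temperature and sample one word id
    logits = np.log(probs + 1e-9) / temperature
    probs = np.exp(logits) / np.sum(np.exp(logits))
    word_id = int(np.random.choice(len(probs), p=probs))
    return token.index_word.get(word_id, '')

Lower temperatures stay close to the argmax behaviour; higher temperatures make the generated speech more varied.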
Try it in an HF Space
A ready-to-check Space with the working model is available here:
To contribute to the project, you can donate:
to TRX: TDqjSX6dB6eaFbpHRhX8CCZUSYDmMVvvmb
to BNB: 0x107119102c2EC84099cDce3D5eFDE2dcbf4DEB2a
25% goes to Help Ukraine to Win.