vluz / toxmodel30


	---
	license: cc0-1.0
	---

	Note: Due to the nature of toxic comments, the data and code contain explicit language.

	Data is from Kaggle's Toxic Comment Classification Challenge:
	<br>
	https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data?select=train.csv.zip

	Dataset used for training: https://huggingface.co/datasets/vluz/Tox

	Trained for 30 epochs on RunPod.

	### 🤗 Running demo here:
	https://huggingface.co/spaces/vluz/Tox

	<hr>

	Code requires pandas, tensorflow, and streamlit. All can be installed via `pip`.

	```python
	import os
	import pickle
	import streamlit as st
	import tensorflow as tf
	from tensorflow.keras.layers import TextVectorization


	@st.cache_resource
	def load_model():
	    # Load the trained Keras model once; Streamlit caches it across reruns.
	    model = tf.keras.models.load_model(os.path.join("model", "toxmodel.keras"))
	    return model


	@st.cache_resource
	def load_vectorizer():
	    # Rebuild the TextVectorization layer from its pickled config and weights.
	    with open(os.path.join("model", "vectorizer.pkl"), "rb") as f:
	        from_disk = pickle.load(f)
	    new_v = TextVectorization.from_config(from_disk['config'])
	    # Keras bug workaround: adapt on dummy data before calling set_weights.
	    new_v.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))
	    new_v.set_weights(from_disk['weights'])
	    return new_v


	st.title("Toxic Comment Test")
	st.divider()
	model = load_model()
	vectorizer = load_vectorizer()
	input_text = st.text_area("Comment:", "I love you man, but fuck you!", height=150)
	if st.button("Test"):
	    with st.spinner("Testing..."):
	        inputv = vectorizer([input_text])   # text -> integer token ids
	        output = model.predict(inputv)      # one score per label
	        res = (output > 0.5)                # flag labels scoring above 0.5
	    st.write(["toxic","severe toxic","obscene","threat","insult","identity hate"], res)
	    st.write(output)
	```


	Save the code above as `toxtest.py` and put `toxmodel.keras` and `vectorizer.pkl` into the `model` directory.

	Then run:
	```
	streamlit run toxtest.py
	```

	With the default comment, the expected result is positive for labels 0 (toxic) and 2 (obscene).
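
The label indices follow the order of the label list in the script. As a minimal sketch of how the output row could be turned into label names, the helper below applies the same `> 0.5` threshold. The function name `flagged_labels` and the example scores are hypothetical, made up for illustration; they are not real model output and not part of the original code.

```python
import numpy as np

# Label order as used in the script above.
LABELS = ["toxic", "severe toxic", "obscene", "threat", "insult", "identity hate"]


def flagged_labels(scores, threshold=0.5):
    """Return the names of the labels whose score is strictly above the threshold.

    Hypothetical helper: accepts a single row of six scores, shaped (6,) or (1, 6).
    """
    scores = np.asarray(scores, dtype=float).reshape(-1)
    if scores.shape[0] != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {scores.shape[0]}")
    return [name for name, score in zip(LABELS, scores) if score > threshold]


# Made-up scores for illustration only (not actual model output).
example = np.array([[0.9, 0.1, 0.8, 0.05, 0.3, 0.02]])
print(flagged_labels(example))  # ['toxic', 'obscene']
```

In the Streamlit script, something like `st.write(flagged_labels(output))` could replace the raw boolean array, assuming a single comment is scored per run.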

	<hr>

	Full code can be found here:
	<br>
	https://github.com/vluz/ToxTest/
	<br>