# -*- coding: utf-8 -*-
"""ai-portfolio.ipynb
Automatically generated by Colab.
Original file is located at
https://colab.research.google.com/drive/1XN71Q8R5ctujwjQB0XsGHB7KBp4hP6wR
# Project: Portfolio - Final Project
**Instructions for Students:**
Please carefully follow these steps to complete and submit your assignment:
1. **Completing the Assignment**: You are required to work on and complete all tasks in the provided assignment. Be disciplined and ensure that you thoroughly engage with each task.
2. **Creating a Google Drive Folder**: If you don't already have a folder for collecting assignments, you must create a new folder in your Google Drive. This will be a repository for all your completed assignment files, helping you keep your work organized and easy to access.
3. **Uploading Completed Assignment**: Upon completion of your assignment, make sure to upload all necessary files, including code, reports, and related documents, into the Google Drive folder you created. Save this link in the 'Student Identity' section and also provide it as the last parameter in the `submit` function that has been provided.
4. **Sharing Folder Link**: You're required to share the link to your assignment Google Drive folder. This is crucial for the submission and evaluation of your assignment.
5. **Setting Permission to Public**: Please make sure your **Google Drive folder is set to public**. This allows your instructor to access your solutions and assess your work correctly.
Adhering to these procedures will facilitate a smooth assignment process for you and the reviewers.
**Description:**
Welcome to your final portfolio project assignment for AI Bootcamp. This is your chance to put all the skills and knowledge you've learned throughout the bootcamp into action by creating a real-world AI application.
You have the freedom to create any application or model, be it text-based, image-based, voice-based, or even multimodal.
To get you started, here are some ideas:
1. **Sentiment Analysis Application:** Develop an application that can determine sentiment (positive, negative, neutral) from text data like reviews or social media posts. You can use Natural Language Processing (NLP) libraries like NLTK or TextBlob, or more advanced pre-trained models from the Hugging Face transformers library, for your sentiment analysis model.
2. **Chatbot:** Design a chatbot serving a specific purpose such as customer service for a certain industry, a personal fitness coach, or a study helper. Libraries like ChatterBot or Dialogflow can assist in designing conversational agents.
3. **Predictive Text Application:** Develop a model that suggests the next word or sentence similar to predictive text on smartphone keyboards. You could use the transformers library by Hugging Face, which includes pre-trained models like GPT-2.
4. **Image Classification Application:** Create a model to distinguish between different types of flowers or fruits. For this type of image classification task, pre-trained models like ResNet or VGG from PyTorch or TensorFlow can be utilized.
5. **News Article Classifier:** Develop a text classification model that categorizes news articles into predefined categories. NLTK, SpaCy, and sklearn are valuable libraries for text pre-processing, feature extraction, and building classification models.
6. **Recommendation System:** Create a simplified recommendation system. For instance, a book or movie recommender based on user preferences. Python's Surprise library can assist in building effective recommendation systems.
7. **Plant Disease Detection:** Develop a model to identify diseases in plants using leaf images. This project requires a good understanding of convolutional neural networks (CNNs) and image processing. PyTorch, TensorFlow, and OpenCV are all great tools to use.
8. **Facial Expression Recognition:** Develop a model to classify human facial expressions. This involves complex feature extraction and classification algorithms. You might want to leverage deep learning libraries like TensorFlow or PyTorch, along with OpenCV for processing facial images.
9. **Chest X-Ray Interpretation:** Develop a model to detect abnormalities in chest X-ray images. This task may require understanding of specific features in such images. Again, TensorFlow and PyTorch for deep learning, and libraries like scikit-image or PIL for image processing, could be of use.
10. **Food Classification:** Develop a model to classify a variety of foods such as local Indonesian food. Pre-trained models like ResNet or VGG from PyTorch or TensorFlow can be a good starting point.
11. **Traffic Sign Recognition:** Design a model to recognize different traffic signs. This project has real-world applicability in self-driving car technology. Once more, you might utilize PyTorch or TensorFlow for the deep learning aspect, and OpenCV for image processing tasks.
**Submission:**
Please upload both your model and application to Hugging Face or your own GitHub account for submission.
**Presentation:**
You are required to create a presentation to showcase your project, including the following details:
- The objective of your model.
- A comprehensive description of your model.
- The specific metrics used to measure your model's effectiveness.
- A brief overview of the dataset used, including its source, pre-processing steps, and any insights.
- An explanation of the methodology used in developing the model.
- A discussion on challenges faced, how they were handled, and what you learned from them.
- Suggestions for potential future improvements to the model.
- A functioning link to a demo of your model in action.
**Grading:**
Submissions will be manually graded, with a select few given the opportunity to present their projects in front of a panel of judges. This will provide valuable feedback, further enhancing your project and expanding your knowledge base.
Remember, consistent practice is the key to mastering these concepts. Apply your knowledge, ask questions when in doubt, and above all, enjoy the process. Best of luck to you all!
"""
# Commented out IPython magic to ensure Python compatibility.
# %pip install rggrader
"""## Working Space"""
import os
import random
import string

import nltk
from nltk.corpus import wordnet
from transformers import pipeline
# Download the required WordNet resources
nltk.download("wordnet")
nltk.download("omw-1.4")
# Load GPT-2 to generate replacement words
text_generator = pipeline("text-generation", model="gpt2")
# Load pretrained hate speech detection model
hate_speech_classifier = pipeline("text-classification", model="unitary/toxic-bert")
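# Note: a transformers text-classification pipeline returns a list of dicts,
# e.g. [{'label': 'toxic', 'score': 0.98}]; for unitary/toxic-bert the top
# label on toxic input is typically 'toxic', which is what the substring
# checks below ("toxic" in label.lower()) rely on.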
# Confidence threshold for flagging toxicity
CONFIDENCE_THRESHOLD = 0.4
# File path for storing the negative-to-positive word mapping
filepath = "extended_negative_to_positive_words.txt"
# Counter for tracking the number of toxic-content violations
toxic_counter = {"count": 0}
# List of positive fallback words
positive_words = ["kind", "friendly", "smart", "brilliant", "amazing", "wonderful", "great", "excellent"]
# Find an antonym for a word using WordNet
def find_opposite(word):
    antonyms = []
    for syn in wordnet.synsets(word):
        for lemma in syn.lemmas():
            if lemma.antonyms():  # Check whether an antonym exists
                antonyms.append(lemma.antonyms()[0].name())
    return antonyms[0] if antonyms else None
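# Illustrative usage (WordNet returns its first recorded antonym):
#   find_opposite("good")   # may return "evil"
#   find_opposite("table")  # returns None (no antonym recorded)
# Note: this helper is not called elsewhere in this script; replacements
# instead come from the GPT-2 generator defined next.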
# Generate a random replacement word using GPT-2
def generate_random_antonym(word):
    prompt = f"Generate a random positive word to replace the toxic word '{word}':"
    try:
        response = text_generator(prompt, max_new_tokens=5, truncation=True, num_return_sequences=1)
        generated_text = response[0]['generated_text']
        # Take the first word of the generated continuation
        random_antonym = generated_text.split(":")[-1].strip().split()[0]
        # Validate that the result contains only alphabetic characters
        if random_antonym.isalpha():
            return random_antonym
        else:
            return random.choice(positive_words)
    except Exception as e:
        print(f"Error in generating random antonym for '{word}': {e}")
        # Fall back to a random positive word
        return random.choice(positive_words)
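# Illustrative usage (GPT-2 sampling is nondeterministic, so outputs vary):
#   generate_random_antonym("stupid")  # e.g. "smart", or a random fallback
#                                      # entry from positive_words on failure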
# Load the negative-to-positive word map from file
def load_neg_to_pos_map(filepath):
    neg_to_pos_map = {}
    if not os.path.exists(filepath):
        with open(filepath, "w") as file:
            file.write("")
        print(f"File '{filepath}' not found. A new file has been created.")
    with open(filepath, "r") as file:
        for line_number, line in enumerate(file, start=1):
            if line.strip():
                parts = line.strip().split(":")
                if len(parts) == 2:
                    neg, pos = parts
                    neg_to_pos_map[neg.strip().lower()] = pos.strip()
                else:
                    print(f"Warning: Invalid format on line {line_number}: {line.strip()}")
    return neg_to_pos_map
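# The mapping file is plain text with one "negative : positive" pair per line,
# for example (illustrative entries; the file starts empty and grows at runtime):
#   stupid : smart
#   ugly : beautiful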
# Append a new word pair to the mapping file
def update_neg_to_pos_file(filepath, word, opposite_word):
    with open(filepath, "a") as file:
        file.write(f"{word} : {opposite_word}\n")
# Detect and replace toxic words, one word at a time
def replace_toxic_words(text, neg_to_pos_map, filepath="extended_negative_to_positive_words.txt"):
    words = text.split()
    replaced_words = []
    replacements = []  # Stores (toxic word, replacement) pairs
    for word in words:
        clean_word = word.strip(string.punctuation).lower()
        result = hate_speech_classifier(clean_word)
        label = result[0]['label']
        confidence = result[0]['score']
        if "toxic" in label.lower() and confidence >= CONFIDENCE_THRESHOLD:
            if clean_word in neg_to_pos_map:
                replacement = neg_to_pos_map[clean_word]
            else:
                replacement = generate_random_antonym(clean_word)
                neg_to_pos_map[clean_word] = replacement
                update_neg_to_pos_file(filepath, clean_word, replacement)
            # Swap the toxic word into the original token (lowercasing first so
            # capitalized words like "Stupid" are still replaced)
            replaced_word = word.lower().replace(clean_word, replacement)
            replacements.append((clean_word, replacement))
        else:
            replaced_word = word
        replaced_words.append(replaced_word)
    rewritten_text = " ".join(replaced_words)
    return rewritten_text, replacements
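# Illustrative call (actual output depends on classifier scores and GPT-2):
#   replace_toxic_words("you are stupid", {})
#   # -> ("you are smart", [("stupid", "smart")])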
# Main entry point: detect toxicity and paraphrase the text, with ban logic
def detect_and_paraphrase_with_ban(text, neg_to_pos_map, filepath="extended_negative_to_positive_words.txt"):
    # Check whether the user is already banned
    if toxic_counter["count"] >= 3:
        return "You have been banned for submitting toxic content multiple times. Please refresh to try again."
    # Detect whether the text contains toxic content
    result = hate_speech_classifier(text)
    label = result[0]['label']
    confidence = result[0]['score']
    if "toxic" in label.lower() and confidence >= CONFIDENCE_THRESHOLD:
        # Increment the violation count
        toxic_counter["count"] += 1
        # Check whether the user should now be banned
        if toxic_counter["count"] >= 3:
            return "You have been banned for submitting toxic content multiple times. Please refresh to try again."
        # Replace the toxic words in the text
        rewritten_text, replacements = replace_toxic_words(text, neg_to_pos_map, filepath)
        replacement_info = ", ".join([f"'{original}' → '{replacement}'" for original, replacement in replacements])
        return (
            f"Detection: Toxic (Confidence: {confidence:.2%})\n"
            f"Rewritten Text: {rewritten_text}\n"
            f"Replacements: {replacement_info if replacement_info else 'None'}"
        )
    else:
        return f"Detection: Non-toxic (Confidence: {confidence:.2%})\nOriginal Text: {text}"
# Print per-word detection results (debug/inspection helper)
def test_per_word(text, neg_to_pos_map, filepath="extended_negative_to_positive_words.txt"):
    words = text.split()
    print(f"Original Text: {text}\n")
    print("Per-word Detection and Replacement:")
    rewritten_text, replacements = replace_toxic_words(text, neg_to_pos_map, filepath)
    toxic_count = 0
    for word in words:
        clean_word = word.strip(string.punctuation).lower()
        result = hate_speech_classifier(clean_word)
        label = result[0]['label']
        confidence = result[0]['score']
        if "toxic" in label.lower() and confidence >= CONFIDENCE_THRESHOLD:
            toxic_count += 1
            replacement = neg_to_pos_map.get(clean_word, "No Replacement")
            print(f"Word: '{word}' -> Label: '{label}', Confidence: {confidence:.2f}, Replacement: '{replacement}'")
        else:
            print(f"Word: '{word}' -> Label: 'non-toxic', Confidence: {confidence:.2f}, Replacement: None")
    print("\nRewritten Text:")
    print(rewritten_text)
    print(f"\nToxic Words Detected: {toxic_count}")
    return rewritten_text
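# Example: test_per_word("you are stupid", neg_to_pos_map) prints a per-word
# label/confidence breakdown and returns the rewritten text.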
# Load the negative-to-positive word map from file
neg_to_pos_map = load_neg_to_pos_map(filepath)
# Reset the ban counter
def reset_toxic_counter():
    toxic_counter["count"] = 0
    return "Toxic counter has been reset."
# Gradio interface
import gradio as gr
# Gradio handler: detect and rewrite the input text
def detect_and_rewrite_chatbot(input_text):
    global neg_to_pos_map
    if not neg_to_pos_map:
        neg_to_pos_map = load_neg_to_pos_map(filepath)
    return detect_and_paraphrase_with_ban(input_text, neg_to_pos_map, filepath)
# Gradio handler: reset the ban counter
def reset_count():
    return reset_toxic_counter()
# Build the Gradio interface
with gr.Blocks() as chatbot_interface:
    gr.Markdown("## Toxicity Detection")
    with gr.Row():
        input_text = gr.Textbox(label="Input Text", placeholder="Type something...", lines=2)
        output_text = gr.Textbox(label="Output Text", interactive=False)
    with gr.Row():
        submit_button = gr.Button("Submit")
        reset_button = gr.Button("Reset Ban Count")
    submit_button.click(detect_and_rewrite_chatbot, inputs=input_text, outputs=output_text)
    reset_button.click(reset_count, outputs=output_text)
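# Note: launch() serves the app locally by default; passing share=True to
# launch() creates a temporary public link (a standard Gradio option).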
# Launch Gradio
if __name__ == "__main__":
    chatbot_interface.launch()