Common NLP Tasks
This app demonstrates typical NLP techniques used in real-world use cases, such as named entity recognition (NER) and part-of-speech (POS) tagging, text summarization, and text classification.
NER has applications in various industries. In finance, it can extract essential information from earnings reports, financial statements, news articles, and product mentions to support automated analysis, fraud detection, and the identification of investment opportunities. In media and entertainment, it analyzes text to support content creation and content personalization. In e-commerce, it extracts product information from reviews, customer feedback, and descriptions, enabling automated analysis and personalized recommendations.
POS tagging is an essential NLP technique used in machine translation, word sense disambiguation, question answering, parsing, and so on.
Text Summarization has a plethora of real-world use cases, for example media monitoring for sensitive and objectionable content, helping people with disabilities by presenting only short and relevant content, producing succinct summaries of meetings and video conferences, and summarizing financial documents such as earnings reports and financial news to quickly derive market signals.
Text Classification has a multitude of applications, such as categorizing customer support tickets (billing, feedback, questions, complaints, etc.), sentiment analysis (customer feedback, tweets, etc.), and content moderation (hate speech, obscene language, NSFW content, etc.).
App UI Details
The app has four tabs: "ABOUT", "NER & POS", "TEXT SUMMARIZATION", and "TEXT CLASSIFICATION".
ABOUT Tab
This page
NER & POS Tab
Given a text fragment, named entities (NER) and parts of speech (POS) in the text can be extracted with the click of a button.
Because of hardware resource constraints on public cloud hosting, the app uses a "small" language model to illustrate the functionality, so its results are far from ideal. A bigger model running on more capable hardware will yield much better results.
TEXT SUMMARIZATION Tab
A brief summary is generated from a specified text. The user can select the summarization technique (TextRankSummarizer, LexRankSummarizer, or LsaSummarizer) and the length of the summary.
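To make the idea behind a graph-based extractive summarizer concrete, here is a minimal, dependency-free sketch of a TextRank-style sentence ranker. It is not the app's implementation: the app lets the user pick among TextRankSummarizer, LexRankSummarizer, and LsaSummarizer, while the function names, the naive sentence splitter, and the parameter defaults below are assumptions made purely for illustration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word-overlap similarity, normalized by the log of each sentence's vocabulary size."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids a zero denominator for one-word sentences
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, num_sentences=2, damping=0.85, iterations=50):
    """Return the top-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= num_sentences:
        return sentences

    # Weighted graph: an edge between every pair of sentences that share words.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]

    # Power iteration of the weighted PageRank update.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]


# Hypothetical sample text, used only to exercise the sketch.
SAMPLE_TEXT = (
    "The cat sat on the mat. The cat chased the mouse on the mat. "
    "The mouse ran from the cat. Bananas are yellow."
)
print(textrank_summary(SAMPLE_TEXT, num_sentences=3))
```

The num_sentences argument plays the role of the summary length chosen in the tab. A sentence that shares no words with the rest of the text receives only the baseline score, so it is the first to be dropped.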
TEXT CLASSIFICATION Tab
Given a sentence or paragraph, the text classifier determines the emotion it portrays. This app uses a LogisticRegression classifier to detect emotions. The classifier was trained on 34,000 labeled samples.
Please keep in mind that the classifier is far from flawless. Because of training and inference hardware restrictions, the training corpus is tiny and the model is basic. Training on a much bigger text corpus and using a model that can learn non-linear decision boundaries (e.g., XGBoost, RandomForest, or a neural network) would likely give significantly better results.
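As a rough illustration of how a logistic-regression emotion classifier works, the sketch below trains a multinomial (softmax) logistic regression on bag-of-words counts using only the standard library. It is not the app's training code: the class name LogisticRegression suggests a library implementation, but the source does not say which one, and the six training sentences, the label set, and every function name here are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy data; the app's classifier was trained on 34,000 labeled samples.
TRAIN = [
    ("i am so happy and delighted today", "joy"),
    ("what a wonderful and joyful surprise", "joy"),
    ("i feel sad and lonely tonight", "sadness"),
    ("this loss makes me cry", "sadness"),
    ("i am furious and angry at you", "anger"),
    ("this rude service makes me angry", "anger"),
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def featurize(text, index):
    """Bag-of-words counts for known tokens, plus a constant bias feature."""
    feats = Counter(index[tok] for tok in tokenize(text) if tok in index)
    feats[len(index)] = 1  # bias term stored in the last position
    return feats


def predict_proba(feats, weights):
    """Softmax over one linear score per label."""
    scores = {
        label: sum(w[i] * value for i, value in feats.items())
        for label, w in weights.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def train(samples, epochs=300, learning_rate=0.1):
    """Fit the weights with plain stochastic gradient descent on the log loss."""
    labels = sorted({label for _, label in samples})
    vocab = sorted({tok for text, _ in samples for tok in tokenize(text)})
    index = {tok: i for i, tok in enumerate(vocab)}
    weights = {label: [0.0] * (len(vocab) + 1) for label in labels}
    for _ in range(epochs):
        for text, gold in samples:
            feats = featurize(text, index)
            probs = predict_proba(feats, weights)
            for label in labels:
                error = probs[label] - (1.0 if label == gold else 0.0)
                for i, value in feats.items():
                    weights[label][i] -= learning_rate * error * value
    return weights, index


def classify(text, weights, index):
    probs = predict_proba(featurize(text, index), weights)
    return max(probs, key=probs.get)


weights, index = train(TRAIN)
print(classify("i am so happy", weights, index))
```

Each label's score is a weighted sum of word counts, so the decision boundaries are linear in the features. That is the limitation the paragraph above refers to when it suggests models such as XGBoost, RandomForest, or a neural network.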
Project Source
Contact Me
Appendix
Local Installation and Run
To run the app locally:
- Create a conda or virtual environment and activate it
- Install Python 3.11.0 or above
- Execute the commands below from a terminal/command prompt
git clone https://github.com/sssingh/nlp_ner_summarization_classification
cd nlp_ner_summarization_classification
pip install -r requirements.txt
streamlit run src/app.py
- Open any browser and visit localhost:8501
NOTE: The trained text classifier is stored in the artifacts folder as the file logistic_regression_model.joblib. If you wish to retrain the model with different hyperparameters (or use another classifier), then:
- Modify the src/logistic_regression_model.py script
- Execute the commands below from a terminal/command prompt
pip install -r requirements.txt
streamlit run src/app.py