Sentiment Analysis
---
### Usage
1. Clone the repository
2. `cd` into the repository
3. Run `just install` to install the dependencies
4. Run `just run --help` to see the available commands
### Datasets
- [Sentiment140](https://www.kaggle.com/datasets/kazanova/sentiment140)
- [IMDb](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews)
- [Amazon Reviews](https://www.kaggle.com/datasets/bittlingmayer/amazonreviews)
### Required tools
- `just`
- `poetry`
### TODO
- [ ] CLI using `click` (commands: predict, train, evaluate) with settings set via flags or environment variables
- [ ] GUI using `gradio` (tabs: predict, train, evaluate, compare, settings)
- [ ] For the sklearn model, add more classifiers
- [ ] Use random search for coarse hyperparameter tuning, then grid search to fine-tune around the best candidates
- [ ] Finish the text pre-processing transformer
- [ ] For vectorization, use custom stopwords
- [ ] Write a custom tokenizer/vectorizer
- [ ] Add more datasets
- [ ] Add more models (e.g. BERT)
- [ ] Write tests
- [ ] Use XGBoost?
- [ ] Deploy to Hugging Face?
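
As a starting point for the custom stopwords and tokenizer/vectorizer items above, here is a minimal sketch of a hand-written bag-of-words vectorizer. It is not the project's implementation: the stopword list, the `tokenize` helper, and the `SimpleCountVectorizer` class are hypothetical names chosen for illustration, and the code uses only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical stopword list; the project's real list is not defined yet.
CUSTOM_STOPWORDS = frozenset({"the", "a", "an", "is", "it", "this", "and"})

# Words are runs of lowercase letters, digits, and apostrophes.
_TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text, stopwords=CUSTOM_STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    return [tok for tok in _TOKEN_RE.findall(text.lower()) if tok not in stopwords]


class SimpleCountVectorizer:
    """Tiny bag-of-words vectorizer (illustrative, not scikit-learn's class)."""

    def __init__(self, stopwords=CUSTOM_STOPWORDS):
        self.stopwords = stopwords
        self.vocabulary_ = {}

    def fit(self, documents):
        """Build a token -> column index mapping, sorted alphabetically."""
        tokens = sorted(
            {tok for doc in documents for tok in tokenize(doc, self.stopwords)}
        )
        self.vocabulary_ = {tok: idx for idx, tok in enumerate(tokens)}
        return self

    def transform(self, documents):
        """Return one list of token counts per document."""
        rows = []
        for doc in documents:
            counts = Counter(tokenize(doc, self.stopwords))
            row = [0] * len(self.vocabulary_)
            for tok, n in counts.items():
                idx = self.vocabulary_.get(tok)
                if idx is not None:  # ignore tokens not seen during fit
                    row[idx] = n
            rows.append(row)
        return rows


docs = ["This movie is great, great fun", "It is a boring movie"]
vectorizer = SimpleCountVectorizer().fit(docs)
matrix = vectorizer.transform(docs)
```

The `fit`/`transform` split mirrors the scikit-learn convention, which should make it easier to swap a class like this into an existing pipeline later. A real version would likely return a sparse matrix rather than nested lists.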