Sentiment Analysis
---

### Usage
1. Clone the repository
2. `cd` into the repository
3. Run `just install` to install the dependencies
4. Run `just run --help` to see the available commands

### Datasets
- [Sentiment140](https://www.kaggle.com/datasets/kazanova/sentiment140)
- [IMDb](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews)
- [Amazon Reviews](https://www.kaggle.com/datasets/bittlingmayer/amazonreviews)

### Required tools
- `just`
- `poetry`

### TODO
- [ ] CLI using `click` (commands: predict, train, evaluate) with settings set via flags or environment variables
- [ ] GUI using `gradio` (tabs: predict, train, evaluate, compare, settings)
- [ ] For the sklearn model, add more classifiers
- [ ] Use random search for coarse hyperparameter tuning, then grid search to fine-tune around the best candidates
- [ ] Finish the text pre-processing transformer
- [ ] For vectorization, use custom stopwords
- [ ] Write a custom tokenizer/vectorizer
- [ ] Add more datasets
- [ ] Add more models (e.g. BERT)
- [ ] Write tests
- [ ] Use xgboost?
- [ ] Deploy to Hugging Face?
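
One possible starting point for the custom stopwords and tokenizer/vectorizer items above is a small bag-of-words vectorizer built on the standard library. This is only a sketch: the stopword list, the token pattern, and the names `tokenize` and `CountVectorizerSketch` are all assumptions, and the class is not meant to match any existing code in this repository.

```python
import re
from collections import Counter

# Hypothetical stopword list; the project's real list is not specified.
CUSTOM_STOPWORDS = frozenset({"a", "an", "the", "is", "it", "this", "and"})

# Assumed token pattern: runs of lowercase letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text, stopwords=CUSTOM_STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok not in stopwords]


class CountVectorizerSketch:
    """Tiny bag-of-words vectorizer (illustrative, not the project's code)."""

    def __init__(self, stopwords=CUSTOM_STOPWORDS):
        self.stopwords = stopwords
        self.vocabulary = {}

    def fit(self, documents):
        # Assign column indices to tokens in sorted order so results are deterministic.
        tokens = sorted(
            {tok for doc in documents for tok in tokenize(doc, self.stopwords)}
        )
        self.vocabulary = {tok: idx for idx, tok in enumerate(tokens)}
        return self

    def transform(self, documents):
        rows = []
        for doc in documents:
            counts = Counter(tokenize(doc, self.stopwords))
            row = [0] * len(self.vocabulary)
            for tok, n in counts.items():
                idx = self.vocabulary.get(tok)
                if idx is not None:  # ignore tokens unseen during fit
                    row[idx] = n
            rows.append(row)
        return rows


vectorizer = CountVectorizerSketch().fit(
    ["This movie is great", "This movie is bad, really bad"]
)
matrix = vectorizer.transform(["bad movie, bad acting"])
```

Here `matrix` holds one row of token counts per input document, with columns ordered by `vectorizer.vocabulary`. A production version would probably return a sparse matrix and follow the scikit-learn transformer interface so it can slot into a pipeline.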