# LipNet Phonemes Predictors
This project was developed with Python 3.8 on Ubuntu 24.04 Linux.
Run `python -m pip install -r requirements.txt` to make sure your dependencies match the ones the project was developed against.

The lists of video files used for training and validation when training normal LipNet (not the phonemes predictor) are in `unseen_train.txt` and `unseen_test.txt`, respectively.
The datasets are zipped in `lip/*.zip`; unzip them in the same location, then run `python main.py` to start training.
Hyperparameters are found in `options.py`.
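A common pattern for a file like `options.py` is a set of module-level hyperparameters that the training script imports directly. The sketch below illustrates that pattern only; every name and value here is an assumption, not the repo's actual configuration:

```python
# Illustrative stand-in for an options.py-style config module.
# All names and values below are assumptions, not the repo's real settings.
gpu = "0"
batch_size = 32
base_lr = 2e-4
max_epoch = 100
video_path = "lip"               # where the unzipped GRID videos would live
train_list = "unseen_train.txt"  # training split file list
val_list = "unseen_test.txt"     # validation split file list

def summary():
    """Return the hyperparameters as a dict, e.g. for logging at startup."""
    return {k: v for k, v in globals().items()
            if not k.startswith("_") and not callable(v)}

print(sorted(summary()))
```

Keeping hyperparameters as plain module attributes means a training run can log `summary()` once at startup and reproduce the exact configuration later.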
## Project Setup
- Pull this repo using `git pull https://huggingface.co/SilentSpeak/torchnet phonemes`
- Create a Python virtualenv for this project using `python3.8 -m venv venv`
- Activate the virtualenv using `source venv/bin/activate`
- Run `python -m pip install -r requirements.txt` to get the dependencies
- Install Git LFS using `git lfs install`
- Pull the GRID dataset and saved TensorBoard runs using `git lfs pull`
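After activating the virtualenv, a quick sanity check that the interpreter matches the version the project was developed with can save a confusing failure later. The helper below is illustrative, not part of the repo:

```python
import sys

def python_ok(required=(3, 8)):
    """True if the running interpreter is at least `required` (major, minor)."""
    return sys.version_info[:2] >= required

# Fail fast with a pointer back to the setup steps above.
if not python_ok():
    raise SystemExit("Activate the python3.8 virtualenv first: source venv/bin/activate")
```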
Following the project setup, you can run training and evaluation as follows:

- To train the LipNet phonemes predictor, run `python main.py`
- To train the LipNet phonemes-to-text transformer predictor, run `python TransformerTrainer.py`
- To train the LipNet-to-BiGRU-to-text transformer predictor, run `python TranslatorTrainer.py`
- To evaluate the end-to-end pipeline (LipNet phonemes predictor plus phonemes-to-text transformer), run `cd tests && python lipnet-pipeline.py`

The model weights used in `lipnet-pipeline.py` are included in the repo as LFS files in the `saved-weights` folder.
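The end-to-end evaluation chains the phonemes predictor into the phonemes-to-text transformer. The schematic below shows that two-stage flow only; both stages are dummy stand-ins for the trained models loaded from `saved-weights`, and all names here are illustrative:

```python
# Two-stage pipeline sketch: video -> phoneme sequence -> text.
# predict_phonemes and phonemes_to_text are toy stand-ins, not the repo's models.

def predict_phonemes(video_frames):
    """Stage 1 stand-in: the LipNet phonemes predictor."""
    # A real model would emit a phoneme sequence decoded from the video.
    return ["HH", "AH", "L", "OW"]

def phonemes_to_text(phonemes):
    """Stage 2 stand-in: the phonemes-to-text transformer."""
    lookup = {("HH", "AH", "L", "OW"): "hello"}  # toy decoder table
    return lookup.get(tuple(phonemes), "")

def run_pipeline(video_frames):
    """Run both stages back to back, as the evaluation script does."""
    return phonemes_to_text(predict_phonemes(video_frames))

print(run_pipeline(video_frames=None))  # -> hello
```

Splitting the pipeline this way lets each stage be trained and swapped independently, which is why the repo has separate trainer scripts for each model.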
The LRS2 dataset was too large to include in the repo, and access to it is conditional on accepting its non-commercial usage license.
However, the config file for training on LRS2 can be found in `options_lrs2.py`, and the preprocessing code in `scripts/extract_crop_lips_v2.py` and `scripts/generate_lsr2_train.py`.
The LRS2 dataset itself can be found at https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html
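A train-list generator like `scripts/generate_lsr2_train.py` typically walks the dataset root and writes one relative video path per line, in the spirit of `unseen_train.txt`. The self-contained sketch below shows that idea; the directory layout, file extension, and function name are assumptions, not the repo's actual script:

```python
import os
import tempfile

def write_train_list(dataset_root, out_path, ext=".mp4"):
    """Walk dataset_root and write relative paths of all `ext` files, one per line."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(dataset_root):
        for name in sorted(filenames):
            if name.endswith(ext):
                entries.append(os.path.relpath(os.path.join(dirpath, name), dataset_root))
    with open(out_path, "w") as f:
        f.write("\n".join(sorted(entries)))
    return entries

# Demo on a throwaway directory structure with one speaker and one clip.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "s1"))
open(os.path.join(root, "s1", "clip0.mp4"), "w").close()
listed = write_train_list(root, os.path.join(root, "train.txt"))
print(listed)
```

Writing relative paths keeps the generated list usable regardless of where the dataset is unzipped, since the loader can join them against whatever root the config specifies.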