About

This model was created to support experiments evaluating phonetic transcription with the Buckeye corpus as part of https://github.com/ginic/multipa/tree/buckeye_experiments. It is a version of facebook/wav2vec2-large-xlsr-53 fine-tuned on a specific subset of the Buckeye corpus. For details about model parameters, see config.json in this model repository, or the training scripts in scripts/buckeye_experiments on the buckeye_experiments branch of the GitHub repository.
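Phonetic transcription output is commonly scored with a phone error rate (PER). PER is the Levenshtein edit distance between reference and predicted phone sequences, divided by the number of reference phones. The sketch below shows that computation. The function names are hypothetical and are not taken from the multipa repository. The repository's own evaluation may use a different metric or a different way of splitting IPA strings into units, for example per character rather than per phone.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences, all edits cost 1."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level PER: total edits divided by total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference transcriptions are empty")
    return errors / total
```

For example, `phone_error_rate([["k", "æ", "t"]], [["k", "ɑ", "t"]])` gives 1/3: one substitution over three reference phones. Pooling edits across the whole test set, instead of averaging per-utterance rates, keeps short utterances from dominating the score.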

Experiment Details

Vary the random seed used to select training data while keeping an even 50/50 gender split, then retrain with the same model parameters for each seed. This measures whether the choice of training data, with gender makeup held constant, has a statistically significant effect on performance.

Goals:

  • Establish whether varying the training data, with the same gender makeup, produces a statistically significant change in performance on the test set

Params to vary:

  • training data seed (--train_seed): [7 (default), 91, 15, 139, 503]
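The sampling procedure described above draws a different training subset for each seed while keeping the gender split fixed. The sketch below shows one way to do that. Several details are assumptions and do not come from the multipa scripts: selection is done per utterance by count, the input is a list of (utterance_id, gender) pairs, and the helper name `select_balanced` is hypothetical. The actual scripts may instead sample by audio duration or by speaker.

```python
import random
from collections import defaultdict


def select_balanced(utterances, n_per_gender, seed):
    """Hypothetical helper: draw n_per_gender utterance IDs from each gender.

    utterances: list of (utterance_id, gender) pairs.
    With two genders this yields an even 50/50 split, and the same seed
    always reproduces the same selection.
    """
    rng = random.Random(seed)  # private RNG so only `seed` controls the draw
    by_gender = defaultdict(list)
    for utt_id, gender in utterances:
        by_gender[gender].append(utt_id)
    selected = []
    for gender in sorted(by_gender):      # fixed iteration order across runs
        pool = sorted(by_gender[gender])  # independent of input ordering
        if len(pool) < n_per_gender:
            raise ValueError(f"not enough utterances for gender {gender!r}")
        selected.extend(rng.sample(pool, n_per_gender))
    return selected


if __name__ == "__main__":
    # Toy data standing in for corpus metadata (not real Buckeye IDs).
    data = [(f"f{i:02d}", "f") for i in range(10)] + [(f"m{i:02d}", "m") for i in range(10)]
    for seed in [7, 91, 15, 139, 503]:  # the --train_seed values listed above
        print(seed, select_balanced(data, 3, seed))
```

Sorting each pool before sampling makes the selection depend only on the seed and the set of available utterances, not on the order in which metadata happens to be loaded.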
Model size: 316M params
Tensor type: F32 (Safetensors)

Model ID: ginic/data_seed_4_wav2vec2-large-xlsr-buckeye-ipa