---
datasets:
- mozilla-foundation/common_voice_13_0
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---
# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

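This model adds a CTC head on top of a frozen, pretrained Whisper encoder (see the `WhisperCTC` definition below) and was trained on the IndicTTS English data listed under Training Data. This model card was generated from [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).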

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

```python
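import torch.nn as nn
from torch import Tensor
from transformers.models.whisper.modeling_whisper import WhisperEncoder


class WhisperCTC(nn.Module):
    """Frozen Whisper encoder followed by a small CTC classification head."""

    def __init__(
        self,
        encoder_id: str = "tuanio/whisper-encoder.tiny.en",
        dropout: float = 0.1,
        vocab_size: int = 47,
    ):
        super().__init__()
        self.encoder = WhisperEncoder.from_pretrained(encoder_id)
        # The encoder weights stay fixed; only the head below is trained.
        print("Freezing Whisper Encoder...")
        self.encoder._freeze_parameters()
        print("Frozen!")
        # Head: activation -> dropout -> projection to the CTC vocabulary.
        self.lm_head = nn.Sequential(
            nn.SiLU(),
            nn.Dropout(dropout),
            nn.Linear(self.encoder.config.d_model, vocab_size),
        )
        nn.init.kaiming_uniform_(
            self.lm_head[-1].weight, mode="fan_in", nonlinearity="relu"
        )

    def forward(self, feat: Tensor, attn_mask: Tensor):
        # Encoder output has shape (batch, frames, d_model).
        enc = self.encoder(
            input_features=feat, attention_mask=attn_mask
        ).last_hidden_state
        logits = self.lm_head(enc)
        # Per-frame log-probabilities over the vocabulary, as CTC expects.
        log_probs = nn.functional.log_softmax(logits, dim=-1)
        return log_probs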
```
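
The `forward` method above returns per-frame log-probabilities, which is the input a CTC decoder works on. As a rough illustration of how such output can be turned into token ids, here is a minimal greedy-decoding sketch in plain Python. The function name `ctc_greedy_decode` and the choice of `blank_id=0` are assumptions made for this example; the card does not state which id the tokenizer reserves for the CTC blank, and this is not necessarily the decoding used by the author.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    log_probs: Sequence[Sequence[float]], blank_id: int = 0
) -> List[int]:
    """Collapse a (frames, vocab) matrix of log-probabilities into token ids.

    blank_id=0 is an assumption for this sketch, not a value from the card.
    """
    decoded: List[int] = []
    prev: Optional[int] = None
    for frame in log_probs:
        # Pick the most likely token for this frame.
        best = max(range(len(frame)), key=lambda i: frame[i])
        # CTC rule: merge consecutive repeats, then drop blanks.
        if best != prev and best != blank_id:
            decoded.append(best)
        prev = best
    return decoded


# Toy input: 5 frames over a 3-token vocabulary (id 0 is the assumed blank).
frames = [
    [-0.1, -3.0, -3.0],  # blank
    [-3.0, -0.1, -3.0],  # token 1
    [-3.0, -0.1, -3.0],  # token 1 again, merged with the previous frame
    [-0.1, -3.0, -3.0],  # blank
    [-3.0, -3.0, -0.1],  # token 2
]
decoded_ids = ctc_greedy_decode(frames)
```

For a real batch, each row of the model output (after moving it to the CPU and converting it to nested lists) could be passed through the same function, and the resulting ids mapped back to symbols with the tokenizer.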


- **Developed by:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

- IndicTTS (English): https://www.kaggle.com/datasets/tuannguyenvananh/indictts-english

[More Information Needed]

### Training Procedure 

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

```yaml
data_cfg:
  dataset:
    processor:
      feat_extractor_id: ${model_cfg.model.encoder_id}
      tokenizer_id: ${model_cfg.tokenizer_id}
    path:
      base:
        indict_tts: ../IndicTTS
        cv: ../
      train:
        - train_data/indict_tts_train.jsonl
        # - train_data/cv_train.jsonl
      test:
        - train_data/indict_tts_test.jsonl
        # - train_data/cv_test.jsonl
      dev:
        - train_data/indict_tts_dev.jsonl
        # - train_data/cv_dev.jsonl
  dataloader:
    batch_size: 46
    num_workers: 8
    pin_memory: True

model_cfg:
  tokenizer_id: tuanio/wav2vec2-phoneme-ipa-ctc
  model:
    dropout: 0.1
    encoder_id: tuanio/whisper-encoder.medium.en
  optim:
    lr: 1.25e-05
    betas: [0.9, 0.998]
    weight_decay: 0.01
  scheduler:
    name: linear
    total_steps: -1
    warmup_ratio: 0.05
    interval: step
    frequency: 1

trainer_cfg:
  log:
    wandb: True
  logger_wandb:
    project: aped_indian-lish
    name: whisper-medium-indict-tts-only-from-epoch1
    log_model: all
  arguments:
    accelerator: gpu
    devices: -1
    max_epochs: 10
    log_every_n_steps: 1
    enable_checkpointing: True
    accumulate_grad_batches: 2
    inference_mode: True
    gradient_clip_val: 5.0
    check_val_every_n_epoch: 1
    val_check_interval: null


experiment_cfg:
  train: True
  valid: True
  test: True
  ckpt:
    resume_ckpt: True
    ckpt_path: ckpt/medium.epoch3.ckpt
```
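
To make a few of these numbers concrete, the sketch below works out the effective batch size and the shape of a linear warmup schedule. It rests on assumptions that the card does not confirm: that `name: linear` means linear warmup followed by linear decay to zero, and that `total_steps: -1` means the step count is inferred at run time. The dataset size is not given, so `HYPOTHETICAL_TOTAL_STEPS` is a made-up value used only for illustration.

```python
# Values copied from the configuration above.
BATCH_SIZE = 46
ACCUMULATE_GRAD_BATCHES = 2
BASE_LR = 1.25e-05
WARMUP_RATIO = 0.05

# Samples contributing to one optimizer step on a single device.
effective_batch_per_device = BATCH_SIZE * ACCUMULATE_GRAD_BATCHES  # 92


def linear_warmup_decay_lr(
    step: int,
    total_steps: int,
    base_lr: float = BASE_LR,
    warmup_ratio: float = WARMUP_RATIO,
) -> float:
    """Learning rate at `step` for linear warmup, then linear decay to zero.

    This is an assumed interpretation of `scheduler.name: linear`.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay from base_lr down to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical step count; the real value depends on the dataset size.
HYPOTHETICAL_TOTAL_STEPS = 1000
peak_step = int(HYPOTHETICAL_TOTAL_STEPS * WARMUP_RATIO)
peak_lr = linear_warmup_decay_lr(peak_step, HYPOTHETICAL_TOTAL_STEPS)
```

With `accumulate_grad_batches: 2`, each optimizer step sees 92 samples per device. If the trainer arguments are PyTorch Lightning `Trainer` arguments, as their names suggest, `devices: -1` selects all available GPUs, so the global batch would be 92 times the GPU count.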

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Data Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]
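
The card metadata lists `wer` as the metric. For reference, word error rate is the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition; the function name `word_error_rate` is chosen for this example and this is not necessarily the implementation used to evaluate the model. The card does not say whether scoring was done over words or over phoneme tokens; applied to space-separated phoneme strings, the same computation would give a phoneme error rate.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of three reference words.
example_wer = word_error_rate("the cat sat", "the bat sat")
```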

### Results

[More Information Needed]

#### Summary


## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]