Batched files with different languages
Discussion #30, opened by Oscaarjs
Is it possible to set a different language for each file when processing files in a batch?
E.g., I can do:
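```python
from transformers import pipeline

# Build the ASR pipeline from an already-loaded model and processor
pipe = pipeline(
    "automatic-speech-recognition",
    model=self.model,
    tokenizer=self.processor.tokenizer,
    feature_extractor=self.processor.feature_extractor,
    torch_dtype=self.torch_dtype,
    device=self.device,
)
```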
and then
```python
pipe(
    files,
    chunk_length_s=self.config.get("chunk_length_s", 30),
    batch_size=self.config.get("batch_size", 24),
    return_timestamps=True,
    return_language=True,
    generate_kwargs={"language": "en"},
)
```
Here, `files` is a list of paths to audio files.
But this applies the same language to all files. Is it possible to set it individually for each file? I could of course use a `batch_size` of 1 and process the files one by one with different kwargs for each, but I'd like to keep the speed-up that batching gives.
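One workaround that stays within the regular pipeline call is to group the files by language, make one batched call per group, and then put the results back in the original order. The sketch below assumes `pipe` is the pipeline built above and that each file's language is already known. The helper `transcribe_grouped_by_language` and the `languages` list are hypothetical names for illustration. It does not rely on any per-sample language option inside `generate_kwargs`.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Sequence


def transcribe_grouped_by_language(
    pipe: Callable[..., List[Dict[str, Any]]],
    files: Sequence[str],
    languages: Sequence[str],
    batch_size: int = 24,
    chunk_length_s: int = 30,
) -> List[Optional[Dict[str, Any]]]:
    """Hypothetical helper: one batched pipeline call per language, results in input order."""
    if len(files) != len(languages):
        raise ValueError("files and languages must have the same length")

    # Bucket input positions by language so each bucket can share generate_kwargs.
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, lang in enumerate(languages):
        groups[lang].append(idx)

    results: List[Optional[Dict[str, Any]]] = [None] * len(files)
    for lang, indices in groups.items():
        # One batched call per language; every file in this group is forced to `lang`.
        # Assumption: given a list of inputs, `pipe` returns one output per input, in order.
        outputs = pipe(
            [files[i] for i in indices],
            chunk_length_s=chunk_length_s,
            batch_size=batch_size,
            return_timestamps=True,
            return_language=True,
            generate_kwargs={"language": lang},
        )
        # Scatter the outputs back to the files' original positions.
        for i, out in zip(indices, outputs):
            results[i] = out
    return results
```

Usage would look like `results = transcribe_grouped_by_language(pipe, files, ["en", "sv", "en"])`. The trade-off is that each language forms its own batches, so many languages with only a few files each will produce smaller, less efficient batches than a single mixed batch would.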