Batched files with different languages

#30
by Oscaarjs - opened

Is it possible to set different languages for different files when processing them in a batch?

For example, I can do:

from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model=self.model,
    tokenizer=self.processor.tokenizer,
    feature_extractor=self.processor.feature_extractor,
    torch_dtype=self.torch_dtype,
    device=self.device,
)

and then

pipe(
    files,
    chunk_length_s=self.config.get("chunk_length_s", 30),
    batch_size=self.config.get("batch_size", 24),
    return_timestamps=True,
    return_language=True,
    generate_kwargs={"language": "en"},
)

where files is a list of paths to audio files.

But this language setting applies to every file. Is it possible to set it individually for each file? I could of course use a batch_size of 1 and process the files one by one with different kwargs for each, but I'd like to keep the speed-up that batching gives.
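One possible workaround, offered as an assumption rather than a built-in pipeline feature: group the files by target language and call the pipeline once per group, so that files sharing a language are still batched together. The helper name transcribe_grouped_by_language and the per-file languages list are hypothetical. The pipe argument is any callable that takes a list of files plus keyword arguments, as the pipeline above does, and the results are put back in the original file order.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence


def transcribe_grouped_by_language(
    pipe: Callable[..., List[Any]],
    files: Sequence[str],
    languages: Sequence[str],
    **pipe_kwargs: Any,
) -> List[Any]:
    """Hypothetical helper: run `pipe` once per language, batching files that share it.

    Returns one result per file, in the same order as `files`.
    """
    if len(files) != len(languages):
        raise ValueError("files and languages must have the same length")

    # Keep any generate_kwargs the caller passed (e.g. "task") and add "language" per group.
    base_generate_kwargs = dict(pipe_kwargs.pop("generate_kwargs", None) or {})

    # Map each language to the original indices of the files that use it.
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, lang in enumerate(languages):
        groups[lang].append(idx)

    results: List[Any] = [None] * len(files)
    for lang, indices in groups.items():
        generate_kwargs = {**base_generate_kwargs, "language": lang}
        group_files = [files[i] for i in indices]
        # One batched call per language; batch_size etc. are forwarded unchanged.
        outputs = pipe(group_files, generate_kwargs=generate_kwargs, **pipe_kwargs)
        for i, out in zip(indices, outputs):
            results[i] = out
    return results


if __name__ == "__main__":
    # Stand-in for the real ASR pipeline: reports which language each file was run with.
    def fake_pipe(batch, generate_kwargs=None, **kwargs):
        return [{"file": f, "language": generate_kwargs["language"]} for f in batch]

    print(
        transcribe_grouped_by_language(
            fake_pipe, ["a.wav", "b.wav", "c.wav"], ["en", "sv", "en"], batch_size=24
        )
    )
```

With the real pipeline, the same call would pass chunk_length_s, batch_size and return_timestamps through as keyword arguments. The speed-up is only as good as the group sizes: many languages with one or two files each degrade towards the batch_size=1 case.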
