Add documentation for Diarization
docs/options.md CHANGED +19 -0
@@ -80,6 +80,17 @@ number of seconds after the line has finished. For instance, if a line ends at 1
 Note that detected lines in gaps between speech sections will not be included in the prompt
 (if silero-vad or silero-vad-expand-into-gaps) is used.
 
+## Diarization
+
+If checked, Pyannote will be used to detect speakers in the audio, and label them as (SPEAKER 00), (SPEAKER 01), etc.
+
+This requires a HuggingFace API key to function, which can be supplied with the `--auth_token` command line option for the CLI,
+set in the `config.json5` file for the GUI, or provided via the `HK_AUTH_TOKEN` environment variable.
+
+## Diarization - Speakers
+
+The number of speakers to detect. If set to 0, Pyannote will attempt to detect the number of speakers automatically.
+
 # Command Line Options
 
 Both `app.py` and `cli.py` also accept command line options, such as the ability to enable parallel execution on multiple
@@ -132,3 +143,11 @@ If the average log probability is lower than this value, treat the decoding as f
 
 ## No speech threshold
 If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence. Default is 0.6.
+
+## Diarization - Min Speakers
+
+The minimum number of speakers for Pyannote to detect.
+
+## Diarization - Max Speakers
+
+The maximum number of speakers for Pyannote to detect.
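
To illustrate how the three speaker-count options documented in this change might be combined, here is a minimal sketch. The helper name `build_speaker_kwargs` is hypothetical and does not come from this repository. The keyword names `num_speakers`, `min_speakers` and `max_speakers` are the ones the pyannote.audio diarization pipeline accepts when it is called, but how this project forwards them is an assumption, as is the use of `None` to mean "not set" for the minimum and maximum.

```python
from typing import Dict, Optional


def build_speaker_kwargs(
    speakers: int = 0,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> Dict[str, int]:
    """Translate the documented options into keyword arguments (hypothetical helper)."""
    if speakers < 0:
        raise ValueError("speakers must be 0 (auto-detect) or a positive integer")
    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("min_speakers must not exceed max_speakers")

    kwargs: Dict[str, int] = {}
    if speakers > 0:
        # A fixed speaker count was requested; 0 means "detect automatically",
        # so in that case no count is forwarded at all.
        kwargs["num_speakers"] = speakers
    # Assumption: the bounds are only forwarded when they were actually set.
    if min_speakers is not None:
        kwargs["min_speakers"] = min_speakers
    if max_speakers is not None:
        kwargs["max_speakers"] = max_speakers
    return kwargs


print(build_speaker_kwargs(speakers=0))                      # {}
print(build_speaker_kwargs(speakers=2))                      # {'num_speakers': 2}
print(build_speaker_kwargs(min_speakers=1, max_speakers=4))  # {'min_speakers': 1, 'max_speakers': 4}
```

The resulting dictionary could then be unpacked into the pipeline call, for example `pipeline(audio_path, **kwargs)`, so that an empty dictionary leaves speaker detection fully automatic.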
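
The Diarization section in this change names three places the HuggingFace token can come from: the `--auth_token` command line option, `config.json5`, and the `HK_AUTH_TOKEN` environment variable. The sketch below shows one way such a lookup could be written. The function name `resolve_auth_token`, the config key `auth_token`, and the precedence order (command line first, then config, then environment) are all assumptions for illustration; the documentation does not state which source wins.

```python
import os
from typing import Mapping, Optional


def resolve_auth_token(
    cli_token: Optional[str] = None,
    config: Optional[Mapping[str, object]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty token found, or None (hypothetical helper)."""
    env = os.environ if env is None else env
    config = {} if config is None else config

    # Assumed precedence: CLI option, then config file value, then environment.
    candidates = (
        cli_token,
        config.get("auth_token"),    # hypothetical key name in config.json5
        env.get("HK_AUTH_TOKEN"),    # variable name as given in the docs
    )
    for candidate in candidates:
        # Skip missing values and strings that are empty or only whitespace.
        if isinstance(candidate, str) and candidate.strip():
            return candidate.strip()
    return None


# The explicit command line value takes priority over the environment here.
print(resolve_auth_token(cli_token="from-cli", env={"HK_AUTH_TOKEN": "from-env"}))
```

Passing `env` and `config` as plain mappings keeps the lookup easy to test without touching the real environment or parsing a config file.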