devanshpandey committed on
Commit
40ee922
1 Parent(s): f51df25

Update README.md

Files changed (1)
  1. README.md +4 -7
README.md CHANGED
@@ -1,15 +1,12 @@
  ---
  license: apache-2.0
  ---
- Hertz-dev is an open-source, first-of-its-kind base model for full-duplex conversational audio.
+ # Hertz-dev
+ Hertz-dev is an open-source, first-of-its-kind base model for full-duplex conversational audio. It is an 8.5B parameter transformer trained on 20 million unique hours of high-quality audio data. This repo contains code for both mono- and full-duplex generation; we expect to do a full Transformers library integration in the near future.

- Hertz-dev is an 8.5B parameter transformer trained on 20 million unique hours of high-quality audio data. This repo contains code for both mono- and full-duplex generation; we expect to do a full Transformers library integration in the near future.
+ Hertz-dev is a base model, without fine-tuning, RLHF, or instruction-following behavior. It can be fine-tuned for almost 𝘢𝘯𝘺 audio modeling task, from live translation to classification. Base models excel at faithfully modeling their training set, and accurate maps come from contact with reality.

- Hertz-dev is a base model, without fine-tuning, RLHF, or instruction-following behavior. It can be fine-tuned for almost 𝘢𝘯𝘺 audio modeling task, from live translation to classification.
-
- Base models excel at faithfully modeling their training set, and accurate maps come from contact with reality. From the world’s largest dataset of high-quality real-world conversational audio, hertz-dev exhibits state-of-the art ability in human-like speech patterns such as pauses and emotional inflections.
-
- Hertz-dev has a 80ms theoretical average latency, and benchmarks 120ms real-world latency on a single RTX 4090, which is 1.5-2x lower than the previous state of the art. Low latency is necessary for natural audio, and we're proud to move the field in this direction.
+ From the world’s largest known dataset of high-quality real-world conversational audio, hertz-dev exhibits state-of-the art ability in human-like speech patterns such as pauses and emotional inflections. Hertz-dev has a 80ms theoretical average latency, and benchmarks 120ms real-world latency on a single RTX 4090, which is 1.5-2x lower than the previous state of the art. Low latency is necessary for natural audio, and we're proud to move the field in this direction.

  ## Setup
  To get started, clone the git repository and install requirements with