si-pbc/hertz-dev

devanshpandey committed 22 days ago

Commit 40ee922 (1 parent: f51df25)

Update README.md

Files changed (1)

README.md CHANGED

@@ -1,15 +1,12 @@
 ---
 license: apache-2.0
 ---
-Hertz-dev is an open-source, first-of-its-kind base model for full-duplex conversational audio.
-Hertz-dev is an 8.5B parameter transformer trained on 20 million unique hours of high-quality audio data. This repo contains code for both mono- and full-duplex generation; we expect to do a full Transformers library integration in the near future.
-Hertz-dev is a base model, without fine-tuning, RLHF, or instruction-following behavior. It can be fine-tuned for almost 𝘢𝘯𝘺 audio modeling task, from live translation to classification.
-Base models excel at faithfully modeling their training set, and accurate maps come from contact with reality. From the world’s largest dataset of high-quality real-world conversational audio, hertz-dev exhibits state-of-the art ability in human-like speech patterns such as pauses and emotional inflections.
-Hertz-dev has a 80ms theoretical average latency, and benchmarks 120ms real-world latency on a single RTX 4090, which is 1.5-2x lower than the previous state of the art. Low latency is necessary for natural audio, and we're proud to move the field in this direction.
+# Hertz-dev
+Hertz-dev is an open-source, first-of-its-kind base model for full-duplex conversational audio. It is an 8.5B parameter transformer trained on 20 million unique hours of high-quality audio data. This repo contains code for both mono- and full-duplex generation; we expect to do a full Transformers library integration in the near future.
+Hertz-dev is a base model, without fine-tuning, RLHF, or instruction-following behavior. It can be fine-tuned for almost 𝘢𝘯𝘺 audio modeling task, from live translation to classification. Base models excel at faithfully modeling their training set, and accurate maps come from contact with reality.
+From the world’s largest known dataset of high-quality real-world conversational audio, hertz-dev exhibits state-of-the art ability in human-like speech patterns such as pauses and emotional inflections. Hertz-dev has a 80ms theoretical average latency, and benchmarks 120ms real-world latency on a single RTX 4090, which is 1.5-2x lower than the previous state of the art. Low latency is necessary for natural audio, and we're proud to move the field in this direction.
 ## Setup
 To get started, clone the git repository and install requirements with
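The README's latency comparison can be read as a simple ratio. The sketch below assumes that "1.5-2x lower" means the previous state of the art was 1.5 to 2 times slower than hertz-dev's 120 ms real-world figure, and computes the prior latency range that this implies. The function and constant names are hypothetical and are not part of the hertz-dev repository.

```python
# Figures quoted in the README diff; the interpretation of "Nx lower" is an assumption.
REAL_WORLD_LATENCY_MS = 120.0   # benchmarked on a single RTX 4090
FACTOR_RANGE = (1.5, 2.0)       # "1.5-2x lower than the previous state of the art"


def implied_prior_latency_ms(latency_ms: float, factor: float) -> float:
    """Hypothetical helper: latency a prior system must have had for
    `latency_ms` to be `factor` times lower than it."""
    return latency_ms * factor


low, high = (implied_prior_latency_ms(REAL_WORLD_LATENCY_MS, f) for f in FACTOR_RANGE)
print(f"Implied previous state of the art: {low:.0f}-{high:.0f} ms")
```

Under that reading, the claim places the previous state of the art at roughly 180 to 240 ms of real-world latency.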