Upload 3 files
- .gitattributes +1 -0
- README.md +3 -2
- demo/af_sky.txt +11 -0
- demo/af_sky.wav +3 -0
.gitattributes
CHANGED
@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 TTS-Spaces-Arena-25-Dec-2024.png filter=lfs diff=lfs merge=lfs -text
 HEARME.wav filter=lfs diff=lfs merge=lfs -text
+demo/af_sky.wav filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -12,7 +12,7 @@ pipeline_tag: text-to-speech
 
 **Kokoro** is a frontier TTS model for its size of **82 million parameters** (text in/audio out).
 
-On 25 Dec 2024, Kokoro v0.19 weights were permissively released in full fp32 precision under an Apache 2.0 license. As of
+On 25 Dec 2024, Kokoro v0.19 weights were permissively released in full fp32 precision under an Apache 2.0 license. As of 31 Dec 2024, 10 unique Voicepacks have been released.
 
 In the weeks leading up to its release, Kokoro v0.19 was the #1🥇 ranked model in [TTS Spaces Arena](https://huggingface.co/hexgrad/Kokoro-82M#evaluation). Kokoro had achieved higher Elo in this single-voice Arena setting over other models, using fewer parameters and less data:
 1. **Kokoro v0.19: 82M params, Apache, trained on <100 hours of audio**
@@ -45,7 +45,7 @@ VOICE_NAME = [
 'af', # Default voice is a 50-50 mix of Bella & Sarah
 'af_bella', 'af_sarah', 'am_adam', 'am_michael',
 'bf_emma', 'bf_isabella', 'bm_george', 'bm_lewis',
-'af_nicole',
+'af_nicole', 'af_sky',
 ][0]
 VOICEPACK = torch.load(f'voices/{VOICE_NAME}.pt', weights_only=True).to(device)
 print(f'Loaded voice: {VOICE_NAME}')
@@ -87,6 +87,7 @@ No affiliation can be assumed between parties on different lines.
 - 26 Dec 2024: `am_adam`, `am_michael`
 - 28 Dec 2024: `bf_emma`, `bf_isabella`, `bm_george`, `bm_lewis`
 - 30 Dec 2024: `af_nicole`
+- 31 Dec 2024: `af_sky`
 
 ### Licenses
 - Apache 2.0 weights in this repository
demo/af_sky.txt
ADDED
@@ -0,0 +1,11 @@
+Last September, I received an offer from Sam Altman, who wanted to hire me to voice the current ChatGPT 4 system. He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI. He said he felt that my voice would be comforting to people.
+
+After much consideration and for personal reasons, I declined the offer. Nine months later, my friends, family and the general public all noted how much the newest system named Sky sounded like me.
+
+When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news ou'tlits could not tell the difference. Mr. Altman even insinuated that the similarity was intentional, tweeting a single word — hur — a reference to the film in which I voiced a chat system, Samantha, who forms an intimate relationship with a human.
+
+Two days before the ChatGPT 4 demo was released, Mr. Altman contacted my agent, asking me to reconsider. Before we could connect, the system was out there.
+
+As a result of their actions, I was forced to hire legal counsel, who wrote two letters to Mr. Altman and OpenAI, setting out what they had done and asking them to detail the exact process by which they created the Sky voice. Consequently, OpenAI reluctantly agreed to take down the Sky voice.
+
+In a time when we are all grappling with deepfakes and the protection of our own likeness, our own work, our own identities, I believe these are questions that deserve absolute clarity. I look forward to resolution in the form of transparency and the passage of appropriate legislation to help ensure that individual rights are protected.
demo/af_sky.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ce36292bf868aa5f15931f3d81a9f46cc35ea76372e618a5e4453c9542e5ad7e
+size 5486636
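Note on the demo/af_sky.wav entry: the three added lines are a Git LFS pointer, not the audio itself. A minimal sketch of reading such a pointer is shown here, assuming the usual "key value" layout of one field per line; the helper name `parse_lfs_pointer` and the returned field names are hypothetical and are not part of this repository or of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the text of a Git LFS pointer file into a small dict.

    Hypothetical helper for illustration only; assumes each non-empty
    line has the form "<key> <value>".
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only: the key comes first, the rest is the value.
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ce36292bf868aa5f15931f3d81a9f46cc35ea76372e618a5e4453c9542e5ad7e
size 5486636
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

After downloading the actual WAV file, the parsed `size` and `digest` could be compared against the file's byte length and its SHA-256 hash to confirm that the download matches what was committed.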