Commit e1b08c1 (verified) by bokesyo · 1 parent: ca9a10b

Update README.md

Files changed (1)
  1. README.md +11 -5
@@ -1149,6 +1149,8 @@ model.tts.float()
 
 </details>
 
+<br/>
+
 ##### Mimick
 
 `Mimick` task reflects a model's end-to-end speech modeling capability. The model takes audio input, and outputs an ASR transcription and subsequently reconstructs the original audio with high similarity. The higher the similarity between the reconstructed audio and the original audio, the stronger the model's foundational capability in end-to-end speech modeling.
@@ -1173,6 +1175,8 @@ res = model.chat(
 
 </details>
 
+<br/>
+
 ##### General Speech Conversation with Configurable Voices
 
 A general usage scenario of MiniCPM-o 2.6 is role-playing a specific character based on the audio prompt. It will mimic the voice of the character to some extent and act like the character in text, including language style. In this mode, MiniCPM-o-2.6 will sounds **more natural and human-like**. Self-defined audio prompts can be used to customize the voice of the character in an end-to-end manner.
@@ -1216,6 +1220,8 @@ print(res)
 
 </details>
 
+<br/>
+
 ##### Speech Conversation as an AI Assistant
 
 An enhanced feature of MiniCPM-o-2.6 is to act as an AI assistant, but only with limited choice of voices. In this mode, MiniCPM-o-2.6 is **less human-like and more like a voice assistant**. But it is more instruction-following.
@@ -1257,6 +1263,7 @@ print(res)
 ```
 </details>
 
+<br/>
 
 ##### Instruction-to-Speech
 
@@ -1285,6 +1292,8 @@ res = model.chat(
 ```
 </details>
 
+<br/>
+
 ##### Voice Cloning
 
 MiniCPM-o-2.6 can also do zero-shot text-to-speech, aka **Voice Cloning**. With this mode, model will act like a TTS model.
@@ -1312,6 +1321,8 @@ res = model.chat(
 ```
 </details>
 
+<br/>
+
 ##### Addressing Various Audio Understanding Tasks
 
 MiniCPM-o-2.6 can also be used to address various audio understanding tasks, such as ASR, speaker analysis, general audio captioning, and sound scene tagging.
@@ -1349,11 +1360,6 @@ print(res)
 
 
 
-
-
-
-</details>
-
 ### Vision-Only mode
 
 `MiniCPM-o-2_6` has the same inference methods as `MiniCPM-V-2_6`