data format on "target"
Hello Kehan, nice work!
@rafaelvalle would like to ask: is there a version of the "target" that does not include the transcription?
Hello @huckiyang @rafaelvalle, thank you for your interest in our work!
I think you are referring to the released dataset. In our dataset construction process, we feed the "seed transcript" to Llama and use its response as the training target. The seed transcripts are composed of many speech attributes, including the transcription and other paralinguistic information. Currently, we do not have a dataset constructed without the transcription, but I think one can be easily generated by removing the transcription part of the seed transcript, since the template for seed transcripts is consistent.
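As a rough sketch of that removal step: the snippet below assumes the seed transcript is line-based text in which the transcription sits on its own line behind a field label such as `Transcription:`. Both that layout and the label are assumptions made for illustration, not the actual template, and `strip_transcription` is a hypothetical helper, so please adjust it to the real seed transcript format.

```python
import re

# Hypothetical field label; replace it with the label used in the real template.
TRANSCRIPTION_LABEL = "Transcription"


def strip_transcription(seed_transcript: str, label: str = TRANSCRIPTION_LABEL) -> str:
    """Drop every line that starts with `<label>:` from a line-based seed transcript."""
    pattern = re.compile(rf"^\s*{re.escape(label)}\s*:", re.IGNORECASE)
    kept = [line for line in seed_transcript.splitlines() if not pattern.match(line)]
    return "\n".join(kept)


# Made-up example input, only to show the call.
seed = "Transcription: hello world\nEmotion: happy\nGender: female"
print(strip_transcription(seed))
```

The stripped seed transcripts would then need to be fed to Llama again to obtain new targets, since the released targets were generated from seed transcripts that still contained the transcription.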
Feel free to reach out if you have any further questions.