Language filtering error
Hey @lhoestq, interesting Space, if it is what I think it is! I know it's still early, but I would like to know if there is a specific format for passing languages to the Language filtering step. I want to filter on Arabic, and the fastText code for Arabic is "ar", but it gives back an error.
Waiting for the final version of this, with the whole datatrove script as well 🤗
Hi! Passing "ar" works for me, though I might improve the UI to show the possible language codes.
Also, this app shows the pipeline results on a preview of the data, which doesn't seem to contain any Arabic text (it's only 2k samples), but maybe I can improve that as well.
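For anyone unsure about the code format: fastText's lid.176 language-ID model returns labels such as `__label__ar` together with a confidence score, and the plain ISO code ("ar") is what gets compared. Below is a minimal Python sketch of how a language filter might match user-supplied codes against such predictions. It is NOT datatrove's implementation; `normalize_code`, `keep_document` and the 0.65 threshold are hypothetical names and values chosen for illustration.

```python
# Hypothetical sketch, not datatrove's actual LanguageFilter.
# Assumes predictions arrive as (label, score) pairs in the fastText lid.176
# style, e.g. ("__label__ar", 0.92).

FASTTEXT_PREFIX = "__label__"


def normalize_code(code: str) -> str:
    """Turn user input such as "AR", " ar " or "__label__ar" into "ar"."""
    code = code.strip().lower()
    if code.startswith(FASTTEXT_PREFIX):
        code = code[len(FASTTEXT_PREFIX):]
    return code


def keep_document(predictions, languages, threshold=0.65):
    """Keep a document if its top predicted language is in `languages`
    and its score reaches `threshold` (a hypothetical default)."""
    if not predictions:
        return False
    wanted = {normalize_code(lang) for lang in languages}
    label, score = max(predictions, key=lambda p: p[1])
    return normalize_code(label) in wanted and score >= threshold


# Toy predictions for three documents (made-up scores for illustration).
docs = {
    "doc_ar": [("__label__ar", 0.92), ("__label__fa", 0.05)],
    "doc_en": [("__label__en", 0.88)],
    "doc_low": [("__label__ar", 0.40), ("__label__ur", 0.35)],
}
kept = [name for name, preds in docs.items() if keep_document(preds, ["ar"])]
print(kept)  # ['doc_ar']
```

Normalizing both sides means "ar", "AR" and "__label__ar" all behave the same, and the threshold explains why a document whose top guess is Arabic can still be dropped when the model is unsure.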
It works! You can now filter on Arabic and see the results :)
Let me know if you'd like to see other improvements; I'm always happy to get feedback.
Thanks a lot @lhoestq 🤗