alielfilali01 posted an update Oct 9
Why is nobody talking about the new training corpus released by MBZUAI today?

TxT360 is a corpus of more than 15 trillion tokens that outperforms FineWeb on several metrics. Ablation studies were done with up to 1T tokens.

Read the blog here: LLM360/TxT360
Dataset: LLM360/TxT360