Common Pile

common-pile's collections

Common Pile v0.1
All resources related to Common Pile v0.1, an 8 TB dataset of public-domain and openly licensed text
Common Pile v0.1 Filtered Data
An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1