This Gradio app (davanstrien/corpus-creator) takes you from your local files to a Hugging Face Dataset via LlamaIndex.
The goal of the tool is to make it quicker and easier to turn local files into a Hugging Face Dataset that is ready for ML tasks. Perfect for building datasets for:
- synthetic data pipelines
- annotation
- RAG
- other ML tasks that start from a HF dataset
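
To give a rough idea of the files-to-dataset flow, here is a minimal sketch. It is not the app's actual code: the character-window chunker is a plain-Python stand-in for the splitting LlamaIndex would do, and the function names, the `file`/`chunk_id`/`text` columns, and the default sizes are all assumptions made for illustration.

```python
from pathlib import Path


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    A simple stand-in (assumption) for a LlamaIndex text splitter.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip whitespace-only windows
            chunks.append(piece)
    return chunks


def build_records(folder: str, pattern: str = "*.txt",
                  chunk_size: int = 500, overlap: int = 50) -> list[dict]:
    """Read matching files under `folder` and return one record per chunk."""
    records = []
    for path in sorted(Path(folder).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for i, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
            records.append({"file": path.name, "chunk_id": i, "text": chunk})
    return records


def to_hf_dataset(records: list[dict]):
    """Wrap the records in a Hugging Face Dataset (needs `pip install datasets`)."""
    from datasets import Dataset  # imported lazily so the rest runs without it
    return Dataset.from_list(records)
```

With `datasets` installed and after logging in to the Hub, something like `to_hf_dataset(build_records("my_docs")).push_to_hub("your-username/my-corpus")` would publish the result; the folder and repo names here are placeholders.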
I'll share something more substantial that uses this tomorrow.