title: Hot Ones Trivia
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.31.5
app_file: app.py
pinned: false
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Hot Ones trivia bot
This is a simple trivia bot as discussed during the last call.
Implementation logic
The words that guide the philosophy of this implementation are:
np.array
people keep reaching for much fancier things way too fast these days
~Andrej Karpathy https://x.com/karpathy/status/1647374645316968449?lang=en
That is, I'm keeping it as simple as possible, but not any simpler.
I implemented two strategies:
- Simple one-shot retrieval: embed the question and all the chunks of the transcript, and then find the chunk that is closest to the question using cosine similarity.
- Agentic-ish retrieval: embed and compare as above, but rank all chunks by similarity. Check the top-ranked chunk and ask the model whether it contains the right answer. If it does, great. If it doesn't, check the next chunk.
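The two strategies above can be sketched with nothing more than np.array operations. This is a minimal illustration, not the code from this repo: the function names are hypothetical, the embeddings are tiny made-up vectors rather than OpenAI embeddings, and `looks_right` is an assumed callback standing in for the "ask the model if it found the right answer" step.

```python
import numpy as np


def rank_chunks(question_vec, chunk_vecs):
    """Return chunk indices ordered from most to least cosine-similar.

    Assumes all vectors are nonzero (hypothetical helper, not from the repo).
    """
    q = question_vec / np.linalg.norm(question_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = c @ q  # cosine similarity of every chunk with the question
    return np.argsort(-sims)


def one_shot(question_vec, chunk_vecs):
    """Simple one-shot retrieval: take the single closest chunk."""
    return int(rank_chunks(question_vec, chunk_vecs)[0])


def agentic(question_vec, chunk_vecs, looks_right, max_checks=None):
    """Agentic-ish retrieval: walk the ranking until a chunk is accepted.

    `looks_right(index)` is an assumed stand-in for the LLM call that
    judges whether the chunk answers the question. Returns None if no
    chunk is accepted within `max_checks` attempts.
    """
    for idx in rank_chunks(question_vec, chunk_vecs)[:max_checks]:
        if looks_right(int(idx)):
            return int(idx)
    return None


# Toy data: three 2-d "embeddings" and a question closest to chunk 0.
chunks = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
question = np.array([0.9, 0.1])

best = one_shot(question, chunks)
# Pretend the model rejects chunk 0 and accepts chunk 1.
accepted = agentic(question, chunks, looks_right=lambda i: i == 1)
```

In this toy run the one-shot strategy commits to chunk 0, while the agentic loop moves past it to chunk 1, at the cost of one extra check per rejected chunk.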
Q&A
Q: Why aren't you using a proper vector database/RAG framework/ollama/verba/weaviate/llamaindex/langchain/...?
A: Because it's not necessary. Going back to our discussion about the happy path, these solutions are ideal if the data is large and on the happy path. For the task at hand, using these libraries would make the implementation more complex, possibly slower, possibly more costly (if we're using a hosted database), so it'd be a bunch of extra effort for negative value and possible vendor lock-in. Also, if it's good enough for Karpathy, it's good enough for me.
Q: How well does this work?
A: From some quick tests, the one-shot retrieval sometimes works, but mostly it doesn't. The agentic retrieval tends to work, but sometimes it has to search through a bunch of chunks. With better RAG, this would be mitigated, but we go back to the cost-quality trade-off.
Q: Why is XYZ unpolished?
A: I'm trying to keep it relatively simple and not spend too much time on it. My priority was to have a system working end-to-end, featuring some of the components that we discussed. I also tried to follow some basic best practices, with some tests and relatively clean code, though it would need more attention for a production system.
Repo structure
run_trivia.py
- the main entry point to the program. When you run it, it will start a gradio app with the bot. There are some command-line arguments to facilitate using different embedding sizes/chunk sizes/sample questions.
app.py
- the same as above, but you pass the API key in the UI itself. Convenient for public hosting.
preprocessing.py
- a script for preprocessing the data by attaching embeddings and relevant metadata. It takes each transcript, chunks it to a specified maximum size (in tokens), embeds it with OpenAI, attaches the relevant metadata, and saves everything in a file.
prompts.py
- a file with the prompts and a utility function.
generate_questions.ipynb
- a notebook that generates some sample questions for each episode. If this were a more reusable component, I would have turned it into a script, but I can run it once and push the results to git, or you can do so yourself to regenerate the data.
core.py
- some general-purpose dataclasses and functions
exploration.ipynb
- you probably don't care about it, but it's somewhat reflective of my exploratory workflow.
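As a rough picture of the preprocessing step described for preprocessing.py, here is a hedged sketch of chunking a transcript to a maximum token count and attaching an embedding plus metadata to each chunk. Everything in it is an assumption for illustration: whitespace splitting stands in for a real tokenizer, `embed` is a caller-supplied function standing in for the OpenAI embedding call, and the record fields (`episode`, `chunk_index`, `text`, `embedding`) are hypothetical rather than the repo's actual schema.

```python
def chunk_tokens(tokens, max_tokens):
    """Split a token list into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def preprocess(transcript, episode, max_tokens, embed):
    """Chunk one transcript and attach an embedding and metadata per chunk.

    Whitespace splitting is only a stand-in for a real tokenizer, and
    `embed(text)` is an assumed callback replacing the embedding API call.
    """
    tokens = transcript.split()
    records = []
    for n, chunk in enumerate(chunk_tokens(tokens, max_tokens)):
        text = " ".join(chunk)
        records.append(
            {
                "episode": episode,      # hypothetical metadata field
                "chunk_index": n,        # position of the chunk in the transcript
                "text": text,
                "embedding": embed(text),
            }
        )
    return records


# Toy usage with a fake embedding function (length of the text as a 1-d vector).
records = preprocess("a b c d e", "ep1", max_tokens=2, embed=lambda t: [float(len(t))])
```

The resulting list of records could then be serialized to a single file, which matches the "saves everything in a file" step in spirit, though the actual file format is not specified here.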
How to run
- Clone the repo, set up a virtual environment, install requirements, all that jazz
- Add the OpenAI API key to the .env file (or otherwise set the OPENAI_API_KEY environment variable)
- Run python run_trivia.py for the default settings, OR run python run_trivia_open.py to run without needing to have an API key in the environment.
How to use?
When you run run_trivia.py, it will start a gradio app. Navigate to localhost:7860 in your browser, and use the text box on the left to ask questions. There are also some suggested questions below that you can click to pre-fill the text box.
When you're happy with the question, submit it, and let the bot do its magic. Below the question box, there's also a button to enable the agentic retrieval mode. Note that this will make more API requests, but is more likely to find the right answer - you can watch its progress in the secondary text output on the right.
Anecdotes
Obviously, all of this relies heavily on the performance of LLMs, and hallucinations are not entirely eliminated. For example, one time when I was checking the reference link, I literally got rickrolled by GPT-4.