This collection contains held-out splits for testing Flow-Judge-v0.1.
Flow AI (verified company)
AI & ML interests: LLM system evaluation, automatic LM improvements
Organization Card
Flow AI is the system for evaluating and improving your LLM application.
Collections: 3
Spaces: 1
Models: 7
- flowaicom/Flow-Judge-v0.1-W8A16 • 6 • 1
- flowaicom/Flow-Judge-v0.1-W4A16 • 4 • 1
- flowaicom/Flow-Judge-v0.1-FP8 • 4 • 1
- flowaicom/Flow-Judge-v0.1-AWQ • Text Generation • 484 • 6
- flowaicom/Flow-Judge-v0.1 • Text Generation • 1.15k • 52
- flowaicom/Flow-Judge-v0.1-Llamafile • 55 • 1
- flowaicom/Flow-Judge-v0.1-GGUF • Text Generation • 27 • 9
Datasets: 9
- flowaicom/legalbench_contracts_qa_subset • 100 • 63
- flowaicom/Flow-Judge-v0.1-3-likert-heldout • 300 • 63
- flowaicom/Flow-Judge-v0.1-5-likert-heldout • 274 • 73
- flowaicom/Flow-Judge-v0.1-binary-heldout • 316 • 59
- flowaicom/RAGTruth_test • 2.7k • 426
- flowaicom/covid_qa • 1k • 35
- flowaicom/PubMedQA • 1k • 43
- flowaicom/HaluEval • 10k • 52
- flowaicom/Feedback-Bench • 1k • 43