A set of models from my experiments with Reinforcement Learning from Human Feedback
Samir R.
sr5434
AI & ML interests
NLP
Recent Activity
updated a model about 9 hours ago
sr5434/model-tempfiles
Organizations
None yet
Models 39
sr5434/model-tempfiles
Updated
sr5434/skin-cancer-classifier
Updated • 1
sr5434/DeepSeek-OCR-2-patched
Image-Text-to-Text • 3B • Updated • 9
sr5434/temp-data
Updated
sr5434/rlhf_policy
Text Generation • 0.3B • Updated • 7
sr5434/rm_hh_rlhf
Text Classification • 0.3B • Updated • 4
sr5434/sft_model
Text Generation • 0.3B • Updated • 6
sr5434/americanStoriesWordVectors
Updated
sr5434/PINN-Collection
Updated
sr5434/model-tempfiles-2
Updated
Datasets 6
sr5434/temp-data-2
Viewer • Updated • 5.13M • 2
sr5434/temp-data
Viewer • Updated • 30.7M • 2
sr5434/questions-and-answers
Viewer • Updated • 1k • 13
sr5434/cot_convos
Viewer • Updated • 7.05k • 2
sr5434/aesthetics
Viewer • Updated • 12k • 11
sr5434/CodegebraGPT_data
Viewer • Updated • 1.25M • 39 • 1