These models were post-trained using GRPO only.
- DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.1
  Text Generation • 14B • 4
- DataPilot/ArrowIdeative-13b-NeoBase-ZERO-llm-jp-v0.1
  Text Generation • 14B • 59 • 7
- DataPilot/ArrowIdeative-13b-Instruct-test-llm-jp-v0.2
  Text Generation • 14B • 15
- DataPilot/ArrowIdeative-13b-NeoBase-ZERO-llm-jp-v0.2A
  Text Generation • 14B • 7