add measurement.json


Files changed (2)

README.md +68 -0
measurement.json +0 -0

README.md ADDED

	@@ -0,0 +1,68 @@

+---
+library_name: transformers
+datasets:
+- BAAI/TACO
+- tasksource/PRM800K
+language:
+- en
+base_model: NovaSky-AI/Sky-T1-32B-Flash
+license: apache-2.0
+---
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is a 32B reasoning model, preference-optimized on top of Sky-T1-32B-Preview to significantly reduce generation lengths while maintaining accuracy. Its performance is on par with the o1-preview model in both math and coding, while generation lengths are reduced by up to 57% relative to Sky-T1-32B-Preview.
+Please see our [blog post](https://novasky-ai.github.io/posts/reduce-overthinking/) for more details.
+- **Developed by:** NovaSky Team from Sky Computing Lab at UC Berkeley.
+## Training Details
+### Training Data
+10K preference pairs in math and coding domains, generated by Sky-T1-32B-Preview.
+### Training Procedure
+We perform Simple Preference Optimization (SimPO) with a batch size of 96, a learning rate of 5e-7, a gamma of 0.3, and a beta of 2.0.
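For reference, the objective that these hyperparameters plug into can be sketched as below. This is a minimal standard-library illustration of the published SimPO loss for a single preference pair, not the training code used here; the function name `simpo_loss` and the example log-probabilities are hypothetical, and it assumes gamma is the target reward margin subtracted inside the sigmoid.

```python
import math


def simpo_loss(chosen_logps, rejected_logps, beta=2.0, gamma=0.3):
    """SimPO loss for one preference pair (illustrative sketch).

    chosen_logps / rejected_logps are per-token log-probabilities of the
    preferred and dispreferred responses under the policy being trained.
    """
    # The length-normalized (average) log-likelihood is the implicit reward.
    chosen_avg = sum(chosen_logps) / len(chosen_logps)
    rejected_avg = sum(rejected_logps) / len(rejected_logps)
    # Scaled reward difference minus the target margin gamma (assumption:
    # gamma enters the loss directly, as in the SimPO formulation).
    margin = beta * (chosen_avg - rejected_avg) - gamma
    # -log(sigmoid(margin)), computed as softplus(-margin) for stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical per-token log-probabilities for a short and a long response.
print(simpo_loss([-0.5, -0.5], [-2.0, -2.0, -2.0]))
```

Because the reward is averaged over tokens, a longer response gains nothing from length alone, which is what makes this loss a natural fit for preference pairs that favor shorter generations.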
+#### Speeds
+We use LLaMA-Factory for training. On 8xH100, the SimPO training takes ~2.5 hours with DeepSpeed ZeRO-3 Offload.
+## Evaluation
+|              |         | Sky-T1-32B-Preview | Sky-T1-32B-Flash | Qwen2.5-32B-Instruct |  QwQ-32B-Base | DeepSeek-R1-Distill-Qwen-32B |
+|--------------|---------|:------------------:|:----------------:|:--------------------:|:-------------:|:----------------------------:|
+|    Math500   |     Acc |        88.6        |       88.6       |         76.2         |      89.2     |             90.8             |
+|              | Avg Len |        2124        |    1417 (-33%)   |          522         |      2089     |             2010             |
+|    AIME24    |     Acc |        43.3        |       43.3       |         16.7         |       50      |             66.7             |
+|              | Avg Len |        6881        |    4365 (-37%)   |          970         |      7379     |             9173             |
+|   LCB Easy   |     Acc |        87.4        |        89        |         84.6         |      90.7     |             91.2             |
+|              | Avg Len |        3415        |    2265 (-34%)   |          414         |      3255     |             2775             |
+|  LCB Medium  |     Acc |        56.8        |       56.3       |         40.8         |      56.3     |             76.7             |
+|              | Avg Len |        8263        |    4389 (-47%)   |          535         |      6742     |             6324             |
+|   LCB Hard   |     Acc |        17.9        |       17.9       |          9.8         |      17.1     |             38.2             |
+|              | Avg Len |        14564       |    6199 (-57%)   |          618         |     10450     |             10448            |
+|     MMLU     |     Acc |        82.4        |       81.7       |         80.1         |      85.2     |             82.1             |
+|              | Avg Len |        1087        |    799 (-17%)    |          312         |      1041     |              774             |
+| GPQA Diamond |     Acc |        56.8        |       56.6       |         45.5         |      52.5     |             62.6             |
+|              | Avg Len |        3503        |    2148 (-39%)   |          600         |      3302     |             5108             |
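The percentages in the Sky-T1-32B-Flash column are the change in average generation length relative to Sky-T1-32B-Preview. A minimal sketch of that arithmetic, using three rows from the table (the helper name `relative_change_pct` is hypothetical):

```python
def relative_change_pct(baseline_len, new_len):
    """Percentage change in average length relative to the baseline."""
    return round(100 * (new_len - baseline_len) / baseline_len)


# (Sky-T1-32B-Preview, Sky-T1-32B-Flash) average lengths from the table.
avg_len = {
    "Math500": (2124, 1417),
    "AIME24": (6881, 4365),
    "LCB Hard": (14564, 6199),
}

for benchmark, (preview, flash) in avg_len.items():
    print(f"{benchmark}: {relative_change_pct(preview, flash)}%")
```

For these rows the computation gives -33%, -37%, and -57%, matching the table entries.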
+## Acknowledgement
+We would like to thank [Lambda Lab](https://lambdalabs.com/service/gpu-cloud?srsltid=AfmBOop5FnmEFTkavVtdZDsLWvHWNg6peXtat-OXJ9MW5GMNsk756PE5) and [AnyScale](https://www.anyscale.com/) for providing the compute resources.
+## Citation
+Please consider citing our blog post if you find it useful for your research. Thank you!
+```bibtex
+@misc{reduce_overthinking_2025,
+  author       = {NovaSky Team},
+  title        = {Think Less, Achieve More: Cut Reasoning Costs by 50\% Without Sacrificing Accuracy},
+  howpublished = {https://novasky-ai.github.io/posts/reduce-overthinking},
+  note         = {Accessed: 2025-01-23},
+  year         = {2025}
+}
+```

measurement.json ADDED

The diff for this file is too large to render. See raw diff