Why is the code-complexity coefficient so high in the demo example?

#16 opened by icdt

I understand that the gating layer helps produce an aggregated reward score and mitigates verbosity bias.
As your blog explains, the motivation behind training a MoE-style gating layer is:

'For instance, for prompts that could easily trigger unsafe responses, the safety objective should be assigned a large coefficient, as we wish the reward model to rank unsafe responses lower than safe ones. However, for prompts for math problem assistance, the safety objective becomes almost useless, and the helpfulness-related objectives should be the primary focus.'

If that is the case, it is interesting that the code-complexity coefficient ranks highest even though the prompt involves no code:

code-complexity: 0.19922
helpsteer-verbosity: -0.10864
ultrafeedback-instruction_following: 0.07861

Would it then be more useful to look only at the individual reward scores?
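
For reference, here is a minimal sketch of how I understand the coefficients are applied, with made-up per-objective scores (I'm assuming the final reward is simply the gating-weighted sum over objectives):

```python
import torch

# Per-objective reward scores for one response (made-up values;
# the objective names follow the demo output above).
objective_rewards = torch.tensor([0.55, 0.80, 0.62])

# Gating coefficients from the demo example, in the same order:
# code-complexity, helpsteer-verbosity, ultrafeedback-instruction_following.
gating_coeffs = torch.tensor([0.19922, -0.10864, 0.07861])

# Assumed aggregation: final reward = gating-weighted sum over objectives.
final_reward = torch.dot(gating_coeffs, objective_rewards)
print(final_reward.item())
```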

RLHFlow org

I have also observed that code-related objectives are weighted heavily in other domains. It's an interesting research question!
You don't necessarily have to use the MoE gating. If you have a better idea for combining the objectives, you can follow HelpSteer2's approach (https://arxiv.org/abs/2406.08673) of using fixed gating weights to aggregate the reward objectives, as in the sketch below.
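
A minimal sketch of that fixed-weight alternative (the weight values and objective names here are placeholders I made up, not the ones reported in the HelpSteer2 paper):

```python
# Fixed gating weights, one per reward objective (placeholder values;
# see the HelpSteer2 paper for the weights they actually recommend).
FIXED_WEIGHTS = {
    "helpfulness": 0.5,
    "correctness": 0.3,
    "verbosity": -0.2,
}

def aggregate(objective_scores: dict) -> float:
    """Combine per-objective scores with fixed weights instead of a learned gate."""
    return sum(FIXED_WEIGHTS[name] * score for name, score in objective_scores.items())

print(aggregate({"helpfulness": 0.8, "correctness": 0.7, "verbosity": 0.9}))
```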
