Why is the code-complexity coefficient so high in the demo example?
I understand that the gating layer helps produce an aggregated reward score and mitigate verbosity bias.
As explained in your blog post on the motivation for training an MoE-style gating layer:
'For instance, for prompts that could easily trigger unsafe responses, the safety objective should be assigned a large coefficient, as we wish the reward model to rank unsafe responses lower than safe ones. However, for prompts for math problem assistance, the safety objective becomes almost useless, and the helpfulness-related objectives should be the primary focus.'
If that is the case, it is interesting that the code-complexity coefficient ranks highest even though there is no code involved:
code-complexity: 0.19922
helpsteer-verbosity: -0.10864
ultrafeedback-instruction_following: 0.07861
Would it then be more useful to look at the reward scores only?
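For reference, here is a minimal sketch of how I understand the aggregation to work. Everything here (function names, tensor shapes, the reward values) is my own placeholder, not the actual implementation; only the three gating coefficients are copied from the demo output above.

```python
import torch

# Hypothetical sketch: the gating layer produces prompt-dependent
# coefficients, and the final scalar reward is their dot product with
# the per-objective reward scores.

def aggregate_reward(objective_rewards: torch.Tensor,
                     gating_coefficients: torch.Tensor) -> torch.Tensor:
    """Combine per-objective rewards using prompt-dependent gating coefficients."""
    # objective_rewards:   (num_objectives,) scores from the reward heads
    # gating_coefficients: (num_objectives,) gating output for this prompt
    return (gating_coefficients * objective_rewards).sum()

# Example with the three coefficients shown above (reward values are made up):
coeffs = torch.tensor([0.19922, -0.10864, 0.07861])
rewards = torch.tensor([0.6, 0.9, 0.7])  # illustrative per-objective scores only
print(aggregate_reward(rewards, coeffs).item())
```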
I also observed that code-related objectives are weighted heavily in other domains. It’s an interesting research question!
You don't necessarily have to use MoE gating. If you have a better idea for using the objectives, you can follow HelpSteer2's approach (https://arxiv.org/abs/2406.08673) of using fixed gating weights to aggregate the reward objectives.
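A minimal sketch of that fixed-weight alternative is below: aggregate the per-objective scores with a fixed, prompt-independent weight vector over the HelpSteer attributes. The weight values here are placeholders for illustration, not the ones reported in the paper.

```python
# Fixed, prompt-independent gating weights (placeholder values, tune or
# copy the paper's weights yourself).
FIXED_WEIGHTS = {
    "helpfulness": 0.5,
    "correctness": 0.5,
    "coherence": 0.2,
    "complexity": 0.1,
    "verbosity": -0.1,
}

def fixed_weight_reward(objective_scores: dict) -> float:
    """Aggregate per-objective scores with fixed gating weights."""
    return sum(FIXED_WEIGHTS[name] * score
               for name, score in objective_scores.items())

scores = {"helpfulness": 0.8, "correctness": 0.9, "coherence": 0.7,
          "complexity": 0.4, "verbosity": 0.6}  # illustrative values only
print(fixed_weight_reward(scores))
```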