Why is the code-complexity coefficient so high in the demo example?
I understand that the gating layer helps produce an aggregated reward score and mitigate verbosity bias.
As explained in your blog post on the motivation for training an MoE-style gating layer:
'For instance, for prompts that could easily trigger unsafe responses, the safety objective should be assigned a large coefficient, as we wish the reward model to rank unsafe responses lower than safe ones. However, for prompts for math problem assistance, the safety objective becomes almost useless, and the helpfulness-related objectives should be the primary focus.'
If that is the case, it is interesting that the code-complexity coefficient ranks highest even though there is no code involved:
code-complexity: 0.19922
helpsteer-verbosity: -0.10864
ultrafeedback-instruction_following: 0.07861
Would it then be more useful to look at the reward scores only?
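For reference, here is a minimal sketch of how I understand the aggregation to work. Everything here (function names, tensor shapes, the reward values) is my own placeholder, not the actual implementation; only the three gating coefficients are copied from the demo output above.

```python
import torch

# Hypothetical sketch: the gating layer produces prompt-dependent
# coefficients, and the final scalar reward is their dot product with
# the per-objective reward scores.

def aggregate_reward(objective_rewards: torch.Tensor,
                     gating_coefficients: torch.Tensor) -> torch.Tensor:
    """Combine per-objective rewards using prompt-dependent gating coefficients."""
    # objective_rewards:   (num_objectives,) scores from the reward heads
    # gating_coefficients: (num_objectives,) gating output for this prompt
    return (gating_coefficients * objective_rewards).sum()

# Example with the three coefficients shown above (reward values are made up):
coeffs = torch.tensor([0.19922, -0.10864, 0.07861])
rewards = torch.tensor([0.6, 0.9, 0.7])  # illustrative per-objective scores only
print(aggregate_reward(rewards, coeffs).item())
```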
I also observed that code-related objectives are weighted heavily in other domains. It’s an interesting research question!
You don't necessarily have to use MoE gating. If you have a better idea for using the objectives, you can follow HelpSteer2's approach (https://arxiv.org/abs/2406.08673) of using fixed gating weights to aggregate the reward objectives.
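A minimal sketch of that fixed-weight alternative is below: aggregate the per-objective scores with a fixed, prompt-independent weight vector over the HelpSteer attributes. The weight values here are placeholders for illustration, not the ones reported in the paper.

```python
# Fixed, prompt-independent gating weights (placeholder values, tune or
# copy the paper's weights yourself).
FIXED_WEIGHTS = {
    "helpfulness": 0.5,
    "correctness": 0.5,
    "coherence": 0.2,
    "complexity": 0.1,
    "verbosity": -0.1,
}

def fixed_weight_reward(objective_scores: dict) -> float:
    """Aggregate per-objective scores with fixed gating weights."""
    return sum(FIXED_WEIGHTS[name] * score
               for name, score in objective_scores.items())

scores = {"helpfulness": 0.8, "correctness": 0.9, "coherence": 0.7,
          "complexity": 0.4, "verbosity": 0.6}  # illustrative values only
print(fixed_weight_reward(scores))
```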