Could you share how much improvement the RL training adds over the SFT model?