REINFORCE or GRPO #71
Closed
QingyuanWuNothing
started this conversation in
General
Replies: 2 comments 2 replies
-
Hey @QingyuanWuNothing, thanks for your interest and your great questions! To clarify, Instead, the multi-turn REINFORCE is achieved by:
-
Thanks! I'm really into your work.
-
Nice work! I’m a bit confused about the usage of REINFORCE and GRPO in the example.
Did I miss something? Thanks!