detach of value function for single-network PPG

According to the code (https://github.com/openai/phasic-policy-gradient/blob/master/phasic_policy_gradient/train.py#L14), arch 'detach' seems to correspond to the single-network variant described in Section 3.6 of the paper. According to the paper and the comment in the code, the value function should not be detached from the encoder during the aux phase. However, the value function (vfvec) appears to always be detached in the code:
https://github.com/openai/phasic-policy-gradient/blob/7295473f0185c82f9eb9c1e17a373135edd8aacc/phasic_policy_gradient/ppg.py#L148-L153

Can you clarify whether it should be detached or not in the aux phase and whether it affects the results reported in the paper?

Thanks



	for k in self.vf_keys:
		if self.detach_value_head:
			x_out[k] = x_out[k].detach()
		aux[k] = self.get_vhead(k)(x_out[k])[..., 0]
	vfvec = aux[self.true_vf_key]
	aux.update({"vpredaux": self.aux_vf_head(pi_x)[..., 0], "vpredtrue": vfvec})
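To make the concern concrete, here is a minimal sketch of how `.detach()` on the encoder output blocks the value loss from updating the encoder. It is not the repository's code: `encoder`, `value_head`, and `encoder_grad_norm` are hypothetical stand-ins, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn


def encoder_grad_norm(detach_value_head: bool) -> float:
    """Return the norm of the value-loss gradient that reaches a toy encoder."""
    torch.manual_seed(0)
    encoder = nn.Linear(4, 8)     # hypothetical stand-in for the shared encoder
    value_head = nn.Linear(8, 1)  # hypothetical stand-in for the value head
    obs = torch.randn(16, 4)
    target = torch.randn(16)

    x = encoder(obs)
    if detach_value_head:
        # Cut the graph here: the value loss can still train value_head,
        # but no gradient flows back into the encoder parameters.
        x = x.detach()
    vpred = value_head(x)[..., 0]
    loss = ((vpred - target) ** 2).mean()
    loss.backward()

    grad = encoder.weight.grad  # stays None if nothing reached the encoder
    return 0.0 if grad is None else grad.norm().item()
```

Calling `encoder_grad_norm(True)` yields no encoder gradient, while `encoder_grad_norm(False)` lets the value loss shape the shared features. If the flag is fixed for both phases, as the quoted snippet suggests, the value head would never train the encoder, which is the behavior this issue asks about.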
