I am having trouble getting the trajectory-conditioned SAC policy to learn. I am using `TC-Driver/TC_Driver/train/train.py` to train the policy with the parameters below. I essentially left all parameters at their default values and tried my best to cross-reference the paper. My mode is `Frenet_trajectory`, with `ep_len = 10000`, `params_noise = True`, `use_trajectory = True`, and I trained for a total of 500,000 steps. I am also not doing a wandb parameter sweep over the reward penalty; I am using the heuristic values suggested in the paper.

Would it be possible to point me to a correct config file of training hyperparameters, or even some pretrained model weights?
```python
env_conf = {
    "mode": mode,
    "arch": arch,
    "map_name": map,
    "map": os.path.join(configs_dir, "{}".format(map)),
    "map_ext": conf.map_ext,
    "random_init": True,
    "sx": conf.sx,
    "sy": conf.sy,
    "stheta": conf.stheta,
    "num_agents": 1,
    "ep_len": ep_len,  # maximum episode length; ~1.5x the best lap time, so it changes from track to track
    "obs_type": mode,
    "params_noise": params_noise,
    "var_mu": (0.075 / 2) ** 2,  # if all variances are set to 0, no noise is applied
    "var_Csf": 0,
    "var_Csr": 0,
    "redraw_upon_reset": True,
    "angle_limit": ang_deg * np.pi / 180,  # 30 deg
    "use_trajectory": use_trajectory,
    "max_vel": max_vel,
    "display_video": display_video,  # TODO: not used, should be removed
    "curriculum": False,  # no curriculum velocity for now
    "policy_type": "MlpPolicy",
    "total_timesteps": 5e5,
    "gamma": 0.99,
    "env_id": "f110_gym:f110rl-v0",
    "use_lidar": True,
    "action_pen": 0.01,
    "params": params,
    "output_reg": np.diag([0.0, 0.0]),  # steer action, throttle action
}
```

These are the post-training results.
```
Eval num_timesteps=500000, episode_reward=-1.00 +/- 0.00
Episode length: 1.00 +/- 0.00
---------------------------------
| eval/               |         |
|    mean_ep_length   | 1       |
|    mean_reward      | -1      |
| time/               |         |
|    total_timesteps  | 500000  |
| train/              |         |
|    actor_loss       | 1.04    |
|    critic_loss      | 1.85e-07|
|    ent_coef         | 3.52e-07|
|    ent_coef_loss    | 3.9     |
|    learning_rate    | 0.0003  |
|    n_updates        | 499899  |
---------------------------------
---------------------------------
| rollout/            |         |
|    ep_len_mean      | 1       |
|    ep_rew_mean      | -1.04   |
| time/               |         |
|    episodes         | 500000  |
|    fps              | 166     |
|    time_elapsed     | 3004    |
|    total_timesteps  | 500000  |
---------------------------------
```
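For completeness, this is roughly how I am handling the `env_conf` dict above, which mixes env-construction kwargs with SAC/training settings. The `split_config` helper and the `TRAIN_KEYS` set are my own, not code from the TC-Driver repo, so please correct me if the actual `train.py` consumes the dict differently:

```python
# Hypothetical helper (my own, not from TC-Driver): split the flat config
# into env-construction kwargs and SAC/training kwargs before use.
TRAIN_KEYS = {"policy_type", "total_timesteps", "gamma", "env_id"}

def split_config(conf):
    """Return (env_kwargs, train_kwargs) from one flat config dict."""
    env_kwargs = {k: v for k, v in conf.items() if k not in TRAIN_KEYS}
    train_kwargs = {k: v for k, v in conf.items() if k in TRAIN_KEYS}
    return env_kwargs, train_kwargs

# usage with a subset of the real config values from above
env_kwargs, train_kwargs = split_config({
    "mode": "Frenet_trajectory",
    "ep_len": 10000,
    "num_agents": 1,
    "policy_type": "MlpPolicy",
    "total_timesteps": 5e5,
    "gamma": 0.99,
    "env_id": "f110_gym:f110rl-v0",
})
print(sorted(train_kwargs))  # ['env_id', 'gamma', 'policy_type', 'total_timesteps']
```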
Thank you!