Issues training Trajectory Conditioned SAC policy #3

@nkepling

Description

I am having trouble getting the trajectory-conditioned SAC policy to learn. I am training with "TC-Driver/TC_Driver/train/train.py" and the parameters below. I left essentially all parameters at their default values and tried my best to cross-reference this paper. My mode is Frenet_trajectory, with ep_len = 10000, params_noise = True, and use_trajectory = True, and I trained for a total of 500,000 steps. I am also not doing a wandb parameter sweep over the reward penalty; instead I am using the heuristic values suggested in the paper.

Would it be possible to point me to a correct config file for the training hyperparameters, or even some pretrained model weights?

    env_conf = {
        "mode": mode,
        "arch": arch,
        "map_name": map,
        "map": os.path.join(configs_dir, "{}".format(map)),
        "map_ext": conf.map_ext,
        "random_init": True,
        "sx": conf.sx,
        "sy": conf.sy,
        "stheta": conf.stheta,
        "num_agents": 1,
        "ep_len": ep_len,  # this sets the maximum episode length, it is ~1.5 times the best time for a lap, so it changes from track to track
        "obs_type": mode,
        "params_noise": params_noise,
        "var_mu": (0.075 / 2) ** 2,  # if all are set to 0 no noise is applied
        "var_Csf": 0,
        "var_Csr": 0,
        "redraw_upon_reset": True,
        "angle_limit": ang_deg * np.pi / 180,  # 30 deg
        "use_trajectory": use_trajectory,
        "max_vel": max_vel,
        "display_video": display_video, # TODO not used should be removed
        "curriculum": False, # no curriculum velocity for now
        "policy_type": "MlpPolicy",
        "total_timesteps": 5e5,
        "gamma": 0.99,
        "env_id":"f110_gym:f110rl-v0",
        "use_lidar":True,
        "action_pen":0.01,
        "params":params,
        "output_reg":np.diag([0.0, 0.0]) # steer action, throttle action
    }
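For reference, the two derived quantities in the config work out as follows (plain numpy arithmetic, nothing repo-specific; `ang_deg = 30` is assumed per the inline comment):

```python
import numpy as np

ang_deg = 30
angle_limit = ang_deg * np.pi / 180  # 30 deg in radians, ~0.5236
var_mu = (0.075 / 2) ** 2            # variance of the friction (mu) noise
std_mu = np.sqrt(var_mu)             # ~0.0375 standard deviation on mu

print(angle_limit, var_mu, std_mu)
```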

These are the post-training results.

Eval num_timesteps=500000, episode_reward=-1.00 +/- 0.00
Episode length: 1.00 +/- 0.00
---------------------------------
| eval/              |          |
|    mean_ep_length  | 1        |
|    mean_reward     | -1       |
| time/              |          |
|    total_timesteps | 500000   |
| train/             |          |
|    actor_loss      | 1.04     |
|    critic_loss     | 1.85e-07 |
|    ent_coef        | 3.52e-07 |
|    ent_coef_loss   | 3.9      |
|    learning_rate   | 0.0003   |
|    n_updates       | 499899   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1        |
|    ep_rew_mean     | -1.04    |
| time/              |          |
|    episodes        | 500000   |
|    fps             | 166      |
|    time_elapsed    | 3004     |
|    total_timesteps | 500000   |
---------------------------------
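A mean episode length of 1 suggests the environment is terminating (e.g. registering a collision or invalid state) on the very first step, so the agent only ever sees the -1 penalty. One way to confirm this before debugging hyperparameters is to step the environment manually and inspect the done flag and info dict. The snippet below is a generic sketch of that check; the real env would come from gym.make with the env_id above, and the stub class here only stands in for it so the helper runs standalone (the `[steer, throttle]` action layout and `"collision"` info key are assumptions, not the confirmed f110_gym API):

```python
def debug_rollout(env, n_steps=5):
    """Step `env` with zero actions and record (step, reward, done, info)."""
    env.reset()
    history = []
    for t in range(n_steps):
        obs, reward, done, info = env.step([0.0, 0.0])  # assumed [steer, throttle]
        history.append((t, reward, done, dict(info)))
        if done:  # stop at the first termination to see why it ended
            break
    return history


class _StubEnv:
    """Stand-in env that terminates immediately, mimicking the logs above."""

    def reset(self):
        return [0.0]

    def step(self, action):
        # Immediate termination with reward -1, as the eval output suggests.
        return [0.0], -1.0, True, {"collision": True}


history = debug_rollout(_StubEnv())
print(history)
```

If the real env shows done=True at t=0, the likely culprits are the spawn pose (sx, sy, stheta), the random_init flag placing the car off-track, or the reference trajectory not matching the map.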

Thank you!
