I tried using DQfD on Atari Breakout, but it didn't work: after 4 million training steps, the average reward reached 40 and doesn't increase past that, whereas a normal DQN can go over 100. I tried different exploration epsilons (0.1 and 0.01), batch sizes (32 and 64), and replay buffer sizes (5e5 and 1e6), but none of them got past 40.
Do you know what I should do, or do you have a set of hyperparameters that works for Breakout?