An attempt to incorporate off-policy reinforcement learning (RL) techniques into the selection phase of a genetic algorithm, with the goal of improving the algorithm's performance.
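
To make the idea concrete, here is a minimal sketch of one way an off-policy method could drive selection. It is an illustration under stated assumptions, not necessarily the approach this project takes: tabular Q-learning (an off-policy algorithm) picks the tournament size used for parent selection each generation. The OneMax fitness function, the two-value state, the reward definition, the candidate tournament sizes, and all names (`run_ga`, `tournament`, `onemax`) are hypothetical choices made for the example.

```python
import random


def onemax(bits):
    # Toy fitness (assumption): number of 1-bits in the individual.
    return sum(bits)


def tournament(pop, fits, k, rng):
    # Pick k distinct individuals at random and return the fittest one.
    idxs = rng.sample(range(len(pop)), k)
    return pop[max(idxs, key=lambda i: fits[i])]


def run_ga(n_bits=30, pop_size=40, generations=60, seed=0,
           alpha=0.2, gamma=0.9, epsilon=0.2):
    rng = random.Random(seed)
    actions = [2, 4, 8]  # candidate tournament sizes (hypothetical)
    # State (assumption): 1 if the best fitness improved last generation, else 0.
    q = {(s, a): 0.0 for s in (0, 1) for a in actions}

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fits = [onemax(ind) for ind in pop]
    state = 0
    history = [max(fits)]

    for _ in range(generations):
        # Behaviour policy: epsilon-greedy over the current Q-values.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(state, a)])

        # Elitism: carry the best individual over unchanged.
        elite = pop[max(range(pop_size), key=lambda i: fits[i])][:]
        children = [elite]
        while len(children) < pop_size:
            # Selection phase: tournament size is the RL agent's action.
            p1 = tournament(pop, fits, action, rng)
            p2 = tournament(pop, fits, action, rng)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability 1/n_bits per bit.
            child = [b ^ 1 if rng.random() < 1.0 / n_bits else b for b in child]
            children.append(child)

        new_fits = [onemax(ind) for ind in children]
        # Reward (assumption): change in mean population fitness.
        reward = (sum(new_fits) - sum(fits)) / pop_size
        next_state = 1 if max(new_fits) > max(fits) else 0

        # Off-policy Q-learning update: the target uses the greedy (max)
        # next action, regardless of what the behaviour policy will pick.
        target = reward + gamma * max(q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (target - q[(state, action)])

        pop, fits, state = children, new_fits, next_state
        history.append(max(fits))

    return history, q
```

Calling `history, q = run_ga()` returns the best fitness per generation and the learned Q-table. Comparing `history` against a run with a fixed tournament size would be one simple way to check whether the learned selection policy helps on a given problem.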