I want to know how to go about training a model on my dataset, what characteristics the data needs to have, and what it is