Conversation
|
This looks awesome, thanks so much for the contribution, Manish! The big question I have is whether you looked at using MLTable and its API for your input. Were there big hurdles preventing that from being an option? We'd like to build ML algorithms around that API, so if there are things we need to change to support this case, let us know! Decision trees are fairly different from algorithms that work by evaluating some linear loss function and optimizing via gradient descent, so this is a good test for something that may not fit our existing model. |
|
Thanks Evan. Looking forward to contributing more to the library. Unfortunately, I haven't looked at MLTable since the code was written prior to the open sourcing of the MLI library. As I mentioned in an earlier comment, I will look to make this code compatible with the MLI API and give feedback for any improvements. The fixes should not take me too long. The non-linear data generator will be the trickiest part. When do you think we can start testing performance once I am done? |
|
Evan, I have just performed a major refactoring of the code based on your feedback without changing functionality. A few tasks remain:
I think task 1 is the most important for now. Task 2 can be done in the future. I am wondering whether we can use the same data that you might have used for testing logistic regression or SVM for performance testing while we work on Task 3. Task 4 is again one for the future. |
|
Some more changes.
|
This is awesome, thanks Manish - we'll plan to test your code for
On Sat, Oct 19, 2013 at 6:43 PM, manishamde notifications@github.com wrote:
|
Sounds great Evan! |
This is a good idea, thanks.
Decision Tree algorithm implemented on top of Spark RDD.
Key features: