Add function to compute scores on problem,algorithm pairs

As requested by Berndt, I'm following up here on my post to the google group requesting a feature enhancement. I've edited the prostate cancer example from the wiki to show what I am suggestion.
Here is the  .rmd code:
## Set-up

this code is taken directly from the example on the wiki

``` r,options,include=FALSE
options(BBmisc.ProgressBar.style = "off")
# preload packages to reduce output in markdown
library(BatchExperiments)
library(ElemStatLearn)
library(glmnet)
library(plyr)
```
## Set up problems and algorithms

``` r,prob_algos
library(BatchExperiments)
library(ElemStatLearn)
library(glmnet)
library(plyr)


# Load data and get a quick overview
data("prostate", package = "ElemStatLearn")
str(prostate)
pairs(prostate, col = c("red", "darkgreen")[prostate$train + 1])


# Create registry
file.dir <- tempfile("prostate_")
reg <- makeExperimentRegistry("prostate_example", file.dir = file.dir, packages = "glmnet", seed = 1)

# we explicitly use the multicore backend here,
# regardless of what you have in your config
setConfig(conf = list(cluster.functions=makeClusterFunctionsMulticore(4)))


# Define subsample function
ss <- function(static, orig=FALSE) {
  nr <- nrow(static)
  train.ind <- if(orig) static$train else sample(static$train)
  list(train = subset(static,  train.ind, select = -train),
       test  = subset(static, !train.ind, select = -train))
}

```
# Modified Code

Now I have modified the algorithm functions to return not the score
on the test set, but the predictor function.
Note that I want to require predictor functions to have the same signature,
so the returned function (pred) in each case is a wrapper to the relevant 
method (predict.lm or predict.glmnet). 

In addition, to allow for scoring via the reduce function below, 
these algorithms return the "dynamic" parameter that they were provided.
**This is the behaviour I want to change! I'm just doing this here to
give a working example.**

``` r
# Function to fit a simple linear model
linear <- function(dynamic) {
  model <- lm(lpsa ~ ., data = dynamic$train)
  return(list(pred=function(x){predict(model,newdata=x)},dynamic=dynamic))
}

# Function to fit a regularized regression model, see ?glmnet
regularized <- function(dynamic, alpha) {
  x <- as.matrix(subset(dynamic$train, select = -lpsa))
  y <- dynamic$train[["lpsa"]]
  model <- cv.glmnet(x, y, alpha = alpha)
  return(list(pred=function(x){predict(model,newx=as.matrix(x))},dynamic=dynamic))
}

```

Now back to the standard code (repl reduced to 5 in final line for convenience)

``` r
# Add problem and algorithms to the registry
addProblem(reg, "prostate", static = prostate, dynamic = ss, seed = 1)
addAlgorithm(reg, "linear", fun = linear)
addAlgorithm(reg, "lasso", fun = regularized)
addAlgorithm(reg, "ridge", fun = regularized)


# Add experiments using the original split
pd <- makeDesign("prostate", design = data.frame(orig = TRUE))
ad <- list(makeDesign("linear"),
           makeDesign("lasso", design = data.frame(alpha = 1)),
           makeDesign("ridge", design = data.frame(alpha = 0)))
addExperiments(reg, pd, ad)


# Add experiments using random splits, 100 replications
pd <- makeDesign("prostate", design = data.frame(orig = FALSE))
addExperiments(reg, pd, ad, repl = 5)
```
## Submit jobs to the cluster

still the original code....

``` r,submit
# Quick overview: defined experiments
summarizeExperiments(reg)


# Chunk jobs together - they are pretty fast
chunked <- chunk(getJobIds(reg), n.chunks = 10, shuffle = TRUE)
submitJobs(reg, chunked)


# Check computational status
showStatus(reg)
```
## Analysis

Here again I have modified the code. I score the output using 
reduceResultsExperiments to run the
reduce_score function. Note I can do this here easily because the res parameter
returned from the methods includes the dynamic parameter (which
includes the test data needed for scoring)

``` r,analysis
# Wait for the jobs first
waitForJobs(reg)

reduce_score = function(job,res){
  y <- res$dynamic$test$lpsa
  x <- subset(res$dynamic$test, select = -lpsa)
  y.hat <- res$pred(x)
  list(r2=1 - sum((y - y.hat)^2) / sum((y - mean(y))^2)) # R^2
}

res <- reduceResultsExperiments(reg, ids=findDone(reg), fun=reduce_score)

res

```

This all seems to work. So what do I want to do? I want
to be able to do the same thing, but without the algorithms
passing their input (dynamic) as output. I want the passing of
the problem outputs (static,dynamic) and the algorithm output (res)
to the scoring function to be automatically handled by a new
function, scoreResultsExperiments(reg, ids=findDone(reg), fun=eg_score) say.
[Just as passing of res to reduce_score is handled by
reduceResultsExperiments]

To be explicit, here would be the new algorithm definitions 
under my proposal- note that these no longer return dynamic as part of the result.

``` r
# Function to fit a simple linear model
linear <- function(dynamic) {
  model <- lm(lpsa ~ ., data = dynamic$train)
  return(list(pred=function(x){predict(model,newdata=x)}))
}

# Function to fit a regularized regression model, see ?glmnet
regularized <- function(dynamic, alpha) {
  x <- as.matrix(subset(dynamic$train, select = -lpsa))
  y <- dynamic$train[["lpsa"]]
  model <- cv.glmnet(x, y, alpha = alpha)
  return(list(pred=function(x){predict(model,newx=as.matrix(x))}))
}

```

And here is the modified scoring function - note it is now supplied the static
and dynamic values. (I don't use the static values in this example)

``` r
eg_score = function(job, static, dynamic, res){
  y <- dynamic$test$lpsa
  x <- subset(dynamic$test, select = -lpsa)
  y.hat <- res$pred(x)
  list(r2=1 - sum((y - y.hat)^2) / sum((y - mean(y))^2)) # R^2
}
```

But of course this does not work :)

``` r
  #res <- scoreResultsExperiments(reg, ids=findDone(reg), fun=eg_score)
```

Thanks for any help! I'd be happy to speak by skype if it would help.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add function to compute scores on problem,algorithm pairs #7

Set-up

Set up problems and algorithms

Modified Code

Submit jobs to the cluster

Analysis

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add function to compute scores on problem,algorithm pairs #7

Description

Set-up

Set up problems and algorithms

Modified Code

Submit jobs to the cluster

Analysis

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions