Response to reviewers - treedata.table
About
In this document, we address all the comments raised during the review phase of treedata.table in rOpenSci. When possible, we use a single commit to answer each of the comments.
Where are changes implemented?
All the changes listed below are implemented in the following branch of our GitHub repository:
https://github.com/uyedaj/treedata.table/tree/cristian
Acknowledgements
We thank two @Bisaloo (reviewer), @karinorman (reviewer), @jooolia (editor) for their comments!
First reviewer (@Bisaloo):
44cceda
abfb2cb
b76f4a8
a9a7bde
16b6fbe
07746cf
ff610ad
3d43122
8b6250e
9cd075a
3bd6574
b38d88a
59872e5
748ddde
9afe310
data(anolis)
td <- as.treedata.table(anolis$phy, anolis$dat)
p <- pull.treedata.table(td, type = "phy")
d <- pull.treedata.table(td, type = "dat")
td2 <- as.treedata.table(p, d)
# You would expect to get identical(td, td2) but this line actually throws an error
as.treedata.table uses a data.frame as input:
data(anolis)
td <- as.treedata.table(anolis$phy, anolis$dat)
p <- pull.treedata.table(td, type = "phy")
d <- pull.treedata.table(td, type = "dat")
td2 <- as.treedata.table(p, as.data.frame(d))
> identical(td,td2)
[1] TRUE
1b5c188
700e0ed
19a777c
9779592
96f84da
b99ea3c
c7ebb3b
cb04cf5
e00fef9
5b1951b
d860070
d4cffa5
cd0c945
016bc06
bfb2572
79cfcb4
8d4b9e1
3cd17b9
4752e44
19a777c
9779592
96f84da
data(anolis)
td <- as.treedata.table(anolis$phy, anolis$dat)
td_old <- td
td_new <- droptreedata.table(tdObject = td, taxa = c("chamaeleonides" ,"eugenegrahami"))
identical(td, td_old)
#> TRUE
identical(td, td_new)
#> FALSE
droptreedata.table() is used to remove species from a treedata.table object. In your example, 2 species (chamaeleonides and eugenegrahami) are being excluded from the treedata.table object:
> data(anolis)
> td <- as.treedata.table(anolis$phy, anolis$dat)
Tip labels detected in column: X
Phylo object detected
No tips were dropped from the original tree/dataset
> td_old <- td
> td_new <- droptreedata.table(tdObject = td, taxa = c("chamaeleonides" ,"eugenegrahami"))
Please confirm that you would like to make changes to the ORIGINAL data?
Type: (1) YES, (2) NO: 1
> nrow(td_new$dat)
[1] 98
> nrow(td_old$dat)
[1] 100
The message was used to inform the user that n taxa are being dropped from the dataset. We have removed this prompt from the revised version. Regarding your tests, (1)identical(td, td_old) must be TRUE because these two are duplicates (td_old <- td). However, (2) identical(td, td_new) must be FALSE given two taxa were dropped from the latter object (please see the difference in row numbers between both objects)
a6d0b60
63a2935
9408ac3
Although we fully agree with the reviewer, both approaches are essentially doing the same. We decided to keep the ellipsis.
8fbaaa6
9b62811
c7ebb3b
We changed the message to “Taxa dropped from the tree/data”:
> data(anolis)
> td <- as.treedata.table(anolis$phy, anolis$dat)
Tip labels detected in column: X
Phylo object detected
No tips were dropped from the original tree/dataset
> summary(td)
A treedata.table object
The dataset contains 11 traits
Continuous traits: tip.label, SVL, PCI_limbs, PCII_head, PCIII_padwidth_vs_tail, PCIV_lamella_num, awesomeness, hostility, attitude
Discrete traits: ecomorph, island
The following traits have missing values: 0
Taxa dropped from the tree: 0
Taxa dropped from the data: 0
We also modified the droptreedata.table() function to keep track of the dropped species in the new object:
> td_new <- droptreedata.table(tdObject = td, taxa = c("chamaeleonides" ,"eugenegrahami"))
2 taxa were dropped from the ORIGINAL treedata.table object
> summary(td_new)
A treedata.table object
The dataset contains 11 traits
Continuous traits: tip.label, SVL, PCI_limbs, PCII_head, PCIII_padwidth_vs_tail, PCIV_lamella_num, awesomeness, hostility, attitude
Discrete traits: ecomorph, island
The following traits have missing values: 0
Taxa dropped from the tree: chamaeleonides, eugenegrahami
Taxa dropped from the data: chamaeleonides, eugenegrahami
292295c#diff-262ada0e7c14967a2a41b70149c367fb
Warning message:
In 1:seq_along(x$dat) :
numerical expression has 11 elements: only the first used
We cannot replicate this warning in the latest version of the package. It was probably fixed in a previous commit!
We will keep the current functions for now but we thank the reviewer for this comment.
The current version has >80% of coverage
ab53e26
b5f19cb
92bf3c8
10354e2
We chose to leave all the parameters in both of the functions. We thank the reviewer for his comment.
fa8fd8f
2b62d02
3acb5cc
3acb5cc
0f31ded
42491d2
cd8fc93
We have completely modified our previous head() function.
500648a
We have completely modified our previous head() function.
500648a
500648a
07d2178
07d2178
07d2178
We cannot replicate this warning in the latest version of the package. It seems to be fixed now!
a6d0b60
We thank the reviewer very much but we have decided to keep the current names.
We have added more details to these functions in the vignette:
9988f78
2b62d02
3acb5cc
9988f78
We acknowledge that this could be a reasonable functionality of our package but this would involve major changes to the main idea behind of the package – to always match the the dataset and tree(s).
3253251
We acknowledge this function may be redundant to the $ operator. However, it provides an explicit way to extract either of the objects. The vignette provides additional details on its functionality.
ca94642
'Character' is tricky here, as it is both R terminology, but also referring to the biological concept of a character (i.e., a quantification of some aspect of organismal form). We preferred to keep it as is to capture the biological meaning of this term.
We cannot replicate this behavior in the latest version of the package. This was probably addressed in a previous commit:
> detectCharacterType(anolis$dat[,1])
[1] "discrete"
> detectAllCharacters(anolis$dat[,1:3])
[1] "discrete" "continuous" "continuous"
Response to reviewers - treedata.table
About
In this document, we address all the comments raised during the review phase of
treedata.tablein rOpenSci. When possible, we use a single commit to answer each of the comments.Where are changes implemented?
All the changes listed below are implemented in the following branch of our GitHub repository:
https://github.com/uyedaj/treedata.table/tree/cristian
Acknowledgements
We thank two @Bisaloo (
reviewer), @karinorman (reviewer), @jooolia (editor) for their comments!First reviewer (@Bisaloo):
rev" role) in the packageDESCRIPTIONfile.44cceda
ORCiDfor the authors if they have one.abfb2cb
b76f4a8
a9a7bde
16b6fbe
DESCRIPTIONis not standard and in all cases unnecessary07746cf
Titleis not actually using title case. I don't care much personally but in my experience, this can cause issues when you submit to CRAN. You can verify this with thetools::toTitleCase()function.ff610ad
Dependswhen you can useImports. Here is a relevant quote from Hadley Wickham and Jenny Bryan's book on R package development: Depends: Prior to the rollout of namespaces in R 2.14.0, Depends was the only way to “depend” on another package. Now, despite the name, you should almost always use Imports, not Depends. You’ll learn why, and when you should still use Depends, in namespaces.3d43122
LICENSEwas correct (beforeuyedaj/treedata.table@7764ef2). You just need to update to name of the copyright holder(s).8b6250e
READMEdoes not contain enough information about the package, how to install and use it. You may find this relevant chapter of rOpenSci's devguide useful.9cd075a
3bd6574
READMEto advertise the fact that you follow the current best practices in package development. In particular, I like badges with code coverage. Here are the instructions on how to do this withcodecov.b38d88a
;). It makes the code harder the read. This will also help with the long lines warning reported by goodpratice in @jooolia's comment:59872e5
748ddde
TRUE/FALSE), you don't need towrite == TRUE.9afe310
inherits()(forS3objects, which is the case for ape objects), oris()(forS4objects). Please see this blog post by Martin Maechler for more info.as.treedata.tableuses a data.frame as input:1b5c188
1:length()byseq_len()inuyedaj/treedata.table@7764ef2, the correct syntax ofseq_len()doesn't use1:.700e0ed
19a777c
9779592
96f84da
message()function instead ofcat(). This will always surely cause issues upon CRAN submission if not corrected. See this detailed SO answer for more info about the differences betweenmessage()andcat().b99ea3c
paste0()inmessage()calls. Themessage()will automatically concatenate the arguments you give it.c7ebb3b
as.treedata.table()with the name_column argument not working. No matter what the user enters in name_column or what the auto-detection code finds, you always use the first column fortip.labels.cb04cf5
e00fef9
name_column = "detect"inas.treedata.table()5b1951b
styler::style_pkg()in the root of your RStudio project.d860070
d4cffa5
cd0c945
016bc06
bfb2572
detectCharacterType()ordetectAllCharacters()79cfcb4
8d4b9e1
discrete/continuouscharacters in the anolis dataset.3cd17b9
4752e44
1:instances from @jooolia's comment. E.g.,19a777c
9779592
96f84da
droptreedata.table(). As far as I can tell, the original data is NOT modified. Using your example:droptreedata.table()is used to remove species from atreedata.tableobject. In your example, 2 species (chamaeleonides and eugenegrahami) are being excluded from thetreedata.tableobject:The message was used to inform the user that n taxa are being dropped from the dataset. We have removed this prompt from the revised version. Regarding your tests, (1)
identical(td, td_old)must beTRUEbecause these two are duplicates (td_old <- td). However, (2)identical(td, td_new)must beFALSEgiven two taxa were dropped from the latter object (please see the difference in row numbers between both objects)a6d0b60
extractVector()may probably be simplified withlazyevalsince you already have it as a dependency63a2935
match.arg(type, c("dat", "phy"))in pull.treedata.table()9408ac3
tdt, do you really need...? Wouldn't anFUNarg be sufficient?Although we fully agree with the reviewer, both approaches are essentially doing the same. We decided to keep the ellipsis.
treedata.table, it would be nice to have atail()method as well8fbaaa6
9b62811
paste()s are unnecessary here sincecat()c7ebb3b
summary.treedata.table()is slightly confusing in my opinion. On the example you provide (with anolis), where no taxa are dropped from the tree or the data, you get:We changed the message to
“Taxa dropped from the tree/data”:We also modified the
droptreedata.table()function to keep track of the dropped species in the new object:292295c#diff-262ada0e7c14967a2a41b70149c367fb
[.treedata.table(), I get the following warning:We cannot replicate this warning in the latest version of the package. It was probably fixed in a previous commit!
expect_is()function byexpect_s3_class(). Excepted these two linesWe will keep the current functions for now but we thank the reviewer for this comment.
codecov: https://codecov.io/github/uyedaj/treedata.table?branch=master. I see no technical limitations that would prevent you to reach 100% coverage for this package but for now, I think you should at least try to reach 80% coverage.The current version has >80% of coverage
ab53e26
Roxygencomments. This produces more readable documentation in the source file in my opinion. Automatic conversion of your currentRoxygencomments to markdown can be done with theroxygen2mdpackage.Please add a sentence for the name_column argument to explain that "
detect" (the default) will auto-detect this column:b5f19cb
as.treedata.table(), you declare but this is unnecessary since you don't usesetDT()here andas.data.table()is prefixed with thedata.table::namespace92bf3c8
roxygencomments. I think it will be less confusing if you remove these(?)10354e2
detectCharacterTyperoxygencomment in the documentation ofdetectAllCharacters()to avoid duplicating the documentation of the function arguments. This is useful because it ensures the documentation of these functions will always stay in sync in the future. No risk of forgetting to update one of the two!We chose to leave all the parameters in both of the functions. We thank the reviewer for his comment.
#'@param returnType Either discrete (the default) or continuousfa8fd8f
hasNames()/forceNames()functions by providing a short explanation of the function purpose and/or a @returnroxygencomment.2b62d02
3acb5cc
extractVector()should be updated to indicate that multiple column names can be passed to..., which means you don't necessarily get a named vector but a list of named vectors, as opposed to what the documentation says.3acb5cc
0f31ded
42491d2
pull.treedata.tableis slightly confusing as it sounds like aS3and it's not. Something along the lines ofpull_treedata.table()orpulldata.data.table()(you already have adropdatatree.table()function so this would make sense) or???would probably be better.cd8fc93
If negative, all but the n last rows of x" (i.e., remove "first"):We have completely modified our previous
head()function.500648a
head.treedata.table()as far as I can tellWe have completely modified our previous
head()function.500648a
NOTEin R CMD check, you need toimportFrom(utils,head)before you try to define another method for this generic. This can be done by adding a #' @importFrom utils headroxygencomment in theroxygenchunk ofhead.treedata.table()for example.500648a
07d2178
[[.treedata.table()andextractVector(), namely that[[.treedata.table()has an extra exact argument to enable partial match while extractVector() can extract multiple columns and accepts non-standard evaluation07d2178
[1] FALSE FALSE" (which are actually output, and not R code) from this chunk, you will be able to remove theeval = FALSE. It's awlays better when all chunks run and can be copied/pasted in the console directly07d2178
Second Reviewer (@karinorman):
td[, head(.SD, 1), by = "ecomorph"]from the vignette don't seem particularly informative, I would suppress unless necessary.We cannot replicate this warning in the latest version of the package. It seems to be fixed now!
droptreedata.tableis unnecessary. For me at least it made me assume that the function was modifying an object in place, which I realized it wasn't after some exploration.a6d0b60
treedata.tablefor the sake of brevity. You could also add a td to other exported functions (e.g. td.extractVector).We thank the reviewer very much but we have decided to keep the current names.
detectCharacterfunctions,filterMatrix,forceNames, andhasNamesseem like they could be helper functions that facilitate the other main functions, but maybe don't need to be exported. If it makes sense I would not export them. If not, a little more explanation of how they fit into a workflow in the vignette would be helpful.We have added more details to these functions in the vignette:
9988f78
forceNamesandhasNamescould use more detailed description and/or justification, I'm not sure their functionality even after running the example.data(anolis)andforceNames(anolis$dat, "row")return seemingly the same object.2b62d02
3acb5cc
9988f78
We acknowledge that this could be a reasonable functionality of our package but this would involve major changes to the main idea behind of the package – to always match the the dataset and tree(s).
filterMatrixwould be cleaner ifcharTypewasn't required as an argument but instead calculated within the function. I have a hard time imagining a scenario in which you would want a vector of character types that didn't match the matrix you were already giving the function.3253251
pull.treedata.tablefunction. Is there some utitlity in having a function that mimics the$operator? Or maybe I'm missing some of the applications of this function?We acknowledge this function may be redundant to the
$operator. However, it provides an explicit way to extract either of the objects. The vignette provides additional details on its functionality.detectCharacterType,detectCharacterChanges, andfilterMatrix.ca94642
character" which is a specific object type, whereas this function appears to perform on multiple vectortypes.detectVectorTypeordetectColumnTypemay be more intuitive.'Character' is tricky here, as it is both R terminology, but also referring to the biological concept of a character (i.e., a quantification of some aspect of organismal form). We preferred to keep it as is to capture the biological meaning of this term.
detectCharacterType(anolis$dat[,1])returns "discrete", butdetectAllCharacters(anolis$dat[,1:3])returns three "continuous" entries. From my understanding of how the functions work I would expect the first entry to be "discrete" to matchdetectCharacterType(anolis$dat[,1]).We cannot replicate this behavior in the latest version of the package. This was probably addressed in a previous commit: