## 03.10.16 LIM - VLSB 5056
## Modify functions to return HDIM number instead of row indices - COMPLETE
## modify empty_list function to return adjusted row indices - COMPLETE
## look at Google R Style Guide (https://google.github.io/styleguide/Rguide.xml) - COMPLETE
## library(stringr) - NOTED
## str_split() can unpack date entries into a new dataframe for analysis - NOTED
## Use Jupyter notebook to initialize code and to introduce package to laymen - EXPLORED
## Comprehensive list of Plot names located in Siteinfo Google Drive file - COMPLETE
## use source() to initialize all relevant functions from a separate script file (.R) - IRRELEVANT
## output invalid entry information in a comprehensive list returned with a wrapper function - COMPLETE
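##
## A minimal sketch of unpacking date entries into a new dataframe. It uses base
## R strsplit() as a stand-in for stringr::str_split(); the sample dates and the
## column names (month, day, year) are made up for illustration.

```r
# Made-up date entries in month/day/year form
dates <- c("7/8/2015", "07/08/2015", "12/31/2014")

# strsplit() returns one character vector per entry; rbind them into a matrix
parts <- do.call(rbind, strsplit(dates, "/", fixed = TRUE))

# One integer column per date component
date_df <- data.frame(
  month = as.integer(parts[, 1]),
  day   = as.integer(parts[, 2]),
  year  = as.integer(parts[, 3])
)
```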

## 03.10.16 ROMINGER - HILGARD 305
## Function to return list of all misspellings and empty entries by HDIM number - COMPLETE
## Ways to source functions - COMPLETE
##  a. list.files(file_location)
##  b. oldwd <- setwd(new_directory)
##  c. files2load <- c(file1, file2, file3)
##  d. lapply(files2load, source)
##  e. source(), setwd(oldwd)
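##
## The sourcing steps above can be sketched as one helper. The name source_all
## and its argument script_dir are hypothetical, and the pattern assumes the
## scripts end in .R or .r.

```r
# Source every R script in a directory, then restore the working directory.
source_all <- function(script_dir) {
  oldwd <- setwd(script_dir)            # setwd() returns the previous directory
  on.exit(setwd(oldwd), add = TRUE)     # restore it even if a script errors
  files2load <- list.files(pattern = "\\.[Rr]$")
  invisible(lapply(files2load, source)) # source() evaluates in the global env
}
```

## Usage would look like source_all("path/to/functions"), after which the
## functions defined in those scripts are available in the workspace.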

## PERSONAL NOTES MARCH
## Look into constructing a Shiny app with the database information - DONE
## To import google sheets as a .csv, use the url suffix "pub?gid=0&single=true&output=csv" - NOTED
https://docs.google.com/spreadsheets/d/1Ve2NZwNuGMteQDOoewitaANfTDXLy8StoHOPv7uGmTM/pub?gid=0&single=true&output=csv
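##
## A small sketch of building that CSV export URL from a sheet ID. The helper
## name sheet_csv_url is hypothetical; the read.csv() call is left commented out
## because it needs network access and a sheet that is published to the web.

```r
# Build the published-CSV URL for a Google Sheet (suffix taken from the note above)
sheet_csv_url <- function(sheet_id, gid = 0) {
  paste0("https://docs.google.com/spreadsheets/d/", sheet_id,
         "/pub?gid=", gid, "&single=true&output=csv")
}

url <- sheet_csv_url("1Ve2NZwNuGMteQDOoewitaANfTDXLy8StoHOPv7uGmTM")
# db <- read.csv(url, stringsAsFactors = FALSE)
```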

## 03.29.16 ROMINGER - HILGARD 305
## Write a wrapper function for diagnose db for colEvent - COMPLETE
## Develop package with the written functions - DONE
## Write functions that make automated corrections, creating a new corrected
## dataframe along with notes on what was corrected. - DONE
## Check for duplicated HDIM numbers - COMPLETE
## Integrate photos with the database and site
## Possible project: Shiny app of ecological data integrated with geographic data
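##
## One way the duplicated-HDIM check could look, as a sketch only. The function
## name dup_hdim, the column name "HDIM", and the toy data are assumptions, not
## the package's actual code.

```r
# Return each HDIM number that occurs more than once, in order of first repeat
dup_hdim <- function(db, id_col = "HDIM") {
  ids <- db[[id_col]]
  unique(ids[duplicated(ids)])
}

# Toy database with two repeated identifiers
db <- data.frame(HDIM = c(1001, 1002, 1002, 1003, 1001, 1001))
dups <- dup_hdim(db)
```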

## PERSONAL NOTES APRIL
## Possible field project: subterranean arthropod survey under the Dimensions of Biodiversity grant.
## Such a project would involve one or more expeditions into the lava tubes underneath the rainforests of
## the Big Island of Hawaii and would seek to gain an understanding of the native and invasive fauna of
## the subterranean ecosystem of the Big Island through various sampling methods and next-generation DNA
## sequencing.

## 04.11.16 ROMINGER - WELLMAN 220
## Integrate list of correct database entries into a spreadsheet file (.csv) to be referenced in the
## database checker package functions. - DONE
## For the automatic database correction function, use the apply family of functions (sapply) instead of
## a for loop. - DONE
## The lava tubes on the Big Island were briefly surveyed back in the 60s and 70s - many of the arthropod
## species within have been designated as endangered species, which may make securing permits for future
## expeditions problematic. - NOTED
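##
## A sketch of the sapply-instead-of-for-loop idea for automatic corrections.
## The lookup vector and the entries are made-up examples; fix_one is a
## hypothetical helper, not a function from the package.

```r
# Names are known misspellings, values are the corrected entries (made up)
corrections <- c("pitfal" = "pitfall", "beatting" = "beating")

# Correct a single entry, leaving unknown values untouched
fix_one <- function(x) if (x %in% names(corrections)) corrections[[x]] else x

entries <- c("pitfall", "pitfal", "beatting", "malaise")

# sapply applies fix_one to every entry without an explicit loop
fixed <- sapply(entries, fix_one, USE.NAMES = FALSE)
```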

## 04.19.16 MOTHUR_16S WORKSHOP - KOSHLAND 238
## New version of mothur just came out last week
## raw fastq reads -> quality trimming -> merge R1 & R2 reads -> cluster reads into OTUs -> organize reads into an OTU
## table -> higher-level analyses
## Amplicon processing issues: controls (positive/negative), pair reads or not, OTU binning, taxonomy assignments,
## low-abundance OTUs
## SILVA is the most accurate database, although slower
## https://cgrlucb.wikispaces.com/16S+Amplicon+Sequencing+Data+Analysis+Workshop?responseToken=08be8900eac6b4a4c470840e35e7e9d5a

## 04.20.16 PERSONAL NOTES
## Function to repair invalid date entries in database
## Function to repair invalid contingent factor entries by method in database

## 09.28.16
## Sourced values from synonym tables.

## 09.29.16 ROMINGER - HILGARD 305
##
## hdimDB PACKAGE INITIALIZED
## > devtools::install_github('hawaiiDimensions/db/hdimDB')
## > library(hdimDB)
##
## Github Issues - LISTED
## - misspellings
## - time
## - examples not working
##
## Create fake database for @example roxygen tag - COMPLETE
## Shiny app for dbChecker visualization - DONE

## 09.30.16 PERSONAL NOTES
##
## Fixed error with arguments in mapply call - IRRELEVANT
##
## 10.20.16 ROMINGER - HILGARD 305
## 1. Work on Shiny App - DONE
##     a. Don't use rapply for Shiny error compilation; Andy will
##        push over a script (compile_errors.R) to the test.scripts folder - DONE
## 2. Add workspace to test.scripts - DONE
## 3. Write unit tests - DONE
## 4. Add synonym values to checkMisspell - DONE
##     a. To retrieve synonym URLs: Google Sheets -> File -> Publish to web -> copy the URL,
##        then change the suffix to: 'pub?output=csv'

## 10.25.16 ROMINGER - HILGARD 305
##
## 1. Sum of beating durations for all samples in any given site in a sampling
##    round should be exactly 420 seconds. - DONE
##
## 2. Check collector names in proper format with regex - DONE # with Levenshtein distance
##     a. Format: capitalized initial, period, space, capitalized surname
##     b. Check if name matches with any of the known collector names
##     c. Check if name could be a misformatted collector name if no match
##
## 3. Unit Tests
##     a. hdimDB -> tests -> testthat - NOTED
##     b. Save fake.data as a dataframe in the binary R structure - DONE
##     c. Rename fake.data as test.data - DONE
##     d. New testthat files for dupHDIM and checkMisspell - DONE
##     e. Rewrite unit test for checkTime to run on fakeData - DONE
## 4. Shiny
##     a. Use Andy's implementation (located in workspace) - IRRELEVANT
##
## Appendix: camelCase vs. snake_case
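##
## A sketch of the collector-name format check from item 2. The pattern is an
## assumption about what "proper format" means (one initial, a period, a space,
## a simple surname); hyphenated or multi-part surnames would need a broader
## pattern. The sample names are for illustration only.

```r
# TRUE when a name looks like "A. Surname"
is_valid_collector <- function(x) {
  grepl("^[A-Z][.] [A-Z][a-z]+$", x)
}

name_check <- is_valid_collector(
  c("A. Rominger", "a. rominger", "Rominger", "A.Rominger")
)
```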

## 11.03.16 ROMINGER - HILGARD 305
##
## 1. Function to automate correction of plot names - DONE
##     a. Look at Andy's progress
##     b. Look at regex matching functions (e.g. is "x" in the cell value)
## 2. Autocorrecting method names - DONE
##     a. Look at regex matching functions
## 3. Generate wide-format reference dataframe tracking method metadata - WRITTEN WITH ROMINGER
##     a. dbcleaning_2016-09-02.R in hawaiiDimensions/Db
##     b. Plot names as rownames
##     c. Method names as column names
##     d. Combine pitfall contingency columns with method using paste()
##
## Appendix: Look at curtis_consolidate, Line 26 and on, for examples of regular expressions - NOT FOUND

## 11.18.16 ROMINGER - HILGARD 305
##
##  1. Ultimately change checkDb to output dataframe with HDIM, errTag, current
##     value, suggested correction columns - DONE
##      a. Generic misspelling function(column) #find synonym table by itself - DONE
##      c. Change list() in checkDB to rbind() - DONE
##      d. Have helper functions output two-column dataframe first - DONE
##      e. Implement assignCorr with Generic function - DONE
##  2. Do the damn Shiny implementation - DONE
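##
## A sketch of the helper-functions-plus-rbind() idea from item 1: each helper
## returns a two-column dataframe (HDIM, errTag) and the results are stacked.
## The helper names, the errTag strings, and the toy data are assumptions, not
## the package's real output.

```r
# Flag rows where a column is empty or NA
check_empty <- function(db, column) {
  bad <- is.na(db[[column]]) | db[[column]] == ""
  data.frame(HDIM = db$HDIM[bad],
             errTag = rep(paste0("empty.", column), sum(bad)),
             stringsAsFactors = FALSE)
}

# Flag repeated HDIM numbers
check_dup <- function(db) {
  dup <- duplicated(db$HDIM)
  data.frame(HDIM = db$HDIM[dup],
             errTag = rep("duplicate.HDIM", sum(dup)),
             stringsAsFactors = FALSE)
}

db <- data.frame(HDIM = c(1, 2, 2, 3),
                 Collector = c("A. Rominger", "", "B. Example", NA),
                 stringsAsFactors = FALSE)

# Stack the helper outputs into one error table
errors <- do.call(rbind, list(check_empty(db, "Collector"), check_dup(db)))
```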

## 12.02.16 ROMINGER - HILGARD 305
##
##  1. Implement generic regex checker - DONE
##      a. "if" clauses in assignCorr - DONE
##      b. fix extractCorr - DONE
##  2. Store synonym table urls in the package - DONE

## 12.09.16 ROMINGER - HILGARD 305
##
##  1. Regex instead of Levenshtein distance in closestMatch - MODIFY FOR OTHER COLUMNS
##  2. Rewrite unit tests to reflect new output
##  3. Shiny App - DEPLOYED
##  4. Fix time function
##  5. Make checkEmpty check that places that should be empty are empty - PROG
##  6. Make readme.md for the package in Markdown - DONE
##  7. MakeLabel implementation - DONE
##  8. 420 sec BeatingDuration parameter implementation - DONE
##     a. Add extractErr clause for checkDuration verbatim - DONE
##     b. Add assignCorr clause - DONE
##     c. Add dbChecker clause - DONE
##  9. Modify roxygen tags - DONE
##
## TARGET: Darwin Core implementation, Kokua implementation
## NOTE: synonym vectors do not include '' and mark empty entries as misspelled
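##
## For comparison with the regex approach in item 1, a sketch of a
## Levenshtein-based closest match using utils::adist(). The function name
## closest_match and the sample values are hypothetical; this is not the
## package's closestMatch.

```r
# For each entry, return the valid value with the smallest edit distance
closest_match <- function(x, valid) {
  d <- utils::adist(x, valid)        # matrix: one row per x, one column per valid
  valid[apply(d, 1, which.min)]      # ties resolve to the first valid value
}

suggested <- closest_match(c("pitfal", "beatting"),
                           c("pitfall", "beating", "canopy"))
```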

## 01.30.17 ROMINGER - NEW MEXICO X HILGARD 305
## 1. Modify .regexMatch to .indexMatch, match verbatim to synVector[1], return synVector[2] - DONE
## 2. Allow users to add values to synonym tables in Shiny App
##   a. Think about allowing Shiny app to directly edit the database
##   b. Look into package for reading and writing directly to Google Docs
##   c. Have app push and commit change to GitHub or save to somewhere with timestamp of previous version
##   d. Look into Google's options for version control
##   e. Andy will learn Shiny - OK
## 3. Check in with Henrik and the Sorters - SCHEDULED
##   a. See the latest output of checkDb - DONE
## 4. checkDuration 420 update  - DONE
##   a. First aggregate plots using verbatim and samplingRd - NOTED
##   b. Look into plyr (ddply) and aggregate functions - NOTED
## 5. Use BlueJeans to have meetings - DONE
## 6. Look up and send Andy a link to Wrangle - DONE
## 7. Test update to checkEmpty
## 8. Rewrite tests - DELAYED
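##
## A sketch of the 420-second check from item 4 using base aggregate(). The
## column names (Plot, SamplingRound, BeatingDuration) and the toy values are
## assumptions; the real database columns may be named differently.

```r
# Toy beating samples: plot P1 sums to 420 s, plot P2 to 400 s
db <- data.frame(
  Plot            = c("P1", "P1", "P1", "P2", "P2"),
  SamplingRound   = c(1, 1, 1, 1, 1),
  BeatingDuration = c(140, 140, 140, 200, 200)
)

# Total beating time per plot and sampling round
totals <- aggregate(BeatingDuration ~ Plot + SamplingRound, data = db, FUN = sum)

# Keep only the plot/round combinations that miss the 420 s target
bad <- totals[totals$BeatingDuration != 420, ]
```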

## 02.06.17 ROMINGER - NEW MEXICO X 1910 OXFORD
## 1. Maybe add switch function to assignCorr - DONE
## 2. Rewrite unit tests
##   a. `test_that('checkEmpty finds missing entries', {    expect_equal(check$errMessage, 'Collector')})`
## 3. Rewrite Makelabels with Markdown - DONE

## 02.07.17 HUANG
## 1. Debug indexMatch - DONE
## 2. Allow user to specify regex, index, or leven in checkDb - DONE
## 3. Rename dbChecker to checkDb - DONE
## 4. Move .synValues - DONE

## 02.16.17 HUANG
## 1. Because .extractErr and .assignCorr track errors by HDIM, verbatim values
##    are aggregated in the case of duplicate HDIMs, which may interfere
##    with the autocorrection process - IGNORE

## 02.16.17 ROMINGER - NEW MEXICO X 305 HILGARD
## 1. fakeLabels function integration with Shiny - DONE
##   a. knitr::kable - DONE
##      http://stackoverflow.com/questions/15488350/programmatically-creating-markdown-tables-in-r-with-knitr
##   b. rmarkdown::render - DONE
##      http://stackoverflow.com/questions/28507693/call-rmarkdown-on-command-line-using-a-r-that-is-passed-a-file
##   c. Shiny integration # https://shiny.rstudio.com/articles/generating-reports.html - DONE
## 2. Time Benchmark with: - DONE
##    > t0 <- proc.time()
##    > elapsed <- proc.time() - t0
## 3. Shiny App checkDB one tab two window implementation
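##
## The proc.time() benchmark from item 2 can be wrapped in a small helper. The
## name time_it is hypothetical, and the timed function here is a trivial
## stand-in rather than one of the package's checkers.

```r
# Run f(...) and report the wall-clock seconds it took
time_it <- function(f, ...) {
  t0 <- proc.time()
  result <- f(...)
  elapsed <- proc.time() - t0
  list(result = result, elapsed = elapsed[["elapsed"]])
}

out <- time_it(function(n) sum(seq_len(n)), 1000)
```

## out$result holds the function's return value and out$elapsed the elapsed
## time in seconds, so several checker functions can be compared side by side.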

## 02.21.17 ROMINGER - NEW MEXICO X 220 WELLMAN
## 1. Implement fakeLabels in Shiny - DONE

## 02.28.17 ROMINGER - NEW MEXICO X 305 HILGARD
## 1. Implement makeLabels in Shiny - DONE

## 03.10.17 ROMINGER - NEW MEXICO X 1910 OXFORD
## 1. textInput for repID - DONE
## 2. Upload new version to the shiny.io cloud - DONE
## 3. Put up github issue "from and to are of different lengths" - DONE
## 4. Kokua Data package? Look at Andy's link - DONE

## 03.17.17 ROMINGER - NEW MEXICO X DAVIS HOUSE
## 1. checkTime - DONE
##   a. Date / Date End - DONE
##   b. Time Begin/ Time End - DONE
##   c. Valid range for collection dates: March 1, 2014 to January 1, 2016.
##      Format should be %m/%d/%Y, i.e. a four-digit year (e.g. 7/8/2015 or
##      07/08/2015 are both valid, but 7/8/15 is not). - NOTED
##   d. TimeBegin and TimeEnd should be in 24hr time (i.e. military time) - DONE
## 2. Allow users to add values to synonym tables in Shiny App
##   a. Think about allowing Shiny app to directly edit the database
##   b. Look into package for reading and writing directly to Google Docs
##   c. Have app push and commit change to GitHub or save to somewhere with timestamp of
##      previous version.
## 3. Reduce hdimDB runtime
##   a. Isolate checker functions to compare runtime - DONE
##   b. Benchmark .levenshteinMatch and .indexMatch - DONE
##   c. Find source of runtime - DONE
## 4. Send the lab a link of the Shiny App - DONE
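##
## A sketch of the checkTime rules from item 1 (date format, date range, 24hr
## times). The function names are hypothetical, and the "HH:MM" time layout is
## an assumption about how TimeBegin and TimeEnd are stored.

```r
# TRUE for month/day/four-digit-year dates inside the collection window
is_valid_date <- function(x,
                          from = as.Date("2014-03-01"),
                          to   = as.Date("2016-01-01")) {
  # Require a four-digit year up front; %Y alone would accept "15" as year 15
  four_digit_year <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$", x)
  d <- as.Date(x, format = "%m/%d/%Y")   # NA for impossible dates like 2/30
  four_digit_year & !is.na(d) & d >= from & d <= to
}

# TRUE for 24-hour times from 0:00 to 23:59
is_valid_time <- function(x) {
  grepl("^([01]?[0-9]|2[0-3]):[0-5][0-9]$", x)
}

date_ok <- is_valid_date(c("7/8/2015", "07/08/2015", "7/8/15",
                           "2/30/2015", "1/2/2017"))
time_ok <- is_valid_time(c("07:30", "23:59", "24:00", "7:60"))
```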