While `driftR` is designed to work seamlessly with output from YSI Multiparameter V2 Sonde, YSI EXO2, and Onset U24 Conductivity Logger instruments, it can also be used to correct data from other sources. Only a few steps are needed to get such data into a tidy `driftR` format. Below are example data after they have been imported using the `dr_read()` function. This is the format that `driftR` expects, so data from other sources must be converted to this configuration.
# A tibble: 1,527 x 11
Date Time Temp SpCond pH pHmV Chloride AmmoniumN NitrateN Turbidity DO
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
1 9/18/2015 12:10:49 14.76 0.754 7.18 -36.4 51.22 3.35 0 3.7 92.65
2 9/18/2015 12:15:50 14.64 0.750 7.14 -34.1 49.62 6.29 0 -0.2 93.73
3 9/18/2015 12:20:51 14.57 0.750 7.14 -33.9 49.75 7.84 0 -0.1 93.95
4 9/18/2015 12:25:51 14.51 0.749 7.13 -33.9 50.32 7.67 0 -0.2 93.23
5 9/18/2015 12:30:51 14.50 0.749 7.13 -33.6 50.74 7.13 0 0.0 92.74
6 9/18/2015 12:35:51 14.63 0.749 7.13 -33.5 50.84 6.49 0 0.0 93.71
7 9/18/2015 12:40:51 14.69 0.749 7.13 -33.6 50.66 5.78 0 -0.2 94.56
8 9/18/2015 12:45:51 14.66 0.749 7.12 -33.3 50.23 5.32 0 -0.2 94.16
9 9/18/2015 12:50:52 14.65 0.749 7.12 -33.3 50.49 4.89 0 -0.2 93.58
10 9/18/2015 12:55:51 14.69 0.749 7.12 -33.1 50.04 4.60 0 -0.2 93.80
# ... with 1,517 more rows
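The sketch below builds a small tibble by hand to make the target layout concrete. It copies the first three rows of the printout above and, for brevity, only some of its columns. `Date` and `Time` are stored as character, and the measurements are stored as doubles. This is an illustration of the structure, not output from `dr_read()`.

```r
library(tibble)

# Hand-built illustration of the expected layout (subset of columns):
# Date and Time are character, measurements are doubles
expected <- tibble(
  Date   = c("9/18/2015", "9/18/2015", "9/18/2015"),
  Time   = c("12:10:49", "12:15:50", "12:20:51"),
  Temp   = c(14.76, 14.64, 14.57),
  SpCond = c(0.754, 0.750, 0.750),
  pH     = c(7.18, 7.14, 7.14)
)

expected
```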
The sections below detail pre-processing steps that you may have to take to prepare your data for use with `driftR`.
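Data come in a variety of formats, and importing them into `R` can occasionally be a challenge.

- If your data are stored in a `csv`, `tsv`, `txt`, or another delimited file format, we recommend using the `readr` package.
- If your data are stored in an Excel workbook (`.xls` or `.xlsx`), we recommend using the `readxl` package.
- If your data are stored in SAS, SPSS, or Stata files, we recommend using the `haven` package.
- If your data are stored in a Microsoft Access database, we recommend using the `RODBC` package. This will require a Windows computer, 32-bit `R`, and either Microsoft Access or the appropriate drivers installed.

All of the example code below assumes that you have a data frame named `waterData`.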
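Here is a minimal sketch of importing a delimited file with `readr`. It assumes a hypothetical CSV with only a few columns, which is written to a temporary file so the example runs on its own; replace `path` with the location of your own file.

The column types are set explicitly for two reasons. `readr` would otherwise guess a time class for `Time`. The example output above keeps both `Date` and `Time` as character.

```r
library(readr)

# Write a small, hypothetical CSV export to a temporary file so the
# example is self-contained; point `path` at your own file instead
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,Time,Temp,SpCond,pH",
  "9/18/2015,12:10:49,14.76,0.754,7.18",
  "9/18/2015,12:15:50,14.64,0.750,7.14"
), path)

# Read Date and Time as text and every other column as a double
waterData <- read_csv(
  path,
  col_types = cols(
    Date = col_character(),
    Time = col_character(),
    .default = col_double()
  )
)
```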
No metadata should be stored in the observations. If metadata are present, remove them using the following technique (this example assumes that the metadata are stored in row 1):
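One way to do this in base `R` is negative row indexing. The sketch below uses a hypothetical data frame whose first row holds units rather than an observation. After the row is dropped, the measurement columns are still character and will need to be converted to numeric.

```r
# Hypothetical data frame whose first row holds metadata (units)
# rather than an observation
waterData <- data.frame(
  Date = c("date", "9/18/2015", "9/18/2015"),
  Time = c("hh:mm:ss", "12:10:49", "12:15:50"),
  Temp = c("C", "14.76", "14.64"),
  stringsAsFactors = FALSE
)

# A negative row index drops row 1 and keeps every column
waterData <- waterData[-1, ]
```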
If there are multiple rows of metadata, they can be removed like so:
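A sketch of removing several metadata rows at once, assuming (hypothetically) that the first three rows are metadata. The same negative indexing accepts a whole range of rows.

```r
# Hypothetical data frame with three rows of metadata above the observations
waterData <- data.frame(
  Date = c("Site: Example", "Sonde: Example", "date", "9/18/2015"),
  Time = c("", "", "hh:mm:ss", "12:10:49"),
  stringsAsFactors = FALSE
)

# Drop rows 1 through 3 in one step
waterData <- waterData[-(1:3), ]
```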
Given the typically large data sets produced by these instruments, we encourage (but do not require) storing data as tibbles. Tibbles are the `tidyverse` implementation of data frames. They print more compactly, showing only the first ten rows of a large data set along with each column's type. They also behave more predictably than base data frames; for example, they never partially match column names with `$`. To convert your data to a tibble, use the `as_tibble()` function:
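A minimal sketch using `as_tibble()` from the `tibble` package. A small base `R` data frame stands in for imported data.

```r
library(tibble)

# A small base R data frame standing in for your imported data
waterData <- data.frame(
  Temp = c(14.76, 14.64),
  pH   = c(7.18, 7.14)
)

# Convert the data frame to a tibble
waterData <- as_tibble(waterData)

class(waterData)
# "tbl_df" "tbl" "data.frame"
```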
Variable names should be short and descriptive. We recommend using `camelCase` or `snake_case` to name variables. Use the `rename()` function from `dplyr` to accomplish this. The function accepts the data frame name followed by a comma and the new name set equal to the old name:
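A sketch of `rename()` with a hypothetical instrument column named `Temperature (deg C)`. Because that name contains spaces and parentheses, it must be wrapped in backticks.

```r
library(tibble)
library(dplyr)

# Hypothetical imported column with a long, non-syntactic name
waterData <- tibble(`Temperature (deg C)` = c(14.76, 14.64))

# New name on the left, old name on the right
waterData <- rename(waterData, Temp = `Temperature (deg C)`)
```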
If you have a number of variables to rename, you can pipe several `rename()` calls together:
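A sketch of chaining `rename()` calls with the `%>%` pipe, which `dplyr` re-exports. The original column names are hypothetical.

```r
library(tibble)
library(dplyr)

# Hypothetical column names from an unsupported instrument
waterData <- tibble(
  `Temperature (deg C)` = 14.76,
  `Specific Conductance` = 0.754,
  `Dissolved Oxygen` = 92.65
)

# Each rename() receives the result of the previous step through %>%
waterData <- waterData %>%
  rename(Temp = `Temperature (deg C)`) %>%
  rename(SpCond = `Specific Conductance`) %>%
  rename(DO = `Dissolved Oxygen`)
```

`rename()` also accepts several `new = old` pairs in a single call, such as `rename(waterData, Temp = ..., SpCond = ...)`, which gives the same result.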
Please check out our vignette on dates and times in driftR for additional details on how these should be formatted.
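`driftR` makes no direct use of the `Temp` data included in the output, so temperature can be left in whichever unit your instrument records. If you do want to convert it, the `weathermetrics` package includes functions for conversions between Celsius and Fahrenheit.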
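A sketch of converting a hypothetical Fahrenheit `Temp` column to Celsius. It assumes the `weathermetrics` package is installed and calls its `fahrenheit.to.celsius()` function with default arguments. `celsius.to.fahrenheit()` covers the opposite direction.

```r
library(tibble)
library(dplyr)
library(weathermetrics)

# Hypothetical temperatures recorded in Fahrenheit
waterData <- tibble(Temp = c(58.57, 58.35))

# Overwrite Temp with the Celsius equivalent
waterData <- mutate(waterData, Temp = fahrenheit.to.celsius(Temp))
```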
Beyond date and time data, all variables should be stored as numeric values (either double or integer):
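A sketch of converting measurement columns that were imported as text, using `mutate()` from `dplyr` with base `as.numeric()` and `as.integer()`. The column values are hypothetical. `Date` and `Time` are left as character.

```r
library(tibble)
library(dplyr)

# Hypothetical data in which measurements were imported as text
waterData <- tibble(
  Date = c("9/18/2015", "9/18/2015"),
  Time = c("12:10:49", "12:15:50"),
  Chloride = c("51.22", "49.62"),
  NitrateN = c("0", "0")
)

# Convert the measurement columns, leaving Date and Time unchanged
waterData <- mutate(
  waterData,
  Chloride = as.numeric(Chloride),
  NitrateN = as.integer(NitrateN)
)
```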
Finally, if unnecessary variables remain in your data set at the end of the pre-processing stage, you can use the `select()` function from `dplyr` to remove them. The function accepts the data frame name followed by a comma and the variables to be removed, listed inside `-c(varlist)`:
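A sketch that drops two columns, using names borrowed from the example output above. Every column not listed inside `-c()` is kept.

```r
library(tibble)
library(dplyr)

waterData <- tibble(
  Date = "9/18/2015",
  Time = "12:10:49",
  Temp = 14.76,
  pHmV = -36.4,
  AmmoniumN = 3.35
)

# Drop pHmV and AmmoniumN; all other columns are kept
waterData <- select(waterData, -c(pHmV, AmmoniumN))
```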
Like all other `dplyr` functions, `select()` can be included in a pipe as well.
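A sketch of a single pipeline that renames one column and then drops another. The input column names are hypothetical.

```r
library(tibble)
library(dplyr)

# Hypothetical imported data with a long column name and an unneeded column
waterData <- tibble(
  Date = "9/18/2015",
  Time = "12:10:49",
  `Temperature (deg C)` = 14.76,
  pHmV = -36.4
)

# Rename, then drop, in one pipeline
waterData <- waterData %>%
  rename(Temp = `Temperature (deg C)`) %>%
  select(-c(pHmV))
```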