Using Data from Other Sources

While driftR is designed to work seamlessly with output from YSI Multiparameter V2 Sonde, YSI EXO2, and Onset U24 Conductivity Logger instruments, it can also be used to correct data from other sources. There are only a few steps that would be needed to get data into a tidy driftR format. Below are example data after they have been imported using the dr_read() function. This is the expected format that driftR requires, so data from other sources must be modified to this configuration.

# A tibble: 1,527 x 11
        Date     Time  Temp SpCond    pH  pHmV Chloride AmmoniumN NitrateN  Turbidity    DO
       <chr>    <chr> <dbl>  <dbl> <dbl> <dbl>    <dbl>     <dbl>    <int>      <dbl> <dbl>
 1 9/18/2015 12:10:49 14.76  0.754  7.18 -36.4    51.22      3.35        0        3.7 92.65
 2 9/18/2015 12:15:50 14.64  0.750  7.14 -34.1    49.62      6.29        0       -0.2 93.73
 3 9/18/2015 12:20:51 14.57  0.750  7.14 -33.9    49.75      7.84        0       -0.1 93.95
 4 9/18/2015 12:25:51 14.51  0.749  7.13 -33.9    50.32      7.67        0       -0.2 93.23
 5 9/18/2015 12:30:51 14.50  0.749  7.13 -33.6    50.74      7.13        0        0.0 92.74
 6 9/18/2015 12:35:51 14.63  0.749  7.13 -33.5    50.84      6.49        0        0.0 93.71
 7 9/18/2015 12:40:51 14.69  0.749  7.13 -33.6    50.66      5.78        0       -0.2 94.56
 8 9/18/2015 12:45:51 14.66  0.749  7.12 -33.3    50.23      5.32        0       -0.2 94.16
 9 9/18/2015 12:50:52 14.65  0.749  7.12 -33.3    50.49      4.89        0       -0.2 93.58
10 9/18/2015 12:55:51 14.69  0.749  7.12 -33.1    50.04      4.60        0       -0.2 93.80
# ... with 1,517 more rows

The sections below detail pre-processing steps that you may have to take to prepare your data for use with driftR.

Importing Data

Data come in a variety of formats, and importing them into R can occasionally be a challenge.

If your data are included in a flat file like a csv, tsv, txt, or another delimited file format, we recommend using the readr package.
If your data are included in a Microsoft Excel file, we recommend using the readxl package.
If your data are included in a SPSS, SAS, or Stata file, we recommend using the haven package.
If your data are in a Microsoft Access database, we recommend using the RODBC package. This will require a Windows computer, 32-bit R, and either Microsoft Access or the appropriate drivers installed.
If your data are in another database format, we recommend this article on using Databases with R.

All of the example code below assumes that you have a data frame named waterData.

Metadata

No metadata should be stored in the observations. If metadata are present, remove them using the following technique. (This example assumes that metadata is stored in row 1):

waterData <- waterData[-1,]

If there are multiple lines of metadata, they can be removed like so:

waterData <- waterData[-c(1, 2),]

Tibbles

Given the typically large data sets for these intruments, we encourage (but do not enforce) data to be stored as tibbles. Tibbles are the tidyverse implementation of data frames. They print in a more organized manner and they behave in a more stable fashion. To convert your data to a tibble, use the function as_tibble():

library(tibble)

waterTibble <- as_tibble(waterData)

Variable Names

Variable names should be short and descriptive. We recommend using camelCase or snake_case to name variables. Use the rename() function from dplyr to accomplish this. The function accepts the data frame name followed by a comma and the new name set equal to the old name:

waterTibble <- rename(waterTibble, date = a.very.long.date.variable.name)

If you have a number of variables to rename, you can pipe them together:

waterTibble <- waterTibble %>%
  rename(date = a.very.long.date.variable.name) %>%
  rename(time = a.very.long.time.variable.name)

Specific Variables

Date and Time

Please check out our vignette on dates and times in driftR for additional details on how these should be formatted.

Temperature

driftR makes no direct use of the Temp data included in output. The weathermetrics package includes functions for conversions between Celsius and Fahrenheit.

Other Variables

Beyond date and time data, all variables should be stored as either double, integer, or numeric values:

waterTibble$measure <- as.numeric(waterTibble$measure)
waterTibble$measure <- as.integer(waterTibble$measure)
waterTibble$measure <- as.double(waterTibble$measure)

Building Pipes with Date, Time, and Other Functions

You can integrate all of the non-tidyverse functions described here into mutate() function calls. The mutate() function is from the dplyr package.

waterTibble <- waterTibble %>%
  mutate(temp = as.double(temp)) %>%
  mutate(pH = as.double(pH)) %>%
  mutate(NitrateN = as.integer(NitrateN))

Removing Unnecessary Variables

Finally, if there are unnecessary variables left in your data set at the end of the pre-processing stage, you can use the select() function from dplyr to remove them. The function accepts the data frame name followed by a comma and a list of the variables to be removed inside -c(varlist):

waterTibble <- select(waterTibble, -c(a.very.long.pH.variable.name, a.very.long.chloride.variable.name))

Like all other dplyr functions, select() can be included in a pipe as well.

Andrew Shaughnessy, Christopher Prener, Elizabeth Hasenmueller

2018-06-13