Basic Processing

In this article, we will cover fundamental techniques for manipulating and analyzing time series data. This includes tasks such as creating time series, summarizing data based on time indices, identifying trends, and more.

Time series data often comes with specific considerations related to time zones, varying numbers of days in months, and leap years.

  1. Time Zones: Time series data collected from different regions or sources may be recorded in various time zones. Converting data to a consistent time zone is crucial to ensure accurate analysis and visualization, especially for data with hourly resolution.

  2. Varying Days in Months: Some months have 30 days, while others have 31, and February can have 28 or 29 days in leap years. This variation should be considered when performing calculations based on monthly or daily data.

  3. Leap Years: Leap years, which occur every four years, add an extra day (February 29) to the calendar. Analysts need to account for leap years when working with annual time series data to avoid inconsistencies.

Properly accounting for these specifics is crucial for accurate analysis and interpretation of time series data.

1 Library

Time series data structures are not standard in R, but the xts package is commonly used to work with time indices. However, it’s important to note that for processes that don’t rely on specific time indexing, the original data structure is sufficient. Time series structures are particularly useful when you need to perform time-based operations and analysis.

library(xts)
library(tidyverse)

2 Example Files

The example files provided consist of three discharge time series for the Ruhr River in the Rhein basin, Germany. These data sets are sourced from open data available at ELWAS-WEB NRW. You can also access it directly from the internet via Github.

fn_Bachum <- "https://raw.githubusercontent.com/HydroSimul/Web/main/data_share/Bachum_2763190000100.csv"
fn_Oeventrop <- "https://raw.githubusercontent.com/HydroSimul/Web/main/data_share/Oeventrop_2761759000100.csv"
fn_Villigst <- "https://raw.githubusercontent.com/HydroSimul/Web/main/data_share/Villigst_2765590000100.csv"

3 Create data

Before creating a time series structure, the data should be loaded into R. Time series in R can typically (only) support two-dimensional data structures, such as matrices and data frames.

If the date-time information is not correctly recognized during reading or if there is no time data present, you need to make sure that you have a valid time index.

There are two primary ways to create a time series in R:

  • xts(): With this method, you explicitly specify the time index and create a time series object. This is useful when you have a matrix with an external time index.

  • as.xts(): This method is more straightforward and is suitable when you have a data frame with a date column. The function will automatically recognize the date column and create a time series.

# Read a CSV file as data.frame
df_Bachum <- read_csv2(fn_Bachum, skip = 10, col_names = FALSE)
df_Villigst <- read_csv2(fn_Villigst, skip = 10, col_names = FALSE)

# Convert Date column to a Date type
df_Bachum$X1 <- as_date(df_Bachum$X1, format = "%d.%m.%Y")
df_Villigst$X1 <- as_date(df_Villigst$X1, format = "%d.%m.%Y")

# Create an xts object
xts_Bachum <- xts(df_Bachum$X2, order.by = df_Bachum$X1)
xts_Villigst <- as.xts(df_Villigst)

4 Merging Several Time Series

In R, the time index is consistent and follows a standardized format. This consistency in time indexing makes it easy to combine multiple time series into a single dataset based on their time index.

  • merge()
xts_Rhur <- merge(xts_Bachum, xts_Villigst)
names(xts_Rhur) <- c("Bachum", "Villigst")

It’s worth noting that when working with time series data in R, the length of the time series doesn’t necessarily have to be the same for all time series. This flexibility allows you to work with data that may have missing or varying data points over time, which is common in many real-world scenarios.

length(xts_Bachum)
[1] 12053
length(xts_Villigst)
[1] 11499

5 Subsetting (Index with time)

You can work with time series data in R using both integer indexing, and time-based indexing using time intervals.

# Create a time sequence
ts_Inteval <- seq(as_date("1996-01-01"), as_date("1996-12-31"), "days")

# Subset
xts_Inteval <- xts_Rhur[ts_Inteval, ]
head(xts_Inteval, 10)
           Bachum Villigst
1996-01-01 13.459    11.03
1996-01-02 12.331    10.03
1996-01-03 11.112     9.12
1996-01-04 11.272     8.11
1996-01-05 11.412     8.71
1996-01-06 11.526     8.29
1996-01-07 12.589     9.45
1996-01-08 12.508    10.09
1996-01-09 12.336     9.42
1996-01-10 12.510     8.47

6 Rolling Windows

Moving averages are a valuable tool for smoothing time series data and uncovering underlying trends or patterns. With rolling windows, you can calculate not only the mean value but also other statistics like the median and sum. To expand the range of functions available, you can utilize the rollapply(). This enables you to apply a wide variety of functions to your time series data within specified rolling windows.

  • rollmean()
  • rollmedian()
  • rollsum()
  • rollmax()
xts_RollMean <- rollmean(xts_Inteval, 7)
head(xts_RollMean, 10)
             Bachum Villigst
1996-01-04 11.95729 9.248571
1996-01-05 11.82143 9.114286
1996-01-06 11.82214 9.027143
1996-01-07 12.02186 8.934286
1996-01-08 12.23314 9.242857
1996-01-09 12.37214 9.238571
1996-01-10 12.61357 9.541429
1996-01-11 12.62643 9.535714
1996-01-12 12.56257 9.357143
1996-01-13 12.50186 9.242857

7 Summary in Calendar Period

Dealing with irregularly spaced time series data can be challenging. One fundamental operation in time series analysis is applying a function by calendar period. This process helps in summarizing and analyzing time series data more effectively, even when the data points are irregularly spaced in time.

  • apply.daily()
  • apply.weekly()
  • apply.monthly()
  • apply.quarterly()
  • apply.yearly()
xts_Month <- apply.monthly(xts_Inteval, mean)
xts_Month
              Bachum  Villigst
1996-01-31 12.478387  9.348065
1996-02-29 15.794241 17.403448
1996-03-31 14.244613 13.252903
1996-04-30 10.217533  7.310667
1996-05-31  9.331129  7.094839
1996-06-30 10.589067  6.700667
1996-07-31 11.607968  8.248710
1996-08-31 12.897806  9.410968
1996-09-30 14.516733 12.750000
1996-10-31 18.214161 17.702903
1996-11-30 30.673967 35.472667
1996-12-31 35.720290 39.940645