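library(xts)
library(tidyverse)
library(lubridate)  # as_date(); attached by tidyverse >= 2.0.0, loaded explicitly for older versions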
Basic Processing
In this article, we will cover fundamental techniques for manipulating and analyzing time series data. This includes tasks such as creating time series, summarizing data based on time indices, identifying trends, and more.
Time series data often comes with specific considerations related to time zones, varying numbers of days in months, and leap years.
Time Zones: Time series data collected from different regions or sources may be recorded in various time zones. Converting data to a consistent time zone is crucial to ensure accurate analysis and visualization, especially for data with hourly resolution.
Varying Days in Months: Some months have 30 days and others 31, while February has 28 days, or 29 in leap years. This variation should be considered when converting between daily and monthly values, e.g. when aggregating daily data to monthly totals.
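Leap Years: Leap years add an extra day (February 29) to the calendar. In the Gregorian calendar a year is a leap year if it is divisible by 4, except for century years, which must also be divisible by 400 (2000 was a leap year, 1900 was not). Analysts need to account for leap years when working with annual time series data to avoid inconsistencies.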
Properly accounting for these specifics is crucial for accurate analysis and interpretation of time series data.
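The three points above can be checked directly in base R. The sketch below is illustrative and not part of the original workflow: it assumes the time zone name "Europe/Berlin" is available in the system's time zone database, and `is_leap_year()` is a hypothetical helper introduced here, not a function from a package.

```r
# Time zones: one instant, displayed in local German time and in UTC.
# "Europe/Berlin" is an assumed example zone (CET, UTC+1 in January).
t_local <- as.POSIXct("1996-01-01 12:00:00", tz = "Europe/Berlin")
t_utc_text <- format(t_local, "%Y-%m-%d %H:%M", tz = "UTC")
print(t_utc_text)

# Varying days in months: differences between consecutive month starts
month_starts <- seq(as.Date("1996-01-01"), by = "month", length.out = 13)
days_per_month <- as.integer(diff(month_starts))
print(days_per_month)       # February 1996 has 29 days
print(sum(days_per_month))  # 1996 is a leap year, so the total is 366

# Leap years: Gregorian rule (hypothetical helper, not from a package)
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
}
print(is_leap_year(c(1900, 1996, 2000, 2023)))
```

Converting with `format(..., tz = "UTC")` changes only how the instant is displayed; the underlying time point stays the same, which is what you want when aligning series from different sources.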
1 Library
Base R's `ts` class is designed for regularly spaced series (e.g. monthly or annual data) and has no built-in support for calendar dates, so the xts package (built on zoo) is commonly used to work with date-time indices. For processes that don't rely on the time index, however, the original data structure (e.g. a data frame) is sufficient. Time series structures are most useful when you need time-based operations such as subsetting by date, merging, or aggregating by calendar period.
2 Example Files
The example files contain three discharge time series for the Ruhr River in the Rhine basin, Germany (gauges Bachum, Oeventrop and Villigst). The data are sourced from the open data available at ELWAS-WEB NRW and can also be read directly from GitHub:
fn_Bachum <- "https://raw.githubusercontent.com/HydroSimul/Web/main/data_share/Bachum_2763190000100.csv"
fn_Oeventrop <- "https://raw.githubusercontent.com/HydroSimul/Web/main/data_share/Oeventrop_2761759000100.csv"
fn_Villigst <- "https://raw.githubusercontent.com/HydroSimul/Web/main/data_share/Villigst_2765590000100.csv"
3 Create data
Before creating a time series structure, the data must be loaded into R. An xts object is essentially a matrix with an attached time index, so it can only hold two-dimensional data such as matrices and data frames (a plain vector becomes a one-column series).
If the date-time information is not recognized correctly during reading, or if there is no time information at all, you must first construct a valid time index, e.g. by converting a text column to `Date` or `POSIXct`.
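When a record has no usable date column, one option is to build the index yourself. A minimal sketch, assuming the values are consecutive daily observations without gaps and that the start date is known; the numbers and the start date below are made up for illustration.

```r
library(xts)

# Hypothetical daily values without any date information
values <- c(12.1, 11.8, 13.4, 12.9, 12.2)

# Assumption: the first value belongs to 1996-01-01 and there are no gaps
idx <- seq(as.Date("1996-01-01"), by = "day", length.out = length(values))

x <- xts(values, order.by = idx)
print(x)
```

If the record does contain gaps, this assumption breaks and the index has to come from the data itself.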
There are two primary ways to create a time series in R:
xts(): With this method, you explicitly specify the time index and create a time series object. This is useful when you have a matrix with an external time index.
as.xts(): This method is more straightforward and is suitable when you have a data frame with a date column. The function will automatically recognize the date column and create a time series.
# Read the semicolon-separated CSV files as data frames, skipping the first 10 lines
df_Bachum <- read_csv2(fn_Bachum, skip = 10, col_names = FALSE)
df_Villigst <- read_csv2(fn_Villigst, skip = 10, col_names = FALSE)

# Convert the date column (day.month.year) to the Date type
df_Bachum$X1 <- as_date(df_Bachum$X1, format = "%d.%m.%Y")
df_Villigst$X1 <- as_date(df_Villigst$X1, format = "%d.%m.%Y")

# Create xts objects: explicit index with xts(), automatic detection with as.xts()
xts_Bachum <- xts(df_Bachum$X2, order.by = df_Bachum$X1)
xts_Villigst <- as.xts(df_Villigst)
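To try the parsing step without downloading the files, the sketch below uses a few inline lines in the layout the code above assumes (semicolon-separated, day.month.year dates, decimal commas). The inline text is illustrative rather than copied from the ELWAS files, and base R's `read.csv2()` stands in for readr's `read_csv2()`.

```r
library(xts)

# Inline stand-in for the CSV body: "date;value" with decimal commas
raw <- "01.01.1996;13,459
02.01.1996;12,331
03.01.1996;11,112"

# read.csv2() defaults to sep = ";" and dec = ","
df <- read.csv2(text = raw, header = FALSE)

# Parse day.month.year strings into Date
df$V1 <- as.Date(df$V1, format = "%d.%m.%Y")

# Explicit index, as with xts() above
x <- xts(df$V2, order.by = df$V1)
print(x)
```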
4 Merging Several Time Series
Because xts objects use a standardized time index (e.g. `Date` or `POSIXct`), multiple time series can easily be combined into a single dataset by matching their time stamps.
merge()
# Combine both series on their time index and rename the columns
xts_Rhur <- merge(xts_Bachum, xts_Villigst)
names(xts_Rhur) <- c("Bachum", "Villigst")
The series do not need to have the same length or cover the same period. By default, `merge()` performs an outer join: the result contains every time stamp found in any of the inputs, and dates missing from one series are filled with `NA`. This is common in real-world records with gaps or different observation periods, as the two series here show:
length(xts_Bachum)
[1] 12053
length(xts_Villigst)
[1] 11499
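A toy illustration of this behavior, using two short made-up series that overlap on only two days. `merge()` on xts objects defaults to `join = "outer"`; passing `join = "inner"` keeps only the shared time stamps.

```r
library(xts)

# Two made-up daily series with partly overlapping dates
a <- xts(c(1, 2, 3), order.by = as.Date("2000-01-01") + 0:2)
b <- xts(c(10, 20), order.by = as.Date("2000-01-02") + 0:1)

# Default outer join: union of dates, NA where a series has no value
m_outer <- merge(a, b)
print(m_outer)

# Inner join: only dates present in both series
m_inner <- merge(a, b, join = "inner")
print(m_inner)
```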
5 Subsetting (Index with time)
You can subset xts objects by integer (row) position or by time, either with a vector of dates or with ISO-8601 style strings such as "1996-03".
# Create a daily time sequence for 1996
ts_Inteval <- seq(as_date("1996-01-01"), as_date("1996-12-31"), "days")

# Subset the merged series with the dates in the sequence
xts_Inteval <- xts_Rhur[ts_Inteval, ]
head(xts_Inteval, 10)
Bachum Villigst
1996-01-01 13.459 11.03
1996-01-02 12.331 10.03
1996-01-03 11.112 9.12
1996-01-04 11.272 8.11
1996-01-05 11.412 8.71
1996-01-06 11.526 8.29
1996-01-07 12.589 9.45
1996-01-08 12.508 10.09
1996-01-09 12.336 9.42
1996-01-10 12.510 8.47
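The other indexing styles can be tried on a synthetic series (the numbers 1 to 366, one per day of 1996) so the row counts are easy to check; this is an illustration, not part of the Ruhr data.

```r
library(xts)

# Synthetic daily series covering the leap year 1996
idx <- seq(as.Date("1996-01-01"), as.Date("1996-12-31"), by = "day")
x <- xts(seq_along(idx), order.by = idx)

first_three <- x[1:3]                       # integer (row) indexing
march       <- x["1996-03"]                 # one calendar month
feb         <- x["1996-02"]                 # February in a leap year
week        <- x["1996-01-05/1996-01-10"]   # closed date range
from_dec    <- x["1996-12/"]                # open-ended range: December onwards
```

Range strings use `/` as the separator, and leaving one side empty makes the range open-ended.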
6 Rolling Windows
Moving averages are a valuable tool for smoothing time series data and uncovering underlying trends or patterns. With rolling windows you can calculate not only the mean but also other statistics such as the median, sum or maximum, and `rollapply()` extends this to any function you supply. By default these functions center the window (`align = "center"`), so a 7-day mean is stamped on the middle day of its window, which is why the output below starts on 1996-01-04.
rollmean()
rollmedian()
rollsum()
rollmax()
# 7-day centered moving mean
xts_RollMean <- rollmean(xts_Inteval, 7)
head(xts_RollMean, 10)
Bachum Villigst
1996-01-04 11.95729 9.248571
1996-01-05 11.82143 9.114286
1996-01-06 11.82214 9.027143
1996-01-07 12.02186 8.934286
1996-01-08 12.23314 9.242857
1996-01-09 12.37214 9.238571
1996-01-10 12.61357 9.541429
1996-01-11 12.62643 9.535714
1996-01-12 12.56257 9.357143
1996-01-13 12.50186 9.242857
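The effect of the window alignment, and the use of `rollapply()` with an arbitrary function, can be seen on a tiny made-up series. `rollmean()`, `rollapply()` and the `align` argument come from the zoo package, which xts builds on; choosing `sd()` as the custom function is only an example.

```r
library(xts)  # also attaches zoo, which provides rollmean() and rollapply()

# Made-up series: 1, 2, ..., 8 on consecutive days
x <- xts(as.numeric(1:8), order.by = as.Date("2000-01-01") + 0:7)

# Centered 3-day mean (default): the first value is stamped on the 2nd day
rm_center <- rollmean(x, 3)

# Right-aligned 3-day mean: each value is stamped on the last day of its window
rm_right <- rollmean(x, 3, align = "right")

# Any function via rollapply(), here a right-aligned 3-day standard deviation
rsd_right <- rollapply(x, width = 3, FUN = sd, align = "right")
```

A right-aligned window uses only past and current values, which is usually the safer choice when the smoothed series feeds into forecasting.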
7 Summary in Calendar Period
Applying a function by calendar period is a fundamental operation in time series analysis. The `apply.*()` functions split a series at calendar boundaries (days, weeks, months, quarters, years) and apply a function to each period, so they also work when observations are irregularly spaced in time. Each result is stamped with the last time stamp of its period.
apply.daily()
apply.weekly()
apply.monthly()
apply.quarterly()
apply.yearly()
# Monthly mean discharge for 1996
xts_Month <- apply.monthly(xts_Inteval, mean)
xts_Month
Bachum Villigst
1996-01-31 12.478387 9.348065
1996-02-29 15.794241 17.403448
1996-03-31 14.244613 13.252903
1996-04-30 10.217533 7.310667
1996-05-31 9.331129 7.094839
1996-06-30 10.589067 6.700667
1996-07-31 11.607968 8.248710
1996-08-31 12.897806 9.410968
1996-09-30 14.516733 12.750000
1996-10-31 18.214161 17.702903
1996-11-30 30.673967 35.472667
1996-12-31 35.720290 39.940645
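Summing a series of ones per month is a quick way to see how `apply.monthly()` splits the data, and it also reflects the 29-day February of 1996. `endpoints()` returns the row positions where each month ends, which `apply.monthly()` passes to `period.apply()`; the series of ones is synthetic.

```r
library(xts)

# Synthetic series: the value 1 on every day from January to March 1996
idx <- seq(as.Date("1996-01-01"), as.Date("1996-03-31"), by = "day")
x <- xts(rep(1, length(idx)), order.by = idx)

# Row positions of the last observation in each month (0 marks the start)
ep <- endpoints(x, on = "months")

# Monthly sums = number of days per month, stamped on each month's last day
n_days <- apply.monthly(x, sum)

# The same result with the lower-level period.apply()
n_days2 <- period.apply(x, INDEX = ep, FUN = sum)
```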