6  Dates

In this chapter, we explore how to handle dates in R. As a key data type, dates require special attention to ensure accurate manipulation and analysis. We’ll cover how to use regular expressions to recognize and format dates, and introduce the powerful lubridate package, which simplifies many common tasks like parsing, comparing, and transforming dates.

6.1 Parsing and Formatting Dates

Dates are a distinct data type in R, with dedicated functions and methods for handling them. This specialized treatment ensures that comparisons, calculations, and formatting behave correctly. On the one hand, dates can be written as character values in many different ways. For example, we can write the 1st of January 2024 as “1/1/2024”, “01/01/2024” or “2024/01/01”. On the other hand, dates follow a natural sequence, which allows them to be treated as quantities for purposes such as sorting and comparing. For instance, reading the day first, “02/01/2024” is a later date (and so a “higher” value) than “01/01/2024”.

In R, we transform a character value into a date with the as.Date() function. This function has two main arguments: the value that we want to transform and the format of that value. As discussed, a date can be written in many different ways, so we need to tell R which format to assume. This matters most for ambiguous dates such as “01/02/2024”: in Europe, this would be read as the 1st of February 2024, while in the U.S.A. it would mean the 2nd of January 2024. To specify the format, we use the percent character (%) followed by a letter that stands for a date component, typically the first letter of “day”, “month” or “year”. For instance, to specify that 01/02/2024 is the 1st of February 2024 we use the format “%d/%m/%Y”, in which “d” stands for day, “m” for month and “Y” for a four-digit year.

# Date example
as.Date("01/02/2024", format = "%d/%m/%Y")
[1] "2024-02-01"
# Class of date example
class(as.Date("01/02/2024", format = "%d/%m/%Y"))
[1] "Date"

As shown, the format mirrors the date value: the separators stay the same, in the same order, and each number is replaced with its corresponding placeholder. We also see that the output is a Date value, printed with the year first, then the month and, finally, the day. This layout follows ISO 8601, an international standard for numeric dates that was developed precisely because dates are so easy to misread. ISO 8601 orders the components from the largest unit to the smallest: year, then month, then day. Additionally, each component has a fixed number of digits: four for the year and two each for the month and the day (with leading zeros if necessary).
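As a brief illustration of these points, the sketch below parses the ambiguous string both ways and then turns the results back into text with the base R function format(). The variable names are only illustrative.

```r
# The same text read day-first and month-first
ambiguous <- "01/02/2024"
european <- as.Date(ambiguous, format = "%d/%m/%Y")  # 1st of February 2024
american <- as.Date(ambiguous, format = "%m/%d/%Y")  # 2nd of January 2024

# A value that is already in ISO 8601 form needs no format argument
iso <- as.Date("2024-02-01")

# The day-first reading and the ISO value are the same date
european == iso               # TRUE

# format() converts a Date back into text in whichever layout we ask for
format(european, "%d/%m/%Y")  # "01/02/2024"
format(american, "%Y/%m/%d")  # "2024/01/02"
```

Keeping dates as Date values while working, and calling format() only when they need to be displayed, avoids carrying the ambiguity through an analysis.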

6.2 Regular Expressions

Now that we’ve explored basic date transformations using the as.Date() function, let’s look at the patterns that describe the various date formats. Strictly speaking, two kinds of pattern are involved. Regular expressions (regex) identify and extract date-like text from strings, for example numbers separated by slashes or dashes such as “01/01/2024” or “2024-01-01”. Format strings, built from placeholders such as %d, %m and %Y, then tell as.Date() which component (day, month, or year) each part of that text represents. In this chapter we use the term loosely for both, since each is a character value that describes the exact pattern in which the date components appear. Together they are particularly useful for messy or inconsistent date data: once the common patterns are identified, the strings can be converted to a standardized format, ready for operations like comparisons or calculations.
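To make the idea of recognizing date patterns concrete, here is a minimal sketch that uses base R regular expressions to pull date-like text out of longer strings before converting it. The input vector and the two patterns are made up for illustration, and the slash-separated value is assumed to be written day first.

```r
# Hypothetical messy input: free text that may or may not contain a date
notes <- c("Invoice issued 2024-01-15", "paid on 03/02/2024", "no date here")

# Regular expressions for two common layouts
iso_pattern   <- "[0-9]{4}-[0-9]{2}-[0-9]{2}"   # e.g. 2024-01-15
slash_pattern <- "[0-9]{2}/[0-9]{2}/[0-9]{4}"   # e.g. 03/02/2024

# Which strings contain each pattern?
has_iso   <- grepl(iso_pattern, notes)     # TRUE FALSE FALSE
has_slash <- grepl(slash_pattern, notes)   # FALSE TRUE FALSE

# Extract only the matching text (strings without a match are dropped)
iso_text   <- regmatches(notes, regexpr(iso_pattern, notes))
slash_text <- regmatches(notes, regexpr(slash_pattern, notes))

# Convert the extracted text with the matching format string
iso_date   <- as.Date(iso_text, format = "%Y-%m-%d")
slash_date <- as.Date(slash_text, format = "%d/%m/%Y")  # assumption: day first
```

The regular expression only finds text that looks like a date; it is the format string passed to as.Date() that decides how that text is interpreted.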

The list below provides the characters we can use in the format argument (each preceded by % in the format string), along with a description and two examples per character.

Character   Description                       Example
d           Numeric day of month              5, 6
a           Abbreviation of day of the week   Mon, Tue
A           Full name of day of the week      Monday, Tuesday
m           Numeric month of year             5, 6
b           Abbreviation of month             Jun, Jul
B           Full name of month                June, July
y           Year without century              23, 24
Y           Year with century                 2023, 2024

In the examples below, we see how to use the appropriate characters to transform different date representations, converting them to the standard ISO 8601 format.

# Date: 3rd of March 2023
as.Date("20230303", format = "%Y%m%d")
[1] "2023-03-03"
# Date: 3rd of March 2023
as.Date("20230303", format = "%Y%d%m")
[1] "2023-03-03"
# Date: 3rd of August 2022
as.Date("2022-08/03", format = "%Y-%m/%d")
[1] "2022-08-03"
# Date: 12th of July 2003
as.Date("03 Jul 12", format = "%y %b %d")
[1] "2003-07-12"
# Date: 23rd of June 2024
as.Date("June 23/2024", format = "%B %d/%Y")
[1] "2024-06-23"
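It is also worth seeing what happens when the format does not match the text. In that case as.Date() returns NA instead of raising an error, so a quick check with is.na() helps catch values that failed to parse. The vector below is invented for illustration.

```r
# The format matches the text: parsing succeeds
as.Date("03/08/2022", format = "%d/%m/%Y")   # "2022-08-03"

# The format expects slashes but the text uses dashes: the result is NA
mismatch <- as.Date("03-08-2022", format = "%d/%m/%Y")
is.na(mismatch)                              # TRUE

# With a vector, is.na() shows which values did not fit the format
raw    <- c("03/08/2022", "2022-08-03", "12/07/2003")
parsed <- as.Date(raw, format = "%d/%m/%Y")
failed <- raw[is.na(parsed)]                 # "2022-08-03"
```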

Once dates are properly represented, we can compare them, on the understanding that a more recent date has a higher value than an older one. In the examples below, we get the output TRUE or FALSE, depending on whether each statement holds.

# Is '3rd of March 2023' equal to '4th of March 2023'?
as.Date("20230303", format = "%Y%m%d") ==
  as.Date("20230403", format = "%Y%d%m")
[1] FALSE
# Is '3rd of August 2022' less recent than '12th of July 2003'?
as.Date("2022-08/03", format = "%Y-%m/%d") < 
  as.Date("03 Jul 12", format = "%y %b %d")
[1] FALSE
# Is '23rd of June 2024' more recent than or equal to '23rd of June 2024'?
as.Date("Jun 23/2024", format = "%b %d/%Y") >= 
  as.Date("June 23/2024", format = "%B %d/%Y")
[1] TRUE
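Because dates behave like quantities, they can also be sorted. The short sketch below, using an invented vector, contrasts sorting the raw text with sorting the parsed dates.

```r
# Hypothetical day/month/year values stored as text
raw    <- c("15/03/2024", "02/11/2023", "28/01/2024")
parsed <- as.Date(raw, format = "%d/%m/%Y")

# Sorting the text orders it character by character, not chronologically
sorted_text  <- sort(raw)      # "02/11/2023" "15/03/2024" "28/01/2024"

# Sorting the Date values gives chronological order
sorted_dates <- sort(parsed)   # "2023-11-02" "2024-01-28" "2024-03-15"

# The earliest and latest dates
earliest <- min(parsed)        # "2023-11-02"
latest   <- max(parsed)        # "2024-03-15"
```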

Likewise, we can add days to or subtract days from a date, as we would with numeric data, and find the difference in days between two dates.

# Add 7 days to '1st of June 2025'
as.Date("20250106", format = "%Y%d%m") + 7
[1] "2025-06-08"
# Subtract 4 days from '1st of June 2025'
as.Date("20250106", format = "%Y%d%m") - 4
[1] "2025-05-28"
# Difference in days between '8th of June 2025' and '28th of May 2025'
as.Date("20250806", format = "%Y%d%m") - 
  as.Date("20252805", format = "%Y%d%m")
Time difference of 11 days
# Transform the above difference into a numeric value
as.numeric(as.Date("20250806", format = "%Y%d%m") -
             as.Date("20252805", format = "%Y%d%m"))
[1] 11

Notice how in the last example we used the function as.numeric() to transform the output into a numeric value. This is useful when we calculate differences between dates in a data frame and want a plain number as the output. For calculating date differences, however, the function difftime() is a better option, because its extra argument units specifies the unit of measurement for the result (e.g. days or weeks; months are not available, since their length varies). We can still wrap the result in as.numeric() to get a numeric value. Below, we use difftime() on the same dates as before, this time specifying the units.

# Difference in days between '8th of June 2025' and '28th of May 2025'
difftime(as.Date("20250806", format = "%Y%d%m"), 
         as.Date("20252805", format = "%Y%d%m"), units = "days")
Time difference of 11 days
# Difference in weeks between '8th of June 2025' and '28th of May 2025'
difftime(as.Date("20250806", format = "%Y%d%m"),
         as.Date("20252805", format = "%Y%d%m"), units = "weeks")
Time difference of 1.571429 weeks
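The two approaches can be combined. As a small sketch, the code below stores the result of difftime() once and then asks as.numeric() for the same gap in different units. The variable names are illustrative.

```r
# The same two dates as above, written in ISO 8601 form
start <- as.Date("2025-05-28")
end   <- as.Date("2025-06-08")

# Store the difference once
gap <- difftime(end, start, units = "days")

# as.numeric() drops the "Time difference of ..." label
gap_days  <- as.numeric(gap)                    # 11

# The units argument of as.numeric() converts the same gap on the fly
gap_weeks <- as.numeric(gap, units = "weeks")   # 11 / 7
gap_hours <- as.numeric(gap, units = "hours")   # 264
```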

6.3 The lubridate Package

One of the best-known and most useful R packages for working with dates is lubridate. With this package, we can handle dates much more easily than with base R code alone. Nonetheless, understanding the base R logic for handling dates is essential, as it not only helps us grasp the full potential of lubridate but also makes it easier to interpret and work with R scripts involving dates. So, let’s load the lubridate package (installing it first with install.packages("lubridate") if necessary):

# Library
library(lubridate)

Previously, we learned how to specify the format of a character value in order to get a date. With lubridate, we have functions that create dates directly from a character value, without a format string. The examples below show how easily we can obtain dates from different character values (we use the exact same dates as before).

# Date: 3rd of March 2023
ymd("20230303")
[1] "2023-03-03"
# Date: 3rd of March 2023
ydm("20230303")
[1] "2023-03-03"
# Date: 3rd of August 2022
ymd("2022-08/03")
[1] "2022-08-03"
# Date: 12th of July 2003
ymd("03 Jul 12")
[1] "2003-07-12"
# Date: 23rd of June 2024
mdy("June 23/2024")
[1] "2024-06-23"

The lubridate package offers a range of functions to easily convert character strings into date values. Since we have already seen that ‘y’, ‘m’, and ‘d’ stand for ‘year’, ‘month’, and ‘day’ in format strings, the names of these functions are intuitive: each name spells out the order in which the components appear in the input. For example:

  • ymd() expects a date in the order of year, month, and day.

  • ydm() works when the input follows a year, day, month format.

  • mdy() converts month, day, and year.

  • myd() handles month, year, day.

  • dmy() is for day, month, year.

  • dym() accepts day, year, month.

These functions make it easy to manage various date formats in your data, thus enabling seamless transformations into a standard date type.
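To see why choosing the right function matters, the sketch below applies two of them to the same ambiguous string and then shows one function coping with several separators. It assumes the lubridate package is installed, and the input values are invented for illustration.

```r
library(lubridate)

# The same text, read with two different component orders
ambiguous <- "01/02/2024"
day_first   <- dmy(ambiguous)   # "2024-02-01" (1st of February)
month_first <- mdy(ambiguous)   # "2024-01-02" (2nd of January)

# One function handles several separators in a single call
mixed <- dmy(c("01/02/2024", "01-02-2024", "01.02.2024"))
# all three values become "2024-02-01"
```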

The important thing is to be sure about the order of the components in a date (e.g. first the day, then the year, then the month). Additionally, as we saw, these functions ignore separators and are not limited to numeric values: the “m” covers both “07” (which stands for July) and “July” itself. For instance, both of the following strings represent the same date, July 18, 2025, and lubridate interprets them correctly even though one uses a numeric month and the other the full month name:

# Numeric month
mdy("07-18-2025")   # Returns: "2025-07-18"
[1] "2025-07-18"
# Textual month
mdy("July 18, 2025")  # Also returns: "2025-07-18"
[1] "2025-07-18"

The mdy() function knows to look for the month first, then the day, and then the year, regardless of how the month is written or what separators are used.

The lubridate package also provides the parse_date_time() function, which works much like the as.Date() function that we saw at the beginning of this chapter, except that it returns a date-time (hence the “UTC” time zone in the output) rather than a plain date. To transform a string value, we pass the string together with the order of its components in the orders argument, like this:

# Date: 3rd of March 2023
parse_date_time("20230303", orders = "ymd")
[1] "2023-03-03 UTC"

We can also use this function to transform multiple date values at once, even if they are in different formats: we list the candidate orders, and each value is parsed with an order that fits it.

# Dates: 3rd of March 2023 and 9th of October 2020
parse_date_time(c("20230303", "10-09-2020"), orders = c("ymd", "mdy"))
[1] "2023-03-03 UTC" "2020-10-09 UTC"
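Since the output above carries a time zone, it can help to check what kind of value parse_date_time() returns. The sketch below does so and converts the result to a plain date with lubridate's as_date() function. It assumes lubridate is installed.

```r
library(lubridate)

# parse_date_time() returns a date-time (class POSIXct), not a Date
parsed <- parse_date_time("20230303", orders = "ymd")
class(parsed)                # "POSIXct" "POSIXt"

# as_date() drops the time-of-day part and gives a Date
date_only <- as_date(parsed)
class(date_only)             # "Date"
date_only                    # "2023-03-03"
```

Converting to a Date is optional, but it keeps the result consistent with the values produced by ymd() and the other functions above.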

6.4 Make Dates

Sometimes, instead of transforming a character value to a date, we may want to combine different elements in order to generate a date. For instance, suppose we have the following data frame.

# Create a data frame "dates"
dates <- data.frame(Year = c("2021", "2022", "2023", "2024", "2025"),
                    Month = c("01", "01", "01", "01", "01"),
                    Day = c("01", "01", "01", "01", "01"))

# Print the results
dates
  Year Month Day
1 2021    01  01
2 2022    01  01
3 2023    01  01
4 2024    01  01
5 2025    01  01

In this data frame, we have the year, month and day in separate columns. We can use the make_date() function to combine the values of each column and create a date as a result. This function includes the arguments year, month and day which are used to specify the individual components of a date. We can use this function on our data frame “dates” in order to get a vector of the date of each row.

# Make a vector of dates
make_date(year = dates$Year, month = dates$Month, day = dates$Day)
[1] "2021-01-01" "2022-01-01" "2023-01-01" "2024-01-01" "2025-01-01"

Of course, we could store this vector to the same data frame, keeping both the individual components as well as the full date.

# Store the vector of dates
dates$Full_Date <- make_date(year = dates$Year, month = dates$Month, day = dates$Day)

# Print the results
dates
  Year Month Day  Full_Date
1 2021    01  01 2021-01-01
2 2022    01  01 2022-01-01
3 2023    01  01 2023-01-01
4 2024    01  01 2024-01-01
5 2025    01  01 2025-01-01
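For comparison, the same result can be reached in base R by pasting the components into an ISO 8601 string and passing it to as.Date(). The sketch below uses a small made-up data frame and converts the text columns to integers explicitly before calling make_date(). It assumes lubridate is installed.

```r
library(lubridate)

# A hypothetical data frame with the components stored as text
parts <- data.frame(Year  = c("2021", "2022"),
                    Month = c("03", "11"),
                    Day   = c("15", "02"))

# lubridate: combine the components, converted to integers first
from_lubridate <- make_date(year  = as.integer(parts$Year),
                            month = as.integer(parts$Month),
                            day   = as.integer(parts$Day))

# Base R: build an ISO 8601 string and parse it
from_base <- as.Date(paste(parts$Year, parts$Month, parts$Day, sep = "-"))

# Both approaches give the same dates
all(from_lubridate == from_base)   # TRUE
```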

Regarding making dates, it is worth noting that both base R and the lubridate package provide functions that return today’s date. This is very useful, for example, when we want an R script to stay up to date automatically. To get the current date, we can use either the Sys.Date() function from base R or the today() function from lubridate: both return the current date as a Date value.
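A minimal sketch of both functions follows. The output changes with the day on which the code is run, so no results are shown, and the deadline is a made-up value used only to illustrate date arithmetic with the current date.

```r
library(lubridate)

# Today's date from base R and from lubridate
current_base      <- Sys.Date()
current_lubridate <- today()

# Both are ordinary Date values
class(current_base)
class(current_lubridate)

# Days remaining until a hypothetical deadline
deadline  <- as.Date("2030-01-01")
days_left <- as.numeric(deadline - current_base)
days_left
```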

6.5 Extract Date Parts

Up to this point, we learned how to create a full date from individual character components. With lubridate, we can also extract different parts from a date value, such as the year or the month. For example, we can use the year() function to extract the year of the date “2023-05-06”.

# Extract year from "2023-05-06"
year("2023-05-06")
[1] 2023

It is not difficult to imagine that we can extract all the different parts from a full date with respective lubridate functions. The code below shows those functions and the output that we get when we use them on the date “2023-05-06”.

# Extract month from "2023-05-06"
month("2023-05-06")
[1] 5
# Extract numeric day from "2023-05-06"
day("2023-05-06")
[1] 6
# Extract week day (from 1 to 7, where 1 is Sunday by default) from "2023-05-06"
wday("2023-05-06")
[1] 7
# Extract day of year (from 1 to 365 or 366) from "2023-05-06"
yday("2023-05-06")
[1] 126
# Extract quarter (from 1 to 4) from "2023-05-06"
quarter("2023-05-06")
[1] 2
# Extract semester (from 1 to 2) from "2023-05-06"
semester("2023-05-06")
[1] 1
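Some of these functions accept extra arguments that change how the result is reported. The sketch below shows a few of them on the same date. It assumes lubridate is installed with its default settings, and the labels it prints depend on the locale of the R session, so they are described rather than shown.

```r
library(lubridate)

d <- ymd("2023-05-06")   # a Saturday

# By default weeks start on Sunday, so Saturday is day 7
wday(d)                                # 7

# With weeks starting on Monday, Saturday becomes day 6
wday(d, week_start = 1)                # 6

# label = TRUE returns names instead of numbers (locale-dependent)
wday(d, label = TRUE)                  # abbreviated weekday name
month(d, label = TRUE, abbr = FALSE)   # full month name

# Base R alternative: format() with the placeholders from earlier
format(d, "%Y")                        # "2023"
```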