In this chapter, we explore how to handle dates in R. As a key data
type, dates require special attention to ensure accurate
manipulation and analysis. We’ll cover how to use regular
expressions to recognize and format dates, and introduce the
powerful lubridate package, which simplifies many
common tasks like parsing, comparing, and transforming dates.
Dates are treated as a distinct data type in R, which means that R has specific functions and methods designed to handle date data. This specialized treatment ensures that dates are managed accurately in operations such as comparisons, calculations, and formatting. On the one hand, dates often arrive as text and can be written in many different ways. For example, we can write the 1st of January 2024 as “1/1/2024”, “01/01/2024” or “2024/01/01”. On the other hand, dates follow a specific sequence, which allows them to be treated as quantities for purposes such as sorting and comparing. For instance, “02/01/2024” (2nd of January 2024) is a later date, and so a “higher” value, than “01/01/2024”.
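To make the idea of dates as quantities concrete, the sketch below compares two day/month/year strings first as text and then as Date values. It relies on the fact that R stores a Date internally as the number of days since 1 January 1970; the example strings are chosen only for illustration.

```r
# Two dates written as day/month/year text
a_txt <- "02/01/2023"  # 2nd of January 2023
b_txt <- "01/01/2024"  # 1st of January 2024

# As text, R compares character by character, so "02..." comes after "01..."
a_txt > b_txt                       # TRUE, although 2023 is earlier than 2024

# As Date values, the comparison follows the calendar
d1 <- as.Date(a_txt, format = "%d/%m/%Y")
d2 <- as.Date(b_txt, format = "%d/%m/%Y")
d1 > d2                             # FALSE

# Internally, a Date is a count of days since 1970-01-01
as.numeric(as.Date("1970-01-01"))   # 0
as.numeric(as.Date("2024-01-01"))   # 19723
```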
In R, we transform a character value into a date with the
as.Date() function. This function has two main
arguments: the value that we want to transform and the format (a
pattern describing how that value is written). As discussed, a date can be
written in many different ways and, therefore, we need to specify
which format we assume the values follow. This is especially important when
we have ambiguous dates such as “01/02/2024”: in Europe,
this date would be read as “1st of February 2024”, while in the
U.S. it would mean “2nd of January 2024”. To specify the
format, we use the percent sign (%) followed by a letter, which is often
the first letter of the word “year”, “month” or “day”. For instance, to specify that
the date 01/02/2024 is “1st of February 2024” we use the format
“%d/%m/%Y”, in which “d” comes from the word “day”, “m” comes from
the word “month” and “Y” comes from the word “year”.
# Date example
as.Date("01/02/2024", format = "%d/%m/%Y")
[1] "2024-02-01"
# Class of date example
class(as.Date("01/02/2024", format = "%d/%m/%Y"))
[1] "Date"
As shown, the format string repeats the separators of the date value (here, slashes) in the same order, replacing each number with its corresponding placeholder. We also see that the output is a value of class Date, printed with the year first, then the month and, finally, the day. Because dates are so easy to misinterpret, an international standard for writing dates numerically, ISO 8601, has been developed, and R's default date output follows it. The idea behind ISO 8601 is that the components of a date appear in order of decreasing units: in our example, the output starts with the year, then the month, and then the day. Additionally, each component has a fixed number of digits: the year has four digits, while the month and the day have two (with leading zeros if necessary).
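The sketch below, built on the ambiguous string from earlier, shows that the chosen format decides which date we get, that R prints Date values in ISO 8601 form, and that format() can write a Date back out in another layout. The dotted output layout is just an example choice.

```r
x <- "01/02/2024"

# The same string, read with two different formats
eu <- as.Date(x, format = "%d/%m/%Y")   # day first:   "2024-02-01"
us <- as.Date(x, format = "%m/%d/%Y")   # month first: "2024-01-02"

# A Date is printed in the ISO 8601 form YYYY-MM-DD
format(eu)                              # "2024-02-01"

# format() can also write a Date back out in another layout
format(eu, "%d.%m.%Y")                  # "01.02.2024"

# If the format does not match the string, the result is NA
as.Date(x, format = "%Y-%m-%d")         # NA
```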
Now that we’ve explored basic date transformations using the
as.Date() function, let’s dive into how regular
expressions help manage the various date formats. As dates can be
written in so many different ways, there are similarly many
different regular expressions that we can use. In the context of
dates, regular expressions (regex) are used to
identify and extract date patterns from strings, allowing you to
match different date formats (e.g., “01/01/2024” or “2024-01-01”)
for conversion or manipulation. In other words, regular expressions
(in this case) are character values that describe date formats by
specifying the exact pattern in which the date components (day,
month, and year) appear. This is particularly useful when working
with messy or inconsistent date data, as regex can help filter or
transform strings into a standardized format. By identifying common
patterns in the data such as numbers separated by slashes or dashes,
regex enables us to handle dates more efficiently, ensuring they are
correctly parsed and ready for further operations like comparisons
or calculations.
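The paragraph above describes regular expressions as a way to find dates inside messy text. A minimal base R sketch of that idea follows, using grepl() and regmatches(); the example strings and both patterns are made up, and the sketch assumes that slash-separated dates are written day first.

```r
txt <- c("Invoice issued 2024-01-15",
         "Paid on 03/02/2024",
         "No date here")

# Regex for YYYY-MM-DD dates and for DD/MM/YYYY dates
iso_pattern   <- "[0-9]{4}-[0-9]{2}-[0-9]{2}"
slash_pattern <- "[0-9]{2}/[0-9]{2}/[0-9]{4}"

# Which strings contain each pattern?
grepl(iso_pattern, txt)    # TRUE FALSE FALSE
grepl(slash_pattern, txt)  # FALSE TRUE FALSE

# Extract the matched substrings (strings without a match are dropped)
iso_dates   <- regmatches(txt, regexpr(iso_pattern, txt))
slash_dates <- regmatches(txt, regexpr(slash_pattern, txt))

# Convert each group with the matching as.Date() format
parsed <- c(as.Date(iso_dates,   format = "%Y-%m-%d"),
            as.Date(slash_dates, format = "%d/%m/%Y"))
parsed                     # "2024-01-15" "2024-02-03"
```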
The table below lists the characters we can use in the format argument (each preceded by %), along with a description and two examples per character.
| Character | Description | Examples |
|---|---|---|
| d | Numeric day of month | 5, 6 |
| a | Abbreviation of day of the week | Mon, Tue |
| A | Full name of day of week | Monday, Tuesday |
| m | Numeric month of year | 5, 6 |
| b | Abbreviation of month | Jun, Jul |
| B | Full name of month | June, July |
| y | Year without century | 23, 24 |
| Y | Year with century | 2023, 2024 |
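The same placeholders also work in the opposite direction: format() turns a Date back into text. A short sketch follows; the day and month names it produces depend on your system's locale (the comment assumes an English one). It also shows how %y decides the century on input, following R's documented strptime rule (00 to 68 become 20xx, 69 to 99 become 19xx).

```r
d <- as.Date("2024-06-03")

# Numeric placeholders are zero-padded on output
format(d, "%d/%m/%Y")          # "03/06/2024"
format(d, "%d/%m/%y")          # "03/06/24"

# Name placeholders depend on the locale (English assumed here)
format(d, "%A %d %B %Y")       # "Monday 03 June 2024"

# %y on input: 00-68 become 20xx, 69-99 become 19xx
as.Date("68-01-01", format = "%y-%m-%d")   # "2068-01-01"
as.Date("69-01-01", format = "%y-%m-%d")   # "1969-01-01"
```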
In the examples below, we see how to use the appropriate characters to transform different date representations, converting them to the standard ISO 8601 format.
# Date: 3rd of March 2023
as.Date("20230303", format = "%Y%m%d")
[1] "2023-03-03"
# Date: 3rd of March 2023
as.Date("20230303", format = "%Y%d%m")
[1] "2023-03-03"
# Date: 3rd of August 2022
as.Date("2022-08/03", format = "%Y-%m/%d")
[1] "2022-08-03"
# Date: 12th of July 2003
as.Date("03 Jul 12", format = "%y %b %d")
[1] "2003-07-12"
# Date: 23rd of June 2024
as.Date("June 23/2024", format = "%B %d/%Y")
[1] "2024-06-23"
Once dates are properly represented, we can compare them, with a more recent date counting as a higher value than an older one. In the examples below, each comparison returns TRUE or FALSE, depending on whether the statement holds.
# Is '3rd of March 2023' equal to '4th of March 2023'?
as.Date("20230303", format = "%Y%m%d") ==
as.Date("20230403", format = "%Y%d%m")
[1] FALSE
# Is '3rd of August 2022' less recent than '12th of July 2003'?
as.Date("2022-08/03", format = "%Y-%m/%d") <
as.Date("03 Jul 12", format = "%y %b %d")
[1] FALSE
# Is '23rd of June 2024' more recent than or equal to '23rd of June 2024'?
as.Date("Jun 23/2024", format = "%b %d/%Y") >=
as.Date("June 23/2024", format = "%B %d/%Y")
[1] TRUE
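Because dates behave like ordered quantities, the usual base R functions for sorting and summarising also work on a vector of dates. A short sketch, reusing three dates from the examples above:

```r
d <- as.Date(c("2022-08-03", "2003-07-12", "2024-06-23"))

sort(d)                       # "2003-07-12" "2022-08-03" "2024-06-23"
max(d)                        # "2024-06-23"
d > as.Date("2010-01-01")     # TRUE FALSE TRUE
which.min(d)                  # 2 (position of the oldest date)
```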
Similarly, we can add days to or subtract days from a date, as we would with numeric data, and find the difference in days between two dates.
# Add 7 days to '1st of June 2025'
as.Date("20250106", format = "%Y%d%m") + 7
[1] "2025-06-08"
# Subtract 4 days from '1st of June 2025'
as.Date("20250106", format = "%Y%d%m") - 4
[1] "2025-05-28"
# Difference in days between '8th of June 2025' and '28th of May 2025'
as.Date("20250806", format = "%Y%d%m") -
as.Date("20252805", format = "%Y%d%m")
Time difference of 11 days
# Transform the above difference into a numeric value
as.numeric(as.Date("20250806", format = "%Y%d%m") -
as.Date("20252805", format = "%Y%d%m"))
[1] 11
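Since adding a number to a date moves it forward by that many days, base R's seq() can also generate regular sequences of dates. The sketch below builds a hypothetical weekly schedule starting on 1st of June 2025 and checks the gaps with diff():

```r
start <- as.Date("2025-06-01")

# Equivalent to start + 0, start + 7, start + 14
weekly <- seq(start, by = "week", length.out = 3)
weekly                     # "2025-06-01" "2025-06-08" "2025-06-15"

# Gaps between consecutive dates, as plain numbers of days
as.numeric(diff(weekly))   # 7 7
```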
Notice how in the last example we used the function
as.numeric() to transform the output into a numeric
value. This is useful if we want to calculate differences
between dates in a data frame and get a single number as an output.
Although we could stick with this approach, we have a better option
for calculating date differences: the function
difftime(). This function is preferable because of its extra argument
units, which specifies the unit of measurement in which we
want to see the difference (e.g. days or weeks). Of course, we
can still use the function as.numeric() to transform
the output into a numeric value. Below, we use the function
difftime() to calculate the difference between the same
dates as before, this time also specifying the unit of
measurement.
# Difference in days between '8th of June 2025' and '28th of May 2025'
difftime(as.Date("20250806", format = "%Y%d%m"),
as.Date("20252805", format = "%Y%d%m"), units = "days")
Time difference of 11 days
# Difference in weeks between '8th of June 2025' and '28th of May 2025'
difftime(as.Date("20250806", format = "%Y%d%m"),
as.Date("20252805", format = "%Y%d%m"), units = "weeks")
Time difference of 1.571429 weeks
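A difftime value can also be converted to other units when it is turned into a number, and the calculation is vectorised, so it works on whole data frame columns. In the sketch below, the orders data frame and its column names are hypothetical. Note that difftime() does not accept months as a unit, because months are not of constant length.

```r
d1 <- as.Date("2025-06-08")
d2 <- as.Date("2025-05-28")
gap <- difftime(d1, d2, units = "days")

as.numeric(gap)                   # 11
as.numeric(gap, units = "weeks")  # 1.571429 (11 / 7)
as.numeric(gap, units = "hours")  # 264

# Hypothetical data frame: days between ordering and shipping, row by row
orders <- data.frame(placed  = as.Date(c("2025-05-28", "2025-06-01")),
                     shipped = as.Date(c("2025-06-08", "2025-06-03")))
orders$days_to_ship <- as.numeric(difftime(orders$shipped, orders$placed,
                                           units = "days"))
orders$days_to_ship               # 11 2
```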
lubridate Package
One of the most well-known and useful packages in R for working with
dates is the lubridate package. With this package, we
can handle dates much more easily than with base R code
alone. Nonetheless, understanding the base R logic for handling
dates is essential, as it not only helps us grasp the full potential
of lubridate but also makes it easier to interpret and
work with R scripts involving dates. So, let’s install (if needed) and load the
lubridate package in our console:
# Install the package once, if it is not installed yet
# install.packages("lubridate")
# Load the package
library(lubridate)
Previously, we learned how to specify the format of a character
value in order to get a date. With lubridate, we have
functions that create date outputs directly from a character value.
The examples below show how easily we can extract dates from
different character values (we use the exact same dates as before).
# Date: 3rd of March 2023
ymd("20230303")
[1] "2023-03-03"
# Date: 3rd of March 2023
ydm("20230303")
[1] "2023-03-03"
# Date: 3rd of August 2022
ymd("2022-08/03")
[1] "2022-08-03"
# Date: 12th of July 2003
ymd("03 Jul 12")
[1] "2003-07-12"
# Date: 23rd of June 2024
mdy("June 23/2024")
[1] "2024-06-23"
The lubridate package offers a range of functions to
easily convert character strings into date values. Since we’ve
previously covered how format characters such as %y, %m and %d are used to recognize
dates, it is intuitive that ‘y’, ‘m’, and ‘d’ stand for ‘year’,
‘month’, and ‘day’, respectively. These functions
are essential for converting dates written in different orders, thereby
ensuring consistency when working with date data. For example:
- ymd() expects a date in the order year, month, day.
- ydm() works when the input follows a year, day, month order.
- mdy() converts month, day, year.
- myd() handles month, year, day.
- dmy() is for day, month, year.
- dym() accepts day, year, month.
These functions make it easy to manage various date formats in your data, thus enabling seamless transformations into a standard date type.
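As a quick check of the list above, the sketch below writes the 23rd of June 2024 in each of the six component orders and parses it with the matching lubridate function; all six results should be the same Date.

```r
library(lubridate)

# The 23rd of June 2024, written in six different component orders
parsed <- c(ymd("2024-06-23"),
            ydm("2024-23-06"),
            mdy("06-23-2024"),
            myd("06-2024-23"),
            dmy("23-06-2024"),
            dym("23-2024-06"))
parsed   # "2024-06-23" six times
```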
The important thing is to be sure about the order of the time components
of a date (e.g. first the day, then the year, then the month).
Additionally, we saw that these functions ignore separators and are not
limited to numeric values: “m” matches both “07”
(July) and “July” itself. For instance, both of the
following strings represent the same date (July 18, 2025), and
lubridate correctly interprets them even though one
uses a numeric month and the other uses the full name of the month:
# Numeric month
mdy("07-18-2025") # Returns: "2025-07-18"
[1] "2025-07-18"
# Textual month
mdy("July 18, 2025") # Also returns: "2025-07-18"
[1] "2025-07-18"
The mdy() function knows to look for the month first,
then the day, and then the year, regardless of how the month is
written or what separators are used.
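Because these functions rely entirely on the order you choose, the same ambiguous string can give two different dates, and a string that does not fit the chosen order yields NA. A minimal sketch, reusing the ambiguous date from earlier in the chapter (the warning is suppressed here only to keep the output short):

```r
library(lubridate)

x <- "01/02/2024"
dmy(x)   # "2024-02-01" (1st of February)
mdy(x)   # "2024-01-02" (2nd of January)

# With the wrong order, impossible values give NA (plus a warning)
bad <- suppressWarnings(mdy("13/02/2024"))  # there is no month 13
is.na(bad)   # TRUE
```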
The lubridate package also provides the
parse_date_time() function, which is similar to the
as.Date() function that we saw at the beginning of this
chapter: we pass the string together with the order of its components
in the orders argument. Unlike as.Date(), it returns a date-time (POSIXct)
value, which is why the output includes the time zone (UTC):
# Date: 3rd of March 2023
parse_date_time("20230303", orders = "ymd")
[1] "2023-03-03 UTC"
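We can also use this function to transform multiple date values at once, even if they are in different formats. Note that the orders are not paired with the values by position: parse_date_time() tries the supplied orders on the values, so each order we supply should fit only the values we intend it for.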
# Dates: 3rd of March 2023 and 9th of October 2020
parse_date_time(c("20230303", "10-09-2020"), orders = c("ymd", "mdy"))
[1] "2023-03-03 UTC" "2020-10-09 UTC"
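Since parse_date_time() returns date-times, we may want to convert its output back to plain Date values. One way, sketched below, is base R's as.Date(), which keeps only the calendar date (lubridate's as_date() is an alternative):

```r
library(lubridate)

dt <- parse_date_time(c("20230303", "10-09-2020"), orders = c("ymd", "mdy"))
class(dt)   # "POSIXct" "POSIXt": date-times, not Dates

# Keep only the calendar date (the time part is midnight UTC)
d <- as.Date(dt)
class(d)    # "Date"
d           # "2023-03-03" "2020-10-09"
```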
Sometimes, instead of transforming a character value to a date, we may want to combine different elements in order to generate a date. For instance, suppose we have the following data frame.
# Create a data frame "dates"
dates <- data.frame(Year = c("2021", "2022", "2023", "2024", "2025"),
Month = c("01", "01", "01", "01", "01"),
Day = c("01", "01", "01", "01", "01"))
# Print the results
dates
Year Month Day
1 2021 01 01
2 2022 01 01
3 2023 01 01
4 2024 01 01
5 2025 01 01
In this data frame, we have the year, month and day in separate
columns. We can use the make_date() function to combine
the values of each column and create a date as a result. This
function includes the arguments year, month and
day which are used to specify the individual components of
a date. We can use this function on our data frame “dates” in order
to get a vector of the date of each row.
# Make a vector of dates
make_date(year = dates$Year, month = dates$Month, day = dates$Day)
[1] "2021-01-01" "2022-01-01" "2023-01-01" "2024-01-01" "2025-01-01"
Of course, we could store this vector in the same data frame, keeping both the individual components and the full date.
# Store the vector of dates
dates$Full_Date <- make_date(year = dates$Year, month = dates$Month, day = dates$Day)
# Print the results
dates
Year Month Day Full_Date
1 2021 01 01 2021-01-01
2 2022 01 01 2022-01-01
3 2023 01 01 2023-01-01
4 2024 01 01 2024-01-01
5 2025 01 01 2025-01-01
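make_date() is vectorised and also accepts numbers, recycling shorter inputs. For comparison, the sketch below also shows a base R route that pastes the components into an ISO 8601 string before calling as.Date(); the "dates" data frame is rebuilt so the block runs on its own.

```r
library(lubridate)

# Numeric components: the single year and day are recycled over the months
make_date(year = 2024, month = 1:3, day = 1)
# "2024-01-01" "2024-02-01" "2024-03-01"

# Rebuild the "dates" data frame from above (scalars are recycled)
dates <- data.frame(Year  = c("2021", "2022", "2023", "2024", "2025"),
                    Month = "01", Day = "01")

# lubridate route
lub_dates <- make_date(year = dates$Year, month = dates$Month, day = dates$Day)

# Base R route: build "YYYY-MM-DD" strings, then convert
base_dates <- as.Date(paste(dates$Year, dates$Month, dates$Day, sep = "-"))

all(lub_dates == base_dates)   # TRUE
```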
Regarding making dates, it is important to note that both base R and
the lubridate package provide functions that return
today’s date. This is very useful, for example, when we
want an R script to always work with the current date. To
get the current date, we can use either the
Sys.Date() function from base R or the
today() function from the lubridate package: both
return the same Date value.
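A minimal sketch of both functions, plus a typical use: counting the days left until a deadline. The deadline date is hypothetical, and the printed values depend on the day you run the code.

```r
library(lubridate)

Sys.Date()   # today's date (base R)
today()      # today's date (lubridate)

# Days remaining until a hypothetical deadline
deadline <- as.Date("2030-12-31")
days_left <- as.numeric(deadline - today())
days_left
```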
Up to this point, we learned how to create a full date from
individual character components. With lubridate, we can
also extract different parts from a date value, such as the year or
the month. For example, we can use the year() function
to extract the year of the date “2023-05-06”.
# Extract year from "2023-05-06"
year("2023-05-06")
[1] 2023
It is not difficult to imagine that we can extract all the different parts from a full date with respective lubridate functions. The code below shows those functions and the output that we get when we use them on the date “2023-05-06”.
# Extract month from "2023-05-06"
month("2023-05-06")
[1] 5
# Extract numeric day from "2023-05-06"
day("2023-05-06")
[1] 6
# Extract day of the week (by default 1 = Sunday, ..., 7 = Saturday) from "2023-05-06"
wday("2023-05-06")
[1] 7
# Extract day of year (from 1 to 365 or 366) from "2023-05-06"
yday("2023-05-06")
[1] 126
# Extract quarter (from 1 to 4) from "2023-05-06"
quarter("2023-05-06")
[1] 2
# Extract semester (from 1 to 2) from "2023-05-06"
semester("2023-05-06")
[1] 1
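Several of these functions accept extra arguments. For instance, wday() can number the week from Monday instead of Sunday, and both wday() and month() can return names instead of numbers. The sketch assumes the lubridate.week.start option has not been changed, and the printed labels depend on your locale.

```r
library(lubridate)

d <- as.Date("2023-05-06")             # a Saturday

wday(d)                                # 7 (default: Sunday = 1, ..., Saturday = 7)
wday(d, week_start = 1)                # 6 (Monday = 1, ..., Sunday = 7)

# Labels instead of numbers (ordered factors; text depends on the locale)
wday(d, label = TRUE)                  # Sat
month(d, label = TRUE, abbr = FALSE)   # May
```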