Characteristics of Functional Programming – Examples in R
Between 2016 and 2018, most of my projects were written largely in R. At its core, R has a strong emphasis on functional programming. For example, I enjoyed using packages like dplyr to chain data cleaning and modeling steps seamlessly. In R, every operation is a function call; even a for loop is a function call under the hood.
for (i in 1:10) {
  print(i)
}
# is the same as
`for`(i, 1:10, print(i))
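The same holds for operators. As a small sketch (the vector `v` is a made-up example value), infix operators such as `+` and `[` can be called in prefix form and passed around like any other function:

```r
# Infix operators are ordinary functions and can be called in prefix form.
sum_result <- `+`(1, 2)            # same as 1 + 2
v <- c(10, 20, 30)                 # hypothetical example vector
second <- `[`(v, 2)                # same as v[2]

# Because `+` is a function, it can be passed to another function.
shifted <- sapply(1:3, `+`, 10)    # adds 10 to each element

print(sum_result)
print(second)
print(shifted)
```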
In this post, I introduce the fundamental concepts of functional programming, using R code as illustrative examples.
#
First-class Functions
First-class functions are functions that the language treats like any other value. They can be assigned to variables, stored in lists, passed as arguments to other functions, and returned as results from other functions.
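# assign sum() to a variable
add_up <- sum
add_up(c(1, 2, 3)) # returns 6
# store functions in a list and apply each one to the same data
x <- 1:10
summary_funs <- list(mean = mean, median = median, sd = sd, mad = mad, IQR = IQR)
summary_ <- function(x) {
  sapply(summary_funs, function(f) f(x))
}
summary_(x)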
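Because functions are values, they can also be looked up by name at run time. The following is a minimal sketch of that idea; `ops` and `apply_op` are hypothetical names chosen for illustration.

```r
# A named list of functions acts as a small dispatch table.
ops <- list(
  add = function(a, b) a + b,
  mul = function(a, b) a * b
)

# Look up a function by name and call it with the given arguments.
apply_op <- function(name, a, b) {
  ops[[name]](a, b)
}

print(apply_op("add", 2, 3))
print(apply_op("mul", 2, 3))
```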
#
Higher-Order Functions
In functional programming, functions can be passed as arguments to other functions. A function that accepts another function as an argument, or returns a function as its result, is called a higher-order function. For instance, in the following example, sapply takes an anonymous function as its second argument and applies it to each element of the input vector, squaring each one.
sapply(c(1, 2, 3), function(x) x^2)
# [1] 1 4 9
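Base R also ships the classic higher-order functions Filter, Map, and Reduce. The sketch below combines them on a made-up vector `nums` to sum the squares of the even numbers:

```r
nums <- 1:10  # hypothetical input

# Filter keeps the elements for which the predicate returns TRUE.
evens <- Filter(function(n) n %% 2 == 0, nums)

# Map applies a function to each element and returns a list.
squares <- Map(function(n) n^2, evens)

# Reduce folds the list into a single value using a binary function.
total <- Reduce(`+`, squares)

print(total)
```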
#
Pure Functions
Pure functions always return the same output for the same input and have no side effects: they do not modify the global environment or any other state outside themselves, which makes their behavior predictable. R supports this style because most R objects have copy-on-modify semantics, so modifying an argument inside a function does not change the caller's original object. To the caller, this behaves like pass-by-value (call-by-value).
# example 1
f <- function(x) {
  x$a <- x$a + 1
  return(x)
}
x <- list(a = 1)
f(x) # pass x as an argument
x # x is not changed
# example 2
library(ggplot2) # provides the diamonds dataset
dplyr::arrange(diamonds, carat) # returns a sorted copy
diamonds # the original diamonds dataset is not rearranged
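For contrast, here is a small sketch of an impure function next to a pure one. The names `call_count`, `increment_impure`, and `increment_pure` are hypothetical. The impure version uses the superassignment operator `<<-` to change a variable outside its own scope, so repeated calls return different results.

```r
call_count <- 0

# Impure: reads and modifies a variable outside its own scope.
increment_impure <- function() {
  call_count <<- call_count + 1
  call_count
}

# Pure: the result depends only on the argument, and nothing else changes.
increment_pure <- function(n) {
  n + 1
}

first_impure  <- increment_impure()
second_impure <- increment_impure()  # differs from the first call
first_pure    <- increment_pure(0)
second_pure   <- increment_pure(0)   # identical to the first call
```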
#
Lazy Evaluation
Lazy evaluation is an evaluation strategy that postpones the evaluation of an expression until its value is required [1].
# In R, function arguments are evaluated only when they are first used.
func <- function(x, y) {
  return(x)
}
func(x = 1) # it works since y is never used
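Two further consequences of lazy evaluation are sketched below, using the hypothetical functions `first_only` and `scaled`. An argument that would raise an error is harmless if it is never used, and a default argument can refer to a variable that is only created inside the function body, because the default is evaluated at first use.

```r
# The second argument is never evaluated, so stop() is never called.
first_only <- function(x, y) {
  x
}
safe_result <- first_only(1, stop("this is never evaluated"))

# The default for `factor` is evaluated inside the function when it is first
# used, by which time `base` already exists.
scaled <- function(x, factor = 2 * base) {
  base <- 10
  x * factor
}
scaled_result <- scaled(3)

print(safe_result)
print(scaled_result)
```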
#
Closures
A closure is a function bundled with the environment in which it was created, typically a function defined inside another function. Closures are so called because they enclose the environment of the parent function, which lets them access variables from that environment even after the parent has returned.
In the code below [2], the inner function function(x) is a closure that retains the exponent variable from its parent function's execution environment, so each function returned by power behaves differently depending on the exponent it was created with. Closures are particularly useful for creating function factories, which generate new functions based on different arguments.
power <- function(exponent) {
  function(x) {
    return(x ^ exponent)
  }
}
square <- power(2)
square(3)
# [1] 9
cube <- power(3)
cube(3)
# [1] 27
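A closure can also carry private, mutable state. The sketch below assumes a hypothetical factory `make_counter`. Each counter it returns has its own `count` variable, which is updated with `<<-` and is not visible from outside.

```r
make_counter <- function() {
  count <- 0  # lives in the environment enclosed by the returned function
  function() {
    count <<- count + 1  # updates count in the enclosing environment
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

a1 <- counter_a()
a2 <- counter_a()
b1 <- counter_b()  # counter_b has its own independent state
```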
#
Function Composition
Given the properties above, we can chain multiple functions together into a “pipeline”, which is a form of function composition. In the example below, several functions are linked with the pipe operator %>% to build a linear model.
# Build a linear model by chaining functions
library(dplyr) # provides select(), mutate(), and the %>% pipe
model <- mtcars %>%
  select(mpg, cyl, disp, hp, wt, qsec) %>% # Select relevant features
  mutate(cyl = as.factor(cyl)) %>% # Convert cyl to factor type
  lm(mpg ~ ., data = .) # Build a linear model
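Composition can also be written without a pipe, as a function that builds a new function from existing ones. The following is a minimal base R sketch; `compose`, `add_one`, and `times_two` are hypothetical helpers, not part of dplyr or magrittr.

```r
# Build a new function that applies the given functions from left to right.
compose <- function(...) {
  fs <- list(...)
  function(value) {
    Reduce(function(acc, f) f(acc), fs, init = value)
  }
}

add_one   <- function(n) n + 1
times_two <- function(n) n * 2

add_then_double <- compose(add_one, times_two)
double_then_add <- compose(times_two, add_one)

print(add_then_double(3))  # (3 + 1) * 2
print(double_then_add(3))  # (3 * 2) + 1
```

The order of the arguments matters, which the two composed functions above illustrate.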
I was fascinated by this approach when I first encountered it. In a project where I had to convert a SAS program to R, I encapsulated the business logic into functions and chained them together, achieving an almost one-to-one mapping between processing functions and workflow steps.