Jun 1, 2018

On the benefits and consequences of Referential Transparency

One foundation of Functional Programming is Referential Transparency (RT). Purely functional languages (like Haskell), and purely functional libraries for non purely functional languages (like scalaz or cats for Scala) aim at building Referential Transparent programs. The benefit of this constraint might not be immediately visible, so in this post I want to expose my thoughts on this, and see what Referential Transparency can buy us.

Reasoning and modeling

The benefit of RT is that it allows us to focus on something we can handle in our head, draw some conclusions, and drag this conclusion in a different context where we focus again. In this different context we don’t need to think about the whole process that allowed us to reach our conclusion, as we can just take this conclusion as a given and proceed further with our reasoning.

For example, when we study calculus we define what a limit of a function $f(x)$ in a point $x_0$ is. Then we define the derivative of a function in $x_0$ as $f$:

$$ \frac{d f}{dx}(x_0) = \lim_{x \rightarrow x_0} \frac{f(x)- f(x_0)}{x - x_0} $$

In physics this brings to the definition of velocity of a point moving in a line under a motion $x(t)$ as

$$ v(t) = \frac{d x}{d t} $$

We can then calculate the kinetic energy of an object of mass $m$ as

$$ E_k = \frac{1}{2}mv^2 $$

Now, when we want to calculate the kinetic energy of a ball of a given mass at a given speed, what we do is just replacing known values in place of the variables of our definition, and we are done. We don’t have to think, to calculate the energy, how the velocity has been calculated, in principle we don’t even need to be aware of the notion of limit. A person that knows how to add and multiply can easily calculate the kinetic energy of a baseball ball at a given speed without any knowledge of calculus.

On the other hand, we could unfold $v$ to its definition and still we would get to a correct result:

$$ E_k = \frac{1}{2}m\Big(\frac{d x}{d t}\Big )^2 $$

Being able to replace definitions into variables and viceversa, allows us to simplify our reasoning and limit or broaden the scope of deductions.

Referential Transparency

Referential transparency is exactly what we just exposed. A definition is RT if we can exchange reference for definition without mutating the effect of my reasoning.

When we write a program, we describe a process that performs some logical steps. The approach of functional programming is to model the program in terms of mathematical functions and expressions, so that we can apply reductions or expansions in our reasoning, making our implementation more robust from the logical point of view. Let’s see how we can apply this in Scala.

Pure functions

A function that adheres to the RT rule f: A => B takes a value a: A, produces a b: B, and does nothing else.

This brings along a few advantages:

We can reason locally about our code. When implementing the function, I can just think about how to transform a: A into a b: B. When using the result of the computation of f into another piece of code, I can just focus on the value B without having to think about the context that b might carry along, because there is no context.
We can prove behaviors, and establish explicit laws that our behaviors must adhere to
Testing becomes easier as there is no need to setup any context

Breaking RT: mutable state

In Maths when we write $x = 1$ it means that the value assigned to $x$ is $1$ and it will not change. Also, for any function $f(x)$, for any given input we will have always the same output, on multiple computations.

Scala is a hybrid language, in that it allows to do FP, but we can mix in non-functional code, so if e.g. we want to model an Account object we could do

scala> case class Account(var balance: BigDecimal = 0) {
     |   def debit(value: BigDecimal): Account = {
     |     this.balance = this.balance - value
     |     this
     |   }
     |   
     |   def credit(value: BigDecimal): Account = {
     |       this.balance = this.balance + value
     |       this
     |     }
     | }
defined class Account

scala> val openAccount = Account(1000)
openAccount: Account = Account(1000)

scala> openAccount.debit(10)
res0: Account = Account(990)

scala> openAccount.debit(10)
res1: Account = Account(980)

and we can see that invoking our function debit with the same input, we have 2 different outputs. The reason is that the computation carries along a context change due to the var we used in our case class.

This approach has a few drawbacks:

It is more difficult to reason about, due to the context mutation
It can yield errors in a concurrent situation, where multiple threads can call debit

A pure functional version of this doesn’t mutate anything in place, but rather produces a new object for every computation. The object returned by our function is structurally identical every time we compute it. In principle our computation could cache these values and return exactly the same instance for repeated invocations.

scala> case class Account(balance: BigDecimal = 0)
defined class Account

scala> def debit(account: Account, amt: BigDecimal): Account = account.copy(balance = account.balance - amt)
debit: (account: Account, amt: BigDecimal)Account

scala> val myAccount = Account(1000)
myAccount: Account = Account(1000)

scala> debit(myAccount, 100)
res2: Account = Account(900)

scala> debit(myAccount, 100)
res3: Account = Account(900)

Breaking RT: Exceptions

When we define a function f: A => B, we expect this function to return an output b: B for every input a: A. If in the implementation of f we introduce a throw or an assert, or rely on null, we end up with our computation breaking the contract we established when we defined f. The behavior of code throwing exceptions is not Referentially Transparent, as you can see from the following example

scala> case class Account(name: String, balance: Int) {
     |   require(name.length < 20, "Account name cannot contain more than 20 chars")
     | }
defined class Account

scala> Account("Very very very personal", 100)
java.lang.IllegalArgumentException: requirement failed: Account name cannot contain more than 20 chars
  at scala.Predef$.require(Predef.scala:293)
  ... 45 elided

The Account creation can throw an exception. Now let’s consider this implementation that relies on account creation

scala> def wealthyAccountFromDb(nameColumn: String, balanceColumn: Int): Option[Account] = {
     |   val acct = Account(nameColumn, balanceColumn)
     |   try {
     |     if (balanceColumn > 1000) Some(acct) else None 
     |   } catch {
     |     case NonFatal(_) => None
     |   }
     | }
wealthyAccountFromDb: (nameColumn: String, balanceColumn: Int)Option[Account]

scala> wealthyAccountFromDb("Very very very personal", 1500)
java.lang.IllegalArgumentException: requirement failed: Account name cannot contain more than 20 chars
  at scala.Predef$.require(Predef.scala:293)
  ... 46 elided

and let’s compare it with the substitution of the definition of acct in the place where acct is used

scala> def wealthyAccountFromDb(nameColumn: String, balanceColumn: Int): Option[Account] = {
     |   try {
     |     if (balanceColumn > 1000) Some(Account(nameColumn, balanceColumn)) else None 
     |   } catch {
     |     case NonFatal(_) => None
     |   }
     | }
wealthyAccountFromDb: (nameColumn: String, balanceColumn: Int)Option[Account]

scala> wealthyAccountFromDb("Very very very personal", 1500)
res6: Option[Account] = None

Does this yield the same result? No, so this code is not Referentially Transparent.

Throwing exceptions has also a few specific drawbacks:

It breaks totality of f: A => B, as exceptions could be thrown for some a in our domain
It forces us to think, every time we invoke a function, that it might throw an exception, so it makes our style very defensive and repetitive

The alternative is to convey the fact that some code can fail, explicitly in the output type. Use an Option[B] or Either[E, B] to express the fact that computation can fail. Rather that require we could rather use a smart constructor. If the type is simply B it means that the function is total and guarantees to work and provide a b :B.

Breaking RT: Scala Future

Scala scala.concurrent.Future provides the way to handle asynchronous execution. In order for Future to run the basic operations, we need an ExecutionContext available (typically it is provided implicitly). When a Future is created, it does not only create a value, but it runs immediately on the provided ExecutionContext.

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

scala> val x = Future({println("Future is running!"); "result"})
Future is running!
x: scala.concurrent.Future[String] = Future(Success(result))

The fact that Future executes eagerly as soon as it is created, is a violation of RT, as you can see from this simple case

scala> val x = Future({println("running"); "result"})
running
x: scala.concurrent.Future[String] = Future(Success(result))

scala> x flatMap(_ => x)
res7: scala.concurrent.Future[String] = Future(Success(result))

is different than

scala> Future({println("running"); "result"}) flatMap{_ => Future({println("running"); "result"})}
running
running
res8: scala.concurrent.Future[String] = Future(<not completed>)

because one never prints "running" (as it was printed when we created the Future in first place), another one prints it twice.

Alternative libraries such as Monix, cats-effects or scalaz8 provide alternatives to Future that separate declaration from execution, giving us back RT.

Let’s do the same as above with Monix Task. Defining a Task just creates a value that describes how things will be executed, it doesn’t trigger any execution.

scala> import monix.execution.Scheduler.Implicits.global
import monix.execution.Scheduler.Implicits.global

scala> import monix.eval.Task
import monix.eval.Task

scala> val task = Task({println("Task is running!"); "result"})
task: monix.eval.Task[String] = Task.FlatMap$1855446670

Execution must be explicitly invoked at the end of the world

scala> task.runAsync.foreach(println)
Task is running!
result

Now let’s verify that this property guarantees RT

scala> val t1 = task flatMap(_ => task)
t1: monix.eval.Task[String] = Task.FlatMap$1362502256

scala> val t2 = Task({println("Task is running!"); "result"}) flatMap {_ => Task({println("Task is running!"); "result"})}
t2: monix.eval.Task[String] = Task.FlatMap$446650376

scala> t1.runAsync.foreach(println)
Task is running!
Task is running!
result

scala> t2.runAsync.foreach(println)
Task is running!
Task is running!
result

As you can see t1 and t2 produce the same result, so they respect RT.

Laws and Referential Transparency

When we define an abstraction in FP (typically by means of a typeclass), we provide also the laws that this abstraction must abide to.

For example, for every (covariant) Functor it is required that it respects the composition law:

fa.map(f).map(g) <-> fa.map(f andThen g)

It is pretty obvious that this law makes sense only as long as f and g are referentially transparent.

If e.g. fa is a List[A], the map function could be defined considering the recursive nature of the List ADT:

def map[A, B](xs: List[A])(f: A => B): List[B] = xs match {
    case Nil => Nil
    case x :: xs => f(x) :: map(xs)(f)
}

By inspecting this simple definition, it is apparent that the functor law can hold only as long as executions of f on previous elements of the list don’t influence execution on subsequent elements. Clearly shared mutable state breaks this condition.

Let’s prove this with a simple example

scala> var state = 0
state: Int = 0

scala> val add: Int => Int => Int = x => y => {
     |   state = state + 1
     |   x + y + state
     | }
add: Int => (Int => Int) = $$Lambda$15566/1963356770@abe0b99

scala> val xs = List(1,2,3,4)
xs: List[Int] = List(1, 2, 3, 4)

scala> xs.map(add(4)).map(add(5))
res0: List[Int] = List(16, 19, 22, 25)

scala> state = 0
state: Int = 0

scala> val xs = List(1,2,3,4)
xs: List[Int] = List(1, 2, 3, 4)

scala> xs.map(add(4) andThen add(5))
res1: List[Int] = List(13, 18, 23, 28)

We can see that the 2 executions provide different results, even if we reset the state between one execution and the next one.

A simple pure version is instead a law abiding citizes of the functional world

scala> val add: Int => Int => Int = x => y => x + y
add: Int => (Int => Int) = $$Lambda$10651/883633605@50b6ae4e

scala> xs.map(add(4)).map(add(5))
res2: List[Int] = List(10, 11, 12, 13)

scala> xs.map(add(4) andThen add(5))
res3: List[Int] = List(10, 11, 12, 13)

Conclusions

I hope the examples exposed above give a good idea of the benefits of Referential Transparency.

The same concepts we have explored here apply when we develop our business application, when we are dealing with a DB call, or building an object out of a DB row, or performing a call to an API.

What we want to do is to apply one of the principles of Functional programming, i.e. being pure, rigorous and explicit about expectations and guarantees. This allows us to think in the little to compose in the big context, without juggling with too many concepts in our head. Remember that Constraints Liberate, Liberties Constrain.

RT plays a fundamental role in this and by using it we make our code easier to reason about, improving its maintainability and becoming ultimately more productive as developers.