Fundamentals of Spreadsheet Languages

October 30, 2019
Tie Cheng

Tie Cheng

CEO @ Matrix Lead

To many people, it may sound like an exaggeration to say that Microsoft Excel is a programming language. One reason is that Excel has so many functionalities that one can just use it like a software to accomplish tasks. Another reason, we believe, is that not many people have learned functional programming languages, to which spreadsheets can relate.

But when you say "Excel is a functional programming language" to a computer scientist, they will tell you that it lacks important features like higher-order functions, which are supported by normal functional programming languages. But how did spreadsheet programs become so popular and even powerful in many situations, regardless of these drawbacks?

In this article, we examine spreadsheets by various concepts of programming languages, and explain how spreadsheet languages can be seen as a functional programming language. By comparing with other languages, we break down what language features spreadsheets have, what they lack, and how they catch up on this. Additionally, we use examples to illustrate particular programming patterns that spreadsheets are good at.

1 Spreadsheet languages are a functional programming language

1.1 Multiple paradigms

We take Microsoft Excel & VBA (Visual Basic for Applications) as an example of spreadsheets. It can be seen from several different perspectives:

  • People may just store data in a spreadsheet, then it is just a small database.
  • People may just store texts in a spreadsheet, then it is like a text document.
  • We can use menus and buttons to manipulate data in spreadsheets; we can manually enter, modify, copy-paste data. In this case spreadsheets are a software.
  • VBA, which is behind Excel to extent its functionalities, is mainly an imperative and object-oriented programming language.

All these features contribute to the popularity of spreadsheets. But, in this article we want to accentuate the essential of spreadsheets, which makes spreadsheets special, and distinguishes it from a database/document/software/traditional imperative languages.

1.2 Functions and formulas as elementary components

1.2.1 Elementary components of spreadsheets

Spreadsheet functions and formulas are particularly fascinating to many people. They find functions and formulas powerful, pure and elegant. They enjoy building formulas with functions and building spreadsheets with formulas, which is like playing a Lego game with primitive blocks (take myself as an example, when I started to use Excel, I was obsessed by constructing long array formulas and wondered how powerful their combination could be).

Today, we can step back and look at functions and formulas from a programming perspective:

  • Functions and formulas are elementary components that allow users to automate repetitive tasks and calculations (in comparison with manual triggering of actions by menus and buttons)
  • Functions constitute a formula, chains of formulas constitute a spreadsheet program.
  • An execution of a spreadsheet program consists in evaluating the formulas in the chains one by one.

1.2.2 Facilities of programming in formulas and functions

Spreadsheet software have many facilities to help building formulas with functions (thus make it easy to write spreadsheet programs). Because of the interactive and visual feature of spreadsheets, these mechanisms may not exist in other programming languages and environments. We can list some:

  • Copy-pasting a formula to a range of cells is very handy in spreadsheets. Parameter references will automatically change and adjust according to the cell position. This copy-pasting practice is certainly not common or encouraged in other programming languages.

  • When we determine parameters of a formula, we can see values of spreadsheet cells (thus to say if they are useful or not), manually navigate through available cells, and select them as parameters.

  • Instant evaluation: in most of programming languages, programming environment (e.g., inside an editor) is separate from the execution environment (e.g., command lines of a terminal). By contrast, in spreadsheets, the programming environment and the evaluation environment is the same, i.e., the spreadsheet. Additionally, the evaluation of a spreadsheet can be automatic and is often fast. All of this gives an instant experience of the evaluation of a spreadsheet.

  • The interface of a spreadsheet allows users to see many cells at a glance. A formula is hidden behind the value of each cell; regardless of the complexity of the formula, it occupies only the space of one cell (though users can select a cell and see its formula). As a result, spreadsheets can hold and arrange many many formulas without disturbing users. If we consider each cell of a spreadsheet as a variable of a program, spreadsheets easily display many variables in a screen, which is not achievable by many other languages.

1.3 Concepts of functional programming languages reflected in spreadsheets

Then, we can see how classic concepts and advantages of functional programming are reflected and respected in spreadsheets:

  • Clear, consistent and auto-adjusted dependency: in a cell formula, the input parameters (i.e., cells and values) are very clear; the output is the cell itself. Moreover, spreadsheets try to protect well dependencies and keep them consistent. For instance, when users change the spreadsheet structure by inserting/deleting rows/columns or copy-pasting formulas, parameter references in formulas will be automatically adjusted such that cells are still correctly referenced. By contrast, the dependency of spreadsheet cells and VBA procedures is well not "protected": cell references in VBA procedures are not smartly auto-adjusted after a structure change of spreadsheets.

  • No side effects: the formula evaluation of a cell will only have impact to the cell itself and the cells depending on it. By contrast, we can code anything in a procedure in VBA, which may result in side effects such as modifying unrelated cells.

  • Immutability: once all the cell formulas are set, the evaluation of the spreadsheet will not be able to change any cell formula.

1.4 Other characteristics

One characteristic of spreadsheets is that the data structure they work on is especially two-dimensional arrays. The layout is organized in two dimensions such that data is well displayed; Many built-in functions are available to take several arrays as parameter. All of this permits of a handy operation and manipulation on this particular data structure.

Another characteristic of spreadsheets is that input data, intermediate data, output data are all mixed together in sheets. This makes it easy to re-use data, but it may result in messy spreadsheets if the data is not very organized.

2 Spreadsheet mechanisms to catch up on function defining

One big criticism of spreadsheet languages is the lack of the ability to define re-usable functions. Whereas, in most of programming languages, we can define a function and repeatedly call the function by its name. Here is a very basic example in Python:

def area(a, b): return a * b
print(area(1, 2)) # output: 2
print(area(3, 4)) # output: 12
print(area(5, 6)) # output: 30

Then, what are the mechanisms in spreadsheets to achieve the similar goal?

2.1 Making and holding many similar calculation bodies (i.e., spreadsheet formulas) is easy in spreadsheets, e.g., by copy-paste

Although spreadsheets do not permit of defining a re-usable function, we can enumerate the application of the same function to different input values. To do so, we can "brutally" repeat a similar calculation body (i.e., a spreadsheet formula) with slightly different arguments in many cells. Surprisingly, it is not inconvenient, for the following two reasons:

  • Making similar spreadsheet formulas with slightly different arguments is very easy by copy-paste. We can either use keyboard shortcuts like Ctrl+C and Ctrl+V, or by mouse drag and drop; spreadsheets will smartly adjust argument references while keeping the same formula structure.

  • As stated above, spreadsheets can hold and arrange many many formulas without disturbing users; Many formulas having a same structure stored in a range do not bother users.

As an example, we can quickly make the following spreadsheet to calculate the area of many rectangles, though no function like the previous area in Python is defined.

Enable Editing

2.2 Re-using existing cell values is very convenient in spreadsheets

Although users cannot define a function to re-use in spreadsheets, users can well re-use the result of a piece of code (i.e., the value of the ending cell of a formula chain). Because spreadsheets display well many cells at a glance and permit of a smooth navigation among cells, users can easily identify useful cell values and to write formulas referring to them. It is such handy in spreadsheets that users might over-use it and create complex relationship among spreadsheet cells.

2.3 Instant evaluation of a spreadsheet is helpful

In some cases, the instant evaluation may remedy the lack of the ability of defining functions. Sometimes the purpose of re-using a function is to compare outputs of different inputs: we give a set of inputs to a function and get a set of outputs. Spreadsheet users may have another way to undertake this experimentation: rather than putting side by side outputs from different inputs, they try different input values and can quickly see the output thanks to the seamless experience of the evaluation.

{todo: add references of Simon Python Jones and Sestoft, with a special format}

3 Lack of higher-order functions

When you tell a computer scientist that "spreadsheet is a functional programming language", their first reaction would be "does spreadsheet support higher-order functions?". Well, the answer is "No".

A higher-order function is a function that accepts other functions as parameters and/or use a function as the return value. A part of programming languages such as Pascal and C don't support higher-order functions; a part of programming languages (especially functional programming languages) such as OCaml have higher-order functions. Python permits of higher-order functions, here is an example:

def square(x): return x * x
def cube(x): return x * x * x
def strange(f, x): return f(x) + f(x * 2)
print(strange(square, 1)) # output: 5 (which is 1*1+2*2)
print(strange(cube, 1)) # output: 9 (which is 1*1*1+2*2*2)

By contrast, in spreadsheets, we cannot define such a powerful strange; we have to respectively define one column to enumerate strange(square, ...) and another column to enumerate strange(cube, ...).

Enable Editing

4 Favorite patterns of spreadsheets

Regardless of lack of mechanisms such as function defining and higher-order functions, spreadsheets are good at some particular programming patterns, especially over arrays. That's why it is very handy to achieve many tasks (which need to be undertaken by higher-order functions in other languages). We list some of these patterns in this section.

4.1 Mapping over arrays

One strength of spreadsheet programming is to iterate over arrays, especially to simultaneously map over several arrays. It is equivalent to map, map2, etc. in other languages. For example, formulas in Column D simultaneously map over 3 arrays in the following spreadsheets:

Enable Editing

Mapping over several arrays may not be primitive in some programming languages, we can implement it by ourselves. For example, we implement map3 as follows in OCaml:

# let map3 f a0 a1 a2 =
let l = Array.length a0 in
let e = f (Array.get a0 0) (Array.get a1 0) (Array.get a2 0) in
let result = Array.make l e in
for i = 1 to l - 1 do
let e = f (Array.get a0 i) (Array.get a1 i) (Array.get a2 i) in
Array.set result i e;
done;
result;;
val map3 :
('a -> 'b -> 'c -> 'd) -> 'a array -> 'b array -> 'c array -> 'd array =
<fun>
# let a0 = [|0; 0; 0; 0; 0|]
and a1 = [|1; 2; 3; 4; 5|]
and a2 = [|2; 3; 4; 5; 6|]
and f e0 e1 e2 = e0 + e1 + e2;;
# map3 f a0 a1 a2;;
- : int array = [|3; 5; 7; 9; 11|]

In Python, the map() function is quite flexible, and can task several lists as parameters.

l0 = [0,0,0,0,0,0]
l1 = [1,2,3,4,5,6]
l2 = [2,3,4,5,6,7]
result = list(map(lambda x, y, z: x+y+z, l0, l1, l2))
print(result0) # output: [3, 5, 7, 9, 11, 13]

4.2 Fold over arrays

Fold (also termed reduce) refers to a function that takes a combining operation, recursively applies to a data structure, and builds a return value. A strength of spreadsheets is to fold over several arrays. Additionally, it shows a final return value, as well as all the intermediate values. For example, calculating bank statements with credits and debits turns out to be a fold operation:

Enable Editing

We can implement this fold over 2 lists in OCaml as follows:

# let myFold2_left f init a0 a1 =
let e = f init (Array.get a0 0) (Array.get a1 0) in
let result = Array.make (Array.length a0) e in
for i = 1 to (Array.length a0) - 1 do
let e = f (Array.get result (i-1)) (Array.get a0 i) (Array.get a1 i) in
Array.set result i e;
done;
result;;
val myFold2_left :
('a -> 'b -> 'c -> 'a) -> 'a -> 'b array -> 'c array -> 'a array = <fun>
# let init = 100
and a0 = [|6;6;6;6;6|]
and a1 = [|1;1;1;1;1|]
and f x e0 e1 = x + e0 - e1;;
# myFold2_left f init a0 a1;;
- : int array = [|105; 110; 115; 120; 125|]

In Python, we can combine reduce() and zip() to achieve this:

from functools import reduce
init = 100
a0 = [6,6,6,6,6]
a1 = [1,1,1,1,1]
def f(l, cur):
r = l[-1] + cur[0] - cur[1]
l.append(r)
return l
print(reduce(f, zip(a0, a1), [init])) # output: [100, 105, 110, 115, 120, 125]

4.3 Other array operations like Filter, Sort and Unique

In previous versions of Excel, it was complicated to undertake array operations such as filter, sort and unique; one way was to use primitive functions such as INDEX, MATCH and COUNTIF to construct Ctrl+Shift+Enter (CSE) array formulas. As a consequence, array formulas are long, slow, and hard to understand. An exciting news is that, Microsoft Excel gradually introduces a set of new functions called dynamic array functions including FILTER, SORT and UNIQUE. These operations (and their combination) over multiple arrays then become very easy in Excel. The follows is an example where we find unique elements from a list and sort them. We can either achieve this by chaining two steps and two columns, or by one combined formula with one column:

Enable Editing

In Python, there are the sort() and sorted() functions to sort a list; there is no built-in function to achieve "unique", though there are many ways to construct it.

list = [3, 2, 1, 1, 2, 3, 5, 7, 8, 6]
unique = [x for i, x in enumerate(list) if i == list.index(x)]
print(unique) # output: [3, 2, 1, 5, 7, 8, 6]
print(sorted(unique)) # output: [1, 2, 3, 5, 6, 7, 8]

5 Conclusion

We show that core concepts of functional programming languages can be found in spreadsheet languages. Moreover, functional paradigm (e.g., consistent and auto-adjusted dependency) is well incorporated and protected by spreadsheet software like Microsoft Excel; respecting functional paradigm during spreadsheet development has many advantages including reducing possible spreadsheet errors. Therefore, we categorize spreadsheet languages as a functional programming language.

As illustrated by this article, the approach of seeing Excel as a programming language and spreadsheet files as programs is fundamental. Now, we could explain behaviors of Excel in vocabularies of computer science. Going further, we could improve the use and development of spreadsheets by inspiring from existing concepts, works and tools in other programming languages. This perspective and approach is the inspiration and foundation of our 10 Studio toolset.

Fundamentals of Spreadsheet Languages

October 29, 2019
Tie Cheng

Tie Cheng

CEO @ Matrix Lead

Welcome to this blog. This blog is created with Docusaurus 2 alpha.

def square(x): return x * x
def cube(x): return x * x * x
def summation(f, n): # summation accepts a function as parameter
total = 0
k = 1
while k <= n:
total= total + f(k)
k = k + 1
return total
print(summation(square, 3)) # output: 14 (which is 1*1+2*2+3*3)
print(summation(cube, 3)) # output: 36 (which is 1*1*1+2*2*2+3*3*3)

上面这个例子,在Excel里单是实现summation的循环就已经不可能了。

Hello!

Lorem Ipsum
hahahahahahaha
hahahaha

lailailailailai

Lorem Ipsum

lailailailailai

content

Tip

try

Function

Let's look at a very basic pattern in Excel:

Excel sample: 一个B1=A1*A1, B2= A2*A2, B3= A3*A3, etc. square function

To achieve the same thing, we use List.map function in OCaml as follows:

# List.map;;
- : ('a -> 'b) -> 'a list -> 'b list = <fun>
# let square x = x * x;;
val square : int -> int = <fun>
# List.map square [1; 2; 3];;
- : int list = [1; 4; 9]

We could notice that, this basic pattern of formulas is not less good than the OCaml one. Alough the caluclation body is repeated several times, it is very hands-on to achieve it by copy-paste.

Excel does not provide a way of possibility to give a name to that calculation body.

The calculation body is repeated in Excel, as many times as necessay. It is not tiridous to do so because of copy-paste Visually, it is not annoying, because it is embedded in a cell. As a result, what is shown is only the input and output, which is all we want to see.

Regarding the last point, it is what people critic most.

List.map (fun x -> Float.pi . (float_of_int x) . (float_of_int x)) [1; 2; 3];;

circlearea

Higher-order functions

f(f(x))

# let ho f x = f(x) + f(x+1);;
val twice : ('a -> 'a) -> 'a -> 'a = <fun>
# twice double 3;;
- : int = 15
# twice square 5;;
- : int = 61
int double (x) { return x * 2 }
int twice_double (x) {
return double (double (x + 1) + 1) + 1
}

不好的话,讲出来、show出来哪里不好也是贡献

Excel sample: 一个B1=A1*2, C1= A1*2+(A1+1)*2, B4=A2*A2, C4=A2*A2+(A2+1)*(A2+1) etc.

In Excel, 当然你可以把长的公式分成几步做,但是本质上,像C一样,还是得重写function,而不能用higher-order这种方便的定义方法。而且C里body的写法要更简单,因为用到了double,而Excel中只能用一些现有的值。

reuse function body V.S. reuse intermediate data/results

1种办法是完全expand formula,另一种办法是reuse intermediate data/results

我怎么感觉是用data呢?

map, map * 2 to a list, map 平方 to a list

List.sort is a function whose argument has function.

In functional programming, a function is defined as a "first order citizen". This means it includes all the properties generally available to any other element, such as the possibility of being affected to a name, returned as a result or passed as a parameter.

# let double x = 2 * x
and square x = x * x;;
val double : int -> int = <fun>
val square : int -> int = <fun>
# List.map;;
- : ('a -> 'b) -> 'a list -> 'b list = <fun>
# List.map double [1; 2; 3];;
- : int list = [2; 4; 6]
# List.map square [1; 2; 3];;
- : int list = [1; 4; 9]

也许要说明一下Excel做list function很natural,做map很natural

# let greater x y =
if (x > y) then 1
else if (x < y) then -1
else 0
and lesser x y =
if (x > y) then -1
else if (x < y) then 1
else 0;;
val greater : 'a -> 'a -> int = <fun>
val lesser : 'a -> 'a -> int = <fun>
# List.sort;;
- : ('a -> 'a -> int) -> 'a list -> 'a list = <fun>
# List.sort greater [1; 2; 3; 4; 5];;
- : int list = [1; 2; 3; 4; 5]
# List.sort lesser [1; 2; 3; 4; 5];;
- : int list = [5; 4; 3; 2; 1]

上面这个在Excel中用Sort就可以实现了。 不用c语言,用python做比较

others

Python有map3, map4。。。但比Excel缺点是不够visiable,不够平铺(你想挑什么就挑什么)flat, expose to you:

Polymorphism

Polymorphism, or parametric polymorphism, means that one function can operate on several types. For exemple, the bigger function below can operate over 2 integers or over 2 strings.

# let bigger x y = x > y;;
val bigger : 'a -> 'a -> bool = <fun>
# bigger 1 2;;
- : bool = false
# bigger "b" "a";;
- : bool = true

Because of the weak-type system of spreadsheets, applying one code to different types of data is never a problem.

anonym function, reduce functions are data, functions as input, functions as result/ return a function, store function as data

Tie's blog

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/excel-1.pdf

How come is Excel a functional programming language?

define a function

https://en.wikipedia.org/wiki/Currying currying不太相关

https://ocaml.org/learn/taste.html function composition: let (>>) f g x = g ( f x) 也不太相关,因为没有function的定义,所以把2个function名字弄一起也莫名其妙。如果说excel中的composition,一种是在1个formula中做很多事,不要intermediate results了,一种是把intermediate results列出来。更像是data的composition.

closure, function closure 没什么关系

Why do we love spreadsheets?

October 10, 2019
Tie Cheng

Tie Cheng

CEO @ Matrix Lead

(to be released)

Why are functions and formulas beloved?

October 9, 2019
Tie Cheng

Tie Cheng

CEO @ Matrix Lead

(to be released)

How will dynamic arrays change spreadsheet modelling?

October 5, 2019
Tie Cheng

Tie Cheng

CEO @ Matrix Lead

(to be released)

Excel's skill maps

October 3, 2019
Tie Cheng

Tie Cheng

CEO @ Matrix Lead

(to be released)

A little bit of history of spreadsheets

October 2, 2019
Tie Cheng

Tie Cheng

CEO @ Matrix Lead

(to be released)

Other topics

October 1, 2019
Tie Cheng

Tie Cheng

CEO @ Matrix Lead

(to be released)