Verification by Types

Why Should Spreadsheets Be Verified by Tools?

Spreadsheet correctness should be taken seriously, and with the help of automatic tools. There are several reasons:

  • Spreadsheets are error-prone
  • Spreadsheets are carriers of important business information, data and logic
  • Spreadsheets can be big and complex, or made by others; they are hard to review thoroughly.

Verification by Types

Type checking is a classic, popular, and effective method to verify computer programs. It is even systematically applied to programs written in certain languages such as OCaml, C#, etc., which are thus called "strongly typed" languages. However, almost all the spreadsheet languages are "weakly typed". For example, a comparison of a string and a numeric value is permitted (returns True or False), and does not lead to any error. Even though some meaningless operations show errors such as #VALUE, they are not blocking and can be easily disregarded by users. Therefore, our approach aims at applying type checking to spreadsheet programs, to improve their quality.

Types

In our current type system, we class cell values by the following types:

TypeDescription
Emptyan empty cell
Doublee.g., 1,2, 120%, 1,25236E+10
Integere.g., 0, -1, 100
Booleanthe type of TRUE and FALSE
Stringe.g., abc, "abc","123", (a space), "" (two double primes)
Datee.g., 11/08/2019, 14 mars 2012, 14/3/12 1:30 PM
Currencye.g., $1 234,10, -€1 234,10
Anya global type to represent any value and type

Dangerous Typing Rules

We implement a set of dangerous typing rules in Spreadsheet Verificator. The tool verifies all the formulas of a spreadsheet, raises a typing error, if it discovers a dangerous typing rule may be matched during the calculation of the spreadsheet. In the following table, we list a part of our dangerous typing rules. For instance, Boolean + | - Currency | Date means it is dangerous to add or minus a Boolean value to a Currency or a Date. Note that some typing rules may be considered dangerous by some users, but safe by others. It is not possible to make a universally absolute and exhaustive set of dangerous typing rules. If you think certain typing rules are missing in the tool, please contact us.

Rules - Elementary arithmetic
-(Boolean | Date)
+Boolean
Currency + | - | * | / Date
Date + | - | * | / Currency
Currency | Date + | - Boolean
Boolean + | - Currency | Date
Boolean | String + | - | * | / Any
Any + | - | * | / Boolean | String
Date * | / Any
Any * | / Date
Rules - Relational operation
Date < | > | <= | >= | == | <> String | Integer | Double | Boolean | Currency
Currency < | > | <= | >= | == | <> String | Boolean | Date
String < | > | <= | >= | == | <> Boolean | Integer | Double | Date | Currency
Boolean < | > | <= | >= | == | <> String | Integer | Double | Date | Currency
Integer | Double < | > | <= | >= | == | <> String | Boolean | Date
Rules - Arithmetic
SIN | COS | ATAN | ASIN (Boolean | Date | String)
LN | LOG1 | LOG10 | SQRT | EXP (Boolean | Date)
POWER | LOG2 (Any, Boolean | Date)
POWER | LOG2 (Boolean | Date, Any)
Rules - MIN, MAX
MIN | MAX (Boolean | String)
MIN | MAX (..., Boolean | String, ...)
MIN | MAX (..., Currency | Double | Integer, ..., Date, ...)
MIN | MAX ({...; Boolean | String; ...})
MIN | MAX ({...; Currency | Double | Integer; ...; Date; ...})
Rules - SUM, AVERAGE, MEDIAN
SUM | AVERAGE | MEDIAN (Boolean | Date | String)
SUM | AVERAGE | MEDIAN ({...; Boolean | Date; ...})
SUM | AVERAGE | MEDIAN (..., Boolean | Date | String, ...)

Advantages

There are other existing tools and methods to verify spreadsheets. Our Spreadsheet Verificator has several advantages:

  • Spreadsheet Verificator is fully automatic; it does not require users to provide supplementary information to be launched.
  • It is sound. That means we guarantee that none of the dangerous typing rules could ever be matched in any calculation of the spreadsheet. In other words, if the spreadsheet is validated by the tool, the validation applies to all the possible input cell values, as long as their type does not change. This is particularly better and more exhaustive than a manually testing approach, which cannot cover all the possible input instances.

False Alarms

As the cell property that we reason on (i.e., types) is more abstract and less precise than concrete cell values, verification tools like ours involves a notion of precision. In case of being not precise enough, there may be false alarms, which means what the tool raises is not a real error. For instance, the formula =IF(ABS(A1)>=0, 3, false)+5 always evaluates to 8. Whereas, an analysis solely based on types translates the formula to =IF(ABS(A1)>=Integer, Integer, Boolean)+Integer. As it cannot evaluate the value of ABS(A1)>=Integer, it considers the typing error Boolean+Integer is possible. Therefore, one direction to improve verification tools is to improve their precision and make less false alarms.