No doubt you have read many articles decrying the use of spreadsheets due to the myriad of disasters that have resulted from their use.
We have a long history of ranting about their weaknesses when it comes to complex reporting tasks like generating a set of financial statements/Comprehensive Annual Financial Report. We have even written articles about how to avoid these mistakes, and about how to determine (calculate) if a spreadsheet is the right tool for a given task. What we have never done is present the technical reasons that spreadsheets are ill-suited for the job and why database-driven applications are far superior. That is what we will do in this article.
Before we dive in, we should state a couple of items (before the hate mail comes pouring in):
- There are many spreadsheet applications & database-driven tools. Clearly, we cannot discuss each one and their differences in detail. Instead, we will speak generally about typical, common features found in products like Microsoft Excel/Google Sheets and Microsoft SQL Server.
- Spreadsheet applications have gotten increasingly sophisticated, have remained inexpensive and easy to use, which has naturally lead to their use for more complex tasks.
- If left to choose between using a spreadsheet or paper & pen, we would recommend the spreadsheet.
- Some of the problems people have with spreadsheets occur because they fail to follow best practices. Thus, some of the highly publicized disasters reflect a skill/experience problem, not a software problem.
- We are active proponents that just because you CAN use a tool for a task, does not mean that it is the BEST tool for the job.
Similarities
Spreadsheets provide a rapid, powerful, generic way to manipulate numbers in any form or fashion your heart desires. The intersection of columns and rows allows you to enter data or formulas to generate a wide range of results.
A database-driven application extends the concept of a spreadsheet. A very rough comparison is that each “tab” in a spreadsheet workbook is equivalent to a database table, with each table consisting of rows and columns. A database is a collection of these tables and other metadata, just like a spreadsheet is a compilation of sheets. To be clear, we are using the term database here as it is the common expression.
Differences
Spreadsheet |
Database Driven Application |
|
Data (Transactions) |
User(s) performs data entry on the same spreadsheet that houses all the data. |
Two major approaches, both of which prevent the user from directly affecting previously recorded data:
|
Calculation(s) |
User(s) adds calculations/formulas to Cells that reside with the raw data. |
Calculations are performed as needed, either on a Report or Form. |
Formatting |
Like calculations, formatting is usually applied by the user to a selection of Cells. |
Formatting is applied to the Report that pulls data from one or more Tables. |
Validation |
When used (infrequently in our experience) user(s) adds to a selection of Cells. |
Applied centrally to all actions involving data (importing, deleting, adding new records, etc.). |
Data Integrity |
No certainty that data added was added correctly & completely. |
Generally speaking, considerable technology is dedicated to ensuring integrity. For more details see this article on ACID. |
Scalability |
Has improved over time but is not designed to handle massive volumes of data. |
Depending on the particular database technology selected, capable of handling millions of records. |
Audit/Logging |
Secure logging of essentially any & every item determined valuable is possible. |
|
Security |
Can be set at the Cell level but roles & groups are not supported. |
Extensive ability to assign roles & groups and specify permissions at any level. |
Spreadsheet weaknesses
The argument for continuing to use spreadsheets is obvious - they are inexpensive and easy to use at a basic level.
The problem with them is considerable however and can be seen in a common theme running through the comparison above.
The major technological problem with the spreadsheet is the collapse of many elements into one sheet or even down into one cell. In a spreadsheet, the "data table" IS the report. Entering new "transactions" means being able to edit existing rows. This makes the large, complex spreadsheet very fragile. Make a mistake entering a formula and everything breaks. It is like a tower of Jenga blocks waiting to fall.
Some of you may be typing a furious email to me right now saying "You can solve these issues in a spreadsheet. Just design it like a database and use all of its features, like macros." It is true, some of these issues can be handled IF you design your spreadsheet correctly and have extensive IT and programming skills; essentially, if you build an application inside your spreadsheet.
Thus the argument for continuing use of spreadsheets for complex tasks is not compelling for 3 reasons:
- Untrained operators - It is great to say "develop the spreadsheet properly" but consider the typical spreadsheet developer. They are often accountants/finance professionals. Because anyone can use spreadsheets & they are so easy, people with no particular development training build incredibly complex documents. Consequently, these documents rarely conform to the best practices that an experienced developer would insist upon.
In general, errors seem to occur in a few percent of all [spreadsheet] cells, ...In programming, we have learned to follow strict development disciplines to eliminate most errors. Surveys of spreadsheet developers indicate that spreadsheet creation, in contrast, is informal, and few organizations have comprehensive policies for spreadsheet development.
Ray Panko - What We Know About Spreadsheet Errors - Too little control - Spreadsheet tools are just not designed to enforce or in some cases even support best practices. One example of this is that many elements are combined/mixed together within each spreadsheet, sometimes even within one cell. Recall, spreadsheets are a generic tool designed to fit as many use-cases as possible. Thus, the user must:
- Know they need a given feature (like validation),
- Have the skills (which might include programming) to build it,
- Spend the time to build it,
- Technical Inability - Even if you have tremendous IT skills some actions are very difficult bordering on impossible within a spreadsheet:
- Managing large data sets - if you need to deal with millions of records, spreadsheets can not help you.
- Logging - It can be very difficult to determine which user added
row 7 in our example. Or if you are dealing with thousands of rows, which rows were added and when? - Documentation - if the person who created the spreadsheet leaves the organization, who else in the organization will know how to use what is essentially a custom application? There is no easy way to direct subsequent users where to start etc. other than some notes that you hope are updated and then followed by subsequent users.
Database weaknesses
All the power, validation and speed of a database driven solution comes at a cost. Depending on the exact database selected, the costs can be significantly higher than a spreadsheet and specialized hardware may be required.
Further installing, configuring and maintaining a database takes specialized knowledge.
Finally, one of the benefits of the database-driven application is one of the challenges- it constrains the user. In contrast to the Jenga metaphor of the spreadsheet, the database application can be thought of like a Rubiks Cube. The user can only do certain things in certain ways. Unlike the spreadsheet, there are rules about how the system can be used.
The Bottom LineWe have previously developed a detailed calculation to determine exactly when to use a spreadsheet but if you want to keep it simple, here is the cheat sheet for choosing between the two tools:
Choose a spreadsheet (and follow best practices in how you build and use it) if the task is:- simple,
- only used by one or two people,
- using a small set of data which
- will only be used once.
- complex,
- used by multiple people,
- standardized,
may house a moderate to large data set and- reoccurs.