My first computer had 256 megabytes of memory. At the time, it was more than enough. Today, the phone in my pocket has four times that, and is using 290 megabytes of it while it is sitting idle, waiting for a phone call, or for me to press a button.
But let's start with 256 megabytes. A megabyte contains 1024 kilobytes. A kilobyte is 1024 bytes. A byte is 8 bits. So, 256 megabytes is 256 × 1024 × 1024 × 8 bits, or 2,147,483,648 bits.
Why bits? Bits are the smallest elements of state a computer knows about, tiny electronic switches that can indicate one of two values. That means that a computer with 256 megabytes of memory essentially has upwards of 2.1 billion "switches" that can turn on and off, "yes" or "no," 1 or 0.
Hang on. It gets exponentially worse. The "state" of a computer isn't just a single bit, it's every bit that has the potential to affect a result. For example, let's say you had a computer with two buttons (labeled A and B), and a single light bulb as its output. The computer's function is to only light up the bulb when you press both buttons at once, but not in any other case. If you wanted to map the computer's possible states, you might show it something like this:
<table>
<tbody>
<tr>
<td><strong>Pressed A?</strong></td>
<td><strong>Pressed B?</strong></td>
<td><strong>Bulb Lit?</strong></td>
</tr>
<tr>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
<tr>
<td>Yes</td>
<td>No</td>
<td>No</td>
</tr>
<tr>
<td>No</td>
<td>Yes</td>
<td>No</td>
</tr>
<tr>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
</tbody>
</table>
So you have an input (the buttons) and an output (the bulb). And for these inputs, it's easy to see that the bulb is only lit up when you press both buttons.
So, two inputs, with two possible states, can generate four possible result states. I'll save you some time and me a couple of paragraphs and just say that there's a way to calculate possible states based on number of inputs (and their possible values). For inputs with only two possible values, the number of possible result states is 2^N, where N is the number of inputs (2^2 = 4 result states, 2^3 = 8 result states, etc.)
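If you want to sanity-check that count, here is a minimal sketch in Python (my own illustration; the function and names are invented, not part of the button-and-bulb machine) that enumerates the combinations and applies the bulb rule:
<pre><code># A minimal sketch (Python, my own illustration) of the two-button machine:
# enumerate every input combination and apply the bulb rule.
from itertools import product

def bulb_lit(pressed_a, pressed_b):
    """The bulb lights only when both buttons are pressed."""
    return pressed_a and pressed_b

for a, b in product([False, True], repeat=2):
    print(f"A={a!s:5} B={b!s:5} -> lit={bulb_lit(a, b)}")

# For N two-valued inputs there are 2**N combinations to consider.
for n in (2, 3, 10):
    print(n, "inputs ->", 2 ** n, "combinations")
</code></pre>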
Let's say my first 256 megabyte computer only had a single light bulb as an output (no screen, no speakers, just a light). And I had no idea how to light it up. And let's also say the only way I could figure out when the light will be lit is to randomly change pieces in memory (randomly flipping switches, if you will). How many tests would I have to perform to figure out exactly the conditions that the light would flash? And not just the first success... I would have to figure out all of the combinations of bits that corresponded to either a lit or not lit result (I'm apparently very thorough, in this case).
Using exactly the calculation above, with the bits as the "inputs" and the bulb's state as the result, the number of combinations I would have to test would be 2^2147483648.
I don't own or have access to a computer that can compute the value of that number. But that is a large number. And that was 11 years ago.
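You can, however, estimate how big that number would be without ever computing it, just by counting its decimal digits. A rough sketch, again in Python and purely for scale:
<pre><code># A rough sketch (Python): estimate the size of 2**2147483648 without
# computing it, via its decimal digit count = floor(bits * log10(2)) + 1.
import math

bits = 256 * 1024 * 1024 * 8            # 2,147,483,648 bits of memory
digits = math.floor(bits * math.log10(2)) + 1
print(digits)                           # about 646 million decimal digits
</code></pre>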
Granted, this is a purely mathematical exercise. If you extend the possible "inputs" to things like CPU registers, interrupt signals, peripheral device states, video output memory, etc., it gets even worse. So how is it we can ever be confident that when we try to get a computer to do something, the one result it returns is the correct one? The damn thing has 2.1 billion handles and buttons and switches. And even assuming that we got a result, how do we ensure that something we did, either through ignorance or honest mistake, didn't lead us to the wrong answer?
Probability dictates that errors must happen. Given that magnitude of complexity and the chances for mistakes, it's amazing that people choose to deal with computers at all. And yet they do.
And how they choose to do so is an interesting thing.
<h3>Banes</h3>
A lot of computer people have a <em>bane</em>. Their kryptonite, or the color yellow. That is: one thing, for which no amount of effort, planning, or intimate knowledge will ever keep said thing from being a pain in the ass.
For a long time, my <em>bane</em> was printers. Printers are horrible, unreliable, costly, unwieldy pieces of shit; the detestable, undying glue between the pre-computing days and the present. They are made to work briefly, but then designed to wear, corrode, and become useless within the span of months, requiring constant maintenance, upkeep, and repeated investment in inks and paper.
Printers persist because they solve a problem that would otherwise be time-consuming, costly, and error-prone. And despite their problems, they are still vast improvements over the printing press or manual reproduction. When they are working, they are an impressive synthesis of the virtual and the physical. However, in the presence of a computer display system (like we have had everywhere for decades), it hasn't made much sense to print something, no less fax or copy something. Since the first day my father brought home a handheld scanner, I have been hopeful that one day I would never need to see a piece of paper again (think: Star Trek). Even today, I refuse to own a printer. Unfortunately, because of that, I get strange questions from my mortgage refinance people asking why I don't have access to a fax machine. Email has been around for 30 years, but it's still not legally binding enough in most cases. <em>(grumble)</em>
Watching people interact with printers and printer-related functions is interesting because pieces of paper, like computers, are effective methods of storing information. Of course people write on paper: words, symbols, numbers, your basic standards of literacy. But people also draw, take notes, make diagrams, and find ways to record ideas outside the realm of committing spoken language to alphabet characters. When it comes to printers, how people treat the information in the computer versus what they can put to paper shows how they feel about the computer, the paper, and generally how they think. Prints everything? Distrust and disdain of computers. Prints nothing? Computers are where they live.
<em>(My desk is right next to a printer. It's something I think about a lot. More towards how paper and computers accidentally change the way people think, just by making people abide by their "rules." More on that later.)</em>
Thankfully, my career has steered me away from printer support. And towards other <em>banes</em>. In fact, one in particular. One that is subtle, but insidious. Ubiquitous, yet you almost never realize it's there, rarely think twice upon using it, and miss it terribly when you don't have a tiny green icon associated with your file. As someone who studied for years learning about coding, software design, and technology, and makes their career out of building custom, reliable, functional systems for people, I think it's the lowest, basest form of how a computer should be used. It is the McDonald's of productivity software. It is the Helvetica of programming and automation.
Alas, Excel is probably one of my biggest, most effective, and successful competitors for what I do. And I'm not entirely sure why.
<h3>These truths we hold to be self-evident...</h3>
The world absolutely runs on spreadsheets. And by spreadsheets, I mean Excel. Excel is the Kleenex of the computing and business world, maybe even more so than Word is for written documents. It is a household name that has become synonymous with the type of product it is rather than the brand. Everyone who has had to work with a computer has had to work with Excel. Schoolchildren, sitting in their expensive computer labs with state-of-the-art computing technology, are first taught typing, then Word processing, then Excel. Entire inventories, analysis tools, data repositories... entire businesses are run out of Excel. Even if a business doesn't want to run on Excel, and forbids storage or usage of Excel and its files, its employees still exchange .xls files over email, over shared drives, etc., to communicate things that are more easily expressed in terms of raw data or mathematical formula. Even applications that are written for a specific purpose, outside the capabilities or practicalities of Excel, are written with a feature to <em>export</em> your results <em>to</em> Excel.
Everyone knows what Excel does. They understand its very basic, simple concept of "cells" and how you use them. Likely, a random person on a given street would be able to walk up to a computer terminal and recognize, if not use, a spreadsheet if they had to. And even if not, simply being shown the application has a strange, intuitive nature to it, where after only a slight learning curve, one can feel comfortable working with it, even if not in the most efficient or organized manner.
And even though it's based on simple concepts, the potential depth of complexity for an Excel spreadsheet can be dizzying. The raw data aside, there can be near-limitless numbers of functions, dependencies between cells, formatting rules, macros/scripts to be run in the background, data validation and formatting to be performed... not to mention the potential for integration ("When I edit cell XY, it changes row XYZ in a separate database..."). Every person I've talked to in engineering or scientific pursuits can talk about some "crazy" spreadsheet that solves some critical equation that takes a college degree to understand. You could say "it's just math," and you'd be right, but the fact that nearly every case cited involves teaching Excel how to perform such math blows my mind.
Conceptually, Excel maintains only perhaps one rule: a cell can be blank, or have a value, or be an equation that evaluates to a value. If you went through public school and took any math, you've got enough of a grasp on the syntax required to evaluate equations. Otherwise, you're pretty much free to do as you please. You can type or copy in data. You can scroll until you're bored (or at the end). And if you're not too keen on referring to cells by A3 and B7, you can just click your mouse and that cell will be populated into the equation you're entering. In fact, maybe the most tedious thing about spreadsheets off the bat is having to refer to data by its column rather than a more human readable name.
It is a simple, effective tool that does one thing and does it well. As such, it should be among my pantheon of "awesome things" I have discovered in my years dealing with computers. And yet I call it a <em>bane</em> to my career.
Why? First, because it is a place where data goes to die. Second, in all but the most prototypical or rarely well-designed instances, Excel wastes a fantastic amount of time. Third, its design ignores and exacerbates problems that have been solved elsewhere in computer science for years. And fourth, it teaches people (including myself) the wrong things about how a computer actually works.
And consequently, nothing is more frustrating to me than having to argue through and work around people's confirmation bias about the above four.
<h3>The Event Horizon of the Black Hole</h3>
Before 2003, there were some pretty laughable limitations on Excel spreadsheets. Excerpt from <a href="http://office.microsoft.com/en-us/excel-help/excel-specifications-and-limits-HP005199291.aspx" target="_blank">here</a>.
<table>
<tbody>
<tr>
<th><strong>FEATURE</strong></th>
<th><strong>MAXIMUM LIMIT</strong></th>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>Worksheet size</td>
<td>65,536 rows by 256 columns</td>
</tr>
<tr>
<td>Column width</td>
<td>255 characters</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
</tbody>
</table>
Why laughable? When it comes to the types of data, and the amount of data computers have stored in the last 20 years, this isn't much (see my "magnitude of possible state" discussion earlier). But consider more practically how often things exceed 65,536 (rows):
<ul>
<li>There are probably more than 65,536 people in your town. (Anchorage, at ~300,000 > 65,536)</li>
<li>There are 86,400 seconds in a day (not unreasonable if you, say, are testing/recording something once a second).</li>
<li>There are more than 400,000 people born every day. There are more than 200,000 people who die every day.</li>
<li>In 7 out of 12 months last year, there were more than 100,000 requests to the idkfa server.</li>
</ul>
<div>Or exceed 256 (columns):</div>
<div>
<ul>
<li>At least 50 countries in the world mark the date of their formation more than 256 years ago (such that you could not store things like their GDP, population, etc., one per column for each row of the spreadsheet).</li>
<li>If you wanted to scan documents written in Chinese or Japanese, and wanted to count how many times each character was used per document (not uncommon for search indexing), you would never be able to store a number per column for each of the thousands of logographs in their languages.</li>
<li>If you wanted to track cards used each day in a Magic: The Gathering tournament, you could never record it using one column per card. (In fact, you would run out of columns before you even got through the first <em>set</em> of cards, no less the 14 sets that were released afterwards.)</li>
</ul>
<div>The point being: at first glance, Excel may seem like it stores "enough" out of the box. And its ability to store more than one worksheet, or the ability to have more than one spreadsheet file, can be used as workarounds to exceed this limit. People are content with this because for the foreseeable future it gives them a path to results. You need to generate a report? Grab the latest data, put it into Excel (if it fits), run the wizard, and you're done.</div>
However, from a data management perspective, out of the box there is a very clear and limiting boundary that rears its head immediately if you do something even slightly complicated. With Excel 2007 (I believe), these limits were extended to a little over 1 million rows and 16,384 columns, but the limits still exist, even if your horizon is pushed further out. And that means that unless you can say, without a doubt, that your data set will never grow larger, or become more complicated than what is possible to store in Excel, you will simply postpone the inevitable dark day when you have to compromise to make your data set "fit," or spend inordinate amounts of time and money to figure out the "right" way to do your task.
</div>
Speaking of making data "fit," there's also the problem that not all data fits on a two-dimensional grid. And when it doesn't, the things you have to do to make it "fit" are super unpleasant.
Take, for instance, the case of a genealogy spreadsheet. You can have a list of people:
<div>
<table>
<tbody>
<tr>
<th><strong>Name</strong></th>
<th><strong>Age</strong></th>
</tr>
<tr>
<td>Person 1</td>
<td>26</td>
</tr>
<tr>
<td>Person 2</td>
<td>55</td>
</tr>
<tr>
<td>Person 3</td>
<td>60</td>
</tr>
</tbody>
</table>
</div>
And let's say Person 1 is Person 2 and 3's offspring. How do you record that? You could start with this:
<div>
<table>
<tbody>
<tr>
<th><strong>Name</strong></th>
<th><strong>Age</strong></th>
<th><strong>Mother</strong></th>
<th><strong>Father</strong></th>
</tr>
<tr>
<td>Person 1</td>
<td>26</td>
<td>Person 2</td>
<td>Person 3</td>
</tr>
<tr>
<td>Person 2</td>
<td>55</td>
<td>?</td>
<td>?</td>
</tr>
<tr>
<td>Person 3</td>
<td>60</td>
<td>?</td>
<td>?</td>
</tr>
</tbody>
</table>
</div>
Not bad. The text "Person 2" and "Person 3" is repeated for each of their offspring (meaning you're storing the same name data twice), but this isn't so bad. But let us say that you wanted to store a number of notes attached to each person (maybe one made each time somebody's entry in the database is updated).
<div>
<table>
<tbody>
<tr>
<th><strong>Name</strong></th>
<th><strong>Age</strong></th>
<th><strong>Mother</strong></th>
<th><strong>Father</strong></th>
<th><strong>Notes</strong></th>
</tr>
<tr>
<td>Person 1</td>
<td>26</td>
<td>Person 2</td>
<td>Person 3</td>
<td>Something something something.</td>
</tr>
<tr>
<td>Person 2</td>
<td>55</td>
<td>?</td>
<td>?</td>
<td>Blah blah blah.</td>
</tr>
<tr>
<td>Person 3</td>
<td>60</td>
<td>?</td>
<td>?</td>
<td>Doo doo doo.</td>
</tr>
</tbody>
</table>
</div>
Still not bad, though, right? You have a note with each person. But what happens when you start adding more notes? Do you just add to the current "notes" cell, adding a new text line?
<div>
<table>
<tbody>
<tr>
<th><strong>Name</strong></th>
<th><strong>Age</strong></th>
<th><strong>Mother</strong></th>
<th><strong>Father</strong></th>
<th><strong>Notes</strong></th>
</tr>
<tr>
<td>Person 1</td>
<td>26</td>
<td>Person 2</td>
<td>Person 3</td>
<td>Something something something.More.</td>
</tr>
<tr>
<td>Person 2</td>
<td>55</td>
<td>?</td>
<td>?</td>
<td>Blah blah blah.More.</td>
</tr>
<tr>
<td>Person 3</td>
<td>60</td>
<td>?</td>
<td>?</td>
<td>Doo doo doo.More.</td>
</tr>
</tbody>
</table>
</div>
But now you can't tell (without reading the note) whether the text in a cell is one note, or many notes. What if somebody doesn't clearly indicate whether something is a separate note? What if you need to parse out the different notes for something? Then maybe you start duplicating rows, but changing the notes.
<div>
<table>
<tbody>
<tr>
<th><strong>Name</strong></th>
<th><strong>Age</strong></th>
<th><strong>Mother</strong></th>
<th><strong>Father</strong></th>
<th><strong>Notes</strong></th>
</tr>
<tr>
<td>Person 1</td>
<td>26</td>
<td>Person 2</td>
<td>Person 3</td>
<td>Something something something.</td>
</tr>
<tr>
<td>Person 1</td>
<td>26</td>
<td>Person 2</td>
<td>Person 3</td>
<td>More.</td>
</tr>
<tr>
<td>Person 1</td>
<td>26</td>
<td>Person 2</td>
<td>Person 3</td>
<td>Even more.</td>
</tr>
</tbody>
</table>
</div>
Now you've solved the "note" problem, but you can't tell whether there are three people named "Person 1" or each row refers to the same person. Note also that we're storing each record for Name, Age, Mother, and Father three times, and duplicating them again each time we add a note. This is another problem we have to deal with.
The issue at hand is called "data normalization." It's the process by which you essentially design a database, by defining entities, their relationships, and making sure you store everything cleanly, efficiently, and without duplication or chance for misidentification.
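To make that concrete, here is a rough, hypothetical sketch (Python, with field names I made up) of the genealogy data above in a normalized form: each person is stored exactly once under an id, and notes live in their own list that points back at a person.
<pre><code># A rough, illustrative sketch (Python, invented field names) of the
# genealogy data in normalized form: each person stored once under an id,
# and notes kept in their own "table" that points back to a person.
people = {
    1: {"name": "Person 1", "age": 26, "mother_id": 2, "father_id": 3},
    2: {"name": "Person 2", "age": 55, "mother_id": None, "father_id": None},
    3: {"name": "Person 3", "age": 60, "mother_id": None, "father_id": None},
}

notes = [
    {"person_id": 1, "text": "Something something something."},
    {"person_id": 1, "text": "More."},
    {"person_id": 2, "text": "Blah blah blah."},
]

# Adding a note never duplicates a person, and every note is unambiguous.
print([n["text"] for n in notes if n["person_id"] == 1])
</code></pre>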
The thing is: if you plan accordingly, you can normalize your data in Excel just fine. If you treat each spreadsheet as an "entity," and find a way to "relate" those entities (either through a macro, or some worksheet-fu), it can be done. The problem is more that unless you set out to maintain good data design to begin with, Excel will not force you to do so. Or even really mention anything about it. To Excel, you need only abide by their rules for cells, and everything else can be happily ignored.
Why is this important? Because without validation and constraints, the humans working with the Excel spreadsheet will get lazy. And when they aren't paying attention, they will introduce errors to the spreadsheet, or put things into the sheet that shouldn't be there. An Excel spreadsheet can't maintain itself (without work ahead of time), and relies on its human users to keep it sane.
And that is why Excel spreadsheets are where data goes to die. Record accuracy will drift, and conformity will turn to chaos, and sooner or later, you will absolutely run out of rows and columns. And when it comes time to heavily use or scrutinize the data, incredible amounts of time have to be spent doing data "cleanup," that would have otherwise been automatic given other toolsets.
<h3>Time Keeps on Slipping Into the Future</h3>
Everyone gets paid differently. There is no question about it, but it's a subject I try to avoid altogether (because it just ends up in fits of jealousy and envy that are hard to break out of).
Why everyone gets paid differently is maybe a little more straightforward of a topic. Quite frankly: certain jobs are more critical to a business function, and other jobs are less. And, in general, a position's criticality is determined by a) how much money a person can make for the company, and b) how much money a person can save the company. Note also that these things aren't mutually exclusive: if you're both profitable and cost-saving, then all the wider your profit margins will be, and the higher likelihood of a better salary.
IT fits weirdly into these two classifications. They are (usually) at once highly skilled and trained, but unless your business is IT, you can't ask them to "engineer a new turbine" or "find a new market for our product." In fact, most IT workers resent being asked to do anything but computer-y things. We are dwarves, digging in our technological mines, building tools, crafting shiny things, keeping the systems working. Even if we can't always make things, we can facilitate, and give a competitive edge to a product. Where something might be done manually, IT might make something automatic. When a question has to be asked of a company's data, IT can make the analysis efficient, and repeatable, and put the answer in your email inbox.
The problem is that <a href="http://idkfa.com/ec/2011/08/05/were-not-all-so-bad/">we're perceived as expensive</a>, particularly for up-front costs. And we're the holders of the keys to magical things like "programming" and "databases" and "tools." And most times it takes us a while to do things (because when you ask for what seems like a simple thing you end up with a massively-expensive, over-engineered widget that does almost what you thought you asked for). We tend to build things to <em>last</em>, and also to <em>scale </em>as more people use them (because we have the expectation that something we build will be used constantly, forever, and by everyone). And if every thing we built went wrong at once, we would be massively overwhelmed, and massively blamed, and so we build so that that can never happen.
But if the problem is big enough, IT people work well, <em>at that scale</em>. We don't build pickaxes. We excavate mines. It's harder for us to work on smaller things, particularly in the realm of the prototypical, and especially when we aren't familiar with the problem set.
Which is where Excel finds an interesting niche. It is simple enough to start with, and powerful enough to accomplish something complex. Let's say an engineer (who, I'll note, is still more expensive than the IT person), needs to prototype a solution to a problem, and needs to see quick results for a fairly small set of test data. She is familiar enough with the data that she can detect input and output errors by manual inspection, and can solve her problem within the span of a couple of hours.
If this is the case, then great. The process doesn't need to be analyzed, formalized, or vetted. If the data is lost, there is no harm done. If the work that went into the spreadsheet was lost, it would be relatively easy for the engineer to replicate her work (and maybe in even less time because she's now familiar with the problem). And if somebody else had to replicate her work, the hope would be that what she was able to accomplish in Excel would be understandable to the next person coming along.
I think the situation above happens all the time. People just need the computer to do something for them, and Excel is a simple way to get that done. That is fine.
The problems occur when something becomes necessary, or popular, or both. Say it goes from one person using it to one hundred. One thousand. Say the number of people maintaining it goes from one, to one hundred, to one thousand. Say the small set of test data goes from hundreds of rows to millions.
Excel may technically still be able to perform as the tool for the job. But as it outgrows its scope, it wastes time. Instead of one expensive engineer maintaining equations, it is one hundred expensive engineers. The parameters and the data set can't be adequately shared or updated, so engineers now have to waste time waiting for others, for data, for updates, or end up working from outdated or incorrect information. Changes to the spreadsheet increase its complexity such that no one engineer can safely modify it. Or the data grows so large that simple workstations can no longer perform the calculations, causing further waits, downtime, and hardware cost. And given the large data set, expensive engineers have to spend time tediously wrangling data, forcing every action to a manual process, introducing further mistakes and errors into the system that had otherwise worked so well when it was smaller.
When the IT dwarf lords see this, they laugh, ask "You built that in Excel?" and return back to their subterranean fortress. When called upon to fix it, they become angry, furious even that they weren't consulted in implementing something so complex and used so pervasively. They writhe at the thought of having to debug Excel equations and macros, of having to handle columns and rows rather than names and symbols, of trying to make sense of the chaos that has become of a simple idea.
And then somebody gets their math out.
The question becomes: what is the absolute minimum amount of effort that can be taken to maximize the efficiency of something that wasn't built to scale correctly? And still maintain business function until that thing gets built? And still be less expensive than keeping things status quo? And how long until that time investment pays off in terms of hours saved by the engineers versus hours spent by the IT people? And is that length of time less than how long it would take to complete the project?
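That last question is just arithmetic once you pin numbers to it. Here is a toy version, with every figure invented purely for illustration:
<pre><code># A toy break-even calculation (every number here is made up):
engineers = 100
hours_lost_per_engineer_per_week = 0.5   # time spent fighting the spreadsheet
build_hours = 800                        # IT effort to build the "real" tool

hours_saved_per_week = engineers * hours_lost_per_engineer_per_week
weeks_to_break_even = build_hours / hours_saved_per_week
print(weeks_to_break_even)               # 16.0 weeks, under these assumptions
</code></pre>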
The story here is only halfway anecdotal. And I'm deeply sad to say: Excel (or whatever terrible, horrible prototype) is usually the winner. Based on my experience with corporate culture, making money will always trump eliminating waste. And it's not to say that the IT solution is overlooked every time, or that we can't be useful in these cases (I don't know how many times somebody has asked me to perform batch operations on a spreadsheet, or data maintenance to clean up years of bad habits and careless keystrokes).
My problem isn't that people's projects grow out of their original scope. That's actually a good thing, in terms of growing customers, or growing users, or growing use of something that makes people's lives easier. My problem is that Excel makes this situation notoriously easy to slip into, and that even if somebody has been bitten by it in the past, they often continue to choose to use and defend Excel to build their tools because they a) don't care to talk to the IT people about better ways to do things, or b) don't care to learn how to do something new themselves because they think it will take too much time.
In the latter case, I'm sympathetic. I have my favorite tools I like to use. I recognize that they may be limited, or may actually be a disservice in some cases (too complex, etc.). But there are also tools with which, in the right spot, given a few good design choices while keeping things simple and efficient, you can tame the near-infinite state machine and do amazing, <em>elegant</em> things. Because the tools I use treat computers for what and how they actually are. They acknowledge the complexity of their nature, and provide the means to address it, allowing one to build solid, well-designed pieces of software.
Excel, in dropping people into its own special microcosm, discards almost all of the lessons learned by more "formal" programming languages, forcing its users into doing bad things to accomplish what they want, and allowing programmatic complexity to collapse upon itself.
<h3>Nobody's right if everybody's wrong...</h3>
A programming language is a set of mathematical and syntactical constructs (as represented by human or machine-readable commands) that unambiguously define a set of instructions and the order in which to perform them.
There's a lot to that statement. In fact, the above definition, its subtleties and eventualities, will probably dominate the intellectual and professional pursuits of the rest of my life. But when it comes down to it, it is saying "a set of instructions." What you do with those instructions is the interesting part, usually because one or very few sequences of instructions will do exactly what you want.
When languages were first designed, they were put together to get around the tedium of hand-coding numerical machine codes (for example, 0xF4 in hexadecimal is the "HLT" (Halt) command to tell a processor to turn itself off) that weren't obvious for human programmers. They were complicated, murder to understand, and prone to extreme error. "Higher level" languages were developed around making that task simpler: making it so programmers didn't have to memorize codes by rote, and abstracting some of the "math" involved with the programming.
Eventually, people figured out the simpler of the next-tier languages. They were useful enough to get around the tedium, and to put together something non-trivial. But when you started getting sufficiently complicated, things started to fall apart again. Adding more features, changing behavior, fixing errors, everything was incredibly unwieldy, and took vast amounts of time to get correct.
So they designed even <em>higher</em>-level languages, those that took the problems of complexity and gave programmers the tools to effectively save them from themselves. These tools enabled programmers to create even more complex systems, and to do so more quickly and safely, and to make their finished products more resilient and robust. Now there are languages that go so far as to refuse to run unless you format your code such that it will be readable by other programmers (<a href="http://en.wikipedia.org/wiki/Python_(programming_language)" target="_blank">I'm serious</a>).
As far as I'm concerned, Excel has never had a "renaissance," the likes of which I've described above where its designers ask "Where does this fall apart when things get complicated? And how can we address those points?" Instead, its pervasive monopoly in the spreadsheet software market has meant that it hasn't needed to/wanted to change in the decades since its inception, despite shining examples of where improvements could be made from its programming language cousins.
For instance, let's say you have a column of data, and another column right next to it with a calculated value based on an equation. You discover that a constant you used isn't quite so "constant," and you want to have the equation reference another cell rather than a number embedded in the equation.
You start with an equation like "=A1+0.1234" in column B.
You want to make it like "=A1+G1", where G1 is the cell that stores the new value for your constant, 0.4321 (or something).
You click on B1, and edit the cell with the new equation. However, you discover that the equation you edited is only updated for cell B1; the rest of the cells have been left alone. You decide to fix it by copying your new equation, dragging your mouse over the cells you want updated, and hitting paste. But you miss the very last cell, and you don't notice. Now, for what should be the exact same function performed between columns A and B, you have a discrepancy.
This violates the <em>"don't repeat yourself"</em> methodology sung from the mouths of the software design gods. If you have to do something more than once, be sure you never <em>write</em> it more than once. This is so that changes and updates to your work are always consistent, and never depend on your manual effort to make sure all instances of behavior are the same (because you absolutely will miss something, says probability).
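Sketched in a programming language (Python here, with names I made up to stand in for the column above), the DRY version writes the constant exactly once, so a single edit updates every derived value:
<pre><code># A toy DRY version of the constant-in-a-column problem (Python, invented
# names): the constant is written exactly once, so one edit updates
# every derived value.
CONSTANT = 0.4321                         # the one place the value lives ("G1")

def derived(value):
    return value + CONSTANT

column_a = [1.0, 2.0, 3.0, 4.0]
column_b = [derived(x) for x in column_a] # every row uses the same rule
print(column_b)
</code></pre>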
Another example of "low hanging fruit" on the "good design" tree is the separation of code and data, which Excel simply leaves to rot. In programming languages, it is a notoriously bad idea to have code that is considered "self-modifying," that is, code that upon execution will change its own code paths mid-run. This is a bad idea because it means that if something goes wrong, you can't be sure that the code you wrote is the code that was actually executed.
With a worksheet in Excel, there is almost no separation. Any set of equations you build are themselves "data" in a cell, and the numbers they return are the immediate evaluations of that code. There is no distinction between an "input" cell and an "output" cell. If you want to see what I mean, take a spreadsheet you deal with frequently, and remove the "=" from the beginning of an equation.
Macros and scripts try to address this by keeping a series of routines and loops in the "background" to be executed when you call them. If you choose to use them, they can afford some "cleanliness" in the separation between your business logic and the raw data you're operating on. However, any convention you come up with is your own, not part of Excel's original design.
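For contrast, here is roughly what that separation looks like by default in an ordinary programming language. This is a hypothetical sketch with invented names, not anything Excel or VBA gives you:
<pre><code># Illustrative only: the data holds nothing but values, and the logic that
# interprets it lives entirely in named functions kept apart from it.
readings = [12.1, 11.8, 12.4, 12.0]       # data: values, never formulas

def average(values):                      # code: never stored inside the data
    return sum(values) / len(values)

def out_of_range(values, limit):
    return [v for v in values if v > limit]

print(average(readings))
print(out_of_range(readings, 12.2))
</code></pre>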
My final criticism for Excel in this context is the concept of "scope." Scope, in programming terms, is a way of letting a computer know "what am I talking about at this point." A programming scope is essentially a moving window of variables you can reference that can include and exclude certain variables when it is convenient. A scope is important, not only so that the computer can know "Oh, you said X here, I know that you mean this X, and no other X like it," but also so you, as the programmer, can keep your scope <em>as limited as possible.</em>
Why is it important to keep a limited scope? It goes back to my discussion about states (remember the 2^2147483648 possible results in my first computer?). If your scope isn't limited, the number of possible "inputs" (that is, variables that are important to what you're doing) exponentially increases your possible outputs, and in turn your chances for error. Limiting your scope by stating "At this point, I will only ever reference X and Y, and nothing else" considerably reduces your chance for error.
Ideally, a programmer will design their programs so that for everything important and complicated they can reduce the number of moving parts required to make it work. If a task is too complicated, or involves too many moving parts, they have the option to divide the task into smaller tasks, with each sub-task having its own, even further limited scope. It may be the case that, for a given task, a programmer needs to refer to something absolutely everywhere (something in a "global" scope), but those cases are limited, avoided, and considered questionable design.
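As a small, hypothetical illustration of a deliberately limited scope: the function below can only ever see the values handed to it, so nothing else in the program can silently change its result.
<pre><code># A hypothetical example of a deliberately small scope: this function can
# only see the two values handed to it, so no far-away "cell" can silently
# change its result.
def taxed_total(prices, tax_rate):
    return sum(prices) * (1 + tax_rate)

print(taxed_total([10.0, 20.0], 0.05))    # 31.5
</code></pre>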
Excel is nothing but the global, flat scope. No matter the depth of complexity of your Excel spreadsheet, you can always refer to every one of the cells in a worksheet's grid, and from anywhere. Some might call this a feature. I call this dangerous. There are no scoping mechanisms, nothing afforded to the person designing a spreadsheet to indicate semantic relevance to the task at hand. Just A1, B2, G234, your memory, and the hope that nothing in your spreadsheet is ever moved.
I have a laundry list of other complaints, but I'm not sure I can explain them very well without obnoxious examples. My hope in including the above is to demonstrate the more egregious violations of programming principles, considering that if I made the same choices in my own programs that Excel exposes to its users, I would likely be fired. No joke.
<h3>Inelegant Solutions</h3>
I implied that Excel could benefit from a renaissance to revisit some of its questionable design choices. That isn't to say that there haven't been attempted successors. Access, the Microsoft Office Suite application which tries to merge relational database design, a scripting/business logic engine, and a user interface generator, is an alright attempt at solving some of Excel's shortcomings. Admittedly, it has its own shortcomings, but it exposes to its user and programmers a set of tools that let them adequately manage a data set (and to allow that data set to manage itself). The IT dwarf lords shy away from it because it limits concurrent users, and doesn't scale well, but at the very least it requires good <em>design</em>, otherwise things immediately fail to work.
Even still, whether there are good or bad things done with Access, they are usually greeted by the same question and subsequent laughter as before: "You built that in Access?"
When it comes to choosing tools, IT people are pretty much assholes. I apologize, but it's sort of our business to know the best way to build something. And the tools you use are important. If you don't <em>have</em> the right tool, we're the ones that <em>build</em> the tool. And we can, of course, be wrong. But if not, it will be perfect.
Which is why it is so absolutely excruciating when the most inelegant of solutions is chosen over a variety of better ones. And it is chosen because it is familiar, not because it is a technically sound decision. And it is chosen because it is assumed that anyone, given minimal experience with Excel, will be able to "work on it" without having to deal with IT or an expensive consultant. And because people have a hard time predicting the scopes of their problems, particularly when they aren't aware of the clear limitations their toolsets put upon them.
Which brings me to the other half of why Excel is my <em>bane</em>: pity.
Pity, because for many people, sitting at a desk, it never gets better than Excel. The more you work with it, the more it changes how you think about data, storage, expression, automation, and code design. And it puts your thoughts into a spreadsheet-shaped box. It is the same dynamic I described between computers, printers, and paper: we're subject to interact with our systems in the way we were taught.
---
We are tool-using animals. We are creative, intelligent, and manipulative. We interact with our world physically and symbolically, each button and each word a type of "handle" for an expected action and reaction and meaning.
I want to think that I will always be willing to learn something new, and useful, and interesting, curious as to the function and use of a given handle. And that I will always be interested in how something <em>really</em> works, not just in how something is <em>supposed</em> to work, and be able to appreciate the difference.
But I don't know. A lot of people like to use Excel.