Appendix A: Quick Guide to Spreadsheets
The spreadsheet is probably the simplest of the data journalism tools, and the one most adept at working with numbers. The purpose here is to demystify the topic of spreadsheets for anyone who finds them intimidating, to alert you to the useful things a spreadsheet can do, and to give a quick summary of how to use the basic features. We begin with some definitions:
A spreadsheet is a computer program that allows you to do simple or complex calculations on a series of numbers at the same time. Whenever values in the spreadsheet change, any calculations dependent on those values will be redone automatically (unless you set them to be done manually).
Microsoft Excel® is a brand of spreadsheet. If you have another brand, you will find that most of the information in this section also applies to your program, albeit with slight tweaks.
A worksheet is a page with a grid, made up of rows, columns, and cells. It is similar to a city map, with letters running across the top and numbers down the left side. Just as the letters and numbers on maps allow you to find specific streets, the ones on a worksheet allow you to find locations on the worksheet called cells. Depending on the version you are using, an Excel sheet can have between 256 and 16,384 columns and between 65,536 and 1,048,576 rows. The lower limits apply to Excel versions up to 2003. For those who like really big numbers, that works out to between 16,777,216 and 17,179,869,184 cells per worksheet. You can open multiple worksheets and each one has the same number of cells.
The cell pointer is moved around the screen by the mouse. When you click on a cell, its border is highlighted to indicate it is the active cell, meaning the one in which you can add or change information. When you type information into a cell, anything that was in it before is erased. If you merely want to edit an entry, double-click on the cell, or edit it in the formula bar.
The formula bar is a horizontal bar across the top of the spreadsheet screen that shows the contents of the active cell, whether a formula, value, or label.
A column runs down from the top of the worksheet page and is made up of individual cells. Each column is labelled with a letter, starting at A and running through to Z, and on through AA, AB, and so on. Rows run across from the left. A row is also made up of individual cells.
A cell is a rectangular area in which text (labels), numbers (values), and formulas can be typed. Cells are known by their coordinates; e.g., cell A1 is at the intersection of column A and row 1.
Labels are any characters typed into a cell that aren't purely numeric. Excel normally orients labels to the left of the cell. Labels are usually located along the far left, and across the top of the worksheet, to identify what all those numbers are. A spreadsheet without labels is mostly useless.
Values are numbers you type into cells. You can do calculations using values. Values are usually oriented to the right side of the cell. If you ever see a number oriented to the left, it usually means that somehow the spreadsheet is seeing it as a label or as text, and you won't be able to use it for calculations. This can happen sometimes when a database file is imported into Excel or in some cases when you paste in data from the web. It also happens if an apostrophe is entered before a number in a cell. The apostrophe won't be visible, but it tells the program to treat the number as text.
Formulas are used to do calculations on one cell or a series of cells. For instance, the code =A1+B1+C1 is a simple spreadsheet formula. The equal sign before the cell references tells the program that you are now entering a formula. As you enter this text, it will appear in the formula bar at the top of the screen. You can alter it there later by putting the cell pointer into the cell where you typed the formula. The results of formulas are automatically updated as soon as you change any of the values on which the formula is based, unless you have set up the program for manual recalculation. You would do this only on an extremely large worksheet where recalculation of thousands of values would slow the computer down.
Functions are used to make it easier to write formulas. Examples include AVERAGE, MEDIAN, and SUM (these are explained in the following section).
What a Spreadsheet Looks Like
We are using Excel 2013 for the PC for this tutorial. It is similar to Excel 2007 and 2010. Earlier versions of the PC version look a little different, as does the Mac version, but the basics are all the same. Consult your version's help system for details. Microsoft also offers a free, online version of Excel as part of its Office Online offering. It works almost exactly the same way as what we'll show you here. Other spreadsheet programs such as the free Open Office, Corel Quattro Pro or the online Google Spreadsheets (free), also work in much the same way.
The first illustration shows an Excel worksheet containing information on the population of census metropolitan areas in Canada in 2010 and 2013, in thousands. The data come from Statistics Canada. Using these data, we are going to walk through the Excel functions most often used by journalists. If you would like to follow along in your own copy of Excel, you can copy the data directly from Statistics Canada's website (the URL was http://www.statcan.gc.ca/tables-tableaux/sum-som/l01/cst01/demo05a-eng.htm when this was written). You can copy and paste data tables from most web browsers directly into Excel. With some browsers, you may need to use Excel's "paste special" command and choose text as the type of data to paste. Experiment to find the best way with your browser/Excel combination.
In the illustration, you can see the standard layout of a spreadsheet worksheet. Excel 2013, like Excel 2010 and 2007, uses a series of tabbed "ribbons" instead of traditional toolbars to house icons that access various program features. In this short tutorial, we will use the Home and Data ribbons exclusively. Your copy of Excel probably won't have all of the ribbons seen here, especially the last three. But don't worry—you'll have all the features you'll need.
Addition is probably the most common thing you will do in a spreadsheet, whether with a few numbers at a time, as here, or with thousands, or tens of thousands. So the first thing we will do is add up the total population of the census's metropolitan areas.
To add up a range of cells, we use this formula.
The colon between the cell references tells the program that we are dealing with a continuous range of cells, starting with the first one and running through to the last. If the cells aren't in a continuous range, separate them with commas:
You can also combine this to have several continuous ranges, separated by commas, as in =SUM(B3:B5,B8:B11). You type the formula where you want the answer to appear, as in this illustration:
Notice that the formula also appears in the formula bar across at the top of the screen. When you press <Enter>, the result appears in the cell.
The autosum button
There is a useful shortcut called the autosum button. By highlighting a cell underneath or to the left of the row or column of figures to be added, and then clicking on the symbol that looks like a letter E (it is actually the Greek letter sigma) in the "editing" area of the home ribbon, you can quickly add up the range of cells.
Copying a formula
From here, we could type the equivalent formula into the next cell to the right, to work out the total population for 2007. Instead, though, we will simply copy the formula by selecting the cell where we wrote it, then move the mouse pointer to the lower-right corner of the cell until a little "+" symbol appears. We will then hold onto the left mouse button and drag the "+" across to the right to copy the formula. When we let go of the mouse button, the formula will be automatically copied, as in the next illustration:
There are many more spreadsheet functions in addition to summing. For example, "AVERAGE" will calculate the average among a range of values (e.g., AVERAGE(A2:A26) will calculate the average value in the range of cells from A2 through A26). "MEDIAN" will calculate the median. There are dozens more, including some mighty impressive statistical calculations.
After adding, the next most likely thing you will do with a spreadsheet is calculate percentages. Journalists love percentages because they are familiar and readers understand them.
In a percentage calculation, the numerator and the denominator measure the same thing, be that people, bushels of tomatoes, or shipping containers. The simple rule to follow is, divide the number that represents only some of the thing by the number that represents the total number or quantity. Sometimes this is referred to as dividing the part by the whole.
So, if you want to find out what 50 is as a percentage of 100, you divide 50 by 100. (Don't worry about multiplying by 100 as you were taught to do in school; the spreadsheet takes care of that for you, as you will see.)
To begin, we will insert a column after the existing column for 2010. To do this, we click at the top of an existing column to highlight it, as in this illustration:
We then right click at the top of the column and pick "insert' from the context-sensitive menu, as in this illustration:
As soon as we let go of the left mouse button, the new column will appear:
To actually do a percentage calculation, we select the cell where we want the answer to appear, then enter the formula for calculating a simple percentage. In this case, we might type =B2/B36.
B2 contains the 2006 population of St. John's, Newfoundland and Labrador, and B36 the total population of census metropolitan areas in 2010. (We are actually going to add a "$" character to this formula, typing =B2/B$36. The "$" is an instruction to Excel to "lock" that part of the calculation in place, regardless of what else moves around on the worksheet. We explain this technique more fully later on.)
When you press <Enter>, the answer appears:
Note, however, that the result is not displayed as a percentage. To correct this, we format the column as a percentage by left-clicking on the column header C to highlight the whole column, and then clicking on the "percentage format" button in the "number" area of the home ribbon. It looks like a % sign, so it's easy to find.
The result looks like this:
Now, moving the mouse to the lower-left corner of cell C2 until the little "+" appears, we drag down the column, still holding the left mouse button. When we release the button, Excel will fill in the remaining percentages to the bottom:
If you would like to see more decimal places, click on the "add decimal places" button in the "number" area of the home ribbon. The next step is to fill in the same percentages for 2013.
Doing this is simple. We cut and paste the formula by clicking on the cell containing the formula we want to copy, copying it using the keystroke combination <ctrl> C, then selecting the cell where we want to put the formula, and pasting it with the keystroke combination <ctrl> V. This is the result:
Now, using the same skills as before, we can fill that result to the bottom. The worksheet, with column headers added to the percentage columns, ends up looking like this:
With just a few keystrokes, we have found out how each major metro centre has fared in terms of its share of the population of all major metro centres in 2010 and 2013.
Most other calculations in a spreadsheet follow this pattern. Do the first calculation, then fill or paste the formula into adjacent cells, rows, or columns.
One thing to be conscious of is how spreadsheets copy and fill formulas. The cardinal rule to remember is that as the copied formula moves across or down, the cells that it adds up (or multiplies, divides, etc.) move in lockstep.
So, the formula =SUM(A3:B3) written in cell C3 will become =SUM(A4:B4) if you copy or fill the formula to cell C4. This feature makes it easy to rapidly do the same calculation on multiple rows or columns. At times, however, you need to freeze the cell or cells on which a formula is applied. This is where the dollar sign comes in.
Use the "$" to lock or fix a cell used in a formula. If you place it before the "C" in C18, then the formula will always refer to column C, while the row can move. If you place it before the "18", the reverse applies. If you lock both, the formula will act on cell C18, no matter where you copy it.
You will use this feature most often when you need to constantly divide by the same total. Typically, there are many subtotals to calculate, each one in a different cell. But there is only one total. This is why we used the dollar sign in our first percentage formula in the population table.
One other thing journalists will often want to calculate is a percentage change. This is a little more tricky than calculating a simple percentage, but still easy enough. You will have an "old" number and a "new" one that represents the changed value. The rule is as such: work out the difference between the old and the new value (always subtract the old from the new, even if the new value is smaller), then divide that difference by the old value. Now, format it as a percentage.
The next empty column is column F, so you enter the formula in cell F2.
Type in the formula =(D2–B2)/B2 as in this illustration:
This subtracts the 2010 population from the 2013 population for St. John's (showing us the difference between the numbers), then divides the result by the 2010 population, yielding the percentage change. The first part of the formula is placed in parentheses to instruct Excel to do that part of the calculation first.
Now, we fill the result down to the bottom of the column, as in the next image, remembering to format the result as a percentage:
You may notice the #DIV/0! error message in cell F35. The error is generated because Excel tried to divide the empty cell by the total. Since we can't divide by 0, Excel lets us know with a slightly cryptic error message. You should clear the contents of the cell.
You can now sort by the results to yield information for a story.
Along with being wizards at calculations, spreadsheets are adept at sorting things out. So if you want to organize a column of numbers into a list running from largest to smallest, or smallest to largest, you can get the program to do it for you in a couple of easy steps. Doing this is very common, for example, when you want to know which government department spends the most money (and which the least), or who makes the biggest salary. Since it is such a useful operation for journalists, we will look at this one in a bit more detail here.
The key rule to observe in sorting spreadsheets is to ensure that all adjacent columns are sorted at the same time. Otherwise, they won't match up properly with each other and you will have a big mess. To ensure all rows are sorted together, it is crucial to ensure that the entire block being sorted is selected or highlighted. The only exception would be any totals at the bottom and any labels at the top that you want to remain as they are. In the following illustration, you can see how not to do it. Several columns have not been selected, and if we try to sort the worksheet, they won't be sorted with the others. Look carefully at what happens in this before-and-after sequence:
The result when sorted on the 2013 population:
As you can see, it's a mixed-up mess. To make sure this doesn't happen, ensure all of the area to be sorted is selected. Luckily, Excel makes this easy because as soon as you click on the sort button, the entire area of the sheet from the headers down to the first empty row and across to the first empty column will be selected. The program stops at an empty row or column because it assumes that after an empty row, as an example, will come a row of totals, and usually we don't want to sort totals.
To begin, put your mouse in the top row of the sheet, and click on the large Sort button on the data ribbon, as in this illustration.
The contiguous area of the sheet will be sorted, and a sort dialogue box will appear:
We can now sort as many columns as we like (Click "Add level" to add another sort column), the first one taking precedence over the second, and so on. Choosing "ascending" order will sort from A to Z and smallest to largest. "Descending" order will do the opposite. In this case, we will sort by column F in descending order. (It is important to choose "My data has headers" if there is a row of labels across the top, and "no header row" if there is not.) This will sort everything according to the change in metro population, from the largest change to the smallest.
Here's the result:
There is an even simpler method, which the help files will suggest you use. You can choose a cell in the column you want to sort and click one of the "quick sort" buttons in the sort & filter area of the data ribbon. It works just as well, but while you are learning, it is often best to do it the manual way so you can better see what is happening.
We have covered only the most basic features of Excel. A brief glance at the software instructions will show you that there is much more spreadsheets can do, including rudimentary data analysis. For a full guide, including details on how to analyze and filter data, create pivot tables, and perform other advanced functions useful to journalists, see Vallance-Jones and McKie's Computer-Assisted Reporting, A Comprehensive Primer, from Oxford University Press.