Higher Education

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide.

Visualizing Data with Google Fusion Tables and Tableau Public


The ability to take data from the web, do some additional calculations and/or visualize them online is one of the most powerful applications in journalism today. Some say it's the very future of journalism. But how to do it? The tools are getting ever simpler and Google Fusion Tables and Tableau Public are two of the most popular solutions.


Fusion Tables

The Google Fusion Tables application is described by Google as "experimental," and has been so ever since it was introduced. Nonetheless, it is a powerful way to present information in a way that allows the audience to interact with your story, in essence creating his or her own take on your data. For this exercise, we will make a Google Fusion Tables map of the percentage of the vote that went to John Tory in each ward in the 2014 Toronto municipal elections.

To make a Fusion table map, you will need two components. One is a map in the KML format used by Google Maps, and the other is a spreadsheet file or delimited text file of data that is summarized by the same geographical divisions as used on the map. In our example, the KML file is a map of election wards and our data are a spreadsheet of election results by election ward.

This is an important rule: the data you wish to map have to be organized into the same geographical units as the boundaries on the map and there must be a field in each table of data that has identical information, so as to make the join between the two.

There are three types of map files, points that represent things such as buildings, lines that represent roads, pipelines and other linear constructs, and polygons—multi-sided, enclosed shapes that represent areas such as election wards, provinces, city boundaries, etc.

You can download the files used for this exercise from.

To begin making a Fusion Table visualization, you first need to upload each of your files. Open your Google Drive account. If you don't have a Google account, you'll need to create one at www.google.com.

To begin the upload process, click on the "create" button in the Drive interface. Those with the latest updated version of Google Drive should click on the "new" button. In the dialogue that opens, choose "Fusion Table." If you don't see the Fusion Table option, click on "Connect More Apps," find Fusion Tables, and add to your Google Drive.

You will probably have a data file on the hard drive of your computer. Click on "choose file" and navigate using your computers file system to the location of your file. The file can be a spreadsheet, such as an Excel spreadsheet (xls or xlsx), a text file, or a KML map file. In the illustration, we have chosen an Excel file of votes for John Tory in the 2014 Toronto mayoral election.

Note: you can also use a Google spreadsheet file that already exists in your Google Drive account. This illustration shows the dialogue box with the Tory votes file chosen.

Click on the "next" button to open up a dialogue that previews the data in your table.

As you can see, the data include a "ward" field, which is the unique identifier of the election ward, a decimal column representing the percentage of votes for Ford in that ward, and a numbered ID field. The leftmost numbered field is added by the Fusion Tables application. At the top of the dialogue, you can indicate which row contains the field names; in this case it is row 1, the default. When you are satisfied that the data are arrayed correctly, click "next." The dialogue that follows is probably the most important, because in it, you set attributes that will be seen by those with whom you may share the data.

As you can see, you can change the table name, indicate whether other Fusion Tables users can download your data, attribute the source of the data, provide a link to the data source, and provide a narrative description of the data. The latter may be most important to you, as it will help you remember the contents of your table if you return to it months or years later.

Once you are satisfied, click Finish, and the data will be imported into a new Fusion Table. Within a short time, you should see your uploaded table.

In this case, the percentage of the vote is in an ugly-looking decimal format. It is easy, however, to have Fusion Tables display the data in a number field in a desired format. First, click on Edit>Change Columns.

As in the following image, choose the field you wish to alter, choose the number format you desire (in this case percentage), and click "save."

Your column will now be represented in the chosen format.

You have now completed the upload of your first table.

In order to create a visualization of the Tory votes on a map, you will also need to upload a map file in KML format. The procedure for doing so is exactly the same as we followed for the first table. The only differences will come when you reach the file preview stage and once the file has uploaded.

At the preview stage, you will notice a column called "geometry." It contains the attributes necessary to display your map. It is the column farthest to the right in the following illustration.

Similarly, once you have finished uploading your KML file, you will notice that a third tab appears at the top of the screen, labelled "Map of geometry." This indicates that Fusion Tables recognized your table as a map and that you can see it as such.

Click on the "Map of geometry" tab to see your map displayed.

The map will appear with a red background for the polygons. This is the default. We will change it a bit later.

The reason Fusion Tables are named such is that you can "fuse" two tables together to produce a map that displays your data. To begin this process, click on the Google Drive File menu, and choose "Merge." Navigate to the file you would like to use.

Click "next."

The dialogue that follows is crucial. Here, you will establish the "join" that will fuse the two tables together. In this case, it is two fields, one in each table, that both have the unique ID number of each ward.

Make sure that the two fields you are connecting on have matching data. If they don't, your Fusion Table merge won't work. Click "next."

In the dialogue that follows, you can pick the columns from the two tables that you wish to include in the new, fused table. For this example, we will keep them all. Remember, that if you enable downloading of the table, any fields you include will be available for downloading by users.

When you click "merge," your new, "fused" table will be created.

In the next box, click "view table." The new table will contain fields from both of the base tables.

Notice that the table now has the geometry field from the map, and the Percent field form the data table. It's a perfect fusion, so to speak.

If you click on the Map of geometry tab you will notice that the map looks the same as the original KML file. But since the new table contains data from both base tables, you can now change the look of the table to display the data as colours on the map.

To begin, choose "Change feature styles" and choose "Polygons fill color."

You can also alter the colour and width of the boundary lines between the election wards. Other options are available for point or line files.

For our map, we are going to colour of map based on gradients of values in the column of percentages of voters who voted for John Tory.

Click the "gradiant" tab, the "Show a gradiant" radio button, and the column you wish to display, in this case Percent. Click on "use this range" to use the range of values present in the column. Your dialogue will end up looking something like this:

Click "save" and the map will be recoloured to show what percentage of voters in each ward voted for John Tory, in a continuous gradient from the lowest percentage (lightest green) to the highest (darkest green).

You can also change the look of the information windows that pop up when you click on any of the individual wards.

There are many more customizations that are possible. You can also divide data into "buckets," which are ranges of data, each being given a different colour. You can also add a legend.

The final step is to embed your new Fusion Tables visualization in your website or blog. To do so, change the sharing options for the merged table to either "public" or "anyone with the link." Do this by clicking "share" in the upper right-hand corner of the screen.

Then, click Tools>Publish to find the embed link for your file. You can change the width and height of the embedded map, in pixels.

The embed link provided by Google is in an iframe. This will work on many websites; just paste the link into the html of the webpage or in the raw html window of your content management system (e.g., a WordPress page or post). Due to the individual peculiarities of various CMS systems, it is not possible to cover all of the possible issues here. Consult with the people who oversee your web service. For a whole lot more information about Fusion Tables, visit Google's Fusion Tables help site.


Creating a visualization with Tableau Public

Tableau Public has become an extremely popular way to visualize data. It is a free, cloud-based version of the Tableau Desktop application widely used for enterprise visualization and analysis of data. It is capable of extremely sophisticated, multi-panel visualizations and can be extensively customized.

The free, public version is aimed particularly at journalists, and Tableau Software is extremely helpful to journalists, and offers good documentation.

Tableau is a sophisticated platform, and this discussion is necessarily brief. However, it will give you a sense of what is possible to allow you to explore more deeply on your own.

Tableau Public is available in a PC and a Mac version. We will demonstrate using the Mac version, but the differences are minimal, beyond the normal operating-system-driven differences.

When you open the program for the first time, you will see a screen like this one:

Data to be visualized in Tableau need to be in raw data format, with one record per row. Data cannot be cross-tabulated (such as in an Excel pivot table). The following is an example of an acceptable raw data format:

The data show parking tickets in the City of Saskatoon. In raw data format, each row includes all of the data for one ticket and there are no summary columns. If you do happen to end up with data that have summary columns, you can reshape it into a form useable by Tableau Public using a plugin available for the PC version of Excel. You can get the plugin.

Tableau Public is able to import data in a plain, delimited text file, an Excel spreadsheet file or, in the PC version, an Access database table. The desktop version, which costs in excess of $1,000, can open a great many more formats, but for most journalists, Tableau Public's available formats will do just fine.

To begin the import, or connection as it is called in Tableau, click on Open Data button, click on the file format desired, and navigate to your file.

If there is more than one worksheet in your Excel workbook, you will be prompted to choose the worksheet to import. When the import is completed, click on "Go to Worksheet," and this window will open.

This screen may seem intimidating at first, but it is pretty easy to get used to.

It is important that your data be properly classified by the program. On the left side of the screen, you will see panes for what Tableau calls "dimensions" and "measures." Roughly translated, this means categories of information (dimensions) and numbers (measures). If you find that a number (a measure) has been misclassified as a dimension, just drag the field from the dimension to the measures area (the opposite applies as well).

Like Excel and other spreadsheet programs, Tableau uses the term "worksheet" to refer to a screen that contains data. However, where it differs is that Tableau also uses "dashboards," which are special screens that collect several worksheets together to create an interactive visualization.

For this tutorial, we will create a visualization that includes a table of parking tickets and a treemap of the same tickets. You can download the data used in the tutorial.

To begin creating the first worksheet, select the "offence" dimension and the "number of records" measure. The latter measure is created automatically by Tableau when you import the data and represents a count of all of the rows of data.

In the "Show Me" area on the far right side of the screen, you will see that several of the available visualization types go from being greyed out to coloured. This means they can be selected for use. By selecting one, you will automatically create that visualization. This "show me" feature is a great way to start understanding how the Tableau interface works.

Choosing the simplest table option in the top left-hand corner of the show me pane creates a table showing the count of each type of parking ticket:

To create a second worksheet, click on the "new worksheet" tab directly to the right of the "sheet 1" tab at the bottom of the screen. At any time, you can right click on the title of a worksheet (on the tab) and rename it.

Using the same two fields, offence and number of records, we can use the show me feature to create a treemap, which shows boxes of different sizes, proportionate to the number of tickets of each type, as in the next illustration:

The treemap shows, rather dramatically, how many expired meter tickets are issued in Saskatoon, as compared to other types of tickets.

Tableau Public's most impressive ability is to combine two more worksheets into what it calls a new dashboard.

To begin, click on the "new dashboard" tab at the bottom of the screen. That will open a blank dashboard, like this:

To begin creating your dashboard, drag the worksheet names from the top left hand corner of the dashboard screen into the blank central area. If you choose "floating" as the layout for new objects in the dashboard, you will be able to arrange the elements to your liking. If you choose tiled, as we are going to do, they will automatically position themselves. To change the size of the dashboard, in pixels, change the values in the lower left-hand corner (the values seen there will always be for the currently selected part of the dashboard screen).

The following image shows the dashboard with our two elements added in tiled arrangement.

The next step is to add a filter. A filter allows the user to manipulate the dashboard so elements change according to the user's preferences. The simplest way to create a filter is to click on the small arrow at the top right hand corner of one of the worksheets within the dashboard.

That will bring up this dialogue:

Click on Quick Filters, and choose the field you wish to filter on:

A new filter will be created, as in the next illustration.

The default type of filter is checkboxes, but this can be changed by clicking on the small downward arrow at the top right-hand corner of the filter, and choosing another style.

The table can now be filtered down to any one value in the offence field.

The real magic of Tableau is the ability of the program to apply a filter globally, so it affects not just the worksheet in which the filter was created, but all the others as well.

Again, click on the small downward arrow at the top right of the filter tile, and choose Apply to Worksheets>All Using this Data Source.

The filter will now apply to all of the objects in the dashboard, in this case to the table and to the treemap.

Before completing your dashboard, you can customize it. For example, by clicking on Dasbhoard>Show Title on the main menu, you can add and edit a title for the worksheet.

When you have completed your visualization (and this is a very simple one!), you can save your file to the Tableau Public server. In this free version of Tableau, you cannot save to your desktop.

To save the file you will need a free Tableau Public Account, and you will be prompted to create one if you don't have one.

First, choose Save to Web in the file menu:

You will then be presented with the sign-on/create new account dialogue:

Once you are logged on, you will be prompted to choose an online name for your visualization.

If you wish to have the sheets in your worksheet appear as tabs, click the checkbox. Then, click Save, and you will soon see a preview of your finished visualization.

As with Fusion Tables, Tableau provides both an embed link to put in an email or other message, and an embed code to embed your visualization in a website. Choose the one that works best for you, and check with your web service provider if you have problems.

Tableau Public can do much more than we have shown here, and as well as creating worksheets using the Show Me feature, you can create them manually by manipulating dimensions and measures. Tableau Software has excellent online documentation if you wish to explore further.


Utility Software

Data journalism calls on a great many software tools to help prepare data for analysis. Some of the most popular tools, and their uses, are listed here:

  • Cometdocs. An online service for converting tables of data in PDF files into data that can be analyzed using a spreadsheet or other data-manipulation program. (free, with paid accounts for more frequent conversions)
  • Outwit Hub. A program that can be used to harvest non-structured data from the web, so it can be analyzed. (Paid software with a free trial)
  • OmniPage. An optical character recognition program used to convert image files of data into editable spreadsheet files. (Paid software)
  • Abbyy Finereader. Another OCR program. (Paid software). Used to "clean up" inconsistent or "dirty" data before analysis can begin.