Visualizing the Chicago Cubs via Amazon QuickSight
May 14, 2018
If you’re interested in visualizing your data in easy-to-read graphs, Amazon QuickSight may be your solution. Obviously, Amazon has great capabilities with big data, but sometimes even if you only have “little” data, you just need a dashboard or some other way of displaying that content. This post shows an example of how you can display data to tell a compelling story. For the purposes of this blog post, we’ll try to determine why the Chicago Cubs are Major League Baseball’s favorite team.
Creating an Amazon QuickSight Account
Amazon QuickSight can be accessed through your existing AWS console, but when you sign up for an account you’ll notice that it redirects you to a new portal. Log in to your AWS Console and look for QuickSight.
You’ll notice that QuickSight requires you to sign up for a QuickSight account. So this is a bit different from the other services that AWS provides.
You’ll pay for this on a monthly basis once you create an account. This isn’t an on-demand type of service where you pay only for what you use. There are two editions to choose from when you create an account, Standard and Enterprise, and the details for those are found in the screenshot below. This blog post uses Standard because I’m an Architect on a budget!
Once you pick your edition, you’ll set up your QuickSight account information. Give it a name and a notification address, and select your region. You can also allow QuickSight to look at your data across Redshift, S3, etc. so that you have data sets that can immediately start helping you.
Once you’ve got your account set up, you’re ready to start uploading data.
The Data Sets
Now, before you can visualize anything, it has to be based on some data. Duh, right? Amazon will give you some sample data sets and analyses to use right out of the starting gate, so you can see what’s possible. To do anything really useful though, you’ll want to run the analysis on your own data sets.
QuickSight gives you a few options for data sources, such as social media or public data sets. The data sets portal shows you an example list of data sources that can immediately get you started. How cool is it that you can connect QuickSight to your GitHub repo to get some analytics about what’s happening?
For the purposes of this post, I’ve decided to upload my own file, which I downloaded from data.world. This file includes information about MLB games from 2016. I’ve uploaded the CSV file through QuickSight’s interface, but you can also upload TSV, JSON, or XLSX files, as well as ELF/CLF for log files.
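If you want a quick look at a CSV file before handing it to QuickSight, a small local check can help. The snippet below is only a sketch: the column names and the two sample rows are invented placeholders, not the actual layout or contents of the data.world file, and `summarize_csv` is a hypothetical helper name.

```python
import csv
import io

# Invented placeholder rows, NOT taken from the real data.world file.
# The column names are assumptions made purely for illustration.
SAMPLE_CSV = (
    "date,away_team,home_team,attendance\n"
    "2016-04-04,AAA,BBB,30000\n"
    "2016-04-05,CCC,AAA,\n"
)


def summarize_csv(csv_file):
    """Return the column names, the row count, and blank cells per column."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    columns = reader.fieldnames or []
    blanks = {
        col: sum(1 for row in rows if not (row[col] or "").strip())
        for col in columns
    }
    return columns, len(rows), blanks


columns, row_count, blanks = summarize_csv(io.StringIO(SAMPLE_CSV))
print("Columns:", columns)
print("Rows:", row_count)
print("Blank cells per column:", blanks)
```

To run it against a real file, you would pass `open("your_file.csv", newline="")` instead of the in-memory sample.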
Once the data has been uploaded, you can do your fancy visualizations.
Visualizing your Data
In the QuickSight console, you can click the “New analysis” button to get started.
The first step to creating an analysis is to select the data set. This should be the data that you just uploaded or configured in the previous section.
After the data is imported, you can select the “Create Analysis” button.
Once you’re in the analysis dashboard, you’ll see that on the left-hand side you can drag and drop your fields, filter the fields, and change the visualization types for your analysis. Adding fields to your analysis is as easy as dragging and dropping them onto the graph.
Now you can carve up your data in any way that you see fit, but I chose to look at some interesting data to see how beloved my Cubbies really were. To start, I looked at the attendance for the away teams. My theory was that home attendance would tell you a lot about which teams people liked, but it has two problems: the size of the stadium factors in, and so do the social aspects of baseball that have nothing to do with the teams playing, like going to the ballpark for a business event. The attendance for the away teams might be a better representation of who the fans wanted to see play. I’m sure that no one here doubts the results of that visualization.
Partially so I could use another visual type, and partially to put a common misconception to bed, I looked at the wind direction for the Cubs home games. It’s often been said that Chicago Cubs hitters have a huge advantage because the Chicago winds carry the baseball out of the park (for a home run) more often than at other teams’ parks. But if we look at the wind direction per game, you’ll see that most of the time the wind is blowing in from right field, or moving from right to left, which means that many times it would be harder to hit a home run at Wrigley Field. If you’re a left-handed pull hitter, you’ll likely have to hit into the wind most days. Maybe Wrigley isn’t a hitter’s park after all? I’m just kidding, it is a home run park due to the power alleys, but this graph still seemed fun.
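For readers who like to see the arithmetic behind a visual, the away-team comparison comes down to grouping games by away team and averaging attendance. Here is a minimal sketch of that calculation outside QuickSight. The column names (`away_team`, `attendance`), the team labels, and the numbers are all invented placeholders, and averaging is my assumption about the aggregation; QuickSight can just as easily sum or count the field.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder rows, NOT real 2016 attendance figures.
# Column names are assumptions about how such a file might be laid out.
SAMPLE_CSV = """away_team,home_team,attendance
AAA,BBB,30000
AAA,CCC,40000
BBB,AAA,41000
CCC,BBB,20000
"""


def average_attendance_by_away_team(csv_file):
    """Return (team, average attendance) pairs, highest average first."""
    totals = defaultdict(int)
    games = defaultdict(int)
    for row in csv.DictReader(csv_file):
        value = row["attendance"].strip()
        if not value:
            continue  # skip games with no recorded attendance
        totals[row["away_team"]] += int(value)
        games[row["away_team"]] += 1
    averages = {team: totals[team] / games[team] for team in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranking = average_attendance_by_away_team(io.StringIO(SAMPLE_CSV))
for team, average in ranking:
    print(f"{team}: {average:,.0f}")
```

Averaging per game rather than summing keeps a team from ranking higher simply because more of its road games made it into the file.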
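The wind chart is just a count of games per wind direction, filtered to one home team. As a rough sketch of the same tally done by hand, assuming hypothetical column names (`home_team`, `wind_direction`) and invented placeholder rows rather than the real file:

```python
import csv
import io
from collections import Counter

# Invented placeholder rows, NOT real game data. The column names and
# the wind direction labels are assumptions made for illustration.
SAMPLE_CSV = """home_team,wind_direction
AAA,in from RF
AAA,in from RF
AAA,out to CF
BBB,out to LF
"""


def wind_direction_counts(csv_file, home_team):
    """Count games per wind direction for one home team."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row["home_team"] == home_team:
            counts[row["wind_direction"]] += 1
    return counts


counts = wind_direction_counts(io.StringIO(SAMPLE_CSV), "AAA")
for direction, games in counts.most_common():
    print(f"{direction}: {games}")
```

In QuickSight, the equivalent is a filter on the home team plus a count of records grouped by the wind direction field.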
Also, if you have a bunch of graphs that you want to display at once, you can add multiple visuals and then share that out with your team. Here I’ve added three visualizations.
After that, I can share them with whomever I’d like.
Creating a Story
One of the coolest things about QuickSight is the ability to tell a story. You can add multiple visualizations and have them played in a specific order so that together they tell a story. As you see below, I’ve taken three different visualizations and saved them as a story.
If I play them, they show up like a slide show where my reviewers just click “Next” to go from one slide to another. If I’ve done a great job with this, my reviewer should notice that the Chicago Cubs are clearly the world’s favorite Major League Baseball team.
Summary
OK, the Cubs are great, but the real point of this post was to get you familiar with just a few of the things that you can do with Amazon QuickSight. Being able to visualize your data sets quickly can be a huge boost to many organizations. Are you profitable? Are you reaching your social media audience? Whatever your needs, QuickSight can show you some quickly digestible information about your data. Set it up with your data sets once and check in often to see how things change, or build it once for a report and share it with your teams. What will you do with this service?