Baseball: The Sport for Geeks

October 9, 2017 By Eric Shanks

Geeks and sports just don’t mix. Well, that’s not really true, but it seems to be the stereotype I’m accustomed to hearing: if you’re good with computers or like science, then you probably don’t get or don’t like sports. But here’s another crass generalization that I’ll make with absolutely no statistics to back it up: baseball should be the sport that geeks gravitate towards.

It’s a Giant Algorithm

One of the knocks I hear about baseball is that the game is just too slow. It is, in fact, a slower-paced game than basketball, hockey, soccer, or really any sport that uses a game clock. But that’s what geeks should love about it. It’s a game of anticipating what’s going to happen next, and the list of things that can happen on any play is pretty small. They’re all based on “IF / THEN” rules, just like in computer science. Let me give you an example.


(Inning: Top of the 5th, Score: 2-1 in your favor, Location: Home Team, Situation: {Runner on 1st, 1 out, 3 hitter up})

So we’ve got a set of inputs to our algorithm (in real life there is more to consider, but this should suffice for an example). Each position player has a list of “IF / THEN” rules to keep in mind before any play happens. Let’s look at a few of them. The diagram below shows what two fielders would need to consider before a ball is ever thrown to the hitter.

Once the ball is hit and you know what to do, it comes down to the athletic ability to catch, throw, run, and so on. But the decision-making itself can be written as one giant case statement or select statement, as you can see from the diagram above.

Look at the Data Mining

OK, so if the baseball algorithm above doesn’t convince you, how about the crazy amount of data that is captured, stored, and analyzed for every baseball game?

Box Scores = Audit Summary

At the end of every baseball game there is a box score, which totals up everything that happened for each player who appeared in that game. The image below is an example of what a box score looks like. Box scores record each player’s hits and at-bats (hits divided by at-bats gives the batting average) and total up strikeouts, home runs, runs batted in, and several other common statistics. These stats are aggregated every day, so a player has, for instance, a running batting average and home run total for the season. In that sense, box scores serve as a log of the important statistics from each game, much like an aggregated audit log shows you what happened on a computer system over a certain time period.
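
Here is a small sketch of that daily roll-up, assuming a hypothetical `BoxScoreLine` record that holds only a few of a real box score’s columns. The helper names `season_totals` and `format_avg` are also made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BoxScoreLine:
    """One player's line from a single game's box score (a small subset of the real columns)."""
    at_bats: int
    hits: int
    home_runs: int
    strikeouts: int


def season_totals(games: list[BoxScoreLine]) -> dict[str, float]:
    """Aggregate per-game lines into season totals, like rolling up daily audit logs."""
    ab = sum(g.at_bats for g in games)
    h = sum(g.hits for g in games)
    return {
        "at_bats": ab,
        "hits": h,
        "home_runs": sum(g.home_runs for g in games),
        "strikeouts": sum(g.strikeouts for g in games),
        # Batting average = hits / at-bats, computed from the totals, not by averaging daily averages
        "avg": h / ab if ab else 0.0,
    }


def format_avg(avg: float) -> str:
    """Format an average the way box scores print it, e.g. 0.3 -> '.300'."""
    text = f"{avg:.3f}"
    return text[1:] if text.startswith("0") else text


# Three hypothetical games: 2-for-4 with a homer, 0-for-3, 2-for-5
games = [BoxScoreLine(4, 2, 1, 0), BoxScoreLine(3, 0, 0, 2), BoxScoreLine(5, 2, 0, 1)]
totals = season_totals(games)
print(format_avg(totals["avg"]), totals["home_runs"])  # .333 1
```

Computing the average from running totals matters. Averaging the three daily averages (.500, .000, .400) would give .300, but the player is actually 4-for-12, which is .333.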

The statistics don’t stop there, though. Baseball in particular has gone nuts with the statistical information it collects. How does a pitcher do against a certain team? How about against left-handed hitters? How about left-handed hitters during the month of June? How about left-handed hitters on a certain team in the month of June with fewer than two outs and a runner in scoring position? Think of all the data generated over a season.
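
Each of those questions is really just a filter over a log of plate appearances. Here is a minimal sketch, assuming a hypothetical `PlateAppearance` record with made-up data for an unnamed "Team X". Real tracking data has far more fields.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    """One plate appearance against a pitcher (hypothetical, simplified record)."""
    team: str          # batter's team
    batter_hand: str   # "L" or "R"
    month: int         # 1-12
    outs: int          # outs when the plate appearance began
    risp: bool         # runner in scoring position (2nd or 3rd base)
    is_at_bat: bool    # walks, HBP, and sacrifices are not official at-bats
    is_hit: bool


def split_average(log: list[PlateAppearance],
                  keep: Callable[[PlateAppearance], bool]) -> float:
    """Opponents' batting average over the at-bats that match a split filter."""
    at_bats = [pa for pa in log if keep(pa) and pa.is_at_bat]
    if not at_bats:
        return 0.0
    return sum(pa.is_hit for pa in at_bats) / len(at_bats)


log = [
    PlateAppearance("Team X", "L", 6, 1, True, True, True),
    PlateAppearance("Team X", "L", 6, 0, True, True, False),
    PlateAppearance("Team X", "L", 6, 2, True, True, True),    # two outs: filtered out below
    PlateAppearance("Team X", "R", 6, 1, True, True, True),    # right-handed: filtered out below
    PlateAppearance("Team X", "L", 6, 1, True, False, False),  # walk: not an at-bat
]

# Left-handed hitters on Team X, in June, fewer than two outs, runner in scoring position
narrow_split = split_average(
    log,
    lambda pa: (pa.team == "Team X" and pa.batter_hand == "L" and pa.month == 6
                and pa.outs < 2 and pa.risp),
)
print(narrow_split)  # 0.5
```

Every extra condition shrinks the sample. That is why very narrow splits like this one are fun to argue about but based on only a handful of at-bats.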

Teams have even started looking for important metrics that don’t show up in a traditional box score, an approach known as “sabermetrics”. These metrics focus on what matters most for winning games, such as on-base percentage and wins above replacement.
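
On-base percentage is simple enough to show with its standard formula. The sketch below compares it with batting average for a hypothetical season line. Wins above replacement is not sketched here because its calculation depends on modeling choices that differ between the sites that publish it.

```python
def batting_average(h: int, ab: int) -> float:
    """AVG = H / AB."""
    return h / ab if ab else 0.0


def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF).

    H: hits, BB: walks, HBP: hit by pitch, AB: at-bats, SF: sacrifice flies.
    """
    times_on_base = h + bb + hbp
    chances = ab + bb + hbp + sf
    return times_on_base / chances if chances else 0.0


# Hypothetical season line: 150 H, 80 BB, 5 HBP, 500 AB, 5 SF
print(f"AVG {batting_average(150, 500):.3f}  OBP {on_base_percentage(150, 80, 5, 500, 5):.3f}")
# AVG 0.300  OBP 0.398
```

The gap between the two numbers is the point. Batting average ignores the 80 walks entirely, while on-base percentage counts every time the hitter avoided making an out.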

Physics Data

Over the past few years baseball has had a need (really, baseball fans just love stats and want more) to collect even more data, including the physics of what’s happening on the field. Baseball is using technology to track the trajectory of every pitch precisely enough to judge balls and strikes by computer. Once that technology was pretty well set, they started offering viewers statistics such as spin rate (how fast a pitched ball spins, which helps explain how much a curveball will break), exit velocity (how fast the ball leaves a hitter’s bat), and launch angle (the angle at which the ball leaves a hitter’s bat).
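
Exit velocity and launch angle both fall out of the ball’s velocity vector just after contact. This sketch assumes we already have that vector from some tracking system. The function name `launch_metrics` and the axis convention (two horizontal components plus one vertical) are choices made for this example.

```python
import math


def launch_metrics(vx: float, vy: float, vz: float) -> tuple[float, float]:
    """Exit velocity and launch angle from the ball's velocity just off the bat.

    vx, vy are horizontal components and vz is vertical, all in the same unit (e.g. mph).
    Returns (exit_velocity, launch_angle_degrees): 0 degrees is level, positive is upward.
    """
    exit_velocity = math.hypot(vx, vy, vz)   # length of the 3-D velocity vector
    horizontal_speed = math.hypot(vx, vy)
    launch_angle = math.degrees(math.atan2(vz, horizontal_speed))
    return exit_velocity, launch_angle


# A hypothetical ball struck at 100 mph, 25 degrees above level, straight up the middle
angle = math.radians(25)
ev, la = launch_metrics(100 * math.cos(angle), 0.0, 100 * math.sin(angle))
print(f"exit velocity {ev:.1f} mph, launch angle {la:.1f} degrees")
```

A ground ball simply has a negative vertical component, so the same function reports a negative launch angle for it.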

Managers Just Turn the Nerd Knobs

As technologists, we like to tweak things to see if we can squeeze just a little more performance out of our systems. That is essentially a baseball manager’s job. Managers can take the monitoring data that baseball collects and see which players might have the best advantage against a particular team or opposing pitcher that day. Baseball is all about averages: a .300 hitter is considered very good, yet that same hitter fails to get a hit 70% of the time. But one player might hit slightly better against a certain type of pitcher, and managers will play those odds to find a favorable matchup.
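
Playing the odds on a matchup can be as simple as picking the best split. Here is a minimal sketch with two hypothetical players and made-up averages against left-handed ("L") and right-handed ("R") pitchers. The function name `best_matchup` is illustrative only.

```python
def best_matchup(splits: dict[str, dict[str, float]], pitcher_hand: str) -> str:
    """Return the player with the highest batting average against this pitcher's hand."""
    return max(splits, key=lambda player: splits[player][pitcher_hand])


# Hypothetical players and their averages vs. left-handed and right-handed pitchers
splits = {
    "Player A": {"L": 0.310, "R": 0.255},
    "Player B": {"L": 0.240, "R": 0.290},
}

print(best_matchup(splits, "R"))  # Player B
print(best_matchup(splits, "L"))  # Player A

# Even the better matchup fails most of the time: a .290 hitter goes hitless in 71% of at-bats
print(f"{1 - splits['Player B']['R']:.0%}")  # 71%
```

A real manager would also weigh how many at-bats sit behind each split, since a small edge built on a small sample may be noise.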

This has gotten really sophisticated over the past decade. Managers have “spray charts” (charts that show where hitters most often hit the ball) and will reposition their fielders away from the standard baseball positions.

For instance, the shortstop might play to the right of second base if a hitter usually hits the ball to that side of the field. That leaves a big empty spot on the left side of the field, but managers are willing to take that chance based on the data.
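
Here is a rough sketch of that decision, turning a spray chart into a rule. It assumes a hypothetical angle convention (0 degrees straight up the middle, negative toward the left-field line, positive toward the right-field line) and a made-up 70% threshold. Real teams weigh much more than this single number.

```python
def right_side_share(spray_angles: list[float]) -> float:
    """Fraction of batted balls hit to the right of second base.

    Hypothetical convention: 0 degrees is straight up the middle, negative angles are
    toward the left-field line, positive angles toward the right-field line.
    """
    if not spray_angles:
        return 0.0
    return sum(angle > 0 for angle in spray_angles) / len(spray_angles)


def shortstop_alignment(spray_angles: list[float], threshold: float = 0.7) -> str:
    """Shift the shortstop right of second base if the hitter goes that way often enough."""
    if right_side_share(spray_angles) >= threshold:
        return "shift: play to the right of second base"
    return "standard: play between second and third"


# A hypothetical hitter's last ten balls in play
angles = [30, 12, 25, -5, 40, 18, 8, 33, -20, 27]
print(shortstop_alignment(angles))  # shift: play to the right of second base
```

The threshold is the manager’s knob. Lowering it means shifting more often and leaving the left side empty more often.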


All of these decisions that you have time to consider during a slow-moving baseball game are what make the game so great. Fans at home get to second-guess these decisions and make their own predictions about what might happen. Not only that, but the data available about players lets you argue about which players are better and who’d perform best in certain situations. If you’re a technologist who uses math, statistics, and algorithms on a daily basis, how do you not love a game like this?