Things have been slow around here lately due to some other life circumstances, so I haven’t gotten back to the Brandon Webb analysis, nor have I posted some of my other work.

I’ve been fiddling with some ideas from Dr. Alan Nathan’s paper on modeling pitch trajectories from a Jon Lester start, but I haven’t had the time to get them into the final format or chase down all the loose ends, of which there are many. Josh Kalk encouraged me to post my preliminary work, anyway, for both my own benefit and that of the analysis community, and he has a good point.

So what you have here are some of my experimental ideas tested out on Boston Red Sox pitcher Jonathan Papelbon. I originally started this expedition in search of his so-called “slutter” (slider/cutter hybrid), and it may well be in the data, but one of the loose ends is that I haven’t differentiated his harder-thrown pitches yet. It seems like a pretty big loose end to not have accomplished my main goal, but perhaps more important than whether Papelbon throws a slutter and whether I can identify it from the PITCHf/x data is the detour I followed in the process of looking.

In Dr. Nathan’s paper, he solves the equations of motion for each pitch Jon Lester threw and uses this data to determine the spin direction and spin rate of the baseball. Traditionally (if something in its first year of existence can have a tradition), people have classified pitches using the speed, horizontal break, and vertical break from the PITCHf/x data. Dr. Nathan’s method holds promise as an alternate viewpoint on the data, classifying a pitch based on what the pitcher does to the baseball as opposed to classifying it based on how the pitched baseball moves from the perspective of the batter/catcher/umpire. In addition, Dr. Nathan pointed out to me that the spin direction and spin rate are independent of the y0 “release point” measurement distance, which has been set first at 55 feet, then 40 feet, and now 50 feet during different parts of the season. Since the spin parameters are invariant across y0, it’s much easier to classify pitches for a whole season’s data set, with only the start_speed needing to be adjusted based on changes in y0.

The method presented in the paper is based on fitting the pitch trajectory to the data using iterative numerical analysis. However, Dr. Nathan also presents some approximations for the Magnus force and drag force which he says yield comparable results to his more accurate method. I took these approximations and solved for the spin direction and spin rate. I then applied these new simpler equations to the same Jon Lester data and confirmed that I achieved very similar results to those presented in the paper.

Spin direction (in degrees)

then add 270 degrees if ax<0 or 90 degrees if ax>0.

Spin rate (in rpm)

(I’m in the process of figuring out how to use LaTeX in WordPress to make the formulas look nice.)

Next, I decided to test these equations out on Papelbon to see if I could classify his pitches. Here we come to another loose end. Graphs with angles beg to be put on polar plots, for more reasons than one. Excel does not do polar plots without jumping through a lot of hoops, which is another reason I turned to R for graphing. However, I haven’t learned how to do polar plots in R yet. So, I present these Papelbon graphs in Excel and apologize for the poor formatting. It obscures some of the information I want to communicate about the value of classifying pitches by spin direction, but I think I can still make my point.

First we have pitch speed plotted versus spin direction. Note that this data set includes all three possible y0 values, yet the pitches are grouping nicely.

I have normalized the start_speed values in this graph to compensate for y0 values other than 50 ft. The normalization factor came from interpolating or extrapolating to 50 ft using the start_speed and end_speed values, assuming the y deceleration due to drag was constant.
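For anyone who wants to try the same normalization, here is a minimal sketch in Python. It is one way to apply the constant-deceleration assumption, not necessarily the exact spreadsheet formula I used: under constant deceleration the square of the speed changes linearly with distance, so the speed at 50 ft follows from start_speed, end_speed, and y0. The function name, the example numbers, and the assumption that end_speed is reported at the front of home plate (17 inches, about 1.417 ft) are mine for illustration.

```python
import math

# Assumption: end_speed is reported at the front of home plate, 17 inches from its back tip.
PLATE_Y_FT = 17.0 / 12.0


def speed_at_distance(start_speed, end_speed, y0, y_target=50.0, y_end=PLATE_Y_FT):
    """Estimate pitch speed at y_target (ft), assuming constant deceleration.

    start_speed is the speed at y0 and end_speed is the speed at y_end.
    Any consistent speed unit works (mph in, mph out).
    """
    # With constant deceleration, v**2 drops by the same amount per foot travelled.
    loss_per_ft = (start_speed ** 2 - end_speed ** 2) / (y0 - y_end)
    # Distance travelled from y0 to y_target is negative when y_target lies
    # behind y0, which turns the interpolation into an extrapolation.
    v_squared = start_speed ** 2 - loss_per_ft * (y0 - y_target)
    return math.sqrt(v_squared)


# Hypothetical pitches: the same 95/87 mph reading taken at each of the three y0 values.
for y0 in (55.0, 50.0, 40.0):
    print(y0, round(speed_at_distance(95.0, 87.0, y0), 2))
```

A pitch measured at y0 = 55 ft comes out slightly slower at 50 ft, and one measured at 40 ft comes out slightly faster, which is the direction of adjustment you would expect.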

Also, I cut the graph off at 300 degrees in order to better see most of the data. In doing so, I excluded two outlier pitches at around 350 degrees with a speed of 97 mph. I think there is some sort of error in the data with those two pitches, but if you run the analysis yourself, you’ll see them there.

Compare that to the mess we get when we plot pitch speed versus horizontal break, once again including data from all three of the y0 values.

A typical technique to clean up the speed versus horizontal break graph, besides separating the data by y0 measurement points, is to plot the vertical break, either on the same graph as a second set of points, or to make a separate graph showing vertical break versus horizontal break and indicating speed groups by color. Ignoring the y0 problem for the moment, that’s usually sufficient to tease out the different pitch types, although rising and sinking fastballs can still be tough to differentiate. However, using our first graph, we don’t have to do any of that.

We also have spin rate data, but I found, at least for Papelbon, that it has a high correlation with pitch speed (R^2=0.68).

The correlation isn’t perfect, but it’s high enough that I didn’t find much helpful information in the way of classifying pitches by looking at spin rate versus any of the other parameters on the whole. However, spin rate can sometimes be helpful in deciding how to classify an individual pitch on the borderline between groups in the speed versus spin direction graph.

Intuitively this conclusion makes sense to me when I think about how a fastball is thrown. The harder you throw it, the more backspin you will put on the ball.
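If you want to check the speed versus spin rate relationship for another pitcher, the R^2 value for a straight-line fit is just the square of the Pearson correlation coefficient. Here is a small sketch in plain Python; the function name and the sample numbers are made up for illustration and are not Papelbon's data.

```python
def r_squared(xs, ys):
    """Return R^2 for a simple linear fit of ys on xs (squared Pearson correlation)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical speeds (mph) and spin rates (rpm), for illustration only.
speeds = [88.0, 91.5, 94.0, 95.5, 97.0]
spins = [1700.0, 2100.0, 2000.0, 2350.0, 2400.0]
print(round(r_squared(speeds, spins), 3))
```

Running it separately on the pitches above and below 90 mph would be one way to see whether the correlation holds only for the harder pitches.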

There are (at least) two other avenues for exploration in terms of pitch classification. One is separation of the data by handedness of the batter. When you do this for Papelbon, you can clearly see that there are some pitches he prefers to throw to righties and some to lefties.

I haven’t spent much time actually trying to figure out which pitches are which, so the story isn’t quite as fun. (Is there a slutter in there?) Hopefully I can come back and do that. Nonetheless, I think the main point is made.

The remaining avenue for investigation is classifying pitches by “release point”, which is really just the initial measurement point for the PITCHf/x system. The following graphs show z0, the vertical distance from the ground to the pitch, versus x0, the horizontal distance from the centerline of home plate, with the negative direction to the catcher’s left. Here I found it interesting to see the graphs separated by y0. Measured at 55 feet from home plate, you can hardly tell anything apart.

When y0 is set to 50 feet, we can start to see some separation into groups. I haven’t correlated these groups to the groups we saw above (another of those niggling loose ends), but clearly there’s something worthwhile in this data. The different pitches are starting to move a little differently and we can pick that up a few feet out of the pitcher’s hand.

When we go to 40 feet, the separation of the groups gets a little better, but it appears that groups might also be getting a little smeared or stretched out. I don’t know if that was one of the factors in MLB/Sportvision settling on the 50-foot measurement distance, but in any case it works out well for us in this instance.
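The graphs above rely on whatever y0 happened to be in effect for each game. Another way to make the comparison is to move every pitch to the same distance yourself. Here is a rough sketch, assuming the usual PITCHf/x constant-acceleration fit (x0, y0, z0, vx0, vy0, vz0, ax, ay, az in feet and seconds, with vy0 negative because the ball travels toward the plate). The function name and the sample pitch are hypothetical.

```python
import math


def position_at_y(x0, y0, z0, vx0, vy0, vz0, ax, ay, az, y_target):
    """Return (x, z, t) where a constant-acceleration trajectory crosses y_target.

    t is measured from the moment the ball is at y0; it is negative when
    y_target is farther from the plate than y0 (extrapolating backward).
    """
    dy = y0 - y_target
    if abs(ay) < 1e-12:
        # No acceleration in y: plain constant-velocity travel time.
        t = -dy / vy0
    else:
        # Solve 0.5*ay*t**2 + vy0*t + (y0 - y_target) = 0 for the crossing nearest t = 0.
        disc = vy0 ** 2 - 2.0 * ay * dy
        if disc < 0:
            raise ValueError("trajectory never reaches y_target")
        t = (-vy0 - math.sqrt(disc)) / ay
    x = x0 + vx0 * t + 0.5 * ax * t * t
    z = z0 + vz0 * t + 0.5 * az * t * t
    return x, z, t


# Hypothetical pitch measured at y0 = 50 ft, moved to each of the three distances.
pitch = dict(x0=-1.5, y0=50.0, z0=6.0, vx0=5.0, vy0=-130.0, vz0=-4.0,
             ax=-8.0, ay=25.0, az=-20.0)
for y in (55.0, 50.0, 40.0):
    print(y, position_at_y(y_target=y, **pitch))
```

Plotting z versus x at 55, 50, and 40 feet for every pitch would put the whole season on the same footing instead of splitting it by the y0 setting.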

So, there you have it. There are a lot of loose ends yet to chase in this little voyage through Jonathan Papelbon’s repertoire. Hopefully there is also something useful to be gained from what I’ve found so far.

Please feel free to comment either on the work I’ve presented, on the suggestions I’ve made for further investigation, or on your own ideas for what can be done with this data.

Part 2 of the Papelbon article can be found here.

September 7, 2007 at 7:57 pm

Mike…on your last set of plots, what parameter is on the horizontal axis?

September 7, 2007 at 8:19 pm

The horizontal axis is x0, and the vertical axis is z0. I’ll note that in the text. Thanks.

September 7, 2007 at 8:25 pm

Never mind my last comment. I guess you are plotting z0 vs x0. Note that when you plot with y0=55, the points are all bunched together, meaning that there is not much scatter in the “actual” release point (i.e., the location when the ball actually leaves the hand). When you take y0 closer to home plate, the location of the pitches spread out, depending on the speed, spin, spin axis, etc. If you compute for y0=1.416, you are getting the location at home plate. It is probably not correct to refer to this as “release point” for anything other than y0~55.

When I looked at spin vs. v for Jon Lester, I didn’t see the large correlation. I think the difference is that Paps is mostly throwing hard stuff, and Lester throws a lot of off-speed stuff. For fastballs and sliders, I would expect a correlation between speed and spin, but not so for a curveball (which has more spin than you would expect by simply extrapolating to lower velocity from the fastball).

I would be interested in hearing other opinions about this.

September 7, 2007 at 8:57 pm

I forget sometimes and play fast and loose with the term “release point”. When we started out the year, we were at y0=55, so that parameter is stuck in my head with the name release point, even though that’s not really what it is anymore. I tried to clarify that in my post.

You’re probably right about the high correlation between speed and spin rate being restricted to fastballs and sliders. If you look at Papelbon’s graph, the correlation is better for speeds greater than 90 mph, and below that line it’s a pretty loose correlation.

September 7, 2007 at 11:49 pm

Alan, as I thought about this a bit more, the reason I am excited about using just speed and spin direction to make one illuminating graph is that it’s usually the fastballs and sometimes the hard breaking pitches that are toughest to tell apart. Changeups and curveballs are, generally speaking, pretty easy to identify.

I’m not saying that spin magnitude is useless information. Like a lot of other parameters, it can be valuable in any number of situations, and is certainly part of building the whole picture about a pitcher. But in looking for THE ONE GRAPH to sum up a pitcher, so far speed versus spin direction looks like the best candidate.

September 8, 2007 at 7:23 am

Excellent work here Mike. I am really glad you published it. One thing to think about is to make your plots just for games at Fenway so you don’t have to worry about the error from park to park. You might find that the correlation actually improves for spin rate and speed for Papelbon at least.

Then if it isn’t too hard to make a plot like that for Zito or Rich Hill, you should be able to answer the curveball spin rate question.

September 8, 2007 at 11:43 am

Mike…I don’t disagree with you at all (nor did I mean to give the impression that I am disagreeing). Some additional points:

1. I use Kaleidagraph to do all my plotting, and polar plots are possible with it. You could also probably figure out how to do it using Excel if you write the appropriate macro (but I have not figured out how to do that).

2. I don’t know anything about WordPress. There are LaTeX to html converters, but I doubt they are very good. I am a big user of LaTeX for all my scientific writing. If you figure out how to do web articles in LaTeX, let me know.

May 9, 2008 at 8:34 pm

[…] and “spin”, the latter calculated according to Fast & Nathan’s equation (see here for more.) I coded these pitches by their MLB classifications, and by the outcome of the pitch […]

April 10, 2010 at 4:58 pm

[…] spin that the pitcher is putting on the ball. For more on precisely how it’s calculated, see here. Liriano’s three pitches form clear clusters: the fastballs on top, the changeups directly […]