Things have been slow around here lately due to some other life circumstances, so I haven’t gotten back to the Brandon Webb analysis, nor have I posted some of my other work.

I’ve been fiddling with some ideas from Dr. Alan Nathan’s paper on modeling pitch trajectories from a Jon Lester start, but I haven’t had the time to get them into final form or chase down all the loose ends, of which there are many. Josh Kalk encouraged me to post my preliminary work anyway, for my own benefit and for that of the analysis community, and he has a good point.

So what you have here are some of my experimental ideas, tested out on Boston Red Sox pitcher Jonathan Papelbon. I originally started this expedition in search of his so-called “slutter” (slider/cutter hybrid), and it may well be in the data, but one of the loose ends is that I haven’t yet separated out his harder-thrown pitches. Leaving my main goal unaccomplished is a pretty big loose end, but perhaps more important than whether Papelbon throws a slutter, and whether I can identify it from the PITCHf/x data, is the detour I took while looking for it.

In Dr. Nathan’s paper, he solves the equations of motion for each pitch Jon Lester threw and uses this data to determine the spin direction and spin rate of the baseball. Traditionally (if something in its first year of existence can have a tradition), people have classified pitches using the speed, horizontal break, and vertical break from the PITCHf/x data. Dr. Nathan’s method holds promise as an alternate viewpoint on the data, classifying a pitch based on what the pitcher does to the baseball as opposed to classifying it based on how the pitched baseball moves from the perspective of the batter/catcher/umpire. In addition, Dr. Nathan pointed out to me that the spin direction and spin rate are independent of the y0 “release point” measurement distance, which has been set first at 55 feet, then 40 feet, and now 50 feet during different parts of the season. Since the spin parameters are invariant across y0, it’s much easier to classify pitches for a whole season’s data set, with only the start_speed needing to be adjusted based on changes in y0.

The method presented in the paper is based on fitting the pitch trajectory to the data using iterative numerical analysis. However, Dr. Nathan also presents some approximations for the Magnus force and drag force which he says yield comparable results to his more accurate method. I took these approximations and solved for the spin direction and spin rate. I then applied these new simpler equations to the same Jon Lester data and confirmed that I achieved very similar results to those presented in the paper.

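Spin direction (in degrees): $\theta=\arctan\left(\frac{az+32.174}{ax}\right)\cdot\frac{180}{\pi}$, then add 270 degrees if $ax<0$ or 90 degrees if $ax>0$. Here ax and az are the PITCHf/x accelerations in ft/s², and 32.174 ft/s² is the acceleration due to gravity, so $az+32.174$ is the vertical acceleration with gravity removed.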
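A minimal Python sketch of the spin direction rule above, with ax and az in ft/s² as reported by PITCHf/x. The function names are hypothetical. The stated rule divides by ax, so the `ax == 0` branch uses the limiting values (180 for pure backspin, 0 for pure topspin); that handling is an assumption, not part of the original method. The second function is a reformulation of my own: the same quadrant bookkeeping collapses into a single `atan2` call.

```python
import math

G_FT_S2 = 32.174  # acceleration due to gravity, ft/s^2


def spin_direction_deg(ax: float, az: float) -> float:
    """Spin direction in degrees, following the rule stated in the text.

    ax, az: PITCHf/x accelerations in ft/s^2.
    """
    b = az + G_FT_S2  # vertical acceleration with gravity removed
    if ax == 0.0:
        # The rule divides by ax; for ax == 0 use the limiting values:
        # pure backspin -> 180, pure topspin -> 0 (assumption).
        return 180.0 if b > 0 else 0.0
    theta = math.degrees(math.atan(b / ax))
    # Quadrant correction from the text: +270 if ax < 0, +90 if ax > 0.
    return theta + (270.0 if ax < 0 else 90.0)


def spin_direction_atan2(ax: float, az: float) -> float:
    """Equivalent single expression using atan2, mapped onto [0, 360)."""
    return math.degrees(math.atan2(ax, -(az + G_FT_S2))) % 360.0


# Hypothetical (ax, az) pairs for illustration, not real Papelbon pitches.
for ax, az in [(-10.0, -15.0), (5.0, -30.0), (8.0, -45.0)]:
    print(ax, az,
          round(spin_direction_deg(ax, az), 1),
          round(spin_direction_atan2(ax, az), 1))
```

Both functions put pure backspin at 180 degrees. Pitches with ax < 0 land between 180 and 360, and pitches with ax > 0 land between 0 and 180.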

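Spin rate (in rpm): $\omega=\frac{\sqrt{ax^2+(az+32.174)^2}}{|vy0|\cdot0.121\cdot0.00544\cdot2\pi}\cdot60$, where vy0 is the initial velocity in the y direction (ft/s), 0.121 ft is approximately the radius of a baseball, and the factors of $2\pi$ and 60 convert radians per second to revolutions per minute.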

(I’m in the process of figuring out how to use LaTeX in WordPress to make the formulas look nice.)
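The spin rate formula can be sketched in Python the same way. The constant names are my own labels. 0.00544 is taken as-is from the formula, and I don't interpret it further here. vy0 is negative in PITCHf/x because the ball travels toward decreasing y, so the code uses its absolute value, as the formula does.

```python
import math

G_FT_S2 = 32.174     # acceleration due to gravity, ft/s^2
BALL_RADIUS_FT = 0.121
K_CONST = 0.00544    # constant taken as-is from the formula in the text


def spin_rate_rpm(ax: float, az: float, vy0: float) -> float:
    """Approximate spin rate in rpm from PITCHf/x ax, az (ft/s^2) and vy0 (ft/s)."""
    # Magnitude of the non-gravity acceleration, treated as the Magnus term.
    magnus = math.hypot(ax, az + G_FT_S2)
    # Angular speed in rad/s implied by the approximation.
    omega_rad_s = magnus / (abs(vy0) * BALL_RADIUS_FT * K_CONST)
    # Convert rad/s to revolutions per minute.
    return omega_rad_s * 60.0 / (2.0 * math.pi)


# Hypothetical fastball-like values for illustration only.
print(round(spin_rate_rpm(-10.0, -15.0, -138.0)))
```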

Next, I decided to test these equations on Papelbon to see if I could classify his pitches. Here we come to another loose end. Graphs with angles beg to be put on polar plots, for more reasons than one. Excel does not do polar plots without jumping through a lot of hoops, which is another reason I turned to R for graphing. However, I haven’t learned how to do polar plots in R yet. So I present these Papelbon graphs in Excel and apologize for the poor formatting. It obscures some of the information I want to communicate about the value of classifying pitches by spin direction, but I think I can still make my point.

First we have pitch speed plotted versus spin direction. Note that this data set includes all three possible y0 values, yet the pitches are grouping nicely.

I have normalized the start_speed values in this graph to compensate for y0 values other than 50 ft. For each pitch, I extrapolated or interpolated the speed to 50 ft from its start_speed and end_speed values, assuming that the y deceleration due to drag was constant.
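One way to carry out that normalization is sketched below. It is my reconstruction, not necessarily the exact calculation used for the graph. With constant deceleration, $v^2$ changes linearly with distance, so the squared speed at 50 ft can be interpolated between the start_speed point (at y0) and the end_speed point. Two assumptions are labeled in the code: end_speed is taken to be reported at the front of home plate (y = 17/12 ft), and the total speeds are treated as if all the motion were along y. The function name is hypothetical.

```python
import math

# Assumption: end_speed is reported at the front of home plate, y = 17/12 ft.
Y_END_FT = 17.0 / 12.0


def normalize_start_speed(start_speed: float, end_speed: float, y0: float,
                          y_target: float = 50.0, y_end: float = Y_END_FT) -> float:
    """Estimate the speed at y_target given speeds measured at y0 and y_end.

    Assumes constant deceleration along y, so v^2 varies linearly with y,
    and treats the total speeds as if all motion were along y (approximation).
    Works in any speed unit (e.g. mph), since only ratios of v^2 matter.
    """
    frac = (y_target - y_end) / (y0 - y_end)
    v_sq = end_speed ** 2 + (start_speed ** 2 - end_speed ** 2) * frac
    return math.sqrt(v_sq)


# Hypothetical pitch measured from 55 ft: 95 mph start, 87 mph end.
print(round(normalize_start_speed(95.0, 87.0, 55.0), 1))
```

A pitch measured from y0 = 50 ft comes back unchanged. For the hypothetical 55 ft pitch, the estimate drops slightly, to about 94.3 mph. For a 40 ft measurement, the estimate rises above start_speed, because the ball was moving faster farther from the plate.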

Also, I cut the graph off at 300 degrees in order to better see most of the data. In doing so, I excluded two outlier pitches at around 350 degrees with a speed of 97 mph. I think there is some sort of error in the data with those two pitches, but if you run the analysis yourself, you’ll see them there.

Compare that to the mess we get when we plot pitch speed versus horizontal break, once again including data from all three of the y0 values.

A typical technique for cleaning up the speed versus horizontal break graph, besides separating the data by y0 measurement point, is to bring in vertical break. You can plot it on the same graph as a second set of points, or make a separate graph of vertical break versus horizontal break with speed groups indicated by color. Ignoring the y0 problem for the moment, that’s usually enough to tease out the different pitch types, although rising and sinking fastballs can still be tough to tell apart. With the speed versus spin direction graph, however, we don’t have to do any of that.

We also have spin rate data, but I found, at least for Papelbon, that it has a high correlation with pitch speed ($R^2=0.68$).

The correlation isn’t perfect, but it’s high enough that I didn’t find much helpful information in the way of classifying pitches by looking at spin rate versus any of the other parameters on the whole. However, spin rate can sometimes be helpful in deciding how to classify an individual pitch on the borderline between groups in the speed versus spin direction graph.

Intuitively this conclusion makes sense to me when I think about how a fastball is thrown. The harder you throw it, the more backspin you will put on the ball.

There are (at least) two other avenues for exploration in terms of pitch classification. One is separation of the data by handedness of the batter. When you do this for Papelbon, you can clearly see that there are some pitches he prefers to throw to righties and some to lefties.

I haven’t spent much time actually trying to figure out which pitches are which, so the story isn’t quite as fun. (Is there a slutter in there?) Hopefully I can come back and do that. Nonetheless, I think the main point is made.

The remaining avenue for investigation is classifying pitches by “release point”, which is really just the initial measurement point for the PITCHf/x system. The following graphs show z0, the height of the ball above the ground, versus x0, the horizontal distance from the centerline of home plate, with the negative direction to the catcher’s left. Here I found it interesting to separate the graphs by y0. With the measurement point at 55 feet from home plate, you can hardly tell anything apart.

When y0 is set to 50 feet, we can start to see some separation into groups. I haven’t correlated these groups to the groups we saw above (another of those niggling loose ends), but clearly there’s something worthwhile in this data. The different pitches are starting to move a little differently and we can pick that up a few feet out of the pitcher’s hand.

When we go to 40 feet, the separation of the groups gets a little better, but it appears that groups might also be getting a little smeared or stretched out. I don’t know if that was one of the factors in MLB/Sportvision settling on the 50-foot measurement distance, but in any case it works out well for us in this instance.

So, there you have it. There are a lot of loose ends yet to chase in this little voyage through Jonathan Papelbon’s repertoire. Hopefully there is also something useful to be gained from what I’ve found so far.

Please feel free to comment either on the work I’ve presented, on the suggestions I’ve made for further investigation, or on your own ideas for what can be done with this data.

Part 2 of the Papelbon article can be found here.