### September 2007

There’s some very informative commentary from Ike at his blog on how the reconstruction algorithm of PITCHf/x works and how that affects measurement error in the data.

He also has a couple of previous posts on the PITCHf/x topic.

It’s nice to see a fellow Sooner and, as best I can tell, a fellow OU Physics alum writing on this topic.

Some of the other PITCHf/x analysts out there are looking at the hitting aspects or other things that can be divined from the data, but I’m still quite fascinated by the ability to classify a pitcher’s pitches. I know that’s not the be-all-end-all of baseball or of PITCHf/x, but I’m learning so much about the game from pursuing this angle, I may camp in this corner for a while.

I decided to take a look at one of my favorite players–who happens to have had quite a resurgence in the second part of this year–Royals’ former and once-again wunderkind Zack Greinke.

I split the data into three parts. The first part was his first seven starts this year, in which he compiled a 5.71 ERA on the strength of 49 hits, 11 walks, and 19 strikeouts in 34 2/3 innings. This performance resulted in his banishment to the bullpen in hopes of salvaging something from 2007 for Greinke. We only have one start recorded in PITCHf/x from this period. In this start, his fastball was recorded at 87-93 mph. Other than noting that fact, I’ve chosen to ignore the rest of the data from this start.

The second part was his relief performance, which lasted from May 10 to August 20, and for which we have 295 pitches recorded by PITCHf/x. In 38 relief appearances, he compiled a 3.54 ERA on the strength of 43 hits, 15 walks, and 55 strikeouts in 53 1/3 innings. In the data we have, his fastball as a reliever ran in the 92-98 mph range. Here’s the speed versus spin direction chart:

As a reliever, he threw 67% fastballs, 24% sliders, 5% changeups, and 4% curveballs. The fastball and changeup groupings are pretty obvious. I used the spin rate parameter to help me separate the sliders and curveballs. I won’t reproduce that graph for his relief outings, but suffice it to say that the curveballs are the pitches with slower speed, higher spin rate, and lower spin direction.

What’s interesting to me is comparing the reliever graph to the same graph for his return to the starting rotation, which began on August 24. We have PITCHf/x data for four of his five starts since then, missing only his September 15 start at Cleveland. In all five starts, he’s compiled a 1.71 ERA on the strength of 21 hits, 8 walks, and 15 strikeouts in 21 innings.

His fastball still has a lot of pop in the 91-97 mph range. It will be interesting to see if he can keep that life on his fastball as he stretches out beyond 4 or 5 innings at a time. He’s also using his changeup a bit more: 75% fastballs, 15% sliders, 8% changeups, and 3% curveballs.

Here’s the spin rate vs. spin direction graph for his starter outings. I’ve labeled the x axis to show how the spin direction corresponds to break to a right-handed hitter. A pitch that broke straight down would have a spin direction of 0 degrees, break away from a righty corresponds to 90 degrees spin direction, break up (or a “rising” fastball) corresponds to 180 degrees, and break in on a righty’s hands corresponds to 270 degrees. So the spin direction tells us which way the pitch will break, and the spin rate tells us how much the pitch will break.
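To make that mapping concrete, here is a minimal Python sketch that turns a spin direction into a unit vector for the direction of break. The function name `break_direction` and the sign convention (positive horizontal meaning away from a right-handed hitter, positive vertical meaning up) are assumptions chosen to match the axis labels described above, not part of the PITCHf/x data itself.

```python
import math

def break_direction(spin_direction_deg):
    """Return (horizontal, vertical) unit components of the break.

    Assumed convention: horizontal > 0 is away from a right-handed
    hitter, vertical > 0 is up (a "rising" pitch).
    """
    theta = math.radians(spin_direction_deg % 360)
    horizontal = math.sin(theta)   # 90 deg -> +1 (away), 270 deg -> -1 (in)
    vertical = -math.cos(theta)    # 0 deg -> -1 (down), 180 deg -> +1 (up)
    return horizontal, vertical

# Print the four reference angles described in the text.
for angle in (0, 90, 180, 270):
    h, v = break_direction(angle)
    print(f"{angle:3d} deg -> horizontal {h:+.2f}, vertical {v:+.2f}")
```

The spin rate would then scale the length of this vector; the sketch only covers the direction.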

Here’s the vertical break vs. horizontal break displayed in inches.

I’m excited to see Zack Greinke back on top of his game, and I hope he can stay there for years to come.

So far we’ve classified the pitches for four pretty orthodox pitchers: Joba Chamberlain, Jonathan Papelbon, Edinson Volquez, and Greg Maddux. It’s time to try someone a little more novel.

Like Josh Kalk, I’ve been thinking about ways of automating the pitch classification process, although he’s much further along that process than I am. One thing I have wondered is whether such a system can ever be developed that will handle all pitches from all pitchers, or whether we will have to restrict it to mainstream pitchers and pitches only. So I’ve been thinking about pitchers who might be exceptions to various rules that seem to apply well to everyone else.

Take Chad Bradford, for instance. Could a pitch classification system handle a pitcher who throws underhand? He plays in a home park, Baltimore, that doesn’t have the PITCHf/x system running, but we have 188 pitches recorded for him in other parks, and that seems to be enough to figure out his repertoire.

The scouting information on Bradford couldn’t be much muddier for a guy who doesn’t throw very many different pitches and uses the same basic delivery for all of them. One scouting report says he throws a fastball, a slider, and an occasional changeup. The Sporting News says he “throws in the high 80s and has a solid changeup and curveball.” Another article (of which I seem to have lost track) said he threw a two-seam sinking fastball. The Washington Post says he throws an 83-mph fastball, a slider, and a changeup. The Post article is well worth reading if you’re interested in Bradford. Everyone seems to agree he has three pitches, but they can’t agree what they are. Surely we can tell if we dive into the data we have.

Let’s start with what is becoming my traditional pitch classification graph, pitch speed versus spin direction. (As usual, I have normalized the start_speed parameter to y0 = 50 feet.)

I’ve not labeled the slow breaking pitch in the graph because, as we will discuss later, its name is not so clear. I have color-coded his three pitches and circled the groupings, and now I’ll go through the process I used to determine those groupings and the pitch types.

There are two pretty obvious groupings, one containing the fastballs and the other some sort of slower breaking pitch. There are several questions to answer. First, are there multiple pitches hiding in the cluster on the upper right, and if so, what are they? Second, what is the 65-70 mph pitch? None of my usual secondary graphs were very illuminating on these questions.

To help answer the first question, I went to another pair of graphs I like to see, the same speed vs. spin direction plot from above but split out for right-handed and left-handed hitters.

The first thing I notice is that lefties don’t get many of the slow breaking pitches in the lower left corner of the graph, and righties don’t see pitches slower than 76 mph on the right side of the graph. That implies that there are at least two distinct pitches in our grouping on the right side, since I see no reason to believe Bradford would purposefully throw his already-slow fastball even slower to lefties on occasion. Most probably this is a changeup. We can’t tell from this graph whether righties also see some changeups from Bradford or whether 76 mph is really the cutoff between fastballs and changeups.

For that, we move to another plot that exposes a limitation in our data. This graph shows Bradford’s pitch speeds (normalized to y0 = 50 feet) throughout the season, recorded when the Orioles visited a park equipped with PITCHf/x and Bradford happened to pitch. The x-axis lists the pitch id number from my database, but that corresponds closely to time, with pitch #1 at the beginning of the season and pitch #637,220 occurring on September 12.

Here we can see on a game-by-game basis that Bradford clearly throws three different speeds of pitches. Our overall speed data is being clouded by two games in Boston on July 31 and August 1, with pitch speed on average 3.5 mph slower than in other parks. Fenway Park’s PITCHf/x system is the source of all sorts of measurement errors, so this does not come as a surprise.

Now that we’ve separated the three groups of pitches, can we tell what they are? First, it’s reasonable to classify the mid-speed pitch as a changeup since it has similar spin to the fastball but is thrown a little slower. But is the fastball a regular four-seam fastball or a two-seam “sinking” fastball as some people suggested? And is the breaking pitch a slider or a curveball?

As an aside, in terms of pitch classification algorithms, do we really care what the names of Bradford’s pitches are or how he pronates his wrist? Shouldn’t a mathematical description of how the pitch moves be sufficient? I think the answer to the latter question is probably yes, but pitch classification can be interesting beyond just the search for a universal classification system. It’s interesting to learn about a particular pitcher’s approach, and for that description, it helps to know what the pitcher is attempting from his perspective.

How do we determine whether a fastball is a four-seamer or a two-seamer? The classic four-seamer, if thrown from the 12 o’clock position, would have only backspin, which would show up on our graph as a spin direction of 180 degrees. If the pitcher drops down to a 3/4 delivery, the four-seamer gets a little sidespin component, and the spin direction shifts to the neighborhood of 210 degrees. For example, we saw Volquez’s four-seam fastball right in this area, in the range from 200-220 degrees, and Papelbon’s four-seamer was in a similar range of spin directions, 195-225 degrees.

The two-seam fastball is thrown with the fingers along the seams, with the middle finger applying pressure to the ball to produce sidespin. If the two-seamer were thrown from the 12 o’clock position, we would expect to see a spin direction greater than 180 degrees by an amount dependent on how much sidespin the pitcher applied to the ball. From a 3/4 delivery, the spin direction would shift over to a greater angle by another 30 degrees or so. Greg Maddux’s two-seamer had spin directions in the range 215-265 degrees (a wide range consistent with his reputation of varying the movement on his fastball), Papelbon’s two-seamer was at 210-235 degrees, and Volquez’s two-seamer was at 220-245 degrees.

For a pure underhand pitcher with a delivery from 6 o’clock, the classic four-seam fastball with pure backspin (now switched to pure topspin by the change in delivery) would have a spin direction of 0 degrees (equivalent to 360 degrees). For a more realistic delivery from 5 o’clock, which appears to be consistent with pictures I can find of Bradford’s motion, the spin direction would shift back by about 30 degrees to 330 degrees. If Bradford were applying pressure to the ball to produce the sidespin of a two-seamer, that would tend to move the spin direction back toward 360 degrees because of the direction that the human wrist pronates. Instead, we see a spin direction of 295-330 degrees, consistent with a delivery between 5 and 6 o’clock and little or no sidespin applied to the ball. Therefore, I conclude he is throwing a four-seam fastball.

This conclusion squares with one we could have made via logic alone, without regard to the data. Pitchers generally attempt to throw their fastball as hard as they can, which is accomplished with the four-seam grip. They use different grips and accept slower speeds for the purposes of movement or deception. Because of his underhand motion, Bradford gets plenty of sink on his fastball without needing to sacrifice speed to put sidespin on the ball. Here’s the chart of vertical and horizontal movement.

His fastball sinks 5 to 12 inches, which is just incredible–almost as much downward break as a Barry Zito curveball! His fastball also moves in on the hands of a right-hander by 7 to 13 inches, which is comparable to the best two-seam sinkers thrown overhand from guys like Brandon Webb and Derek Lowe. You wonder why right-handers can’t hit this guy? My guess is that’s why. And it’s all attributable to the spin direction, which in turn comes from the underhand delivery, and not to the speed of the pitch, which is only in the low 80’s.

Finally, we come to the last question for this post: Is that a curve or a slider? It has the horizontal break of a good curveball, and sometimes a little more. It has a vertical break somewhere between a typical slider and curveball. Which is it? We already know that Bradford’s unorthodox delivery can significantly affect the movement on a pitch. For the answer, we turn back to the spin direction graph. Bradford’s breaking pitch checks in with spin directions in the range of 75-130 degrees. That’s not too different than an overhand curveball delivered from the 1 o’clock position, although not a perfect fit.

A curveball with pure topspin delivered from the 12 o’clock position has a spin direction of 0 degrees. Dropping down to the 1 o’clock position shifts the spin direction to 30 degrees. However, the pronation of the actual human wrist can’t deliver pure topspin to the ball without also imparting some sidespin that pushes the spin direction up to higher angles. For example, Maddux’s curve lands in the 50-100 degree range of spin direction, Volquez throws his curve at 35-80 degrees, and the few curves we saw from Papelbon ranged from 40-80 degrees.

A theoretical underhand 6 o’clock curveball delivered with pure topspin (now switched to pure backspin by the change in delivery) would have a spin direction of 180 degrees. Coming back up to the 5 o’clock position would shift the spin direction back up to 150 degrees. The sidespin imparted by pronation would move us back toward 180 degrees, however, and we don’t see Bradford’s breaking pitch around 180 degrees.

So Joe Slider, our blog turns its lonely eyes to you.
Woo, woo, woo…
What’s that you say, Kerry Robinson?
Slidin’ Joe has left and gone to the plate.
Hey, hey, hey…hey, hey, hey!

Sorry for that, I needed a distraction. This post is getting a little long in the tooth. I now return you to “The Slider Teaches Johnny to Pronate”, already in progress…

A pitcher throwing a typical slider overhanded applies sidespin in the opposite direction of a two-seamer, as if it were trying to become a topspin curveball, which you might say it was. In the process, it also gets a significant spin component around the direction of travel, which we ignore in terms of affecting the break of the pitch. The significant sidespin of the slider would put its spin direction around 90 degrees in theory, 120 degrees if it’s thrown from 1 o’clock, but in reality the pronating wrist ends up turning more toward the backspin of the fastball than the topspin of the curveball, and our typical slider ends up at slightly higher angles than 90 or 120 degrees. Checking back on our previous pitch classifications, a Greg Maddux slider falls in the range of 130-175 degrees, and Papelbon’s slider is in the range 140-190 degrees.

If Bradford were throwing a slider from the 5 o’clock position, at approximately what angle would we expect to see it spin? If he were throwing from the 6 o’clock position and getting pure sidespin on a slider, we’d expect it to have a spin direction of 270 degrees. Moving his delivery toward 5 o’clock moves the spin direction on the slider toward 240 degrees. But then the wrist action tends to pull it back toward the fastball, maybe in the neighborhood of 280 degrees. Huh. We don’t see Bradford’s breaking pitch centered anywhere close to that.

In fact, it’s centered around a spin direction of 100 degrees, closer to the 180 degrees we’d expect from a submarine curveball than the 280 degrees we’d expect of a submarine slider, but still not a great match for the curveball. In fact, it seems to have a strong screwball component, which makes some sense out of the movement we see on the vertical/horizontal break, but makes no sense to me in terms of why Bradford would throw a screwball or screwball-like pitch rather than a curveball/slider. Screwballs are hard to throw; they’re hard on your arm, which makes no sense for Bradford, who already has to deal with back pain from his submarine delivery. Nobody reports Bradford throwing a screwball. Maybe this is because it moves more like a traditional overhand slider or curve, but more likely it’s because he doesn’t throw a screwball.
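The pure-spin arithmetic used in this comparison (before any adjustment for wrist pronation) can be sketched in a few lines of Python. This is only a bookkeeping aid under simplifying assumptions: each pitch starts from its pure-spin angle at a 12 o’clock delivery, and every clock hour of arm slot shifts the spin direction by 30 degrees. The names `BASE_SPIN_DIRECTION` and `ideal_spin_direction` are hypothetical, and the sketch deliberately ignores pronation, which the discussion above says moves real pitches away from these ideal values.

```python
# Pure-spin starting angles for a 12 o'clock delivery, taken from the text.
BASE_SPIN_DIRECTION = {
    "four-seam": 180,  # pure backspin
    "curveball": 0,    # pure topspin
    "slider": 90,      # pure sidespin
}

DEGREES_PER_CLOCK_HOUR = 30  # 360 degrees spread over 12 clock positions

def ideal_spin_direction(pitch, clock_hour):
    """Ideal spin direction in degrees for a pitch thrown from an arm slot.

    clock_hour is the delivery position on a clock face (12 = straight
    overhand, 6 = pure underhand). Pronation effects are not modeled.
    """
    hours_from_noon = clock_hour % 12
    angle = BASE_SPIN_DIRECTION[pitch] + DEGREES_PER_CLOCK_HOUR * hours_from_noon
    return angle % 360

# Arm slots discussed above: overhand-ish (1 o'clock) and Bradford-like (5 o'clock).
for pitch in BASE_SPIN_DIRECTION:
    print(pitch, ideal_spin_direction(pitch, 1), ideal_spin_direction(pitch, 5))
```

Comparing these ideal angles to the measured ranges is what exposes the mismatch for Bradford’s breaking pitch.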

Maybe some of you can help me by finding an error in my calculations or reasoning or pointing me to a more authoritative scouting report on Bradford. Until then, I’ll have to leave his third pitch as a mystery.

Update: I’ve done more reading on Bradford and the underhand delivery, and I’m finding some support for the idea of Bradford throwing a screwball. Most people don’t call it that because it doesn’t move like a screwball from an overhander, but it appears I may not be as off base as I thought I was.

From the Sporting News: “Chad Bradford is a submariner with a tough slider and a circle changeup. Because of his delivery, Bradford essentially twists his hand to the left on each pitch, turning the ball over–almost a screwball-type action.”

Some of you who know more about pitching motions and grips might be able to make something out of the pictures in Chris O’Leary’s analysis of Bradford.

Once again, Josh Kalk has some good things brewing on his blog. He’s working on a clustering algorithm to distinguish pitch types for all pitchers, and he has player cards up for almost 300 pitchers. He’s seeking input to improve his algorithm from the first pass.

According to Josh, one of the worst performances of the algorithm was for Greg Maddux, so I thought I’d try out my tools on Maddux and see what I found in terms of pitch types.

My conclusion is that Maddux throws mostly two-seam fastballs (67%), a lot of changeups (21%), some cut fastballs (10%), and an occasional slider (1%) and curveball (1%). This more or less agrees with scouting reports, although you can find mention of Maddux throwing just about every pitch under the sun other than the knuckleball. For example, his Wikipedia article will tell you Maddux throws the splitter and the screwball, but I found no evidence for either in the PITCHf/x games in the 2007 season. On to the graphs…

Let’s start with what’s fast becoming my bread and butter, the pitch speed versus spin direction graph. I’ve color-coded the pitches in this graph based on my conclusions from all the data. I don’t claim that they are all easily identifiable based solely on this first graph. (All pitch speeds are normalized to y0 = 50 feet.)

The curveball is the easiest to identify. At 70-76 mph, it is the slowest pitch, and it’s the only one with topspin, with a spin direction of 50-100 degrees.

The slider is also fairly easy to distinguish, although I had to work a little at the exact boundaries between it and the cutter and changeup. The slider runs 79-83 mph, with mostly backspin and a little sidespin, corresponding to a spin direction of 120-180 degrees.

Maddux’s three main pitches are a bit tougher to separate. Let’s start with the changeup, whose most prevalent characteristic is its slower speed, 78-83 mph, with similar spin direction to the fastballs. It has a spin direction ranging from 190-270 degrees, from mostly backspin to mostly sidespin.

Next, let’s go to the fastballs, the cutter and the two-seamer. From the speed vs. spin direction graph, you can tell that there are probably two separate fast pitches, but it’s hard to tell exactly where the line between them would go. Also, we can see another interesting fact. Usually the harder fastball is a four-seamer on the left, with more backspin (i.e., closer to 180 degrees), and the slower fastball is a two-seamer on the right, with more sidespin (i.e., shifted somewhat toward 270 degrees). Maddux doesn’t have that arrangement. His harder pitch is on the right, which is where the two-seamer should be. Based on scouting descriptions of Maddux’s repertoire, his main fastball is in fact a two-seamer, and he also throws a cut fastball. A cut fastball is usually a little slower than a four-seamer or two-seamer, so that correlates with our mystery pitch on the left half of the fastball grouping. In addition, the cut fastball typically has some slider-like characteristics, so it makes sense that the cut fastball would be found toward the slider side of the spin direction (i.e., toward 180 degrees and lower angles). So I think there’s good evidence to believe we’ve identified a two-seam fastball and a cut fastball on the graph.

Finding the boundary between the two will take us on a tour through some other graphs which will also help us define the changeup a little better and reaffirm our identification of the slider and curveball.

First, let’s visit an old standby graph, the speed versus horizontal break.

In this graph, the curveball is clearly evident again, as the slowest pitch with the most break away from a right-handed hitter. There is a group of pitches 78-83 mph also with a positive horizontal break, although slightly less break than the curveball. This, of course, is a signature of the slider. The changeup group is pretty readily identifiable here as the pitches 78-83 mph with negative horizontal break (in toward a right-hander). We can see some thinning out between the two-seamer group on the right and the cutter on the left of the fastest pitches. The boundary between the two is very smeared and hard to delineate on this graph. At least some of that smearing may come from the fact that we have three different y0 initial distances represented in our data set, and the closer y0 is set to home plate, the less break we will measure on the pitches.

I don’t see any evidence for a screwball on this graph.  A screwball should be a very slow pitch, like a curveball, but breaking in to a righty, opposite of the curve.  There are no pitches slower than 77 mph on the left side of the graph, hence, no sign of a screwball.

Next on tour comes the speed versus spin rate graph. It turns out that spin rate is a very useful parameter in helping us separate two-seamers from cutters and changeups.

The two-seamer generally has a faster spin rate, mostly in the 1500-2500 rpm range, but with some significant tails at both ends. The cutter and the changeup have slower spin rates, mostly in the 500-2000 rpm range. There is some overlap between the spin rates of different pitches, but it is a helpful tool in our pitch classification tool box.

Another graph we can make is spin rate versus spin direction, and this one is useful mainly for identifying two-seam fastballs that might otherwise look like borderline changeups. I didn’t find it very helpful in telling the other pitches apart.

It’s possible there are some splitters hiding out on the extreme right edge of this graph in what I’ve labelled as changeups. If there were a separate grouping hanging out farther to the right, as Papelbon’s splitter did, I’d tend to believe they were splitters, but at this time, apart from any other evidence, I don’t see a reason to believe they aren’t all just changeups.

Finally, we can look at vertical break versus horizontal break. This is the graph presented in Josh Kalk’s player card for Greg Maddux. You can see the great big blob that his algorithm couldn’t separate. I’ve nicely color-coded the pitches, so you can see there’s some order left-to-right, but they’re still pretty much a mess.

Ugh! Other than the curveball, I wouldn’t try to pick anything out of that graph alone. However, it was useful in identifying a few more two-seamers that were trying to masquerade as changeups.

So that’s the story on Maddux and his five pitches. The speed vs. spin direction graph once again comes through as the star, but this time it needed more help from its friends.

Of course, there are more interesting things in the data, for Maddux or for any pitcher, beyond just classifying their pitch types. Which pitches does Maddux prefer to throw to lefties or righties? Answer: he throws the cutter more to lefties (15%) than to righties (6%), and with righties he relies more on his two-seamer instead. Which pitches does he throw in various counts? How does he locate them? Which pitches get swings and misses and which ones see more contact? Which pitches turn into home runs most often? Et cetera. These will be left as an exercise to the reader.

Or, what the heck, do what I do and move on to another topic before this one is even cold.

I’ve been impressed for a while by the analysis that Steve West does of the Rangers’ pitchers using PITCHf/x data. His recent article about Edinson Volquez made me eager to try out my new spin rate toy on Volquez’s fastballs.

Sure enough, the speed versus spin direction graph clearly shows the difference between the “rising” four-seam fastball and the “sinking” two-seam fastball. It’s a difference that’s visible in the horizontal and vertical break graphs, too, and Steve comments that there may be two separate fastballs hiding in the data, but using spin direction we can see them plain as day.

The four-seamer runs 91-96 mph with mostly backspin and a little sidespin, corresponding to a spin direction of 200-220 degrees. The two-seamer runs 90-95 mph with a larger component of sidespin, corresponding to a spin direction of 220-245 degrees.

This graph also shows the changeup sitting at 80-85 mph with the spin direction varying from mostly backspin (210 degrees) to all sidespin (270 degrees). I left the curveball off this graph so that we could see the difference between the fastballs a little better, but the curve runs 77-82 mph with mostly topspin and some sidespin, corresponding to a spin direction of 35-80 degrees.
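As a rough illustration of how these ranges could feed an automated first pass, here is a minimal Python sketch that treats each pitch type as a rectangle in speed versus spin direction. The groupings in this post were drawn by eye on the graph, so the boxes below are only an approximation built from the quoted ranges; the names `VOLQUEZ_BOXES` and `classify` are hypothetical.

```python
# (pitch name, (min mph, max mph), (min spin direction, max spin direction))
# Ranges are the ones quoted in the text for Volquez.
VOLQUEZ_BOXES = [
    ("four-seam fastball", (91, 96), (200, 220)),
    ("two-seam fastball", (90, 95), (220, 245)),
    ("changeup", (80, 85), (210, 270)),
    ("curveball", (77, 82), (35, 80)),
]

def classify(speed_mph, spin_direction_deg):
    """Return every pitch type whose box contains the point.

    An empty list means the pitch falls outside all boxes; more than one
    name means it sits on a shared boundary and needs another parameter
    (such as spin rate) to break the tie.
    """
    matches = []
    for name, (lo_v, hi_v), (lo_d, hi_d) in VOLQUEZ_BOXES:
        if lo_v <= speed_mph <= hi_v and lo_d <= spin_direction_deg <= hi_d:
            matches.append(name)
    return matches

print(classify(94, 210))  # inside the four-seam box only
print(classify(93, 220))  # on the boundary between the two fastballs
```

Returning a list rather than a single label keeps the ambiguous pitches visible instead of silently assigning them to one group.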

The spin rate graph confirms the diagnosis of the four-seam and two-seam fastballs, showing that the two-seamer has slightly slower spin.

Again, I left the curveball off the graph, but its spin rate ranges from 600-1600 rpm, similar to the changeup.

Finally, I want to take a look at the vertical and horizontal break on the pitches and show that our four pitch groupings do show up on that graph, too.

The four-seamer has a vertical break of +8 to +13 inches and a horizontal break of -3 to -7 inches. The two-seamer has a vertical break of +5 to +9 inches and a horizontal break of -7 to -11 inches, consistent with the “sinking” nature of the two-seam fastball.

The changeup has a vertical break of 0 to +5 inches and a horizontal break of -3 to -8 inches, and the curveball has a vertical break of -2 to -7 inches and a horizontal break of +3 to +7 inches.

What little scouting information I could find on Edinson Volquez agreed with the diagnosis of four pitches: a four-seam fastball, a two-seam sinking fastball, a changeup, and a curveball.

I’ve tied up some of the loose ends from my first post on Jonathan Papelbon, and I wanted to share those findings here.

First, I mentioned that polar plots were preferable to standard x-y plots when graphing spin direction, measured in degrees. I’ve decided that is untrue. The polar plot does a better job of conveying the idea that we’re looking at an angle, but the x-y plot does a better job of spreading the data out for examination, and that seems to be more important for pitch classification.

Second, at the time of the first article, I had not classified the major pitch types. Now I have, and what I found was consistent with the scouting reports. Papelbon’s main pitches are his four-seam and two-seam fastballs. Against lefties, he uses his split-fingered fastball as his offspeed pitch, and against righties, he uses the slider as his offspeed pitch, mixing in a few curves and a few splitters (and slutters?).

I didn’t find any strong evidence of a slider/cutter hybrid, the semi-famous “slutter”. I found two pitches sitting on the edge of the slider group that had a much higher spin rate. (Actually, the slider often has a significant component of spin with its axis along the direction of travel, which we cannot measure. These pitches may have the same spin rate as the slider but as sidespin rather than spin along the direction of travel.) These might be the slutter, but with only two instances, I’m hesitant to say.

Without further ado, here is the graph reprised from the previous post, with pitch types marked. Also, pitches to lefties are shown in red, and pitches to righties are shown in green.

The four-seam fastball is thrown the hardest, running 94-99 mph. It has the highest spin rate, about 2800 rpm. This is mostly backspin but also some sidespin that suggests it’s coming from a 3/4 delivery. It rises the most compared to a theoretical pitch without spin, about 12 inches on average, and rides in on a right-handed hitter by about 7 inches.

The two-seam fastball is thrown slightly slower, roughly 91-95 mph, with a slightly slower spin rate, about 2200 rpm. This pitch has about equal amounts of backspin and sidespin. It “rises” a little less than the four-seamer, only about 9 inches, and rides in on a righty by about 8 inches.

The split-fingered fastball is thrown about 84-91 mph, and has a much slower spin rate, between 1000 and 2000 rpm. The splitter has a significant amount of sidespin. Compared to the other fastballs, it has a large drop. Compared to a pitch without spin, it rises about 2 inches and breaks away from a lefty by about 8 inches.

The slider is thrown about 81-88 mph, and has a very slow spin rate in the x-z plane that we measure. This is mostly backspin, but as mentioned before, much of the actual spin is probably around the direction of travel, as is typical for a slider. The slider “rises” by about 5 inches compared to a pitch without spin and breaks away from a righty by an inch or two.

The curveball is thrown 78-81 mph, and has a very slow spin rate of about 600 rpm. This is mostly topspin, with a component of sidespin. The curveball drops a couple inches and breaks away from a right-hander by about 3 inches.

The two slutters, if in fact that is what they are, were thrown at 85 and 92 mph, with a spin rate of 2200 rpm. They had mostly backspin, rose 11-12 inches compared to a non-spinning pitch, and broke in on a righty by 0-2 inches.

Finally, moving on to a third topic, I found that the “release point” (initial position) data was not very helpful in classifying pitch types. The groupings we saw in the previous article were due to park variations. The park-to-park dependence washes out anything else. Josh Kalk’s work on correcting for park variance is going to be very important. To drive home that point, here is the initial horizontal position, x0, for all of Papelbon’s pitches since the All-Star Break, when the y0 distance was set to 50 feet. Notice the strong dependence on park above all else.

I’m not completely sure if the data is consistent even within one park. Note how the initial position has increased between the first stint in Boston and the second stint in Boston. It’s possible Papelbon changed his release point during this time, but given the other inconsistencies in the data set, my confidence that it’s not a system change is not very high.

Things have been slow around here lately due to some other life circumstances, so I haven’t gotten back to the Brandon Webb analysis, nor have I posted some of my other work.

I’ve been fiddling with some ideas from Dr. Alan Nathan’s paper on modeling pitch trajectories from a Jon Lester start, but I haven’t had the time to get them into the final format or chase down all the loose ends, of which there are many. Josh Kalk encouraged me to post my preliminary work, anyway, for both my own benefit and that of the analysis community, and he has a good point.

So what you have here are some of my experimental ideas tested out on Boston Red Sox pitcher Jonathan Papelbon. I originally started this expedition in search of his so-called “slutter” (slider/cutter hybrid), and it may well be in the data, but one of the loose ends is that I haven’t differentiated his harder-thrown pitches yet. It seems like a pretty big loose end to not have accomplished my main goal, but perhaps more important than whether Papelbon throws a slutter and whether I can identify it from the PITCHf/x data is the detour I followed in the process of looking.

In Dr. Nathan’s paper, he solves the equations of motion for each pitch Jon Lester threw and uses this data to determine the spin direction and spin rate of the baseball. Traditionally (if something in its first year of existence can have a tradition), people have classified pitches using the speed, horizontal break, and vertical break from the PITCHf/x data. Dr. Nathan’s method holds promise as an alternate viewpoint on the data, classifying a pitch based on what the pitcher does to the baseball as opposed to classifying it based on how the pitched baseball moves from the perspective of the batter/catcher/umpire. In addition, Dr. Nathan pointed out to me that the spin direction and spin rate are independent of the y0 “release point” measurement distance, which has been set first at 55 feet, then 40 feet, and now 50 feet during different parts of the season. Since the spin parameters are invariant across y0, it’s much easier to classify pitches for a whole season’s data set, with only the start_speed needing to be adjusted based on changes in y0.

The method presented in the paper is based on fitting the pitch trajectory to the data using iterative numerical analysis. However, Dr. Nathan also presents some approximations for the Magnus force and drag force which he says yield comparable results to his more accurate method. I took these approximations and solved for the spin direction and spin rate. I then applied these new simpler equations to the same Jon Lester data and confirmed that I achieved very similar results to those presented in the paper.

Spin direction (in degrees) $\theta=\arctan\left(\frac{a_z+32.174}{a_x}\right)\cdot\frac{180}{\pi}$
then add 270 degrees if $a_x<0$ or 90 degrees if $a_x>0$.

Spin rate (in rpm) $\omega= \frac{\sqrt{a_x^2 + (a_z + 32.174)^2}}{|v_{y0}|\cdot0.121\cdot0.00544\cdot2\pi}\cdot60$

(I’m in the process of figuring out how to use LaTeX in WordPress to make the formulas look nice.)
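For anyone who wants to run the numbers themselves, here is a minimal Python sketch of the two formulas above. The function names and the sample pitch are made up for illustration, the constants are copied straight from the formulas, and the ax = 0 case, which the formulas don't cover, simply raises an error.

```python
import math

G = 32.174                     # gravitational acceleration in ft/s^2, as in the formulas
BALL_CONST = 0.121 * 0.00544   # the two constants from the spin-rate formula, used as given


def spin_direction_deg(ax, az):
    """Spin direction in degrees from the ax and az accelerations (ft/s^2)."""
    if ax == 0:
        # The formula divides by ax, so this case is left undefined here.
        raise ValueError("spin direction is undefined for ax == 0")
    theta = math.degrees(math.atan((az + G) / ax))
    # Offset from the post: add 270 degrees if ax < 0, 90 degrees if ax > 0.
    return theta + (270.0 if ax < 0 else 90.0)


def spin_rate_rpm(ax, az, vy0):
    """Spin rate in rpm from ax, az (ft/s^2) and vy0 (ft/s)."""
    accel = math.hypot(ax, az + G)                 # acceleration with gravity removed from az
    rad_per_sec = accel / (abs(vy0) * BALL_CONST)
    return rad_per_sec / (2 * math.pi) * 60        # rad/s -> rev/s -> rpm


# Made-up pitch values, not taken from the Papelbon or Lester data.
direction = spin_direction_deg(ax=-10.0, az=-22.174)
rate = spin_rate_rpm(ax=-10.0, az=-22.174, vy0=-130.0)
print(round(direction, 1), round(rate))
```

The same two functions can be applied row by row to the ax, az, and vy0 columns of a PITCHf/x data set.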

Next, I decided to test these equations out on Papelbon to see if I could classify his pitches. Here we come to another loose end. Graphs with angles beg to be put on polar plots, for more reasons than one. Excel does not do polar plots without jumping through a lot of hoops, which is another reason I turned to R for graphing. However, I haven’t learned how to do polar plots in R yet. So, I present these Papelbon graphs in Excel and apologize for the poor formatting. It obscures some of the information I want to communicate about the value of classifying pitches by spin direction, but I think I can still make my point.

First we have pitch speed plotted versus spin direction. Note that this data set includes all three possible y0 values, yet the pitches are grouping nicely.

I have normalized the start_speed values in this graph to compensate for y0 values different than 50 ft. The normalization factor was based on extrapolating or interpolating to 50 ft based on the start_speed and end_speed values and assuming the y deceleration due to drag was constant.
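One way to read that normalization, sketched in Python: with constant deceleration, the square of the speed changes linearly with distance, so the speed at 50 ft can be interpolated or extrapolated from the two measured speeds. The function name is hypothetical, and the sketch assumes end_speed is measured at the front of home plate (about 1.417 ft), which may not match exactly what the system reports.

```python
import math


def normalize_start_speed(start_speed, end_speed, y0, y_ref=50.0, y_end=1.417):
    """Estimate the pitch speed at y_ref feet from the plate.

    Assumes constant deceleration between y0 and y_end, so speed squared
    is linear in distance. y_end = 1.417 ft (front of home plate) is an
    assumption, not something stated in the data.
    """
    # Change in speed^2 per foot of travel toward the pitcher.
    slope = (start_speed ** 2 - end_speed ** 2) / (y0 - y_end)
    v_squared = start_speed ** 2 + slope * (y_ref - y0)
    return math.sqrt(v_squared)


# Made-up example: the same pitch speeds measured from two different y0 settings.
print(round(normalize_start_speed(95.0, 88.0, y0=40.0), 2))  # extrapolated back to 50 ft
print(round(normalize_start_speed(95.0, 88.0, y0=55.0), 2))  # interpolated forward to 50 ft
```

Because only the ratio of distances matters, the speeds can stay in mph without any unit conversion.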

Also, I cut the graph off at 300 degrees in order to better see most of the data. In doing so, I excluded two outlier pitches at around 350 degrees with a speed of 97 mph. I think there is some sort of error in the data with those two pitches, but if you run the analysis yourself, you’ll see them there.

Compare that to the mess we get when we plot pitch speed versus horizontal break, once again including data from all three of the y0 values.

A typical technique to clean up the speed versus horizontal break graph, besides separating the data by y0 measurement points, is to plot the vertical break, either on the same graph as a second set of points, or to make a separate graph showing vertical break versus horizontal break and indicating speed groups by color. Ignoring the y0 problem for the moment, that’s usually sufficient to tease out the different pitch types, although rising and sinking fastballs can still be tough to differentiate. However, using our first graph, we don’t have to do any of that.

We also have spin rate data, but I found, at least for Papelbon, that it has a high correlation with pitch speed (R^2=0.68).
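If you want to check that kind of correlation on your own data, a minimal Python sketch of R^2 for a simple linear relationship (the squared Pearson correlation) looks like this. The function name and the sample numbers are made up, so don't expect them to reproduce the 0.68 figure.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Made-up speeds (mph) and spin rates (rpm), for illustration only.
speeds = [84.0, 88.5, 93.0, 95.5, 96.0, 97.0]
spin_rates = [1500.0, 1900.0, 2100.0, 2500.0, 2300.0, 2600.0]
print(round(r_squared(speeds, spin_rates), 2))
```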

The correlation isn’t perfect, but it’s high enough that plotting spin rate against any of the other parameters didn’t tell me much, on the whole, about how to classify pitches. However, spin rate can sometimes be helpful in deciding how to classify an individual pitch on the borderline between groups in the speed versus spin direction graph.

Intuitively this conclusion makes sense to me when I think about how a fastball is thrown. The harder you throw it, the more backspin you will put on the ball.

There are (at least) two other avenues for exploration in terms of pitch classification. One is separation of the data by handedness of the batter. When you do this for Papelbon, you can clearly see that there are some pitches he prefers to throw to righties and some to lefties.

I haven’t spent much time actually trying to figure out which pitches are which, so the story isn’t quite as fun. (Is there a slutter in there?) Hopefully I can come back and do that. Nonetheless, I think the main point is made.

The remaining avenue for investigation is classifying pitches by “release point”, which is really just the initial measurement point for the PITCHf/x system. The following graphs show z0, the vertical distance from the ground to the pitch, versus x0, the horizontal distance from the centerline of home plate, with the negative direction to the catcher’s left. Here I found it interesting to see the graphs separated by y0. Measured at 55 feet from home plate, you can hardly tell anything apart.

When y0 is set to 50 feet, we can start to see some separation into groups. I haven’t correlated these groups to the groups we saw above (another of those niggling loose ends), but clearly there’s something worthwhile in this data. The different pitches are starting to move a little differently and we can pick that up a few feet out of the pitcher’s hand.

When we go to 40 feet, the separation of the groups gets a little better, but it appears that groups might also be getting a little smeared or stretched out. I don’t know if that was one of the factors in MLB/Sportvision settling on the 50-foot measurement distance, but in any case it works out well for us in this instance.

So, there you have it. There are a lot of loose ends yet to chase in this little voyage through Jonathan Papelbon’s repertoire. Hopefully there is also something useful to be gained from what I’ve found so far.

Please feel free to comment either on the work I’ve presented, on the suggestions I’ve made for further investigation, or on your own ideas for what can be done with this data.

Part 2 of the Papelbon article can be found here.
