Building once again on my pitch identification work for Joba Chamberlain, here is Part 2 of the series on him. It’s not quite everything I had hoped it would be, for reasons I’ll get to in a moment, but there are still some interesting things to learn. The approach is similar to my previous work on Josh Beckett and Eric Gagne.
First, let’s look at which pitches Chamberlain uses in various ball-strike counts.
Joba Chamberlain definitely relies on his fastball, which is probably not unusual for a power pitcher out of the bullpen, but he throws his slider much more often with two strikes. In a 2-2 count, you can all but expect a slider (68% of the time). I don’t think we have enough data on his curveball to draw conclusions about it. You can compare my data to Josh Kalk’s, with two caveats: my data set includes Chamberlain’s two division series appearances, and Josh’s algorithm classifies all of Chamberlain’s off-speed pitches as sliders, whereas I have identified his curveball and changeup separately.
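A usage-by-count table like this is just a tally of pitch types within each ball-strike count. Here is a minimal Python sketch, assuming each pitch has already been classified and is stored as a hypothetical (balls, strikes, pitch_type) tuple; the sample data is made up for illustration and is not Chamberlain’s.

```python
from collections import Counter, defaultdict

def pitch_mix_by_count(pitches):
    """pitches: iterable of (balls, strikes, pitch_type) tuples.
    Returns {"B-S": {pitch_type: share of pitches thrown in that count}}."""
    tallies = defaultdict(Counter)
    for balls, strikes, pitch_type in pitches:
        tallies[f"{balls}-{strikes}"][pitch_type] += 1
    mix = {}
    for count, tally in tallies.items():
        total = sum(tally.values())
        mix[count] = {pt: n / total for pt, n in tally.items()}
    return mix

# Made-up sample, not Chamberlain's actual pitches.
sample = [(2, 2, "SL"), (2, 2, "SL"), (2, 2, "FF"),
          (0, 0, "FF"), (0, 0, "FF"), (0, 0, "SL")]
mix = pitch_mix_by_count(sample)
print(f'{mix["2-2"]["SL"]:.0%} sliders in 2-2 counts')
```

With real data, the same dictionary gives one row of the usage table per count.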
Next, let’s look at the results split up by pitch type and batter handedness.
CS=called strike, SS=swinging strike, IPO=in play (out), IPNO=in play (no out), TB=total bases, BABIP=batting average on balls in play (including home runs), SLGBIP=slugging average on balls in play (including home runs). For Strk%, all pitches other than balls are counted as strikes. Con% = (Foul+IPO+IPNO)/(Foul+IPO+IPNO+SS).
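The rate stats in the table follow directly from these definitions. A minimal Python sketch of the arithmetic, assuming one result code per pitch; "B" (ball) and "F" (foul) are placeholder codes, and treating every IPNO as a hit is a simplification (a batter who reaches on an error would also land there).

```python
from collections import Counter

def summarize(outcomes, total_bases):
    """outcomes: one result code per pitch ("B", "CS", "SS", "F", "IPO", "IPNO").
    total_bases: total bases on all balls in play (home runs included)."""
    c = Counter(outcomes)
    pitches = len(outcomes)
    in_play = c["IPO"] + c["IPNO"]
    contact = c["F"] + in_play
    return {
        # Every pitch that isn't a ball counts as a strike.
        "Strk%": (pitches - c["B"]) / pitches,
        # Con% = (Foul+IPO+IPNO)/(Foul+IPO+IPNO+SS)
        "Con%": contact / (contact + c["SS"]) if contact + c["SS"] else None,
        # Simplification: every in-play-no-out is treated as a hit.
        "BABIP": c["IPNO"] / in_play if in_play else None,
        "SLGBIP": total_bases / in_play if in_play else None,
    }

print(summarize(["B", "CS", "SS", "F", "IPO", "IPNO", "SS", "B"], total_bases=2))
# {'Strk%': 0.75, 'Con%': 0.6, 'BABIP': 0.5, 'SLGBIP': 1.0}
```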
The first thing that jumps out is, of course, the results for his slider. Wow! Just wow. In the PITCHf/x games, at least, nobody got a hit off it, and hardly anybody managed to put it into play or even foul it off. The only real negative is that he seemed to have a little trouble throwing his curveball for strikes, but given that he walked only nine men in 27 2/3 innings, that doesn’t seem like a big concern.
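To put nine walks in 27 2/3 innings on a per-nine-inning scale, convert innings to outs first, since the 2/3 means two outs rather than a decimal fraction. A quick Python check of the arithmetic (the function names are just illustrative):

```python
def innings_to_outs(full_innings, extra_outs):
    """27 2/3 innings means 27 full innings plus 2 outs."""
    return full_innings * 3 + extra_outs

def walks_per_nine(walks, outs):
    """Walks per nine innings, where nine innings = 27 outs."""
    return walks * 27 / outs

outs = innings_to_outs(27, 2)   # 83 outs
rate = walks_per_nine(9, outs)
print(f"{rate:.2f} BB/9")       # 2.93 BB/9
```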
Next, let’s look at the strike zone charts showing where Joba Chamberlain locates his pitches against left-handed and right-handed hitters. I’m keeping the same formatting for these charts as in the Beckett and Gagne analyses. The strike zone is drawn as a box that extends one baseball radius beyond each side of the plate, and the top and bottom of the zone are a general average, not adjusted for each batter. Each pitch is plotted where it crossed the front of home plate.
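For anyone who wants to draw the same box from raw PITCHf/x data, here is a minimal Python sketch. It assumes the px and pz fields (horizontal and vertical location in feet at the front of the plate, with px measured from the center of the plate); the 1.5 ft bottom and 3.5 ft top are hypothetical placeholders, not the exact averages used in these charts, and the ball radius is an approximation for a regulation baseball.

```python
PLATE_HALF_WIDTH_FT = 17 / 2 / 12   # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12          # approximate radius of a regulation baseball
ZONE_HALF_WIDTH_FT = PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT

# Hypothetical average zone limits, not adjusted per batter.
ZONE_BOTTOM_FT = 1.5
ZONE_TOP_FT = 3.5

def in_zone(px, pz):
    """True if the pitch crosses the front of the plate inside the box."""
    return abs(px) <= ZONE_HALF_WIDTH_FT and ZONE_BOTTOM_FT <= pz <= ZONE_TOP_FT

# Usage: share of a few example locations that fall inside the box.
locations = [(0.0, 2.5), (1.83, 0.35), (-0.6, 1.8)]
print(sum(in_zone(px, pz) for px, pz in locations) / len(locations))
```

Widening only the sides by a ball radius mirrors the chart description above; the top and bottom are left as flat averages.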
Let’s begin with the fastball.
Chamberlain works mostly on the outer half of the plate with the fastball to lefties, and he’s more in the zone to righties, although he also comes up and in to righties. Batters seem to be able to handle his fastball fairly well, not swinging and missing very often and having pretty good success when they do put the ball in play, similar to what we saw with Josh Beckett’s four-seam fastball. I don’t have a good idea yet how this compares to league-wide numbers for all pitchers’ fastballs or even to a significant number of other hard throwers.
Next, let’s look at the seldom-used curveball and changeup. I’ll present these without comment since there isn’t much data to discuss.
Finally, let’s move on to what you’ve all been waiting for: the famous Joba slider.
This is where this avenue of inquiry starts to go downhill. After looking at this graph, I wanted to talk about how Chamberlain gets a lot of swings and misses on his slider down and away to righties and down and in to lefties.
But I was bugged by the swinging strike recorded nearly at the lefty batter’s feet (x=1.83, z=0.35). Was a hitter so badly fooled by a slider that he swung at one around his shoe tops? It’s certainly possible, but if so, I wanted to see it. So I brought up the MLB.tv footage of the game, September 23rd against Toronto, in which Chamberlain entered with two on and two out in the 8th inning to face left-handed Adam Lind, trying to preserve a 7-5 Yankees lead. Jumping to the end of the story, Chamberlain threw Lind five straight sliders to strike him out and end the inning.
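The check done here by eye, spotting a swinging strike recorded nowhere near the zone, can be automated as a first pass for deciding which pitches are worth looking up on video. A Python sketch using the same box as the charts; the 0.75 ft margin, the Pitch record layout, and the zone top and bottom are assumptions for illustration, and a flagged pitch is only a candidate for review, not proof of a data error.

```python
from dataclasses import dataclass

# Same box as the charts: plate half-width plus one (approximate) ball radius.
ZONE_HALF_WIDTH_FT = (17 / 2 + 1.45) / 12
ZONE_BOTTOM_FT, ZONE_TOP_FT = 1.5, 3.5   # hypothetical average zone limits

@dataclass
class Pitch:              # hypothetical record layout
    batter: str
    px: float             # feet, at the front of the plate
    pz: float             # feet above the ground
    result: str           # "SS" = swinging strike

def feet_outside_zone(px, pz):
    """Distance from the pitch location to the nearest edge of the box (0 inside)."""
    dx = max(abs(px) - ZONE_HALF_WIDTH_FT, 0.0)
    dz = max(ZONE_BOTTOM_FT - pz, pz - ZONE_TOP_FT, 0.0)
    return (dx ** 2 + dz ** 2) ** 0.5

def swings_worth_checking(pitches, margin_ft=0.75):
    """Swinging strikes recorded well outside the zone: candidates for video review."""
    return [p for p in pitches
            if p.result == "SS" and feet_outside_zone(p.px, p.pz) > margin_ft]

pitches = [
    Pitch("Lind", 1.83, 0.35, "SS"),   # the location recorded at his feet
    Pitch("Lind", 0.40, 2.60, "SS"),   # hypothetical swing in the zone
]
for p in swings_worth_checking(pitches):
    print(p.batter, p.px, p.pz, round(feet_outside_zone(p.px, p.pz), 2))
```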
Unfortunately, the pitch locations recorded by PITCHf/x for these pitches were attached to the wrong pitches in the Gameday XML data.
The first pitch of the at bat was a belt-high slider just inside that Lind swung at and missed, followed by a second pitch in almost the same location, with the same result. Next, Chamberlain threw two sliders at Lind’s feet; the second of these landed in the dirt. Lind laid off both of those pitches to even the count at 2-2. Finally, Chamberlain threw a slider down and in, labeled pitch #5 in the second graph, which Lind swung at and missed for strike three.
The XML pitch location data for this game seems to have missed the fourth pitch (the one in the dirt) altogether and added an extraneous pitch, labeled #3 in the first graph, that did not occur in the pitch sequence to Lind. Then the order of the other pitches is out of whack, too. The pitch labeled #1 should be #5, #2 should be #1, #4 should be #2, and #5 should be #3.
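Once the video establishes what actually happened, the correction amounts to a mapping from recorded XML pitch numbers to actual pitch numbers. A small Python sketch of applying it; the mapping comes straight from the paragraph above, while the function name and the placeholder location labels are hypothetical (the actual coordinates are not reproduced here).

```python
# Recorded (XML) pitch number -> actual pitch number in the Lind at bat,
# per the video. None marks the extraneous record (#3 in the first graph).
RECORDED_TO_ACTUAL = {1: 5, 2: 1, 3: None, 4: 2, 5: 3}

def realign(recorded, mapping):
    """Re-key recorded pitch data by actual pitch number, dropping extras."""
    fixed = {}
    for rec_no, data in recorded.items():
        actual_no = mapping.get(rec_no)
        if actual_no is not None:
            fixed[actual_no] = data
    return fixed

# Placeholder labels stand in for each record's (px, pz) location.
recorded = {n: f"xml #{n}" for n in range(1, 6)}
fixed = realign(recorded, RECORDED_TO_ACTUAL)
for actual_no in range(1, 6):
    print(actual_no, fixed.get(actual_no, "no location recorded"))
```

The output makes both problems visible at once: actual pitch 4 (the one in the dirt) has no location, and recorded #3 has nowhere to go.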
The conclusion is that, no, Chamberlain did not get Adam Lind to swing at a slider at his shoe tops. He did get him to swing at a pitch down and in that would have been Ball 3 had he let it go by, and it was an impressive pitching performance by Chamberlain, but unfortunately the episode calls into question the integrity of our data set.
I don’t have any way to verify the integrity of the rest of the data without watching endless hours of games on MLB.tv. That may seem like a worthy endeavor to some, and I can’t argue too strenuously with them, but alas, the rest of my non-baseball life seems to think it has some importance, too.
In pointing out this example, I don’t mean in any way to disparage the incredible work that MLBAM and Sportvision have done in creating this data set and making it available to us. For free, no less. It’s an incredibly valuable resource, and some errors are to be expected during a season in which the system was still being evaluated and debugged.
I just don’t know how prevalent these kinds of errors are and when they might call into question some of my conclusions. I do know that Eric Van spotted a similar error in Josh Beckett’s data from Game 1 of the division series, as detailed in this thread at Sons of Sam Horn, post #88. The PITCHf/x data in question for that game has since been removed from the data set altogether. Eric mentions plotting the human-generated x,y coordinates against the computer-generated PITCHf/x coordinates as a way to spot these errors, but in our case with the Chamberlain-Lind at bat, the human-generated coordinates look screwy to me, too. I haven’t applied Eric’s method to a larger data set, so it may still have merit.
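Eric’s cross-check can be sketched as a pair of straight-line fits: one mapping the human-entered Gameday x coordinate to PITCHf/x px, the other mapping Gameday y to pz, then flagging pitches whose PITCHf/x location sits far off the fitted line. This is a minimal Python interpretation, not Eric’s code; it assumes the two coordinate systems are related linearly, and the 3-standard-deviation and 0.25 ft thresholds are arbitrary. It can only catch cases where the two sources disagree; if the human-entered coordinates are wrong in the same way, nothing gets flagged.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def flag_mismatches(rows, k=3.0, min_resid_ft=0.25):
    """rows: (gameday_x, gameday_y, px, pz) per pitch.

    Returns indices of pitches whose PITCHf/x location is far from what the
    human-entered coordinates predict, on either axis.
    """
    gx, gy, px, pz = zip(*rows)
    flagged = set()
    for human, pfx in ((gx, px), (gy, pz)):
        slope, intercept = fit_line(human, pfx)
        resid = [p - (slope * h + intercept) for h, p in zip(human, pfx)]
        # Ignore tiny residuals even when the fit is nearly perfect.
        cutoff = max(k * pstdev(resid), min_resid_ft)
        flagged.update(i for i, r in enumerate(resid) if abs(r) > cutoff)
    return sorted(flagged)

# Synthetic example: 21 pitches on a perfect line, with one px corrupted.
rows = [(float(i), 100.0 - i, (i - 10) / 10, 2.5 - (i - 10) / 20) for i in range(21)]
rows[10] = (10.0, 90.0, 1.83, 2.5)
print(flag_mismatches(rows))   # [10]
```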
While we’re on this subject, I may as well put in a plug for Josh Kalk’s new PITCHf/x batter-pitcher matchup tool. You can look at the Chamberlain-Lind matchup there for yourself. It doesn’t tell you anything I didn’t show here, but I wanted to make sure all my readers knew this great tool was available.
Update: Cory Schwartz from MLBAM addresses the PITCHf/x data error here.