March 2008

I just finished an analysis of Oakland A’s outfielder/DH Jack Cust, and it was published at The Hardball Times this morning.

I thought it might be helpful for me to let you know what else I’m working on at the current time, what things I’m considering working on in the future, and to mention problems that I consider generally important for PITCHf/x research to tackle in the near future, whether or not I plan to work on them personally.

I would appreciate input, both on what you consider important for baseball research in general and on what you’d most like to see me do. Maybe out of a discussion, we can jointly develop a set of PITCHf/x Hilbert problems.

What’s on my plate now

  • Updating the catalog of PITCHf/x-related articles. I’ve fallen seriously behind on this in 2008, mainly because it’s becoming too big for me to handle in its current format. Which leads to the second item…
  • Transferring the catalog of PITCHf/x-related articles into a database. This should improve searchability, portability, and timeliness.
  • Implementing an improved Gameday data spider using Wget for the 2008 season.
  • Developing and/or consolidating data parsers that I developed last season into a more integrated and efficient whole.
  • Tinkering, as always, with the data from various players trying to learn something about the player or understand something about the PITCHf/x data set. This occasionally turns into an in-depth article on a player.

Things I might do in the future

  • Adding baserunner state information to my PITCHf/x database.
  • Investigating why some pitchers are home-run prone and others are not. I’m particularly interested in why some flyball pitchers are less home-run prone than others.
  • Investigating what we can learn from release point data for pitchers.
  • Attempting to identify hanging breaking balls.
  • Systematizing pitcher information on a broader scale.

Other questions for PITCHf/x research

Evaluating pitching:

  • Integrating information about mechanics and delivery with information from PITCHf/x about release point and trajectory of pitches and consistency of same.
  • Determining the run value of a fastball as a function of speed.
  • Measuring fastball speed, curveball spin, etc., as a function of fatigue. We would need a good measure of fatigue.
  • Improving pitch classification methods and terminology.

Evaluating hitting:

  • Collecting speed-off-bat information, a.k.a. Hit f/x. This is an area that Sportvision is investigating. It could potentially also have very valuable applications for fielding.

Improving data integrity:

  • Data correction for park and weather variations and camera distortion.
  • Identifying and eliminating spurious pitch data.

Evaluating catcher defense:

  • Pitch blocking: Dan Turkenkopf has made a great start at evaluating catchers’ ability to block pitches in the dirt. There’s much more that could be done.
  • Game calling: do some catchers have patterns and preferences for the pitches they call that we can distinguish from the pitcher’s patterns and preferences? Can we use a Tango WOWY approach to determine this?
  • Stolen bases: Do pitch speeds and types affect stolen base success rates? How does this affect particular catchers and pitchers? Does the speed or location of a pitchout affect its chance of success?

At this point, this is not a comprehensive list, but it would be nice if we could develop something like that.


I’m sad to say that I am leaving the Statistically Speaking blog at MVN and excited to let you know that I will be joining the team of writers at The Hardball Times.  I haven’t written any articles yet for THT, but I’ll let you know when I do.

Note: This article was originally published at the Statistically Speaking blog on December 9, 2007.  Since the site is defunct and its articles are no longer available on the web, I am re-publishing the article here.

I have a twenty-month-old son, and one of his favorite books is Moo, Baa, La, La, La! I’ve read it with him so much that it informs my other thoughts, including those about baseball, apparently. The first page of the book says, “A cow says moo.” Next, “A sheep says baa.” Then, “Three singing pigs say La, La, La!”

“No, no, you say, that isn’t right. The pigs say OINK all day and night.”

In baseball analysis, oinking is boring. Everyone loves to marvel at singing pigs, particularly if they already suspected that a particular pig had a penchant for music. Rare is the analyst or reader who stops the party to tell us that pigs just say oink, and that there’s something wrong with the analysis process that sold us into the thrall of a porcine melody. When it comes to PITCHf/x, the new toy of the sabermetric community (one that isn’t well vetted or widely understood yet), we need to be especially careful about drawing superficial conclusions.

I’ve even been responsible for encouraging a porker or two to sing. Take a look at my comments on Eric Gagne’s slider:

His occasional slider seems very inconsistent. It runs about 82-86 mph, but its spin axis is all over the place, ranging from 120 degrees (great sidespin) to 210 degrees (no sidespin at all other than that from the 1 o’clock delivery).

It sounded great. It rang true with my preconceptions based on scouting reports that said Gagne didn’t have a very good slider, and the data showed he didn’t throw it very often, so Gagne himself appeared to agree with my conclusion. Strike up the band, the pigs are going to sing!

There’s only one problem. In this case, John Walsh was the one who pointed out to me that pigs say oink. The slider’s x-z spin axis seems to be all over the place because the spin axis on Gagne’s slider is almost completely aligned with the y-direction, the direction of travel, with the ball spinning like a spiraling football pass. Whatever small component of the spin shows up in x- or z-directions is overwhelmed by errors in the measurements and subsequent calculations. Now I know to pay attention to this, thanks to John.
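To make John's point concrete, here is a minimal Python sketch of why the measured axis angle becomes unreliable when most of the spin points along the direction of travel. It is only an illustration, not the calculation I used: the function names, the arbitrary units, and the angle convention (a plain `atan2` in the x-z plane) are all assumptions made for the example.

```python
import math

def axis_angle_deg(sx, sz):
    """Angle (degrees, 0-360) of the spin component in the x-z plane.

    The convention here is a plain atan2 and is an assumption for this
    sketch; it is not necessarily the convention used in the article.
    """
    return math.degrees(math.atan2(sz, sx)) % 360.0

def worst_angle_shift_deg(sx, sz, err):
    """Largest change in the x-z angle when a measurement error of
    magnitude `err` is added to (sx, sz), trying every whole-degree
    direction for the error."""
    base = axis_angle_deg(sx, sz)
    worst = 0.0
    for k in range(360):
        t = math.radians(k)
        shifted = axis_angle_deg(sx + err * math.cos(t), sz + err * math.sin(t))
        # Wrap the difference into [-180, 180) before taking its size.
        diff = abs((shifted - base + 180.0) % 360.0 - 180.0)
        worst = max(worst, diff)
    return worst

# Same error size, two hypothetical pitches (arbitrary units):
strong = worst_angle_shift_deg(10.0, 0.0, 1.0)  # big x-z spin component
weak = worst_angle_shift_deg(1.5, 0.0, 1.0)     # spin mostly along y
print(f"large x-z component: up to {strong:.1f} degrees of angle error")
print(f"small x-z component: up to {weak:.1f} degrees of angle error")
```

With these made-up numbers the same measurement error moves the angle by roughly 6 degrees when the x-z component is large and by roughly 40 degrees when it is small, which is the kind of scatter that made Gagne's slider look inconsistent.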

I would like to encourage you all to be suspicious when you find something odd from the PITCHf/x data. Does John Smoltz throw a 92-mph slider? Maybe, or maybe the PITCHf/x system was measuring a little fast that day.

As I mentioned in my article about John Smoltz, I’ve recently been researching the slider. Looking at the slider naturally lends itself to examining its cousin, the cut fastball. In one sense, they’re completely different animals: one an off-speed pitch, a breaking ball; the other a fastball. That is true for some pitchers, whose sliders are classic breaking balls bearing little resemblance to a cutter. But for other pitchers they are more closely related, with sliders breaking very similarly to what might be called a cutter by someone else. In those cases, the main difference between a slider and a cutter is speed. The cutter is usually thrown almost as fast as a pitcher’s regular fastball, within a few miles per hour, and the slider is usually 5 mph or more slower than the fastball. Rather than quibbling too much over the theoretical definitions of sliders and cutters, let’s take a look at some real data.
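As a rough illustration of that speed rule of thumb, here is a short Python sketch. It assumes the pitch has already been identified as having cutter-like or slider-like break, and the exact cutoffs (3 mph for "within a few" and 5 mph for the slider) are my stand-ins for the example rather than established definitions. The function name is hypothetical.

```python
def label_by_speed_gap(fastball_mph, pitch_mph,
                       cutter_max_gap=3.0, slider_min_gap=5.0):
    """Label a cutter-or-slider pitch by how much slower it is than the
    pitcher's regular fastball.

    cutter_max_gap and slider_min_gap are assumed thresholds for this
    sketch; pitches falling between them are left unlabeled.
    """
    gap = fastball_mph - pitch_mph
    if gap <= cutter_max_gap:
        return "cutter"
    if gap >= slider_min_gap:
        return "slider"
    return "ambiguous"

# Hypothetical pitcher with a 94 mph fastball:
for speed in (92.0, 90.0, 86.0):
    print(speed, label_by_speed_gap(94.0, speed))
```

Leaving a gap between the two thresholds is deliberate: a pitch 4 mph off the fastball could reasonably be called either name, so the sketch declines to pick one.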

Today let’s examine the classic cut fastball and the pitcher who made it famous: New York Yankees closer Mariano Rivera. We have detailed PITCHf/x data for 399 of the 747 pitches that Rivera threw during the 2007 season, including three post-season appearances in the divisional series against Cleveland.

The Sporting News had this to say about Rivera back in 1999:

It wasn’t until 1997 that Rivera started toying with the grip on his fastball. He began immediately to evolve from a strikeout pitcher (he averaged almost 11 per nine innings in ’96 as setup man to Wetteland) to one who induces ground balls, thus improving his efficiency and making him available more often to Torre. Now he uses almost exclusively the cut fastball clocked in the mid-90s that seems much faster because he lulls hitters with a smooth delivery that explodes the ball in on hitters’ hands.

“Every appearance he breaks at least one bat,” Cone says. “We keep a tally on the bench.”

Try this tally, too: Lefthanders hit .143 with one home run against him this season, a direct result of his cutter. “No one has a cut fastball like him,” Red Sox first baseman Mike Stanley says. “You’re thinking it’s a slider, then you see it hit 96, 97 mph on the screen and you’re like, `Geez, no wonder this guy is so good.'”

Except for a couple of mph lost on the fastball, the scouting report doesn’t seem to have changed much in the eight years since then.

I’ll start my analysis by graphing his pitch speed versus the angle at which the spin on the ball is deflecting the pitch. (You can look here for more information on how I calculate these numbers or what the PITCHf/x data fields mean.)

Pitch speed vs. spin force direction

As expected, we see that Rivera throws almost 90% cutters. The remainder of the pitches appear to be regular fastballs. I read rumors of a slider in a scouting report here or there, but I don’t see any evidence of it in the data we have from 2007. Either people are confusing the movement of the cutter with a slider, or Rivera throws it so infrequently it didn’t show up in my data.

Pitch movement

How does the pitch movement look to the hitters? Both his fastball and cut fastball have hop due to the mid-90-mph speed, but they break in opposite directions. Here’s a graph of the pitch movement, including both aerodynamic (spin) effects and the effect of gravity.

What does Mo Rivera do with these pitches? The first thing I noticed is that he throws his regular fastball almost exclusively to right-handed hitters. Lefties only see the cutter (with one exception in 223 pitches), but righties get 64% cutters and 36% regular fastballs.

Batter  Pitch     Ball  CS  Foul  SS  IPO  IPNO  TB  BABIP  SLGBIP  Strk%  Con%
Left    Fastball     1   0     0   0    0     0   0      -       -     0%     -
Left    Cutter      66  50    44  22   29    11  13  0.275   0.325    70%   79%
Right   Fastball    16   3    13   5    3     4   5  0.571   0.714    64%   80%
Right   Cutter      36  14    30  23   15     5   8  0.250   0.400    71%   68%

CS=called strike, SS=swinging strike, IPO=in play (out), IPNO=in play (no out), TB=total bases, BABIP=batting average on balls in play (including home runs), SLGBIP=slugging average on balls in play (including home runs). For Strk% all pitches other than balls are counted as strikes. Con% = (Foul+IPO+IPNO)/(Foul+IPO+IPNO+SS).
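For anyone who wants to check the rate columns, here is a small Python sketch of the arithmetic. It assumes the seven counts in each line are balls, called strikes, fouls, swinging strikes, IPO, IPNO, and total bases, in that order, and that BABIP and SLGBIP are IPNO and TB divided by all balls in play. Those assumptions reproduce the published rates, but the class and function names are mine, made up for the example.

```python
from dataclasses import dataclass

@dataclass
class PitchResults:
    ball: int
    cs: int     # called strikes
    foul: int
    ss: int     # swinging strikes
    ipo: int    # in play, out
    ipno: int   # in play, no out
    tb: int     # total bases

def rates(r):
    """Return BABIP, SLGBIP, Strk% and Con% as fractions.

    A rate is None when its denominator is zero (for example, no balls
    in play).
    """
    def ratio(num, den):
        return num / den if den else None

    in_play = r.ipo + r.ipno
    contact = r.foul + in_play
    total = r.ball + r.cs + r.ss + contact
    return {
        "BABIP": ratio(r.ipno, in_play),
        "SLGBIP": ratio(r.tb, in_play),
        "Strk%": ratio(total - r.ball, total),   # everything but balls
        "Con%": ratio(contact, contact + r.ss),
    }

# Rivera's cutter against left-handed hitters, from the figures above.
cutter_vs_left = PitchResults(ball=66, cs=50, foul=44, ss=22,
                              ipo=29, ipno=11, tb=13)
for name, value in rates(cutter_vs_left).items():
    print(name, None if value is None else round(value, 3))
```

Returning None for an empty denominator keeps the lone fastball to a lefty (one ball, nothing else) from raising a division error.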

Lefties can’t deal with his cutter at all. They can’t put it in play very often, and when they do, it’s mostly ground balls on the infield.

Let’s take a look at strike zone charts showing where Rivera locates his fastball and his cut fastball against left-handed and right-handed hitters. I’m keeping the same formatting for these charts as I did in my analysis of Joba Chamberlain, Josh Beckett, and Eric Gagne. The strike zone is shown as a box, including one radius of a baseball on each side of the plate, and the top and bottom of the zone are a general average not adjusted per batter in these charts. The location is plotted where the pitch crossed the front of home plate.
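The box described above can be sketched in a few lines of Python. The 17-inch plate width is the rulebook figure; the ball radius (about 1.45 inches) is approximate, and the top and bottom heights of 3.5 and 1.5 feet are placeholder values I am assuming for the example, not necessarily the averages drawn in the charts. The `px` and `pz` arguments follow the Gameday field names for the pitch location at the front of the plate, in feet.

```python
PLATE_WIDTH_FT = 17.0 / 12.0    # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12.0    # approximate radius of a baseball
ZONE_BOTTOM_FT = 1.5            # assumed general average, not per batter
ZONE_TOP_FT = 3.5               # assumed general average, not per batter

def in_zone_box(px, pz):
    """True if a pitch crossing the front of the plate at (px, pz) feet
    falls inside the box: the plate widened by one ball radius on each
    side, between the assumed bottom and top heights."""
    half_width = PLATE_WIDTH_FT / 2.0 + BALL_RADIUS_FT
    return abs(px) <= half_width and ZONE_BOTTOM_FT <= pz <= ZONE_TOP_FT

# Hypothetical pitch locations (feet from the center of the plate, height):
for px, pz in [(0.0, 2.5), (0.8, 2.5), (0.9, 2.5), (0.0, 1.0)]:
    print((px, pz), in_zone_box(px, pz))
```

Widening the plate by a ball radius on each side means a pitch counts as in the box whenever any part of the ball would clip the edge of the plate.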

Fastballs plate location and results

Rivera likes to bust his fastball inside against righties, and he gets quite a few foul balls that way. When the hitters did put it in play, they had fairly good results. Most of his fastballs were thrown early in the count, on 0-0, 1-0, or 1-1: 38% fastballs and 62% cutters on those three counts. On all other counts he threw 14% fastballs and 86% cutters to right-handers.

Cut fastballs plate location and results

Now, the cutter. Against lefties, he throws it both inside and outside. When he throws it inside, he gets a lot of ground ball outs, but anywhere he throws it he gets good results. Against righties, he mostly works away, but it looks like he’s also not afraid to come up and in. Righties swing and miss a lot against the cutter.

I probably haven’t shown anybody anything terribly revolutionary about Mariano Rivera. You knew he was a very good pitcher who relied on a cut fastball before you read this article. But if I’m going to talk about the cutter, I have to start with the best in the business, and that’s Mo Rivera.