Note: This article was originally published at the Statistically Speaking blog at MVN.com on January 9, 2008.  Since the MVN.com site is defunct and its articles are no longer available on the web, I am re-publishing the article here.

Who is the best pitcher in baseball right now? Some might answer that question with Jake Peavy or Josh Beckett, but I’d guess that at least 7 out of 10 times, the answer you would get is Minnesota Twins left-hander Johan Santana. Santana is a 28-year-old from Tovar, Venezuela, and after his fourth full year in the starting rotation, he already owns two Cy Young Award trophies.

Now, as Santana approaches the final season of the 4-year, $39.75 million contract he signed three years ago, the Twins appear eager to trade him, and the reported suitors include such teams as the New York Yankees, Boston Red Sox, and New York Mets, subject to Santana’s approval. I’ll leave the predictions of where he’ll land to those who are better qualified or more eager to comment than I am. However, I’d like to take a look at the pitching repertoire and strategy of possibly the best pitcher in baseball.

If you look at the scouting reports, they all talk about Johan Santana’s devastating changeup and how he works to make his throwing motion identical for all pitches. Most scouting reports list three pitches for Santana–fastball, changeup, and slider–and mention that his changeup comes in 15-20 mph slower than his fastball. Were this true, it would be highly unusual. Most major league changeups are 7-10 mph slower than the pitcher’s fastball. A few scouting reports speak of five pitches–two fastballs, a slider, a circle change, and a straight change. The most useful and interesting scouting information I found was an interview from 2006 that Pat Borzi conducted for the Sporting News with Johan Santana and his catcher Joe Mauer.

Santana throws four pitches for strikes–four- and two-seam fastballs between 92 and 95 mph, a slider/curve in the 84- to 87-mph range and a changeup that’s about 15 to 20 mph slower than the fastball. The changeup is his strikeout pitch; when Santana is on, he throws it from the same arm angle and release point as his fastball, and hitters can’t tell the difference until it’s too late.

I also found this quote from Santana interesting given that most people acknowledge his changeup as his best pitch:

“I want to make sure my two-seam fastball is working,” Santana says. “That’s my best pitch, and it’s going to make my other pitches look even better. That’s what I try to do all the time.”

We have detailed data from the PITCHf/x system for 1032 of Santana’s 3345 pitches during the 2007 season. Let’s dive in and see what we can learn about Santana’s repertoire and effectiveness with his various pitches.

Santana has at least three obvious pitch groupings: fastball, changeup, and breaking ball. Here I’ve shown two graphs that I use for pitch classification. The first graph shows the speed of his pitches versus the direction they break, in polar graph format. The second graph shows the movement on his pitches in the last quarter-second before they cross the plate, due to the forces of spin deflection and gravity.

The fastballs run 89-95 mph, and it’s hard to tell from these graphs alone whether Santana really does throw two different fastballs or just one. Through additional analysis, which I will explain shortly, as well as Santana’s own comments, I concluded that he did in fact throw a four-seam and a two-seam fastball and have coded them separately in these graphs.

We can also see that Santana throws two different offspeed pitches. One has a movement very similar to the fastball but is thrown slower at 80-84 mph. This is his changeup. It’s interesting to note that we see a 10 mph difference in speeds between his fastball and his changeup, typical of other major league changeups and nothing like the 15-20 mph difference that was reported by other sources. I don’t know if that was just the stuff of legend or whether Santana has changed his approach in recent years. More likely, people were comparing Santana’s very slowest changeup with his very fastest fastball and writing as if that represented a typical pitching pattern.
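The arithmetic behind that last guess can be sketched in a few lines of Python, using only the speed ranges quoted above. Taking the midpoint of each range as the "typical" speed is a simplifying assumption for illustration, not a measured average.

```python
# Speed ranges quoted in the article, in mph.
fastball_range = (89, 95)
changeup_range = (80, 84)

def midpoint(speed_range):
    """Middle of a (low, high) speed range; a stand-in for a typical speed."""
    low, high = speed_range
    return (low + high) / 2

# Typical gap: middle of the fastball range minus middle of the changeup range.
typical_gap = midpoint(fastball_range) - midpoint(changeup_range)

# Extreme gap: the very fastest fastball minus the very slowest changeup.
extreme_gap = fastball_range[1] - changeup_range[0]

print(typical_gap)  # 10.0
print(extreme_gap)  # 15
```

Comparing the extremes stretches a 10 mph gap to 15 mph, which is the low end of the 15-20 mph figure in the scouting reports.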

I could not find any sign of two different changeups in Santana’s repertoire, at least not two changeups that consistently have different movement or speed.

Santana’s other offspeed pitch is an 83-88 mph breaking ball, described in various scouting reports as either a slider or a curveball. Based on the spin direction, the speed, and the direction of break, it’s very clearly a slider. In the first graph of pitch speed vs. spin deflection angle, the calculation of the spin deflection angle for some of the sliders contains a good deal of error since the spin of those sliders is nearly aligned around the direction of travel of the pitch, resulting in spin deflection of only a couple inches or less. This is one of the classic indicators of a slider.

The sliders and changeups look difficult to separate at the margins in the two graphs I presented above, but including the (x-z component of the) spin rate in the discussion makes that task much easier.

Returning to the topic I mentioned earlier, how did I determine whether Santana threw both a four-seam and a two-seam fastball? Looking at the data in aggregate, it was impossible to see a dividing line, but when I examined the spin and break on a start-by-start basis, a little bit of order appeared out of the murkiness. In some starts, two separate groupings were obvious. In most starts, the dividing line was subtle. In a few cases, it was hard to find a dividing line at all. I did notice that the fastballs with the most sink and the slowest speed were thrown almost exclusively to right-handed hitters, and this, in addition to Santana’s own comments about throwing a two-seamer, gave me confidence in making a distinction between the two fastballs.

If you look at the comments from John Walsh and John Beamer on my Erik Bedard analysis, you’ll see that having to examine the data on a start-by-start basis in order to make an accurate pitch classification diagnosis is a recurring problem. We’d like to be able to look at a pitcher’s season data as a whole. This is an important area for further investigation.

Here are a couple more traditionally-used PITCHf/x graphs of pitch movement for those who are interested:

How does Santana use his pitches to left-handed and right-handed hitters? As a left-handed pitcher, he naturally sees predominantly right-handed hitters, making up 75% of his opponents. To righties, he throws about 41% four-seam fastballs, 35% changeups, 18% two-seam fastballs, and 6% sliders. To lefties, he throws 60% four-seam fastballs, 29% sliders, 7% changeups, and 4% two-seam fastballs. Against righties he’s the stereotypical fastball-changeup Santana that I’ve heard about. Against lefties, he’s a totally different pitcher, eschewing the changeup and the two-seam fastball and relying on a fastball-slider combination.

Next, let’s look at how Santana mixes his pitches in different ball-strike counts. I’ve split this out by batter handedness as well.

Against righties, you can see that the changeup is his favorite pitch with two strikes (57% of the time), and he mixes in his two-seam fastball more if he falls behind in the count (28% when behind vs. 15% when ahead or even).

Against lefties, he relies on the four-seamer about 70% of the time in most situations. With two strikes he feels confident enough to occasionally (14%) introduce the changeup to lefties, and on an 0-2 count, you can count on getting a slider two thirds of the time.

What’s the bottom line–what results does Santana get with his pitches? I attempted for a while to cast the answer to that question in terms of run values for each pitch determined by linear weights, but I’ve postponed that endeavor for the moment. There are too many pieces that I haven’t figured out how to put together yet. So here are the results in the same format I used in the Bedard article.

vs. LHH    Ball   CStrk  Foul   SStrk  InPlay  Avg    BABIP  SLG    HR
Fastball   0.32   0.20   0.25   0.10   0.13    0.316  0.188  0.842  0.158
Sinker     0.70   0.10   0.00   0.10   0.10    0.000  0.000  0.000  0.000
Slider     0.34   0.13   0.17   0.17   0.19    0.308  0.308  0.462  0.000
Changeup   0.24   0.06   0.18   0.24   0.29    0.400  0.400  0.400  0.000

vs. RHH    Ball   CStrk  Foul   SStrk  InPlay  Avg    BABIP  SLG    HR
Fastball   0.32   0.20   0.26   0.12   0.11    0.235  0.188  0.500  0.059
Sinker     0.35   0.17   0.24   0.06   0.19    0.333  0.250  0.741  0.111
Slider     0.38   0.12   0.24   0.08   0.18    0.111  0.000  0.444  0.111
Changeup   0.32   0.08   0.15   0.31   0.15    0.357  0.325  0.667  0.048

Lg. Avg.   Ball   CStrk  Foul   SStrk  InPlay  Avg    BABIP  SLG    HR
Fastball   0.36   0.19   0.19   0.06   0.19    0.330  0.304  0.521  0.037
Slider     0.36   0.14   0.17   0.13   0.20    0.310  0.286  0.481  0.033
Changeup   0.40   0.11   0.14   0.13   0.21    0.319  0.295  0.502  0.035

The league average information comes from John Walsh’s article, and once again I’m using an adaptation of his format to present this information.

The four-seamer is Santana’s bread and butter, especially to lefties, and a good bit of creamy butter it has. He throws it for strikes and gets more swings and misses with it than most pitchers do. Hitters have a hard time putting the four-seamer into play, and when they do, Santana also gets really good results (a .188 BABIP compared to .304 league average BABIP on the fastball), although lefty batters–Hafner, Sizemore, and Thome–did hit three home runs off the four-seamer in our data set. He mostly pounds the zone with the pitch to both lefties and righties, although there appears to be some tendency toward pitching up and away from lefties and up and in to righties.

Santana doesn’t use the two-seamer much against lefties, and when he did, it was mostly for a ball. He works in the zone against righties and gets fairly average results with the two-seam fastball. One surprising thing to note is that he still gives up a lot of fly balls off the two-seamer; almost 70% of balls in play off the two-seamer were fly balls. The two-seamer seems like his weakest pitch based on the results we have from 2007, so I’m not sure I understand his statement from the Sporting News interview that it’s his best pitch.

Just look at all the red bleeding over the graph from the swinging strikes, and you know all you need to know about Santana’s changeup. The hitters can’t hit it. Santana can throw it for strikes just as well as his fastball. He throws it down and away from righties, and he gets a lot of swings and misses when they chase the changeup down out of the strike zone. When he gets it too close to the heart of the zone, they do make decent contact. It would go without saying, but this is an outstanding pitch.

Against lefties, Santana uses the slider mostly down and away, and he gets pretty average results with it. Against righties, he features the slider less often. When he does throw it, he keeps it inside. When he gets it up, it gets put in play, but he had fairly good results on a limited sample of balls in play except for one slider that Alex Rios launched 414 feet into the left field seats at the stadium formerly known as SkyDome.

I also looked a bit at pitch sequencing. Here’s a table showing what pitch a hitter is most likely to see from Santana based on what the previous pitch was.

To LHH
Previous Pitch  Fastball  Sinker  Slider  Changeup
Fastball        66%       4%      26%     4%
Sinker          67%       0%      33%     0%
Slider          60%       9%      27%     4%
Changeup        76%       0%      24%     0%

To RHH
Previous Pitch  Fastball  Sinker  Slider  Changeup
Fastball        52%       16%     5%      27%
Sinker          46%       21%     3%      31%
Slider          42%       30%     9%      18%
Changeup        43%       15%     8%      34%

I don’t notice any particular patterns to lefties, but to righties he’s more likely to throw the two-seamer after a previous two-seamer, and he’s more likely to throw a changeup after another changeup.
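For readers who want to build a similar table from their own data, here is a minimal Python sketch of the counting involved. It assumes that pitches are paired only within a single plate appearance, which may or may not match how the table above was built, and the sample plate appearances are invented purely to exercise the function.

```python
from collections import Counter, defaultdict

def transition_table(plate_appearances):
    """Return {previous_pitch: {next_pitch: share}}.

    Each element of plate_appearances is the ordered list of pitch types
    thrown to one batter, so the last pitch to one batter is never paired
    with the first pitch to the next (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for pitches in plate_appearances:
        # Pair every pitch with the one that immediately followed it.
        for previous, following in zip(pitches, pitches[1:]):
            counts[previous][following] += 1

    table = {}
    for previous, following_counts in counts.items():
        total = sum(following_counts.values())
        table[previous] = {p: n / total for p, n in following_counts.items()}
    return table

# Hypothetical plate appearances, not real Santana data.
sample = [
    ["Fastball", "Fastball", "Changeup"],
    ["Fastball", "Slider"],
    ["Changeup", "Changeup", "Fastball"],
]
table = transition_table(sample)
print(table["Changeup"])
```

Running the function separately on the pitches to lefties and to righties gives one table per batter handedness.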

Johan Santana had yet another great season in 2007. He allowed a few more walks and home runs than in previous years, but without PITCHf/x data from previous seasons, I don’t have any way to know whether that was simply luck or a change in his pitching abilities and strategies.

I looked at the 11 home-run balls off Santana for which we have PITCHf/x data, and I couldn’t detect any useful patterns. They were mostly hit off pitches up and over the plate, but that doesn’t come as much of a surprise. Looking at the HitTracker data, he wasn’t burned by many short home runs barely sneaking over the fence, so he wasn’t unlucky in that regard, at least. This may be a topic for further investigation or possibly just the result of Santana being a fly ball pitcher and getting a little unlucky with how hard the hitters hit 33 of those fly balls in 2007.

Santana obviously has an outstanding changeup and a strong fastball, but you probably knew that already. What I didn’t know was how infrequently he uses the changeup against lefties or most of the other nuances of his pitching strategy. Unless you’re Joe Mauer or Mike Redmond (in which case, Hi!), hopefully you feel like you know the best pitcher in baseball a little better than you did before.

If you’re an employee of a Mr. Steinbrenner or a Mr. Henry gathering information for a future trade, by all means feel free to contact me regarding where to send that check for my services. 🙂


Note: This article was originally published at the Statistically Speaking blog at MVN.com on January 14, 2008.  Since the MVN.com site is defunct and its articles are no longer available on the web, I am re-publishing the article here.

Many of you are hopefully familiar with the PITCHf/x system and at least some of the data and analysis that have been produced on the subject over the past year, but it may be completely new to some of you. In either case, I thought it would be helpful to provide an introduction and tutorial on the information that is available. I’ll point toward some existing resources and try to fill in some of the gaps. I’ve divided this primer into sections so you can easily skip to the parts that interest you.

  1. What is PITCHf/x?
  2. How do I get and use the data?
  3. Where can I find resources?
  4. How do I identify pitch types?
  5. How do I interpret graphs?
  6. Is the data reliable?
  7. Where can I go for further discussion and study?

1. What is PITCHf/x?

PITCHf/x is a system developed by Sportvision and introduced in Major League Baseball during the 2006 playoffs. It uses two cameras to record the position of the pitched baseball during its flight from the pitcher’s hand to home plate, and various parameters are measured and calculated to describe the trajectory and speed of each pitch. It was instituted in most ballparks throughout MLB as the 2007 season progressed, such that we have PITCHf/x data for a little over a third of the games from 2007. MLBAM used the PITCHf/x data in their Enhanced Gameday application and also made the data freely available for downloading and research.

In some ways, PITCHf/x is a bridge between scouting and analysis, giving us an objective window into the batter-pitcher matchup at a level we’ve never seen before. In 2008, the system should be installed in every major-league ballpark, and we will hopefully have complete detail for every pitch, although MLB has not committed to whether all the data will continue to be freely available in the future.

2. How do I get and use the data?

If you want to look at the XML data from a single game, you can go to the MLB website and browse through the files. Data is organized by year, month, day, and game. Within each game directory are a number of subdirectories containing the data in XML format. If you want to see the detailed pitch information within the game context, I suggest looking at the files in the inning subdirectory. If you want to see all the pitch information for a particular pitcher, you can go to the pbp/pitchers subdirectory, but you need to know the Elias player ID for your pitcher of interest. If you want to know what the various XML pitch data fields mean, read my glossary.

If you want to manipulate and analyze a single game’s worth of data, you can download and import the XML files into a Microsoft Excel spreadsheet. Dr. Alan Nathan has laid out the steps for you at his Physics of Baseball site.

If you want to get a little more hardcore, you can download the XML data for every game in the 2007 season. Using Perl scripts adapted from Joseph Adler’s Baseball Hacks, I downloaded the data and parsed it into a MySQL database. I’ve outlined the steps needed for you to do this yourself and shared the Perl code to give you a head start. (I’m not aware of anyone who’s gotten the Perl-to-MySQL path working on a Mac, so if you have, please drop me a line.)
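If Perl is not your thing, the parsing step can also be sketched in Python with only the standard library. This is not the Perl-to-MySQL path described above, just an illustration of the idea. The element names (atbat, pitch) and attribute names (des, start_speed, pfx_x, pfx_z) reflect my understanding of the inning files and should be checked against the glossary; the sample XML and the player IDs in it are made up.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed shape of an inning file.
SAMPLE_XML = """
<inning num="1">
  <top>
    <atbat pitcher="000000" batter="111111">
      <pitch des="Called Strike" start_speed="92.4" pfx_x="5.1" pfx_z="9.8"/>
      <pitch des="Ball" start_speed="82.0" pfx_x="7.3" pfx_z="5.2"/>
      <pitch des="Foul"/>
    </atbat>
  </top>
</inning>
"""

def parse_pitches(xml_text):
    """Return one dict per pitch that carries tracking data."""
    root = ET.fromstring(xml_text)
    rows = []
    for atbat in root.iter("atbat"):
        for pitch in atbat.iter("pitch"):
            # Some pitches have no tracking data; skip them.
            if pitch.get("start_speed") is None:
                continue
            rows.append({
                "pitcher": atbat.get("pitcher"),
                "batter": atbat.get("batter"),
                "des": pitch.get("des"),
                "start_speed": float(pitch.get("start_speed")),
                "pfx_x": float(pitch.get("pfx_x")),
                "pfx_z": float(pitch.get("pfx_z")),
            })
    return rows

rows = parse_pitches(SAMPLE_XML)
print(len(rows), "pitches with tracking data")
```

From there the rows can go into whatever database or spreadsheet you prefer.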

3. Where can I find resources?

Probably the most popular and valuable PITCHf/x resource on the web is Josh Kalk’s collection of player cards. Josh has classified every pitch as either a fastball, sinker, cutter, splitter, changeup, slider, curve, or knuckleball using a clustering algorithm and made graphs of pitch speed, movement, and release point for every pitcher with at least 100 pitches recorded by PITCHf/x. Strike zone charts are available for hitters. This is a great resource that reminds me in some ways of Wikipedia: the depth, breadth, and accuracy of the information is amazing, doubly so since it’s free, but the accuracy isn’t perfect, and it’s worth keeping that in mind. Stuff that looks quirky to you may in fact be quirky. (Felix Hernandez does not throw a 100-mph splitter.)

Josh Kalk has also developed a PITCHf/x tool that allows you to query his database for a specific subset of pitches and plot their strike zone location.

The Hardball Times published a pitch identification tutorial by John Walsh that is a good introduction to the general PITCHf/x topic as well as the specific topic of pitch identification.

Dr. Alan Nathan’s Physics of Baseball site has a lot of interesting resources, including some PITCHf/x-related material.

4. How do I identify pitch types?

Some people are good at identifying pitch types while at the ballpark or from the center field TV camera view. That was a splitter. That was a sinker. That was a slider. Etc. I am not one of those people. If you are not one of those people either, PITCHf/x was made for you. Even if you are one of those people, PITCHf/x can be a useful resource for learning about how different pitches move.

A pitcher’s fastest pitch is usually a four-seam fastball. A typical major-league fastball is around 90 mph, many a little faster, some a little slower. The fastball from a right-handed pitcher breaks in toward a right-handed hitter. Pitches from a lefty move the opposite way; a fastball from a lefty breaks away from a right-handed hitter. I’ll describe the movement for pitches from a righty and you can flip the orientation if you want to know how a similar pitch from a lefty would behave.

Pitchers throw variations of the fastball by changing the grip on the baseball or parts of their motion and delivery. The most popular variation is a two-seam fastball, which is often thrown a couple mph slower than the four-seamer and, from a right-handed pitcher, breaks in more and drops more to a right-handed hitter. The cut fastball is also thrown a few mph slower than the four-seamer and breaks away a little from a right-handed hitter, if it breaks at all.

The most popular off-speed pitch is the changeup, which is typically thrown 7-10 mph slower than a pitcher’s fastball. It usually has a similar break to the fastball, in toward a right-handed hitter. Some pitchers employ a grip on their changeup to impart additional movement, usually causing the pitch to break in more and drop more to a right-handed hitter. The split-finger fastball acts much like a changeup except that its velocity and movement are usually somewhere between the fastball and changeup.

Breaking balls include the slider and curveball. The slider is usually thrown at the same speed as the changeup or sometimes a few mph faster. The movement on the slider can vary quite a bit from one pitcher to another. Some sliders move like a cutter, with hardly any left-right break. Other sliders move more like a curveball, which breaks away from a right-handed hitter and down. The curveball is the slowest pitch, thrown in the 65-80 mph range in major league baseball.

The knuckleball is a special case in major league baseball these days. As far as I know, there were only two regular practitioners of the pitch in the majors last year: Tim Wakefield and Charlie Haeger. The pitch is thrown with very little spin such that the airstream interaction with the seam orientation causes the baseball to move unpredictably. Wakefield and Haeger throw the knuckleball about 65-70 mph.

Of course, there are a number of variations and combinations of the above pitches and specialty pitches like the screwball and gyroball and even the 50-mph Orlando Hernandez eephus pitch.

Here is a plot showing the typical vertical and horizontal spin deflection (a.k.a. “break”) of typical pitches from a right-handed pitcher, as viewed from the catcher’s point of view. A mirror image would give you the plot for a left-handed pitcher. You can use this as a key for interpreting some of the graphs on Josh Kalk’s player cards or for understanding the spin-induced movement on various types of pitches.

5. How do I interpret graphs?

PITCHf/x analysis and research is a promising field with wide application and broad interest, and there are a number of people who have made important contributions in the first year of analysis. As a result, there are many different formats for presenting the results. I’ll summarize and explain a few of them here and give a more detailed explanation of some of the graphs that I use most frequently.

The most common plots presented by other PITCHf/x researchers include information about the speed and spin-induced deflection of pitches. To the best of my knowledge, Joe Sheehan was the first to produce these plots, showing speed on the vertical axis and the two components of spin deflection as two sets of points on the horizontal axis. Joe hasn’t done much pitch classification work recently, but he deserves a nod as the groundbreaker in that field.

Something you’re more likely to encounter these days is a plot from John Walsh, such as those contained in his pitch identification tutorial. He plots vertical “movement” versus horizontal “movement”, where movement refers to the spin-induced deflection, and indicates speed by color-coding the points on the graph.

Most common of all are the plots from Josh Kalk’s pitcher cards, particularly the plots of vertical “break” versus horizontal “break”. These are similar to John Walsh’s plots except that instead of color-coding for speed, the points on the graph are color-coded by pitch type. Josh has separate graphs that plot speed versus horizontal break and speed versus vertical break, reminiscent of the original Sheehan plots. Josh’s player cards also contain information on release point, which is the height and left-right position of the pitch measured 50 feet from home plate, which is soon after the actual release by the pitcher.

In the past I have presented graphs similar to those of Sheehan and Kalk, but more recently I’ve adopted a graph from Alan Nathan as my mainstay. It is a polar plot, with the speed of the pitch on the radial axis. The faster the pitch, the farther from the center. The slower the pitch, the closer to the center. The angle is the angle of the Magnus force, which is the force that causes the ball to break. Curveballs break down, so they’ll be in the bottom part of the graph. Sliders break away from a right-handed hitter, so they’ll be on the left side of the graph. The Magnus force of a fastball pushes the ball up, causing it to drop less than it normally would due to gravity alone, so the fastballs will be on the top part of the graph.
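The mapping from a pitch to a point on a polar plot like this can be sketched in Python as follows. The function and argument names are assumptions made for illustration, as is the sign convention for horizontal break; flipping the sign of the horizontal component mirrors the plot left to right.

```python
import math

def polar_point(speed_mph, break_x_in, break_z_in):
    """Return (radius, angle_in_degrees) for one pitch.

    radius: pitch speed, so faster pitches sit farther from the center.
    angle:  direction of the spin deflection, measured counterclockwise
            from the positive horizontal axis (90 = straight up,
            270 = straight down).
    """
    angle = math.degrees(math.atan2(break_z_in, break_x_in)) % 360.0
    return speed_mph, angle

# A pitch whose spin pushes it straight up lands at the top of the plot.
print(polar_point(92.0, 0.0, 10.0))

# A pitch whose spin pushes it straight down lands at the bottom.
print(polar_point(75.0, 0.0, -8.0))
```

When the two deflection components are both close to zero, small measurement errors swing the angle wildly, so the angle of a pitch with very little spin deflection should be read with caution.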

I’ve also started showing a graph of what I call “late break”, which is a combination of the effects of spin deflection and gravity as well as the speed of the pitch. The goal is to show something close to what the hitter perceives as the break or movement of the pitch. I calculate the deflection of the pitch due to two forces, spin and gravity, in the last 0.25 seconds of its trajectory before it crosses the plate, an idea I got from Tom Tango. I chose a quarter second because that’s roughly the reaction time of a batter executing a swing. I chose to include the effect of gravity because I believe that more accurately reflects what hitters see. Hitters don’t attempt to hit a gravity-less pitch; they attempt to hit a pitch that’s being affected by gravity and being deflected by spin.
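One way to read that description, under the constant-acceleration model that the PITCHf/x data uses, is sketched below in Python. Over a window of t seconds, a ball with constant acceleration a ends up 0.5 * a * t^2 away from where it would have been on a straight line. This is an assumption about the calculation rather than the exact method behind the graphs, it does not model how pitch speed enters, and the argument names are hypothetical. The vertical acceleration is assumed to already include gravity.

```python
def late_break(ax_ft_s2, az_ft_s2, window_s=0.25):
    """Deflection (horizontal, vertical) in inches over the final window.

    ax_ft_s2: horizontal acceleration from spin, in ft/s^2.
    az_ft_s2: vertical acceleration from spin plus gravity, in ft/s^2.
    window_s: how long before the plate to start measuring, in seconds.
    """
    inches_per_foot = 12.0
    # Departure from a straight-line path under constant acceleration.
    scale = 0.5 * window_s ** 2 * inches_per_foot
    return ax_ft_s2 * scale, az_ft_s2 * scale

# Gravity alone (no spin) over the last quarter second.
GRAVITY_FT_S2 = -32.174
print(late_break(0.0, GRAVITY_FT_S2))
```

Gravity by itself accounts for roughly a foot of drop in the last quarter second, which is why leaving it out gives a picture so different from what the hitter sees.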

6. Is the data reliable?

Whenever you are viewing or analyzing PITCHf/x data, it’s worth keeping in mind that 2007 was a work in progress for Sportvision and MLBAM. They instituted the system in only a handful of stadiums to begin the year and added more systems in other stadiums, particularly in the second half of the year, as they gained confidence in the performance and accuracy of PITCHf/x. They experimented with measuring the initial point of the pitch trajectory at various distances from home plate, finally settling on 50 feet. They worked to identify and remove spurious data that was collected by the system. They trained operators who did such things as identifying the beginning of play in each half inning and setting the top and bottom of each batter’s strike zone in the system. In addition, the camera systems were sometimes recalibrated, possibly at the beginning of each home stand.

So it’s a bit naive to assume the data we have is a perfectly objective, accurate, and precise measure of each pitch. In most cases, it’s pretty close (within an inch or two) and good enough–much better than anything we’ve ever had before! But what are some of the sources of error to watch out for?

The data for some pitches is missing. In some cases this is obvious, when a stadium doesn’t have a system for part of the year, for example. Other times, portions of games will be missing, or even just individual pitches. Perhaps the operator did not turn the system on for the first pitch of the inning, or MLB/Sportvision retroactively discovered an error in their data and removed it. We are also missing PITCHf/x data for all hit batsmen during the regular season.

There is erroneous data–spurious or mis-measured pitches. For example, the data may say that a pitch was released from ten feet off the ground, and unless Gumby has caught on with a major league team, I doubt any pitcher can reach that high. There are a number of 30-40 mph pitches that are recorded in the data that do not appear to be realistic. It’s been suggested that some of these may have been the system inadvertently recording other non-pitch throws of the baseball between the mound and the plate as a pitch.
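A simple sanity filter for records like these might look like the Python sketch below. The field names (start_speed, z0) and both thresholds are assumptions chosen for illustration: the speed cutoff sits above the suspicious 30-40 mph readings but below a 50-mph eephus, and the height cutoff sits below the ten-foot release mentioned above. The sample records are invented.

```python
def looks_plausible(pitch, min_speed_mph=45.0, max_release_height_ft=8.0):
    """Return True if a pitch record passes two basic sanity checks.

    pitch is a dict with 'start_speed' (mph) and 'z0' (release height, ft).
    Both thresholds are illustrative guesses, not established cutoffs.
    """
    return (pitch["start_speed"] >= min_speed_mph
            and pitch["z0"] <= max_release_height_ft)

# Invented records: one ordinary pitch and two suspicious ones.
records = [
    {"start_speed": 91.5, "z0": 6.1},
    {"start_speed": 34.0, "z0": 5.8},   # too slow to be a real pitch
    {"start_speed": 90.2, "z0": 10.0},  # released from ten feet up
]
clean = [p for p in records if looks_plausible(p)]
print(len(clean), "of", len(records), "records kept")
```

It is worth looking at whatever gets thrown out before discarding it, since a cutoff that is too aggressive would also remove legitimate slow pitches.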

There are indications of park and/or camera system bias. Data from Seattle and Toronto indicate pitch speeds that seem a few mph higher than they should be. Look how hard Dustin McGowan and Felix Hernandez are shown to have thrown on average. These guys are hard throwers, but not that hard. Similarly, the system at Fenway Park seems to have underestimated pitch speeds and otherwise collected strange data.

There are also altitude and temperature effects. In this case, the data collected by PITCHf/x may be completely correct, but our interpretation of the data has to take into account that air density affects how a pitched baseball moves. A curveball thrown in the thin air of Denver, Colorado won’t break as much as the same curveball thrown in the pea soup at sea level.

7. Where can I go for further discussion and study?

If you want to learn more about the details of Sportvision’s PITCHf/x system and MLB’s implementation, read this article by Mark Newman of MLB.com.

If you want to learn more about the physics of pitched baseballs, Alan Nathan is your man, and his freshman physics lectures on the Physics of Baseball at the University of Illinois are an excellent place to begin. You might also find these articles by Dave Baldwin and Terry Bahill helpful.

If you want to learn more about pitch classification methods, as I mentioned earlier, John Walsh’s pitch identification tutorial is a good place to start. You may also want to consult my survey of the topic, which places particular emphasis on my own work on the subject.

If you want to discuss PITCHf/x with other sabermetricians, I recommend The BOOK Blog run by Tom Tango.

If you want to learn about systematic error correction for the PITCHf/x data set, read Josh Kalk’s posts at his blog, and this post by Ike Hall, including comments by Alan Nathan.

If you want to learn about pitch sequencing analysis, Joe P. Sheehan’s Command Post at Baseball Analysts is a good resource, including these posts on the topic. Joe Sheehan’s writing is an excellent resource on a number of diverse PITCHf/x topics. Although I only listed him here under pitch sequencing, it’s well worth going through his archives on many other topics if you are interested in learning about PITCHf/x.

Dan Fox’s work is another great PITCHf/x resource, although, as with Joe, I couldn’t find a neat category to file him under. He’s covered everything from pitch classification to measures of strike zone judgment.

If you want to learn about pitching styles, strategies, and repertoires throughout baseball history, I highly recommend reading the Neyer/James Guide to Pitchers, published in 2003. Rob Neyer has updates to the book at his blog.

I have a couple scouting reports up at the Hardball Times based on data from PITCHf/x, one on Scott Kazmir and the other on Cole Hamels.

I also highly recommend Matt Lentzner’s article at THT on his theory of pitching mechanics.

I’ve been doing a few other things behind the scenes that haven’t seen publication here or at THT, but I’m still involved in baseball analysis and writing, in case you were wondering.  You can look for my article on Cliff Lee in the upcoming Hardball Times Annual 2009, which will be available November 30.

My article at Hardball Times on Danny Herrera’s screwball includes views of his pitch trajectories as seen from the right-handed and left-handed batter’s boxes.

I mentioned in the References section that I did some trigonometry to transform the coordinate system from plate view to batter’s box view.

Here is what I did.

The pitch trajectory is shown as the dotted black line. Any point on the trajectory can be calculated using the initial position, velocity, and acceleration provided in the PITCHf/x data, along with the equations of motion. Only the x-y plane is shown above since no transformation was done to the z axis. The coordinates in the PITCHf/x coordinate space are x and y, shown in black.
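For reference, the equations of motion in question are the constant-acceleration ones, position = p0 + v0*t + 0.5*a*t^2 on each axis. A minimal Python sketch, with hypothetical names and made-up numbers in feet and seconds:

```python
def position_at(t, p0, v0, a):
    """Position (x, y, z) at time t under constant acceleration.

    p0, v0, a: (x, y, z) tuples of initial position, initial velocity,
    and acceleration. Each axis follows p0 + v0*t + 0.5*a*t^2.
    """
    return tuple(p + v * t + 0.5 * acc * t * t
                 for p, v, acc in zip(p0, v0, a))

# Made-up initial conditions, not taken from a real pitch.
point = position_at(
    0.4,
    p0=(-2.0, 50.0, 6.0),
    v0=(5.0, -130.0, -5.0),
    a=(10.0, 25.0, -20.0),
)
print(point)
```

Stepping t from zero up to the time the ball reaches the plate traces out the whole trajectory.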

The coordinates in the batter’s box view are x’ and y’, shown in red. The y-axis in the batter’s box view runs along a line from the batter’s head to the pitcher’s approximate release point (the average x value of his pitches at y = 55 feet). The x-axis in the batter’s box view is set perpendicular to this new y-axis.

The origin of the batter’s box view is offset 2.8 feet in the x direction from the origin in PITCHf/x coordinate space. I calculated 2.8 feet from the center of the plate as the approximate location of the batter’s head, based on a video frame capture in Marv White’s presentation at the PITCHf/x Summit. I chose not to offset the origin in the y direction for simplicity, although I also believe this does not introduce any significant inaccuracy. The batter’s head is typically within a foot or so of y=0.

First, I calculated the quantity m, the distance to the baseball, shown by the blue line. This distance m = sqrt ( y^2 + ( x + 2.8 ft)^2 ).

Next, I found the value of the angle alpha, where x0 is the pitcher’s average x value at y = 55 feet. The angle alpha = arctan ( 55 ft / ( x0 + 2.8 ft) ).

The angle (alpha – theta) = arctan ( y / ( x + 2.8 ft) ), which allows us to calculate the angle theta.

The angle theta = arctan ( 55 ft / ( x0 + 2.8 ft) ) – arctan ( y / ( x + 2.8 ft) ).

The batter’s box coordinates x’ and y’ can be found from the angle theta and the distance m. The new y’ = m * cos (theta), and the new x’ = m * sin (theta).
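Put together, the steps above can be sketched in Python. The function and parameter names are illustrative. One deliberate substitution: the sketch uses atan2(y, x + 2.8) in place of arctan(y / (x + 2.8)). The two agree whenever x + 2.8 is positive, and atan2 avoids a division by zero when it is not.

```python
import math

HEAD_OFFSET_FT = 2.8   # batter's head offset from the center of the plate
RELEASE_Y_FT = 55.0    # distance at which the release point x0 is measured

def to_batters_box_view(x, y, x0,
                        head_offset=HEAD_OFFSET_FT, release_y=RELEASE_Y_FT):
    """Transform a PITCHf/x (x, y) point into batter's box (x', y').

    x, y: point on the pitch trajectory, in feet.
    x0:   pitcher's average x value at y = release_y, in feet.
    """
    # Distance m from the batter's head to the baseball.
    m = math.hypot(y, x + head_offset)
    # Angle alpha of the line from the batter's head to the release point.
    alpha = math.atan2(release_y, x0 + head_offset)
    # Angle theta between that line and the line to the baseball.
    theta = alpha - math.atan2(y, x + head_offset)
    return m * math.sin(theta), m * math.cos(theta)

# A ball sitting exactly at the release point lies on the new y'-axis,
# so its x' should come out as zero (x0 here is a made-up value).
print(to_batters_box_view(-1.5, 55.0, -1.5))
```

Applying the function to each point along a pitch's trajectory gives the path as it would appear in the batter's box view, with the z axis left untouched.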

I am happy for you to use my method for batter’s view transformation if you provide attribution in the form of my name and/or a link to this website.

This has nothing to do with anything except me reveling in the things you stumble upon in the PITCHf/x data set. I was looking at some Roy Oswalt data from last year. When I looked at his August 18 start, I noticed he had thrown his fastball at two distinctly different speeds.

Roy Oswalt pitch sequence August 18, 2007

When do you think Oswalt pulled his left oblique muscle?

You’re right. From the AP game recap:

Oswalt said he first felt something near his rib cage on his last pitch of the third inning, a curveball to Geoff Blum. Oswalt batted with two outs in the fourth and beat out an infield RBI single to give the Astros a 3-0 lead.

“I went through the fourth and told them I want to stay out there and see if I could get through two more innings,” Oswalt said. “Made it through the fourth and thought I could have made it through the fifth.”

No revolutionary analysis there, but I thought it was a fun tidbit.

Last year I diligently kept a catalog of articles written about topics related to PITCHf/x or using PITCHf/x data. Some of you have noticed that I have been negligent in updating that catalog this year. My last full update was January 15, and I did a partial update on March 1.

A new update is now in progress behind the scenes. Since the article list now exceeds six hundred articles, I’m working toward a database solution to better track them all. Hopefully, I’ll be able to unveil something within the next few weeks. In the meantime, Harry, if you would quit writing more than an article per day, that would help a lot. I should just rename my catalog the Cubs f/x Index. 😉

During March I did an in-depth study of Jack Cust’s surprising 2007 season. Recently I’ve been wondering why he was struggling so mightily in 2008. I did an update to the study and published the results at The Hardball Times.

Edit:  I posted to THT Live about Cust’s performance over the last couple days.  I don’t intend to imply that I can divine the end to a player’s slump or the beginning of a hot streak.  It’s more a case of me musing out loud about, and trying to learn, how the PITCHf/x tools fit into the scouting/performance picture.
