Wednesday, January 30, 2013

How do Different Types of Hitters Age?

It's the baseball offseason, which means that the free agent season is still going on.  Most of the better players have already been signed, but there are still a few players available.  One thing that needs to be considered is how baseball players age.  If you are a general manager signing a 30-year-old player to a long-term deal, you need to know that your player will be good not only this year but also five or more years from now.

Most baseball players peak in their late 20s or early 30s.  They also generally hit free agency for the first time at around age 30*.  If you are going to spend big bucks on a star hitter, you want to make sure they turn out well and you don't regret the deal in two years.  Better yet, you want to still be happy with the deal in its final year.  So how should a general manager play the free agent market?

To figure this out, I took a list of players who played Major League Baseball from age 24 to 40, allowing up to two or three seasons where they didn't play (e.g., they could play from age 24-38 and be on the list, or they could play from 24-33, miss their age-34 season with an injury, and then play from ages 35-39).  In addition, I limited the list to players whose careers ended in 1995 or later (a few players on the list are still playing today).  I also tried to eliminate players who were either caught using steroids or widely suspected of using them.  Barry Bonds, Mark McGwire, Miguel Tejada, Sammy Sosa, Brady Anderson, and Manny Ramirez were taken out right away since their stats may skew the overall results.  In all, I got a list of 84 players.

First, I separated the players based on their career equivalent runs (a combination of OBP and slugging percentage, which I explained in a prior post).  I divided the 84 players into 4 groups of 21: Group 1 had the 21 players with the highest career equivalent runs, Group 2 the next 21, Group 3 the next 21, and Group 4 the lowest 21.  Here is a listing of the players in each of the 4 groups:

Group 1: Frank Thomas, Todd Helton, Larry Walker, Jim Thome, Edgar Martinez, Chipper Jones, Mike Piazza, Gary Sheffield, Ken Griffey Jr., Jim Edmonds, Wade Boggs, Bobby Abreu, Fred McGriff, Moises Alou, Ellis Burks, Tony Gwynn, George Brett, Jorge Posada, Rickey Henderson, Jeff Kent, Luis Gonzalez

Group 2: Derek Jeter, Mark Grace, Eric Davis, Andres Galarraga, Paul O'Neill, Eddie Murray, Tim Raines, Matt Stairs, Bobby Bonilla, Paul Molitor, Barry Larkin, Dave Winfield, Harold Baines, Reggie Sanders, Chili Davis, Kirk Gibson, Kenny Lofton, Wally Joyner, Craig Biggio, Raul Ibanez, Dave Magadan

Group 3: Lou Whitaker, Julio Franco, Jeff Conine, Johnny Damon, Tony Phillips, Ivan Rodriguez, Andre Dawson, Brett Butler, Cal Ripken Jr., Brian Jordan, Mike Cameron, Alan Trammell, Todd Zeile, Garret Anderson, Randy Velarde, Steve Finley, Eric Young, Joe Carter, B.J. Surhoff, Gregg Zaun, Lance Parrish

Group 4: Damion Easley, Willie McGee, Devon White, Mark Grudzielanek, Gary Gaetti, Marquis Grissom, Tim Wallach, Mark McLemore, Benito Santiago, Craig Counsell, Sandy Alomar, Omar Vizquel, Shawon Dunston, Ozzie Smith, Otis Nixon, Brad Ausmus, Miguel Cairo, Lenny Harris, Tony Pena, Jose Vizcaino, Henry Blanco
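The grouping step above amounts to ranking players by career equivalent runs and cutting the ranking into equal slices.  Here is a minimal sketch, assuming each player's career Eq. Runs value is already computed (the formula itself isn't shown here); the function name and the player values in the usage line are hypothetical.

```python
from typing import Dict, List


def split_into_groups(career_eqr: Dict[str, float], n_groups: int = 4) -> List[List[str]]:
    """Rank players by career Eq. Runs (highest first) and cut the ranking
    into n_groups equal-sized groups (84 players -> 4 groups of 21)."""
    if len(career_eqr) % n_groups != 0:
        raise ValueError("player count must divide evenly into groups")
    ranked = sorted(career_eqr, key=career_eqr.get, reverse=True)
    size = len(ranked) // n_groups
    return [ranked[i * size:(i + 1) * size] for i in range(n_groups)]


# Hypothetical career Eq. Runs values, purely for illustration.
example = {"A": 6.1, "B": 4.0, "C": 5.5, "D": 3.2,
           "E": 4.8, "F": 3.9, "G": 5.0, "H": 2.7}
print(split_into_groups(example))  # [['A', 'C'], ['G', 'E'], ['B', 'F'], ['D', 'H']]
```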

Basically, I found the equivalent runs for each of these players at each age from 24 to 40.  Then, I found the average equivalent runs for each group at each age (for example, at age 24, Group 1 averaged 6.371 equivalent runs, meaning a team of nine average 24-year-old Group 1 players would score about 6.371 runs per game).  Then, I plotted the averages on a line chart to see which group ages best.  Here's what the graph looks like:


Basically, from age 24 to age 33, there is a very consistent gap between Group 1 and the other groups.  If you are a GM and you want to sign a Group 1-type free agent, you should expect to pay a lot more and get a lot more (1+ equivalent runs per game... a very significant amount).  A Group 4 player will be about 3 equivalent runs worse than a Group 1 player from age 24 to age 33.  While Group 4 is the least valuable hitting group in this analysis, the players in that group are generally good players (there has to be a reason they stay relevant until their late 30s).  Keeping in mind that even the most mediocre baseball player (cough Cesar Izturis cough) will still post about 1.7 equivalent runs per game over the course of a season, signing one superstar at age 27 or 28 will generally be a better decision than signing two Marquis Grissom types at the same age.  Signing one Jim Edmonds at age 27 would be better than signing a Sandy Alomar and an Omar Vizquel at age 27.
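One way to read the arithmetic behind that comparison, assuming the roughly 1.7-run mark is treated as a replacement-level floor and using Group 1's age-24 average (6.371) and the roughly 3-run gap to Group 4 as illustrative inputs (and assuming, as a simplification, that value above the floor can be added across players):

```latex
\begin{aligned}
\text{Group 1 star above floor} &\approx 6.371 - 1.7 = 4.671 \\
\text{Group 4 player} &\approx 6.371 - 3 = 3.371 \\
\text{Two Group 4 players above floor} &\approx 2 \times (3.371 - 1.7) = 3.342
\end{aligned}
```

Under those assumptions, the single star is worth about 1.3 more runs per game above the floor than the pair.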

However, at about age 34 or 35, the top three groups take a step down, and Group 1 takes the biggest step down.  Players in Group 4, on the other hand, seem to stay around the same or even get better through age 37.  If you are signing a superstar at age 30, you will generally have to sign the player for at least 6 or 7 years.  You may get a great 4-5 years followed by a good but underwhelming 2-3 years.  By contrast, you will likely get closer to what you pay for from a Group 4-type player when signing a long-term deal at age 30.  It's funny how 2-3 short years make such a difference: at age 27-28, signing a superstar for 7 years is a better idea than signing two average players for 7 years, but at age 30-31, signing two average players seems like the better idea.
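The per-age group averages behind the chart can be sketched as below.  This assumes a hypothetical data layout (`seasons[player][age]` holding that player's Eq. Runs at that age) and simply skips ages a player missed, since the list allows gaps; the function name is hypothetical too.  It also returns the sample size at each age, which is what shrinks at the oldest ages.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# seasons[player][age] -> that player's Eq. Runs at that age (hypothetical layout)
Seasons = Dict[str, Dict[int, float]]


def group_age_averages(
    seasons: Seasons,
    groups: List[List[str]],
    ages: range = range(24, 41),
) -> List[Dict[int, Tuple[Optional[float], int]]]:
    """For each group and age, average Eq. Runs over the players who actually
    played that season. Returns (average, sample size); the average is None
    if nobody in the group played at that age."""
    results = []
    for group in groups:
        by_age = {}
        for age in ages:
            values = [seasons[p][age] for p in group if age in seasons[p]]
            by_age[age] = (mean(values) if values else None, len(values))
        results.append(by_age)
    return results


# Hypothetical example: player B missed age 25, player D missed age 24.
seasons = {"A": {24: 6.0, 25: 7.0}, "B": {24: 4.0},
           "C": {24: 3.0, 25: 5.0}, "D": {25: 2.0}}
print(group_age_averages(seasons, [["A", "B"], ["C", "D"]], range(24, 26)))
# [{24: (5.0, 2), 25: (7.0, 1)}, {24: (3.0, 1), 25: (3.5, 2)}]
```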

So we have a bit of an idea of how a hitter's overall ability should factor into signing him for the long haul.  How about different types of hitters?  Instead of great vs. average, this time I looked at OBP vs. slugging percentage.  I took the OBP portion of the equivalent run statistic and divided it by the full equivalent run statistic.  Players who rely most on OBP are in Group 1, and players who rely most on slugging are in Group 4.  Here is how the groups shook out:

Group 1: Otis Nixon, Ozzie Smith, Dave Magadan, Mark McLemore, Craig Counsell, Brett Butler, Omar Vizquel, Tony Phillips, Brad Ausmus, Rickey Henderson, Jose Vizcaino, Wade Boggs, Lenny Harris, Eric Young, Tim Raines, Gregg Zaun, Miguel Cairo, Kenny Lofton, Julio Franco, Randy Velarde, Mark Grace

Group 2: Tony Pena, Lou Whitaker, Mark Grudzielanek, Alan Trammell, Derek Jeter, Willie McGee, Tony Gwynn, Craig Biggio, Barry Larkin, Bobby Abreu, Wally Joyner, Paul Molitor, Damion Easley, Todd Zeile, Johnny Damon, B.J. Surhoff, Henry Blanco, Edgar Martinez, Chili Davis, Jorge Posada, Jeff Conine

Group 3: Marquis Grissom, Sandy Alomar, Paul O'Neill, Devon White, Tim Wallach, Harold Baines, Mike Cameron, Luis Gonzalez, Cal Ripken Jr., Kirk Gibson, Bobby Bonilla, Gary Sheffield, Todd Helton, Steve Finley, George Brett, Eddie Murray, Benito Santiago, Chipper Jones, Raul Ibanez, Matt Stairs, Eric Davis

Group 4: Dave Winfield, Frank Thomas, Fred McGriff, Brian Jordan, Ivan Rodriguez, Shawon Dunston, Lance Parrish, Gary Gaetti, Jim Thome, Moises Alou, Jeff Kent, Ellis Burks, Jim Edmonds, Garret Anderson, Reggie Sanders, Larry Walker, Andres Galarraga, Mike Piazza, Ken Griffey Jr., Andre Dawson, Joe Carter
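The OBP-share grouping can be sketched the same way as the first grouping, just ranked by a ratio instead of a total.  This is a minimal sketch assuming each player's OBP portion and full Eq. Runs value are already known (how the OBP portion is computed isn't shown here); the function name and sample numbers are hypothetical.

```python
from typing import Dict, List


def obp_share_groups(obp_part: Dict[str, float], eq_runs: Dict[str, float],
                     n_groups: int = 4) -> List[List[str]]:
    """Group players by the share of their Eq. Runs that comes from OBP.
    Group 1 = highest OBP share (on-base gurus); last group = sluggers."""
    share = {p: obp_part[p] / eq_runs[p] for p in eq_runs}
    if len(share) % n_groups != 0:
        raise ValueError("player count must divide evenly into groups")
    ranked = sorted(share, key=share.get, reverse=True)
    size = len(ranked) // n_groups
    return [ranked[i * size:(i + 1) * size] for i in range(n_groups)]


# Hypothetical inputs: OBP shares are A 0.6, B 0.5, C 0.4, D 0.7.
eq_runs = {"A": 4.0, "B": 5.0, "C": 6.0, "D": 3.0}
obp_part = {"A": 2.4, "B": 2.5, "C": 2.4, "D": 2.1}
print(obp_share_groups(obp_part, eq_runs, n_groups=2))  # [['D', 'A'], ['B', 'C']]
```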

I repeated the process from the first chart, and here are the results:


In this case, it appears that slugging percentage players start out as much better run producers than on-base gurus.  At age 27, where Group 4 hits its first peak, the difference is about 1.5 equivalent runs, which is a significant amount.  However, slugging does not age as well.  On-base gurus, on the other hand, stay about the same and even get a little better from age 24 all the way to age 37.  As a matter of fact, the on-base gurus close the gap with Group 4 to within about half a run at age 37.  As an FYI, I included ages 39 and 40 on this chart, but the sample size shrinks significantly since many players are done after age 38.

One interesting pattern in this chart is that the "middle of the road" players (Groups 2 and 3) stay closer to Group 4 than to Group 1, and actually get better and closer to Group 4 over time, all the way to age 39.  From age 24 to 33, the difference between the middle groups and Group 4 is about 0.5 runs, slightly less than the difference between the middle groups and Group 1.  However, by age 37, the difference is much smaller and almost non-existent.  Looking at this, it appears that sluggers start very well but decline over time.  On-base gurus start below average and get slightly better until their mid-30s, but never come within 0.5 equivalent runs of the sluggers.  The middle groups start in between the sluggers and the on-base gurus, but get closer and closer to the slugging group until they are essentially at the same level.

Therefore, I would argue that, in free agency, general managers should prefer somebody who is good at both getting on base (e.g., taking walks) and slugging (e.g., getting extra-base hits and home runs) over someone who is significantly better at one than the other.  If the choice is between on-base gurus and sluggers, getting an on-base guru would likely make more sense, since they will likely be cheaper and will decline less than sluggers.  Signing a slugger will very likely result in an overpayment; that is a lot less likely for hybrid (good OBP and slugging) players and for on-base gurus.

So here's what I take from this analysis:
  • It is better to sign a superstar at age 27 than two average players at the same price.
  • It is better to sign two average players at age 30 than a superstar at the same price.
  • "Hybrid" hitters are the best free agent hitters to go after.  These hitters generally level out by age 30, don't get worse until their late 30s, and produce very well.  Players like this include Edgar Martinez, Chili Davis, Jorge Posada, Jeff Conine, Marquis Grissom, Sandy Alomar, Paul O'Neill, and Devon White.  Obviously not everybody works out, but there's a better chance with these players.
  • If there is a choice between a strict slugger (e.g., Joe Carter) and a strict on-base player (e.g., Otis Nixon), go for the on-base player.  While they won't produce as well, they will likely be cheaper, and they are more likely to be underpriced since on-base gurus generally level out or even get better until their late 30s.  Sluggers tend to decline starting at about age 32.
  • Dave Magadan was underrated.  This is totally off topic, but his career equivalent run amount was 5.395 (a team of 9 Dave Magadans would score, on average, 5.395 runs a game).  A team of 9 Cal Ripkens, on the other hand, would score about 5.128 runs a game.  It's just something I noticed while doing this analysis.


(*For those who don't know, a player's first six years of Major League service are controlled by his team, often the one that drafted him.  For the first three years, the player is paid basically whatever the team wants to pay him; for the next three, the player and the team either sign deals closer to market value or go to arbitration to determine salary.)

Monday, January 21, 2013

How Does a Batter Contribute to Their Team?

For as long as baseball has been around, there have been discussions about what makes a good player and, more specifically, which statistics contribute to a team's overall success.  Many people will say that the ability to play small ball contributes to a team's success.  Others will say that it's all about team speed and running the bases.  Still others will say it's all about power hitting.  Most, however, will say that it's a combination of all of the above.

So what do statistics say about this?

To determine whether small ball or big ball contributes more to a team's run scoring, I took a sample of nearly 1,000 team-seasons from 1975 to 2012 (with the strike-shortened 1981, 1994, and 1995 seasons excluded).  For each team, I totaled its stolen bases and its home runs.  Stolen bases are a hallmark of small ball.  Small ball involves using hitters in a way intended to score one run; more runs are not expected.  For example, Batter A walks, steals 2nd, reaches 3rd on a ground ball to the right side, and scores on a sacrifice fly: 1 run on 0 hits.  Walks are used extensively in both small ball and big ball, but the stolen base is used mainly in small ball.

So, I ran a correlation analysis between stolen bases and runs per game.  I found a correlation of -10.397%.  This suggests that stolen bases not only fail to help a team's offense, they are actually associated with slightly less scoring.  I attached a scattergram below to show this relationship.  While the relationship is weak (squared, it accounts for only about 1% of the variation in scoring), it is slightly negative: a team that steals 200+ bases tends to score slightly fewer runs per game than a team that steals 50 or fewer bases in a season.  Please note that there is an R^2 value on these graphs.  R^2 is the square of the correlation coefficient R, so if you take its square root, you get the size of the correlation; the sign comes from the direction of the trendline (negative for stolen bases).
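For anyone who wants to reproduce this kind of check outside a spreadsheet, here is a minimal sketch of the Pearson correlation, plus a helper that recovers R from a trendline's R^2 value.  The function names are hypothetical, and the sketch doesn't claim to be how the numbers in this post were produced.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_from_r_squared(r_squared: float, slope: float) -> float:
    """Recover R from a trendline's R^2: the square root gives the size,
    and the sign of the trendline's slope gives the direction."""
    magnitude = sqrt(r_squared)
    return -magnitude if slope < 0 else magnitude


print(pearson_r([1, 2, 3], [2, 4, 6]))           # 1.0
print(r_from_r_squared(0.0108098, slope=-1.0))   # about -0.104
```

Taking the square root alone would report the stolen base relationship as positive, which is why the slope's sign has to be carried along.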


So what about home runs?  Big ball.  If a runner gets on first, you give up the more likely chance to score one run in order to try to put a crooked number on the scoreboard.  In these cases, a runner walks and then the next three hitters swing away while the runner stays at first base.  What kind of correlation is seen between home runs and runs per game?

The answer is a fairly strong positive relationship.  There is a correlation of 75.850% between home runs and average runs per game.  Squaring that correlation, home runs alone account for a little over half of the team-to-team variation in runs per game.  Again, I attached a scattergram, and unlike the stolen base chart, there is a definite relationship and it's positive.
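Squaring a correlation gives the share of variation it accounts for.  A quick worked check for the home run figure above:

```latex
r_{\text{HR}} = 0.75850
\quad\Longrightarrow\quad
r_{\text{HR}}^2 = 0.75850^2 \approx 0.575
```

So home runs by themselves account for roughly 57.5% of the variation in runs per game in this sample.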

So, when determining whether big ball or small ball scores runs, it seems clear that big ball works best.  Could there be another way to determine how good an offense is?  Well, one of the most used statistics in baseball is batting average.  How well does batting average correlate to average runs?

The answer is that batting average correlates well, although perhaps not as well as some would expect.  The correlation is 80.635%, a solid positive relationship, but only about 5 percentage points higher than the home run correlation.  Batting average includes not just home runs, but all hits.  This shows how much more important home runs are than singles, doubles, and triples.  A home run is an automatic run.  A single (or even three consecutive singles) is not.  Again, a scattergram is below to show the relationship.

So far, we have determined that stolen bases are a terrible way to contribute to offense, hitting the ball out of the park is better, and having a high batting average is slightly better still.  What if we take the small ball out of batting average?  Batting average treats a walk and a sacrifice fly the same way (neither counts as an at-bat), yet a walk puts a runner on base and a sacrifice fly makes an out.  Batting average also treats a home run and a single the same way, yet a home run is an automatic run and a single is not.  Two statistics take the small ball out of batting average: on-base percentage and slugging percentage.

On-base percentage is basically the same thing as batting average, with two major differences.  First, walks (along with hit-by-pitches) count in on-base percentage; in batting average, they don't count at all (as a matter of fact, if you go back in time a long way, walks were once treated as outs in batting average).  Second, sacrifice flies count as outs instead of not counting at all (sacrifice bunts are still left out).  How does this correlate with runs scored?

Surprisingly well.  While batting average correlates with runs per game at an 80.635% clip, on-base percentage correlates with runs per game at an astonishing 89.188% clip.  In other words, the gap between using batting average and using on-base percentage to gauge a team's offensive prowess (about 8.6 points) is larger than the gap between using home runs and using batting average (about 4.8 points).  Walks matter in a positive way; sacrifices matter in a negative way.  The scattergram is below.
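To make the two differences concrete, here is a short sketch using the standard definitions of batting average and on-base percentage.  The function names and the sample batting line are hypothetical.

```python
def batting_average(h: int, ab: int) -> float:
    """Hits per official at-bat; walks, HBP, and sacrifices are ignored."""
    return h / ab


def on_base_pct(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """Standard on-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF).
    Sacrifice flies count against the batter; sacrifice bunts are excluded."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


# Hypothetical line: 1 hit and 1 walk in 4 at-bats.
print(batting_average(1, 4))          # 0.25
print(on_base_pct(1, 1, 0, 4, 0))     # 0.4
# Same line plus a sacrifice fly: batting average is unchanged, OBP drops.
print(on_base_pct(1, 1, 0, 4, 1))     # about 0.333
```

The walk lifts OBP from .250 to .400 without touching batting average, and the sacrifice fly pulls OBP back down, which is exactly the credit and debit described above.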

The other statistic that can be considered batting average without the small ball is slugging percentage.  In small ball, a leadoff single scores by way of strategic outs: a stolen base here, a sacrifice there, a wild pitch there, with the goal of scoring one run.  A leadoff double is treated the same way.  Maybe you don't need the stolen base, but a small ball team will still try to score a single run with strategic outs.  In big ball, a home run is worth more than a single.  A leadoff single scores with a series of hits; a leadoff double scores with maybe one hit.  Slugging percentage measures the average number of bases per at-bat.  A 1.000 slugging percentage means that a player averages one base every at-bat.  They can do that by hitting a home run every 4 at-bats or by hitting a single every at-bat.  Hitting a home run every 4 at-bats gives a mediocre .250 batting average, while hitting a single every at-bat is a 1.000 batting average.  Which is better at predicting average runs per game?

The answer is slugging percentage.  Its correlation is 90.917%, even better than on-base percentage.  The gap between using slugging percentage and using batting average (about 10.3 points) is more than double the gap between using batting average and using home runs (about 4.8 points).  The scattergram is below.
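The two hypothetical hitters from the slugging example (one home run every 4 at-bats versus a single every at-bat) can be checked with a few lines using the standard total-bases definition of slugging percentage.  The function names are hypothetical.

```python
def slugging_pct(singles: int, doubles: int, triples: int, homers: int, ab: int) -> float:
    """Total bases per at-bat: 1B = 1, 2B = 2, 3B = 3, HR = 4."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / ab


def batting_average_from_hits(singles: int, doubles: int, triples: int,
                              homers: int, ab: int) -> float:
    """Every hit counts the same, regardless of how many bases it gains."""
    return (singles + doubles + triples + homers) / ab


# One home run in 4 at-bats vs. a single in every one of 4 at-bats.
print(slugging_pct(0, 0, 0, 1, 4), batting_average_from_hits(0, 0, 0, 1, 4))  # 1.0 0.25
print(slugging_pct(4, 0, 0, 0, 4), batting_average_from_hits(4, 0, 0, 0, 4))  # 1.0 1.0
```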

One statistic that is gaining steam in the baseball world is OPS, also known as On-Base Plus Slugging.  As the name implies, it is simply the sum of on-base percentage and slugging percentage.  Logically, this statistic doesn't make all that much sense.  On-base percentage is the percentage of the time someone gets on base; slugging percentage is the average number of bases a player gets every at-bat.  We're adding a bases-per-at-bat rate (slugging percentage, which isn't really a percentage) to a true percentage (on-base percentage).  Still, does this give us a good offensive stat?

The answer is no.  Actually, it gives us a great baseball stat.  The correlation is an astonishing 95.198%, which means OPS alone accounts for roughly 90.6% of the team-to-team variation in runs per game (0.95198 squared is about 0.906).  It makes sense: the statistic not only gives a team credit for walks and negative credit for sacrifices, but it also gives more credit for a double than a single and more credit for a home run than a double.  The scattergram is below.

Is there any way to get a better statistic than OPS?  Should on-base percentage or slugging percentage be weighted differently?  I played around with the numbers for a while and came to the conclusion that OBP should be weighted about twice as much as slugging percentage, and that there may be a little room for improvement with exponents.  However, the correlation only went up marginally: from 95.198% to 95.821%.  The scattergram is below.
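Here is a minimal sketch of plain OPS next to a reweighted version.  The 2.0 default weight is an assumption mirroring the rough "about twice" figure above; the exact weights and any exponents behind the 95.821% result aren't given, so they are not reproduced here, and the function names and inputs are hypothetical.

```python
def ops(obp: float, slg: float) -> float:
    """On-base plus slugging: a plain, unweighted sum of the two rates."""
    return obp + slg


def weighted_ops(obp: float, slg: float, obp_weight: float = 2.0) -> float:
    """Hypothetical variant that weights OBP about twice as heavily as SLG.
    The exponent tweaks mentioned in the post are not modeled here."""
    return obp_weight * obp + slg


# Hypothetical team line: .350 OBP, .450 SLG.
print(ops(0.350, 0.450))           # about 0.800
print(weighted_ops(0.350, 0.450))  # about 1.150
```

Only the ratio between the two weights matters for the correlation: multiplying the whole stat by a positive constant (for example, using OBP + 0.5 × SLG instead) leaves the correlation with runs per game unchanged.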


So, by and large, big ball is a better way to score runs than small ball.  That's not to say there aren't situations where small ball works.  A tie game with a runner on 1st in the bottom of the 9th and the 7-8-9 hitters coming up may be one of those situations.  A runner on first with a light-hitting pitcher at the plate may be another.

However, this is why I really scoff at the notion that small ball wins games.  It doesn't.  Any manager who uses small ball in the 4th inning of a game where they're down 3-0 should not be managing baseball games (or at least shouldn't be calling the shots).  Any GM who values stolen bases over home runs is going to be fired soon, since their teams won't win.

Sunday, January 20, 2013

Introduction

I should probably introduce this blog, which will more than likely be forgotten after a few posts.  It happens: I've written at least one blog in the past, and other things came up.  Honestly, at this point, the blog is a function of my sports fandom, my love of numbers, and, quite frankly, boredom.

I will mention that there are really three teams I follow more than any others: the Milwaukee Brewers, the Green Bay Packers, and the Milwaukee Panthers men's basketball team.  You probably won't see much written up about the Milwaukee Bucks, the Wisconsin Badgers, or the Marquette Gold in this blog since I don't follow them (and if you can't tell, I don't particularly care for Marquette, although they are playing very well right now).

I do follow my teams very closely.  I have been to all 10 Panther home games this season (as painful as that is), all 17 Panther home games last season (slightly less painful), and 40 Brewers games last season.  Save for the first two Panther home games last year, I have a scorecard for each of the games.  I don't know if it's because I love numbers or that I always need to do more than one thing at a time, but I always enjoy keeping them.


To give you an idea of the kind of information I look for in sports stats, here is what I found from my Brewers stats last year:

  • Martin Maldonado was exceptional in Brewers games I attended last year. His line (BA/OBP/Slug) was .389/.450/.648 in those games and he hit 3 home runs.
  • He put up those stats in the span of 60 plate appearances, which admittedly is a small sample size.  By my calculation, in those 60 plate appearances, he contributed about 14.68 runs to the team.
  • Starting catcher Jonathan Lucroy, on the other hand, had 90 plate appearances and contributed about 12.90 runs to the team.
  • Nyjer Morgan had 99 plate appearances and contributed 12.64 runs to the team.  And it just so happened that I was there in person to see all 3 of his home runs last year.  So I saw him in the games where he provided a little more value than average for the team (a line of .282/.313/.474), and he still contributed 2 fewer runs than the backup catcher did, despite having about 40 more plate appearances.
  • The Brewers had 4 players with at least 150 plate appearances in games I attended last year.  Ryan Braun and Rickie Weeks each had 164, Aramis Ramirez had 157, and Norichika Aoki had 151.  Even though he had fewer plate appearances, Ramirez contributed the most runs to the team (31.49), followed by Ryan Braun (28.93) and Rickie Weeks (26.20).  Norichika Aoki came in sixth at 14.83.
  • I must have seen Maldonado at his best and Aoki at his worst.  As stated above, Aoki contributed 14.83 runs in the games I attended and Maldonado contributed 14.68.  They essentially contributed the same number of runs, but it took Aoki 91 more plate appearances to do it.  However, I think most of this is attributable to luck (Aoki's line of .242/.344/.305 in those games is far from his actual .288/.355/.433 line).
  • To put this another way, based on these statistics, a team of 9 Martin Maldonados would have defeated a team of 9 Norichika Aokis by the average score of 9.44 - 3.79, based on their offensive stats in games I attended.
  • Want to take a guess who led the team in triples in games I attended last year?  Was it Carlos Gomez?  No, he had 0.  Rickie Weeks?  He had 2.  Ryan Braun?  He had 1 triple.  Nyjer Morgan?  He had 2.  The correct answer is ARAMIS RAMIREZ, with 3.
What do these stats mean?  Very little, since for the most part, these sample sizes are small.  Still, there may be some things to take from it.  For example, the fact that Aramis Ramirez hit 3 triples in 157 plate appearances while Rickie Weeks hit 2 triples in 164 plate appearances and Carlos Gomez hit 0 in 104 plate appearances suggests that speed may not be that strong an indicator of who gets triples.  Aramis Ramirez is a good power hitter who can hit the ball the opposite way; there is no way a ball hit to the wall in left field turns into a triple.  Maybe he can beat the shift.  I tend to think the cause is that he is a strong hitter and that the sample size is small enough to leave a good chance of significant error.

I used the sample of plate appearances in games I attended since I had those stats handy, but this is just an example of the kinds of things I look at.  If I keep this blog going, I will go over how I calculated runs contributed (for future reference, I'll call it Equivalent Runs, or Eq. Runs).  It will change slightly going into next season, but the basics of the formula will remain the same.

As you may be able to tell from what is written above, I really prefer baseball statistics to any other statistics.  I may write some about college basketball and football, but most of these posts will be based on baseball.