Rambles and RBI
Who was the greatest RBI hitter of all time? Was Cap Anson really as good as his 2000+ RBI suggest? A fresh look at updating a classic statistic.
It's been a while since I've had a really good ramble, so, if you don't mind, I would like to use this occasion to vent to the Internet community. (If you do not wish to read this ramble, you may skip ahead without missing valuable information.)
I'm sorry I haven't put out a post in a while, but for the last few days I have suffered from terminal writer's block. When Sports Illustrated arrived in my mailbox last Wednesday, I finally came up with a Nobel Prize-winning idea. As you know, once a year Sports Illustrated publishes an issue called "Where Are They Now?", which catches up with stars of yesteryear. Every year, in the same issue, the magazine lists a group of athletes projected to succeed in the coming years, under the headline "Where Will They Be?"
My idea was a simple one: Find the lists of players published 5-10 years ago, then determine what percentage of them had gone on to success in their respective fields.
Unfortunately, SIVault.com does not give away this information freely. (I have a strict no-cost policy when writing for this site.) If any of you, my dear readers, has "Where Are They Now?" issues from 5-10 years back, please contact me at firstname.lastname@example.org. Thanks for your support.
For those of you who skipped the ramble, it is now safe to begin the article.
For years, the sabermetric community has dismissed RBI as an invalid statistic, arguing that it says more about a player's hitting environment than about the player himself. That is a fair argument. For this article, however, I decided to look at which skills actually lead to RBI.
I will reference two mathematical terms in this article: Correlation and R-Squared Value. Correlation measures the strength of the linear relationship between two variables. It is always a number between -1 and 1. A correlation of 1 means the two move together perfectly: when Value A goes up, Value B always goes up along a straight line. A value of -1 means that when A goes up, B goes down, again perfectly. A value of 0 means there is no linear relationship between the two (which is not quite the same as saying the variables are independent).
R-Squared measures how much of the variation in one variable can be explained by the other. It always falls between 0 and 1, and when there is a single predictor it is simply the correlation squared.
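To make the two terms concrete, here is a minimal Python sketch that computes the Pearson correlation and the R-Squared of a least-squares line, and shows that for one predictor R-Squared equals the correlation squared. The function names and the (XB, RBI) pairs are hypothetical, made up purely for illustration; they are not real player data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def r_squared(xs, ys):
    """Share of the variation in ys explained by a least-squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Residual error left over after the fitted line vs. total variation.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical (stat, RBI) pairs for illustration only.
xb = [40, 55, 62, 70, 85, 98, 110]
rbi = [38, 50, 47, 61, 66, 72, 80]

r = pearson_r(xb, rbi)
print(round(r, 3), round(r ** 2, 3), round(r_squared(xb, rbi), 3))
```

The last two printed numbers should match, which is why the R-Squared column in a one-variable comparison carries no information beyond the correlation itself.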
Three variables that seem to correlate strongly with RBI are Runs Created, Total Bases, and Extra Bases. Runs Created is a statistic created by Bill James; in its simplest form it works out to roughly the product of On-Base Percentage, Slugging Percentage, and At-Bats. Total Bases measures how many bases a player records on hits: a single is worth one base, a double two, a triple three, and a home run four. Extra Bases is simply Total Bases minus Hits. Here are the correlations with RBI and the R-Squared values for each stat, using the values for the 2010 season so far:
Stat   Correlation   R-Squared
RC     0.72          0.52
TB     0.77          0.59
XB     0.86          0.74
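The three stats can be computed from a standard batting line. The sketch below is a hypothetical illustration (function names and the sample stat line are made up), using the simple OBP × SLG × AB form of Runs Created described above rather than any of Bill James's more detailed versions. It also checks that squaring each correlation in the table reproduces its R-Squared value.

```python
def total_bases(singles, doubles, triples, home_runs):
    # Each hit counts as many bases as the batter reaches on it.
    return singles + 2 * doubles + 3 * triples + 4 * home_runs

def extra_bases(hits, tb):
    # Bases gained beyond first base on every hit: XB = TB - H.
    return tb - hits

def runs_created_approx(obp, slg, at_bats):
    # Simple approximation: OBP x SLG x AB.
    return obp * slg * at_bats

# Hypothetical stat line: 100 singles, 30 doubles, 2 triples, 25 home runs.
hits = 100 + 30 + 2 + 25
tb = total_bases(100, 30, 2, 25)
print(hits, tb, extra_bases(hits, tb))

# Table values: (correlation, R-Squared) for each stat.
table = {"RC": (0.72, 0.52), "TB": (0.77, 0.59), "XB": (0.86, 0.74)}
for stat, (r, r2) in table.items():
    print(stat, round(r * r, 2), r2)
```

Note that XB rewards a home run with three extra bases and a single with none, which is why it leans so heavily toward power hitters.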
All three stats correlate well with RBI, but XB (Extra Bases) is the strongest, explaining about 74% of the variation in RBI. XB is also a stat that sabermetricians are comfortable equating with a player's ability level, which stands in stark contrast to RBI.
Using Excel's LINEST function, which fits a least-squares line, we can find the best linear equation to predict a player's RBI total for the 2010 season:
TruRBI = 15.09 + 0.57*XB
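For readers without Excel, the same kind of fit can be sketched in a few lines of Python. This is a minimal ordinary least-squares fit of one predictor, the same method LINEST uses for a straight line (LINEST itself returns the slope first, then the intercept). The `fit_line` name and the (XB, RBI) pairs are hypothetical placeholders, not the data behind the formula above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical (XB, RBI) pairs; swap in real player totals to refit.
xb = [20, 45, 60, 75, 90, 110]
rbi = [25, 41, 47, 60, 64, 78]
intercept, slope = fit_line(xb, rbi)
print(f"TruRBI = {intercept:.2f} + {slope:.2f}*XB")
```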
Here are the leaders in TruRBI for the 2010 season so far:
(Note: These are also the leaders in XB, since TruRBI rises in step with XB.)
Jose Bautista - 88.05
Miguel Cabrera - 81.21
Adam Dunn - 77.79
Josh Hamilton - 77.22
Joey Votto - 73.80
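Because TruRBI is a straight-line function of XB, it can also be run backwards to recover the XB total behind each figure. The sketch below applies the 2010 formula and its inverse to the five leaders listed above; the function names are hypothetical, and the XB values it prints are back-calculated from the TruRBI numbers, not taken from box scores.

```python
def tru_rbi_2010(xb):
    # Single-season formula fitted on 2010 data.
    return 15.09 + 0.57 * xb

def xb_from_tru_rbi(tru_rbi):
    # Invert the formula to recover the XB total behind a TruRBI value.
    return (tru_rbi - 15.09) / 0.57

leaders = {
    "Jose Bautista": 88.05,
    "Miguel Cabrera": 81.21,
    "Adam Dunn": 77.79,
    "Josh Hamilton": 77.22,
    "Joey Votto": 73.80,
}
for name, value in leaders.items():
    xb = round(xb_from_tru_rbi(value))
    # Plugging the recovered whole-number XB back in reproduces the listed value.
    print(name, xb, round(tru_rbi_2010(xb), 2))
```

Each leader's value maps back to a whole number of extra bases, which is a quick sanity check that the list was computed with the formula as printed.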
A wonderful thing about this stat is that, because it uses the same scale as RBI, the usual RBI thresholds still apply. (For example, a 100-TruRBI season is comparable to a 100-RBI campaign.)
Now, while having this info for 2010 is nice, we should be able to use the same approach to find the best RBI hitter in history. First, we must refit the formula on career totals, since we are dealing with careers rather than a single season.
Career TruRBI = 234 + 0.69*XB
This equation has an R-Squared of 0.83; that is, it explains 83% of the variation in career RBI.
Here are the career leaders in TruRBI:
Hank Aaron - 2362.65
Barry Bonds - 2332.29
Babe Ruth - 2248.8
Willie Mays - 2154.27
Stan Musial - 1961.76
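The career list can be reproduced from each player's hits and total bases. This is a minimal sketch: `career_tru_rbi` is a hypothetical helper, and the (hits, total bases) pairs are the standard career figures for these five players as commonly listed around 2010, which reproduce the values above exactly.

```python
def career_tru_rbi(hits, total_bases):
    xb = total_bases - hits        # Extra Bases = TB - H
    return 234 + 0.69 * xb         # career formula

# (career hits, career total bases)
careers = {
    "Hank Aaron": (3771, 6856),
    "Barry Bonds": (2935, 5976),
    "Babe Ruth": (2873, 5793),
    "Willie Mays": (3283, 6066),
    "Stan Musial": (3630, 6134),
}

# Rank players from highest to lowest career TruRBI.
ranked = sorted(careers, key=lambda name: career_tru_rbi(*careers[name]), reverse=True)
for name in ranked:
    print(f"{name}: {career_tru_rbi(*careers[name]):.2f}")
```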
Note that only 3 players in history have eclipsed the 2000-RBI mark, and only 4 players have more than 2000 career TruRBI. Meanwhile, Cap Anson, who drove in 2076 runs in his career, ranks just 221st on our career list. Here are some other RBI hitters who don't stack up:
Lave Cross - Worth about 570 fewer RBI than his career total of 1371.
Ty Cobb - Drops from 7th all-time to 57th.
George Davis - Drops 256 spots in our rankings.
Nap Lajoie - Drops from 31st all-time to 169th.
Afterword: Anybody remember Vince Coleman? Here's a synopsis of his career:
He set a Florida A&M record by recording 65 steals in 69 attempts. That year, he led NCAA Division I in both steals and stolen base percentage.
In the minor leagues, he set the all-time record for steals in a season, with 145; he set this record despite missing a month with a broken hand. He then stole 101 bases in AA before being called up to the majors.
He set a rookie record with 110 steals, and was caught only 25 times.
In each of the next two seasons, he again eclipsed the 100-steal mark. All three seasons rank in the top six in single-season stolen bases.
He led the National League in stolen bases 6 consecutive seasons.
From 1989-1990, he stole 50 consecutive bases without being caught.
He is currently sixth all-time in steals with 752.
Surely he deserves the nickname "The Man of Steal," or at least one better than "Vincent van Go."