tiger tales temp: sabermetrics


Monday, February 18, 2013

What is the Best Tigers Line-up?

Every fan has his own idea of the ideal line-up. Traditionalists tend to like to have a speedster lead off, a bat-control guy hit second, the best hitter third and the best slugger (who is not also the best hitter) bat fourth. Some just want the numbers one and two hitters to get on base a lot and don't care as much about speed. Others follow The Book by Tom Tango, Mitchel Lichtman and Andrew Dolphin, which claims that the best hitter should not bat third, but rather first, second or fourth. Still others toy with the idea of having the best hitter on the team lead off, the second best hitter bat second, etc., with the reasoning that the best hitters should get the most at bats.

One thing I like to do before every season is check out the line-up tool at Baseball Musings. Developed by analysts Cyril Morong, Ken Arneson and Ryan Armbrust, it estimates the number of runs a line-up would score based on every batter's on-base percentage (OBP) and slugging average (SLG). Since getting on base (OBP) and advancing runners with hits (SLG) are the two most important elements of run scoring, their method makes some sense.

However, the line-up algorithm also has limitations. Perhaps most importantly, it does not consider the speed of base runners. It also does not address psychological factors such as batters feeling comfortable in certain spots. What it does do is try to determine the best line-ups based purely on hitting which is a good place to start.

Using the Bill James Handbook projections, I plugged OBP and SLG for the nine Tigers starters into the line-up analyzer. The Handbook projections tend to be optimistic, but this is the time of the year to be optimistic. Anyway, one possible line-up is shown in Table 1 below. The line-up tool says that line-up would score 5.687 runs per game or 921 runs in 162 games. That's a lot of runs, but that's because we are assuming that all nine players are going to play 162 games which, of course, won't happen. That's OK though. The goal is just to compare different line-ups.

Table 1: Tigers Projected Line-up


Player 1:
Player 2:
Player 3:
Player 4:
Player 5:
Player 6:
Player 7:
Player 8:
Player 9:

The line-up tool considers every possible permutation of those nine batters and estimates that the best line-up would score 5.766 RPG or 934 runs, while the worst would score 5.502 RPG or 891 runs. That is a difference of 43 runs which is not huge, but not insignificant either - between four and five wins.
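The conversions in that paragraph are simple enough to check by hand, but here is a minimal sketch for anyone who wants to plug in other numbers. The ten-runs-per-win figure is an assumption (a common rule of thumb), and the function name is just for illustration; the line-up tool's own run model is not reproduced here.

```python
from math import factorial

GAMES = 162
RUNS_PER_WIN = 10  # assumption: common rule of thumb, not a line-up tool output


def season_runs(runs_per_game):
    """Scale a runs-per-game estimate to a 162-game season."""
    return runs_per_game * GAMES


best_rpg, worst_rpg = 5.766, 5.502  # best and worst line-ups from the tool

gap_runs = season_runs(best_rpg) - season_runs(worst_rpg)
gap_wins = gap_runs / RUNS_PER_WIN

# Number of batting orders the tool has to evaluate for nine hitters
orderings = factorial(9)

print(f"best:  {season_runs(best_rpg):.0f} runs")
print(f"worst: {season_runs(worst_rpg):.0f} runs")
print(f"gap:   {gap_runs:.0f} runs, about {gap_wins:.1f} wins")
print(f"orderings checked: {orderings}")
```

Nine batters can be arranged in 9! different orders, which is why the gap between best and worst is a comparison across a few hundred thousand line-ups rather than a handful.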

Table 2 shows that four of the five best line-ups have Prince Fielder leading off! In fact, eight of the top ten have Fielder at number one and all of the top thirty have either Fielder or Alex Avila. Remember though that this only looks at hitting and does not consider speed of which Fielder and Avila have none. More interesting to me is Cabrera in the two hole in all of the top thirty line-ups. That actually makes some sense, but I'd probably want someone with at least a little speed (as well as the ability to get on base) in front of him.

You also might notice that all of the long list of "best" line-ups have Omar Infante batting ninth preceded by Andy Dirks, Torii Hunter and Jhonny Peralta in some order. That also looks good to me, although we already know that Hunter will hit second in Jim Leyland's line-up.

Table 2: The Five Top Run-Producing Line-ups

RPG	1	2	3	4	5	6	7	8	9
5.766	Fielder	Cabrera	Jackson	Avila	Martinez	Peralta	Hunter	Dirks	Infante
5.766	Avila	Cabrera	Jackson	Fielder	Martinez	Peralta	Hunter	Dirks	Infante
5.766	Fielder	Cabrera	Jackson	Martinez	Avila	Peralta	Hunter	Dirks	Infante
5.765	Fielder	Cabrera	Martinez	Avila	Jackson	Peralta	Hunter	Dirks	Infante
5.765	Fielder	Cabrera	Jackson	Avila	Martinez	Hunter	Peralta	Dirks	Infante

Table 3 looks at the worst line-ups. Right away, you see the first problem - that Cabrera is batting ninth, which would obviously never happen. As bad as those line-ups are, they would still score only about 5% fewer runs than the best line-ups. We want that five percent though, so those line-ups are out.

Table 3: The Five Lowest Run-Producing Line-ups

RPG	1	2	3	4	5	6	7	8	9
5.502	Peralta	Infante	Jackson	Dirks	Hunter	Martinez	Fielder	Avila	Cabrera
5.502	Dirks	Infante	Jackson	Peralta	Hunter	Martinez	Fielder	Avila	Cabrera
5.502	Peralta	Infante	Jackson	Hunter	Dirks	Martinez	Fielder	Avila	Cabrera
5.503	Peralta	Infante	Jackson	Dirks	Hunter	Fielder	Martinez	Avila	Cabrera
5.503	Dirks	Infante	Jackson	Peralta	Hunter	Fielder	Martinez	Avila	Cabrera

It's doubtful that any manager would ever have Fielder or Avila bat leadoff, but suppose we have Jackson lead off followed by Cabrera, an idea that appeals to me. The bottom four will be Dirks, Hunter, Infante and Peralta in some order. Fielder, Martinez and Avila will bat 3-4-5 in some order. I played around with various combinations and came up with the line-up in Table 4. This one would score an estimated 930 runs, 9 more runs or about one win better than the Table 1 line-up. That's probably not worth the uproar caused by having Cabrera batting second, but I like it in theory.

Table 4: One More Line-up


Player 1:
Player 2:
Player 3:
Player 4:
Player 5:
Player 6:
Player 7:
Player 8:
Player 9:

Thursday, January 24, 2013

WAR Baseline is All About Playing Time

For people trying to learn sabermetrics, one of the most confusing concepts is the replacement baseline used in the Wins Above Replacement (WAR) statistics. In simple terms WAR is the wins a player contributed to his team's win total above what you would expect from a replacement level player - a theoretical player who could be acquired for league minimum salary. An example of a replacement player would be a player in AAA, who is good enough to get some time in the majors, but is not regarded as a top prospect.

Why is replacement used instead of average or zero? When building a ball club, comparing players to league average can be problematic. If a team is forced to replace a player due to an injury, he is not likely to be replaced by an average player or even a slightly below average player. Average players are actually good players and are not generally available quickly or cheaply. In most cases, the injured player will be replaced by a player who is substantially below average.

Comparing players to zero is also not generally a great idea because your replacement is not likely to bat .000 for any length of time. Your replacement will usually be somewhere between zero and average. Based on examination of data over several years, analysts determine how good a player typically needs to be to get a decent amount of playing time. The threshold above which a player must perform in order to get consistent at bats is called replacement level. Different people use somewhat different replacement levels, but I'll follow the FanGraphs.com definition here.

If you are interested in playing general manager and are concerned about roster construction or how much money a player is worth, the replacement threshold is the way to go. If you want to do something else such as selecting Hall of Famers or award winners or you just want to know how many players on your favorite team are above average, you can use an alternative baseline.

If you do decide to shun replacement level for something more intuitive though, you should understand the consequences. It all comes down to how much credit you want to give for playing time. Whether you choose Wins Above Average (WAA) or Wins Above Zero (WA0) or WAR can make a substantial difference when there is a lot of variation in playing time among players.

Suppose Gary Great and Sammy Solid were both second basemen with exactly 600 Plate Appearances (PA). They were both average base runners and average defenders and played in neutral parks. The only way they differed was that Gary was a much better hitter than Sammy. Gary had a .400 OBP, .540 slugging average and .398 Weighted On-Base Average (wOBA). Sammy had a .325 OBP, .450 slugging average and .335 wOBA. The question is how many wins was Gary worth compared to Sammy?

We would normally have to do a lot of calculations involving base running, fielding and park effects in order to calculate Wins, but the question is simplified by assuming that the two players were similar in every way except batting. Based on PA and wOBA, Gary had 40 Batting Runs, which means that he contributed an estimated 40 runs above what would be expected from an average player in the same number of plate appearances. Since 10 runs is worth approximately one win, he was 4 WAA.

Sammy had 10 Batting Runs or 1 WAA. So, there was a gap of three wins between the two players. (Note that we should actually be adding a fraction of a win for playing second base, but they both get the same fraction so we'll ignore it for simplicity.)
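For anyone wondering how PA and wOBA turn into Batting Runs, here is a rough sketch using the standard form of the calculation (the same form FanGraphs publishes for wRAA). The league wOBA of .314 and the wOBA scale of 1.26 are assumptions: they are not stated in this post, but were back-solved so that the Gary and Sammy figures above come out to 40 and 10. Real seasonal constants differ a little from year to year.

```python
LEAGUE_WOBA = 0.314  # assumption: back-solved from the example, not a published value
WOBA_SCALE = 1.26    # assumption: back-solved from the example, not a published value
RUNS_PER_WIN = 10    # approximate runs-to-wins conversion used in this post


def batting_runs(woba, pa):
    """Runs above an average hitter over the same number of plate appearances."""
    return (woba - LEAGUE_WOBA) / WOBA_SCALE * pa


def wins_above_average(woba, pa):
    """Batting Runs converted to wins."""
    return batting_runs(woba, pa) / RUNS_PER_WIN


for name, woba, pa in [("Gary", 0.398, 600), ("Sammy", 0.335, 600)]:
    runs = batting_runs(woba, pa)
    print(f"{name}: {runs:.0f} Batting Runs, {wins_above_average(woba, pa):.1f} WAA")
```

Because the result is multiplied by PA, the same wOBA in half the plate appearances yields half the Batting Runs.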

What if we use zero as the baseline rather than average? An average player is worth 68 runs over 600 PA, so Gary was 40 + 68 = 108 runs above zero (also called Runs Created) or 10.8 WA0. Sammy had 78 Runs Created or 7.8 WA0. Again, the two players were separated by three wins.

Finally, a replacement player is worth 20 runs per 600 PA below an average player, so Gary was 40 + 20 = 60 Runs Above Replacement or 6 WAR. Sammy was 10 + 20 = 30 Runs Above Replacement or 3 WAR. So, one more time there were three wins between the two batters. There was a very big disparity in the number of wins each player was credited with in WAA, WA0 and WAR, but no difference in the number of wins separating the two players because they had the same number of PA.

It's another story when players are far apart in their numbers of PA. Suppose Gary had a .398 wOBA in 300 PA while Sammy still had a .335 wOBA in 600 PA. In that case, Gary had 20 Batting Runs compared to 10 for Sammy. That comes out to 2 WAA for Gary and 1 WAA for Sammy. So Gary was one win better by that measure. Does this make sense? Is a great hitter who missed half the season worth more wins than an above average hitter in a full season?

Let's see what happens if we change the threshold. An average player is worth 34 runs in 300 PA, so Gary was 20 + 34 = 54 Runs Above Zero. Sammy was still 78 Runs Above Zero. In terms of wins, Gary was 5.4 WA0 and Sammy 7.8 WA0. In this case, Sammy was 2.4 Wins better than Gary.

Finally, if replacement is the baseline, Gary was 20 + 10 = 30 Runs Above Replacement or 3 WAR while Sammy was 10 + 20 = 30 Runs Above Replacement or 3 WAR. So, they were considered equal contributors to wins by this metric.
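The three baselines can be put side by side in a few lines of code. This is only a sketch of the arithmetic in this post: it uses the figures given above (68 runs per 600 PA for an average player, a 20-run gap per 600 PA between average and replacement, 10 runs per win), and the function name is made up for illustration.

```python
RUNS_PER_WIN = 10
AVG_RUNS_PER_600_PA = 68       # average player's runs above zero per 600 PA
REPL_GAP_PER_600_PA = 20       # runs an average player is above replacement per 600 PA


def wins_by_baseline(batting_runs, pa):
    """Return (WAA, WA0, WAR) for a hitter, prorating the baselines by PA."""
    share = pa / 600
    waa = batting_runs / RUNS_PER_WIN
    wa0 = (batting_runs + AVG_RUNS_PER_600_PA * share) / RUNS_PER_WIN
    war = (batting_runs + REPL_GAP_PER_600_PA * share) / RUNS_PER_WIN
    return waa, wa0, war


cases = [
    ("Gary, 600 PA", 40, 600),
    ("Gary, 300 PA", 20, 300),
    ("Sammy, 600 PA", 10, 600),
]
for label, runs, pa in cases:
    waa, wa0, war = wins_by_baseline(runs, pa)
    print(f"{label}: {waa:.1f} WAA, {wa0:.1f} WA0, {war:.1f} WAR")
```

The only thing that changes between baselines is how many runs get added per plate appearance, which is exactly why the choice matters most when playing time differs.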

The lesson learned is that the baseline you choose can make a large difference in your evaluation of players. In the first case, Gary was the better player. In the second instance, Sammy was the better player by a substantial margin. In the third situation, they were equals. You don't have to use replacement level if you don't want, but it's important to be aware how much the results vary among baselines.

Wednesday, June 13, 2012

What Types of Teams Make the Playoffs?

With the Tigers having a sub-par season both offensively and defensively, there has been a lot of talk about what they need to do to get back on track for the playoffs. I've heard some fans say that the team is built around offense and that is the primary area where they need to improve. They are currently 9th in the AL in runs scored per game and many feel as if they have the potential to be in the top four in the league. If they can do that, the offensive-minded fans think they can make the playoffs.

On the other hand, the Tigers are also 11th in runs allowed per game. I've read in a few places that fans should not be too concerned about run prevention because the team is built around hitting. Personally, I did not expect them to be nearly as bad as 11th in the league, especially playing in Comerica Park, which is generally a neutral park offensively. I also don't think that kind of run prevention is likely to get them into the playoffs no matter how much they hit.

Anyway, this reminds me of a simple study I published in Beyond Batting Average looking at what kinds of teams are most likely to make the playoffs. I'm updating that analysis here using a slightly different approach.

I examined the relative importance of offense and defense (pitching/fielding combined) in reaching the playoffs for all major league teams from 1990-2011 (excluding the strike-shortened 1994 season). I ranked each major league team in every year based on offense (runs scored) and defense (runs allowed). Then, each team’s offense was categorized as “Good”, “OK” or “Bad” based on its rank. If a team finished in the top third of the league in runs scored, then it was considered Good. If it finished in the middle third, then it was placed into the OK group. Teams in the bottom third were classified as Bad. Team defense was categorized the same way (Good, OK, Bad) based on runs allowed.

Crossing the offense classification (Good, OK, Bad) with the defense classification (Good, OK, Bad) yielded nine categories (Good offense and Good defense, OK offense and Good defense, etc.) shown in Table 1 below. For example, the 2008 Brewers were seventh in the National League in runs scored and had the fourth fewest runs allowed, so they went into the OK offense/Good defense category.
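The classification step can be sketched as a small function that maps a league rank to a third. How the cut-offs fall when the league size is not divisible by three is an assumption here (the study does not spell it out), so treat the boundaries as illustrative. The labels follow Table 1.

```python
LABELS = ("Good", "OK", "Bad")


def tier(rank, n_teams):
    """Map a 1-based league rank to a third of the league.

    Assumption: thirds are cut with integer arithmetic, so a 16-team
    league splits 6/5/5 and a 14-team league splits 5/5/4.
    """
    if not 1 <= rank <= n_teams:
        raise ValueError("rank must be between 1 and n_teams")
    return LABELS[(rank - 1) * 3 // n_teams]


def category(offense_rank, defense_rank, n_teams):
    """Cross the offense and defense tiers into one of nine categories."""
    return f"{tier(offense_rank, n_teams)} offense/{tier(defense_rank, n_teams)} defense"


# 2008 Brewers: 7th in runs scored, 4th fewest runs allowed, 16-team NL
print(category(7, 4, 16))
```

Under these cut-offs a team ranked 9th in runs scored and 11th in runs allowed in a 14-team league lands in OK offense/Bad defense.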

Table 1: Offense versus Defense in Making the Playoffs, 1990-2011

Offense	Defense (pitching/fielding)	Teams	Playoffs	%
Good	Good	60	51	85.0
OK	Good	78	41	52.6
Good	OK	70	36	51.4
Bad	Good	44	8	18.2
Good	Bad	52	6	11.5
OK	OK	75	8	10.7
Bad	OK	76	2	2.6
OK	Bad	68	0	0.0
Bad	Bad	87	0	0.0

Data from Baseball-Databank.org

The table indicates that there were 60 teams from 1990 to 2011 which could be categorized as Good offense/Good defense. Not surprisingly, 51 (or 85.0%) of those clubs made the playoffs. The next most likely types of teams to make the post-season were OK offense/Good defense (52.6%) and Good offense/OK defense (51.4%). A team in any of the other classifications had less than a one-in-five chance of making the playoffs.
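The percentage column is just playoff teams divided by total teams in each category. A short sketch that recomputes it from the counts in Table 1 (the variable and function names are made up for illustration):

```python
# (offense, defense, teams, playoff teams) taken from Table 1
TABLE_1 = [
    ("Good", "Good", 60, 51),
    ("OK", "Good", 78, 41),
    ("Good", "OK", 70, 36),
    ("Bad", "Good", 44, 8),
    ("Good", "Bad", 52, 6),
    ("OK", "OK", 75, 8),
    ("Bad", "OK", 76, 2),
    ("OK", "Bad", 68, 0),
    ("Bad", "Bad", 87, 0),
]


def playoff_rate(teams, playoffs):
    """Percentage of teams in a category that reached the playoffs."""
    return 100 * playoffs / teams


for offense, defense, teams, playoffs in TABLE_1:
    rate = playoff_rate(teams, playoffs)
    print(f"{offense:>4} offense / {defense:<4} defense: {rate:5.1f}%")

total_teams = sum(row[2] for row in TABLE_1)
total_playoffs = sum(row[3] for row in TABLE_1)
print(f"{total_playoffs} playoff teams out of {total_teams} team-seasons")
```

The totals are a useful sanity check on the counts, since they should match the number of team-seasons and playoff berths over the period studied.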

The data tell us that any team with post-season aspirations needs to be either strong in both offense and defense or strong in one and OK in the other. With all the traditional talk about the importance of pitching and fielding in winning games, some may be surprised that there does not seem to be an advantage in excelling defensively versus offensively.

On the negative side, if a team is in the bottom third of the league in either offense or defense, they have little chance of making the playoffs, even if they are in the top third in the other phase of the game. If they are just OK in both offense and defense, they also do not have favorable odds.

Right now, the Tigers would be categorized as OK Offense/Bad Defense. Based on the above, they will probably have to move up a notch in both areas in order to get into the playoffs.

Saturday, May 12, 2012

Verlander and Smyly Among Early Pitching Leaders

Most readers of this blog are aware of the limitations of ERA in evaluating pitcher performance. Two of the biggest issues are:

(1) ERA gives pitchers full credit/blame for results of batted balls in play despite the fact that they share that responsibility with fielders. For example, a pitcher with a strong defense behind him will tend to give up fewer hits (and thus fewer runs) than if he had a poor defense behind him.

(2) ERA gives pitchers full responsibility for sequencing or timing of events, that is, it assumes that they can control when they give up hits and walks. For example, if a pitcher pitches extraordinarily well with runners in scoring position in a given year, he will have a lower ERA than if he had a typical year in those situations. Additionally, a pitcher who tends to bunch base runners together in single innings will have a higher ERA than if he had a typical year distributing base runners more evenly.

In reality, pitchers have limited control over both the number of batted balls that drop for hits and sequencing of events. Thus, Defense Independent Pitching Statistics (DIPS) such as FIP, xFIP, tERA and SIERA have been developed to remove some of the noise of ERA. DIPS are based on things that pitchers do control for the most part - walks, hit batsmen, strikeouts, home runs and types of batted balls (ground balls, fly balls, line drives, pop flies).

Because they are based on things that pitchers essentially control, the DIPS metrics are said to be better measures of true talent than ERA. As a result, they are also better than ERA at predicting future performance. However, they only measure a portion of a pitcher's talent and should be used as complements to ERA rather than as replacements.

More and more fans are becoming comfortable with DIPS theory, but it is still a really difficult concept to get across to the mainstream. If you ever try to explain FIP or any other DIPS statistic to the uninitiated, you will probably find that they are skeptical of a pitching statistic which ignores hits. They are not likely to buy into it even if they realize the limitations of ERA.

So, rather than asking fans to take the big leap from ERA to FIP, why not meet them half way? Instead of removing hit prevention and sequencing in one step, it might be better to remove one factor at a time. Bill James did that with his Component ERA (ERC). Applying the runs created methodology to pitchers, he determined what a pitcher's ERA should have been based on walks, hit batsmen, strikeouts, homers AND hits allowed. I'm going to look at some similar statistics here based on more modern measures such as linear weights and Base Runs.

We often use Weighted On-Base Average (wOBA) to measure overall hitting performance and it can also be used for pitchers. The American League wOBA Against (wOBAA) leaders are shown in Table 1 below. Tigers ace Justin Verlander is currently third in the league with a .245 wOBAA. Rookie starter Drew Smyly also ranks among the leaders with a .284 wOBAA.

Table 1: AL wOBA Against Leaders

Player	Team	IP	wOBAA
Jered Weaver	LAA	50.2	.211
Jake Peavy	CHW	52.1	.225
Justin Verlander	DET	51.1	.245
Jason Hammel	BAL	38.2	.253
Jason Vargas*	SEA	51.2	.259
Gavin Floyd	CHW	46.1	.262
C.J. Wilson*	LAA	41.2	.265
Felix Hernandez	SEA	59.0	.270
Chris Sale*	CHW	33.0	.274
Brandon Morrow	TOR	47.2	.282
CC Sabathia*	NYY	51.1	.283
Drew Smyly*	DET	34.0	.284
Ricky Romero*	TOR	48.0	.285
Jeff Niemann	TBR	33.2	.287
Jake Arrieta	BAL	44.2	.289
Neftali Feliz	TEX	32.0	.294
Tommy Milone*	OAK	43.2	.295
David Price*	TBR	45.1	.295
Jeanmar Gomez	CLE	29.0	.296
Wei-Yin Chen*	BAL	37.0	.297

It's always good to convert to runs allowed when trying to evaluate pitchers, so we'll do that next. The Base Runs measure was created by David Smyth in the early 1990s. It is based on the idea that we can estimate team runs scored if we know the number of base runners, total bases, home runs and the typical score rate (the percentage of base runners that score on average). Base Runs also works well for individual pitchers. The complete formula can be found here.
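To show the shape of the calculation, here is a minimal sketch of one widely cited simple version of the Base Runs formula: base runners times score rate, plus home runs. It is not necessarily the exact version used for the tables in this post, and the stat line in the usage example is hypothetical.

```python
def base_runs(ab, h, tb, hr, bb):
    """Estimate runs from a batting-against line with a simple Base Runs variant.

    a = base runners (excluding home runs)
    b = advancement factor
    c = outs (approximated as at bats minus hits)
    d = home runs, which always score
    The score rate is b / (b + c).
    """
    a = h + bb - hr
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02
    c = ab - h
    d = hr
    if b + c == 0:  # no plate appearances of any consequence
        return float(d)
    return a * b / (b + c) + d


# Hypothetical pitcher line, not taken from the tables in this post
estimate = base_runs(ab=180, h=35, tb=55, hr=4, bb=12)
print(f"Estimated runs allowed: {estimate:.1f}")
```

One nice property of this structure is that a lone home run with nobody on base is worth exactly one run, which is not true of every run estimator.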

Justin Verlander has 13 Base Runs Against in 51 1/3 innings so far this year. This means that he should have allowed an estimated 13 runs based on the number of base runners, total bases and home runs he has allowed. He has allowed 17 actual runs, so runs are scoring against him at a higher rate than you would expect so far. That could possibly be due to bad defense, unfortunate timing or just bad luck on locations of batted balls.

Verlander has 11 Base Runs Above Average (RAA) which means that he has saved the Tigers an estimated 11 runs compared to the average pitcher in the same number of innings. Table 2 shows that he is tied for third in the American League on that metric. Smyly is 14th with 4 RAA.

Table 2: AL Runs Above Average Leaders

Player	Team	IP	Base Runs	RAA
Jered Weaver	LAA	50.2	9	14
Jake Peavy	CHW	52.1	11	14
Justin Verlander	DET	51.1	13	11
Felix Hernandez	SEA	59.0	17	11
Gavin Floyd	CHW	46.1	13	9
Jason Vargas*	SEA	51.2	16	9
Jason Hammel	BAL	38.2	10	8
C.J. Wilson*	LAA	41.2	13	7
Ricky Romero*	TOR	48.0	17	5
CC Sabathia*	NYY	51.1	19	5
Chris Sale*	CHW	33.0	11	5
Brandon Morrow	TOR	47.2	18	5
David Price*	TBR	45.1	17	5
Drew Smyly*	DET	34.0	12	4
Tommy Milone*	OAK	43.2	16	4
Jeff Niemann	TBR	33.2	12	4
Neftali Feliz	TEX	32.0	12	4
Henderson Alvarez	TOR	48.1	19	3
Derek Holland*	TEX	46.2	19	3
Jake Arrieta	BAL	44.2	18	3

Finally, Table 3 shows that Verlander has allowed 2.26 Base Runs per nine innings. About 93% of runs are earned, so multiply this result by .93 to put it on the same scale as ERA. The final result is a weighted component ERA. Although I am not using linear weights here, I call it WERC because others have said they like the name. It's really not a novel idea though. Toirtap of Walk Like a Saber has been using Base Runs to evaluate pitchers for a while but prefers to not convert to the ERA scale.

Getting back to the example, Verlander has a 2.10 WERC, which again is third in the league. This is lower than his actual ERA of 2.63, which indicates that he may be pitching better than his ERA suggests. Smyly's WERC of 2.93 is not as good as his league-leading 1.59 ERA, but is probably more reflective of how he has pitched - very well, but not the best pitcher in the league.
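The conversion to the ERA scale is a one-line calculation, sketched here along with a helper for the innings notation used in the tables, where the digit after the decimal point counts outs (so 51.1 means 51 and one third innings). The function names are made up for illustration.

```python
EARNED_SHARE = 0.93  # about 93% of runs are earned


def innings(ip_notation):
    """Convert box-score innings such as '51.1' to a true decimal (51 1/3)."""
    whole, _, outs = ip_notation.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def per_nine(base_runs, ip_notation):
    """Base Runs allowed per nine innings."""
    return base_runs / innings(ip_notation) * 9


def werc(base_runs_per_nine):
    """Put Base Runs per nine innings on the ERA scale."""
    return base_runs_per_nine * EARNED_SHARE


# Base Runs per nine innings taken from Table 3
for name, bsr9 in [("Verlander", 2.26), ("Smyly", 3.15)]:
    print(f"{name}: {werc(bsr9):.2f} WERC")
```

Note that feeding the whole-number Base Runs from Table 2 into `per_nine` will not reproduce the Table 3 figures exactly, since those totals are rounded.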

Table 3: AL WERC Leaders

Player	Team	IP	Base Runs/9 IP	WERC
Jered Weaver	LAA	50.2	1.69	1.57
Jake Peavy	CHW	52.1	1.90	1.76
Justin Verlander	DET	51.1	2.26	2.10
Jason Hammel	BAL	38.2	2.45	2.28
Gavin Floyd	CHW	46.1	2.57	2.39
Felix Hernandez	SEA	59.0	2.63	2.45
Jason Vargas*	SEA	51.2	2.77	2.57
C.J. Wilson*	LAA	41.2	2.81	2.62
Chris Sale*	CHW	33.0	2.94	2.73
Drew Smyly*	DET	34.0	3.15	2.93
Ricky Romero*	TOR	48.0	3.23	3.00
Neftali Feliz	TEX	32.0	3.27	3.04
Jeff Niemann	TBR	33.2	3.27	3.05
David Price*	TBR	45.1	3.32	3.09
Brandon Morrow	TOR	47.2	3.34	3.10
Jeanmar Gomez	CLE	29.0	3.37	3.13
CC Sabathia*	NYY	51.1	3.40	3.16
Tommy Milone*	OAK	43.2	3.41	3.17
Jake Arrieta	BAL	44.2	3.61	3.35
Derek Holland*	TEX	46.2	3.61	3.36

Note: The raw data used in the above calculations were taken from Baseball-Reference.com
