Originally posted July 2, 2008
Chris,
I love the first parts of your analysis, and because I’m a total engineering nerd, I’m going to continue this statistics festival. Before I do, I want to say that before reading any of your work, I thought I’d be better off with consistent players. We’ll see if that holds up under my analysis.
First of all, I’m going to use a random number generator to pick a category to analyze in an 18-team 6x6 H2H baseball league. The randomly selected category was RBI. This year (through 10 weeks), the lowest RBI total by any team is 196. The highest is 339. The best record by any team in this category is 9-1-0. The worst record is 1-8-1. Surprisingly, the team with the lowest RBI total does not have the worst weekly record, nor does the team with the highest RBI total have the best weekly record. Here is a breakdown of RBI (through 10 weeks) by team, with the corresponding record.
Team W L T Total
Red Sox 9 1 0 315
Giants 7 3 0 299
Blue Jays 7 3 0 298
Yankees 7 3 0 339
Cardinals 6 4 0 308
Rays 6 4 0 249
Royals 5 5 0 261
Braves 5 5 0 264
Orioles 5 5 0 278
Mets 5 5 0 279
Tigers 4 5 1 252
Angels 4 5 1 256
Brewers 4 6 0 236
Rangers 4 6 0 268
Athletics 4 6 0 216
Twins 3 7 0 196
Marlins 2 7 1 242
Phillies 1 8 1 205
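The claim that the RBI extremes and the record extremes belong to different teams can be checked directly from the table above. This is only a minimal sketch: the `standings` list is copied from the table, and counting a tie as half a win is my own assumed convention for ranking records, not something the league necessarily uses.

```python
# (team, wins, losses, ties, total RBI), copied from the table above
standings = [
    ("Red Sox", 9, 1, 0, 315), ("Giants", 7, 3, 0, 299),
    ("Blue Jays", 7, 3, 0, 298), ("Yankees", 7, 3, 0, 339),
    ("Cardinals", 6, 4, 0, 308), ("Rays", 6, 4, 0, 249),
    ("Royals", 5, 5, 0, 261), ("Braves", 5, 5, 0, 264),
    ("Orioles", 5, 5, 0, 278), ("Mets", 5, 5, 0, 279),
    ("Tigers", 4, 5, 1, 252), ("Angels", 4, 5, 1, 256),
    ("Brewers", 4, 6, 0, 236), ("Rangers", 4, 6, 0, 268),
    ("Athletics", 4, 6, 0, 216), ("Twins", 3, 7, 0, 196),
    ("Marlins", 2, 7, 1, 242), ("Phillies", 1, 8, 1, 205),
]


def win_pct(team):
    """Winning percentage, counting a tie as half a win (assumed convention)."""
    _, wins, losses, ties, _ = team
    return (wins + 0.5 * ties) / (wins + losses + ties)


best_record = max(standings, key=win_pct)
worst_record = min(standings, key=win_pct)
most_rbi = max(standings, key=lambda team: team[4])
fewest_rbi = min(standings, key=lambda team: team[4])

print("Best record:", best_record[0], "| Most RBI:", most_rbi[0])
print("Worst record:", worst_record[0], "| Fewest RBI:", fewest_rbi[0])
```

The best record and the most RBI come back as different teams, and so do the worst record and the fewest RBI.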
What I’m curious about, and what remains unanswered, is how a team that averages 33.9 RBI per week (the Yankees in the table above) compares to its opponents for the year, who have averaged 26.8 RBI per week in their matchups against the Yankees. Here are the weekly results for the Yankees:
Week Yankees Opponent Outcome
1 26 35 L
2 40 20 W
3 48 25 W
4 28 35 L
5 45 37 W
6 27 8 W
7 27 43 L
8 40 26 W
9 27 25 W
10 31 14 W
Note that the Yankees have the highest RBI total this year to date, but they have lost two more games than the category leader, the Red Sox, who are 9-1-0 so far. You can see that in each of the three weeks the Yankees lost, they had fewer RBI than their weekly average. There, that proves it: it is better to get consistent performance out of your players than streakiness.
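The weekly averages and the below-average-weeks observation can be reproduced from the Yankees table. This is a small sketch using only the ten weekly totals listed above; the variable names are my own.

```python
from statistics import mean

# (Yankees RBI, opponent RBI) for weeks 1 through 10, from the table above
weeks = [(26, 35), (40, 20), (48, 25), (28, 35), (45, 37),
         (27, 8), (27, 43), (40, 26), (27, 25), (31, 14)]

yankees_avg = mean(y for y, _ in weeks)
opponent_avg = mean(o for _, o in weeks)

# Weeks the Yankees lost the category
losses = [(week, y) for week, (y, o) in enumerate(weeks, start=1) if y < o]

# Weeks the Yankees were below their own average but still won
below_avg_wins = [(week, y) for week, (y, o) in enumerate(weeks, start=1)
                  if o < y < yankees_avg]

print(f"Yankees average: {yankees_avg:.1f}, opponent average: {opponent_avg:.1f}")
print("Losses (week, RBI):", losses)
print("Below-average wins (week, RBI):", below_avg_wins)
```

The averages come out to 33.9 and 26.8, and the losses fall in weeks 1, 4, and 7. Weeks 6, 9, and 10 were also below the Yankees’ average and were still wins, which is one reason the three losses alone do not settle the question.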
OK, so I’m actually not ready to conclude that consistency is the winning way yet. I’ll have to do a little more investigating first. I did an analysis of all players with 14 or more ABs in one week and found that the average player would have three RBI in twenty-two at-bats in a week. If we had a team that gave us 220 ABs in one week (representing starters and irregular backups on a nine-position fantasy offense), we can assume they will produce 30 RBI on average. Certainly, the more data we use, the more statistically relevant our results will be, but for now we will compare three different teams: the average Yankee team, the league average team, and the average Yankee opponent.
Even though this is not necessarily true, we will assume that each actual team (ours and each opponent) will produce its average RBI in 220 at-bats (we’ll come back to this assumption later). Note that when we are building these teams, we are still assuming that all players are equally capable of producing an RBI in an at-bat. In reality, we know that someone like Alex Rodriguez is more likely to have opportunities to drive in runs (and is perhaps even more likely to succeed in doing so) than, say, Brad Ausmus. But to make this analysis possible, we have to make some assumptions!
To dig deeper into this investigation, I’ve decided to look at what happens when these three teams play each other. Before I do that, I will assume that our data follow a normal distribution (I used http://davidmlane.com/hyperstat/z_table.html to get my probabilities). This means we are assuming that the most frequent weekly RBI total for a player on a given team is that team’s per-player average, and that the further a total is from that average, the lower the probability that a player actually produces it in a week.
Because a player cannot have a negative RBI total, you will see that there is a significant possibility that a player will have 0 RBI in a week. Here are the three probability tables for the “average players involved” using the standard deviation generated over those 745 RBI in 5404 AB (the weekly RBI total of the players with 14 or more AB).
Probability (%) a Representative Player Will Have a Given Weekly RBI Total:
RBI Avg Yank Avg Opponent Avg Team
0 7.64 13.90 13.4
1 8.40 11.95 11.7
2 12.88 15.80 15.7
3 16.36 17.30 17.3
4 17.21 15.69 15.8
5 15.01 11.80 12.0
6 10.85 7.35 7.6
7 6.49 3.80 3.9
8 3.22 1.62 1.7
9 1.32 0.58 0.6
10 0.45 0.17 0.2
11 0.13 0.04 0.04
12 0.03 0.008 0.01
13 0.006 0.001 0.0015
14+ 0.0008 0.0002 0.0003
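For anyone who wants to build a table like this one, here is a minimal sketch of one way to turn a normal curve into whole-number RBI probabilities. It is not necessarily how the numbers above were produced: the standard deviation is not stated in this post, so `ASSUMED_STD_DEV` is a made-up placeholder, and the binning (everything below 0.5 counts as 0 RBI, each other total k covers k-0.5 to k+0.5, and the upper tail is folded into the last row) is my own assumption. The league mean uses the 745 RBI in 5404 AB quoted above, scaled to 22 at-bats.

```python
from statistics import NormalDist


def weekly_rbi_probabilities(mean_rbi, std_dev, max_rbi=14):
    """Return [P(0), P(1), ..., P(max_rbi or more)] for a binned normal curve."""
    curve = NormalDist(mu=mean_rbi, sigma=std_dev)
    probs = []
    for k in range(max_rbi + 1):
        upper = curve.cdf(k + 0.5)
        # No negative RBI: all probability below 0.5 is assigned to 0 RBI
        lower = curve.cdf(k - 0.5) if k > 0 else 0.0
        probs.append(upper - lower)
    # Fold the remaining upper tail into the last ("max_rbi or more") bin
    probs[-1] += 1.0 - curve.cdf(max_rbi + 0.5)
    return probs


league_mean = 22 * 745 / 5404   # roughly 3 RBI per 22 at-bats
ASSUMED_STD_DEV = 2.25          # hypothetical value, not taken from the post

league = weekly_rbi_probabilities(league_mean, ASSUMED_STD_DEV)
# A hypothetical stronger lineup: same spread, higher per-player mean
stronger = weekly_rbi_probabilities(league_mean + 0.75, ASSUMED_STD_DEV)

for rbi, (p_league, p_strong) in enumerate(zip(league, stronger)):
    print(f"{rbi:>2} {100 * p_league:7.3f} {100 * p_strong:7.3f}")
```

Whatever standard deviation is plugged in, shifting the mean upward shrinks the 0 RBI row, which is the same pattern the Avg Yank column shows relative to the other two.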
As you can see, the average Yankee hitter has a better chance of producing four or more RBI in a week and a lower chance of producing three or fewer. What is striking to me is the very low probability of a Yankee player producing 0 RBI in a week: it is almost half that of either of the other teams. To get data by which I can compare these teams, I used a random number generator to randomly assign an RBI total to each position on each team (Yankees and opponents only) for each of the ten weeks. Each team’s total RBI was then tallied and compared to every other team’s actual and average weeks. Here are the results:
Week Actual Yankees Sim Yanks Avg Yanks Actual Opponent Sim Opp. Avg Opp.
1 26 32 34 35 38 27
2 40 29 34 20 28 27
3 48 29 34 25 39 27
4 28 24 34 35 24 27
5 45 32 34 37 24 27
6 27 38 34 8 29 27
7 27 22 34 43 17 27
8 40 40 34 26 33 27
9 27 54 34 25 37 27
10 31 32 34 14 31 27
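The simulated columns above came from a random number generator, so they cannot be reproduced exactly, but the procedure can be sketched. The weights below are the Avg Yank and Avg Opponent columns of the probability table. Treating "14+" as exactly 14, using nine positions per team, and the function names are all my own assumptions.

```python
import random

RBI_VALUES = list(range(15))  # 0 through 13, with 14 standing in for "14+"

# Percentages copied from the probability table (Avg Yank and Avg Opponent columns)
AVG_YANK_PCT = [7.64, 8.40, 12.88, 16.36, 17.21, 15.01, 10.85, 6.49,
                3.22, 1.32, 0.45, 0.13, 0.03, 0.006, 0.0008]
AVG_OPP_PCT = [13.90, 11.95, 15.80, 17.30, 15.69, 11.80, 7.35, 3.80,
               1.62, 0.58, 0.17, 0.04, 0.008, 0.001, 0.0002]


def simulate_team_week(weights, rng, positions=9):
    """Draw one weekly RBI total per position and add them up."""
    draws = rng.choices(RBI_VALUES, weights=weights, k=positions)
    return sum(draws)


def simulate_season(weights, weeks=10, positions=9, seed=None):
    """Return a list of simulated weekly team RBI totals."""
    rng = random.Random(seed)
    return [simulate_team_week(weights, rng, positions) for _ in range(weeks)]


sim_yanks = simulate_season(AVG_YANK_PCT, seed=2008)
sim_opp = simulate_season(AVG_OPP_PCT, seed=1)
print("Sim Yanks:", sim_yanks)
print("Sim Opp.: ", sim_opp)
```

Passing a `seed` only makes a run repeatable. A different seed gives a different ten-week season, which is worth remembering when reading a single simulated column.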
The resulting records for each team vs. the field (the other five teams in that week) would be:
Week Actual Yanks Sim Yanks Avg Yanks Actual Opp. Sim Opp. Avg Opp.
1 0-5-0 2-3-0 3-2-0 4-1-0 5-0-0 1-4-0
2 5-0-0 3-2-0 4-1-0 0-5-0 2-3-0 1-4-0
3 5-0-0 2-3-0 3-2-0 0-5-0 4-1-0 1-4-0
4 3-2-0 0-4-1 4-1-0 5-0-0 0-4-1 2-3-0
5 5-0-0 2-3-0 3-2-0 4-1-0 0-5-0 1-4-0
6 1-3-1 5-0-0 4-1-0 0-5-0 2-3-0 1-3-1
7 2-2-1 1-4-0 4-1-0 5-0-0 0-5-0 2-2-1
8 4-0-1 4-0-1 3-2-0 0-5-0 2-3-0 1-4-0
9 1-3-1 5-0-0 3-2-0 0-5-0 4-1-0 1-3-1
10 2-2-1 4-1-0 5-0-0 0-5-0 2-2-1 1-4-0
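Each entry in the table above comes from comparing one team’s weekly total against the other five totals for that week. A short sketch of that tally, using weeks 1 and 4 from the RBI table as sample input (the dictionary layout and function name are mine):

```python
# Weekly RBI totals for two sample weeks, copied from the simulation table
WEEKLY_RBI = {
    1: {"Actual Yanks": 26, "Sim Yanks": 32, "Avg Yanks": 34,
        "Actual Opp.": 35, "Sim Opp.": 38, "Avg Opp.": 27},
    4: {"Actual Yanks": 28, "Sim Yanks": 24, "Avg Yanks": 34,
        "Actual Opp.": 35, "Sim Opp.": 24, "Avg Opp.": 27},
}


def record_vs_field(totals, team):
    """Return (wins, losses, ties) for `team` against every other team that week."""
    mine = totals[team]
    others = [rbi for name, rbi in totals.items() if name != team]
    wins = sum(1 for rbi in others if mine > rbi)
    losses = sum(1 for rbi in others if mine < rbi)
    ties = sum(1 for rbi in others if mine == rbi)
    return wins, losses, ties


for week, totals in WEEKLY_RBI.items():
    for team in totals:
        w, l, t = record_vs_field(totals, team)
        print(f"Week {week} {team}: {w}-{l}-{t}")
```

For these two weeks the output matches the corresponding rows of the table, for example 0-5-0 for the actual Yankees in week 1 and 0-4-1 for the simulated Yankees in week 4.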
This yields an overall record vs. the field of:
Actual Yankees 28-17-5
Sim Yankees 28-20-2
Average Yankees 36-14-0
Actual Opponent 18-32-0
Sim Opponent 21-27-2
Average Opponent 12-35-3
If summarized a bit differently:
Actual Teams 46-49-5
Sim Teams 49-47-4
Average Teams 48-49-3
This doesn’t seem like much of a difference overall, though it would have been nice to have had my average productivity rather than my actual performance. It would also have been nice to see my opponents put up an average performance rather than their actual performance, although I might have caught a break by playing the actual opponent each week rather than the simulated opponent. That being said, I think it’s more important to look at each team’s record versus each other team to get more perspective on the issue.
Team Actual Yankees Sim Yankees Average Yankees Actual Opponent Sim Opponent Average Opponent
Actual Yankees -- 5-4-1 4-6-0 7-3-0 6-3-1 6-1-3
Sim Yankees 4-5-1 -- 3-7-0 6-4-0 7-2-1 8-2-0
Average Yankees 6-4-0 7-3-0 -- 6-4-0 7-3-0 10-0-0
Actual Opponent 3-7-0 4-6-0 4-6-0 -- 3-7-0 4-6-0
Sim Opponent 3-6-1 2-7-1 3-7-0 7-3-0 -- 7-3-0
Average Opponent 1-6-3 2-8-0 0-10-0 6-4-0 3-7-0 --
Each row shows a team’s record (over the 10 weeks) against the team named in each column. Reading down a column instead gives every other team’s record against that team. For example, the third row and fifth column of this chart hold the record of the average Yankee team vs. the simulated opponent.
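A head-to-head entry in the chart is just a week-by-week comparison of two columns from the weekly RBI table. Here is a sketch of that lookup, with the ten weekly rows copied from that table. The team ordering and function name are my own choices.

```python
TEAMS = ["Actual Yankees", "Sim Yankees", "Average Yankees",
         "Actual Opponent", "Sim Opponent", "Average Opponent"]

# One row per week, columns in the same order as TEAMS (from the weekly RBI table)
RBI_BY_WEEK = [
    (26, 32, 34, 35, 38, 27),
    (40, 29, 34, 20, 28, 27),
    (48, 29, 34, 25, 39, 27),
    (28, 24, 34, 35, 24, 27),
    (45, 32, 34, 37, 24, 27),
    (27, 38, 34, 8, 29, 27),
    (27, 22, 34, 43, 17, 27),
    (40, 40, 34, 26, 33, 27),
    (27, 54, 34, 25, 37, 27),
    (31, 32, 34, 14, 31, 27),
]


def head_to_head(row_team, col_team):
    """Return (wins, losses, ties) for row_team against col_team over all weeks."""
    i, j = TEAMS.index(row_team), TEAMS.index(col_team)
    wins = sum(1 for week in RBI_BY_WEEK if week[i] > week[j])
    losses = sum(1 for week in RBI_BY_WEEK if week[i] < week[j])
    ties = sum(1 for week in RBI_BY_WEEK if week[i] == week[j])
    return wins, losses, ties


# Third row, fifth column of the chart: Average Yankees vs. Sim Opponent
print(head_to_head("Average Yankees", "Sim Opponent"))
```

Swapping the two arguments swaps wins and losses, so each cell below the diagonal mirrors the one above it.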
Does all of this data reveal everything? I’m not quite sure. It sure looks like, over the course of a season, a team that has poor totals in a category will generally lose that category in a head-to-head matchup to a team with superior totals. I do see that the actual Yankee team has a slightly better record vs. the field (28-17-5) than the simulation would predict (28-20-2), though not as good as its average totals would (36-14-0). Likewise, the actual opponent has a better record (18-32-0) than its average totals would have predicted (12-35-3), though the simulated opponent did a bit better (21-27-2). So I guess what I’m trying to say is: I’ll take my actual totals every week and not worry about consistency.
I promised a short explanation of why we should not assume that we can just take the average RBI total for our team and our opponents and use them for this analysis. The summary is that every team is different, so technically, I should produce a normal distribution for each team, then run simulations for each team in each week (rather than lumping all of my opponents into one typical opponent). While technically more accurate, this type of analysis would take much more time than I’m willing to invest at this point in time. The data provided convince me that I need not worry about consistency too much when putting together my team. Finally, not only is every team different, but each team can also be very different in a given week. Injuries, days off, and owner changes can influence which players are producing those RBI for a team. One thing I can be certain of is that you should maximize your RBI chances by having viable backups for as many positions as possible in leagues that let you make daily substitutions. If you are concerned with your peripheral statistics (batting average, on-base percentage, and/or slugging percentage), then perhaps you can focus on getting backups with good peripherals and take your chances on the accumulated stats (runs, home runs, RBI, stolen bases).
Chris, it took some time, but I think this analysis is going to be helpful when I put together teams in the future. I hope you agree.