Tuesday, November 27, 2007

The Gezer conundrum, again

My anonymous commenter is trying to understand the park effects at Gezer. Actually, so am I.

The problem in a nutshell is how to distinguish between the skill levels of the home teams at Gezer and the effects of the park itself. Gezer was home to Bet Shemesh and Modiin, the league's two biggest slugging teams. If you look at the home run totals at Gezer versus the other fields, you'll find a tremendous gap:

Teams at Gezer hit over 2.8 times as many home runs per game as teams playing at Yarkon, and about 2.7 times as many home runs per fly ball. Compared to Sportek, the ratios are 2.4 and 2.2. Overall, 117 of the IBL's 187 home runs, or 63%, were hit at Gezer, where just 39% of the games were played.
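As a rough cross-check on those totals, here is a small Python sketch that uses only the league-wide figures quoted above (117 of 187 home runs, 39% of games). The function name is my own invention, and since the 39% is a rounded figure the result is only approximate.

```python
def rate_ratio(events_here, events_total, game_share_here):
    """Per-game event rate at one park divided by the rate at all other parks.

    game_share_here is the fraction of league games played at the park.
    The total number of games cancels out, so only the share is needed.
    """
    events_elsewhere = events_total - events_here
    rate_here = events_here / game_share_here
    rate_elsewhere = events_elsewhere / (1.0 - game_share_here)
    return rate_here / rate_elsewhere


hr_share = 117 / 187                       # share of all home runs hit at Gezer
gezer_vs_rest = rate_ratio(117, 187, 0.39)

print(f"share of home runs at Gezer: {hr_share:.1%}")
print(f"home runs per game, Gezer vs. everywhere else: {gezer_vs_rest:.2f}x")
```

The ratio comes out at about 2.6, which sits between the 2.4 (Sportek) and 2.8 (Yarkon) per-game figures, as it should if those two fields are pooled.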

But the performance gap narrows substantially when we look at broader measures of offense, not just home runs:

Batters at Gezer actually reached base less often than those at Sportek, and not a whole lot more than those at Yarkon. The slugging gap is substantial, but not nearly as wide as the home run gap. This may reflect the quality of the Bet Shemesh and Modiin pitching staffs, which were among the league's best.

If we count times reached base on errors as hits - which for all intents and purposes they are - the gap narrows further:

Remember that error rates were highest at Yarkon and Sportek. Counting errors, it turns out that on-base rates were pretty similar across the fields, with Sportek leading. In slugging, which is less important to run scoring than getting on base, Gezer led Sportek by just 60 points (or 13%) and Yarkon by 110 (28%).

Translated into run scoring, in runs per game, per plate appearance and per 27 outs:

That's right. At Gezer, the average game scored just 12% more runs than at Yarkon and 10% more than at Sportek. Per plate appearance, that's 14% more than Yarkon and 8% more than Sportek; per out, 15% more than Yarkon and 7% more than Sportek.

If you followed my recent post about how runs are scored, you'll understand why. Getting on base is much more important than slugging. And there are plenty of ways to score other than home runs.

What about the park factor?

But that 12-15% run boost at Gezer is not Gezer's park factor for runs. How much of the run increase was due to the field at Gezer, and how much due to the high-slugging teams that played there?

To find that out, you have to compare how the same set of teams played at Gezer versus away from Gezer. That's what I ultimately did in this post, where I took all the teams that played each other at least twice both at Gezer and elsewhere (and likewise for the other parks). This gives us a close approximation of how the different parks affect the same player matchups.
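For readers who want to try this kind of matched comparison themselves, here is a minimal Python sketch. It is not the exact calculation from that post: the function name, the tuple layout and the toy game list are all hypothetical, and I am assuming that qualifying games are simply pooled (total runs over total games) rather than weighted matchup by matchup.

```python
from collections import defaultdict


def matched_run_ratio(games, park, min_games=2):
    """Runs per game at `park` divided by runs per game elsewhere,
    counting only team matchups seen at least `min_games` times
    both at the park and away from it.

    games: iterable of (park, team_a, team_b, total_runs) tuples.
    Returns None if no matchup qualifies or no runs were scored elsewhere.
    """
    here = defaultdict(list)
    away = defaultdict(list)
    for game_park, team_a, team_b, runs in games:
        matchup = frozenset((team_a, team_b))   # ignore home/visitor order
        bucket = here if game_park == park else away
        bucket[matchup].append(runs)

    qualifying = [
        m for m in here
        if len(here[m]) >= min_games and len(away.get(m, [])) >= min_games
    ]
    if not qualifying:
        return None

    runs_here = sum(sum(here[m]) for m in qualifying)
    games_here = sum(len(here[m]) for m in qualifying)
    runs_away = sum(sum(away[m]) for m in qualifying)
    games_away = sum(len(away[m]) for m in qualifying)
    if runs_away == 0:
        return None
    return (runs_here / games_here) / (runs_away / games_away)


# Made-up games between made-up teams A, B and C, purely for illustration.
games = [
    ("Gezer", "A", "B", 12), ("Gezer", "B", "A", 10),
    ("Yarkon", "A", "B", 9), ("Sportek", "A", "B", 11),
    ("Gezer", "A", "C", 20), ("Yarkon", "A", "C", 5),   # too few games, dropped
]
ratio = matched_run_ratio(games, "Gezer")
print(f"{ratio:.2f}")   # 1.10
```

Note how the lopsided A-C games are excluded because that matchup met only once in each setting. Pooling is just one reasonable choice; averaging the per-matchup ratios would weight things differently.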

And that's where I discovered that though Gezer produced a home run boost of 76% over Sportek and 176% over Yarkon, overall run production for the same team matchups was just 4.4% higher than at Yarkon, and was actually 2.5% lower than at Sportek.

Now, these figures may be substantially inaccurate. The sample size is very small, with just 122 games distributed among six teams and three fields. The "pros" estimate major-league park effects over at least three full seasons of 162-game play. All sorts of noise could be skewing these results: a few unrepresentative games, or an untimely injury, or the distribution of pitchers in the games being compared.

But it seems clear that most of the 12-15% difference in run production among the three venues (as opposed to home run hitting) can be attributed to the offensive power of the teams that played in them.

This is consistent with the per-team run production averages:

Look at Bet Shemesh and Modiin, which shared Gezer; Netanya and Tel Aviv, which shared Sportek; and Petach Tikva and Ra'anana, which shared Yarkon. Most of the apparent park factors for run production are in fact due to differences in team offensive ability.

The upshot

What does this mean for comparing player performance? It means that park factors have their main impact on individual components of performance, such as home runs or strikeout rates. When comparing those components among players, we have to pay close attention to park effects. But when comparing overall run production, we can be sloppier, since the park differences are not great.

For precise comparisons, we should adjust each player's run production estimates on a per-park basis, weighting by the parks in which he actually played, and I hope to post park-adjusted tables for batting leaders soon. But whatever corrections are necessary will not change the overall offensive domination by the Bet Shemesh sluggers.

One last comment. The two run production estimators I'm currently using, Base Runs and custom IBL linear weights, are calibrated to match overall IBL run scoring, and both are also quite accurate at estimating overall run production at Gezer. But both show similar biases for the other two fields, overestimating production at Sportek by about 3.7% and underestimating production at Yarkon by some 3%. This could be pure chance, if teams overall scored about 12 more runs than should be expected at Yarkon and about 12 fewer at Sportek. But it might indicate that the formulas aren't quite capturing all the aspects of run production at the two fields.

Perhaps run estimates based on these formulas should be scaled down by roughly 3.7% at Sportek and up by roughly 3% at Yarkon to calibrate them to the actual results at those fields.
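A minimal Python sketch of what such a rescaling might look like follows. The names are hypothetical, and two assumptions are baked in: that the percentages quoted above are measured relative to actual runs, and that the bias at Gezer can be treated as exactly zero, which "quite accurate" only loosely supports.

```python
# Relative bias of the estimators by park: (estimated - actual) / actual.
# Positive means the formulas overestimate. The Gezer value of 0.0 is an
# assumption; the Sportek and Yarkon values are the ones quoted in the post.
PARK_BIAS = {"Gezer": 0.0, "Sportek": 0.037, "Yarkon": -0.03}


def calibration_factor(bias):
    """Multiplier that removes a known relative bias from an estimate."""
    return 1.0 / (1.0 + bias)


def calibrate(estimated_runs, park):
    """Rescale a run estimate so it lines up with actual scoring at `park`."""
    return estimated_runs * calibration_factor(PARK_BIAS[park])


for park in PARK_BIAS:
    print(f"{park}: multiply estimates by {calibration_factor(PARK_BIAS[park]):.3f}")
```

Under these assumptions the multipliers work out to about 0.964 for Sportek and 1.031 for Yarkon. If the discrepancies really are just chance, applying them would only bake noise into the estimates.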

