Sunday, September 2, 2007

Does Gezer Field inflate offense?

Let's talk about park effects.

It's well known that different ballparks affect the play of the game in different ways. A given park may make it easier or harder to hit home runs, or doubles, or even to walk or strike out. These influences are known as "park effects" or "park factors".

Take Gezer Field. Ari Alexenberg has identified some of the relevant factors. The 395-foot altitude, in the Shfela foothills, may have some effect on the flight of the ball. More significant are the short distances down the right and left field lines (280 and 316 feet respectively), which turn fly balls into home runs, and the sharp upward slopes near the outfield fences, which turn routine pop-ups into wacky plays. Not to mention the relatively narrow foul territory and the light pole in the middle of right field.

The question is how to quantify these effects.


Measuring park effects
On first thought, it might sound simple: just calculate the stats for all games played at a given field, and compare them with the overall league averages. If the games at that field yield more runs (for example) than the average, it's a hitter's park.

The problem with this approach is obvious. Take, for example, Yankee Stadium. Let's say that in the average game at Yankee Stadium 6 runs are scored, compared with a league average of 4. Does that mean the park is responsible for the 50% increase in run production? Not necessarily. It could just be that the Yankees, who played in every game at Yankee Stadium, have a powerful offense and regularly outscore the league average, whether at home or on the road.

To get a meaningful estimate of park effects, you need to play the same set of games - between the same pairs of teams - both at the stadium you're measuring, and again at the average of all the league's stadiums. That precise experiment is not quite possible, but we can do something similar: Compare all the Yankees' home games with all their road games. Over the course of a season, the Yankees play the same teams at home as on the road, but they play their road games at a mix of all the league's stadiums. Comparing their home games with their road games is the closest we can come to playing the same list of games both at Yankee Stadium and at the average stadium.
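Here's a minimal sketch of that home/road comparison. The function name and the numbers are made up for illustration (this is not real Yankees data): it divides the runs per game scored by both teams in a club's home games by the same figure for its road games.

```python
def raw_park_factor(home_runs, home_games, road_runs, road_games):
    """Runs per game (both teams) at home divided by runs per game on the road."""
    if home_games == 0 or road_games == 0:
        raise ValueError("need at least one home game and one road game")
    return (home_runs / home_games) / (road_runs / road_games)

# Hypothetical season: 810 total runs in 81 home games, 729 in 81 road games.
factor = raw_park_factor(810, 81, 729, 81)
print(round(factor, 3))  # 10.0 / 9.0 = 1.111
```

A result above 1 suggests a hitter's park, below 1 a pitcher's park, subject to all the caveats about luck and scheduling.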

For the major leagues, this is the standard way to measure park effects. (Recent figures for MLB are available from ESPN.)

Clearly, there are some flaws in this approach. For one, luck plays a role. Even over an entire major league season, sheer luck would produce variations in runs scored in different stadiums, even when comparing the same teams. Second, a team's road games do not entirely reflect the league average stadium, since they exclude the team's own home park. Finally, if a team's schedule does not consist of an even mixture of all the league's other teams, its road games will again not reflect the play level of the average league stadium, but rather a weighted average of the park effects of the venues it played in.

Some baseball researchers have adjusted the conventional park effect calculation to account for these problems. To mitigate the effect of luck, several seasons' worth of data can be averaged. Corrections can be applied to adjust for the fact that road games don't include a team's home field. With substantial effort (better brush up your math), it's even possible to adjust for unbalanced schedules.
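As an illustration of the home-field correction, here is one simple form of it, under assumptions that are mine: a balanced schedule, and park effects that multiply run scoring and average out to 1.0 across the league's n parks. If a park's true factor is p, the other n - 1 parks must average (n - p) / (n - 1), so the raw home/road ratio is r = p(n - 1) / (n - p). Solving for p gives p = r * n / (n - 1 + r). The function name is hypothetical.

```python
def corrected_park_factor(raw_ratio, n_parks):
    """Rescale a home/road ratio so it is relative to the average of all parks.

    Assumes a balanced schedule and multiplicative park effects that
    average to 1.0 over the league's n_parks parks.
    """
    if n_parks < 2:
        raise ValueError("need at least two parks")
    return raw_ratio * n_parks / (n_parks - 1 + raw_ratio)

# A raw ratio of 1.20 shrinks toward 1.0; the fewer the parks, the more it shrinks.
print(round(corrected_park_factor(1.20, 30), 3))  # 36 / 30.2 = 1.192
print(round(corrected_park_factor(1.20, 3), 3))   # 3.6 / 3.2 = 1.125
```

In a 30-park league the adjustment is tiny. In a league with only three parks it is substantial, because each park makes up a third of the "average" it is being compared against.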


And you thought that was difficult?
So how can we estimate the park effects of Gezer Field (or Sportek or Yarkon, for that matter)?

Unfortunately, all the problems with estimating major league park effects go double or triple for the IBL. A season of roughly 41 games per team offers a much smaller sample size than a 162-game MLB schedule, increasing the uncertainty of any statistic. With only three fields, it is even more difficult to compare performance with the "average" field, since each field itself contributes about a third of that average. Comparing home and road games essentially compares each field with the average of the other two, not with the league average.

And the IBL schedule was extraordinarily unbalanced, both in terms of the number of games between pairs of teams and, even more so, in terms of the distribution of those games by venue.

Here, to be precise, is the breakdown of IBL games by venue. Teams are abbreviated to the first three letters of their names. The triplets of numbers represent games played at Gezer / Sportek / Yarkon, respectively. The pairs of numbers represent games played at each team's home field and games played elsewhere.

IBL games by venue (Gezer / Sportek / Yarkon)

Team       Bet       Mod       Net       Pet       Raa       Tel
Bet        --        8/0/0     5/2/1     4/1/3     3/0/5     4/4/1
Mod        8/0/0     --        3/2/2     4/0/4     6/0/2     6/4/0
Net        5/2/1     3/2/2     --        0/2/5     1/2/7     2/5/1
Pet        4/1/3     4/0/4     0/2/5     --        1/3/5     0/3/5
Raa        3/0/5     6/0/2     1/2/7     1/3/5     --        0/3/3
Tel        4/4/1     6/4/0     2/5/1     0/3/5     0/3/3     --

Total      24/7/10   27/6/8    11/13/16  9/9/22    11/8/22   12/19/10

Home/Away  24/17     27/14     13/27     22/18     22/19     19/22

League total: 47/31/44

Note that no IBL team played more than 27 games at its supposed home field, and Netanya played just 13 at "home" at Sportek, fewer than the 16 it played at Yarkon.
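For anyone who wants to check the arithmetic, here is a short sketch that rebuilds the per-team and league totals from the pairwise counts. The counts are copied from the table; the function names are just mine.

```python
# Games between each pair of teams as (Gezer, Sportek, Yarkon), from the table.
GAMES = {
    ("Bet", "Mod"): (8, 0, 0), ("Bet", "Net"): (5, 2, 1), ("Bet", "Pet"): (4, 1, 3),
    ("Bet", "Raa"): (3, 0, 5), ("Bet", "Tel"): (4, 4, 1), ("Mod", "Net"): (3, 2, 2),
    ("Mod", "Pet"): (4, 0, 4), ("Mod", "Raa"): (6, 0, 2), ("Mod", "Tel"): (6, 4, 0),
    ("Net", "Pet"): (0, 2, 5), ("Net", "Raa"): (1, 2, 7), ("Net", "Tel"): (2, 5, 1),
    ("Pet", "Raa"): (1, 3, 5), ("Pet", "Tel"): (0, 3, 5), ("Raa", "Tel"): (0, 3, 3),
}

def team_totals(games):
    """Sum each team's games by venue over all of its opponents."""
    totals = {}
    for (a, b), counts in games.items():
        for team in (a, b):
            prev = totals.get(team, (0, 0, 0))
            totals[team] = tuple(p + c for p, c in zip(prev, counts))
    return totals

def league_totals(games):
    """Each pairing is listed once, so a plain column sum gives games per venue."""
    return tuple(sum(col) for col in zip(*games.values()))

totals = team_totals(GAMES)
print(totals["Net"])         # (11, 13, 16)
print(league_totals(GAMES))  # (47, 31, 44)
```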


Double whammy
Finally, what about the fact that the IBL has six teams playing in just three parks? On the one hand, this can actually make it easier to estimate park effects. Approximately twice as many games were played at each field as would have been for the same length season with six fields, improving our sample size. Also, since two teams share each home field, each individual team has less of a biasing effect on the field's statistics.

On the other hand, some of Bet Shemesh's "road" games were actually played at its home field against Modiin, which shared it. Bet Shemesh never played Modiin elsewhere, so it's not possible to compare those games against the same games played in another location.


Ways forward
Given these complications, I doubt there's a single best way to assess IBL park effects. I can think of a few approaches worth trying. I hope to check them out and compare the results.

At the moment, I'm hoping to try:
  • Comparing all games played at each venue with the league averages

  • Comparing all games played at a given venue for which the same two teams also played at a different venue

  • As above, but weighted by the number of games played at each venue to try to achieve a more balanced schedule
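To make the three ideas concrete, here is a rough sketch run on an invented game log. The teams (A to D), venues (X, Y, Z) and run totals are all placeholders, not IBL data, and the weighting in the third function is just one possible reading of "weighted by the number of games": each matchup counts according to the smaller of its game counts at the venue and elsewhere.

```python
from collections import defaultdict

# Invented log for illustration only: (venue, team_a, team_b, total_runs).
LOG = [
    ("X", "A", "C", 14), ("X", "A", "C", 10), ("Y", "A", "C", 8),
    ("X", "A", "B", 12),                      # this pair never met elsewhere
    ("Z", "C", "D", 6), ("X", "C", "D", 9), ("Y", "C", "D", 7),
]

def runs_per_game(games):
    if not games:
        raise ValueError("no games to average")
    return sum(g[3] for g in games) / len(games)

def venue_vs_league(log, venue):
    """Approach 1: runs per game at the venue over runs per game league-wide."""
    return runs_per_game([g for g in log if g[0] == venue]) / runs_per_game(log)

def split_by_matchup(log, venue):
    """Group games by team pair into (games at venue, games elsewhere)."""
    groups = defaultdict(lambda: ([], []))
    for g in log:
        groups[frozenset(g[1:3])][0 if g[0] == venue else 1].append(g)
    # Keep only pairs that met both at the venue and somewhere else.
    return {p: (h, e) for p, (h, e) in groups.items() if h and e}

def matched_ratio(log, venue):
    """Approach 2: pool all games of matchups seen both here and elsewhere."""
    matched = split_by_matchup(log, venue)
    here = [g for h, _ in matched.values() for g in h]
    elsewhere = [g for _, e in matched.values() for g in e]
    return runs_per_game(here) / runs_per_game(elsewhere)

def weighted_matched_ratio(log, venue):
    """Approach 3 (one reading): weight each matchup's own rates by the
    smaller of its game counts at the venue and elsewhere."""
    matched = split_by_matchup(log, venue)
    if not matched:
        raise ValueError("no matchup was played both here and elsewhere")
    weights = {p: min(len(h), len(e)) for p, (h, e) in matched.items()}
    total = sum(weights.values())
    here = sum(w * runs_per_game(matched[p][0]) for p, w in weights.items()) / total
    away = sum(w * runs_per_game(matched[p][1]) for p, w in weights.items()) / total
    return here / away

print(round(venue_vs_league(LOG, "X"), 3))         # 11.25 / 9.429 = 1.193
print(round(matched_ratio(LOG, "X"), 3))           # 11 / 7 = 1.571
print(round(weighted_matched_ratio(LOG, "X"), 3))  # 10.5 / 7.25 = 1.448
```

Even on this toy log the three methods disagree noticeably, which is exactly why it seems worth comparing them on the real data.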


Let me know if you have any better suggestions.