Wednesday, September 5, 2007

Surprise! Park factors are not what you expected!

I'd like to take a first stab at estimating IBL park factors.

For this first attempt, I'm going to average the performance of all six teams at each of the three fields.

I know I already explained why that's somewhat problematic: it overweights the home teams at each park, so that (for example) Gezer, home to the IBL's strongest batters (Bet Shemesh and Modiin) would have its park effects exaggerated in favor of batting, while Yarkon, home to the league's weakest teams (Petach Tikva and Raanana) would look too much like a pitcher's park.

To correct for this problem, I've decided to weight each team's record equally at each of the fields. That is, rather than tote up all the games played at each field and average the number of runs, hits, etc., I computed each team's individual performance at each field, and then averaged the team results weighting each team equally.

So even though Bet Shemesh played 24 games at Gezer and Modiin played 27, while Petach Tikva played only 9 there, I've treated them as equal for the purpose of averaging Gezer's performance levels. The same, of course, goes for Sportek and Yarkon.
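To make the difference between the two averaging methods concrete, here is a minimal sketch. The data is made up, and the function names (`pooled_rate`, `team_weighted_rate`) are my own labels for illustration, not anything from the IBL box scores.

```python
from collections import defaultdict

def pooled_rate(games):
    """Runs per game, pooling every game at the park.

    Teams that play there most often (the home teams) dominate the result.
    """
    total_runs = sum(runs for _, runs in games)
    return total_runs / len(games)

def team_weighted_rate(games):
    """Runs per game, computed per team first, then averaged with equal weight."""
    runs_by_team = defaultdict(list)
    for team, runs in games:
        runs_by_team[team].append(runs)
    team_rates = [sum(r) / len(r) for r in runs_by_team.values()]
    return sum(team_rates) / len(team_rates)

# Hypothetical park: a strong home team plays 6 games, a weak visitor plays 2.
games = [("Home", 8)] * 6 + [("Visitor", 2)] * 2

print(pooled_rate(games))         # 6.5
print(team_weighted_rate(games))  # 5.0
```

The pooled figure is pulled toward the home team's scoring, while the team-weighted figure treats the two teams as equals, which is the correction described above.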

This approach doesn't eliminate all forms of bias in the computation. For example, the home teams are still overrepresented in the average level of fielding faced at each park. But nothing we do will realistically eliminate all forms of bias, and I consider this approach to be a reasonable start.

The envelope, please:

Now, for the analysis. We've already seen the home run effects. Gezer produced 32% more home runs per game than the average field, while Yarkon produced 37% fewer. Overall, home runs were hit more than twice as often at Gezer as at Yarkon (relative factor: 2.08), and the difference from Sportek was nearly as large (1.82). Presumably, this is mainly due to the distances to the fences. Most of those Gezer home runs became fly outs at Yarkon.
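As a rough sketch of how a relative factor falls out of the per-park factors: the helper names below are hypothetical, and I'm assuming "the average field" means the simple mean of the three parks' rates. The inputs 1.32 and 0.63 are just the rounded percentages quoted above.

```python
def park_factor(rate_at_park, rates_at_all_parks):
    """Park's rate relative to the simple mean of all parks (an assumption)."""
    mean_rate = sum(rates_at_all_parks) / len(rates_at_all_parks)
    return rate_at_park / mean_rate

def relative_factor(factor_a, factor_b):
    """How many times more often an event occurs at park A than at park B."""
    return factor_a / factor_b

gezer_hr = 1.32   # 32% more home runs than the average field
yarkon_hr = 0.63  # 37% fewer home runs than the average field

print(round(relative_factor(gezer_hr, yarkon_hr), 2))  # 2.1
```

The rounded inputs give roughly 2.1; the 2.08 figure presumably reflects the unrounded rates.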

So Gezer was a big hitters' park, right?

Not so fast.

What about triples?

Triples aren't that common to begin with. Only 19 were hit in the IBL season. Of those, 13 were at Yarkon, 6 at Sportek and 0 at Gezer. Shorter fences, not enough room to hit a triple.

How about base hits? Not much to say about them. Gezer was smack in the middle here, with a range of just 13% in hit rates among the three fields - probably too narrow to be meaningfully distinguished, given the sample size. Gezer was above average for doubles, with Yarkon below average, but again the range was not that great - 19%. Total bases from all hits - including those frequent Gezer homers - were just 7% above average at Gezer, with Yarkon 11% below average, a range of 21%.

Gezer also yielded more strikeouts and fewer walks than average. Overall, after home runs, the most telling column in the chart, and the most relevant to winning or losing games, is the one labelled R. Average runs per game, when weighting all six teams equally, varied within a range of 5% among the three parks. And Gezer was by no means the leader - it was just below average!

What can we learn from this? First, as usual, the data doesn't always support our intuitions about baseball. Second, a lot more goes into a baseball game than home run hitting. Third, if we were to assess IBL batters by adjusting their performances based on park factors, we'd see changes in home runs and triples, but batting and slugging averages wouldn't be affected much.

To confirm that suspicion, I calculated batting and slugging averages on the same basis as the park factors - averaging team results per field on an equal basis. (I don't have enough data tabulated yet to analyze on-base percentages.)

IBL 2007 batting and slugging averages by field, weighting teams equally at each field

So how much does Gezer field inflate offense? Not as much as I expected, at least.


Soccer Dad said...

You posted batting avg and SLG, what about OBA?

And you call yourself sabermetric!


iblemetrician said...

Soccer Dad,

I assume you mean on-base percentage. You're right, of course, that I should have posted it. But unfortunately I don't yet have the data collected to calculate it.

I still need to extract hit-by-pitches and sacrifice flies from the IBL box scores. They're encoded a bit differently in the box scores than the standard batter data (see here, for example).

Without them, I can still approximate OBP using only hits, walks and at-bats, but I'd rather wait until I have the real thing.
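For reference, here is a small sketch of the standard on-base percentage formula next to the approximation that leaves out hit-by-pitches and sacrifice flies. The function names and the sample numbers are made up for illustration.

```python
def obp(h, bb, hbp, ab, sf):
    """Standard on-base percentage: times on base over qualifying plate appearances."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def approx_obp(h, bb, ab):
    """Approximation using only hits, walks and at-bats (no HBP or SF)."""
    return (h + bb) / (ab + bb)

# Hypothetical batter: 30 hits, 10 walks, 2 hit-by-pitches, 100 at-bats, 1 sac fly.
print(round(approx_obp(30, 10, 100), 3))   # 0.364
print(round(obp(30, 10, 2, 100, 1), 3))    # 0.372
```

For a batter like this one the gap is small, but it grows for anyone who gets hit by pitches often.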