Monday, November 12, 2007

Who were the IBL's best hitters? (Part I)

All this effort in tabulating reaches on error has been directed towards the goal of assessing player offensive performance. Having long ago determined that you can't analyze the IBL without accounting for errors, I needed to attribute the errors to batters - data that is missing from the IBL summary stats.

Now that I've done that, I can apply run estimators on a player-by-player basis to rank their offensive performance.

I won't rehash here the discussion of different run estimation methods. A good summary can be found here, by Justin Inaz.

I'll be looking at two run estimators, Base Runs and Linear Weights, and discussing how I chose the IBL-appropriate coefficients for them.

You may remember that I used Base Runs once before, in estimating the IBL's per-team performance. Arguably, Base Runs is not a suitable approach for assessing individual offensive players, since its formula applies the player's own on-base ability to his own base-advancement skills, as if he were playing on an entire team of players with his stats. This would yield overestimates for exceptionally good players, and underestimates for exceptionally bad ones.

Nevertheless, I've applied Base Runs for individual players to see what came out.

In addition, I applied Linear Weights. This family of techniques assigns a fixed multiplier to each type of offensive event in the game. To calculate a player's value, you just add up the values of all his stats. The multipliers are meant to be estimates of the average number of runs each type of event is worth in the league.

Thus, a single gets a certain run value, as does a home run, or an out, or a stolen base - and the same fixed value is applied to all of the player's offensive production, even if we know (for example) that a certain home run was a grand slam, while a certain two-out single left him stranded at the end of the inning. We don't care; we just tote up the average run values and call that his estimated run production. The advantage is that it absolves the player of any responsibility for the performance of his teammates, which may be exactly what we want when comparing hitters across a league.
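To make the bookkeeping concrete, here is a minimal sketch of a Linear Weights tally. The weights and the stat line below are made-up placeholders chosen only to show the mechanics; they are not Tango's weights and not IBL-specific values, and the names `ILLUSTRATIVE_WEIGHTS` and `linear_weights_runs` are hypothetical.

```python
# Hypothetical run values per event, for illustration only.
# These are NOT Tango's weights and NOT IBL-derived values.
ILLUSTRATIVE_WEIGHTS = {
    "1B": 0.5,
    "2B": 0.8,
    "3B": 1.1,
    "HR": 1.4,
    "BB": 0.3,
    "SB": 0.2,
    "OUT": -0.1,
}


def linear_weights_runs(stats, weights=ILLUSTRATIVE_WEIGHTS):
    """Sum each event count times its fixed run value."""
    return sum(weights[event] * count for event, count in stats.items())


# Made-up stat line for a hypothetical batter.
batter = {"1B": 30, "2B": 8, "3B": 1, "HR": 5, "BB": 12, "SB": 4, "OUT": 90}
estimated_runs = linear_weights_runs(batter)
print(f"Estimated runs: {estimated_runs:.1f}")
```

Every home run counts the same 1.4 here, grand slam or not, which is the context-blindness described above.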

1. Base Runs

I took Tom Tango's weights for the Base Runs equations, with a few modifications. In the A component (runners on base), I added reaches on error (not all errors, just reaches) and catcher interference. In the B component (base advancement), I scaled up Tango's coefficient for errors (0.799) so that each reach on error also carries the league-average share of other fielding errors (those on which the batter did not reach base). That is, instead of 0.799*E I used 1.220*ROE, since I have ROE per batter but I have no data on runner advancement on errors. Finally, in the C component (outs), I subtracted ROE, since a batter reaching base on error is not out.
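As a rough sketch of how those modifications slot into the formula: Base Runs has the general shape A*B/(B+C) + D, with D being home runs. The code below is a simplified assumption rather than the exact equations used here. The A and C terms keep only their core pieces plus the ROE and catcher-interference adjustments described above, and the B weights other than the 1.220*ROE term are hypothetical placeholders (`PLACEHOLDER_B_WEIGHTS`), not Tango's coefficients. The stat line is made up.

```python
def base_runs(a, b, c, d):
    """Base Runs: baserunners (A) times the share that score, plus D."""
    return a * b / (b + c) + d


def components(s, b_weights, roe_b_weight=1.220):
    """Build simplified A, B, C, D components from a stat line `s`."""
    # A: baserunners, with reaches on error and catcher interference added.
    a = s["H"] + s["BB"] + s["HBP"] - s["HR"] + s["ROE"] + s["CI"]
    # B: advancement; placeholder weights plus the 1.220 * ROE term.
    b = sum(w * s[k] for k, w in b_weights.items()) + roe_b_weight * s["ROE"]
    # C: outs; a batter who reached on an error was not put out.
    c = s["AB"] - s["H"] - s["ROE"]
    # D: home runs always score.
    d = s["HR"]
    return a, b, c, d


# Hypothetical advancement weights, for illustration only.
PLACEHOLDER_B_WEIGHTS = {"TB": 1.0, "BB": 0.1}

# Made-up stat line.
line = {"AB": 120, "H": 40, "TB": 65, "HR": 5,
        "BB": 10, "HBP": 2, "ROE": 3, "CI": 0}

a, b, c, d = components(line, PLACEHOLDER_B_WEIGHTS)
runs = base_runs(a, b, c, d)
print(f"A={a}, B={b:.2f}, C={c}, D={d}, Base Runs={runs:.1f}")
```

Because a reach on error counts as an at-bat without a hit, AB - H would otherwise treat it as an out, which is why ROE comes off the C component.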

Applying this formula to the league totals, I get an estimated 1230.4 runs produced, about 3.6% lower than the actual value of 1276 - pretty good, since I did nothing to customize the coefficients for the IBL. (I'm still working out how to do that, now that you mention it.)

So here they are, the top 25 hitters in the 2007 Israel Baseball League, according to the Base Runs estimator. (Why 25? Feeling generous, I guess. It also coincides with all the players with at least 20 estimated runs produced.)

There's Gregg Raymundo, way ahead of the pack, presumably due largely to his absurdly high on-base percentage. Jason Rees, whom I recently dissed in comparison to Eladio Rodriguez, places second, followed closely by teammate Johnny Lopez. (Yes, the first three are all from Bet Shemesh.)

Eladio comes in fifth, but keep in mind that this is a cumulative statistic, so playing time matters. Had Eladio not been out with injury, he would presumably have surpassed Lopez and Rees (compare Eladio's 39.0 Base Runs in 118 plate appearances with Rees's 43.7 in 154).
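The playing-time comparison can be checked with a quick per-plate-appearance calculation. This is a naive linear proration using only the two figures quoted above; Base Runs is not linear in playing time, so the prorated number is a ballpark, not a projection.

```python
# (Base Runs, plate appearances) as quoted in the text.
players = {
    "Eladio Rodriguez": (39.0, 118),
    "Jason Rees": (43.7, 154),
}

# Base Runs per plate appearance for each player.
rates = {name: runs / pa for name, (runs, pa) in players.items()}

# Eladio's rate stretched naively over Rees's 154 plate appearances.
rees_pa = players["Jason Rees"][1]
eladio_prorated = rates["Eladio Rodriguez"] * rees_pa

for name, rate in rates.items():
    print(f"{name}: {rate:.3f} Base Runs per PA")
print(f"Eladio over {rees_pa} PA: {eladio_prorated:.1f}")
```

That works out to roughly 0.331 per plate appearance for Eladio against 0.284 for Rees, or about 50.9 Base Runs for Eladio at Rees's playing time.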

Bet Shemesh grabs seven of the top 17 positions, dominating the leaders table as much as they dominated the diamond.

Bear in mind, though, that I'm using unadjusted stats here - Gezer's park factors presumably give the Blue Sox a bit of a boost. Though it doesn't seem to have done much for their home field partners, Modi'in, with just four slots in the top 25.

Enough about Base Runs. Let's have some Linear Weights.

2. Linear Weights

But which weights to use?

To start with, I took Tom Tango's weights (see the lwts_RC column here). They're derived from MLB data from 1974-1990, so there's no reason to assume they'd be suitable for the IBL.

But they're not bad, either. Applied to the league totals, they estimate 1237.2 runs: about 3.0% lower than the actual 1276, but a bit closer than the Base Runs estimate of 1230.4.
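For the record, here is the arithmetic behind the two league-level comparisons, using only the totals quoted in this post. The helper name `shortfall_pct` is just an illustrative choice.

```python
ACTUAL_RUNS = 1276.0

# League-total estimates quoted in the text.
estimates = {
    "Base Runs": 1230.4,
    "Linear Weights (Tango)": 1237.2,
}


def shortfall_pct(estimate, actual=ACTUAL_RUNS):
    """Percentage by which an estimate falls short of the actual total."""
    return 100.0 * (actual - estimate) / actual


shortfalls = {name: shortfall_pct(est) for name, est in estimates.items()}
for name, pct in shortfalls.items():
    print(f"{name}: {pct:.1f}% below actual")
```

Both estimators undershoot the actual total, Base Runs by about 3.6% and Tango's Linear Weights by about 3.0%.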

Applying them to the players, we get:

Not that different, actually. Again, the top 25 players are those with over 20 estimated runs produced. The exact same players are in both lists, with 12 of them at the exact same rankings as with Base Runs. A few of them are mixed around a bit - Eladio edges out Josh Doane, for example - but the only one with a significant change in position is David Kramer, who drops from 14th to 21st.

Arguably the most notable change between the two tables is in leader Gregg Raymundo, whose estimated run production drops from 59.88 using Base Runs to 49.55 using Linear Weights. This presumably demonstrates the problem with using Base Runs for individual player estimates of outstanding hitters - it's as if he played on a whole team of Raymundos, whereas Linear Weights assumes he played with average players.
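The "team of Raymundos" effect can be illustrated with a toy calculation. The sketch below uses the general Base Runs shape A*B/(B+C) + D and entirely made-up component values for a star hitter and an average lineup slot (`STAR` and `AVERAGE` are hypothetical, not Raymundo's or the league's numbers). It compares the star's share of a lineup of nine clones with the runs he adds to eight average hitters.

```python
def base_runs(a, b, c, d):
    """Base Runs: baserunners (A) times the share that score, plus D."""
    return a * b / (b + c) + d


def scale(components, n):
    """Components for n identical hitters."""
    return tuple(n * x for x in components)


def combine(x, y):
    """Components for two groups of hitters pooled into one lineup."""
    return tuple(p + q for p, q in zip(x, y))


# Made-up (A, B, C, D) per lineup slot, for illustration only.
AVERAGE = (30.0, 25.0, 80.0, 3.0)
STAR = (50.0, 45.0, 70.0, 6.0)

# Star's share of a lineup made of nine copies of himself.
clone_share = base_runs(*scale(STAR, 9)) / 9

# Runs the star adds when he joins eight average hitters.
eight_average = scale(AVERAGE, 8)
in_context = base_runs(*combine(eight_average, STAR)) - base_runs(*eight_average)

print(f"Share of a clone lineup:        {clone_share:.1f}")
print(f"Added to eight average hitters: {in_context:.1f}")
```

With these invented numbers the clone-lineup figure comes out higher than the in-context figure, which is the direction of the gap between Raymundo's two estimates. The size of the gap depends entirely on the inputs chosen.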

But I still can't take these Linear Weights numbers seriously, knowing they were generated using weights derived from Major League Baseball in the seventies and eighties. I have no choice but to generate my own weights.

Tune in next time for IBL-specific Linear Weights estimates.
