The previous post on estimating the level of play in the IBL generated some interesting comments, including on the Baseball Fever Sabermetrics Forum and Tom Tango's blog. Also, Rabbi Jason Miller noticed my citation of his game observations, and commented.
I'd like to respond to the comments, and add some more observations of my own.
Why errors and steals?
Tango is surprised that error rates and stolen base rates correlate at all with the level of the league. After all, the reason batting averages, or walk and strikeout rates, don't track the league level is that they are the result of the confrontation between the batter and the pitcher/fielders. Better leagues have better hitters, but also better pitchers and fielders. On the whole, they balance each other out, so the majors don't have higher batting averages or walk rates than weaker leagues. Sometimes pitching overpowers hitting or vice versa, but there's no systematic connection between the relative strength of hitters and pitchers/fielders and the overall level of league play.
You might expect the same to apply to errors and stolen bases. An error is not just the fault of the fielder. Some batters consistently reach base on error far more often than other batters, presumably because they're hitting more hard-to-field balls. Shouldn't that balance out the stronger fielding in the stronger leagues?
A stolen base is certainly not the sole fault of the fielding team; arguably, it's first and foremost a skill of the baserunner. So why should weaker leagues have higher steal rates? Don't they have less skilled runners?
On the one hand, the graphs speak for themselves. The correlations between league level and error rates per at-bat (0.93) and stolen base rates per runner on base (0.85) are stunningly strong. If you leave out the inconsistent rookie leagues, they're even higher (0.97 and 0.88 respectively). But that doesn't absolve us of the need for an explanation.
The answer, I think, is that the league-level variations we see in both error rate and steal rate are primarily a function of fielding quality. It may be true that some hitters are better able to hit balls that are hard to field, but at lower levels of play that's not the main factor in producing errors. To quote myself:
What I think you're seeing with the top major leaguers is an ability of exceptional batters not just to "hit it where they ain't", but also to "hit it where it's hard to field". What I think we're seeing with high overall league error rates in the minors is at the opposite end of the defensive ability scale - not balls hit where it's hard to play them, but routine plays that the sub-major-leaguers flub: dropped catches, wild throws, bobbled grounders.
That is, I suspect that the further you go down the ability ladder, the more errors reflect unprofessional fielding rather than skillful batting. Hence the higher overall error rates in weaker leagues.
A similar argument can be made regarding steals. While running speed is important in baseball, it's not necessarily that much greater in the majors than in weaker leagues. What is substantially higher is fielding ability, the result of more experience and the winnowing out of poor fielders. Plenty of minor league players can run as fast as their major league counterparts, but minor league pitchers and catchers aren't as practiced at holding runners on base and cutting down base-stealers at second.
The upshot of this analysis is that both of these measures are, at least at the league level, essentially indicators of fielding ability. We still have no independent measure of league level based on batting or pitching ability, so the assessment is very one-dimensional. Unfortunately, stats such as wild pitches or hit batters, which could be good indexes of pitcher skill, do not seem to be available for the minor leagues.
More about the stats and graphs
Tango is probably right in suggesting that I had the denominators wrong - errors should be measured per at-bat, and steals per runner on base. In practice, though, those changes don't affect the results in any significant way.
On reflection, I would drop the "unearned runs" and "defense efficiency" measures. The former is just a roundabout and unreliable way of measuring the error rate - it might be useful if you don't have error stats, but it's generally better to measure errors directly. The latter measures the defense's success in putting out batters on balls in play. However, the correlation between batting average on balls in play (BABIP = (H - HR) / (AB - HR - SO)) and league level is very weak (see below), so in practice the DER graph is also just another way of measuring the error rate. That leaves us with two relevant stats: errors per at-bat and stolen bases per runner on base.
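To make those two rates (plus BABIP) concrete, here's a minimal Python sketch. The post doesn't spell out exactly how "runners on base" is counted, so the denominator H + BB + HBP - HR below is my own rough proxy (hits, walks, and hit batsmen put a runner on; a home run takes him off again), and all the season totals are invented placeholders.

```python
def errors_per_ab(e, ab):
    """Errors per at-bat."""
    return e / ab

def steals_per_runner(sb, h, bb, hbp, hr):
    """Stolen bases per runner on base, approximating runners on base
    as H + BB + HBP - HR (an assumption; the post doesn't specify)."""
    return sb / (h + bb + hbp - hr)

def babip(h, hr, ab, so):
    """Batting average on balls in play, per the post's definition:
    (H - HR) / (AB - HR - SO)."""
    return (h - hr) / (ab - hr - so)

# Invented season totals for a hypothetical league:
print(errors_per_ab(e=250, ab=10000))                              # 0.025
print(steals_per_runner(sb=300, h=2600, bb=900, hbp=100, hr=200))  # ~0.088
print(babip(h=2600, hr=200, ab=10000, so=1800))                    # 0.3
```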
We can plot them against each other for another picture of the league quality level:
In this graph, I've indicated the league level by the plot symbol: blue spheres for the majors, green spheres for AAA, gray spheres for AA, red spheres for A+, gold spheres for A, gray diamonds for A-, orange spheres for rookie leagues. Three independent leagues have been marked with stars: the Atlantic League (red), Canada's Intercounty Baseball League (orange), and the Israel Baseball League (blue). The regression line is based only on the majors and ranked minor leagues, including the rookie leagues but excluding the independents.
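If you'd like to reproduce this kind of chart yourself, here's a minimal matplotlib sketch. The coordinates are invented placeholders; only the structure - a scatter by league group, with the regression line fitted to the ranked leagues alone - mirrors the graph above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder (errors/AB, SB/runner) points, highest level (majors) first.
ranked = np.array([
    [0.015, 0.05],  # majors
    [0.022, 0.07],  # AAA
    [0.024, 0.08],  # AA
    [0.028, 0.09],  # A+
    [0.032, 0.11],  # A
    [0.033, 0.12],  # A-
    [0.040, 0.14],  # rookie
])
independents = np.array([[0.026, 0.09],   # Atlantic League (made up)
                         [0.036, 0.12],   # Intercounty (made up)
                         [0.045, 0.25]])  # IBL (made up)

# Fit the regression line on the ranked leagues only, as in the post.
slope, intercept = np.polyfit(ranked[:, 0], ranked[:, 1], 1)
xs = np.linspace(ranked[:, 0].min(), ranked[:, 0].max(), 50)

plt.scatter(ranked[:, 0], ranked[:, 1], label="ranked leagues")
plt.scatter(independents[:, 0], independents[:, 1], marker="*", s=120,
            label="independents")
plt.plot(xs, slope * xs + intercept, "--", label="regression (ranked only)")
plt.xlabel("errors per at-bat")
plt.ylabel("steals per runner on base")
plt.legend()
plt.show()
```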
With the exception of the steal-frenzied IBL, the relationship between the steal rate and error rate is clear and strong (0.92 for the ranked leagues). Also, the grouping of leagues by level is mostly distinct. AAA and AA seem quite close in level here - maybe fielding levels aren't different enough to distinguish between them. Note that the Atlantic League falls in the AA-AAA area, as both the league and observers generally claim. The A and A- leagues are quite close, but A+ is clearly at a rank of its own. And the rookie leagues show a wide range of levels, but they cluster quite close to the SB/E regression line (with Canada's Intercounty league somewhere in the middle).
Arguably, the distance along this line could be used as an estimate of league quality, at least as indicated by fielding ability. I'll try to calculate those estimates, time permitting.
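In case it's useful, here's one way that calculation might go: project each league's (errors per at-bat, steals per runner) point onto the fitted line and take its signed position along the line as a one-dimensional quality score. This is my own sketch of the idea, not a method from the post, and the slope, intercept, and points below are placeholders.

```python
import numpy as np

def score_along_line(points, slope, intercept):
    """Project 2-D points onto the line y = slope * x + intercept and
    return each point's signed position along the line's direction.
    Lower error/steal rates then map to lower (better) scores."""
    direction = np.array([1.0, slope])
    direction /= np.linalg.norm(direction)   # unit vector along the line
    anchor = np.array([0.0, intercept])      # a point on the line
    return (np.asarray(points) - anchor) @ direction

# Hypothetical usage with a fitted line y = 3.0 * x + 0.01:
pts = [[0.015, 0.05], [0.032, 0.11], [0.045, 0.25]]
print(score_along_line(pts, slope=3.0, intercept=0.01))
```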
Without further ado, here's the graph of BABIP I promised. There is indeed a correlation between BABIP and league level, but it's weak (0.33), so BABIP isn't of much value in assessing league quality.
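For anyone who wants to recompute correlation figures like these, they can be obtained as ordinary Pearson coefficients between a numeric league-level coding and the stat in question. A minimal sketch, with invented placeholder numbers rather than the actual league data:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    varx = sum((x - mx) ** 2 for x in xs)
    vary = sum((y - my) ** 2 for y in ys)
    return cov / (varx * vary) ** 0.5

# Hypothetical example: league level coded 1 (majors) through 7 (rookie),
# paired with made-up errors-per-at-bat rates for illustration only.
levels = [1, 2, 3, 4, 5, 6, 7]
error_rates = [0.015, 0.022, 0.026, 0.030, 0.033, 0.036, 0.045]
print(round(pearson(levels, error_rates), 2))
```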
A final comment on the stats. Sabermetricians have often derided error stats and fielding percentage, not without good reason: "Errors and therefore fielding percentage are an inadequate way of measuring fielders because of the subjective nature of the decisions and because they only record failures and thus fail to take into account the fact that good fielders cover more ground and therefore record more outs" - Dan Agonistes.
But in the aggregate, I think I've shown that errors are a relevant measure of league quality level, and one of the few such measures that are widely gathered and published for baseball leagues of all levels of play. Keep that in mind next time someone touts his new top-secret formula for assessing fielding ability or league quality.
And now back to the rabbi.
Rabbi Miller defers to the judgment of Jay Sokol, who attended the IBL game with him:
Jay is the General Manager for the Delaware Cows of the Great Lakes League, which is a summer league dedicated to helping college players get used to the wooden bats they'll use in the minor leagues. Jay thought the level of play in the IBL was very similar to the wood bat summer league. He even recognized an IBL player whom he previously scouted for the Cows.
I certainly defer to Sokol's baseball judgment - I'm just a fan and a novice sabermetrician. I would point out, though, that the game they watched was between Netanya and Raanana, two of the IBL's weaker teams (at least until Netanya's closing weeks). The game's box score and play-by-play log indicate that Raanana committed five errors - high even by their own standards (they averaged 2.1 errors per game, the highest in the IBL). So I wouldn't rely on a single game to assess the IBL's level of play. But thanks for the input!