Monday, November 19, 2007

Download IBL data files while you can!

I wish the IBL long life and much success, but let's be realistic here. Given the recent updates to the IBL's website, it would probably be wise to save the game data while it's still available.

The game log files, which are hosted by Major League Baseball's Gameday system, are not on the IBL's site. They can be found at:

http://gd2.mlb.com/components/game/ind/year_2007/
(hat tip to weskelton)

Now, maybe the files aren't going anywhere anytime soon. Heck, maybe the league isn't going anywhere anytime soon. But it seems prudent to copy them now if you're thinking about analyzing the IBL stats.
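For anyone who wants to script the copy, here is a minimal Python sketch of a recursive download. It assumes the server returns a plain HTML directory index at the URL above, with ordinary links to files and subdirectories. That layout is an assumption, not something verified here. The names extract_links and mirror and the ibl_2007 output folder are made up for illustration.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://gd2.mlb.com/components/game/ind/year_2007/"


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, page_url):
    """Return absolute URLs of links that point below page_url."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Skip column-sorting and in-page links (assumed index-page clutter).
        if "?" in href or "#" in href:
            continue
        absolute = urljoin(page_url, href)
        # Keep only children of this directory, not the parent or the page itself.
        if absolute.startswith(page_url) and absolute != page_url:
            links.append(absolute)
    return links


def mirror(page_url, dest_dir):
    """Recursively copy everything below page_url into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    for link in extract_links(html, page_url):
        name = link[len(page_url):].rstrip("/")
        target = os.path.join(dest_dir, name)
        if link.endswith("/"):
            mirror(link, target)  # subdirectory: recurse into it
        else:
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with urlopen(link) as response, open(target, "wb") as out:
                out.write(response.read())


# To start the copy (this hits the network):
# mirror(BASE_URL, "ibl_2007")
```

The sketch has no retries or pauses between requests, so it would be polite to add a short delay before pointing it at a real server.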

If you're not familiar with the Gameday format, Mike Fast explains how to go about processing the data, with some help from Perl code from Joseph Adler's book, Baseball Hacks. Note, though, that the IBL files don't include pitch-by-pitch data. (I've been writing my own code, since I've been using this project as an opportunity to learn to program in Python.)

Finally, if there's any substance to the announcement by some former IBL players and investors that they intend to form a new league to replace the IBL, I urge them not to forget about the things the IBL did right. In particular, make sure you track the stats. The IBL has a far better statistical record available on the Internet than any of the MLB-affiliated minor leagues. Absurd, but true. Baby, bathwater. Do it right.

Otherwise, we'll always have cricket.

6 comments:

Anonymous said...

How about the pitching? I don't see any pitching info. I know that Craig Eagle, Jason Benson, Scott Perlman, and Ben Pincus pitched almost all of their games at Gezer, where offensive stats were skewed, while Aaron Pribble and Mike Etkin played almost all of their games at other fields that were more pitcher friendly. How do you compare them with other pitchers in the league, and with each other?

iblemetrician said...

I assume you mean you don't see anything about pitching on the blog. You're right - I haven't really touched on pitching yet. That's mostly because hitting is somewhat easier to analyze. I figured I'd start with the easier material.

Pitchers only play in a small sample of games, which may not at all be representative of the league as a whole. They also can be either starters or relievers, and many IBL pitchers did both in different games. I anticipate that analyzing the pitching will be trickier for those reasons.

Have you read my posts about park factors? You may be surprised to learn that there was less than a 10% difference in overall run production among the parks - and that Gezer was not the highest of the three!

Gezer mainly inflated home runs and, to a lesser extent, doubles, but also featured more strikeouts and fewer walks, as well as substantially fewer errors.

That said, my recent posts on the batting leaders have not been park adjusted, partly because I'm still deciding how best to adjust for the parks.

Anonymous said...

I may be just thinking out loud, but I see a fatal flaw in this thinking - half of all innings pitched at Gezer were thrown by the staffs of the Miracle and the Blue Sox, who were by far the best pitching staffs in the league, presumably allowing fewer walks and producing more strikeouts - plus they played each other, so each team played more than half of its games at its "home field". I would like to see a comparison of park factors with those two teams' statistics from the games they played against each other removed, so we could see a true home and away effect. (I'm sure that if you took the two best pitching staffs in the league and had them each play more than half of their games in Fenway Park, then took the two worst staffs and had them play more than half of their games in Dodger Stadium, the stats would be skewed and Dodger Stadium would appear to have a greater park effect than Fenway.)
The reality is that a pitcher at Gezer knows that a walk could easily turn into a two-run homer on the next pitch, so pitchers made sure they wouldn't walk anyone, and were more likely to challenge a hitter and risk a solo homer. Just look at the leaders in home runs allowed: Craig Eagle and Jason Benson...4th and 8th in WHIP league wide, and they probably had the most starts of any pitchers at Gezer. As a matter of fact, of the top 8 in WHIP, 6 are from Modiin and Bet Shemesh. I wonder what their splits are at and away from Gezer. Eagle gave up 9 home runs, and I'm guessing all 9 were at Gezer and at least 6 were vs. Bet Shemesh. Benson was about the same, and at least 6 of his were vs. Modiin.

Any thoughts?

P.S. I thought Lipitz, Etkin and Perlman were the best relievers in the league, and the only difference I saw was fields they pitched at.

iblemetrician said...

Thanks for all the thoughtful comments. I'd appreciate at least a random nickname in the future; too many anonymouses get to be annoying.

I fear that either you haven't read my last post about park factors, or you didn't understand it properly.

I didn't just add up all the games at Gezer and compare them to the league averages. Rather, I collected all matchups of two teams which played each other at least twice at Gezer and also at least twice elsewhere. That way, we're looking at approximately the same mix of team matchups both at Gezer and elsewhere. Since Bet Shemesh and Modiin only played each other at Gezer, their games against each other do not enter into this calculation at all.

To be precise, in calculating Gezer's park factors I included all games between the following pairs of teams:

Bet Shemesh - Netanya
Bet Shemesh - Petach Tikva
Bet Shemesh - Raanana
Bet Shemesh - Tel Aviv
Modiin - Netanya
Modiin - Petach Tikva
Modiin - Raanana
Modiin - Tel Aviv
Netanya - Tel Aviv (yes, they played twice at Gezer!)

Overall, I came up with 37 games at Gezer to compare with 37 games at other fields. The games at Gezer featured Modiin 19 times, Bet Shemesh 16 times, Tel Aviv 12 times, Netanya 10 times, Raanana 9 times and Petach Tikva 8 times. The games at the other fields featured Modiin 14 times, Bet Shemesh 17 times, Tel Aviv 15 times, Netanya 13 times, Raanana 7 times and Petach Tikva 8 times.

Not a perfect correspondence, but pretty close.
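A minimal Python sketch of that matchup filter may make the method easier to follow. It is an illustration of the rule as described above, not the code actually used for these numbers. The game records, field names in the dicts, and the function names matched_games and runs_per_game are all hypothetical, and the run totals in the sample are invented.

```python
from collections import defaultdict


def matched_games(games, park="Gezer", min_games=2):
    """Split games into (at_park, elsewhere), keeping only team pairs that
    met at least min_games times at the park and min_games times elsewhere."""
    by_pair = defaultdict(lambda: {"at": [], "away": []})
    for game in games:
        # A frozenset ignores which team is listed first.
        pair = frozenset((game["team_a"], game["team_b"]))
        key = "at" if game["park"] == park else "away"
        by_pair[pair][key].append(game)

    at_park, elsewhere = [], []
    for split in by_pair.values():
        if len(split["at"]) >= min_games and len(split["away"]) >= min_games:
            at_park.extend(split["at"])
            elsewhere.extend(split["away"])
    return at_park, elsewhere


def runs_per_game(games):
    """Average combined runs (both teams) per game."""
    return sum(g["runs"] for g in games) / len(games)


def game(team_a, team_b, park, runs):
    return {"team_a": team_a, "team_b": team_b, "park": park, "runs": runs}


# Invented sample data, only to show which pairs qualify.
sample = [
    # Only ever met at Gezer: excluded.
    game("Bet Shemesh", "Modiin", "Gezer", 12),
    game("Modiin", "Bet Shemesh", "Gezer", 12),
    # Twice at Gezer and twice elsewhere: included.
    game("Modiin", "Netanya", "Gezer", 10),
    game("Netanya", "Modiin", "Gezer", 8),
    game("Modiin", "Netanya", "Sportek", 9),
    game("Netanya", "Modiin", "Yarkon", 11),
    # Only once at Gezer: excluded.
    game("Bet Shemesh", "Raanana", "Gezer", 7),
    game("Bet Shemesh", "Raanana", "Yarkon", 6),
    game("Raanana", "Bet Shemesh", "Yarkon", 5),
]

at_gezer, elsewhere = matched_games(sample)
print(len(at_gezer), len(elsewhere))
print(runs_per_game(at_gezer), runs_per_game(elsewhere))
```

Grouping by unordered pair is what keeps the Bet Shemesh - Modiin games out of the comparison: they have games at Gezer but none elsewhere, so the pair never qualifies.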

Again, comparing those matchups at Gezer and away from Gezer, the average was 9.92 runs scored per game at Gezer (both teams) versus 9.84 away. That's about 0.8% higher for Gezer, or about 0.5% above the average field.
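The two percentages can be checked with a few lines of Python. The first is a direct ratio. The second assumes that the "average field" is the mean of three equally weighted fields, with Gezer at 9.92 and the other two at the 9.84 away rate. That weighting is an assumption made here to reproduce the 0.5% figure, and the function names are illustrative.

```python
def pct_above(value, baseline):
    """Percentage by which value exceeds baseline."""
    return (value / baseline - 1) * 100


def average_field(park_rate, away_rate, n_fields=3):
    """Mean run rate over n_fields equally weighted fields, assuming one
    field plays at park_rate and the rest play at away_rate."""
    return (park_rate + (n_fields - 1) * away_rate) / n_fields


gezer_rpg = 9.92  # runs per game at Gezer, both teams
away_rpg = 9.84   # runs per game in the same matchups elsewhere

vs_away = pct_above(gezer_rpg, away_rpg)
vs_average = pct_above(gezer_rpg, average_field(gezer_rpg, away_rpg))

print(f"Gezer vs. other fields: {vs_away:.1f}% higher")
print(f"Gezer vs. average field: {vs_average:.1f}% higher")
```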

I haven't yet collected all the pitching stats I want to have before I start analyzing pitching properly. For now, here are the pitchers with the most starts at Gezer:

Maximo Nelson 8
Craig Eagle 7 (1 at Sportek, 1 at Yarkon)
Ben Pincus 7
Matt Bennett 6
Juan Feliciano 5
Josh Zumbrun 5
Alper Ulutas 4
Adam Crabb 4

Jason Benson only had 3 starts at Gezer (1 at Sportek, 3 at Yarkon).

Eagle gave up 8 homers at Gezer and 1 at Yarkon. Benson gave up 5 at Gezer, 2 at Sportek and 2 at Yarkon.

I definitely have to think about how to adjust pitcher performance by field (and strength of opponents, probably).

I do think Lipetz was far and away the best reliever in the league.

Hopefully, I'll finish up with batter rankings soon and move on to pitchers.

Jonathan said...

I really enjoy this site!

I feel it in my bones that there will be another Israel Baseball League season. Let's all support it with good thoughts and good actions. Goodness knows it has added a lot to our lives.

iblemetrician said...

Jonathan,

Glad you like the site.

I'm with you on hoping for a successful second season.