Wednesday, August 29, 2007

The data and the quirky schedule

One thing the IBL handled pretty well was data gathering. For each game, the website features not only a complete box score, but also a play-by-play game log. The only thing missing by major league standards is pitch-by-pitch data.

Unfortunately, none of the three main sources of game information is complete. The game logs, for example, don't list the starting pitcher; that appears only in the box scores. Neither the logs nor the boxes specify the venue, which is significant since a number of games were played at the "wrong" field due to various scheduling constraints, primarily the unavailability of Sportek for the first two weeks. The venue appears only in the "schedule" sections of the website.

But even that is not perfect. One game (July 1, Bet Shemesh vs. Netanya) was played in Gezer, with a protest filed. The last half-inning was then replayed on August 6 in Sportek. The schedule, however, lists the whole game as an August 6 game played in Sportek. The specifics of these events are clearest from the IBL's press releases from the dates in question.

So I've been working on processing the game data files to assemble as accurate a database as possible for analysis.

So far, I believe I have a complete listing of all the games played and their venues (omitting the August 10 Petach Tikva-Netanya game, which was decided by forfeit). Some preliminary observations:

The IBL originally planned a 45-game schedule for each team. With six teams, that means nine games between each pair of teams (since each team plays five different opponents), for a total of 135 games.
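As a quick check of that arithmetic, here is a minimal Python sketch. The team list is simply the six team names that come up in this post, and the variable names are my own illustration, not anything from the IBL's data files.

```python
from itertools import combinations

# The six teams named in this post (order is arbitrary).
teams = ["Bet Shemesh", "Modiin", "Netanya", "Petach Tikva", "Raanana", "Tel Aviv"]
planned_games_per_team = 45

# Every unordered pair of teams is one matchup.
pairs = list(combinations(teams, 2))

# Each team faces every other team, so it has len(teams) - 1 opponents.
opponents = len(teams) - 1
games_per_pair = planned_games_per_team // opponents

# Each game involves exactly one pair, so the season total is pairs * games per pair.
total_games = len(pairs) * games_per_pair

print(len(pairs), games_per_pair, total_games)
```

The same total falls out of counting team-games and halving: 6 teams times 45 games, divided by 2 because every game involves two teams.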

In fact, due to the problems with Sportek and the belated decisions to add a one-day All Star break and extend the championship game to a three-day tournament, only 123 regular-season games were played (including the forfeit and an uncompleted suspended game), twelve short of the original plan.

Each team played 41 games, so one would expect each pair of teams to have played 8 or 9 times. Oddly, that was far from the case:


Strangely, Netanya and Raanana played 10 games, as did Tel Aviv and Modiin - more than they should have played under a full 45-game schedule! Meanwhile Tel Aviv played Raanana just 6 times, two fewer than the expected minimum out of each team's 41 games played, and Modiin played Netanya just 7 times.
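To make the "8 or 9" expectation concrete, here is a rough Python sketch that derives the expected range from 41 games spread over 5 opponents and measures how far the four matchups mentioned above fall outside it. Only those four counts are included, since they are the ones quoted in the text; the dictionary layout and the helper name `deviation` are my own assumptions for illustration.

```python
import math

games_played_per_team = 41
opponents = 5

# An even split of 41 games over 5 opponents gives 8.2 games per opponent,
# so each pairing should land on 8 or 9.
low = games_played_per_team // opponents
high = math.ceil(games_played_per_team / opponents)

# Head-to-head counts quoted in the text (not the full table).
reported = {
    ("Netanya", "Raanana"): 10,
    ("Modiin", "Tel Aviv"): 10,
    ("Raanana", "Tel Aviv"): 6,
    ("Modiin", "Netanya"): 7,
}

def deviation(count, low, high):
    """Return how far a count sits outside [low, high]; 0 means inside."""
    if count < low:
        return count - low
    if count > high:
        return count - high
    return 0

outliers = {pair: deviation(n, low, high) for pair, n in reported.items()}

for pair, diff in outliers.items():
    print(pair, reported[pair], diff)
```

The same arithmetic confirms the season total: 6 teams times 41 games, halved, is 123.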

I don't know if the original schedule is available for comparison, but the scheduling inconsistencies, combined with the already small sample set, may make meaningful statistical analysis difficult for some issues (such as park effects, which I plan to start with). We'll see.
