I’m pleased to make available for download a tool I created to extract MLB and MILB data and insert it into a mysql database for analysis. Nothing particularly groundbreaking here, but it does make the process of selecting and acquiring baseball data much easier, especially for Mac saberists.
Background:
In 2006 Joseph Adler was the first to describe how to download Gameday data from mlb.com’s xml files. Mike Fast later updated the data structure and provided some excellent Perl scripts to to grab pitch f/x data and parse it into a mysql database.
The problem, if you are a Mac user, is that the DBI (plug ins that allow Perl and Python to insert data into a mysql database) are insanely difficult to work with. Indeed, one of the great frustrations of being a Mac user is getting any scripting language to work with a mysql database.
PHP, which this tool is written in, is free of any of those issues. My tool also allows the user to choose the directory to download and parse, as well as the year from an easy to use GUI interface, making the process painless for baseball analysts. No time has been spent making it pretty, but it works and works well.
==================================
If you already have an *AMP stack set up on your system, just download the files here, change the config.inc in the includes directory to reflect your mysql database settings and away you go.
For folks who are still trying to get their own saber system up and running on their Mac (or any other system really – PHP is cross platform – but this tutorial is Mac specific) you’ll need to download a few items first.
1. Go to the MAMP download page and grab the latest version.
2. Download Sequel Pro.
3. Install both. Simply drag and drop into your Applications folder.
4. Configure MAMP so when you begin extracting data, you don’t get time outs. To do this, open your MAMP folder in your Applications directory after install, and find the following file. Here is the full path to avoid confusion:
/Applications/MAMP/conf/php5/php.ini
Open the file with TextEdit, TextWrangler or TextMate (or even VI if you are a command line guy). Whatever you have handy. The line you are looking for is “max_execution_time”. The quickest way to find it is to hit command-f and search for the term. It will be set to 60 or something similar. Change it to 0. Save and exit.
5. Start MAMP by double clicking on the MAMP.app in the MAMP folder. It will bring up a small window that looks like this (though the green lights should be red for you. If not press “stop servers”):

Press the “Preferences” button, then click the “Ports” tab and hit “Set to default apache and mysql ports”. It should look like this when you are done:

Finally, hit OK and go back to the main screen and press “Start Servers”. The red lights will turn green and you may be asked for your password.
6. Now download the Rotobase Extraction tool. Open the rotobase folder, then the includes folder and open the config.inc.
The full path is: rotobase/includes/config.inc
You’ll note that the user name and password are root, which are the user names and passwords for the default installation of MAMP. It should work out of the box for you unless you make any changes or choose not to follow the instructions below. If you do, this is the file you will need to modify to get things working, especially the “dbname”, which should be set to the name you gave your database.
7. Launch Sequel Pro which you have already downloaded and installed. Launch it and you will be greeted by a login screen. It will look something like this:

Navigate to the “Socket” tab. Name the database whatever you want. I chose Baseball. For username and password enter “root” without the quotes. Leave database blank. Copy and paste the following path into the Socket field:
/Applications/MAMP/tmp/mysql/mysql.sock
Hit “Add to favorites” to save your settings for the next time you launch Sequel Pro.
Now hit connect.
8. In the upper left hand corner you’ll need to add a database. Here is what the selection window looks like:

Click Add database, and name it “milb” without the quotes. NOTE: if you name it ANYTHING else, you will need to modify the file in step 6.
Hit Ok.
9. Now you will need to import the database structure. I have provided a pre-populated admin table with login credentials for the tool already for you, so it is important that you get this part correct.
Go to File > Import:

And browse to wherever you downloaded the Extractor tool.
rotobase/rotobase.sql is the file you want. It should look like this:

Hit open and the database tables should be imported. You can click through them to verify that all but tbladmin is empty.
10. The default installation directory for PHP scripts in MAMP is
/Applications/MAMP/htdocs/
Copy the rotobase folder to that location.
11. You are almost finished! Open your browser of choice and enter the following URL
http://localhost/rotobase
You will be greeted by a login screen like the following:

Enter “admin” without the quotes for Username.
Enter “password” without the quotes for Password.
Hit Login. You will be greeted by a config screen where you can change your username and email address. I HIGHLY recommend adding your email address. Extractions can take days, and the tool will email you when it is completed.
In the upper left of the screen click on the Extract tab. You will be presented with the following screen:

From here you can choose the league data you would like to download. If you are only interested in MLB Pitch f/x data, choose the mlb directory and choose a year from 2007-2010.
The minor league directories (aaa, rok) correspond to MILB leagues. Play by play data are available for most back to 2005. Chose a directory and a year and hit extract.
You’ll be waiting a while, so to troubleshoot I recommend downloading 2010 mlb data to test.
You can verify things are working by Loading up Sequel Pro, going to the games, atbats or pitches tables and refreshing to check that the data are being inserted.
NOTE: Whenever you download a new directory (aaa, mlb, rok) you will need to create a new database to hold the data. When you do you will have to give it a unique and descriptive name, like “AAA”. For this to work correctly, you will need to follow the directions in Step 6 to change the database name to the appropriate name (in this example AAA).
==================================
There are only 2 restrictions to using this tool.
1. Please link back to rotobase.com so others can find and enjoy the tool as well.
2. If you improve upon the tool you must make the source code available for others to use and enjoy.
==================================
Hope this helps analysts out there who have been struggling to get minor league play by play or pitch f/x data.
For those who just want a SQL dump, check back in a few weeks as I will provide downloads for minor league raw and MLE data.
Enjoy!