Monday, January 28, 2013

Guest Blog: "Predicting Migration Times With Google Prediction API" by Eugenio Cuevas Pozuelo

Guest Blogger: Eugenio Cuevas Pozuelo

A Java developer hailing from Valencia, Spain, Eugenio Cuevas Pozuelo, aka eucuepo, has been a staple of the CloudSpokes community lately. Rising up through the 2013 leaderboards and making quite a name for himself, eucuepo has won many recent challenges, especially the GAMME ones, which he offered to guest blog about! When he isn't writing awesome code or guest blogs, he enjoys going on runs (he just completed his first half marathon) and taking pictures with his DSLR. You can also head over to his personal blog to see more great posts.

The Challenge:
After a series of challenges on the GAMME Estimation tool, I decided to write a blog post proudly showing my development, and when the CloudSpokes team told me I could do a guest post, I took it even more seriously! When I came across the GAMME Estimation contest, I had recently won my first challenge and was just getting used to this awesome community. So I read the description and started a proof of concept with the Google Prediction API. This API is awesome: it works by “learning” from samples and then predicting a result for the data you give it. For a guy who holds the sad record of failing an easy statistics course five times in college, this is gold! It abstracts away really complex statistics “under the hood” and lets developers focus on the functionality the app requires.

Diving In:
The challenge was to get an accurate estimate of the migration time for the GAMME tool, an app developed by Google to migrate from Outlook servers to Gmail, up in the clouds. Logs with some sample migration data were provided.

I started with a proof of concept using the Prediction API; Google provides some cool examples that can be used for free. After crafting a CSV from the challenge assets, I got successful results, so I dove into the coding!
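
To give an idea of what "crafting a CSV" can look like, here is a minimal sketch rather than the actual code from the challenge. It assumes the usual Prediction API training layout, where the first column is the value to predict and the remaining columns are the features, with no header row. The class name, the method name, the column order and the sample numbers are all made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper: turns one migration sample into a CSV training row.
final class TrainingRowSketch {

    /**
     * Builds one row with the target (migration time in hours) first,
     * followed by the features. The column order is an assumption.
     */
    static String toCsvRow(double migrationHours, int servers, int threads,
                           long emails, long contacts, long calendars) {
        // Locale.ROOT keeps the decimal separator a dot on every machine,
        // so the number never collides with the CSV commas.
        return String.format(Locale.ROOT, "%.2f,%d,%d,%d,%d,%d",
                migrationHours, servers, threads, emails, contacts, calendars);
    }

    public static void main(String[] args) {
        // Made-up sample: 12.5 hours, 2 servers, 4 threads each.
        System.out.println(toCsvRow(12.5, 2, 4, 150000L, 3200L, 800L));
    }
}
```

A model that predicts something else, such as the number of servers, would use the same idea with that value moved to the first column.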

I went with Google App Engine for the development because it provides seamless access to the Google APIs, and this project uses three: Cloud Storage, Google Drive and, of course, the Prediction API. Looking back, I don't think I would choose it again because of some constraints it put on the project, like the 60-second limit for requests or the limited set of Java classes allowed by the Java whitelist, but I still think it is a great platform to develop on.

For the frontend I used Twitter Bootstrap, which I am a huge fan of (who isn't?), for its general awesomeness. I imagine this app being run by a consultant in front of the client prior to a migration, perhaps on a tablet, so its responsive design also makes it cool to run on a portable device. This is how it looks on an iPad:

Nostra Gamme:
The initial task was simply to upload the CSV to Google Drive and then trigger the prediction with the data provided. The app should be able to predict the time spent migrating, depending on the number of servers and threads, for a given number of emails, contacts and calendars. It also predicts the other way around: it gives the number of servers required to migrate within a given time span. To achieve this, two groups of prediction models were created from the original data, because the output of the prediction changes. I thought it would be cool to display the data visually, so I added jqPlot, which is great, although a little tricky sometimes. Here is a screenshot of the first version:

The Great Refinement:
After some positive feedback from the judges, a follow-up contest was launched to make some enhancements. I really enjoyed doing the original one, so this was a chance to make some enhancements of my own. The requirements were to add client maintenance in order to manage several models per client. That gave me the opportunity to learn (and fight against) the App Engine Datastore. I am a traditional SQL guy, but the Google approach to storing data has its advantages, as it is declarative and incredibly fast!

I thought it would be cool to expose some functionality for adding clients, so I added an Ajax-based maintenance screen, which works with a REST API so that it could also be called from a client program. This is a screenshot of the frontend:

Fully Entering The Clouds:
I was happy to learn there was another challenge open for GAMME prediction, to polish it a bit more. An average calculation was added to compare against the result given by the Prediction API, as a reference and to get a better understanding of the data provided. The client program that extracts the logs from GAMME was also pushed into the clouds, which rounded the project out as a fully cloud-based development! Here is a screenshot of the prediction for the final app:
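
The average used as a reference above could be computed along these lines. This is only a sketch under my own assumptions, not the code from the app: it assumes the average is the number of items migrated per server per hour across the historical samples, and that migration time scales linearly with the number of servers. The class, the Sample type and all the numbers are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical average-based reference to compare against the prediction.
final class AverageBaselineSketch {

    /** One historical migration sample (assumed shape). */
    static final class Sample {
        final long items;    // emails + contacts + calendars migrated
        final int servers;   // servers used in that migration
        final double hours;  // wall-clock duration of the migration

        Sample(long items, int servers, double hours) {
            this.items = items;
            this.servers = servers;
            this.hours = hours;
        }
    }

    /** Average number of items migrated per server-hour over all samples. */
    static double itemsPerServerHour(List<Sample> samples) {
        double totalItems = 0;
        double totalServerHours = 0;
        for (Sample s : samples) {
            totalItems += s.items;
            totalServerHours += s.servers * s.hours;
        }
        if (totalServerHours == 0) {
            throw new IllegalArgumentException("no usable samples");
        }
        return totalItems / totalServerHours;
    }

    /** Estimated hours for a new migration, assuming linear scaling. */
    static double estimateHours(List<Sample> samples, long items, int servers) {
        return items / (itemsPerServerHour(samples) * servers);
    }

    public static void main(String[] args) {
        List<Sample> history = Arrays.asList(
                new Sample(100_000, 2, 10.0),
                new Sample(50_000, 1, 5.0));
        // Reference estimate for 60,000 items on 2 servers.
        System.out.println(estimateHours(history, 60_000, 2));
    }
}
```

Showing a simple figure like this next to the API result makes it easier to spot a prediction that is far off from what the raw data suggests.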

I would like to thank the whole CloudSpokes team for the feedback during the development. You rock, guys! Keep these cool challenges coming!

- eucuepo
