My master's capstone project is how I've spent a great many evenings over the past year. It's been a chance to explore some new technologies and new problem-solving techniques such as machine learning.
Collaborative recommender systems (like the software that drives Netflix's movie recommendations) require a lot of rating data before they become useful. This presents a problem for new systems. To provide quality suggestions, a system needs a large set of users actively rating content, but attracting those initial users is difficult because a new system offers little value until it can meaningfully recommend content.
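To make the cold-start problem concrete, here is a minimal Groovy sketch of a user-based collaborative filter. It is an illustration only, not the project's code: the class name TinyRecommender, the sample users and ratings, and the choice of cosine similarity are all assumptions. The point it shows is that a user with no ratings overlaps with nobody, so the system has nothing to suggest.

```groovy
class TinyRecommender {
    // Cosine similarity between two users' rating maps; 0 when they share no movies.
    static double similarity(Map<String, Integer> a, Map<String, Integer> b) {
        def shared = a.keySet().intersect(b.keySet())
        if (!shared) return 0.0d
        double dot = shared.sum { a[it] * b[it] }
        double normA = Math.sqrt(a.values().sum { it * it } as double)
        double normB = Math.sqrt(b.values().sum { it * it } as double)
        dot / (normA * normB)
    }

    // Scores each movie the user has not rated by the similarity-weighted
    // ratings of other users, and returns the titles, best first.
    static List<String> recommend(Map<String, Map<String, Integer>> ratings, String user) {
        def mine = ratings[user] ?: [:]
        Map<String, Double> scores = [:]
        ratings.each { other, theirs ->
            if (other == user) return
            double sim = similarity(mine, theirs)
            if (sim <= 0) return          // no overlap, so this neighbour tells us nothing
            theirs.each { movie, rating ->
                if (!mine.containsKey(movie)) {
                    scores[movie] = (scores[movie] ?: 0.0d) + sim * rating
                }
            }
        }
        scores.sort { -it.value }.keySet() as List<String>
    }
}

// Hypothetical data: user -> (movie -> rating on a 1-5 scale).
def ratings = [
    alice: [Alien: 5, Brazil: 4, Casablanca: 1],
    bob  : [Alien: 4, Brazil: 5, Dune: 5],
    dave : [:],                           // a brand-new user with no ratings yet
]

println TinyRecommender.recommend(ratings, 'alice')
println TinyRecommender.recommend(ratings, 'dave')
```

In this toy data set alice overlaps with bob and gets a suggestion, while dave gets an empty list. A brand-new system is in dave's position for every user at once.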
This project demonstrates one approach to solving the problem. The software I built bootstraps a recommender system by harvesting publicly available micro-blogging data (specifically from Twitter). The software uses machine learning to build a sentiment analysis classifier that lets it decide whether a movie-related post expresses positive or negative sentiment. It then uses the classified data to construct a recommendation system.
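As a rough sketch of the classification step, the Groovy below trains a tiny multinomial Naive Bayes classifier with add-one smoothing on a handful of labelled posts, then turns a classified post into a like/dislike record. Everything here is an assumption made for illustration: the choice of Naive Bayes, the class name NaiveBayesSentiment, the sample posts, and the shape of the rating record. The real system may use a different algorithm and data model.

```groovy
class NaiveBayesSentiment {
    Map<String, Map<String, Integer>> wordCounts = [:]   // label -> word -> count
    Map<String, Integer> postCounts = [:]                // label -> number of training posts
    Set<String> vocabulary = [] as Set

    // Lower-case the text and split on runs of non-word characters.
    static List<String> tokenize(String text) {
        text.toLowerCase().split(/\W+/).findAll { it } as List<String>
    }

    void train(String text, String label) {
        postCounts[label] = (postCounts[label] ?: 0) + 1
        def counts = wordCounts.get(label, [:])          // creates the inner map on first use
        tokenize(text).each { word ->
            counts[word] = (counts[word] ?: 0) + 1
            vocabulary << word
        }
    }

    // Returns the label with the highest log-probability for the text.
    String classify(String text) {
        int totalPosts = postCounts.values().sum()
        def words = tokenize(text)
        postCounts.keySet().max { label ->
            def counts = wordCounts[label]
            double denominator = counts.values().sum() + vocabulary.size()
            double score = Math.log(postCounts[label] / (double) totalPosts)
            words.each { word ->
                // Add-one smoothing so unseen words do not zero out the score.
                score += Math.log(((counts[word] ?: 0) + 1) / denominator)
            }
            score
        }
    }
}

// Hypothetical hand-labelled posts standing in for harvested Twitter data.
def classifier = new NaiveBayesSentiment()
[
    [text: 'loved this movie great acting', label: 'pos'],
    [text: 'great fun loved it',            label: 'pos'],
    [text: 'terrible plot hated it',        label: 'neg'],
    [text: 'boring and terrible acting',    label: 'neg'],
].each { classifier.train(it.text, it.label) }

// Hypothetical post; how a movie title is matched to a post is not shown here.
def post = [user: 'alice', movie: 'Alien', text: 'great movie loved it']
def rating = [user: post.user, movie: post.movie, liked: classifier.classify(post.text) == 'pos']
println rating
```

Records like the one printed at the end are the kind of input a collaborative recommender needs, which is how classified posts can stand in for explicit ratings while a system has no users of its own.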
The system is large enough that breaking it into a set of discrete components made for a much cleaner design. This also allowed me to work on the individual components in isolation. The component responsibilities are:
The Groovy language was my choice for the project. It's a language I've been playing with for a couple of years now. I've used it to build some small Grails sites. I've also started writing Groovy tests for our Java software at work; using Groovy for Java tests is a reasonably common use case. I often keep the Groovy console open during Java development so I can quickly test a regular expression or a Java API. This project was a chance to finally use the language on a large scale.
Grails is the tool I chose for the user interface. I've used it a couple of times before for small personal projects and have always been happy with the results. I've tried quite a few web frameworks for the JVM, and Grails is the most productive I've found. Grails is a Groovy-based framework, so it also made a lot of sense to use it on a Groovy-based project. I didn't want to simply build a piece of Grails software, though, so the user interface is the only component that depends on Grails. The other pieces are pure Groovy.
I also took this opportunity to explore a few other technologies I hadn't found the time to experiment with yet. I used Gradle for the build system. The project uses the Spock specification/testing framework (this is a really brilliant piece of software). I included Weka for some machine learning algorithms, though in the final product Weka is simply a redundant algorithm implementation used to check the software's own calculations. Finally, I used Guice for dependency injection.