Just how much Java does one need to use Hadoop and Mahout effectively?
It shouldn't be all that difficult to get data from PHP to Java for analysis using Mahout and Hadoop.
Even easier is to run Mahout and Hadoop offline in batch mode and store the data products in a file system or database. PHP can then read these data products as easily as falling off a log.
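To give a feel for how little code the reading side needs, here is a minimal sketch of parsing one line of a batch output file. It assumes each line looks like `userID<TAB>[itemID:score,itemID:score]`; check the layout against what your job actually writes. The function name is made up for illustration, and the sketch is in Python only for brevity; the PHP version would be about the same length.

```python
def parse_recommendation_line(line):
    """Split 'user<TAB>[item:score,...]' into (user, [(item, score), ...]).

    The line layout is an assumption for this sketch, not a guaranteed format.
    """
    user_id, _, payload = line.rstrip("\n").partition("\t")
    payload = payload.strip().strip("[]")
    recommendations = []
    for pair in payload.split(","):
        if not pair:
            continue  # empty list, e.g. "7\t[]"
        item_id, _, score = pair.partition(":")
        recommendations.append((item_id, float(score)))
    return user_id, recommendations
```

The web front end would run something like this once per line and load the results into whatever lookup structure or database table it already uses.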
For real-time use, the recommendations part of Mahout supports a variety of web-service interfaces that make it pretty easy to access from PHP. Hitting the model evaluation part of Mahout would require a bit more programming.
I just did the same thing, and it's been years since I did anything Java-related. What I did was the following:
- Started off with simple Hadoop streaming examples
- Tried my own analysis with PHP streaming
- Started experimenting with Pig
- Started experimenting with PHP streaming inside Pig
All without any Java!
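For anyone wondering what a "simple Hadoop streaming example" looks like: streaming runs any executable that reads lines from stdin and writes tab-separated key/value lines to stdout, so the script can be written in PHP, Python, or anything else. Below is a rough word-count sketch, written in Python rather than PHP to keep it short. The function names and the `map`/`reduce` command-line argument are my own choices for illustration, not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit 'word<TAB>1' for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word.

    Relies on the input being sorted by key, which is how streaming
    delivers mapper output to the reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Only run as a filter when a role ("map" or "reduce") is given explicitly.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

If this is saved as, say, `wc.py` (a hypothetical name), it can be tried locally without a cluster: `cat input.txt | python wc.py map | sort | python wc.py reduce`. The `sort` in the middle stands in for the shuffle phase that Hadoop performs between the mapper and the reducer.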