Just how much Java does one need to use Hadoop and Mahout effectively?


It shouldn't be all that difficult to get data from PHP to Java for analysis using Mahout and Hadoop.

Even easier is to run Mahout and Hadoop offline in batch mode and store the resulting data products in a file system or database. PHP can then read these data products as easily as falling off a log.

For real-time use, the recommendations part of Mahout supports a variety of web-service interfaces that make it pretty easy to access from PHP. Calling the model evaluation part of Mahout would require a bit more programming.


A beginner's level of Java is sufficient. You can always dig deeper on an ad hoc basis as the need arises.


I just did the same thing, and it's been years since I did anything Java related. What I did was the following:

  1. Started off with simple Hadoop streaming examples
  2. Tried my own analysis with PHP streaming
  3. Started experimenting with Pig
  4. Started experimenting with using PHP streaming inside Pig

All without any Java!
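
For anyone wondering what a "simple Hadoop streaming example" looks like, here is a minimal word-count sketch. It is in Python rather than PHP purely for illustration; the same pattern (read lines from stdin, write tab-separated key/value lines to stdout) should carry over to any scripting language. The file name `wordcount.py` and the function names `map_lines` and `reduce_lines` are my own choices, not anything Hadoop requires.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming. Run as `wordcount.py map` or `wordcount.py reduce`."""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one tab-separated (word, 1) record per word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Sum the counts for each word. Input must already be sorted by key,
    which is what the streaming framework does between map and reduce."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Hypothetical command-line convention: the first argument picks the step.
    steps = {"map": map_lines, "reduce": reduce_lines}
    step = steps.get(sys.argv[1]) if len(sys.argv) > 1 else None
    if step is None:
        print("usage: wordcount.py map|reduce", file=sys.stderr)
    else:
        for record in step(sys.stdin):
            print(record)
```

You can try it without a cluster by piping through `sort`, which stands in for the shuffle phase: `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On a cluster the same script is passed to the Hadoop streaming jar via its `-mapper` and `-reducer` options, along with `-input` and `-output`; where that jar lives depends on your installation.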