Large scale machine learning - Python or Java? [closed] Large scale machine learning - Python or Java? [closed] python python

Large scale machine learning - Python or Java? [closed]


As Apache is going strong producing excellent stuff like Lucene/Solr/Nutch for Search, Mahout for Big Data Machine Learning, Hadoop for Map Reduce, OpenNLP for NLP, lot of NoSQL stuff. The best part is the big "I" which stands for integration and these products can be integrated with each other well as of course in most situations they (these products) complement each other.

Python is great too however if you consider above from ASF then I will go with Java like Sean Owen. Python will always be available for the above but mostly like Add on's and not the actual stuff. For example you can do Hadoop using Python by using Streaming etc.

I partially switched from C++ to Java in order to utilize some of the very popular Apache products like Lucene, Solr & OpenNLP and also other popular open source NoSQL Java products like Neo4j & OrientDB.


I think one big thing Java has going for it is Hadoop. If you really mean large scale, you'll want to be able to use something like that. Generally speaking Java has the performance advantage, and more libraries available. So: Java.


If you are looking at NoSQL databases fit for ML task, then Neo4J is one of the more production ready (relatively) and capable of handling BigData, it is native to JAVA but comes along with a beautiful REST API out of the box and hence can be integrated with the platform of your choice. JAVA will give you an performance edge here.