Pentaho vs Microsoft BI Stack [closed] Pentaho vs Microsoft BI Stack [closed] sql-server sql-server

Pentaho vs Microsoft BI Stack [closed]


I reviewed multiple Bi stacks while on a path to get off of Business Objects. A lot of my comments are preference. Both tool sets are excellent. Some things are how I prefer chocolate fudge brownie ice cream over plain chocolate.

Pentaho has some really smart guys working with them but Microsoft has been on a well funded and well planned path. Keep in mind MS are still the underdogs in the database market. Oracle is king here. To be competitive MS has been giving away a lot of goodies when you buy the database and have been forced to reinvent their platform a couple of times. I know this is not about the database, but the DB battle has cause MS to give away a lot in order to add value to their stack.

1.) Platform
SQL server doesn't run on Unix or Linux so they are automatically excluded from this market. Windows is about the same price as some versions or Unix now. Windows is pretty cheap and runs faily well now. It gives me about as much trouble as Linux.

2.) OLAP
Analysis services was reinvented in 2005 (current is 2008) over the 2000 version. It is an order of magnatude more powerful over 2000. The pentaho (Mondrian) is not as fast once you get big. It also has few features. It is pretty good but there are less in the way of tools. Both support Excel as the platform which is esscential. The MS version is more robust.

3.) ETL
MS - DTS has been replaced with SSIS. Again, order of magnatude increase in speed, power, and ability. It controls any and all data movement or program control. If it can't do it you can write a script in Powershell. On par with Informatica in the 2008 release.Pentaho - Much better than is used to be. Not as fast as I would like but I can do just about everything I want to do.

4.) dashboard
Pentaho has improved this. It is sort of uncomfortable and unfriendly to develop but there is really not a real equiv for MS.

5.) reports
MS reports is really powerful but not all that hard to use. I like it now but hated it at first, until I got to know it a little better. I had been using crystal reports and the MS report builder is much more powerful. It is easy to do hard things in MS, but a little harder to do easy things.Pentaho is a little clumsy. I didn't like it at all but you might. I found it to be overly complex. I wish it was either more like the Crystal report builder or the MS report builder but it is jasper like. I find is to be hard. That may be a preference.

6.) ad hoc
MS - this was the real winner for me. I tested it with my users an they instantly in love with the MS user report builder. What made the difference was how it was not just easy to use, but also productive.Pentaho - is good but pretty old school. It uses the more typical wizard based model and has powerful tools but I hate it. It is an excellent tool for what it is, but we have moved on from this style and no one wants to go back. Same problem I had with logiXML. The interface worked well for what it was but is not really much of a change from what we used 12 years.http://wiki.pentaho.com/display/PRESALESPORTAL/Methods+of+Interactive+Reporting

There are some experienced people out there that can make Pentaho really run well, I just found the MS suite to be more productive.


Warning -- there are numerous sites out there listing the numerous deficiencies, bugs, and annoyances with SSIS. Not sure why SSIS came out on top with the post -- but before you bet your project on it, look at what people have to say in the blogosphere. From my experience its about 20:1 ranting about how horrible SSIS is to work with--I can concur as well, currently looking for any alternative.


Great information here? I have not tried Pentaho but and planning on checking it out. I am a seasoned MS BI consultant, using it since 1998. SSIS is very fast and very powerful but the criticisms are spot on. I found the following issues with SSIS:

(1) It is hard to debug, you get cryptic errors that may not give you any hint about what and where the problem really is.

(2) Per a prior comment, it is the shittiest development environment ever! I have no clue what they are thinking.

(a) Create a table with a 100 or more columns and put a merge join on it. Now go back in and try to make an update to the merge join (like pull a new column through). It can take several minutes, even on the fastest machine after you click ok on the merge join to save your change. I have a huge dataflow with lots of wide records and many merge joins. Adding one column to the dataflow takes more than half a day. I update a merge join and then have to go do something else and check back 5-10 minutes later to see if it is done. Microsoft's response to this is to break up your package into multiple packages, place the data in a table or binary between them. Well if you are going to disk between all the steps, you may was well do the whole thing in SQL! One of the main purposes of an ETL tool is to all this stuff in memory and avoid disk I/O.

(b) The designer outright crashes sometimes, losing all your work since last save (I do ctrl-S in my sleep now because of this)

(c) I had to figure out a hack and generate SSIS package XML in Excel for wide records. I have a Healthcare client where 600+ column records are commonplace. If you try to define a file format with 600 columns in SSIS, you have to type every single column in one at a time!!! Even MS access allows you to cut and paste a layout from a spreadsheet into a file layout, but not SSIS. So I had to generate the XML from the layout and paste the XML code into the right place in the package. Ugly way to do it but it saved entire days of work and lots of errors.

(d) Similar to (c), if you need to trim all your columns and you have say 600+ of them, guess what? In the derived column component, you have to type trim(column1) 600+ times! I now do all simple transforms like this in the SQL query to get the data, since that can easily be generated from an Excel sheet.

(e) There are many quirky things, components that turn invisible, sometimes you open the package and all the components are completely re-arranged incoherently.

(f) The FTP feature, possibly one of the most common things you need in ETL, is weak and only supports plain vanilla FTP which nobody uses. Everyone these days uses SFTP, FTPS, https, etc... So almost every implementation requires using a 3rd party commend line driven file transfer app the package has to call.

(g) Trying to CYA, similar to the ridiculous security in Windows Vista, Microsoft has made it exceedingly difficult to actually promote an SSIS package from one environment to another. It defaults to this stupid thing of "encrypting sensitive information with user key" security which means it must run under the same account in the environment you are moving it to as the environment you developed it, something that is rarely the case. There are better ways to configure but it always try to revert to this completely useless security protection.

(h) Lastly most of these problems are now in there 3rd version, clearly indicating Microsoft has no plan to fix them.

(i) Debugging is not nearly as easy as other languages.

SSIS still has a great many benefits, but not without some serious pain.