
Apache Kylin fault tolerance


There are two kinds of faults to consider: data faults and system faults.

Data fault tolerance: Kylin partitions a cube into segments and allows an individual segment to be rebuilt without impacting the whole cube. For example, assume a new daily segment is built every day and the daily segments are merged into a weekly segment on the weekend; weekly segments are in turn merged into a monthly segment, and so on. When there is a data error (or any other change) within the current week, you need to rebuild only that day's segment. Data changes further back will require rebuilding a weekly or monthly segment.
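To make the cost difference concrete, here is a minimal Python sketch, assuming segments are modelled as half-open date ranges. The `Segment` class, the `segment_covering` helper and the segment layout are hypothetical illustrations, not Kylin's API; the point is only that a change is repaired by rebuilding the one segment whose range covers it, so the rebuild cost grows with that segment's size.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Segment:
    name: str
    start: date  # inclusive
    end: date    # exclusive

def segment_covering(segments: list[Segment], changed: date) -> Segment:
    """Return the one segment whose [start, end) range contains the changed date."""
    for seg in segments:
        if seg.start <= changed < seg.end:
            return seg
    raise LookupError(f"no segment covers {changed}")

# Hypothetical layout: January already merged into a monthly segment, the first
# week of February merged into a weekly segment, recent days still daily segments.
segments = [
    Segment("2024-01 (monthly)", date(2024, 1, 1), date(2024, 2, 1)),
    Segment("2024-02-01..07 (weekly)", date(2024, 2, 1), date(2024, 2, 8)),
    Segment("2024-02-08 (daily)", date(2024, 2, 8), date(2024, 2, 9)),
    Segment("2024-02-09 (daily)", date(2024, 2, 9), date(2024, 2, 10)),
]

for changed in (date(2024, 2, 9), date(2024, 1, 15)):
    seg = segment_covering(segments, changed)
    print(f"{changed}: rebuild {seg.name}, {(seg.end - seg.start).days} day(s) of data")
```

In this layout a fix to 2024-02-09 touches one day of data, while a fix to 2024-01-15 means reprocessing all 31 days of January.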

The segment strategy is fully customizable, so you can balance data error tolerance against query performance. More segments mean greater tolerance to data changes, but also more scans to execute for each query. Kylin provides a RESTful API, so an external scheduling system can invoke it to trigger segment builds and merges.
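As a rough sketch of what such a scheduler might send, the Python below prepares (without sending) a request for the cube build endpoint as described in the Kylin REST API documentation: `PUT /kylin/api/cubes/{cubeName}/build` with `startTime`/`endTime` in epoch milliseconds and a `buildType` of `BUILD`, `MERGE` or `REFRESH`. Treat the path, port 7070, the `ADMIN`/`KYLIN` credentials, the cube name `sales_cube` and the helper name `segment_job_request` as assumptions to verify against the REST API documentation for your Kylin version.

```python
import base64
import json
from datetime import datetime, timezone
from urllib.request import Request

def segment_job_request(base_url: str, cube: str, start: datetime, end: datetime,
                        build_type: str, user: str, password: str) -> Request:
    """Prepare, but do not send, a segment BUILD / MERGE / REFRESH request.

    Assumed endpoint: PUT {base_url}/cubes/{cube}/build with a JSON body holding
    startTime and endTime (epoch milliseconds) and buildType.
    """
    if build_type not in ("BUILD", "MERGE", "REFRESH"):
        raise ValueError(f"unsupported buildType: {build_type!r}")
    body = {
        "startTime": round(start.timestamp() * 1000),
        "endTime": round(end.timestamp() * 1000),
        "buildType": build_type,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{base_url}/cubes/{cube}/build",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
        method="PUT",
    )

# Example: a nightly job refreshing one day's segment of a hypothetical cube.
req = segment_job_request(
    "http://localhost:7070/kylin/api", "sales_cube",
    datetime(2024, 2, 9, tzinfo=timezone.utc),
    datetime(2024, 2, 10, tzinfo=timezone.utc),
    "REFRESH", "ADMIN", "KYLIN",
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually submit the job.
```

Using timezone-aware datetimes keeps the segment boundaries from silently shifting with the scheduler host's local time zone.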

A cube stays online and can serve queries while some of its segments are being rebuilt.
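A toy model of that behaviour, assuming the rebuilt segment replaces the old one only once its build has succeeded, might look like the Python below. `Cube`, `refresh` and `query_total` are hypothetical names and no Kylin API is involved; queries read only segments that are ready, so a rebuild in progress (or a failed one) never takes the cube offline.

```python
import threading
from typing import Callable

class Cube:
    """Toy cube: segment name -> pre-aggregated total (hypothetical model)."""

    def __init__(self, segments: dict[str, int]):
        self.ready = dict(segments)      # only ready segments are visible to queries
        self._lock = threading.Lock()    # makes the swap atomic w.r.t. queries

    def query_total(self) -> int:
        with self._lock:
            return sum(self.ready.values())

    def refresh(self, name: str, rebuild: Callable[[], int]) -> None:
        # Build the replacement off to the side while the old segment keeps serving.
        new_value = rebuild()            # if this raises, the old segment is untouched
        with self._lock:
            self.ready[name] = new_value # swap in only after the build succeeded

cube = Cube({"2024-02-08": 100, "2024-02-09": 50})
seen_during_rebuild = []

def rebuild_feb_9() -> int:
    seen_during_rebuild.append(cube.query_total())  # a query arriving mid-rebuild
    return 70                                       # corrected total for that day

def failing_rebuild() -> int:
    raise RuntimeError("simulated build failure")

cube.refresh("2024-02-09", rebuild_feb_9)
print(seen_during_rebuild, cube.query_total())  # [150] 170

try:
    cube.refresh("2024-02-08", failing_rebuild)
except RuntimeError:
    pass
print(cube.query_total())  # 170: the failed rebuild never replaced anything
```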

System fault tolerance: Kylin relies on Hadoop and HBase for most of its system redundancy and fault tolerance. In addition, every build step in Kylin is idempotent, meaning you can safely retry a failed step without any side effects. This ensures the final result is correct no matter how many failures and retries the build process has gone through.
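One common way to get that property is to make each step depend only on its input and overwrite, rather than append to, its output. The Python below is a hypothetical illustration of the idea, not Kylin's job engine: `build_step` writes to a scratch file and renames it into place, `TransientFailure` stands in for a crashed task, and `run_with_retries` simply reruns the step until it succeeds.

```python
import os
import tempfile

class TransientFailure(Exception):
    """Stands in for a task killed by a node crash, timeout, etc."""

def build_step(rows: list[int], out_path: str, fail_midway: bool = False) -> None:
    """Hypothetical idempotent step: same input -> same output file, however often it runs.

    Output goes to a scratch file that is truncated on every run and only then moved
    over out_path, so a failed attempt leaves no partial result and a rerun never
    appends duplicate rows.
    """
    tmp_path = out_path + ".tmp"
    with open(tmp_path, "w") as f:  # "w" truncates leftovers from a failed attempt
        for i, row in enumerate(rows):
            if fail_midway and i == len(rows) // 2:
                raise TransientFailure("simulated failure halfway through")
            f.write(f"{row * 2}\n")
    os.replace(tmp_path, out_path)  # atomic rename on POSIX (same filesystem)

def run_with_retries(step, max_attempts: int) -> int:
    """Retry step(attempt) until it succeeds; return the attempt number that did."""
    for attempt in range(1, max_attempts + 1):
        try:
            step(attempt)
            return attempt
        except TransientFailure:
            continue  # safe only because the step is idempotent
    raise RuntimeError(f"step still failing after {max_attempts} attempts")

rows = [1, 2, 3, 4]
with tempfile.TemporaryDirectory() as workdir:
    out = os.path.join(workdir, "step.out")
    # Attempts 1 and 2 die halfway; attempt 3 succeeds.
    attempts_used = run_with_retries(
        lambda attempt: build_step(rows, out, fail_midway=attempt < 3), max_attempts=5)
    with open(out) as f:
        first_result = f.read()
    build_step(rows, out)  # running the successful step again changes nothing
    with open(out) as f:
        rerun_result = f.read()

print(attempts_used, first_result == rerun_result)  # 3 True
```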

(I'm also an Apache Kylin co-creator and committer. :-)


Note: I'm an Apache Kylin co-creator and committer.

Fault tolerance is a really good point; we have actually been asked about it in some cases where users have extremely large datasets. Recalculating from the beginning would require huge computing resources, network traffic and time.

But from a product perspective, the question is: which is more important, a precise result or resources? For transaction data, I believe the exact number matters more, but for behavior data an approximation should be fine; for example, distinct count values in Kylin are currently approximate results. It depends on the kind of use case for which you leverage Kylin to serve your business needs.
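For context, Kylin's approximate distinct count is based on HyperLogLog. The Python class below is a generic textbook HyperLogLog, not Kylin's implementation, shown only to illustrate the precision-versus-resources trade-off: with `p = 12` it keeps 4,096 small registers instead of every distinct value, at a typical relative error of about 1.04 / sqrt(4096) ≈ 1.6%. The class name, the SHA-1 hashing and the parameter choices are assumptions for illustration.

```python
import hashlib
import math

class HyperLogLog:
    """Textbook HyperLogLog distinct-count sketch with 2**p registers."""

    def __init__(self, p: int = 12):
        if p < 7:
            raise ValueError("this alpha formula assumes at least 128 registers (p >= 7)")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias correction for m >= 128

    def add(self, item: str) -> None:
        # 64-bit hash taken from SHA-1: slow, but deterministic and well mixed.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        w = 64 - self.p
        idx = h >> w                       # top p bits pick the register
        rest = h & ((1 << w) - 1)          # remaining w bits
        rank = w - rest.bit_length() + 1   # 1-based position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range (linear counting) correction
        return raw

hll = HyperLogLog(p=12)
for i in range(50_000):
    hll.add(f"user-{i}")
estimate = hll.count()

for i in range(50_000):  # re-adding the same values changes nothing
    hll.add(f"user-{i}")
estimate_after_duplicates = hll.count()

print(round(estimate), estimate == estimate_after_duplicates)
```

Raising `p` shrinks the error (roughly 1.04 / sqrt(2^p)) at the cost of more registers, which is the same precision-versus-resources dial discussed above.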

We will put this idea into our backlog and post an update to the Kylin dev mailing list once we have a clearer plan for it.

Thanks.