
High memory/performance critical computing - Architectural approach opinions


Never decide about technology before being sure about the workflow's critical-path

Deciding on a technology first will never help you achieve a ( still unknown ) target.

Without knowing the process critical-path, no one can calculate any speedup from whatever architecture one may aggressively "sell" you, or just "recommend" you to follow because it is "geeky / nerdy / sexy / popular" or whatever one likes to hear.
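
As a concrete illustration of why the critical-path comes first ( an Amdahl's-law style estimate, not something from the original question ): the achievable speedup is bounded by the fraction of the workflow that can actually be parallelised, so until that fraction is measured, any promised speedup number is just a guess.

```typescript
// Amdahl's-law style estimate: the best speedup N processors can deliver
// depends entirely on the parallelisable fraction p of the workflow.
function amdahlSpeedup(p: number, n: number): number {
  return 1 / ((1 - p) + p / n);
}

console.log(amdahlSpeedup(0.50, 1000).toFixed(2)); // ~2.00  -- half the path stays serial
console.log(amdahlSpeedup(0.95, 1000).toFixed(2)); // ~19.63 -- even 95 % parallel caps out fast
```

In other words, a thousand CPUs buy almost nothing if the critical-path itself stays serial -- which is exactly why it has to be known before any technology gets chosen.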

What would you get from such a premature decision?

Typically a mix of both the budgeting ( co$t$ ) and Project-Management ( sliding time-scales ) nightmares:

  • additional costs ( new technology also means new skills to learn, new training costs and new delays while the team re-shapes, re-adjusts and grows into a mature use of the new technology at performance levels better than those of the currently used tools, etc. )
  • risks of choosing a "popular" brand which, on the other hand, does not deliver the powers the marketing texts were promising ( yet once the initial costs of entry have been paid, there is no way left but to bear the risk of never achieving the intended target, typically due to overestimated performance benefits, underestimated costs of transformation and heavily underestimated costs of operations & maintenance )

What would you say if you could use a solution
where "Better options" remain your options:

  • you can start now, with the code you are currently using, without a single line of code changed
  • you can start now on a gradual path of performance scaling that remains fully under YOUR own free will
  • you can avoid all risks of (mis-)investing into any extra-premium-cost "super-box" and rather stay on the safe side, re-using cheap, massively in-service tested / fine-tuned / deployment-proven COTS hardware units ( common dual-CPU + a-few-GB machines, used by the thousands in datacentres )
  • you can scale up to any level of performance you need, growing CPU-bound processing performance gradually from the start, hassle-free, up to ~1k, ~2k, ~4k, ~8k CPUs as needed -- yes, up to many thousands of CPUs that your current workers' code can use immediately, delivering the benefit of the increased performance right away and leaving your teams free hands and more time for thorough work on design improvements and code re-factoring towards even better performance envelopes, in case the current workflow, "passively" smart-distributed to ~1000, later ~2000 or ~5000 CPU-cores ( still without a single SLOC changed ), does not suffice on its own ( see the worker-pool sketch after this list )
  • you can scale up -- again, gradually, on an as-needed basis, hassle-free -- up to ( almost ) any size of the in-RAM capacity, be it on Day 1 ~8TB, ~16TB, ~32TB, ~64TB, jumping to ~72TB or ~128TB next year, if needed -- all that keeping your budget always ( almost ) linear and fully adjusted by your performance plans and actual customer-generated traffic
  • you can isolate and focus your R&D efforts not on (re-)learning "new" platform(s), but purely on process (re-)design for further increasing process performance ( be it by a strategy of pre-computing, where feasible, be it by smarter fully-in-RAM layouts for even faster ad-hoc calculations that cannot be statically pre-computed )
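
A minimal sketch of the "no code change" scaling idea from the list above ( an illustration only, not a prescription of any particular product; `./existing-worker.js`, the task list and the result handling are placeholders for your own code ): dispatch tasks to a pool of workers that simply wrap the routine you already run today. Locally the pool is one process per core; the very same "send task / collect result" pattern scales out to many machines once the channel becomes a network message queue.

```typescript
// dispatcher.ts -- fan an existing, unchanged CPU-bound routine out across
// the local cores with Node's built-in worker_threads module.
import { Worker } from 'node:worker_threads';
import { cpus } from 'node:os';

const tasks: number[] = Array.from({ length: 1000 }, (_, i) => i); // placeholder workload

for (let i = 0; i < cpus().length; i++) {
  const worker = new Worker('./existing-worker.js'); // hypothetical thin wrapper, see below
  const feed = (): void => {
    const task = tasks.pop();
    if (task === undefined) { void worker.terminate(); return; }
    worker.postMessage(task);
  };
  worker.on('message', (result) => {
    console.log('result:', result); // collect / store the result here
    feed();                         // hand the now-idle worker its next task
  });
  feed();
}

// existing-worker.js -- the thin wrapper: receive a task, call the unchanged
// routine, post the result back; doExistingWork() is whatever you run today.
//
//   const { parentPort } = require('node:worker_threads');
//   parentPort.on('message', (task) => parentPort.postMessage(doExistingWork(task)));
```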

What would business owners say to such an ROI-aligned strategy?

If one makes the CEO + CFO "buy" any new toy, well, that is cool for hacking this today and that tomorrow, but such an approach will never make the shareholders any happier than throwing ( their ) money into the river Nile.

If one can show an ultimately efficient Project plan, where most of the knowledge and skills stay focused on the business-aligned target while at the same time protecting the ROI, that would make your CEO + CFO -- and, I guarantee, all your shareholders as well -- very happy, wouldn't it?

So, which way would you decide to go?


This topic isn't really new, but just in case... As far as my experience can tell, I would say your T-SQL DB might be your bottleneck here.

Have you measured the performance of your SQL queries? What do you compute on the SQL Server side? On the Node.js side?

A good start would be to measure the response time of your SQL queries, revamp them, work on indexes and dig into how your DB's query engine works, if needed. Sometimes a small tuning of the DB settings does the trick!
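
To get numbers instead of guesses, you could time each query on both sides: on the SQL Server side via the actual execution plan and SET STATISTICS TIME ON, and on the Node.js side with a tiny wrapper around whatever client you already use. The sketch below is only an illustration -- `db.query` stands for your existing data-access helper:

```typescript
// Wrap any async query call with a timer to see where the time really goes.
async function timed<T>(label: string, runQuery: () => Promise<T>): Promise<T> {
  const started = process.hrtime.bigint();
  try {
    return await runQuery();
  } finally {
    const elapsedMs = Number(process.hrtime.bigint() - started) / 1e6;
    console.log(`${label}: ${elapsedMs.toFixed(1)} ms`);
  }
}

// Usage, assuming an existing db.query helper in your code base:
// const rows = await timed('orders-by-customer', () => db.query(SQL_TEXT));
```

If the wrapper shows most of the time spent inside the DB call, that points back to the query plan and indexes; if not, the Node.js side deserves the first look.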