Rolling median algorithm in C

c algorithm r statistics median

I have looked at R's src/library/stats/src/Trunmed.c a few times as I wanted something similar too in a standalone C++ class / C subroutine. Note that this are actually two implementations in one, see src/library/stats/man/runmed.Rd (the source of the help file) which says

\details{  Apart from the end values, the result \code{y = runmed(x, k)} simply has  \code{y[j] = median(x[(j-k2):(j+k2)])} (k = 2*k2+1), computed very  efficiently.  The two algorithms are internally entirely different:  \describe{    \item{"Turlach"}{is the Härdle-Steiger      algorithm (see Ref.) as implemented by Berwin Turlach.      A tree algorithm is used, ensuring performance \eqn{O(n \log        k)}{O(n * log(k))} where \code{n <- length(x)} which is      asymptotically optimal.}    \item{"Stuetzle"}{is the (older) Stuetzle-Friedman implementation      which makes use of median \emph{updating} when one observation      enters and one leaves the smoothing window.  While this performs as      \eqn{O(n \times k)}{O(n * k)} which is slower asymptotically, it is      considerably faster for small \eqn{k} or \eqn{n}.}  }}

It would be nice to see this re-used in a more standalone fashion. Are you volunteering? I can help with some of the R bits.

Edit 1: Besides the link to the older version of Trunmed.c above, here are current SVN copies of

Srunmed.c (for the Stuetzle version)
Trunmed.c (for the Turlach version)
runmed.R for the R function calling these

Edit 2: Ryan Tibshirani has some C and Fortran code on fast median binning which may be a suitable starting point for a windowed approach.

c algorithm r statistics median

I couldn't find a modern implementation of a c++ data structure with order-statistic so ended up implementing both ideas in top coders link suggested by MAK ( Match Editorial: scroll down to FloatingMedian).

Two multisets

The first idea partitions the data into two data structures (heaps, multisets etc) with O(ln N) per insert/delete does not allow the quantile to be changed dynamically without a large cost. I.e. we can have a rolling median, or a rolling 75% but not both at the same time.

Segment tree

The second idea uses a segment tree which is O(ln N) for insert/deletions/queries but is more flexible. Best of all the "N" is the size of your data range. So if your rolling median has a window of a million items, but your data varies from 1..65536, then only 16 operations are required per movement of the rolling window of 1 million!!

The c++ code is similar to what Denis posted above ("Here's a simple algorithm for quantized data")

GNU Order Statistic Trees

Just before giving up, I found that stdlibc++ contains order statistic trees!!!

These have two critical operations:

iter = tree.find_by_order(value)order = tree.order_of_key(value)

See libstdc++ manual policy_based_data_structures_test (search for "split and join").

I have wrapped the tree for use in a convenience header for compilers supporting c++0x/c++11 style partial typedefs:

#if !defined(GNU_ORDER_STATISTIC_SET_H)#define GNU_ORDER_STATISTIC_SET_H#include <ext/pb_ds/assoc_container.hpp>#include <ext/pb_ds/tree_policy.hpp>// A red-black tree table storing ints and their order// statistics. Note that since the tree uses// tree_order_statistics_node_update as its update policy, then it// includes its methods by_order and order_of_key.template <typename T>using t_order_statistic_set = __gnu_pbds::tree<                                  T,                                  __gnu_pbds::null_type,                                  std::less<T>,                                  __gnu_pbds::rb_tree_tag,                                  // This policy updates nodes'  metadata for order statistics.                                  __gnu_pbds::tree_order_statistics_node_update>;#endif //GNU_ORDER_STATISTIC_SET_H

c algorithm r statistics median

I've done a C implementation here. A few more details are in this question: Rolling median in C - Turlach implementation.

Sample usage:

int main(int argc, char* argv[]){   int i, v;   Mediator* m = MediatorNew(15);    for (i=0; i<30; i++) {      v = rand() & 127;      printf("Inserting %3d \n", v);      MediatorInsert(m, v);      v = MediatorMedian(m);      printf("Median = %3d.\n\n", v);      ShowTree(m);   }}

CodeHunter

Rolling median algorithm in C

Two multisets

Segment tree

GNU Order Statistic Trees

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last