indices of the k largest elements in an unsorted length n array indices of the k largest elements in an unsorted length n array arrays arrays

indices of the k largest elements in an unsorted length n array


Here is my implementation that does what I want and I think is reasonably efficient:

#include <queue>#include <vector>// maxindices.cc// compile with:// g++ -std=c++11 maxindices.cc -o maxindicesint main(){  std::vector<double> test = {0.2, 1.0, 0.01, 3.0, 0.002, -1.0, -20};  std::priority_queue<std::pair<double, int>> q;  for (int i = 0; i < test.size(); ++i) {    q.push(std::pair<double, int>(test[i], i));  }  int k = 3; // number of indices we need  for (int i = 0; i < k; ++i) {    int ki = q.top().second;    std::cout << "index[" << i << "] = " << ki << std::endl;    q.pop();  }}

which gives output:

index[0] = 3index[1] = 1index[2] = 0


This should be an improved version of @hazelnusse which is executed in O(nlogk) instead of O(nlogn)

#include <queue>#include <iostream>#include <vector>// maxindices.cc// compile with:// g++ -std=c++11 maxindices.cc -o maxindicesint main(){  std::vector<double> test = {2, 8, 7, 5, 9, 3, 6, 1, 10, 4};  std::priority_queue< std::pair<double, int>, std::vector< std::pair<double, int> >, std::greater <std::pair<double, int> > > q;    int k = 5; // number of indices we need  for (int i = 0; i < test.size(); ++i) {    if(q.size()<k)        q.push(std::pair<double, int>(test[i], i));    else if(q.top().first < test[i]){        q.pop();        q.push(std::pair<double, int>(test[i], i));    }  }  k = q.size();  std::vector<int> res(k);  for (int i = 0; i < k; ++i) {    res[k - i - 1] = q.top().second;    q.pop();  }  for (int i = 0; i < k; ++i) {    std::cout<< res[i] <<std::endl;  }}

8 4 1 2 6


The question has the partial answer; that is std::nth_element returns the "the n-th statistic" with a property that none of the elements preceding nth one are greater than it, and none of the elements following it are less.

Therefore, just one call to std::nth_element is enough to get the k largest elements. Time complexity will be O(n) which is theoretically the smallest since you have to visit each element at least one time to find the smallest (or in this case k-smallest) element(s). If you need these k elements to be ordered, then you need to order them which will be O(k log(k)). So, in total O(n + k log(k)).