numpy merge sorted array to an new array?

You can use

from numpy import concatenate, sortc = concatenate((a,b))c.sort(kind='mergesort')

I am afraid you can't do better than this, unless you write your own sorting function as a python extension, à la cython.

See this question for a similar problem, but keeping only the unique values in the merged array. The benchmarks and comments there are insightful as well.

python sorting numpy

The sortednp package implements an efficient merge of sorted numpy-arrays:

import numpy as npimport sortednpa = np.array([1,3,5])b = np.array([2,4,6])c = sortednp.merge(a, b) # c == np.array([1,2,3,4,5,6])

Inspired by Sander's post, I measured numpy's mergesort (v1.17.4), Sander's answer, and sortednp (v0.2.1) for different array-sizes and ratios of the sizes between a and b using the following code:

from timeit import timeit as timport sortednp as snpimport numpy as npdef numpy_mergesort(a, b):    c = np.concatenate((a,b))    c.sort(kind='mergesort')    return cdef sanders_merge(a, b):    if len(a) < len(b):        b, a = a, b    c = np.empty(len(a) + len(b), dtype=a.dtype)    b_indices = np.arange(len(b)) + np.searchsorted(a, b)    a_indices = np.ones(len(c), dtype=bool)    a_indices[b_indices] = False    c[b_indices] = b    c[a_indices] = a    return cresults = []for size_factor in range(3):    for max_digits in range(3, 8):        size = 10**max_digits        # size difference of a factor 10 here makes the difference!        a = np.arange(size // 10**size_factor, dtype=np.int)        b = np.arange(size, dtype=np.int)        assert np.array_equal(numpy_mergesort(a, b), sanders_merge(a, b))        assert np.array_equal(numpy_mergesort(a, b), snp.merge(a, b))        classic = t(lambda: numpy_mergesort(a, b), number=10)        sanders = t(lambda: sanders_merge(a, b), number=10)        snp_result = t(lambda: snp.merge(a, b), number=10)        results.append((size_factor, max_digits, classic, sanders, snp_result))text_format = " ".join(["{:<18}"] * 5)print(text_format.format("log10(size factor)", "log10(max size)", "np mergesort", "Sander's merge", "sortednp"))table_format = "            ".join(["{:.5f}"] * 5)for result in results:    print(table_format.format(*result))

The results show that sortednp consistently is the fastest implementation:

log10(size factor) log10(max size)    np mergesort       Sander's merge     sortednp          0.00000            3.00000            0.00016            0.00062            0.000050.00000            4.00000            0.00135            0.00469            0.000290.00000            5.00000            0.01160            0.03813            0.002920.00000            6.00000            0.14952            0.54160            0.035270.00000            7.00000            2.00566            5.91691            0.671191.00000            3.00000            0.00005            0.00017            0.000021.00000            4.00000            0.00019            0.00058            0.000141.00000            5.00000            0.00304            0.00633            0.001371.00000            6.00000            0.03743            0.06893            0.018281.00000            7.00000            0.62334            1.01523            0.387322.00000            3.00000            0.00004            0.00015            0.000022.00000            4.00000            0.00012            0.00028            0.000132.00000            5.00000            0.00217            0.00275            0.001222.00000            6.00000            0.03457            0.03205            0.015242.00000            7.00000            0.51307            0.50120            0.34335

python sorting numpy

When one array is considerably larger than the other, a decent speedup (5-fold on my pc) can be obtained by doing a np.searchorted, which is limited in speed primarily by searching insertion indices of the smaller array:

import numpy as npdef classic_merge(a, b):    c = np.concatenate((a,b))    c.sort(kind='mergesort')    return cdef new_merge(a, b):    if len(a) < len(b):        b, a = a, b    c = np.empty(len(a) + len(b), dtype=a.dtype)    b_indices = np.arange(len(b)) + np.searchsorted(a, b)    a_indices = np.ones(len(c), dtype=bool)    a_indices[b_indices] = False    c[b_indices] = b    c[a_indices] = a    return c

Timing gives:

from timeit import timeit as tresults = []for size_digits in range(2, 8):    size = 10**size_digits    # size difference of a factor 10 here makes the difference!    a = np.arange(size // 10, dtype=np.int)    b = np.arange(size, dtype=np.int)    classic = t(lambda: classic_merge(a, b), number=10)    new = t(lambda: new_merge(a, b), number=10)    results.append((size_digits, classic, new))if True:    text_format = " ".join(["{:<15}"] * 3)    print(text_format.format("log10(size)", "Classic", "New"))    table_format = "         ".join(["{:.5f}"] * 3)    for result in results:        print(table_format.format(*result))log10(size)     Classic         New            2.00000         0.00009         0.000273.00000         0.00021         0.000304.00000         0.00233         0.000825.00000         0.02827         0.006016.00000         0.33322         0.060597.00000         4.40571         0.86764

When a and b are roughly of equal length differences are smaller:

from timeit import timeit as tresults = []for size_digits in range(2, 8):    size = 10**size_digits    # same size    a = np.arange(size , dtype=np.int)    b = np.arange(size, dtype=np.int)    classic = t(lambda: classic_merge(a, b), number=10)    new = t(lambda: new_merge(a, b), number=10)    results.append((size_digits, classic, new))if True:    text_format = " ".join(["{:<15}"] * 3)    print(text_format.format("log10(size)", "Classic", "New"))    table_format = "         ".join(["{:.5f}"] * 3)    for result in results:        print(table_format.format(*result))log10(size)     Classic         New            2.00000         0.00026         0.000873.00000         0.00108         0.001824.00000         0.01257         0.012435.00000         0.16333         0.126926.00000         1.05006         0.491867.00000         8.35967         5.93732

CodeHunter

numpy merge sorted array to an new array?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last