numpy, get maximum of subsets

You can use np.maximum.reduceat:

>>> _, idx = np.unique(g, return_index=True)>>> np.maximum.reduceat(v, idx)array([ 4, 74, 10])

More about the workings of the ufunc reduceat method can be found here.

Remark about performance

np.maximum.reduceat is very fast. Generating the indices idx is what takes most of the time here.

While _, idx = np.unique(g, return_index=True) is an elegant way to get the indices, it is not particularly quick.

The reason is that np.unique needs to sort the array first, which is O(n log n) in complexity. For large arrays, this is much more expensive than using several O(n) operations to generate idx.

Therefore, for large arrays it is much faster to use the following instead:

idx = np.concatenate([[0], 1+np.diff(g).nonzero()[0]])np.maximum.reduceat(v, idx)

python arrays numpy max vectorization

Here's one convoluted vectorized approach using masking and broadcasting that puts each group into rows of a regular 2D array and then finds maximum along each row -

# Mask of valid numbers from each group to be put in a regular 2D arraycounts = np.bincount(g)mask = np.arange(counts.max()) < counts[:,None]# Group each group into rows of a 2D array and find max along ech rowgrouped_2Darray = np.empty(mask.shape)grouped_2Darray.fill(np.nan)grouped_2Darray[mask] = vout = np.nanmax(grouped_2Darray,1)

Sample run -

In [52]: gOut[52]: array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])In [53]: vOut[53]: array([ 1,  2,  3,  4, 74, 73, 72, 71,  9, 10])In [54]: grouped_2Darray # Notice how elements from v are stackedOut[54]: array([[  1.,   2.,   3.,   4.],       [ 74.,  73.,  72.,  71.],       [  9.,  10.,  nan,  nan]])In [55]: np.nanmax(grouped_2Darray,1)Out[55]: array([  4.,  74.,  10.])

python arrays numpy max vectorization

You can create your mask like following and use map function :

>>> mask=np.diff(g)!=0>>> map(np.max,np.split(v,np.where(mask)[0]+1))[4, 74, 10]

If you don't want to get a generator with map you can use a list comprehension to achieve same result in list, and note that the iteration of list comprehension has performed at C language speed inside the interpreter, like built-in functions.

[np.max(arr) for arr in np.split(v,np.where(mask)[0]+1)]

But I think the numpythonic solution still is better to use.

CodeHunter

numpy, get maximum of subsets

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last