"Reduce" function for Series

python performance pandas vectorization reduce

With `itertools.chain()` on the values

This could be faster:

    from itertools import chain

    categories = list(chain.from_iterable(categories.values))
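As a minimal, self-contained sketch of the same idea (the sample data and the name `flat` are made up for illustration, not taken from the question):

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data: a Series whose elements are Python lists.
categories = pd.Series([['a', 'b'], ['c', 'd', 'e']])

# chain.from_iterable walks each inner list in turn;
# list() materializes the flattened result.
flat = list(chain.from_iterable(categories.values))
print(flat)  # ['a', 'b', 'c', 'd', 'e']
```

The result is stored under a separate name here so the original Series is not overwritten.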

Performance

    from functools import reduce
    from itertools import chain

    import pandas as pd

    categories = pd.Series([['a', 'b'], ['c', 'd', 'e']] * 1000)

    %timeit list(chain.from_iterable(categories.values))
    1000 loops, best of 3: 231 µs per loop

    %timeit list(chain(*categories.values.flat))
    1000 loops, best of 3: 237 µs per loop

    %timeit reduce(lambda l1, l2: l1 + l2, categories)
    100 loops, best of 3: 15.8 ms per loop

For this data set, chaining is about 68x faster than `reduce`.
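One likely reason for the gap: `l1 + l2` builds a brand-new list on every step, so `reduce` copies the accumulated elements again and again, while `chain` visits each element once. The sketch below repeats the comparison with the standard `timeit` module, so it also runs outside IPython. The helper names `flatten_chain` and `flatten_reduce` are hypothetical, and the numbers will differ from machine to machine.

```python
from functools import reduce
from itertools import chain
from timeit import timeit

import pandas as pd

categories = pd.Series([['a', 'b'], ['c', 'd', 'e']] * 1000)


def flatten_chain(series):
    # Visits every element exactly once.
    return list(chain.from_iterable(series.values))


def flatten_reduce(series):
    # Each `+` allocates a new list and copies everything accumulated so far.
    return reduce(lambda l1, l2: l1 + l2, series)


# Both approaches should produce the same flat list.
assert flatten_chain(categories) == flatten_reduce(categories)

runs = 10
t_chain = timeit(lambda: flatten_chain(categories), number=runs) / runs
t_reduce = timeit(lambda: flatten_reduce(categories), number=runs) / runs
print(f"chain:  {t_chain * 1e6:.0f} µs per call")
print(f"reduce: {t_reduce * 1e6:.0f} µs per call")
```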

Vectorization?

Vectorization works when you have native NumPy data types (pandas uses NumPy for its data, after all). Since the Series already holds lists and we want a list as the result, vectorization is unlikely to speed things up. The conversion between standard Python objects and pandas/NumPy data types will likely eat up any performance you might gain from vectorization. I made one attempt to vectorize the algorithm in another answer.
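To illustrate the difference between native and object data, here is a small sketch with made-up sample values. A numeric Series gets a NumPy dtype, while a Series of lists falls back to `object`, which means each element is an ordinary Python list:

```python
import numpy as np
import pandas as pd

# Native numeric data: stored as a NumPy float64 array.
numbers = pd.Series([1.0, 2.0, 3.0])
print(numbers.dtype)  # float64
total = numbers.sum()  # runs in compiled code over the whole array

# Lists as elements: pandas can only store references to Python objects.
lists = pd.Series([['a', 'b'], ['c', 'd', 'e']])
print(lists.dtype)  # object
print(type(lists.values[0]))  # <class 'list'>
```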


Vectorized but slow

You can use NumPy's concatenate:

    import numpy as np

    list(np.concatenate(categories.values))

Performance

But we already have lists, i.e. Python objects. So the vectorization has to switch back and forth between Python objects and NumPy data types. This makes things slow:

    from itertools import chain

    import numpy as np
    import pandas as pd

    categories = pd.Series([['a', 'b'], ['c', 'd', 'e']] * 1000)

    %timeit list(np.concatenate(categories.values))
    100 loops, best of 3: 7.66 ms per loop

    %timeit np.concatenate(categories.values)
    100 loops, best of 3: 5.33 ms per loop

    %timeit list(chain.from_iterable(categories.values))
    1000 loops, best of 3: 231 µs per loop
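The round trip can be seen in a small sketch (sample data made up for illustration): `np.concatenate` first turns each Python list into a NumPy string array, and getting a plain list back requires converting every element again.

```python
import numpy as np
import pandas as pd

categories = pd.Series([['a', 'b'], ['c', 'd', 'e']])

# Each inner list is converted to a NumPy array before concatenation.
arr = np.concatenate(categories.values)
print(type(arr))  # <class 'numpy.ndarray'>
print(arr.dtype.kind)  # 'U' (fixed-width Unicode strings)

# tolist() converts the elements back to plain Python str objects.
as_list = arr.tolist()
print(as_list)  # ['a', 'b', 'c', 'd', 'e']
```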


You can try your luck with `business["categories"].str.join('')`, but I am guessing that pandas uses Python's string functions. I doubt you can do better than what Python already offers you.
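For reference, a small sketch of what `Series.str.join` does when the elements are lists. The `business` DataFrame below is a hypothetical stand-in for the asker's data. Note that the join happens within each row, so the result is one string per row rather than a single flat list:

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame.
business = pd.DataFrame({"categories": [['a', 'b'], ['c', 'd', 'e']]})

# Joins the list in each row into one string, row by row.
joined = business["categories"].str.join('')
print(joined.tolist())  # ['ab', 'cde']
```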
