pandas.apply expects output shape (Shape of passed values is (x,), indices imply (x,y))


The apply(func) method loops over the rows (or columns) and applies func to each one. The results of func are then collected into a new DataFrame or Series. If func returns a scalar (as sum does, for example), the result is a Series. If it returns an array, list or Series, the result is a DataFrame whose dimensions depend on the length of the returned value.
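To make the scalar case and the Series case concrete, here is a minimal sketch on a made-up frame. The Start/End column names follow the ones used further down in this answer; the data, the lo/hi labels and the variable names are only for illustration, and axis=1 is passed so that func receives rows.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
df = pd.DataFrame({"Start": [1, 4, 10], "End": [5, 6, 12]})

# func returns a scalar per row -> apply collects the results into a Series.
lengths = df.apply(lambda row: row["End"] - row["Start"] + 1, axis=1)

# func returns a Series per row -> apply stacks the results into a DataFrame,
# using the index of the returned Series as column labels.
bounds = df.apply(
    lambda row: pd.Series({"lo": row["Start"], "hi": row["End"]}), axis=1
)

print(lengths.tolist())      # [5, 3, 3]
print(list(bounds.columns))  # ['lo', 'hi']
print(bounds.shape)          # (3, 2)
```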

In your code, func returns arrays of different lengths (one per interval), and these cannot be assembled into a frame. Hence the error. (Actually, the first error you get is probably something like this: ValueError: could not broadcast input array from shape (5) into shape (9).)

The line

return reduce(union1d,intervals),'foobar'

returns a tuple, so the result of apply is a Series. And

return [reduce(union1d,intervals),'foobar']

returns a list of length 2, so here you get an n x 2 DataFrame. Because these dimensions coincide with those of the input frame, pandas assumes you wanted to modify the cells of your original frame (something like applying lambda x: 2*x) and keeps its column names.
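The tuple-versus-list distinction can be checked on a toy frame. This is only a sketch, and it assumes pandas 0.23 or later: in those versions a list result is kept as a Series of lists by default, and it is spread across columns only when result_type="expand" is passed. The frame and the two helper functions are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
df = pd.DataFrame({"Start": [1, 4, 10], "End": [5, 6, 12]})

def as_tuple(row):
    # A tuple per row: apply keeps each tuple as one element of a Series.
    return row["Start"], "foobar"

def as_list(row):
    # A list of length 2 per row.
    return [row["Start"], "foobar"]

tuples = df.apply(as_tuple, axis=1)
# Assumption: pandas >= 0.23, where result_type="expand" turns
# list-like results into columns.
expanded = df.apply(as_list, axis=1, result_type="expand")

print(type(tuples).__name__)  # Series
print(expanded.shape)         # (3, 2)
```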

A solution that would probably work is to change range(x, y) in your function to tuple(range(x, y)), but that is neither efficient nor Pythonic. A better option is to replace apply with an explicit loop over the rows, for example:

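import numpy as np

def coverage(table):
    # Collect every integer covered by any [Start, End] interval (End inclusive).
    intervals = []
    for row in table.itertuples():
        intervals += list(range(row.Start, row.End + 1))
    # np.unique sorts the values and drops duplicates from overlapping intervals.
    return np.unique(intervals)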
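For a quick check, the same loop can be run on a small made-up table. The variant below is a sketch rather than the only way to write it: it builds one np.arange per row and concatenates them, which avoids growing a Python list, and it assumes the columns are named Start and End and hold integers.

```python
import numpy as np
import pandas as pd

def coverage(table):
    # One integer range per row, End inclusive, as in the loop above.
    pieces = [np.arange(row.Start, row.End + 1) for row in table.itertuples()]
    if not pieces:
        # Empty table: nothing is covered.
        return np.array([], dtype=int)
    # Merge all ranges, then sort and drop duplicates.
    return np.unique(np.concatenate(pieces))

# Hypothetical toy data: the first two intervals overlap.
table = pd.DataFrame({"Start": [1, 4, 10], "End": [5, 6, 12]})
result = coverage(table)
print(result.tolist())  # [1, 2, 3, 4, 5, 6, 10, 11, 12]
```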