Why is `arr.take(idx)` faster than `arr[idx]`?
numpy