How should I get the shape of a dask dataframe?


You can get the number of columns directly

len(df.columns)  # this is fast

You can also call len on the dataframe itself, though beware that this will trigger a computation.

len(df)  # this requires a full scan of the data

Dask.dataframe doesn't know how many records are in your data without first reading through all of it.


With shape you can do the following

a = df.shape
a[0].compute(), a[1]

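This will show the shape just as it is shown with pandas. The first element of df.shape is lazy, so it has to be computed; the column count is already a plain integer.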


I know this is quite an old question, but I had the same issue and found an outside-the-box solution that I want to record here.

I'm guessing your data was originally saved in a CSV-like file. In my situation, I just count the lines of that file (minus one for the header line). Inspired by this answer here, this is the solution I'm using:

import dask.dataframe as dd
from itertools import takewhile, repeat

def rawincount(filename):
    # Count newline bytes by reading the raw file in 1 MiB chunks
    with open(filename, 'rb') as f:
        bufgen = takewhile(lambda x: x, (f.raw.read(1024*1024) for _ in repeat(None)))
        return sum(buf.count(b'\n') for buf in bufgen)

filename = 'myHugeDataframe.csv'
df = dd.read_csv(filename)
# Rows = lines in the file minus the header; columns come from dask's metadata
df_shape = (rawincount(filename) - 1, len(df.columns))
print(f"Shape: {df_shape}")
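One caveat with counting newlines: it assumes every record sits on exactly one line and that the file ends with a newline. A quoted field containing a line break would inflate the count. The sketch below uses only the standard library to compare the two counts on a tiny made-up file. The helper names `count_newlines` and `count_csv_records` and the sample data are hypothetical, not part of the solution above.

```python
import csv
import os
import tempfile

def count_newlines(path):
    # Same idea as rawincount above: count b'\n' bytes in 1 MiB chunks.
    total = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(1024 * 1024)
            if not buf:
                break
            total += buf.count(b"\n")
    return total

def count_csv_records(path):
    # Slower, but parses quoting, so an embedded newline does not split a record.
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f))

# Hypothetical sample: a header and two records, the first with a quoted newline.
sample = 'id,comment\n1,"two\nlines"\n2,plain\n'
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write(sample)
    path = tmp.name

newline_rows = count_newlines(path) - 1    # header subtracted
parsed_rows = count_csv_records(path) - 1  # header subtracted
os.remove(path)

print(newline_rows, parsed_rows)  # 3 2
```

For files without embedded line breaks the two counts agree, and the raw newline count is the much cheaper of the two.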

Hope this helps someone else as well.