Shuffle DataFrame rows

python pandas dataframe permutation shuffle

The idiomatic way to do this with Pandas is to use the .sample method of your dataframe to sample all rows without replacement:

df.sample(frac=1)

The frac keyword argument specifies the fraction of rows to return in the random sample, so frac=1 means return all rows (in random order).
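If you need the same shuffled order on every run, .sample also accepts a random_state argument. A minimal sketch, assuming a small made-up frame (the column names and the seed are placeholders, not part of the question):

```python
import pandas as pd

# Toy frame; the column names are placeholders.
df = pd.DataFrame({"a": range(5), "b": list("vwxyz")})

# frac=1 returns every row exactly once, in random order.
# random_state fixes the order so repeated runs give the same shuffle.
shuffled = df.sample(frac=1, random_state=42)

# Rows keep their original index labels, so the index is a permutation of 0..4.
print(shuffled)
```

Leaving out random_state gives a different order on each call.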

Note: If you wish to shuffle your dataframe in-place and reset the index, you could do, e.g.:

df = df.sample(frac=1).reset_index(drop=True)

Here, specifying drop=True prevents .reset_index from creating a column containing the old index entries.
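The effect of drop=True is easiest to see side by side. This is only an illustration on a hypothetical three-row frame with string labels:

```python
import pandas as pd

# Hypothetical frame with non-default index labels.
df = pd.DataFrame({"a": [10, 20, 30]}, index=["x", "y", "z"])
shuffled = df.sample(frac=1, random_state=0)

kept = shuffled.reset_index()               # old labels move into a new "index" column
dropped = shuffled.reset_index(drop=True)   # old labels are discarded

print(list(kept.columns))     # ['index', 'a']
print(list(dropped.columns))  # ['a']
```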

Follow-up note: Although the above operation may not look in-place, it does not leave you holding two copies of the data. The name df is rebound to a new object (by which I mean id(df_old) is not the same as id(df_new)), and once nothing refers to the old frame its memory can be released. A simple memory profiler run shows that memory usage is essentially unchanged after the shuffle line. Keep in mind that memory_profiler reports usage after each line completes, so this shows the net effect and does not rule out a temporary copy while the line runs:

$ python3 -m memory_profiler .\test.py
Filename: .\test.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     68.5 MiB     68.5 MiB   @profile
     6                             def shuffle():
     7    847.8 MiB    779.3 MiB       df = pd.DataFrame(np.random.randn(100, 1000000))
     8    847.9 MiB      0.1 MiB       df = df.sample(frac=1).reset_index(drop=True)
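If you would rather probe this directly than read a profile, one option is to compare object identity and ask NumPy whether the two frames' value buffers overlap. A rough sketch on a much smaller frame; df_old and df_new are illustrative names, and the result may depend on your pandas version:

```python
import numpy as np
import pandas as pd

df_old = pd.DataFrame(np.random.randn(1000, 3))
df_new = df_old.sample(frac=1, random_state=0).reset_index(drop=True)

# Rebinding produces a distinct Python object.
print(df_old is df_new)  # False

# np.shares_memory reports whether the two value arrays overlap in memory.
print(np.shares_memory(df_old.to_numpy(), df_new.to_numpy()))
```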


You can simply use sklearn for this:

from sklearn.utils import shuffle
df = shuffle(df)


You can shuffle the rows of a dataframe by indexing with a shuffled index. For this, you can use, e.g., np.random.permutation (np.random.choice with replace=False is also a possibility):

In [12]: df = pd.read_csv(StringIO(s), sep="\s+")

In [13]: df
Out[13]:
    Col1  Col2  Col3  Type
0      1     2     3     1
1      4     5     6     1
20     7     8     9     2
21    10    11    12     2
45    13    14    15     3
46    16    17    18     3

In [14]: df.iloc[np.random.permutation(len(df))]
Out[14]:
    Col1  Col2  Col3  Type
46    16    17    18     3
45    13    14    15     3
20     7     8     9     2
0      1     2     3     1
1      4     5     6     1
21    10    11    12     2

If you want to keep a consecutive index as in your example, you can simply reset the index: df_shuffled.reset_index(drop=True). Note that pandas numbers the new index from 0, i.e. 0, 1, ..., n-1.
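Put together as a standalone script, the positional-index approach might look like the following. This sketch swaps in NumPy's seeded Generator (np.random.default_rng) in place of the np.random.permutation call used above, purely so the result is repeatable. The three-row frame is a cut-down stand-in for the example data, and the seed is arbitrary:

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the example frame, with a non-consecutive index.
df = pd.DataFrame({"Col1": [1, 4, 7], "Type": [1, 1, 2]}, index=[0, 1, 20])

rng = np.random.default_rng(0)  # seeded generator; the seed is arbitrary

# rng.permutation(n) is a random ordering of the positions 0..n-1.
df_shuffled = df.iloc[rng.permutation(len(df))]

# Same idea with choice: draw all n positions without replacement.
df_choice = df.iloc[rng.choice(len(df), size=len(df), replace=False)]

# Discard the old labels to get a fresh 0..n-1 index.
df_reset = df_shuffled.reset_index(drop=True)
print(df_reset)
```

Using .iloc here matters because the permutation holds positions, not index labels.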
