Random Sample of a subset of a dataframe in Pandas

You can use the sample method*:

In [11]: df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["A", "B"])

In [12]: df.sample(2)
Out[12]:
   A  B
0  1  2
2  5  6

In [13]: df.sample(2)
Out[13]:
   A  B
3  7  8
0  1  2

*On one of the section DataFrames.
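As a small sketch of two options the session above does not show: sample also accepts random_state, which makes a draw repeatable, and frac, which draws a fraction of the rows rather than a fixed count. The seed value 0 is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["A", "B"])

# The same seed gives the same rows on every call.
first = df.sample(n=2, random_state=0)
second = df.sample(n=2, random_state=0)

# frac draws a fraction of the rows instead of a fixed count (here half of 4 rows).
half = df.sample(frac=0.5, random_state=0)
```

Without random_state, each call draws a different sample, as In [12] and In [13] show.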

Note: If the sample size is larger than the size of the DataFrame, this will raise an error unless you sample with replacement.

In [14]: df.sample(5)
ValueError: Cannot take a larger sample than population when 'replace=False'

In [15]: df.sample(5, replace=True)
Out[15]:
   A  B
0  1  2
1  3  4
2  5  6
3  7  8
1  3  4
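If you would rather not get duplicate rows, one alternative is to cap the sample size at the number of rows available. The helper below is only a sketch; sample_up_to is a hypothetical name, not part of pandas.

```python
import pandas as pd

def sample_up_to(frame, n, random_state=None):
    """Draw n rows without replacement, or all rows (shuffled) if the frame has fewer than n."""
    return frame.sample(n=min(n, len(frame)), random_state=random_state)

df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["A", "B"])

# Asking for 5 rows from a 4-row frame returns the 4 rows in random order.
capped = sample_up_to(df, 5, random_state=0)

# With replacement, the requested size is always honoured, but rows can repeat.
with_repeats = df.sample(5, replace=True, random_state=0)
```

Which of the two is appropriate depends on whether repeated rows are acceptable in your analysis.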


One solution is to use the choice function from numpy.

Say you want 50 entries out of the first 1000 rows; you can use:

import numpy as np
chosen_idx = np.random.choice(1000, replace=False, size=50)
df_trimmed = df.iloc[chosen_idx]

Of course, this does not take your block structure into account. If you want a 50-item sample from block i, for example, you can do:

import numpy as np
block_start_idx = 1000 * i
chosen_idx = np.random.choice(1000, replace=False, size=50)
df_trimmed_from_block_i = df.iloc[block_start_idx + chosen_idx]
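The single-block snippet can be extended to draw from every block at once. The following is a sketch under stated assumptions: blocks are contiguous runs of 1000 rows, and the demo frame, its column names, and the block count of 5 are invented here purely for illustration. It uses NumPy's Generator interface (np.random.default_rng) in place of np.random.choice so that the draw can be seeded.

```python
import numpy as np
import pandas as pd

block_size = 1000  # rows per block, as in the snippet above
n_blocks = 5       # assumed for this demo only
per_block = 50

# Demo frame: "section" records which block each row belongs to.
df = pd.DataFrame({
    "section": np.repeat(np.arange(n_blocks), block_size),
    "value": np.arange(n_blocks * block_size),
})

rng = np.random.default_rng(0)  # seeded so the draw is repeatable

# For each block, draw distinct offsets within the block and shift them to the block's start.
chosen_idx = np.concatenate([
    i * block_size + rng.choice(block_size, size=per_block, replace=False)
    for i in range(n_blocks)
])
df_trimmed = df.iloc[chosen_idx]
```

This relies on positional indexing, so it only works if the rows of each block really are stored next to each other in the DataFrame.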


Thank you, Jeff, but I received an error:

AttributeError: Cannot access callable attribute 'sample' of 'DataFrameGroupBy' objects, try using the 'apply' method

So instead of sample = df.groupby("section").sample(50), I suggest using the command below:

df.groupby('section').apply(lambda grp: grp.sample(50))
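For comparison, here is a sketch of both forms on a small made-up frame (the data and the "value" column are assumptions for illustration). The direct groupby(...).sample call exists only in pandas 1.1.0 and later; on older versions it raises the AttributeError quoted above, and the apply form is the fallback.

```python
import numpy as np
import pandas as pd

# Demo frame: 3 sections of 100 rows each.
df = pd.DataFrame({
    "section": np.repeat(np.arange(3), 100),
    "value": np.arange(300),
})

# apply form: sample inside each group.
# group_keys=False keeps the original row index instead of prepending the group label.
via_apply = df.groupby("section", group_keys=False).apply(
    lambda grp: grp.sample(50)
)

# Direct form, available from pandas 1.1.0 onwards.
via_sample = df.groupby("section").sample(n=50, random_state=0)
```

Both results contain 50 rows per section. Newer pandas versions may warn about, or drop, the grouping column inside apply, so the direct form is the simpler choice when it is available.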
