Randomly insert NA values in a pandas DataFrame


Here's a way to clear exactly 10% of cells (or rather, as close to 10% as can be achieved with the existing data frame's size).

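import random
import numpy as np
# every (row, col) position in the frame
ix = [(row, col) for row in range(df.shape[0]) for col in range(df.shape[1])]
# pick 10% of the positions without replacement and blank them
for row, col in random.sample(ix, int(round(.1*len(ix)))):
    df.iat[row, col] = np.nan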
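For larger frames the Python-level loop above can get slow. A vectorized variant of the same idea might look like the sketch below. The helper name `insert_nan_exact`, its `frac` and `seed` parameters, and the toy frame are made up for illustration, and it assumes the columns can hold NaN (float data here).

```python
import numpy as np
import pandas as pd

def insert_nan_exact(df, frac=0.1, seed=None):
    """Return a copy of df with exactly round(frac * df.size) cells set to NaN."""
    rng = np.random.default_rng(seed)
    n_missing = int(round(frac * df.size))
    # Draw distinct flat cell positions, then convert them to (row, col) pairs.
    flat = rng.choice(df.size, size=n_missing, replace=False)
    rows, cols = np.unravel_index(flat, df.shape)
    mask = np.zeros(df.shape, dtype=bool)
    mask[rows, cols] = True
    # DataFrame.mask replaces the cells where the condition is True with NaN.
    return df.mask(mask)

# Hypothetical 20 x 10 float frame, 200 cells in total.
df = pd.DataFrame(np.arange(200, dtype=float).reshape(20, 10))
out = insert_nan_exact(df, frac=0.1, seed=0)
print(out.isna().sum().sum())  # 20 cells out of 200
```

Unlike the loop, this returns a new frame and leaves the original untouched.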

Here's a way to clear cells independently with a per-cell probability of 10%.

df = df.mask(np.random.random(df.shape) < .1)
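To see what "per-cell probability" means in practice, here is a small reproducible sketch. The seed, the frame shape, and the column names are arbitrary choices for the example, not anything from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility
df = pd.DataFrame(rng.normal(size=(1000, 5)), columns=list("abcde"))

# Each cell is masked independently with probability 0.1.
masked = df.mask(rng.random(df.shape) < 0.1)

# The realized fraction is close to 0.1 but generally not exactly 0.1.
frac_missing = masked.isna().mean().mean()
print(frac_missing)
```

Because every cell is drawn independently, the number of NaNs varies from run to run and from column to column, which is the main difference from the "exactly 10%" approach.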


I think you can simply iterate over the data frame's columns and, for each column, assign NaN to the rows returned by the pandas.DataFrame.sample() method. Each column gets its own random 10% of rows.

The code is as follows.

import numpy as np
for col in df.columns:
    df.loc[df.sample(frac=0.1).index, col] = np.nan
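A quick check on a toy frame shows what this produces. The frame and column names below are invented for the example, and the sketch assumes the index is unique, since `df.loc` would otherwise blank every row that shares a sampled label.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 50 rows, 3 float columns, default (unique) RangeIndex.
df = pd.DataFrame(np.ones((50, 3)), columns=["x", "y", "z"])

for col in df.columns:
    # Draw a fresh 10% of the rows for every column.
    rows = df.sample(frac=0.1).index
    df.loc[rows, col] = np.nan

# Every column ends up with 5 missing values, in different rows per column.
print(df.isna().sum())
```

So the missing fraction is fixed per column rather than per cell or over the whole frame.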


To add to and modify @Jaroslav Bezděk's code a bit, here is my view. Here, I am assuming that you want to apply the NaNs to numeric variables.

# select only numeric columns to apply the missingness to
cols_list = df.select_dtypes('number').columns.tolist()
# randomly remove cases from the dataframe
for col in df[cols_list]:
    df.loc[df.sample(frac=0.05).index, col] = np.nan
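Here is a self-contained sketch of the numeric-only idea on a mixed-type frame. The column names and values are made up, and the cast to float is my own precaution so that integer columns can store NaN without relying on implicit upcasting.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one string column and two numeric columns.
df = pd.DataFrame({
    "id": [f"r{i}" for i in range(100)],
    "height": np.linspace(150.0, 200.0, 100),
    "count": np.arange(100),
})

numeric_cols = df.select_dtypes("number").columns.tolist()
# Cast to float up front so NaN fits in every numeric column.
df[numeric_cols] = df[numeric_cols].astype(float)

for col in numeric_cols:
    # 5% of the rows, drawn separately for each numeric column.
    df.loc[df.sample(frac=0.05).index, col] = np.nan

print(df.isna().sum())
```

The string column is never touched, while each numeric column loses 5 of its 100 values.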

Note: if you use pd.np.nan, you get the following warning:

<ipython-input-5-e9827aa92133>:9: FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead.