Update a dataframe in pandas while iterating row by row

You can assign values in the loop using df.set_value:

    for i, row in df.iterrows():
        ifor_val = something
        if <condition>:
            ifor_val = something_else
        df.set_value(i, 'ifor', ifor_val)

If you don't need the row values you could simply iterate over the indices of df, but I kept the original for-loop in case you need the row value for something not shown here.

Update

df.set_value() has been deprecated since version 0.21.0 (and was removed in pandas 1.0). You can use the df.at[] accessor instead:

    for i, row in df.iterrows():
        ifor_val = something
        if <condition>:
            ifor_val = something_else
        df.at[i, 'ifor'] = ifor_val
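As a concrete illustration of the pattern above, here is a minimal runnable sketch. The `score` column, the threshold of 50, and the "pass"/"fail" labels are made up for the example; only `iterrows` and `at` come from the answer.

```python
import pandas as pd

# Hypothetical data: the 'score' column and the threshold are assumptions.
df = pd.DataFrame({"score": [35, 80, 62]})
df["ifor"] = ""  # create the target column before the loop

for i, row in df.iterrows():
    ifor_val = "fail"
    if row["score"] >= 50:
        ifor_val = "pass"
    # Write through the DataFrame itself, not through the row copy.
    df.at[i, "ifor"] = ifor_val

print(df)
```

The key point is that the write goes through `df.at[i, ...]`, using the index label that `iterrows` yields.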

python pandas updates dataframe

A Pandas DataFrame object should be thought of as a Series of Series. In other words, you should think of it in terms of columns. This matters because when you use pd.DataFrame.iterrows you are iterating through rows as Series. But these are not the Series that the data frame is storing; they are new Series that are created for you while you iterate. That means that when you attempt to assign to them, those edits won't end up reflected in the original data frame.
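A small sketch can make this visible. It assumes a frame with mixed column types (the `name` and `ifor` columns here are invented for the demonstration), where each row yielded by `iterrows` is a freshly built Series:

```python
import pandas as pd

# Hypothetical mixed-dtype frame; column names are assumptions.
df = pd.DataFrame({"name": ["a", "b"], "ifor": [1, 2]})

# Assigning to the row only changes the temporary Series.
for i, row in df.iterrows():
    row["ifor"] = 100

unchanged = df["ifor"].tolist()  # the frame still holds the old values

# Assigning through the frame itself does change it.
for i in df.index:
    df.at[i, "ifor"] = 100

changed = df["ifor"].tolist()
print(unchanged, changed)
```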

Ok, now that that is out of the way: What do we do?

Suggestions prior to this post include:

pd.DataFrame.set_value is deprecated as of Pandas version 0.21
pd.DataFrame.ix is deprecated
pd.DataFrame.loc is fine but can work on array indexers and you can do better

My recommendation
Use pd.DataFrame.at

    for i in df.index:
        if <something>:
            df.at[i, 'ifor'] = x
        else:
            df.at[i, 'ifor'] = y

You can even change this to:

    for i in df.index:
        df.at[i, 'ifor'] = x if <something> else y
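Since `at` is label-based, the same loop works when the index is not a simple range. A minimal sketch, assuming a made-up frame indexed by names with a `score` column:

```python
import pandas as pd

# Hypothetical frame with string index labels; all names are assumptions.
df = pd.DataFrame({"score": [35, 80, 62]}, index=["ann", "bob", "cy"])
df["ifor"] = ""  # create the target column before the loop

for i in df.index:
    # df.at reads and writes a single cell by (row label, column label).
    df.at[i, "ifor"] = "pass" if df.at[i, "score"] >= 50 else "fail"

print(df)
```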

Response to comment

and what if I need to use the value of the previous row for the if condition?

    for i in range(1, len(df) + 1):
        j = df.columns.get_loc('ifor')
        if <something>:
            df.iat[i - 1, j] = x
        else:
            df.iat[i - 1, j] = y
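One way to read the positional loop above is sketched here with an actual previous-row condition. The `value` column, the "up"/"down" labels, and the "first" placeholder for the row without a predecessor are assumptions for illustration, and this variant starts at position 1 so that `i - 1` is always a valid previous row:

```python
import pandas as pd

# Hypothetical data; column names and labels are assumptions.
df = pd.DataFrame({"value": [3, 5, 4, 8]})
df["ifor"] = "first"  # the first row has no previous row to compare with

# iat is purely positional, so look up the column positions once.
v = df.columns.get_loc("value")
j = df.columns.get_loc("ifor")

for i in range(1, len(df)):
    prev, cur = df.iat[i - 1, v], df.iat[i, v]
    df.iat[i, j] = "up" if cur > prev else "down"

print(df)
```

Because `iat` takes integer positions, this works regardless of what the index labels are.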


Another method you can use is itertuples(). It iterates over DataFrame rows as namedtuples, with the index value as the first element of the tuple, and it is much faster than iterrows(). Each row carries its Index in the DataFrame, so you can use at (or loc) to set the value.

    for row in df.itertuples():
        if <something>:
            df.at[row.Index, 'ifor'] = x
        else:
            df.at[row.Index, 'ifor'] = y
        # df.loc[row.Index, 'ifor'] = ... also works, but is slower than at

In most cases, itertuples() is faster than iat or at.

Thanks @SantiStSupery, using .at is much faster than loc.
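A runnable version of the itertuples() approach might look like the sketch below. The `temp` column, the threshold, and the "hot"/"cold" labels are invented for the example; `row.Index` and attribute access by column name are what itertuples() provides by default.

```python
import pandas as pd

# Hypothetical data; the 'temp' column and threshold are assumptions.
df = pd.DataFrame({"temp": [12, 31, 25]})
df["ifor"] = ""  # create the target column before the loop

for row in df.itertuples():
    # row.Index is the index label; row.temp is the value in the 'temp' column.
    if row.temp >= 25:
        df.at[row.Index, "ifor"] = "hot"
    else:
        df.at[row.Index, "ifor"] = "cold"

print(df)
```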
