How are iloc and loc different?

Label vs. Location

The main distinction between the two methods is:

loc gets rows (and/or columns) with particular labels.
iloc gets rows (and/or columns) at integer locations.

To demonstrate, consider a series s of characters with a non-monotonic integer index:

>>> import pandas as pd
>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
>>> s
49    a
48    b
47    c
0     d
1     e
2     f
>>> s.loc[0]    # value at index label 0
'd'
>>> s.iloc[0]   # value at index location 0
'a'
>>> s.loc[0:1]  # rows at index labels between 0 and 1 (inclusive)
0    d
1    e
>>> s.iloc[0:1] # rows at index location between 0 and 1 (exclusive)
49    a

Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:

<object>	description	`s.loc[<object>]`	`s.iloc[<object>]`
`0`	single item	Value at index label `0` (the string `'d'`)	Value at index location 0 (the string `'a'`)
`0:1`	slice	Two rows (labels `0` and `1`)	One row (first row at location 0)
`1:47`	slice with out-of-bounds end	Zero rows (empty Series)	Five rows (location 1 onwards)
`1:47:-1`	slice with negative step	Three rows (labels `1` back to `47`)	Zero rows (empty Series)
`[2, 0]`	integer list	Two rows with given labels	Two rows with given locations
`s > 'e'`	Bool series (indicating which values have the property)	One row (containing `'f'`)	`NotImplementedError`
`(s>'e').values`	Bool array	One row (containing `'f'`)	Same as `loc`
`999`	int object not in index	`KeyError`	`IndexError` (out of bounds)
`-1`	int object not in index	`KeyError`	Returns last value in `s`
`lambda x: x.index[3]`	callable applied to series (here returning 3rd item in index)	`s.loc[s.index[3]]`	`s.iloc[s.index[3]]`
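Several rows of the table can be checked directly. The following is a minimal sketch that rebuilds the same Series and records what each indexer returns; the variable names are only for illustration.

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])

# Single item: label 0 versus position 0.
label_hit = s.loc[0]
position_hit = s.iloc[0]

# Slices: loc includes the stop label, iloc excludes the stop position.
loc_slice = s.loc[0:1].tolist()
iloc_slice = s.iloc[0:1].tolist()

# An out-of-bounds stop is fine for iloc slices (location 1 onwards).
iloc_long_slice = s.iloc[1:47].tolist()

# Integer list: labels for loc, positions for iloc.
loc_list = s.loc[[2, 0]].tolist()
iloc_list = s.iloc[[2, 0]].tolist()


def error_name(func):
    """Return the name of the exception raised by func, or None."""
    try:
        func()
    except (KeyError, IndexError) as exc:
        return type(exc).__name__
    return None


# 999 is neither a label nor a valid position.
loc_999 = error_name(lambda: s.loc[999])
iloc_999 = error_name(lambda: s.iloc[999])

# -1 is not a label, but it is a valid position (the last one).
loc_minus_one = error_name(lambda: s.loc[-1])
iloc_minus_one = s.iloc[-1]
```

Here `label_hit` is `'d'` and `position_hit` is `'a'`, matching the first row of the table.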

loc's label-querying capabilities extend well beyond integer indexes, and it's worth highlighting a couple of additional examples.

Here's a Series where the index contains string objects:

>>> s2 = pd.Series(s.index, index=s.values)
>>> s2
a    49
b    48
c    47
d     0
e     1
f     2

Since loc is label-based, it can fetch the first value in the Series using s2.loc['a']. It can also slice with non-integer objects:

>>> s2.loc['c':'e']  # all rows lying between 'c' and 'e' (inclusive)
c    47
d     0
e     1

For DateTime indexes, we don't need to pass the exact date/time to fetch by label. For example:

>>> s3 = pd.Series(list('abcde'), pd.date_range('now', periods=5, freq='M'))
>>> s3
2021-01-31 16:41:31.879768    a
2021-02-28 16:41:31.879768    b
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
2021-05-31 16:41:31.879768    e

Then to fetch the row(s) for March/April 2021 we only need:

>>> s3.loc['2021-03':'2021-04']
2021-03-31 17:04:30.742316    c
2021-04-30 17:04:30.742316    d
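Because the example above is built from 'now', its output changes on every run. A reproducible sketch of the same partial-string lookup, assuming fixed month-end dates chosen only for illustration:

```python
import pandas as pd

# Fixed month-end timestamps (an assumption; the text above uses 'now').
dates = pd.to_datetime(
    ["2021-01-31", "2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"]
)
s3_fixed = pd.Series(list("abcde"), index=dates)

# A partial date string selects every row falling inside that period.
march_only = s3_fixed.loc["2021-03"]

# Partial strings also work as slice endpoints, and both ends are inclusive.
march_to_april = s3_fixed.loc["2021-03":"2021-04"]
```

Here `march_only` is a one-row Series (not a scalar), since the string names a whole month rather than an exact timestamp.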

Rows and Columns

loc and iloc work the same way with DataFrames as they do with Series. It's useful to note that both methods can address columns and rows together.

When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.

Consider the DataFrame defined below:

>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
...                   index=list('abcde'),
...                   columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

Then for example:

>>> df.loc['c': , :'z']  # rows 'c' and onwards AND columns up to 'z'
    x   y   z
c  10  11  12
d  15  16  17
e  20  21  22
>>> df.iloc[:, 3]        # all rows, but only the column at index location 3
a     3
b     8
c    13
d    18
e    23

Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of loc and iloc.

For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?

>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
...                   index=list('abcde'),
...                   columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

We can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
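The get_loc approach is one of several ways to mix the two styles. As a rough sketch, here it is next to two alternatives (chaining the indexers, and converting the column positions to labels), which should produce the same frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# 1. Convert the row label to a position, then use iloc for both axes.
via_get_loc = df.iloc[:df.index.get_loc("c") + 1, :4]

# 2. Chain the indexers: label slice on rows, then positional slice on columns.
via_chain = df.loc[:"c"].iloc[:, :4]

# 3. Convert the column positions to labels, then use loc for both axes.
via_labels = df.loc[:"c", df.columns[:4]]
```

Chaining is fine for reading data, but for assignment a single loc or iloc call is usually the safer choice, because a chained selection may operate on a copy.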


iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing

df.iloc[0]

or the last five rows by doing

df.iloc[-5:]

You can also use it on the columns. This retrieves the 3rd column:

df.iloc[:, 2]    # the : in the first position indicates all rows

You can combine them to get intersections of rows and columns:

df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)

On the other hand, .loc uses named indices. Let's set up a data frame with strings as row and column labels:

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

Then we can get the first row by

df.loc['a']     # equivalent to df.iloc[0]

and the last two rows of the 'date' column by

df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]

and so on. Now, it's probably worth pointing out that the default row and column indices for a DataFrame are integers from 0, and in this case iloc and loc return the same row for a given integer (slices still differ at the endpoint: loc includes the stop label, while iloc excludes the stop position). This is why your three examples behave so similarly. If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.
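A short sketch of the default-index case; the frames `df_default` and `df_named` are made up for illustration and are not the one defined above:

```python
import pandas as pd

# A frame with the default integer index 0..7.
df_default = pd.DataFrame({"value": list("abcdefgh")})

# For a single integer, label and position coincide.
same_row = df_default.loc[3].equals(df_default.iloc[3])

# For slices, loc includes the stop label while iloc excludes the stop position.
loc_rows = len(df_default.loc[:5])
iloc_rows = len(df_default.iloc[:5])

# With string labels, an integer slice passed to loc is rejected
# (a TypeError in recent pandas versions).
df_named = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "c"])
try:
    df_named.loc[:5]
    outcome = "no error"
except (TypeError, KeyError) as exc:
    outcome = type(exc).__name__
```

So even on a default index, `df.loc[:5]` returns one more row than `df.iloc[:5]`.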

Also, you can do column retrieval just by using the data frame's __getitem__:

df['time']    # equivalent to df.loc[:, 'time']

Now suppose you want to mix positional and named indexing, that is, index by position on one axis and by name on the other (to clarify, I mean selecting from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where .ix came in, although it was deprecated in pandas 0.20 and removed in pandas 1.0:

df.ix[:2, 'time']    # the first two rows of the 'time' column

I think it's also worth mentioning that you can pass boolean vectors to the loc method as well. For example:

b = [True, False, True]
df.loc[b]

This returns the 1st and 3rd rows of df. It is equivalent to df[b] for selection, but it can also be used for assignment via boolean vectors:

df.loc[b, 'name'] = 'Mary', 'John'
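Put together as a runnable sketch, using the same empty three-row frame as above (a list is used on the right-hand side instead of a bare tuple, purely as a style choice):

```python
import pandas as pd

df = pd.DataFrame(index=["a", "b", "c"], columns=["time", "date", "name"])
b = [True, False, True]

# Selection: keeps the rows where the mask is True.
selected = df.loc[b]

# Assignment: one value per selected row, written into the 'name' column.
df.loc[b, "name"] = ["Mary", "John"]
```

After the assignment, rows 'a' and 'c' hold 'Mary' and 'John', while row 'b' is left as a missing value.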

In my opinion, the accepted answer is confusing, since it uses a DataFrame with only missing values. I also do not like the term position-based for .iloc and instead prefer integer location, as it is much more descriptive and exactly what .iloc stands for. The key word is INTEGER - .iloc needs INTEGERS.

See my extremely detailed blog series on subset selection for more

.ix is deprecated and ambiguous and should never be used

Because .ix is deprecated we will only focus on the differences between .loc and .iloc.

Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each index. Let's take a look at a sample DataFrame:

df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                   'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height':[165, 70, 120, 80, 180, 172, 150],
                   'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

This DataFrame has two sets of labels. The labels age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina and Cornelia, are used for the index.

The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns, but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follows its name to make its selections.

.loc selects data only by labels

We will first talk about the .loc indexer, which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead default to just the integers from 0 to n-1, where n is the length of the DataFrame.

There are three different inputs you can use for .loc:

A string
A list of strings
Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following .loc.

df.loc['Penelope']

This returns the row of data as a Series:

age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

Selecting multiple rows with .loc with a list of strings

df.loc[['Cornelia', 'Jane', 'Dean']]

This returns a DataFrame with the rows in the order specified in the list.

Selecting multiple rows with .loc with slice notation

Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the result. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined, so it defaults to 1.

df.loc['Aaron':'Dean']

Complex slices can be taken in the same manner as Python lists.
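As a sketch, the list and slice inputs for .loc can be run against the sample DataFrame like this (the stepped slice is an extra illustration, not one of the examples above):

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

# A list of labels: rows come back in the order given in the list.
by_list = df.loc[["Cornelia", "Jane", "Dean"]]

# A label slice: both 'Aaron' and 'Dean' are included.
by_slice = df.loc["Aaron":"Dean"]

# A label slice with a step: every second row from 'Aaron' to 'Christina'.
by_step = df.loc["Aaron":"Christina":2]
```

The stepped slice walks the rows Aaron, Penelope, Dean, Christina and keeps every second one, giving Aaron and Dean.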

.iloc selects data only by integer location

Let's now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left beginning at 0.

There are three different inputs you can use for .iloc:

An integer
A list of integers
Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer

df.iloc[4]

This returns the 5th row (integer location 4) as a Series:

age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

Selecting multiple rows with .iloc with a list of integers

df.iloc[[2, -2]]

This returns a DataFrame of the third and second-to-last rows.

Selecting multiple rows with .iloc with slice notation

df.iloc[:5:3]
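The .iloc selections above can be checked the same way. This sketch rebuilds the sample DataFrame so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

# A single integer: the row at location 4, returned as a Series.
single_row = df.iloc[4]

# A list of integers: negative values count from the end.
by_list = df.iloc[[2, -2]]

# A slice with a step: locations 0 and 3 (the stop, 5, is excluded).
by_slice = df.iloc[:5:3]
```

Here `df.iloc[:5:3]` visits locations 0 and 3, so it returns the rows for Jane and Penelope.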

Simultaneous selection of rows and columns with .loc and .iloc

A particularly useful feature of both .loc and .iloc is that they can select rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.

For example, we can select rows Jane and Dean with just the columns height, score and state like this:

df.loc[['Jane', 'Dean'], 'height':]

This uses a list of labels for the rows and slice notation for the columns.

We can naturally do similar operations with .iloc using only integers.

df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object
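To make the correspondence between the two indexers concrete, here is a sketch that performs the Jane/Dean selection once with labels and once with the equivalent integer locations (the integer version is an illustration added here):

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

# Rows by a list of labels, columns by a label slice starting at 'height'.
by_label = df.loc[["Jane", "Dean"], "height":]

# The same selection by integer location: rows 0 and 4, columns 3 onwards.
by_position = df.iloc[[0, 4], 3:]

# A single column position returns a Series instead of a DataFrame.
food_series = df.iloc[[1, 4], 2]
```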

Simultaneous selection with labels and integer location

.ix was used to make selections simultaneously with labels and integer location, which was useful but at times confusing and ambiguous; thankfully, it has been deprecated. If you need to make a selection with a mix of labels and integer locations, you will have to express both selections as labels or both as integer locations.

For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:

col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names]

Or alternatively, convert the index labels to integers with the get_loc index method.

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]
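Both conversions should select the same cells. The sketch below checks that, and also shows Index.get_indexer as a vectorized alternative to calling get_loc in a loop (that alternative is an addition here, not part of the approach described above):

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)
labels = ["Nick", "Cornelia"]

# Integers -> labels, then .loc
by_label = df.loc[labels, df.columns[[2, 4]]]

# Labels -> integers (one get_loc call per label), then .iloc
index_ints = [df.index.get_loc(label) for label in labels]
by_position = df.iloc[index_ints, [2, 4]]

# Labels -> integers in a single call; get_indexer returns -1 for missing labels.
by_indexer = df.iloc[df.index.get_indexer(labels), [2, 4]]
```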

Boolean Selection

The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the food and score columns, we can do the following:

df.loc[df['age'] > 30, ['food', 'score']]

You can replicate this with .iloc, but you cannot pass it a boolean Series. You must convert the boolean Series into a numpy array like this:

df.iloc[(df['age'] > 30).values, [2, 4]]
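A sketch comparing the two forms on the sample DataFrame. The exact exception raised when a boolean Series is passed to .iloc depends on the index type and pandas version, so the code below only records its name:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

mask = df["age"] > 30  # boolean Series aligned on the row labels

# .loc accepts the boolean Series directly.
via_loc = df.loc[mask, ["food", "score"]]

# .iloc needs a plain boolean array; .to_numpy() is equivalent to .values here.
via_iloc = df.iloc[mask.to_numpy(), [2, 4]]

# Passing the Series itself to .iloc is rejected.
try:
    df.iloc[mask, [2, 4]]
    iloc_with_series = "no error"
except (ValueError, NotImplementedError) as exc:
    iloc_with_series = type(exc).__name__
```

Note that Jane (age 30) is excluded, because the comparison is strictly greater than 30.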

Selecting all rows

It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

df.loc[:, 'color':'score':2]
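The column slice above takes every second column from color through score. A sketch with the equivalent integer-location form for comparison (the .iloc version is an illustration added here):

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

# All rows; columns 'color' through 'score' (inclusive), stepping by 2.
by_label = df.loc[:, "color":"score":2]

# The same columns by location: 1 up to (but excluding) 5, stepping by 2.
by_position = df.iloc[:, 1:5:2]
```

The label slice covers color, food, height and score, so a step of 2 keeps color and height.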

The indexing operator, `[]`, can select rows and columns too but not simultaneously.

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

df['food']
Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

Using a list selects multiple columns

df[['food', 'score']]

What people are less familiar with is that when slice notation is used, selection happens by row labels or by integer location. This is very confusing and something that I almost never use, but it does work.

df['Penelope':'Christina'] # slice rows by label

df[2:6:2] # slice rows by integer location

The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

df[3:5, 'color']
TypeError: unhashable type: 'slice'
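The following sketch reproduces the row-slicing behaviour of `[]` and shows one explicit way to make the failing selection with .iloc. The exception type raised by the tuple form may differ between pandas and Python versions, so it is caught generically here:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

# With slice notation, [] selects rows: by label or by integer location.
rows_by_label = df["Penelope":"Christina"]
rows_by_position = df[2:6:2]

# [] cannot take a row selection and a column selection together.
try:
    df[3:5, "color"]
    tuple_outcome = "no error"
except Exception as exc:  # exact type varies by version
    tuple_outcome = type(exc).__name__

# An explicit alternative: rows by location, column label converted to a location.
explicit = df.iloc[3:5, df.columns.get_loc("color")]
```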
