
How to iterate over rows in a Pandas DataFrame


Introduction

Before we explore three common ways to iterate over rows in a Pandas DataFrame, let's read in some data.

import pandas as pd

# Load the drinks dataset
drinks = pd.read_csv('https://andybek.com/pandas-drinks')

iterrows()

The example uses the iterrows() method to iterate over each row in the drinks DataFrame. The method returns an iterator that yields (index, row) pairs: the row's index label, and the row's data as a Series indexed by the column names. The loop unpacks each pair into the index and row variables.

# Iterate over rows using the iterrows() method
for index, row in drinks.iterrows():
    print(
      row['country'],
      row['beer_servings'],
      row['wine_servings'],
      row['spirit_servings']
    )
Afghanistan nan nan nan
Albania 89.0 54.0 132.0
Algeria 25.0 14.0 nan
Andorra 245.0 312.0 138.0
...
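To look at a single item from iterrows() on its own, here is a minimal sketch using a small hypothetical DataFrame. The labels and values are made up so that it runs without downloading the drinks data. It takes the first (index, row) pair and inspects it.

```python
import pandas as pd

# Hypothetical stand-in for the drinks data; labels and values are illustrative only
df = pd.DataFrame(
    {"country": ["A", "B"], "beer_servings": [10.0, 20.0]},
    index=["x", "y"],
)

# iterrows() yields (index label, row as a Series) pairs; take the first one
index, row = next(df.iterrows())

print(index)            # the row's index label
print(type(row))        # each row arrives as a pandas Series
print(list(row.index))  # the Series is indexed by the DataFrame's column names
```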

One drawback of iterrows() is that it does not preserve data types across the row. Each row is returned as a Series, and a Series holds a single dtype, so when a row mixes types pandas upcasts the values to a common dtype. For example, in a row that has an integer column and a float column, the integer value comes back as a float. If the row also contains strings, as the drinks data does, the whole row Series becomes object dtype.

For that reason, it is usually better to use the itertuples() method, which preserves each column's data type, is generally faster than iterrows(), and yields namedtuples, which are easier to work with than Series.
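To see how the two methods handle data types, here is a sketch on a small hypothetical DataFrame with one integer column and one float column (the column names servings and litres are made up). Because iterrows() packs each row into a single Series, the integer is upcast to float; itertuples() reads each column separately, so the integer stays an integer.

```python
import pandas as pd

# Hypothetical all-numeric frame: one int64 column and one float64 column
df = pd.DataFrame({"servings": [1, 2], "litres": [0.5, 1.5]})
print(df.dtypes)

# iterrows(): the row Series needs one dtype, so the integer is upcast to float
_, row = next(df.iterrows())
print(row.dtype)        # float64
print(row["servings"])

# itertuples(): values come from each column separately, so the int survives
tup = next(df.itertuples(index=False))
print(tup.servings)
```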

itertuples()

The itertuples() method returns an iterator that yields a namedtuple for each row in the DataFrame, with fields named after the columns. By default the index is included as the first field; passing index=False, as below, leaves it out.

for row in drinks.itertuples(index=False):
    print(
      row.country,
      row.beer_servings,
      row.wine_servings,
      row.spirit_servings
    )
Afghanistan nan nan nan
Albania 89.0 54.0 132.0
Algeria 25.0 14.0 nan
Andorra 245.0 312.0 138.0
...
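Two details about the namedtuple fields are worth knowing. With the default index=True, the index label becomes the first field, named Index. And a column name that is not a valid Python identifier cannot be used as a field name, so itertuples() replaces it with a positional name such as _2. The sketch below shows both on a hypothetical one-row frame; the column name "beer servings" (with a space) is made up to trigger the renaming.

```python
import pandas as pd

# Hypothetical frame; the second column name deliberately contains a space
df = pd.DataFrame(
    {"country": ["Albania"], "beer servings": [89]},
    index=[7],
)

# Default index=True: the index label is stored in the first field, "Index"
row = next(df.itertuples())
print(row._fields)  # ('Index', 'country', '_2')
print(row.Index, row.country)

# "beer servings" is not a valid identifier, so it was renamed by position;
# reach it by position (or rename the column before iterating)
print(row[2])

# With index=False the index is dropped and the positions shift by one
row_no_index = next(df.itertuples(index=False))
print(row_no_index._fields)  # ('country', '_1')
```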

apply()

You can also use the apply() method with axis=1, which calls the given function once per row and passes each row to it as a Series.

drinks.apply(lambda row: print(
  row['country'],
  row['beer_servings'],
  row['wine_servings'],
  row['spirit_servings']
), axis=1)
Afghanistan nan nan nan
Albania 89.0 54.0 132.0
Algeria 25.0 14.0 nan
Andorra 245.0 312.0 138.0
...
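Calling print() inside apply() works for display, but apply() is more often used to return a value for each row, which it collects into a new Series. Here is a minimal sketch on a hypothetical two-row frame that reuses the column names and the Albania and Andorra values from the output above, computing total servings per row:

```python
import pandas as pd

# Hypothetical subset of the drinks data (values copied from the output above)
drinks_sample = pd.DataFrame(
    {
        "country": ["Albania", "Andorra"],
        "beer_servings": [89.0, 245.0],
        "wine_servings": [54.0, 312.0],
        "spirit_servings": [132.0, 138.0],
    }
)

# With axis=1 the lambda receives one row (a Series) at a time;
# its return values are gathered into a Series aligned with the rows
totals = drinks_sample.apply(
    lambda row: row["beer_servings"] + row["wine_servings"] + row["spirit_servings"],
    axis=1,
)
print(totals.tolist())  # [275.0, 695.0]
```

Because print() returns None, the print-based apply() call also returns a Series of None values, which a notebook will display after the printed lines.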

Warning

Iterating over rows is usually not the most efficient way to work with pandas. In most situations, a vectorized operation, which acts on whole columns at once, will be much faster than a row-by-row loop. Part of the reason is that pandas is built on top of NumPy, a library that provides a high-performance multidimensional array object along with tools for working with those arrays. Pandas is essentially built around the NumPy array object, and an operation over an entire array runs its inner loop in compiled code, whereas iterating over rows executes Python code for every single row.
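As a small illustration, here is the same per-row total computed both ways on a hypothetical two-row frame that uses the drinks column names (values taken from the Albania and Andorra rows shown earlier). The loop touches one row at a time in Python; the vectorized version adds entire columns at once.

```python
import pandas as pd

# Hypothetical sample using the drinks column names
df = pd.DataFrame(
    {
        "beer_servings": [89.0, 245.0],
        "wine_servings": [54.0, 312.0],
        "spirit_servings": [132.0, 138.0],
    }
)

# Row by row: one Python-level iteration per row
looped = [
    row.beer_servings + row.wine_servings + row.spirit_servings
    for row in df.itertuples(index=False)
]

# Vectorized: each + operates on whole columns at once
vectorized = df["beer_servings"] + df["wine_servings"] + df["spirit_servings"]

print(looped)               # [275.0, 695.0]
print(vectorized.tolist())  # [275.0, 695.0]
```

Both give the same result; the difference only shows up in run time as the number of rows grows. For this particular task, df.sum(axis=1) is another vectorized option.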

If you find yourself iterating over rows in a DataFrame, consider whether a vectorized solution exists. If you can't find one, apply() offers a concise way to run a custom function on each row, although with axis=1 it still calls that function once per row in Python, so it will not match the speed of a true vectorized operation.
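Conditional logic is a common reason to reach for a loop or apply(), yet it often has a vectorized form. This sketch assumes a hypothetical rule that labels each country "beer" or "wine" depending on which it drinks more of. The apply() version calls the lambda once per row; numpy.where evaluates the comparison on whole columns and picks a label element-wise.

```python
import numpy as np
import pandas as pd

# Hypothetical sample using the drinks column names
df = pd.DataFrame(
    {
        "country": ["Albania", "Andorra"],
        "beer_servings": [89.0, 245.0],
        "wine_servings": [54.0, 312.0],
    }
)

# Row-wise: the lambda runs once per row in Python
via_apply = df.apply(
    lambda row: "beer" if row["beer_servings"] > row["wine_servings"] else "wine",
    axis=1,
)

# Vectorized: compare entire columns, then choose a label for each element
via_where = pd.Series(
    np.where(df["beer_servings"] > df["wine_servings"], "beer", "wine"),
    index=df.index,
)

print(via_apply.tolist())  # ['beer', 'wine']
print(via_where.tolist())  # ['beer', 'wine']
```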

Of course, there's nothing wrong with falling back to iterrows() or itertuples(), especially when performance isn't a concern. But if you're working with large datasets, it pays to think carefully about your approach.
