Handling Pagination

Many websites split data across multiple pages. To collect everything, you need to follow the pagination links.

The simplest approach works when URLs follow a pattern:

import requests

for page in range(1, 6):  # Pages 1 through 5
    url = f"https://example.com/products?page={page}"
    response = requests.get(url)
    response.raise_for_status()  # Fail fast on HTTP errors
    # Parse and extract data...
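
Often you won't know the total page count up front. A common variant keeps requesting pages until one comes back empty. Here is a minimal sketch of that idea; the "div" with class "product" is a hypothetical selector standing in for whatever element actually holds each item:

import requests
from bs4 import BeautifulSoup

page = 1
while True:
    url = f"https://example.com/products?page={page}"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")

    items = soup.find_all("div", class_="product")  # Hypothetical item selector
    if not items:
        break  # An empty page signals the end

    # Parse and extract data...
    page += 1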

When URLs don't follow a pattern, find the "next" link on each page instead. Keep in mind that href values are often relative, so resolve them against the current URL:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = "https://example.com/products"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

while True:
    # Scrape current page...

    next_link = soup.find("a", class_="next")
    if not next_link:
        break  # No more pages

    url = urljoin(url, next_link["href"])  # Resolve relative links
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")

Always add a small delay between requests to be polite to the server. Rapid-fire requests can get you blocked.
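
As a minimal sketch, the delay slots in at the end of each loop iteration. The one-second value below is an arbitrary example, not a universal rule; tune it to the site you're scraping:

import time

import requests

for page in range(1, 6):
    url = f"https://example.com/products?page={page}"
    response = requests.get(url)
    # Parse and extract data...
    time.sleep(1)  # Example delay; adjust to the site's tolerance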

I cover pagination strategies in my Web Scraping course.