Handling Pagination
Most websites split data across multiple pages. To get everything, you need to follow pagination links.
The simplest approach works when URLs follow a pattern:
import requests

for page in range(1, 6):  # pages 1 through 5
    url = f"https://example.com/products?page={page}"
    response = requests.get(url)
    # Parse and extract data...
When URLs don't follow a pattern, find the "next" link:
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "https://example.com/products"
while True:
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # Scrape current page...
    next_link = soup.find("a", class_="next")
    if not next_link:
        break  # No more pages
    # "next" hrefs are often relative; resolve against the current URL
    url = urljoin(url, next_link["href"])
Always add a small delay between requests to be polite to the server. Rapid-fire requests can get you blocked.
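A minimal sketch of what that looks like, using the page-numbered loop from above (the one-second pause is an arbitrary placeholder; pick a value appropriate for the site):

import time
import requests

for page in range(1, 6):
    response = requests.get(f"https://example.com/products?page={page}")
    # Parse and extract data...
    time.sleep(1)  # pause between requests; tune this to the site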
I cover pagination strategies in my Web Scraping course.