Extracting Attributes
Sometimes you need more than text. Links have URLs, images have sources, and forms have actions. These live in HTML attributes.
link = soup.find("a")
url = link["href"] # Get the href attribute
Access attributes like dictionary keys. Common ones you'll extract:
img = soup.find("img")
print(img["src"]) # Image URL
print(img.get("alt")) # Alt text (safely)
Using .get() is safer: it returns None when the attribute doesn't exist, whereas square brackets raise a KeyError and stop your script.
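To see the difference for yourself, here is a small self-contained sketch. It parses an inline HTML string with Python's built-in "html.parser" instead of a downloaded page. The markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup so the example runs without a network request.
html = '<img src="/logo.png" class="hero wide"><img src="/icon.png">'
soup = BeautifulSoup(html, "html.parser")

first, second = soup.find_all("img")

print(first["src"])                 # /logo.png
print(second.get("alt"))            # None, because the attribute is missing
print(second.get("alt", "no alt"))  # .get() also accepts a fallback value
print(first.attrs)                  # every attribute of the tag as a dict

try:
    second["alt"]                   # square brackets on a missing attribute...
except KeyError:
    print("KeyError")               # ...raise KeyError instead of returning None
```

Note that first["class"] comes back as the list ['hero', 'wide'], not a string. BeautifulSoup treats class as a multi-valued attribute.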
For links, you often want both the text and the URL:
for link in soup.find_all("a"):
    print(link.text, "->", link.get("href"))
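Some a tags have no href at all, such as named anchors. The loop above handles them by printing None. If you only want real links, one option is to filter them out in find_all itself by passing href=True. The sketch below does this on a small made-up snippet. It also uses get_text(strip=True) to trim stray whitespace from the link text.

```python
from bs4 import BeautifulSoup

# Made-up markup: one named anchor without an href, one link with padded text.
html = """
<a href="/about">About us</a>
<a name="top">Back to top</a>
<a href="/contact">  Contact  </a>
"""
soup = BeautifulSoup(html, "html.parser")

pairs = []
# href=True keeps only <a> tags that actually have an href attribute,
# so link["href"] below cannot raise KeyError.
for link in soup.find_all("a", href=True):
    pairs.append((link.get_text(strip=True), link["href"]))

for text, url in pairs:
    print(text, "->", url)
```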
Real scraping is often about collecting these attributes: URLs to follow, image sources to download, and data attributes (data-*) that carry extra information.
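As an illustration, the sketch below gathers all three from a made-up product listing. The data-sku and data-price attribute names, the markup and the base URL are hypothetical. urljoin from Python's standard library turns relative paths into absolute URLs you could actually request.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical page address and markup, for illustration only.
base_url = "https://example.com/shop/"
html = """
<div class="product" data-sku="A100" data-price="19.99">
  <a href="/item/a100">Widget</a>
  <img src="img/a100.jpg" alt="Widget photo">
</div>
<div class="product" data-sku="B200" data-price="5.00">
  <a href="/item/b200">Gadget</a>
  <img src="img/b200.jpg">
</div>
"""
soup = BeautifulSoup(html, "html.parser")

products = []
for div in soup.find_all("div", class_="product"):
    products.append({
        "sku": div["data-sku"],                      # data-* attributes read like any other
        "price": float(div["data-price"]),           # attribute values are strings; convert them
        "url": urljoin(base_url, div.a["href"]),      # link to follow
        "image": urljoin(base_url, div.img["src"]),   # image to download
    })

for product in products:
    print(product)
```

Attribute values always come back as strings (except multi-valued ones like class), so numeric data such as prices needs an explicit conversion.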
Learn attribute extraction in my Web Scraping course.