Parsing HTML with BeautifulSoup
Raw HTML is just text. BeautifulSoup turns it into a tree structure you can navigate and search.
from bs4 import BeautifulSoup
html = "<html><body><h1>Hello</h1><p>World</p></body></html>"
soup = BeautifulSoup(html, "html.parser")
Now soup is a searchable object. You can find elements by tag name:
heading = soup.find("h1")
print(heading.text) # Hello
The .text property returns the text of the element and of everything nested inside it, with the HTML tags stripped away. This is usually what you want: the actual data, not the markup.
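To make the text extraction concrete, here is a minimal sketch. The sample markup and the variable names are made up for illustration. It shows .text on a nested element, find_all() for collecting every match, and get_text() with its separator and strip arguments for tidier output. Note that find() returns None when nothing matches, so check before reading .text on a tag that might be absent.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from a real page
html = "<div><p>Hello <b>big</b> world</p><p>Second</p></div>"
soup = BeautifulSoup(html, "html.parser")

# .text joins the text of the tag and all of its descendants
first = soup.find("p")
first_text = first.text  # "Hello big world"

# find_all() returns every matching tag, not just the first one
paragraphs = [p.get_text() for p in soup.find_all("p")]

# get_text() can strip each piece of text and join the pieces with a separator
joined = soup.get_text(separator=" ", strip=True)

# find() returns None when no tag matches
missing = soup.find("h2")

print(first_text)
print(paragraphs)
print(joined)
print(missing)
```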
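BeautifulSoup tolerates messy, broken HTML. Real-world pages often have unclosed or mismatched tags, and instead of raising an error, BeautifulSoup builds the best tree it can from the markup. How it repairs the damage depends on the parser you pass in.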
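As a small illustration of that tolerance, the sketch below feeds the built-in "html.parser" a fragment whose tags are never closed. The fragment is invented for this example. Other parsers such as lxml or html5lib may build a somewhat different tree from the same input, so treat the result as specific to this parser.

```python
from bs4 import BeautifulSoup

# Neither <p> nor <b> is closed in this made-up fragment
broken = "<p>Unclosed paragraph <b>bold text"
soup = BeautifulSoup(broken, "html.parser")

# The tags are still found, and their text is still reachable
bold = soup.find("b")
paragraph = soup.find("p")

print(bold.text)
print(paragraph.text)

# Serializing the tree writes out the closing tags the source left off
repaired = str(soup)
print(repaired)
```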