Finding Children Nodes With Beautiful Soup

Web scraping requires an understanding of how web pages are structured. To get the information you need from a web page, you have to understand its structure, analyze the tags that hold that information, and then examine the attributes of those tags.

For beginners in web scraping with BeautifulSoup, an article discussing the concepts of web scraping with this powerful library can be found here.

This article is for programmers, data analysts, scientists, and engineers who already know how to extract content from web pages using BeautifulSoup. If you have no experience with this library, I advise you to go through the BeautifulSoup tutorial for beginners first.

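With that out of the way, I assume you already have this library installed. If not, you can install it with the command below:

pip install beautifulsoup4

The examples in this article also use the lxml parser, which is a separate package and can be installed the same way:

pip install lxml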

Since we are extracting data from HTML, we need a basic HTML page to practice these concepts on. For this article, we will use the following HTML snippet, which I am going to assign to a variable using Python’s triple quotes.

sample_content = """<html>
<body>
<p>To make an unordered list, the ul tag is used:</p>
<ul>
Here’s an unordered list
<li>First option</li>
<li>Second option</li>
</ul>
<p>To make an ordered list, the ol tag is used:</p>
<ol>
Here’s an ordered list
<li>Number One</li>
<li>Number Two</li>
</ol>
<p>Linux Hint, 2018</p>
</body>
</html>"""

Now that we have sorted that out, let’s move right into working with the BeautifulSoup library.

We are going to use a couple of methods and attributes on our BeautifulSoup object. First, however, we need to parse our string using BeautifulSoup and assign the result to an “our_soup” variable.

from bs4 import BeautifulSoup as bso
our_soup = bso(sample_content, "lxml")

From here on, we will work with the “our_soup” variable and call all of our methods and attributes on it.

A quick note: if you do not already know what a child node is, it is a node (a tag or a piece of text) that sits directly inside another node. In our HTML snippet, for example, the li tags are child nodes of the “ul” and “ol” tags that contain them.
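
To make the parent/child relationship concrete, here is a small sketch that walks up from an li tag with the .parent attribute and lists the direct child tags of body. It uses a trimmed-down, assumed version of the sample markup and Python’s built-in "html.parser" instead of lxml, so it runs without extra packages. find_all(recursive=False) is BeautifulSoup’s way of looking only one level down.

```python
from bs4 import BeautifulSoup

# Trimmed-down markup modeled on the article's sample (assumed for illustration).
html = """<html><body>
<ul><li>First option</li><li>Second option</li></ul>
<ol><li>Number One</li><li>Number Two</li></ol>
</body></html>"""

# "html.parser" ships with Python, so no extra parser package is needed here.
soup = BeautifulSoup(html, "html.parser")

ordered_list = soup.find("ol")
first_item = ordered_list.find("li")

# The li tag's parent is the ol that directly contains it...
print(first_item.parent.name)    # ol
# ...and the ol's parent is body.
print(ordered_list.parent.name)  # body

# recursive=False looks only one level down, so the li tags are not included.
direct_child_names = [tag.name for tag in soup.body.find_all(recursive=False)]
print(direct_child_names)        # ['ul', 'ol']
```

The li tags are therefore descendants of body, but not among its direct children.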

Here are the methods and attributes we will look at:

  • findChild
  • findChildren
  • contents
  • children
  • descendants


The findChild Method

The findChild method finds the first child tag of an HTML element. For example, our “ol” and “ul” tags each contain two li tags, but findChild returns only the first one. (findChild is the older name for find(), so strictly speaking it returns the first matching tag among all descendants; for our flat lists, that is the first child.)

This method is useful when you only need the first child of an HTML element, since it returns that tag directly instead of a list.

The returned object is of type bs4.element.Tag. We can extract its text by accessing its text attribute.

Here’s an example:

first_child = our_soup.find("body").find("ol")
print(first_child.findChild())

The code above returns the following:

<li>Number One</li>

To get the text from the tag, we access its text attribute:

print(first_child.findChild().text)

This prints the following result:

Number One
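
Because findChild is the older camelCase name for find(), it searches all descendants by default, not only direct children. The sketch below, using hypothetical nested markup and the built-in "html.parser", shows the difference that the recursive=False argument makes.

```python
from bs4 import BeautifulSoup

# Hypothetical nested markup: <b> is a grandchild of <div>, not a direct child.
html = "<div><p>Intro <b>bold text</b></p></div>"
div = BeautifulSoup(html, "html.parser").div

# The default search covers all descendants, so the nested <b> is found.
print(div.findChild("b"))                   # <b>bold text</b>

# recursive=False checks only direct children; the only one here is <p>.
print(div.findChild("b", recursive=False))  # None
print(div.findChild(recursive=False).name)  # p
```

If you only care about the first direct child, passing recursive=False makes that intent explicit.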

The findChildren Method

We have seen how the findChild method works. The findChildren method works in a similar way, but, as the name implies, it doesn’t stop at one child node; it gets all of the child tags in a tag.

When you need all the child tags of an element, the findChildren method is the way to go. It returns them in a list (strictly, a bs4 ResultSet, which is a subclass of list), so you can access the tag of your choice by its index number.

Here’s an example:

first_child = our_soup.find("body").find("ol")
print(first_child.findChildren())

This returns the child nodes in a list:

[<li>Number One</li>, <li>Number Two</li>]

To get the second child node in the list, the following code does the job:

print(first_child.findChildren()[1])

This gives the following result:

<li>Number Two</li>
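
The same applies to findChildren, the older name for find_all(): by default it collects matching tags at every depth. In the sketch below (hypothetical markup with a nested list, parsed with "html.parser"), recursive=False limits the result to the top-level li tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <li> contains a nested list with its own <li>.
html = """<ol>
<li>Number One</li>
<li>Number Two
<ul><li>Nested option</li></ul>
</li>
</ol>"""
ordered_list = BeautifulSoup(html, "html.parser").ol

# By default every <li> below the <ol> matches, including the nested one.
all_items = ordered_list.findChildren("li")
print(len(all_items))   # 3
print(all_items[2])     # <li>Nested option</li>

# recursive=False keeps only the <li> tags directly inside the <ol>.
top_items = ordered_list.findChildren("li", recursive=False)
print(len(top_items))   # 2
```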

Those are the two methods covered in this article. However, it doesn’t end there: attributes can also be accessed on our BeautifulSoup objects to get the child, children, and descendant nodes of an HTML element.


The contents Attribute

While the findChildren method did the straightforward job of extracting the child tags, the contents attribute does something a bit different.

The contents attribute returns a list of everything directly inside an HTML element, including its child nodes. When you access the contents attribute of a tag, text is returned as strings (NavigableString objects) and tags as bs4.element.Tag objects.

Here’s an example:

first_child = our_soup.find("body").find("ol")
print(first_child.contents)

This returns the following:

['\nHere’s an ordered list\n', <li>Number One</li>, '\n', <li>Number Two</li>, '\n']

As you can see, the list contains the text that comes before a child node, the child node and the text that comes after the child node.

To access the second child node, all we need to do is use its index number. Because the text nodes also occupy positions in the list, the second li tag is at index 3, not 1:

print(first_child.contents[3])

This returns the following:

<li>Number Two</li>
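
Since contents mixes text and tags, a common next step is to separate them. One way to do that, sketched below, is an isinstance check against the Tag and NavigableString classes that bs4 exports. The markup mirrors the ordered list from the sample (with a straight apostrophe), and "html.parser" is used so the whitespace strings are kept exactly as written.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Markup mirroring the ordered list in the sample.
html = """<ol>
Here's an ordered list
<li>Number One</li>
<li>Number Two</li>
</ol>"""
ordered_list = BeautifulSoup(html, "html.parser").ol

# contents holds every direct child: three strings and two Tag objects here.
print(len(ordered_list.contents))  # 5

# Keep only the Tag objects.
tags_only = [node for node in ordered_list.contents if isinstance(node, Tag)]
print(tags_only)   # [<li>Number One</li>, <li>Number Two</li>]

# Keep only the strings that contain more than whitespace, stripped of newlines.
texts_only = [node.strip() for node in ordered_list.contents
              if isinstance(node, NavigableString) and node.strip()]
print(texts_only)  # ["Here's an ordered list"]
```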


The children Attribute

The children attribute does almost the same thing as the contents attribute. However, there is one small difference in what it returns.

The children attribute also returns the text that comes before a child node, the child node itself, and the text that comes after it. The difference is that it returns them as an iterator instead of a list.

Let’s take a look at the following example:

first_child = our_soup.find("body").find("ol")
print(first_child.children)

The code above gives the following result (the memory address on your machine will differ):

<list_iterator object at 0x7f9c14b99908>

As you can see, it only shows the iterator object itself, not the nodes it holds. To see them, we can convert the iterator into a list.

We can see this in the example below:

first_child = our_soup.find("body").find("ol")
print(list(first_child.children))

This gives the following result:

['\nHere’s an ordered list\n', <li>Number One</li>, '\n', <li>Number Two</li>, '\n']
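
Because children is an iterator rather than a list, it can be consumed one node at a time, and it can only be consumed once. The short sketch below (hypothetical single-line markup, "html.parser") illustrates both points with next() and list().

```python
from bs4 import BeautifulSoup

html = "<ol><li>Number One</li><li>Number Two</li></ol>"
ordered_list = BeautifulSoup(html, "html.parser").ol

children = ordered_list.children

# Take nodes one at a time with next().
first = next(children)
print(first)           # <li>Number One</li>

# list() consumes whatever the iterator has left...
rest = list(children)
print(rest)            # [<li>Number Two</li>]

# ...after which it is exhausted; access .children again to start over.
print(list(children))  # []
print(len(list(ordered_list.children)))  # 2
```

Note that BeautifulSoup already stores a tag’s child nodes in its contents list while parsing, so children mainly changes how you consume the nodes rather than how much memory is used.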


The descendants Attribute

While the children attribute gets only the content on the first level of a tag (the text and nodes directly inside it), the descendants attribute goes deeper.

The descendants attribute also gets all of the text and nodes that exist inside the child nodes. So it doesn’t return only child nodes; it returns grandchild nodes as well.

Besides returning the text and tags, it also returns the text inside those tags as separate strings.

Like the children attribute, descendants returns its results lazily, in this case as a generator.

We can see this below:

first_child = our_soup.find("body").find("ol")
print(first_child.descendants)

This gives the following result:

<generator object descendants at 0x7f9c14b6d8e0>

As seen earlier, we can then convert this generator object into a list:

first_child = our_soup.find("body").find("ol")
print(list(first_child.descendants))

We would get the list below:

['\nHere’s an ordered list\n', <li>Number One</li>, 'Number One', '\n', <li>Number Two</li>, 'Number Two', '\n']
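
To see the depth-first order that descendants follows, the sketch below adds an assumed <b> tag inside the first li and separates the tag names from the strings. It also compares the number of nodes yielded by children and descendants for the same element, again using "html.parser".

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup: a <b> tag nested one level deeper inside the first <li>.
html = "<ol><li>Number <b>One</b></li><li>Number Two</li></ol>"
ordered_list = BeautifulSoup(html, "html.parser").ol

# descendants visits each tag and then everything inside it (depth-first).
nodes = list(ordered_list.descendants)
tag_names = [node.name for node in nodes if isinstance(node, Tag)]
strings = [str(node) for node in nodes if not isinstance(node, Tag)]
print(tag_names)  # ['li', 'b', 'li']
print(strings)    # ['Number ', 'One', 'Number Two']

# children stops at the first level; descendants includes every nested node.
print(len(list(ordered_list.children)))  # 2
print(len(nodes))                        # 6
```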


Conclusion

There you have it: five different ways to access child nodes in HTML elements. There are other ways, but with the methods and attributes discussed in this article, you should be able to access the child nodes of any HTML element.
