Jsoup - Interview Questions
How can you extract text content from HTML elements using Jsoup?
In Jsoup, you can extract text content from an HTML element with the text() method of the Element class. It returns the combined text of the element and all of its descendants, with HTML tags removed and whitespace normalized. The common cases are:

1. Using the text() Method: Call text() on an Element object to retrieve its text content:
Element element = document.getElementById("example");
String textContent = element.text();

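This retrieves the text content of the element with the ID "example". Note that getElementById() returns null when no element has that ID, so check for null before calling text() if the element may be absent.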
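
As a minimal, self-contained sketch of the lookup above: the class name TextById, the helper textOf(), and the sample HTML are hypothetical and exist only for illustration. The helper returns an empty string when the ID is missing, which is one possible way to handle the null case, not the only one.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TextById {
    // Hypothetical helper: returns the combined text of the element with the
    // given id, or an empty string if no such element exists.
    static String textOf(String html, String id) {
        Document document = Jsoup.parse(html);
        Element element = document.getElementById(id); // null when the id is absent
        return element == null ? "" : element.text();
    }

    public static void main(String[] args) {
        // Sample markup (an assumption for this sketch): text is split across nested tags.
        String html = "<div id=\"example\">Hello <b>Jsoup</b> <i>world</i></div>";
        System.out.println(textOf(html, "example")); // Hello Jsoup world
        System.out.println(textOf(html, "missing").isEmpty()); // true
    }
}
```

The nested <b> and <i> tags are dropped and only their text is kept, which is the "combined text of descendants" behavior described earlier.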

2. Extracting Text Content from Multiple Elements: To extract text from several elements, iterate over an Elements collection and call text() on each element:
Elements elements = document.getElementsByTag("p");
for (Element element : elements) {
    String textContent = element.text();
    System.out.println(textContent);
}

This example retrieves text content from all <p> elements in the HTML document.
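
The loop can also collect the results instead of printing them. The following is a sketch under assumptions: the class ParagraphTexts, the helper paragraphTexts(), and the sample HTML are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ParagraphTexts {
    // Hypothetical helper: returns the text of every <p> element in document order.
    static List<String> paragraphTexts(String html) {
        Document document = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element paragraph : document.getElementsByTag("p")) {
            texts.add(paragraph.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Sample markup (an assumption for this sketch): one top-level and one nested paragraph.
        String html = "<p>First paragraph.</p><div><p>Second <em>paragraph</em>.</p></div>";
        System.out.println(paragraphTexts(html)); // [First paragraph., Second paragraph.]
    }
}
```

getElementsByTag() searches the whole document, so the paragraph nested inside the <div> is included as well.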

3. Handling Whitespace: The text() method already normalizes whitespace: it trims leading and trailing whitespace and collapses runs of consecutive whitespace characters into a single space. No additional call is needed, and Java's String class has no normalize() method. If you need the original, unnormalized whitespace instead, use the wholeText() method of the Element class:
String normalizedTextContent = element.text();
String rawTextContent = element.wholeText();
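
To see how Jsoup treats whitespace in practice, here is a small sketch comparing text(), wholeText(), and ownText() on Element. The class WhitespaceDemo, the helper firstParagraph(), and the sample HTML are hypothetical; wholeText() is assumed to be available, which holds for recent Jsoup releases.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class WhitespaceDemo {
    // Hypothetical helper: parses the HTML and returns its first <p> element.
    static Element firstParagraph(String html) {
        return Jsoup.parse(html).getElementsByTag("p").first();
    }

    public static void main(String[] args) {
        Element spaced = firstParagraph("<p>  Hello\n   world  </p>");
        // text() trims and collapses whitespace.
        System.out.println("[" + spaced.text() + "]"); // [Hello world]
        // wholeText() keeps the whitespace as it appeared in the source, including the newline.
        System.out.println("[" + spaced.wholeText() + "]");

        Element nested = firstParagraph("<p>Hello <b>there</b> now!</p>");
        // text() includes descendants; ownText() skips the text inside child elements.
        System.out.println(nested.text());    // Hello there now!
        System.out.println(nested.ownText()); // Hello now!
    }
}
```

In short, text() is the normalized view, wholeText() is the raw view, and ownText() limits the result to the element's direct text.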

4. Extracting Text Content from Specific Element Types: You can use the selection methods of the Document or Element class to pick specific elements and then extract their text content. For example:
// Extract text content from all elements with class "content"
Elements contentElements = document.getElementsByClass("content");
for (Element element : contentElements) {
    String textContent = element.text();
    System.out.println(textContent);
}

Replace "content" with the desired class name or use other methods like getElementsByTag() or getElementById() to select elements based on different criteria.
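
When the criteria combine tag, class, and nesting, Jsoup's select() method accepts a CSS selector and can replace several chained lookups. The sketch below is illustrative only: the class ContentSelector, the helper textsFor(), and the sample HTML are assumptions, not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ContentSelector {
    // Hypothetical helper: returns the text of every element matching the CSS selector.
    static List<String> textsFor(String html, String cssQuery) {
        Document document = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element element : document.select(cssQuery)) {
            texts.add(element.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Sample markup (an assumption for this sketch).
        String html = "<div class=\"content\"><p>One</p></div>"
                + "<div class=\"other\"><p>Two</p></div>"
                + "<p class=\"content\">Three</p>";
        // Any element with class "content", like getElementsByClass("content").
        System.out.println(textsFor(html, ".content"));      // [One, Three]
        // Only <p> elements nested inside a <div class="content">.
        System.out.println(textsFor(html, "div.content p")); // [One]
    }
}
```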

The text() method is the simplest way to extract text content from HTML elements with Jsoup, which makes it useful for tasks such as web scraping, data extraction, and content analysis.