Read Data from XML by Using Different Parsers in Java

Java offers several XML parsers. In this blog, we explore how each one works so that you can pick the parser that suits you best to read data from XML.

Reducing the complexity of a process is often the key to better performance. Parsing XML serves a similar purpose: it turns the raw markup of an XML file into data that our programs, and ultimately we humans, can work with. A simple equivalent to this parsing process would be language translation. Let’s take the example of two national leaders holding an important meeting. They could either use a common language like English, or speak the languages they are comfortable with and rely on translators. Likewise, an XML file is structured so that a computer can process it reliably, and once the information has been parsed, we are able to read data from XML and understand it with ease.

As one of the leading QA companies in the market, we use different parsers based on our needs. So let’s explore which parser would be the perfect match for your need by understanding how they work. But before we explore how to read data from XML, let us introduce XML itself for readers who may not know much about it.

An Introduction to XML:

XML stands for Extensible Markup Language, and it’s primarily used to describe and organize information in a way that both humans and computers can understand. It is a subset of the Standard Generalized Markup Language (SGML), which is used to create structured documents. In XML, every block is considered an “Element”. The tags are not pre-defined; they are called “Self-descriptive” tags because we create our own tags whose names describe the data they hold. Elements can be nested inside one another, so a document forms a tree of nodes that is readable by both humans and machines.

XML is designed to store and transfer data between different systems without any data loss, and it is not dependent on any platform or programming language. XML is similar to HTML in that both are markup languages and neither contains program logic of its own. The difference lies in their purpose: HTML uses a fixed set of tags to describe how content should be displayed as a webpage, whereas XML uses tags that we define ourselves to describe what the data is.

Prerequisites:

There are a few basic prerequisites that should be ready in order to read data from XML, and we have listed them below:

1. Install an IDE (Eclipse or IntelliJ IDEA)

2. Make sure Java is installed

3. Create a Java project in the IDE

4. Create an XML file with the .xml extension

XML file creation:

The first three steps are pretty straightforward, and you may not need any help to get them done. So let’s jump directly to the fourth and final prerequisite step, where we create an XML file manually in our Java project.

Navigate to the File tab in your IDE

– Create a new file

– Save it as “filename.xml”

The XML file will be displayed under your Java project. In the same way, we can create the XML file anywhere on our local machine by using the .xml file extension. Later, we pass the path of this XML file to our program in order to parse it. Let’s look at the technologies for parsing the XML.
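If you would rather create the file from code than from the IDE menu, the same step can be sketched as follows. This is only an illustration: the class name CreateXmlFileDemo, the method writeSample(), and the shortened XML content are assumptions made for this example, and the file is written to a temporary directory instead of the project folder.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CreateXmlFileDemo {

    // Writes a small, hypothetical XML document to <directory>/xmldata.xml
    // and returns the path, which can later be handed to any parser.
    public static Path writeSample(Path directory) throws IOException {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<Mail>\n"
                + "    <email Subject=\"Hello\">\n"
                + "        <from>Priya</from>\n"
                + "    </email>\n"
                + "</Mail>\n";
        Path file = directory.resolve("xmldata.xml");
        Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory keeps the example independent of any project layout.
        Path directory = Files.createTempDirectory("xml-demo");
        Path file = writeSample(directory);
        System.out.println("Created " + file.toAbsolutePath());
    }
}
```

The returned Path can be converted with file.toFile() wherever a parser expects a java.io.File.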

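XML Parsing:

XML parsing is the process of reading an XML document and exposing its elements, attributes, and text to a program in a form that is easy to work with. XML parsing is done by making use of an XML parser. But what do these parsers do? Well, a parser checks that the document is well-formed and then hands its content to our code, either as an in-memory tree of objects or as a stream of events, which paves the way for using XML in our programs. Transforming XML into another format is a separate job that belongs to an XSL Transformation (XSLT) processor, not to the parser. The most commonly used options in Java are DOM, SAX, StAX, XPath, and JDOM. Strictly speaking, XPath is a query language that runs on top of a parsed document rather than a parser in its own right. So let’s take a look at each of them one by one.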

Using DOM Parser to Read data from XML:

DOM stands for Document Object Model. DOM is a parser that is easy to both learn and use. It acts as an interface to access and modify the nodes in an XML document. DOM works by loading the entire XML file into memory as a tree of nodes, which we can then walk node by node. DOM can be used to inspect both the content and the structure of the document. But the setback that comes with DOM is that, because the whole tree is held in memory, it is slower and consumes more memory than the streaming parsers. So DOM is a good choice if you are looking to parse a smaller file, but not a very large XML file. Let’s see how to parse the XML below by using the DOM parser.

Here is the XML file that we need to parse:

<?xml version="1.0"?>
<Mail>
    <email Subject="Codoid Client Meeting Reminder">
        <from>Priya</from>
        <empid>COD11</empid>
        <Designation>Software Tester</Designation>
        <to>Karthick</to>
        <body>We have a meeting tomorrow at 8 AM. Please be available</body>
    </email>
    <email Subject="Reg:Codoid Client Meeting Reminder">
        <from>Karthick</from>
        <empid>COD123</empid>
        <Designation>Junior Software Tester</Designation>
        <to>Priya</to>
        <body>Thanks for reminding me about the meeting. Will join on time</body>
    </email>
</Mail>
DOM Parser:
package com.company;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.File;
import java.io.IOException;
public class DOMParser {
    public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException {
        try {
            File file = new File("E:\\Examp\\src\\com\\company\\xmldata.xml");
            DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder();
            Document doc = builder.parse(file);
            doc.getDocumentElement().normalize();
            System.out.println("Root element::  " + doc.getDocumentElement().getNodeName());
            NodeList nList = doc.getElementsByTagName("email");
            for (int temp = 0; temp < nList.getLength(); temp++) {
                Node nNode = nList.item(temp);
                System.out.println("\nCurrent Element :" + nNode.getNodeName());
                if (nNode.getNodeType() == Node.ELEMENT_NODE) {
                    Element eElement = (Element) nNode;
                    System.out.println("Email Subject : "
                            + eElement.getAttribute("Subject"));
                    System.out.println("From Name : "
                            + eElement
                            .getElementsByTagName("from")
                            .item(0)
                            .getTextContent());
                    System.out.println("Designation : "
                            + eElement
                            .getElementsByTagName("Designation")
                            .item(0)
                            .getTextContent());
                    System.out.println("Employee Id : "
                            + eElement
                            .getElementsByTagName("empid")
                            .item(0)
                            .getTextContent());
                    System.out.println("To Name : "
                            + eElement
                            .getElementsByTagName("to")
                            .item(0)
                            .getTextContent());
                    System.out.println("Email Body : "
                            + eElement
                            .getElementsByTagName("body")
                            .item(0)
                            .getTextContent());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

We have used the DocumentBuilderFactory to obtain a DocumentBuilder, which parses the XML into a tree of objects, and the Document interface to access the data in that tree. As stated earlier, the node is the base datatype for DOM. From the code, we can see that the getDocumentElement() method returns the root element of the document, and the getElementsByTagName() method returns a NodeList of all the elements with that tag name. We then read the value of each element with getTextContent().
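To make the behaviour of getElementsByTagName() concrete, here is a minimal sketch that parses a string instead of a file. The class name DomTagLookupDemo, the helper textOfAll(), and the shortened SAMPLE_XML are assumptions made for this illustration and are not part of the example above.

```java
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class DomTagLookupDemo {

    // Shortened, hypothetical version of the sample mail file.
    public static final String SAMPLE_XML =
            "<Mail>"
            + "<email Subject=\"First\"><from>Priya</from><to>Karthick</to></email>"
            + "<email Subject=\"Second\"><from>Karthick</from><to>Priya</to></email>"
            + "</Mail>";

    // Returns the text of every element named tagName, in document order.
    public static List<String> textOfAll(String xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // getElementsByTagName() gives back a NodeList, not a single value.
        NodeList matches = doc.getElementsByTagName(tagName);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < matches.getLength(); i++) {
            values.add(matches.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOfAll(SAMPLE_XML, "from")); // both senders
        System.out.println(textOfAll(SAMPLE_XML, "From")); // empty: tag names are case-sensitive
    }
}
```

Because the lookup returns a list, calling item(0) on it, as the DOM example does, is only safe when at least one matching element exists.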

Using the SAX Parser to Read data from XML:

The SAX parser is a simple event-based API that reads the XML document sequentially and reports what it finds to a Handler class. Everything in XML is considered to be a “Token” in SAX. Unlike the DOM parser that we saw earlier, SAX does not load the entire XML file into memory. It also doesn’t create any object representation of the XML document. Instead, it triggers events when it encounters an opening tag, a closing tag, or character data in an XML file. It reads the XML from top to bottom, identifies the tokens, and invokes the matching call-back methods in the handler. Due to the top to bottom approach, tokens are parsed in the same order as they appear in the document. Because SAX streams through the file in this way, it is generally faster and uses less memory in comparison to the DOM parser.

SAX Parser:
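import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.File;

// The class name is chosen for this example.
public class SAXParserDemo {
    public static void main(String[] args) {
        try {
            File file = new File("E:\\Examp\\src\\com\\company\\xmldata.xml");
            SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
            SAXParser saxParser = saxParserFactory.newSAXParser();
            // SaxHandler is our own handler class; it receives the parsing events.
            SaxHandler sax = new SaxHandler();
            saxParser.parse(file, sax);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}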

In the above code, we have given the path of our XML file. SAXParserFactory.newInstance() creates a factory, and the factory creates a new SAXParser. After that, we create an object of our Handler class and pass it to the parse() method along with the file, so that the parser calls the handler’s methods as it reads the XML data. Now, let’s see how the Handler class and its methods are created.

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxHandler extends DefaultHandler {
    // Collects the text of the element that is currently being read.
    private final StringBuilder data = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        if (qName.equalsIgnoreCase("email")) {
            String subject = attributes.getValue("Subject");
            System.out.println("Subject::  " + subject);
        }
        // Start with an empty buffer for every new element.
        data.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver the text of one element in several chunks,
        // so we append each chunk instead of printing it straight away.
        data.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // By the time the closing tag arrives, the buffer holds the full text.
        String text = data.toString().trim();
        if (qName.equalsIgnoreCase("from")) {
            System.out.println("FromName::  " + text);
        } else if (qName.equalsIgnoreCase("Designation")) {
            System.out.println("Designation::  " + text);
        } else if (qName.equalsIgnoreCase("empid")) {
            System.out.println("empid::  " + text);
        } else if (qName.equalsIgnoreCase("to")) {
            System.out.println("to::  " + text);
        } else if (qName.equalsIgnoreCase("body")) {
            System.out.println("body::  " + text);
        } else if (qName.equalsIgnoreCase("email")) {
            System.out.println("End Element::  " + qName);
        }
    }
}

Our ultimate goal is to read data from XML using the SAX parser. So in the above example, we have created our own handler class by extending the DefaultHandler class, which provides various call-back methods. The 3 most prevalent methods of the DefaultHandler class are:

1. startElement() – It receives the notification of the start of an element. It has 4 parameters, each of which is explained below.

startElement(String uri, String localName,String qName, Attributes attributes)

uri – The Namespace URI, or the empty string if the element has no Namespace URI.

localName – The local name (without prefix) or the empty string if Namespace processing is not being performed.

qName – The qualified name (with prefix) or the empty string if qualified names are not available.

attributes – The attributes attached to the element. If there are no attributes, it shall be an empty attributes object.

startElement() is called every time the parser finds an opening tag in the XML file, so it is the place to identify which element has started and to read its attributes.

2. endElement() – Just as the name suggests, endElement() receives the notification of the end of an element.

endElement (String uri, String localName, String qName) 

uri – The Namespace URI, or the empty string if the element has no Namespace URI

localName – The local name (without prefix) or the empty string if Namespace processing is not being performed.

qName – The qualified name (with prefix) or the empty string if qualified names are not available.

endElement() is called every time the parser finds a closing tag in the XML file, so it is the place to finish processing that element.

3. characters() – Receives the notification of character data inside an element.

characters (char ch[], int start, int length) 

ch – The characters.

start – The start position in the character array.

length – The number of characters that have to be used from the character array.

characters() is used to read the character data inside an element. The parser may deliver that data in multiple chunks and call characters() once per chunk, so the text of one element is not guaranteed to arrive in a single call. That’s why it is safer to append() each chunk to a buffer and use the collected text once the element ends.
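The order in which the three call-backs fire can be seen with a small self-contained sketch. The class name SaxEventOrderDemo, the method events(), and the shortened SAMPLE_XML are assumptions made for this illustration; the handler simply records each event in a list instead of printing it.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SaxEventOrderDemo {

    // Shortened, hypothetical version of the sample mail file.
    public static final String SAMPLE_XML =
            "<Mail><email Subject=\"Hello\"><from>Priya</from><to>Karthick</to></email></Mail>";

    // Parses the XML and returns a log of the events in the order SAX reported them.
    public static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                text.setLength(0); // new element, new buffer
                log.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called more than once per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                log.add(value.isEmpty() ? "end " + qName : "end " + qName + " = " + value);
                text.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        for (String event : events(SAMPLE_XML)) {
            System.out.println(event);
        }
    }
}
```

Collecting the text in endElement() rather than in characters() is a design choice: it keeps the handler correct even when the parser splits the text of one element across several calls.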

Using the JDOM Parser to Read data from XML:

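JDOM combines ideas from the DOM and SAX parsers that we have already seen. It’s an open-source Java-based library API. JDOM typically reads the file with a SAX parser underneath (through the SAXBuilder class) and builds a Java-friendly tree of the document from it. It can also build its tree from an existing DOM document (through the DOMBuilder class), so we can switch between the two sources easily. So the main advantage is that it returns the tree structure of all elements in XML through a simple API that works with standard Java collections. Keep in mind that JDOM still holds the whole document in memory as a tree, so, like DOM, it is better suited to small and medium-sized files.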

import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class JDOMParser {
    public static void main(String[] args) throws JDOMException, IOException {
        try{
            File file = new File("E:\\Examp\\src\\com\\company\\xmldata.xml");
            SAXBuilder saxBuilder = new SAXBuilder();
            Document doc= saxBuilder.build(file);
            System.out.println("Root element :" + doc.getRootElement().getName());
            Element ele= doc.getRootElement();
            List<Element> elementList = ele.getChildren("email");
            for(Element emailelement: elementList){
                System.out.println("Current element::  "+emailelement.getName());
                Attribute attribute= emailelement.getAttribute("Subject");
                System.out.println("Subject::  "+attribute.getValue());
                System.out.println("From::  "+emailelement.getChild("from").getText());
                System.out.println("Designation::  "+emailelement.getChild("Designation").getText());
                System.out.println("Empid::  "+emailelement.getChild("empid").getText());
                System.out.println("To::  "+emailelement.getChild("to").getText());
                System.out.println("Body::  "+emailelement.getChild("body").getText());
            }
        }
        catch (Exception e){
            e.printStackTrace();
        }
    }
}

We have used the SAXBuilder class to transform the XML into a JDOM document. The getRootElement() method returns the root element of the XML, and getChildren("email") stores all the email elements under that root in a list, which we then iterate. The subject is read from the attribute with getAttribute(), and at the very end, we have used the getText() method to get the value of each child element.

Using the StAX Parser to Read data from XML:

The StAX Parser is similar to the SAX Parser in that it streams through the XML instead of loading it into memory. The major difference is that SAX pushes events to our handler, whereas with StAX our code pulls the next event from the parser whenever it needs it. That is why the StAX parser is also known as the PULL API. It offers 2 APIs to do so (a Cursor-based API and an Iterator-based API). The other standout aspect of the StAX parser is that it can write XML as well as read it. Every item in the XML is delivered as an “Event”, and below is the code that we require for parsing the XML file using the Iterator-based API of the StAX Parser.

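import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import java.io.FileReader;
import java.util.Iterator;

// The class name is chosen for this example.
public class StAXParserDemo {
    public static void main(String[] args) {
        // Flags that remember which element's text we are waiting for.
        boolean emailFrom = false;
        boolean empid = false;
        boolean designation = false;
        boolean emailTo = false;
        boolean emailBody = false;
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to merge adjacent character data into one CHARACTERS event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, true);
            XMLEventReader eventReader =
                    factory.createXMLEventReader(new FileReader("E:\\Examp\\src\\com\\company\\xmldata.xml"));
            while (eventReader.hasNext()) {
                XMLEvent event = eventReader.nextEvent();
                switch (event.getEventType()) {
                    case XMLStreamConstants.START_ELEMENT:
                        StartElement startElement = event.asStartElement();
                        String qName = startElement.getName().getLocalPart();
                        if (qName.equalsIgnoreCase("email")) {
                            System.out.println("Start Element : email");
                            // <email> carries a single attribute, Subject.
                            Iterator<Attribute> attributes = startElement.getAttributes();
                            String subject = attributes.next().getValue();
                            System.out.println("Subject " + subject);
                        } else if (qName.equalsIgnoreCase("from")) {
                            emailFrom = true;
                        } else if (qName.equalsIgnoreCase("empid")) {
                            empid = true;
                        } else if (qName.equalsIgnoreCase("Designation")) {
                            designation = true;
                        } else if (qName.equalsIgnoreCase("to")) {
                            emailTo = true;
                        } else if (qName.equalsIgnoreCase("body")) {
                            emailBody = true;
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        Characters characters = event.asCharacters();
                        if (emailFrom) {
                            System.out.println("From: " + characters.getData());
                            emailFrom = false;
                        }
                        if (empid) {
                            System.out.println("EmpId: " + characters.getData());
                            empid = false;
                        }
                        if (designation) {
                            System.out.println("Designation: " + characters.getData());
                            designation = false;
                        }
                        if (emailTo) {
                            System.out.println("to: " + characters.getData());
                            emailTo = false;
                        }
                        if (emailBody) {
                            System.out.println("EmailBody: " + characters.getData());
                            emailBody = false;
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        EndElement endElement = event.asEndElement();
                        if (endElement.getName().getLocalPart().equalsIgnoreCase("email")) {
                            System.out.println("End Element : email");
                            System.out.println();
                        }
                        break;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}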

In StAX, we have used the XMLEventReader interface, which lets us iterate over the events, peek at the next event without consuming it, and read configuration information.

The StartElement interface gives access to the start elements in XML, and the asStartElement() method returns the event as a StartElement. It is important to note that a ClassCastException may be thrown if the event is not a start element, which is why we check the event type first.

All character events are reported using the Characters interface. If you are wondering what gets reported as character events, the answer is that all text and whitespace is reported as character events.

The asCharacters() method returns the event as Characters, and we are able to get the text from the XML using the getData() method. Character events carry only the text; the start and end tags around that text are reported separately as StartElement and EndElement events.

The EndElement interface represents the closing tag of an element, and we use it to detect the end of each email element in the XML doc.
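The example above uses the iterator-based API. For comparison, here is a minimal sketch of the cursor-based API, which is built around the XMLStreamReader interface. The class name StaxCursorDemo, the method read(), and the shortened SAMPLE_XML are assumptions made for this illustration.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StaxCursorDemo {

    // Shortened, hypothetical version of the sample mail file.
    public static final String SAMPLE_XML =
            "<Mail><email Subject=\"Hello\"><from>Priya</from><to>Karthick</to></email></Mail>";

    // Pulls events one by one and collects the subject, sender and recipient.
    public static List<String> read(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // next() moves the cursor forward and tells us what it now points at.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("email")) {
                        // A null namespace means the namespace is not checked.
                        out.add("Subject=" + reader.getAttributeValue(null, "Subject"));
                    } else if (name.equals("from") || name.equals("to")) {
                        // getElementText() reads the text and leaves the cursor on the end tag.
                        out.add(name + "=" + reader.getElementText());
                    }
                }
            }
        } finally {
            reader.close();
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(read(SAMPLE_XML));
    }
}
```

With the cursor there are no event objects: the reader itself exposes the data for its current position, which avoids allocating an object per event but means the values must be read before moving on.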

Using the XPath Parser to Read data from XML:

XPath is a query language that is used to find nodes in an XML document based on a query string. In Java, it works on top of a document that has already been parsed, such as a DOM Document. Now let’s take a look at an example code for better understanding.

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.File;

// The class name is chosen for this example.
public class XPathParserDemo {
    public static void main(String[] args) throws Exception {
        File inputFile = new File("E:\\Examp\\src\\com\\company\\xmldata.xml");
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(inputFile);
        doc.getDocumentElement().normalize();
        XPath xPath = XPathFactory.newInstance().newXPath();
        // XPath is case-sensitive, so the step must match the <email> tag exactly.
        String expression = "/Mail/email";
        NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(doc, XPathConstants.NODESET);
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node nNode = nodeList.item(i);
            System.out.println("\nCurrent Element :" + nNode.getNodeName());
            if (nNode.getNodeType() == Node.ELEMENT_NODE) {
                Element eElement = (Element) nNode;
                System.out.println("From : " + eElement.getElementsByTagName("from").item(0).getTextContent());
                System.out.println("EmpId : " + eElement.getElementsByTagName("empid").item(0).getTextContent());
                System.out.println("Designation : " + eElement.getElementsByTagName("Designation").item(0).getTextContent());
                System.out.println("TO : " + eElement.getElementsByTagName("to").item(0).getTextContent());
                System.out.println("Body : " + eElement.getElementsByTagName("body").item(0).getTextContent());
            }
        }
    }
}

In the above code, we used the XPathFactory to create a new XPath instance. Then we have written the path to the nodes we want and stored it as a String. This String is called an “XPath Expression”.

Next, we have compiled the XPath Expression by using the xPath.compile() method. The evaluate() method runs the compiled expression against the document and returns the matching nodes as a NodeList, which we then iterate.

We have used the getNodeName() method to print the name of each matched element.

Once the XML document has been parsed, the same Document and XPath objects can be reused for as many queries as we need.
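XPath expressions can do more than list nodes: a predicate in square brackets filters the matches, and @ selects an attribute. The following is a minimal sketch of that idea. The class name XPathQueryDemo, the method evaluate(), and the shortened SAMPLE_XML are assumptions made for this illustration.

```java
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;

public class XPathQueryDemo {

    // Shortened, hypothetical version of the sample mail file.
    public static final String SAMPLE_XML =
            "<Mail>"
            + "<email Subject=\"First\"><from>Priya</from><empid>COD11</empid></email>"
            + "<email Subject=\"Second\"><from>Karthick</from><empid>COD123</empid></email>"
            + "</Mail>";

    // Parses the XML and returns the string value of the first node the expression matches.
    public static String evaluate(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        // This overload of evaluate() returns the result as a String.
        return xPath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Sender of the email whose <empid> is COD123.
        System.out.println(evaluate(SAMPLE_XML, "/Mail/email[empid='COD123']/from"));
        // Subject attribute of the first <email> (XPath positions start at 1).
        System.out.println(evaluate(SAMPLE_XML, "/Mail/email[1]/@Subject"));
        // Wrong case in the path: nothing matches, so the result is an empty string.
        System.out.println(evaluate(SAMPLE_XML, "/Mail/Email/from").isEmpty());
    }
}
```

Pushing the filtering into the expression keeps the Java code short, at the cost of still needing the whole document in memory, because the query runs on a DOM tree.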

Conclusion

We hope you have found the parser that fits your requirement and also enjoyed reading this article along the way. So to sum things up, we have seen how each parser works in order to understand the pros and cons of each type. Choosing the apt parser might seem like a very small aspect when compared to the entire scale of the project. But as one of the best software testing service providers, we believe in attaining maximum efficiency in each process, be it small or big.
