Document API

The Document class represents the root of an XML document. It contains the document element and preserves document-level formatting such as the XML declaration and DOCTYPE.

Overview

The Document class serves as the top-level container for an XML document, maintaining:

  • Document element (root element)
  • XML declaration with version, encoding, and standalone flag
  • DOCTYPE declarations
  • Processing instructions at document level
  • Comments before and after the root element
  • Whitespace between top-level nodes
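
To make "top-level nodes" concrete, here is a small sketch that lists the direct children of a parsed document. It uses the JDK's built-in DOM parser as a stand-in, not DomTrip, and the class and method names are illustrative only. The JDK DOM reports the processing instruction, the comments and the root element, but neither the XML declaration nor the whitespace between them is a child node. Those are among the details the list above says Document keeps.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper built on the JDK DOM; it is not part of DomTrip.
final class TopLevelNodesSketch {

    /** Lists the node names of the document's direct children, in document order. */
    static List<String> topLevelNodeNames(String xml) {
        try {
            // Fully qualified to avoid confusion with DomTrip's Document class
            org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = dom.getChildNodes();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                names.add(children.item(i).getNodeName());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<?xml-stylesheet type=\"text/xsl\" href=\"style.xsl\"?>\n"
                + "<!-- header -->\n"
                + "<root/>\n"
                + "<!-- footer -->\n";
        // prints [xml-stylesheet, #comment, root, #comment]
        System.out.println(topLevelNodeNames(xml));
    }
}
```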

Creating Documents

Factory Methods

// Create empty document
Document doc = Document.of();

// Parse from XML string
String xmlString = "<root><child>value</child></root>";
Document doc2 = Document.of(xmlString);

// Create with root element
Document doc3 = Document.withRootElement("project");

// Create with XML declaration
Document doc4 = Document.withXmlDeclaration("1.0", "UTF-8");

Fluent API

// Build document using fluent API
Document doc = Document.withXmlDeclaration("1.1", "UTF-8");
doc.root(Element.of("project"));

Document Properties

XML Declaration

Document doc = Document.of();

// Set XML declaration components
doc.version("1.0");
doc.encoding("UTF-8");

// Generate XML declaration
doc.withXmlDeclaration();

// Access declaration
String xmlDecl = doc.xmlDeclaration();
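
For reference, the text of an XML declaration is assembled from the version, the optional encoding and the optional standalone flag, in that order. The sketch below builds such a string with plain Java. The helper name and signature are assumptions made for illustration, and this is not how DomTrip generates its declaration.

```java
// Hypothetical helper; it only illustrates the shape of an XML declaration.
final class XmlDeclarationSketch {

    /**
     * Builds an XML declaration. The version is required; encoding and
     * standalone are omitted from the output when null.
     */
    static String xmlDeclaration(String version, String encoding, Boolean standalone) {
        StringBuilder sb = new StringBuilder("<?xml version=\"").append(version).append('"');
        if (encoding != null) {
            sb.append(" encoding=\"").append(encoding).append('"');
        }
        if (standalone != null) {
            sb.append(" standalone=\"").append(standalone ? "yes" : "no").append('"');
        }
        return sb.append("?>").toString();
    }

    public static void main(String[] args) {
        // prints <?xml version="1.0" encoding="UTF-8"?>
        System.out.println(xmlDeclaration("1.0", "UTF-8", null));
        // prints <?xml version="1.1" standalone="yes"?>
        System.out.println(xmlDeclaration("1.1", null, Boolean.TRUE));
    }
}
```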

Encoding Management

Document doc = Document.withXmlDeclaration("1.0", "UTF-16");

// Set document encoding
doc.encoding("UTF-16");
String encoding = doc.encoding(); // "UTF-16"

// The encoding setting affects serialization
String xml = doc.toXml();
// xml is a Java String; the declared UTF-16 encoding applies when it is written out as bytes
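
The practical effect of the encoding shows up once serialized text is turned into bytes. The sketch below uses only the JDK (no DomTrip calls) to compare the byte length of the same markup under two charsets. The class name is illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// JDK-only illustration of how the chosen charset changes the bytes written.
final class EncodingBytesSketch {

    /** Returns the number of bytes needed to store the text in the given charset. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String xml = "<root/>";
        // UTF-8 needs one byte per ASCII character: prints 7
        System.out.println(encodedLength(xml, StandardCharsets.UTF_8));
        // Java's UTF-16 encoder writes a 2-byte byte-order mark plus 2 bytes per character: prints 16
        System.out.println(encodedLength(xml, StandardCharsets.UTF_16));
    }
}
```

When writing the output of toXml() to a file or stream, the charset passed to the writer should match the encoding named in the XML declaration.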

Version Control

Document doc = Document.withXmlDeclaration("1.1", "UTF-8");

// Set XML version
doc.version("1.1");
String version = doc.version(); // "1.1"

// Version 1.1 allows additional characters in names
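
Because the version is passed as a plain string, a caller may want to check it before setting it. The sketch below accepts only "1.0" and "1.1", the two versions published as W3C recommendations. The helper is hypothetical, and whether DomTrip itself rejects other values is not stated here.

```java
import java.util.Set;

// Hypothetical pre-check for version strings; not part of DomTrip.
final class XmlVersionSketch {

    private static final Set<String> KNOWN_VERSIONS = Set.of("1.0", "1.1");

    /** Returns true only for the version strings "1.0" and "1.1". */
    static boolean isKnownXmlVersion(String version) {
        // Set.of(...) rejects null lookups, so guard against null first
        return version != null && KNOWN_VERSIONS.contains(version);
    }

    public static void main(String[] args) {
        System.out.println(isKnownXmlVersion("1.1")); // prints true
        System.out.println(isKnownXmlVersion("2.0")); // prints false
    }
}
```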

Root Element Management

Setting Root Element

Document doc = Document.of();

// Create and set root element
Element root = Element.of("project");
doc.root(root);

// Access root element
Element rootElement = doc.root();

Root Element with Namespaces

Document doc = Document.of();

// Create root with namespace
QName rootName = QName.of("http://maven.apache.org/POM/4.0.0", "project");
Element root = Element.of(rootName);
doc.root(root);

// Namespace declarations are preserved

Document Structure

Adding Top-Level Nodes

Document doc = Document.withRootElement("html");

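// Add a processing instruction as a top-level node
ProcessingInstruction stylesheet =
        ProcessingInstruction.of("xml-stylesheet", "type=\"text/xsl\" href=\"style.xsl\"");
doc.addNode(stylesheet);

// Add a comment as a top-level node
Comment footer = Comment.of("Generated by DomTrip");
doc.addNode(footer);

// Both calls use addNode on a document that already has a root,
// so check the resulting order in doc.toXml() if placement matters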

Document Traversal

String xml =
        """
    <?xml version="1.0"?>
    <!-- Comment 1 -->
    <root>
        <child>value</child>
    </root>
    <!-- Comment 2 -->
    """;
Document doc = Document.of(xml);

// Needs java.util.List, java.util.stream.Collectors and java.util.stream.Stream
// Access all document nodes
Stream<Node> allNodes = doc.nodes();

// Find specific node types
List<Comment> comments = doc.nodes()
        .filter(node -> node instanceof Comment)
        .map(node -> (Comment) node)
        .collect(Collectors.toList());

// Find processing instructions
List<ProcessingInstruction> pis = doc.nodes()
        .filter(node -> node instanceof ProcessingInstruction)
        .map(node -> (ProcessingInstruction) node)
        .collect(Collectors.toList());
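
The filter-then-cast pattern above repeats for every node type, so it can be factored into one generic helper. The sketch below is plain JDK code and works on any stream. The helper name ofType is an assumption, not a DomTrip method. With DomTrip it would be called as ofType(doc.nodes(), Comment.class).

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical generic helper; it works with any stream of objects.
final class TypeFilterSketch {

    /** Keeps only the elements that are instances of the given type, cast to that type. */
    static <T> List<T> ofType(Stream<?> items, Class<T> type) {
        return items.filter(type::isInstance)
                .map(type::cast)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Strings and numbers stand in for mixed node types here
        List<String> strings = ofType(Stream.of("a", 1, "b", 2.0), String.class);
        System.out.println(strings); // prints [a, b]
    }
}
```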

DOCTYPE Support

Setting DOCTYPE

Document doc = Document.of();

// Set DOCTYPE declaration
doc.doctype("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">");

// Access DOCTYPE
String doctype = doc.doctype();

DOCTYPE Preservation

String xmlWithDoctype =
        """
    <?xml version="1.0"?>
    <!DOCTYPE root SYSTEM "example.dtd">
    <root>
        <element>content</element>
    </root>
    """;

Document doc = Document.of(xmlWithDoctype);
// DOCTYPE is preserved exactly as written
String preserved = doc.doctype();
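
One way to test the "preserved exactly as written" claim is to pull the DOCTYPE text out of the raw input and compare it with what the library reports. The sketch below does the extraction with a regular expression and no DomTrip calls. It is a simplification: it handles an optional internal subset in square brackets, but it would be confused by a ">" or "]" inside a quoted literal. The class and method names are illustrative.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test helper; a simplified matcher, not a full XML parser.
final class DoctypeTextSketch {

    // <!DOCTYPE name ...> with an optional [internal subset]
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE\\s+[^>\\[]+(\\[[^\\]]*\\])?\\s*>");

    /** Returns the first DOCTYPE declaration found in the raw text, if any. */
    static Optional<String> rawDoctype(String xml) {
        Matcher matcher = DOCTYPE.matcher(xml);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE root SYSTEM \"example.dtd\">\n"
                + "<root/>\n";
        // prints <!DOCTYPE root SYSTEM "example.dtd">
        System.out.println(rawDoctype(xml).orElse("(none)"));
    }
}
```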

Document Statistics

Node Counting

String complexXml =
        """
    <?xml version="1.0"?>
    <!-- Comment -->
    <root>
        <child1>value1</child1>
        <child2>value2</child2>
    </root>
    """;
Document doc = Document.of(complexXml);

// Count total nodes (Stream.count() returns long)
long totalNodes = doc.nodes().count();

// Count specific node types
long elementCount = doc.nodes().filter(node -> node instanceof Element).count();

long commentCount = doc.nodes().filter(node -> node instanceof Comment).count();
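
When several counts are needed, a single pass that groups by type avoids streaming the nodes once per type. The sketch below shows the grouping with plain JDK collectors on stand-in objects. The helper name is hypothetical. With DomTrip the argument would be doc.nodes(), and the keys would be the simple names of the concrete node classes.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; it counts any stream of objects by runtime class.
final class NodeCountSketch {

    /** Counts the items per simple class name in one pass over the stream. */
    static Map<String, Long> countByType(Stream<?> items) {
        return items.collect(Collectors.groupingBy(
                item -> item.getClass().getSimpleName(),
                Collectors.counting()));
    }

    public static void main(String[] args) {
        // Strings and integers stand in for different node types
        Map<String, Long> counts = countByType(Stream.of("a", 1, "b"));
        System.out.println(counts.get("String"));  // prints 2
        System.out.println(counts.get("Integer")); // prints 1
    }
}
```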

Serialization

Basic Serialization

Document doc = Document.withRootElement("root");

// Serialize with default settings
String xml = doc.toXml();

Custom Serialization

Document doc = Document.withRootElement("root");

// Serialize with custom configuration using Serializer
DomTripConfig config =
        DomTripConfig.prettyPrint().withIndentString("  ").withLineEnding("\n");

Serializer serializer = new Serializer(config);
String xml = serializer.serialize(doc);

Advanced Features

Document Cloning

String xmlString = "<root><child>value</child></root>";
Document original = Document.of(xmlString);

// Create an independent copy by serializing and re-parsing
Document copy = Document.of(original.toXml());

// Modifications to the copy don't affect the original

Document Validation

Document doc = Document.withRootElement("root");

// Structural check: a usable document needs a root element
boolean hasRoot = doc.root() != null;

// Check for an XML declaration (null-safe in case none has been set)
String xmlDecl = doc.xmlDeclaration();
boolean hasXmlDecl = xmlDecl != null && !xmlDecl.isEmpty();
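
The checks above are structural and do not test whether a piece of text is well-formed XML. For that, one option is to run the text through a parser and treat a parse error as "not well-formed". The sketch below uses the JDK's DOM parser rather than DomTrip, and the class and method names are illustrative. For untrusted input, the parser factory should also be hardened against external entities, which is left out here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical well-formedness check built on the JDK parser; not part of DomTrip.
final class WellFormedSketch {

    /** Returns true if the JDK parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for well-formedness errors such as mismatched tags
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Parser could not run", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<root><child>value</child></root>")); // prints true
        System.out.println(isWellFormed("<root><child></root>"));              // prints false
    }
}
```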

Memory Management

// For large documents, consider memory usage
// (largeXmlFile stands for the XML input, defined elsewhere)
Document largeDoc = Document.of(largeXmlFile);

// Process in sections if needed
Element root = largeDoc.root();
// ... process specific elements

// Clear references when done
largeDoc = null; // Allow garbage collection

Integration with Editor

An Editor wraps a Document, and changes made through the Editor are visible on that Document:

// Create document and edit
Document doc = Document.withRootElement("config");
Editor editor = new Editor(doc);

// Editor operations modify the document
editor.addElement(editor.root(), "setting", "value");

// Document reflects changes
Element setting = doc.root().child("setting").orElse(null);

Best Practices

Do:

  • Use factory methods for document creation
  • Set encoding explicitly for non-UTF-8 documents
  • Preserve XML declarations when round-tripping
  • Use fluent API for complex document setup
  • Handle null checks for optional elements
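
For the last point, lookups that return an Optional can be handled without falling back to null. The sketch below shows the pattern with a Map standing in for an element's children. The names are hypothetical and no DomTrip types are involved.

```java
import java.util.Map;
import java.util.Optional;

// Stand-in for an Optional-returning child lookup; not DomTrip code.
final class OptionalLookupSketch {

    /** Looks up a child's text by name; empty when the child is absent. */
    static Optional<String> childText(Map<String, String> children, String name) {
        return Optional.ofNullable(children.get(name));
    }

    public static void main(String[] args) {
        Map<String, String> children = Map.of("setting", "value");

        // Supply a default instead of unwrapping to null
        String present = childText(children, "setting").orElse("default");
        String missing = childText(children, "timeout").orElse("default");

        System.out.println(present + " " + missing); // prints "value default"
    }
}
```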

Avoid:

  • Creating documents without root elements
  • Modifying document structure directly (use Editor instead)
  • Ignoring encoding when parsing from streams
  • Setting invalid XML version numbers
  • Creating malformed DOCTYPE declarations
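
To illustrate the point about streams: bytes decoded with the wrong charset produce different text before any XML parsing happens. The sketch below reads a stream with an explicit charset using only the JDK. The class and method names are illustrative, and how DomTrip itself detects encodings is not covered here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// JDK-only illustration of decoding stream bytes with an explicit charset.
final class StreamDecodingSketch {

    /** Reads the whole stream and decodes it with the given charset. */
    static String readAll(InputStream in, Charset charset) {
        try {
            return new String(in.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "<name>\u00e9</name>"; // contains a non-ASCII character
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);

        String right = readAll(new ByteArrayInputStream(latin1Bytes), StandardCharsets.ISO_8859_1);
        String wrong = readAll(new ByteArrayInputStream(latin1Bytes), StandardCharsets.UTF_8);

        System.out.println(original.equals(right)); // prints true
        System.out.println(original.equals(wrong)); // prints false
    }
}
```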

Error Handling

try {
    String xmlString = "<root><child>value</child></root>";
    Document doc = Document.of(xmlString);

    // Validate document structure
    if (doc.root() == null) {
        throw new IllegalStateException("Document has no root element");
    }

} catch (Exception e) {
    // Handles parse failures as well as the missing-root check above
    System.err.println("Failed to load document: " + e.getMessage());
}

Performance Considerations

  • Lazy loading - Document content is parsed on demand
  • Memory efficient - Only modified nodes are tracked
  • Streaming friendly - Large documents can be processed efficiently
  • Minimal overhead - Document metadata has negligible memory impact

The Document API provides the foundation for all XML processing in DomTrip, offering both simplicity for basic use cases and power for complex document manipulation scenarios.