Document API
The Document class represents the root of an XML document, containing the document element and preserving document-level formatting like XML declarations and DTDs.
Overview
The Document class serves as the top-level container for an XML document, maintaining:
- Document element (root element)
- XML declaration with version, encoding, and standalone flag
- DOCTYPE declarations
- Processing instructions at document level
- Comments before and after the root element
- Whitespace preservation between top-level nodes
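The list above describes what the container keeps. This is a minimal sketch of one way a container could hold its top-level parts in source order so that re-serialization reproduces them verbatim. `TopLevelKind`, `TopLevel` and `DocumentModel` are hypothetical names, not DomTrip classes. Storing each node as its raw text is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical kinds of top-level content (not DomTrip types)
enum TopLevelKind { XML_DECLARATION, DOCTYPE, PROCESSING_INSTRUCTION, COMMENT, WHITESPACE, ROOT_ELEMENT }

// One top-level item, stored as the exact text it was parsed from
record TopLevel(TopLevelKind kind, String rawText) {}

class DocumentModel {
    private final List<TopLevel> nodes = new ArrayList<>();

    DocumentModel add(TopLevelKind kind, String rawText) {
        nodes.add(new TopLevel(kind, rawText));
        return this; // allow chaining
    }

    // Concatenating the raw text in order reproduces prolog, root and epilog exactly
    String toXml() {
        StringBuilder sb = new StringBuilder();
        for (TopLevel node : nodes) {
            sb.append(node.rawText());
        }
        return sb.toString();
    }

    // The document element is the ROOT_ELEMENT entry, or null if none was added
    TopLevel root() {
        return nodes.stream()
                .filter(n -> n.kind() == TopLevelKind.ROOT_ELEMENT)
                .findFirst()
                .orElse(null);
    }

    public static void main(String[] args) {
        DocumentModel doc = new DocumentModel()
                .add(TopLevelKind.XML_DECLARATION, "<?xml version=\"1.0\"?>")
                .add(TopLevelKind.WHITESPACE, "\n")
                .add(TopLevelKind.COMMENT, "<!-- before -->")
                .add(TopLevelKind.WHITESPACE, "\n")
                .add(TopLevelKind.ROOT_ELEMENT, "<root/>")
                .add(TopLevelKind.WHITESPACE, "\n")
                .add(TopLevelKind.COMMENT, "<!-- after -->");
        System.out.println(doc.toXml());
    }
}
```

Keeping whitespace as ordinary entries is what lets the blank lines between top-level nodes survive a round trip.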
Creating Documents
Factory Methods
// Create empty document
Document doc = Document.of();
// Parse from XML string
String xmlString = "<root><child>value</child></root>";
Document doc2 = Document.of(xmlString);
// Create with root element
Document doc3 = Document.withRootElement("project");
// Create with XML declaration
Document doc4 = Document.withXmlDeclaration("1.0", "UTF-8");
Fluent API
// Build document using fluent API
Document doc = Document.withXmlDeclaration("1.1", "UTF-8");
doc.root(Element.of("project"));
Document Properties
XML Declaration
Document doc = Document.of();
// Set XML declaration components
doc.version("1.0");
doc.encoding("UTF-8");
// Generate XML declaration
doc.withXmlDeclaration();
// Access declaration
String xmlDecl = doc.xmlDeclaration();
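The `XMLDecl` production of the XML specification fixes the pseudo-attribute order: `version` first, then optional `encoding`, then optional `standalone`. This is a minimal sketch of assembling such a string. It assumes a hypothetical `XmlDeclarationBuilder.build` helper, not part of DomTrip, that omits absent parts.

```java
// Hypothetical helper illustrating the attribute order required by the XML spec
final class XmlDeclarationBuilder {
    private XmlDeclarationBuilder() {}

    /** Builds an XML declaration; encoding and standalone may be null to omit them. */
    static String build(String version, String encoding, Boolean standalone) {
        if (version == null || version.isEmpty()) {
            throw new IllegalArgumentException("version is required");
        }
        StringBuilder sb = new StringBuilder("<?xml version=\"").append(version).append('"');
        if (encoding != null) {
            sb.append(" encoding=\"").append(encoding).append('"');
        }
        if (standalone != null) {
            sb.append(" standalone=\"").append(standalone ? "yes" : "no").append('"');
        }
        return sb.append("?>").toString();
    }

    public static void main(String[] args) {
        // prints <?xml version="1.0" encoding="UTF-8"?>
        System.out.println(build("1.0", "UTF-8", null));
    }
}
```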
Encoding Management
Document doc = Document.withXmlDeclaration("1.0", "UTF-8");
// Change the document encoding
doc.encoding("UTF-16");
String encoding = doc.encoding(); // "UTF-16"
// Encoding affects serialization
String xml = doc.toXml();
// toXml() returns a Java String; the declared encoding applies when that text is written out as bytes
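`toXml()` returns a Java `String`, so the declared encoding only takes effect when the text is converted to bytes. This sketch uses only the JDK. It assumes the caller reads the encoding back out of the declaration. The `EncodingWriter` helper and its regular expression are illustrative, not DomTrip code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EncodingWriter {
    // Matches encoding="..." or encoding='...' inside a leading XML declaration
    private static final Pattern ENCODING =
            Pattern.compile("^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    private EncodingWriter() {}

    // Returns the declared charset, or UTF-8 when no encoding is declared
    static Charset declaredCharset(String xml) {
        Matcher m = ENCODING.matcher(xml);
        return m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
    }

    // Encodes the serialized text with the charset its own declaration names
    static byte[] toBytes(String xml) {
        return xml.getBytes(declaredCharset(xml));
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><root/>";
        System.out.println(declaredCharset(xml) + ": " + toBytes(xml).length + " bytes");
    }
}
```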
Version Control
Document doc = Document.withXmlDeclaration("1.0", "UTF-8");
// Change the XML version
doc.version("1.1");
String version = doc.version(); // "1.1"
// XML 1.1 permits additional Unicode characters in names
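Version strings follow a fixed grammar. In XML 1.0 (Fifth Edition) it is `VersionNum ::= '1.' [0-9]+`, yet only 1.0 and 1.1 have published meaning. This sketch keeps the two checks separate. `XmlVersionCheck` and its methods are hypothetical names.

```java
import java.util.Set;
import java.util.regex.Pattern;

final class XmlVersionCheck {
    // VersionNum production from XML 1.0 (Fifth Edition): '1.' [0-9]+
    private static final Pattern VERSION_NUM = Pattern.compile("1\\.[0-9]+");
    // Versions that have published specifications
    private static final Set<String> KNOWN = Set.of("1.0", "1.1");

    private XmlVersionCheck() {}

    static boolean isSyntacticallyValid(String version) {
        return version != null && VERSION_NUM.matcher(version).matches();
    }

    // Set.of(...).contains(null) throws, so guard against null first
    static boolean isKnown(String version) {
        return version != null && KNOWN.contains(version);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.0", "1.1", "1.2", "2.0"}) {
            System.out.println(v + " valid=" + isSyntacticallyValid(v) + " known=" + isKnown(v));
        }
    }
}
```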
Root Element Management
Setting Root Element
Document doc = Document.of();
// Create and set root element
Element root = Element.of("project");
doc.root(root);
// Access root element
Element rootElement = doc.root();
Root Element with Namespaces
Document doc = Document.of();
// Create root with namespace
QName rootName = QName.of("http://maven.apache.org/POM/4.0.0", "project");
Element root = Element.of(rootName);
doc.root(root);
// Namespace declarations are preserved
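DomTrip's `QName` is its own type. The JDK's `javax.xml.namespace.QName` models the same pairing of namespace URI and local name. This sketch shows how such a name maps onto a start tag that carries the matching namespace declaration. It assumes a hypothetical `NamespacedTag.startTag` helper.

```java
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;

final class NamespacedTag {
    private NamespacedTag() {}

    // Renders <local xmlns="uri"> for a default namespace, or <p:local xmlns:p="uri"> with a prefix
    static String startTag(QName name) {
        String uri = name.getNamespaceURI();
        String prefix = name.getPrefix();
        String tagName = prefix.isEmpty() ? name.getLocalPart() : prefix + ":" + name.getLocalPart();
        if (uri.equals(XMLConstants.NULL_NS_URI)) {
            return "<" + tagName + ">"; // no namespace, so no declaration is needed
        }
        String attr = prefix.isEmpty() ? "xmlns" : "xmlns:" + prefix;
        return "<" + tagName + " " + attr + "=\"" + uri + "\">";
    }

    public static void main(String[] args) {
        QName project = new QName("http://maven.apache.org/POM/4.0.0", "project");
        System.out.println(project);           // {http://maven.apache.org/POM/4.0.0}project
        System.out.println(startTag(project)); // <project xmlns="http://maven.apache.org/POM/4.0.0">
    }
}
```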
Document Structure
Adding Top-Level Nodes
Document doc = Document.withRootElement("html");
// Add processing instruction before root
ProcessingInstruction stylesheet =
ProcessingInstruction.of("xml-stylesheet", "type=\"text/xsl\" href=\"style.xsl\"");
doc.addNode(stylesheet);
// Add comment after root
Comment footer = Comment.of("Generated by DomTrip");
doc.addNode(footer);
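Position matters here. The W3C style sheet association rules allow `xml-stylesheet` only in the prolog, before the root element. The JDK's `org.w3c.dom` API is used here as a stand-in, not DomTrip. It makes the position explicit with `insertBefore` and `appendChild`, and the class name `TopLevelNodes` is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;

final class TopLevelNodes {
    private TopLevelNodes() {}

    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        doc.appendChild(doc.createElement("html"));

        // Put the stylesheet PI in the prolog, before the document element
        ProcessingInstruction stylesheet =
                doc.createProcessingInstruction("xml-stylesheet", "type=\"text/xsl\" href=\"style.xsl\"");
        doc.insertBefore(stylesheet, doc.getDocumentElement());

        // Put the comment after the document element
        Comment footer = doc.createComment("Generated by DomTrip");
        doc.appendChild(footer);
        return doc;
    }

    public static void main(String[] args) {
        // Lists top-level nodes in document order: xml-stylesheet, html, #comment
        for (Node n = build().getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeName());
        }
    }
}
```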
Document Traversal
String xml =
"""
<?xml version="1.0"?>
<!-- Comment 1 -->
<root>
<child>value</child>
</root>
<!-- Comment 2 -->
""";
Document doc = Document.of(xml);
// Access all document nodes
Stream<Node> allNodes = doc.nodes();
// Find specific node types
List<Comment> comments = doc.nodes()
.filter(node -> node instanceof Comment)
.map(node -> (Comment) node)
.collect(Collectors.toList());
// Find processing instructions
List<ProcessingInstruction> pis = doc.nodes()
.filter(node -> node instanceof ProcessingInstruction)
.map(node -> (ProcessingInstruction) node)
.collect(Collectors.toList());
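The filter-then-cast pattern above repeats for every node type. A small generic helper can fold it into one call with `Class.isInstance` and `Class.cast`. `StreamTypes.ofType` is a hypothetical name, not part of DomTrip.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamTypes {
    private StreamTypes() {}

    // Keeps only elements of the requested type and casts them, preserving encounter order
    static <T> List<T> ofType(Stream<?> items, Class<T> type) {
        return items.filter(type::isInstance).map(type::cast).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> strings = ofType(Stream.of("a", 1, "b", 2.5), String.class);
        System.out.println(strings); // [a, b]
    }
}
```

Applied to the traversal above, this would read `List<Comment> comments = StreamTypes.ofType(doc.nodes(), Comment.class);`, assuming `doc.nodes()` returns a `Stream<Node>` as shown.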
DOCTYPE Support
Setting DOCTYPE
Document doc = Document.of();
// Set DOCTYPE declaration
doc.doctype("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
+ "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">");
// Access DOCTYPE
String doctype = doc.doctype();
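`doctype()` returns the declaration as raw text, so callers that need its parts must parse it themselves. This minimal sketch assumes double-quoted identifiers and no internal subset (`[...]`). `DoctypeParts` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Root element name plus optional public and system identifiers
record DoctypeParts(String rootName, String publicId, String systemId) {

    // Handles <!DOCTYPE name>, <!DOCTYPE name SYSTEM "s"> and <!DOCTYPE name PUBLIC "p" "s">
    private static final Pattern DOCTYPE = Pattern.compile(
            "<!DOCTYPE\\s+([^\\s>\\[]+)"
                    + "(?:\\s+PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"|\\s+SYSTEM\\s+\"([^\"]*)\")?"
                    + "\\s*>");

    static DoctypeParts parse(String doctype) {
        Matcher m = DOCTYPE.matcher(doctype.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported DOCTYPE: " + doctype);
        }
        // The system id comes from the PUBLIC branch (group 3) or the SYSTEM branch (group 4)
        String systemId = m.group(3) != null ? m.group(3) : m.group(4);
        return new DoctypeParts(m.group(1), m.group(2), systemId);
    }

    public static void main(String[] args) {
        // prints DoctypeParts[rootName=root, publicId=null, systemId=example.dtd]
        System.out.println(parse("<!DOCTYPE root SYSTEM \"example.dtd\">"));
    }
}
```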
DOCTYPE Preservation
String xmlWithDoctype =
"""
<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "example.dtd">
<root>
<element>content</element>
</root>
""";
Document doc = Document.of(xmlWithDoctype);
// DOCTYPE is preserved exactly as written
String preserved = doc.doctype();
Document Statistics
Node Counting
String complexXml =
"""
<?xml version="1.0"?>
<!-- Comment -->
<root>
<child1>value1</child1>
<child2>value2</child2>
</root>
""";
Document doc = Document.of(complexXml);
// Count total nodes
long totalNodes = doc.nodes().count();
// Count specific node types
long elementCount = doc.nodes().filter(node -> node instanceof Element).count();
long commentCount = doc.nodes().filter(node -> node instanceof Comment).count();
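For comparison, here is the same tally with the JDK's DOM parser, used as a stand-in and not DomTrip. The JDK `Document` keeps only the comment and the root element as children and exposes no nodes for the whitespace between them, so that formatting cannot be reproduced from its tree. `TopLevelCounts` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TopLevelCounts {
    private TopLevelCounts() {}

    // Counts the direct children of the Document node, grouped by DOM node type
    static Map<Short, Long> count(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
        NodeList children = doc.getChildNodes();
        return IntStream.range(0, children.getLength())
                .mapToObj(children::item)
                .collect(Collectors.groupingBy(Node::getNodeType, Collectors.counting()));
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n<!-- Comment -->\n<root>\n"
                + "  <child1>value1</child1>\n  <child2>value2</child2>\n</root>\n";
        Map<Short, Long> counts = count(xml);
        System.out.println("comments=" + counts.get(Node.COMMENT_NODE)
                + " elements=" + counts.get(Node.ELEMENT_NODE));
    }
}
```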
Serialization
Basic Serialization
Document doc = Document.withRootElement("root");
// Serialize with default settings
String xml = doc.toXml();
Custom Serialization
Document doc = Document.withRootElement("root");
// Serialize with custom configuration using Serializer
DomTripConfig config =
DomTripConfig.prettyPrint().withIndentString(" ").withLineEnding("\n");
Serializer serializer = new Serializer(config);
String xml = serializer.serialize(doc);
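A setting like `withLineEnding("\n")` implies that every line break in the output is rewritten to one configured sequence. This sketch shows that normalization step in isolation. `LineEndings.normalize` is a hypothetical helper, not DomTrip's serializer. Treating `\r\n` and a lone `\r` as line breaks mirrors the end-of-line handling the XML specification requires of parsers.

```java
import java.util.regex.Matcher;

final class LineEndings {
    private LineEndings() {}

    // Rewrites \r\n, a lone \r and \n to the requested line ending; \r\n is matched first as one unit
    static String normalize(String text, String lineEnding) {
        return text.replaceAll("\\r\\n|\\r|\\n", Matcher.quoteReplacement(lineEnding));
    }

    public static void main(String[] args) {
        String mixed = "<root>\r\n  <a/>\r  <b/>\n</root>";
        System.out.println(normalize(mixed, "\n").equals("<root>\n  <a/>\n  <b/>\n</root>")); // true
    }
}
```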
Advanced Features
Document Cloning
String xmlString = "<root><child>value</child></root>";
Document original = Document.of(xmlString);
// Create an independent copy by serializing and re-parsing
Document copy = Document.of(original.toXml());
// Modifications to the copy don't affect the original
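Re-parsing serialized text is one way to get an independent copy. For comparison, the JDK's DOM, used as a stand-in and not DomTrip, can deep-copy a subtree without a text round trip via `importNode(node, true)`. This sketch shows that editing the copy leaves the original untouched. `DeepCopy` and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DeepCopy {
    private DeepCopy() {}

    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static Document parse(DocumentBuilder builder, String xml) {
        try {
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Copies the root element subtree into a fresh document; top-level comments and PIs are not copied
    static Document copyOf(DocumentBuilder builder, Document original) {
        Document copy = builder.newDocument();
        copy.appendChild(copy.importNode(original.getDocumentElement(), true));
        return copy;
    }

    public static void main(String[] args) {
        DocumentBuilder builder = newBuilder();
        Document original = parse(builder, "<root><child>value</child></root>");
        Document copy = copyOf(builder, original);
        copy.getDocumentElement().getFirstChild().setTextContent("changed");
        System.out.println(original.getDocumentElement().getTextContent()); // value
        System.out.println(copy.getDocumentElement().getTextContent());     // changed
    }
}
```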
Document Validation
Document doc = Document.withRootElement("root");
// Structural sanity checks only; this is not DTD or schema validation
// A well-formed document has exactly one root element
boolean hasRoot = doc.root() != null;
// Check whether an XML declaration is present
boolean hasXmlDecl = !doc.xmlDeclaration().isEmpty();
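The checks above operate on a parsed `Document`. Detecting a declaration in raw text has one subtlety: `<?xml-stylesheet ...?>` also begins with `<?xml`. A real declaration must be the very first thing in the text, optionally after a byte order mark, and must be followed by whitespace. `XmlDeclarationDetector` is a hypothetical helper.

```java
final class XmlDeclarationDetector {
    private XmlDeclarationDetector() {}

    static boolean hasXmlDeclaration(String raw) {
        // Skip a leading byte order mark, which may precede the declaration
        String text = raw.startsWith("\uFEFF") ? raw.substring(1) : raw;
        // XMLDecl ::= '<?xml' VersionInfo ..., and VersionInfo begins with whitespace
        return text.length() > 5
                && text.startsWith("<?xml")
                && isXmlWhitespace(text.charAt(5));
    }

    // The XML spec's S production: space, tab, carriage return, line feed
    private static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    public static void main(String[] args) {
        System.out.println(hasXmlDeclaration("<?xml version=\"1.0\"?><root/>"));            // true
        System.out.println(hasXmlDeclaration("<?xml-stylesheet href=\"s.xsl\"?><root/>")); // false
    }
}
```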
Memory Management
// For large documents, consider memory usage
String largeXml = Files.readString(Path.of("large.xml")); // example path
Document largeDoc = Document.of(largeXml);
// Work only with the parts you need
Element root = largeDoc.root();
// ... process specific elements
// Drop references when done so the tree can be garbage collected
largeDoc = null;
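A streaming parser avoids building a tree for files too large to hold in memory. The JDK's StAX API is one such option. It is a swapped-in technique rather than part of DomTrip, and it gives up DomTrip's formatting preservation. This sketch counts elements with a given local name one event at a time. `StreamingCount` is a hypothetical name.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingCount {
    private StreamingCount() {}

    // Counts start tags named localName without keeping the document in memory
    static int countElements(Reader source, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
            try {
                int count = 0;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(localName)) {
                        count++;
                    }
                }
                return count;
            } finally {
                reader.close(); // frees parser state; does not close the underlying Reader
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><child/><child><child/></child><other/></root>";
        System.out.println(countElements(new StringReader(xml), "child")); // 3
    }
}
```

For a real file, the `Reader` could come from `Files.newBufferedReader(path)` inside a try-with-resources block, because `XMLStreamReader.close()` leaves it open.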
Integration with Editor
The Document class works directly with the Editor API; changes made through an Editor are reflected in the Document it wraps:
// Create document and edit
Document doc = Document.withRootElement("config");
Editor editor = new Editor(doc);
// Editor operations modify the document
editor.addElement(editor.root(), "setting", "value");
// Document reflects changes
Element setting = doc.root().child("setting").orElse(null);
Best Practices
✅ Do:
- Use factory methods for document creation
- Set encoding explicitly for non-UTF-8 documents
- Preserve XML declarations when round-tripping
- Use fluent API for complex document setup
- Handle null checks for optional elements
❌ Avoid:
- Creating documents without root elements
- Modifying document structure directly (use Editor instead)
- Ignoring encoding when parsing from streams
- Setting invalid XML version numbers
- Creating malformed DOCTYPE declarations
Error Handling
try {
String xmlString = "<root><child>value</child></root>";
Document doc = Document.of(xmlString);
// Validate document structure
if (doc.root() == null) {
throw new IllegalStateException("Document has no root element");
}
} catch (Exception e) {
// Handles both parse failures and the structural check above
System.err.println("Failed to load document: " + e.getMessage());
}
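Catching `Exception` reports every failure the same way. When the underlying parser exposes positions, surfacing them makes errors actionable. With the JDK's DOM parser, again a stand-in and not DomTrip, a malformed document raises `SAXParseException`, which carries the line and column of the problem. `ParseErrors.describe` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class ParseErrors {
    private ParseErrors() {}

    // Returns "line L, column C: message" for malformed XML, or empty if it is well-formed
    static Optional<String> describe(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                // Rethrow instead of letting the default handler print to stderr
                @Override public void warning(SAXParseException e) {}
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return Optional.empty();
        } catch (SAXParseException e) {
            return Optional.of("line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage());
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("<root>\n  <child>value</root>").orElse("well-formed"));
    }
}
```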
Performance Considerations
- Lazy loading - Document content is parsed on demand
- Memory efficient - Only modified nodes are tracked
- Streaming friendly - Large documents can be processed efficiently
- Minimal overhead - Document metadata has negligible memory impact
The Document API provides the foundation for all XML processing in DomTrip, offering both simplicity for basic use cases and power for complex document manipulation scenarios.