Lossless Parsing
DomTrip's core strength is parsing XML documents while preserving the original formatting: comments, whitespace, entity encoding, attribute quote styles, CDATA sections, and processing instructions. This enables true round-trip editing, where unmodified sections are written back unchanged.
What Gets Preserved
1. Comments (Including Multi-line)
Comments, including multi-line comments, are kept exactly as written, together with their internal line breaks and indentation.
2. Whitespace and Indentation
String xmlWithWhitespace =
"""
<project>
<groupId>com.example</groupId>
<artifactId>my-app</artifactId>
</project>
""";
Document doc = Document.of(xmlWithWhitespace);
Editor editor = new Editor(doc);
// Whitespace between elements is preserved exactly
String result = editor.toXml();
// All blank lines and spacing are maintained
Assertions.assertEquals(xmlWithWhitespace, result);
3. Entity Encoding
String xmlWithEntities = """
<message>Hello &amp; goodbye &lt;world&gt;</message>
""";
Document doc = Document.of(xmlWithEntities);
Editor editor = new Editor(doc);
Element message = doc.root();
// For your code - entities are decoded
String decoded = message.textContent(); // "Hello & goodbye <world>"
// For serialization - the original entity references are written back unchanged
String result = editor.toXml(); // still contains "&amp;" and "&lt;world&gt;"
4. Attribute Quote Styles
String xmlWithMixedQuotes =
"""
<dependency scope='test' optional="true" classifier='sources'/>
""";
Document doc = Document.of(xmlWithMixedQuotes);
Editor editor = new Editor(doc);
// Quote styles are preserved exactly
String result = editor.toXml();
Assertions.assertEquals(xmlWithMixedQuotes, result);
5. CDATA Sections
String xmlWithCData =
"""
<script>
<![CDATA[
function example() {
if (x < y && y > z) {
return "complex & special chars";
}
}
]]>
</script>
""";
Document doc = Document.of(xmlWithCData);
Editor editor = new Editor(doc);
// CDATA sections are preserved exactly
String result = editor.toXml();
Assertions.assertTrue(result.contains("<![CDATA["));
Assertions.assertTrue(result.contains("]]>"));
Assertions.assertTrue(result.contains("x < y && y > z"));
6. Processing Instructions
String xml =
"""
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<document>
<?custom-instruction data="value"?>
<content>text</content>
</document>
""";
Document doc = Document.of(xml);
Editor editor = new Editor(doc);
// Processing instructions with data are preserved exactly
String result = editor.toXml();
Assertions.assertTrue(result.contains("<?xml-stylesheet type=\"text/xsl\" href=\"style.xsl\"?>"));
Assertions.assertTrue(result.contains("<?custom-instruction data=\"value\"?>"));
How It Works
DomTrip achieves lossless parsing through several key techniques:
1. Dual Content Storage
Each text node stores both the decoded content (for programmatic access) and the raw content (for preservation):
// Internal representation
Text textNode = new Text(
"decoded content: < & >", // For your code to use
"raw content: &lt; &amp; &gt;" // For serialization
);
// You work with decoded content
String content = textNode.getTextContent(); // "decoded content: < & >"
// Serialization uses raw content to preserve entities
String xml = textNode.toXml(); // "raw content: &lt; &amp; &gt;"
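To make the idea concrete, here is a minimal, self-contained sketch of a node that keeps both forms. `DualText` and its methods are hypothetical names chosen for illustration, not DomTrip's actual `Text` class, and the decoder only handles the five predefined XML entities.

```java
// Hypothetical sketch of dual content storage; not DomTrip's real Text class.
final class DualText {
    private final String decoded; // what application code reads
    private final String raw;     // what is written back on serialization

    private DualText(String decoded, String raw) {
        this.decoded = decoded;
        this.raw = raw;
    }

    // Build a node from the raw characters found in the source document.
    static DualText fromRaw(String raw) {
        return new DualText(decode(raw), raw);
    }

    String decoded() {
        return decoded;
    }

    // Serialization returns the raw text untouched, so entities survive.
    String toXml() {
        return raw;
    }

    // Decodes only the five predefined XML entities. "&amp;" is handled last
    // so that "&amp;lt;" becomes "&lt;" rather than "<".
    private static String decode(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");
    }
}
```

Because the raw string is never regenerated from the decoded one, the choice between, say, `&gt;` and a literal `>` in the source is kept as long as the node is not modified.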
2. Attribute Metadata
Attributes store comprehensive formatting information:
public class Attribute {
private String value; // The actual value
private QuoteStyle quoteStyle; // SINGLE or DOUBLE
private String whitespace; // Surrounding whitespace
private String rawValue; // Original encoded value
}
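As a rough illustration of how such metadata can drive serialization, the sketch below rebuilds an attribute from its stored pieces. `AttrSketch` is a hypothetical class, and it assumes the stored whitespace is the whitespace that precedes the attribute name; the real class may model this differently.

```java
// Hypothetical sketch; not DomTrip's real Attribute class.
final class AttrSketch {
    enum Quote {
        SINGLE('\''),
        DOUBLE('"');

        final char ch;

        Quote(char ch) {
            this.ch = ch;
        }
    }

    private final String name;
    private final String rawValue;   // value as written in the source, entities intact
    private final Quote quote;       // quote character used in the source
    private final String whitespace; // assumption: whitespace before the attribute name

    AttrSketch(String name, String rawValue, Quote quote, String whitespace) {
        this.name = name;
        this.rawValue = rawValue;
        this.quote = quote;
        this.whitespace = whitespace;
    }

    // Reassemble the attribute exactly as it appeared in the source.
    String toXml() {
        return whitespace + name + "=" + quote.ch + rawValue + quote.ch;
    }
}
```

With this shape, `new AttrSketch("scope", "test", AttrSketch.Quote.SINGLE, " ").toXml()` yields ` scope='test'`, single quotes included.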
3. Whitespace Tracking
Every node tracks its surrounding whitespace:
public abstract class Node {
protected String precedingWhitespace; // Whitespace before the node
// Note: followingWhitespace has been removed in favor of a simplified model
// where whitespace is stored as precedingWhitespace of the next node
}
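The simplified model can be sketched with a flat list of sibling nodes, each carrying only the whitespace in front of it. `WsNode` is a hypothetical name, and passing the whitespace before the parent's closing tag as a separate argument is an assumption made to keep the example short.

```java
import java.util.List;

// Hypothetical sketch of the "preceding whitespace only" model.
final class WsNode {
    final String precedingWhitespace; // whitespace between the previous node and this one
    final String content;             // the node's own serialized text

    WsNode(String precedingWhitespace, String content) {
        this.precedingWhitespace = precedingWhitespace;
        this.content = content;
    }

    // Concatenate each node's preceding whitespace and content in order.
    // 'trailing' is the whitespace before the parent's closing tag (assumption).
    static String serialize(List<WsNode> siblings, String trailing) {
        StringBuilder sb = new StringBuilder();
        for (WsNode node : siblings) {
            sb.append(node.precedingWhitespace).append(node.content);
        }
        return sb.append(trailing).toString();
    }
}
```

Since every gap between two nodes is owned by exactly one node, blank lines and indentation are reproduced without any whitespace being stored twice.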
4. Modification Tracking
Nodes track whether they've been modified to determine serialization strategy:
// Unmodified nodes use original formatting
if (!node.isModified() && !node.getOriginalContent().isEmpty()) {
return node.getOriginalContent();
}
// Modified nodes are rebuilt with preserved style
return buildFromScratch(node);
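A self-contained version of that decision might look like the following. `TrackedElement` is a hypothetical class for illustration only; its rebuild step is deliberately naive (no escaping and no style inference), whereas the text above says DomTrip rebuilds modified nodes with the preserved style.

```java
// Hypothetical sketch of modification tracking; not DomTrip's real node class.
final class TrackedElement {
    private final String name;
    private final String originalContent; // exact source text, "" for newly created nodes
    private String text;
    private boolean modified;

    TrackedElement(String name, String text, String originalContent) {
        this.name = name;
        this.text = text;
        this.originalContent = originalContent;
    }

    boolean isModified() {
        return modified;
    }

    // Any change flips the flag, which forces a rebuild on serialization.
    void setText(String newText) {
        this.text = newText;
        this.modified = true;
    }

    String toXml() {
        // Unmodified nodes are emitted verbatim from the source text.
        if (!modified && !originalContent.isEmpty()) {
            return originalContent;
        }
        // Modified or new nodes are rebuilt (naively, in this sketch).
        return "<" + name + ">" + text + "</" + name + ">";
    }
}
```

An element parsed from `<port >5432</port >` serializes back with its unusual spacing until `setText` is called, after which the rebuilt form is used.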
Round-Trip Verification
You can verify lossless parsing with this simple test:
// A small document with a declaration, a comment, and nested elements
String complexXml =
"""
<?xml version="1.0" encoding="UTF-8"?>
<!-- Configuration file -->
<config>
<database>
<host>localhost</host>
<port>5432</port>
</database>
</config>
""";
// Parse the string and serialize it without making any changes
Document doc = Document.of(complexXml);
Editor editor = new Editor(doc);
String result = editor.toXml();
// Load again to verify round-trip preservation
Document doc2 = Document.of(result);
Editor editor2 = new Editor(doc2);
String result2 = editor2.toXml();
// The first pass must reproduce the input, and a second pass must be stable
Assertions.assertEquals(complexXml, result);
Assertions.assertEquals(result, result2);
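When a round-trip assertion fails on a large document, it helps to know where the two strings diverge. The helper below is a hypothetical utility, not part of DomTrip, that reports the first differing offset and the line it falls on.

```java
// Hypothetical helper for diagnosing round-trip failures; not part of DomTrip.
final class RoundTripDiff {
    // Returns the index of the first differing character, or -1 if the strings are equal.
    static int firstDifference(String expected, String actual) {
        int limit = Math.min(expected.length(), actual.length());
        for (int i = 0; i < limit; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                return i;
            }
        }
        // Same prefix: either identical, or one string is longer than the other.
        return expected.length() == actual.length() ? -1 : limit;
    }

    // Produces a short human-readable description, with a 1-based line number.
    static String describe(String expected, String actual) {
        int at = firstDifference(expected, actual);
        if (at < 0) {
            return "identical";
        }
        int line = 1;
        for (int i = 0; i < at; i++) {
            if (expected.charAt(i) == '\n') {
                line++;
            }
        }
        return "first difference at offset " + at + " (line " + line + ")";
    }
}
```

Passing `RoundTripDiff.describe(original, result)` as the assertion message points directly at the quote, entity, or whitespace that changed.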
Performance Considerations
Lossless parsing costs extra memory and time, because formatting metadata has to be collected and stored:
- Memory overhead: ~20-30% compared to traditional parsers
- Parse time: ~10-15% slower due to metadata collection
- Serialization: Faster for unmodified sections, slower for modified sections
Memory Usage Example
// Traditional parser memory usage
Document traditionalDoc = traditionalParser.parse(xml);
// Memory: ~1x base size
// DomTrip memory usage
Document domtripDoc = domtripParser.parse(xml);
// Memory: ~1.3x base size (includes formatting metadata)
Limitations
While DomTrip preserves almost everything, there are a few edge cases:
- DTD Internal Subsets: Complex DTD declarations may be simplified
- Exotic Encodings: Some rare character encodings may be normalized
- XML Declaration Order: Attribute order in XML declarations may be standardized
Best Practices
1. Use for Editing Scenarios
// ✅ Perfect for editing existing files
String existingConfigXml = createConfigXml();
Document doc = Document.of(existingConfigXml);
Editor editor = new Editor(doc);
Element root = editor.root();
editor.addElement(root, "newSetting", "value");
String result = editor.toXml();
// Result preserves all original formatting
2. Verify Round-Trip in Tests
// ✅ Always test round-trip preservation
@Test
void testConfigurationEditing() {
String original = loadTestXml();
Document doc = Document.of(original);
Editor editor = new Editor(doc);
Element root = editor.root();
// Make changes...
editor.addElement(root, "test", "value");
// Verify only intended changes occurred
String result = editor.toXml();
assertThat(result).contains("<test>value</test>");
assertThat(countLines(result)).isEqualTo(countLines(original) + 1);
}
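The test above calls `countLines`, which is not shown. One possible implementation, offered as an assumption about what the helper does, counts lines the way a text editor would, so a trailing newline does not add an extra line. It relies on `String.lines()`, which requires Java 11 or later.

```java
// Hypothetical implementation of the countLines helper used in the test above.
final class LineCount {
    // Counts lines separated by \n, \r, or \r\n; an empty string has zero lines,
    // and a final line terminator does not start a new line.
    static int countLines(String text) {
        return (int) text.lines().count();
    }
}
```

Comparing line counts is a coarse check: it confirms that exactly one line was added but not that the other lines are unchanged, so it works best alongside a content assertion.\n</a>\n") != 4) throw new AssertionError("four lines");
if (LineCount.countLines("a\r\nb") != 2) throw new AssertionError("crlf");
</test>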
3. Handle Large Files Carefully
// ✅ For large files, consider streaming or chunking
String xmlContent = createConfigXml();
long fileSize = xmlContent.length(); // counts chars, not bytes
if (fileSize > 10_000_000) { // roughly 10MB for mostly-ASCII content
// Consider alternative approaches for very large files
System.out.println("Large file detected, consider streaming approach");
}
// For normal-sized files, DomTrip works efficiently
Document doc = Document.of(xmlContent);
Editor editor = new Editor(doc);
String result = editor.toXml();
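`String.length()` counts UTF-16 code units rather than bytes, so the 10MB threshold above is only approximate for non-ASCII content. If a byte-accurate check matters, something like the following hypothetical helper could be used. It assumes the document will be stored as UTF-8, and it allocates a temporary byte array, so it is a sketch for moderate sizes, not for files that are already too large to hold in memory.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of DomTrip.
final class XmlSize {
    // Number of bytes the text occupies when encoded as UTF-8.
    static long utf8Bytes(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when the UTF-8 encoded size is strictly above the given limit.
    static boolean exceeds(String xml, long maxBytes) {
        return utf8Bytes(xml) > maxBytes;
    }
}
```

For content that is still on disk, checking the file size before reading it avoids loading the text at all.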
Comparison with Other Libraries
Feature | DomTrip | DOM4J | JDOM | Java DOM |
---|---|---|---|---|
Comment preservation | ✅ Perfect | ✅ Yes | ✅ Yes | ✅ Yes |
Between-element whitespace | ✅ Exact | ⚠️ Partial | ✅ Yes* | ⚠️ Limited |
In-element whitespace | ✅ Exact | ❌ Lost | ⚠️ Configurable** | ⚠️ Limited |
Entity preservation | ✅ Perfect | ❌ Decoded | ❌ Decoded | ❌ Decoded |
Quote style preservation | ✅ Perfect | ❌ Normalized | ❌ Normalized | ❌ Normalized |
Attribute order preservation | ✅ Perfect | ❌ Lost | ❌ Lost | ❌ Lost |
Processing instructions | ✅ Perfect | ✅ Yes | ✅ Yes | ✅ Yes |
CDATA preservation | ✅ Perfect | ✅ Yes | ✅ Yes | ✅ Yes |
Round-trip fidelity | ✅ 100% | ❌ ~70% | ⚠️ ~80%*** | ❌ ~75% |
* JDOM: Use Format.getRawFormat() to preserve original whitespace between elements
** JDOM: Configure with TextMode.PRESERVE to maintain text content whitespace
*** JDOM: Higher fidelity possible with careful configuration, but still loses some formatting details
Key Insight: While other libraries can preserve individual aspects of formatting, DomTrip is unique in preserving all formatting details simultaneously without requiring special configuration or losing any information during round-trip operations.