Input Stream Parsing
DomTrip parses XML from InputStreams, files, and network resources, detecting the character encoding and handling byte order marks (BOMs) automatically.
Overview
Input stream parsing allows you to process XML from:
- File systems - Local and network files
- Network streams - HTTP responses, web services
- Memory streams - ByteArrayInputStream, in-memory data
- Compressed streams - ZIP, GZIP archives
- Any InputStream - Database BLOBs, custom sources
Key Features
- Automatic encoding detection - UTF-8, UTF-16, ISO-8859-1, and more
- BOM handling - Byte Order Mark detection and processing
- Large file support - Memory-efficient streaming for big documents
- Error recovery - Graceful handling of encoding issues
- Resource management - Automatic stream cleanup
Basic Usage
Parsing from File
// Parse XML content (string content stands in for a file here, for testing)
String xmlContent = createConfigXml();
Document doc = Document.of(xmlContent);
Editor editor = new Editor(doc);
// When the source is a file stream, its encoding is automatically detected and preserved
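The snippet above uses string content as a stand-in for a file. As a minimal sketch of how a real file could be opened and handed to a parser, the following uses only JDK classes. The class name `FileStreamSketch` and the helpers `withFileStream`, `readUtf8`, and `roundTrip` are hypothetical names introduced for illustration and are not part of DomTrip; `readUtf8` merely stands in for a real parser.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

// Hypothetical helper class, not part of DomTrip.
final class FileStreamSketch {

    // Opens the file, hands the stream to the supplied parser, and closes the stream afterwards.
    static <T> T withFileStream(Path path, Function<InputStream, T> parser) {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
            return parser.apply(in);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + path, e);
        }
    }

    // Stand-in parser for the demo: reads the whole stream as UTF-8 text.
    static String readUtf8(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a small XML file to a temporary location and reads it back through withFileStream.
    static String roundTrip(String xml) {
        try {
            Path file = Files.createTempFile("config", ".xml");
            try {
                Files.writeString(file, xml, StandardCharsets.UTF_8);
                return withFileStream(file, FileStreamSketch::readUtf8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<config><database><url>jdbc:example</url></database></config>"));
    }
}
```

In application code, the stand-in parser would presumably be replaced by a lambda that wraps the `Document.of(InputStream)` call used elsewhere on this page.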
Parsing from InputStream
// Parse from any InputStream
String xmlContent = createTestXml("data");
try (InputStream inputStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(inputStream);
    Editor editor = new Editor(doc);
    // Process the document
    Element root = editor.root();
    Assertions.assertNotNull(root);
    // ... edit operations
}
Parsing from URL/Network
// Parse XML from network resource (simulated with byte array for testing)
String xmlContent = createTestXml("networkData");
try (InputStream stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(stream);
    Editor editor = new Editor(doc);
    Assertions.assertNotNull(editor);
    // Network XML is parsed with encoding detection
}
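The network example above is simulated with a byte array. One possible way to obtain a real stream from a URL, with timeouts so a stalled server cannot block parsing forever, is sketched below using `java.net.URLConnection`. The class `UrlStreamSketch` and its methods are hypothetical names for illustration only; the demo uses a `file:` URI so that it runs without network access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of DomTrip.
final class UrlStreamSketch {

    // Opens a stream for the given URI with connect and read timeouts applied.
    static InputStream open(URI uri, int timeoutMillis) {
        try {
            URLConnection connection = uri.toURL().openConnection();
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            return connection.getInputStream();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not open " + uri, e);
        }
    }

    // Demo: serves the XML from a temporary file via a file: URI and reads it back.
    static String fetchFromTempFile(String xml) {
        try {
            Path file = Files.createTempFile("response", ".xml");
            try {
                Files.writeString(file, xml, StandardCharsets.UTF_8);
                try (InputStream in = open(file.toUri(), 5_000)) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchFromTempFile("<response><result>success</result></response>"));
    }
}
```

The stream returned by `open` could then be passed to `Document.of(stream)` inside a try-with-resources block, as in the simulated example.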
Encoding Detection
Automatic Detection
DomTrip automatically detects encoding from multiple sources:
// Parse with automatic encoding detection
String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>content</root>";
InputStream inputStream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
Document doc = Document.of(inputStream);
// Encoding is automatically detected and set
String detectedEncoding = doc.encoding(); // e.g., "UTF-8"
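To make the idea of declaration-based detection concrete, here is a minimal sketch of reading the `encoding` pseudo-attribute from the XML declaration. It is an illustration of the general technique, not DomTrip's actual implementation, and the class `DeclaredEncodingSketch` and its method are hypothetical. It assumes an ASCII-compatible encoding (so it does not cover UTF-16 input, which a BOM check would normally catch first) and falls back to UTF-8 when no usable declaration is found.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of DomTrip.
final class DeclaredEncodingSketch {

    // Matches the encoding pseudo-attribute inside a leading XML declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the charset named in the XML declaration, or UTF-8 if none is declared or supported.
    static Charset declaredCharset(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so the ASCII declaration survives decoding.
        String prefix = new String(head, 0, Math.min(head.length, 256), StandardCharsets.ISO_8859_1);
        Matcher matcher = ENCODING.matcher(prefix);
        if (matcher.find() && Charset.isSupported(matcher.group(1))) {
            return Charset.forName(matcher.group(1));
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>content</root>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(declaredCharset(xml));
    }
}
```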
Supported Encodings
// UTF-8 (most common)
String xmlContent = createTestXml("utf8");
try (InputStream utf8Stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document utf8Doc = Document.of(utf8Stream);
    Assertions.assertNotNull(utf8Doc);
}
// UTF-16 with BOM (encoding with StandardCharsets.UTF_16 writes a big-endian BOM)
try (InputStream utf16Stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_16))) {
    Document utf16Doc = Document.of(utf16Stream);
    Assertions.assertNotNull(utf16Doc);
}
// ISO-8859-1 (Latin-1)
try (InputStream isoStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.ISO_8859_1))) {
    Document isoDoc = Document.of(isoStream);
    Assertions.assertNotNull(isoDoc);
}
// Other encodings supported by the Java runtime work the same way
BOM Handling
// UTF-8 with BOM
String xmlString = "<root>content</root>";
byte[] bomBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
byte[] xmlBytes = xmlString.getBytes(StandardCharsets.UTF_8);
byte[] xmlWithBom = new byte[bomBytes.length + xmlBytes.length];
System.arraycopy(bomBytes, 0, xmlWithBom, 0, bomBytes.length);
System.arraycopy(xmlBytes, 0, xmlWithBom, bomBytes.length, xmlBytes.length);
InputStream inputStream = new ByteArrayInputStream(xmlWithBom);
Document doc = Document.of(inputStream);
// BOM is detected and UTF-8 encoding is used
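For readers curious what BOM detection involves, the following sketch checks the leading bytes for the UTF-8 and UTF-16 byte order marks and strips them before decoding. It illustrates the general technique only and makes no claim about DomTrip's internals; `BomSketch` and its methods are hypothetical names, and UTF-32 BOMs are deliberately left out to keep the sketch short.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of DomTrip.
final class BomSketch {

    // Returns the charset implied by a leading BOM, or null when no known BOM is present.
    static Charset charsetFromBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        return null;
    }

    // Compares the first bytes of data (as unsigned values) against the given prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    // Decodes the bytes, skipping the BOM if present and assuming UTF-8 otherwise.
    static String decode(byte[] bytes) {
        Charset charset = charsetFromBom(bytes);
        if (charset == null) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
        int bomLength = StandardCharsets.UTF_8.equals(charset) ? 3 : 2;
        return new String(bytes, bomLength, bytes.length - bomLength, charset);
    }

    public static void main(String[] args) {
        byte[] utf16 = "<root>content</root>".getBytes(StandardCharsets.UTF_16);
        System.out.println(charsetFromBom(utf16) + " " + decode(utf16));
    }
}
```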
Advanced Features
Large File Processing
// Memory-efficient processing of large XML files
try {
    // For testing, use regular content
    String xmlContent = createTestXml("dataset");
    Document doc = Document.of(xmlContent);
    Editor editor = new Editor(doc);
    // Process in chunks or specific elements
    Element root = editor.root();
    Assertions.assertNotNull(root);
    // Modify only what's needed
    editor.setAttribute(root, "processed", "true");
    // Save back to file (in real scenario)
    String result = editor.toXml();
    Assertions.assertNotNull(result);
} catch (Exception e) {
    System.err.println("Failed to process large file: " + e.getMessage());
}
Custom Stream Sources
// Parse from compressed streams (simulated with byte array for testing)
String xmlContent = createTestXml("compressed");
try (InputStream stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(stream);
    Assertions.assertNotNull(doc);
    // Simulated compressed XML processing
}
// Parse from database BLOB is also supported via InputStream
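Since the compressed-stream example above is simulated, here is a hedged sketch of what real GZIP handling could look like with the JDK's `java.util.zip` classes. The class `GzipStreamSketch` and its methods are hypothetical names; the demo compresses a string in memory and reads it back through a `GZIPInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of DomTrip.
final class GzipStreamSketch {

    // Compresses the given bytes with GZIP (used here only to build demo input).
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Wraps a GZIP-compressed stream so that reads return the decompressed bytes.
    static InputStream decompressing(InputStream compressed) {
        try {
            return new GZIPInputStream(compressed);
        } catch (IOException e) {
            throw new UncheckedIOException("Input is not valid GZIP data", e);
        }
    }

    // Demo: compresses the XML, then reads it back through the decompressing stream.
    static String roundTrip(String xml) {
        byte[] compressed = gzip(xml.getBytes(StandardCharsets.UTF_8));
        try (InputStream in = decompressing(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<compressed><item>content</item></compressed>"));
    }
}
```

Because the decompressing wrapper is an ordinary `InputStream`, it could be passed to `Document.of(stream)` like any other stream on this page.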
Error Handling
try {
    String xmlContent = createTestXml("root");
    try (InputStream inputStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
        Document doc = Document.of(inputStream);
        Editor editor = new Editor(doc);
        Assertions.assertNotNull(editor);
    }
} catch (Exception e) {
    // getMessage() may return null, so normalize it before inspecting the text
    String message = e.getMessage() == null ? "" : e.getMessage();
    if (message.contains("encoding")) {
        // Handle encoding-related errors
        System.err.println("Encoding issue: " + message);
    } else if (message.contains("malformed")) {
        // Handle XML syntax errors
        System.err.println("XML syntax error: " + message);
    } else {
        // Handle other parsing errors
        System.err.println("Parsing failed: " + message);
    }
}
Performance Optimization
Buffered Streams
// Use BufferedInputStream for better performance
String xmlContent = createTestXml("large");
try (InputStream buffered =
        new BufferedInputStream(new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8)), 8192)) {
    Document doc = Document.of(buffered);
    Assertions.assertNotNull(doc);
    // Buffering improves read performance
}
Memory Management
// For very large files, consider processing in sections
// Check file size first (simulated)
long fileSize = 50_000_000; // Simulated file size
if (fileSize > 100_000_000) { // 100MB
    System.out.println("Large file detected, using optimized processing");
}
// For testing, use regular content
String xmlContent = createTestXml("huge");
Document doc = Document.of(xmlContent);
// DomTrip handles memory efficiently even for large files
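The size check above uses a simulated value. A minimal sketch of performing the same check against a real file with `java.nio.file.Files.size` follows. `FileSizeSketch` and its methods are hypothetical names, and the 100 MB threshold simply mirrors the figure used in the snippet above; it is not a limit documented by DomTrip.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of DomTrip.
final class FileSizeSketch {

    // Threshold taken from the example above (100 MB); an assumption, not a DomTrip constant.
    static final long LARGE_FILE_BYTES = 100_000_000L;

    // Returns true when the path is an existing regular file larger than the threshold.
    static boolean isLargeFile(Path path, long thresholdBytes) {
        try {
            return Files.isRegularFile(path) && Files.size(path) > thresholdBytes;
        } catch (IOException e) {
            throw new UncheckedIOException("Could not determine size of " + path, e);
        }
    }

    // Demo: creates a temporary file of the given size and checks it against the threshold.
    static boolean tempFileExceeds(int sizeBytes, long thresholdBytes) {
        try {
            Path file = Files.createTempFile("sample", ".xml");
            try {
                Files.write(file, new byte[sizeBytes]);
                return isLargeFile(file, thresholdBytes);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tempFileExceeds(2048, 1024));
    }
}
```

Checking `Files.isRegularFile` first also covers the "check file existence before parsing" advice, since a missing path simply yields `false`.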
Common Use Cases
Configuration Files
// Load and read application configuration
String configXml = createConfigXml();
Document config = Document.of(configXml);
Editor editor = new Editor(config);
// Read configuration values
String dbUrl = editor.root()
        .descendant("database")
        .flatMap(db -> db.childElement("url"))
        .map(Element::textContent)
        .orElse("default-url");
Web Service Responses
// Parse XML response from web service (simulated for testing)
String responseXml = "<response><result>success</result></response>";
try (InputStream response = new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(response);
    Editor editor = new Editor(doc);
    // Process response data
    Element result = editor.root().childElement("result").orElse(null);
    Assertions.assertNotNull(result);
    // ... extract data
}
Batch Processing
// Process multiple XML files (simulated for testing)
String[] xmlContents = {createTestXml("file1"), createTestXml("file2"), createTestXml("file3")};
for (String xmlContent : xmlContents) {
    try {
        Document doc = Document.of(xmlContent);
        Editor editor = new Editor(doc);
        // Process each file
        processXmlDocument(editor);
        // Save processed result (simulated)
        String result = editor.toXml();
        Assertions.assertNotNull(result);
    } catch (Exception e) {
        System.err.println("Failed to process XML: " + e.getMessage());
    }
}
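The batch loop above iterates over in-memory strings. For a real batch job, the inputs would more likely come from a directory. The sketch below collects the `.xml` files in a directory using `Files.newDirectoryStream` with a glob; `BatchListingSketch` and its methods are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of DomTrip.
final class BatchListingSketch {

    // Returns the regular files ending in .xml directly inside the directory, sorted by path.
    static List<Path> xmlFiles(Path directory) {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(directory, "*.xml")) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    files.add(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Could not list " + directory, e);
        }
        Collections.sort(files);
        return files;
    }

    // Demo: creates a temporary directory with mixed files and lists the XML file names.
    static List<String> demoFileNames() {
        try {
            Path dir = Files.createTempDirectory("batch");
            List<Path> created = List.of(
                    dir.resolve("b.xml"), dir.resolve("a.xml"), dir.resolve("notes.txt"));
            try {
                for (Path file : created) {
                    Files.writeString(file, "<root/>");
                }
                List<String> names = new ArrayList<>();
                for (Path file : xmlFiles(dir)) {
                    names.add(file.getFileName().toString());
                }
                return names;
            } finally {
                for (Path file : created) {
                    Files.deleteIfExists(file);
                }
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoFileNames());
    }
}
```

Each returned path could then be opened as a stream and processed inside the per-file try/catch shown in the loop above, so that one bad file does not stop the batch.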
Best Practices
✅ Do:
- Always use try-with-resources for proper stream cleanup
- Let DomTrip handle encoding detection automatically
- Use buffered streams for large files
- Handle encoding exceptions gracefully
- Check file existence before parsing
❌ Avoid:
- Manually specifying encoding unless absolutely necessary
- Keeping streams open longer than needed
- Ignoring encoding-related exceptions
- Processing extremely large files without memory considerations
- Assuming all InputStreams support mark/reset
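The last point deserves a concrete illustration: code that peeks at the start of a stream (for example to sniff a BOM) needs mark/reset, which many streams do not provide. A common workaround, sketched here with JDK classes only, is to wrap such streams in a `BufferedInputStream`. `MarkResetSketch` and its methods are hypothetical names, and the sketch says nothing about how DomTrip itself reads streams.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of DomTrip.
final class MarkResetSketch {

    // Returns the stream itself if it supports mark/reset, otherwise a buffered wrapper that does.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Reads up to limit bytes and rewinds, so later reads still see the full content.
    static byte[] peek(InputStream in, int limit) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("Stream does not support mark/reset");
        }
        try {
            in.mark(limit);
            byte[] head = in.readNBytes(limit);
            in.reset();
            return head;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo: peeks at the first five bytes of a non-markable stream, then reads everything.
    static String peekThenReadAll(String xml) {
        // PushbackInputStream reports markSupported() == false, so it needs wrapping.
        InputStream raw = new PushbackInputStream(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        try (InputStream in = ensureMarkSupported(raw)) {
            String head = new String(peek(in, 5), StandardCharsets.UTF_8);
            String all = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            return head + "|" + all;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(peekThenReadAll("<root/>"));
    }
}
```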
Integration with Editor
Input stream parsing integrates seamlessly with the Editor API:
String xmlWithPIs = """
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<root>
<item>content</item>
</root>
""";
Editor editor = new Editor(Document.of(xmlWithPIs));
// Processing instructions (PIs) are automatically preserved during editing
editor.addElement(editor.root(), "newElement", "content");
// Original PIs remain in their positions with exact formatting
String result = editor.toXml();
Input stream parsing in DomTrip provides a robust, efficient way to work with XML from any source while maintaining the library's core principles of lossless processing and formatting preservation.