Input Stream Parsing

DomTrip can parse XML from any source that can be exposed as an InputStream, including files, network resources, and in-memory data, with automatic encoding detection and byte order mark (BOM) handling.

Overview

Input stream parsing allows you to process XML from:

  • File systems - Local and network files
  • Network streams - HTTP responses, web services
  • Memory streams - ByteArrayInputStream, in-memory data
  • Compressed streams - ZIP, GZIP archives
  • Any InputStream - Database BLOBs, custom sources

Key Features

  • Automatic encoding detection - UTF-8, UTF-16, ISO-8859-1, and more
  • BOM handling - Byte Order Mark detection and processing
  • Large file support - Parsing of big documents from buffered streams
  • Error recovery - Graceful handling of encoding issues
  • Resource management - Automatic stream cleanup

Basic Usage

Parsing from File

// Parse an XML file with automatic encoding detection
Path configFile = Path.of("config.xml"); // example path
try (InputStream in = Files.newInputStream(configFile)) {
    Document doc = Document.of(in);
    Editor editor = new Editor(doc);
    // The file's encoding is detected automatically and preserved
}

Parsing from InputStream

// Parse from any InputStream
String xmlContent = "<data><item>value</item></data>";
try (InputStream inputStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(inputStream);
    Editor editor = new Editor(doc);

    // Process the document
    Element root = editor.root();
    // ... edit operations
}

Parsing from URL/Network

// Parse XML from a network resource (example URL)
URL url = URI.create("https://example.com/data.xml").toURL();
try (InputStream stream = url.openStream()) {
    Document doc = Document.of(stream);
    Editor editor = new Editor(doc);

    // Network XML goes through the same encoding detection
}

Encoding Detection

Automatic Detection

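DomTrip detects the encoding automatically, using signals such as a byte order mark (BOM) at the start of the stream or the encoding attribute of the XML declaration: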

// Parse with automatic encoding detection
String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>content</root>";
try (InputStream inputStream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(inputStream);

    // The detected encoding is recorded on the document
    String detectedEncoding = doc.encoding(); // e.g., "UTF-8"
}
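The XML declaration is one place an encoding can come from. As an illustration only, not DomTrip's implementation, the sketch below uses a hypothetical DeclaredEncoding helper to pull the encoding pseudo-attribute out of the first bytes of an ASCII-compatible document (UTF-8, ISO-8859-1 and similar). It assumes any BOM has already been stripped and does not handle UTF-16 input, which has to be recognized from its BOM or byte pattern first.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads the encoding declared in an XML prolog. */
final class DeclaredEncoding {
    // <?xml ... encoding="NAME" ... ?> with single or double quotes
    private static final Pattern DECLARATION = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    private DeclaredEncoding() {}

    /** Returns the declared encoding, or empty if the prolog does not declare one. */
    static Optional<String> of(byte[] prefix) {
        // ISO-8859-1 maps every byte to exactly one char, so this decode never
        // fails, and the ASCII declaration comes through unchanged
        int length = Math.min(prefix.length, 200);
        String head = new String(prefix, 0, length, StandardCharsets.ISO_8859_1);
        Matcher matcher = DECLARATION.matcher(head);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

The regular expression stops at the first ">", so an encoding attribute on a later element is never mistaken for the declared encoding.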

Supported Encodings

String xmlContent = "<root>content</root>";

// UTF-8 (most common)
try (InputStream utf8Stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document utf8Doc = Document.of(utf8Stream);
}

// UTF-16 with BOM (Java's UTF_16 encoder writes a big-endian BOM)
try (InputStream utf16Stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_16))) {
    Document utf16Doc = Document.of(utf16Stream);
}

// ISO-8859-1 (Latin-1)
try (InputStream isoStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.ISO_8859_1))) {
    Document isoDoc = Document.of(isoStream);
}

// Any other charset supported by the JVM works the same way
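These inputs can be told apart because the opening "<?xml" bytes look different in each encoding family. The JDK-only sketch below (Java 17+ for HexFormat; XmlSignatures is a hypothetical name) prints those leading bytes, and the UTF-16 line confirms that Java's UTF_16 encoder prefixes a big-endian BOM (FE FF).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

/** Prints how the characters "<?xml" are encoded in several charsets. */
final class XmlSignatures {
    private XmlSignatures() {}

    /** Hex string of the first n bytes of text encoded with the given charset. */
    static String leadingHex(String text, Charset charset, int n) {
        byte[] bytes = text.getBytes(charset);
        return HexFormat.of().formatHex(bytes, 0, Math.min(n, bytes.length));
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_16, StandardCharsets.UTF_16LE
        };
        for (Charset charset : charsets) {
            System.out.println(charset.name() + ": " + leadingHex("<?xml", charset, 6));
        }
    }
}
```

UTF-8 and ISO-8859-1 both start with 3c3f786d6c, UTF-16 starts with feff003c003f (BOM first), and UTF-16LE starts with 3c003f007800. This is the kind of pattern the XML specification's encoding autodetection relies on.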

BOM Handling

// UTF-8 with BOM
String xmlString = "<root>content</root>";
byte[] bomBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
byte[] xmlBytes = xmlString.getBytes(StandardCharsets.UTF_8);
byte[] xmlWithBom = new byte[bomBytes.length + xmlBytes.length];
System.arraycopy(bomBytes, 0, xmlWithBom, 0, bomBytes.length);
System.arraycopy(xmlBytes, 0, xmlWithBom, bomBytes.length, xmlBytes.length);

try (InputStream inputStream = new ByteArrayInputStream(xmlWithBom)) {
    Document doc = Document.of(inputStream);
    // BOM is detected and UTF-8 encoding is used
}
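Detecting a BOM means looking at the first few bytes without losing them. A minimal JDK-only sketch of that technique follows; it uses PushbackInputStream so it works even on streams that don't support mark/reset. BomSniffer and Sniffed are hypothetical names, and this is not DomTrip's code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper: detects a byte order mark and skips over it. */
final class BomSniffer {
    private BomSniffer() {}

    /** The charset implied by the BOM (if any) and a stream positioned just after it. */
    record Sniffed(Optional<Charset> charset, InputStream stream) {}

    static Sniffed sniff(InputStream in) throws IOException {
        // PushbackInputStream lets us "un-read" bytes without needing mark/reset
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = pushback.readNBytes(head, 0, head.length);

        Charset charset = null;
        int bomLength = 0;
        if (read >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (read >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (read >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            // Note: a UTF-32LE BOM (FF FE 00 00) would also match here; not handled
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }

        // Return any non-BOM bytes to the stream so the parser still sees them
        if (read > bomLength) {
            pushback.unread(head, bomLength, read - bomLength);
        }
        return new Sniffed(Optional.ofNullable(charset), pushback);
    }
}
```

A parser would then decode the returned stream with the BOM's charset, and fall back to the XML declaration or UTF-8 when no BOM is found.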

Advanced Features

Large File Processing

// Large XML files: parse once, then change only what's needed
Path dataFile = Path.of("dataset.xml"); // example path
try {
    Document doc;
    try (InputStream in = new BufferedInputStream(Files.newInputStream(dataFile))) {
        doc = Document.of(in);
    }
    Editor editor = new Editor(doc);

    // Navigate directly to the elements you need
    Element root = editor.root();

    // Modify only what's needed
    editor.setAttribute(root, "processed", "true");

    // Save back in the document's detected encoding
    String result = editor.toXml();
    Files.writeString(dataFile, result, Charset.forName(doc.encoding()));

} catch (Exception e) {
    System.err.println("Failed to process large file: " + e.getMessage());
}

Custom Stream Sources

// Parse from a compressed stream
Path archive = Path.of("data.xml.gz"); // example path
try (InputStream stream = new GZIPInputStream(Files.newInputStream(archive))) {
    Document doc = Document.of(stream);
    // The XML is decompressed on the fly as it is read
}

// Database BLOBs work the same way: pass blob.getBinaryStream() to Document.of
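Because Document.of accepts any InputStream, decompression is just another wrapper. The JDK-only round trip below (GzipXml is a hypothetical name) shows that reading through GZIPInputStream returns the original XML bytes; in application code you would pass a GZIPInputStream opened over the file straight to Document.of.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper showing a GZIP round trip with java.util.zip. */
final class GzipXml {
    private GzipXml() {}

    /** Compresses bytes the way a .xml.gz file stores them. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the GZIP trailer
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads through GZIPInputStream, the stream you would hand to a parser. */
    static byte[] gunzip(byte[] compressed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```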

Error Handling

try {
    String xmlContent = "<root/>";
    try (InputStream inputStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
        Document doc = Document.of(inputStream);
        Editor editor = new Editor(doc);
        // ... work with the document
    }

} catch (Exception e) {
    // getMessage() can be null and its wording can change between versions,
    // so these message checks are only a rough heuristic
    String message = Objects.toString(e.getMessage(), "").toLowerCase(Locale.ROOT);
    if (message.contains("encoding")) {
        // Handle encoding-related errors
        System.err.println("Encoding issue: " + e.getMessage());
    } else if (message.contains("malformed")) {
        // Handle XML syntax errors
        System.err.println("XML syntax error: " + e.getMessage());
    } else {
        // Handle other parsing errors
        System.err.println("Parsing failed: " + e.getMessage());
    }
}
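Matching on exception text is fragile. One JDK-only way to catch encoding problems up front is a strict CharsetDecoder that reports malformed input instead of silently substituting U+FFFD. This is a sketch of a pre-check you might run yourself, not something DomTrip is documented to do; EncodingCheck is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

/** Hypothetical helper: checks that bytes are valid in a given charset. */
final class EncodingCheck {
    private EncodingCheck() {}

    /** Returns true if the whole byte array decodes cleanly with the charset. */
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // MalformedInputException or UnmappableCharacterException
            return false;
        }
    }
}
```

For example, Latin-1 bytes for "é" (0xE9) followed by "<" are not valid UTF-8, so a file mislabeled as UTF-8 is caught before parsing.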

Performance Optimization

Buffered Streams

// Wrap unbuffered sources, such as file or socket streams, in a BufferedInputStream
Path largeFile = Path.of("large.xml"); // example path
try (InputStream buffered = new BufferedInputStream(Files.newInputStream(largeFile), 8192)) {
    Document doc = Document.of(buffered);
    // Buffering cuts down on small reads against the underlying source
}

Memory Management

// For very large files, consider processing in sections

// Check the file size first
Path xmlFile = Path.of("huge.xml"); // example path
long fileSize = Files.size(xmlFile);
if (fileSize > 100_000_000) { // 100 MB
    System.out.println("Large file detected; make sure the JVM heap (-Xmx) is large enough");
}

try (InputStream in = new BufferedInputStream(Files.newInputStream(xmlFile))) {
    Document doc = Document.of(in);
    // The parsed document is held in memory, so heap use grows with file size
}
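When a file is too large to hold in memory at once, one option outside DomTrip is a streaming pass with the JDK's StAX API (javax.xml.stream), which reads one event at a time. This is a swapped-in technique, not part of DomTrip, and it is not lossless, so it suits read-only scans rather than formatting-preserving edits. The sketch only counts elements with a given name; StreamingCount is a hypothetical helper.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: counts elements without building a document tree. */
final class StreamingCount {
    private StreamingCount() {}

    static int count(InputStream in, String localName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Disable DTD processing, which also blocks external entity (XXE) expansion
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    count++;
                }
            }
            return count;
        } finally {
            reader.close(); // frees the reader; the caller still closes the InputStream
        }
    }
}
```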

Common Use Cases

Configuration Files

// Load and read application configuration
String configXml = createConfigXml();
Document config = Document.of(configXml);
Editor editor = new Editor(config);

// Read configuration values
String dbUrl = editor.root()
        .descendant("database")
        .flatMap(db -> db.childElement("url"))
        .map(Element::textContent)
        .orElse("default-url");

Web Service Responses

// Parse an XML response from a web service (inlined as a string here)
String responseXml = "<response><result>success</result></response>";
try (InputStream response = new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(response);
    Editor editor = new Editor(doc);

    // Extract response data
    String status = editor.root()
            .childElement("result")
            .map(Element::textContent)
            .orElse("unknown");
}
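For real HTTP calls, the JDK's java.net.http.HttpClient (Java 11+) can return the response body as an InputStream, which is what Document.of accepts. A minimal sketch, assuming you supply the endpoint URI; XmlHttp is a hypothetical helper, and the request builder is separated out so it can be inspected without network access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper: fetches an XML response body as a stream. */
final class XmlHttp {
    private XmlHttp() {}

    /** Builds a GET request that asks the server for XML. */
    static HttpRequest xmlRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/xml")
                .GET()
                .build();
    }

    /** Sends the request and returns the body; the caller must close it. */
    static InputStream fetch(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpResponse<InputStream> response =
                client.send(xmlRequest(uri), HttpResponse.BodyHandlers.ofInputStream());
        if (response.statusCode() / 100 != 2) {
            response.body().close(); // release the connection before failing
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

To use it, open the stream returned by fetch in a try-with-resources statement and pass it to Document.of, exactly as with any other InputStream.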

Batch Processing

// Process multiple XML files (simulated for testing)
String[] xmlContents = {createTestXml("file1"), createTestXml("file2"), createTestXml("file3")};

for (String xmlContent : xmlContents) {
    try {
        Document doc = Document.of(xmlContent);
        Editor editor = new Editor(doc);

        // Process each file
        processXmlDocument(editor);

        // Save processed result (simulated)
        String result = editor.toXml();
        Assertions.assertNotNull(result);

    } catch (Exception e) {
        System.err.println("Failed to process XML: " + e.getMessage());
    }
}
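With real files, a batch job usually walks a directory. The sketch below uses Files.newDirectoryStream with a *.xml glob and collects failures per file instead of printing them, so one bad file does not stop the batch. The XmlProcessor interface and XmlBatch class are hypothetical; the hook stands in for your per-file step, for example calling Document.of on the stream and editing the result.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical batch runner over the *.xml files in one directory. */
final class XmlBatch {
    private XmlBatch() {}

    /** Per-file work, e.g. parse with Document.of(in) and edit (hypothetical hook). */
    @FunctionalInterface
    interface XmlProcessor {
        void process(Path file, InputStream in) throws Exception;
    }

    /** Processes every *.xml file in dir; returns failure messages keyed by file name. */
    static Map<String, String> run(Path dir, XmlProcessor processor) throws IOException {
        Map<String, String> failures = new LinkedHashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path file : files) {
                // Each file gets its own stream, closed even if processing fails
                try (InputStream in = Files.newInputStream(file)) {
                    processor.process(file, in);
                } catch (Exception e) {
                    failures.put(file.getFileName().toString(), String.valueOf(e.getMessage()));
                }
            }
        }
        return failures;
    }
}
```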

Best Practices

Do:

  • Always use try-with-resources for proper stream cleanup
  • Let DomTrip handle encoding detection automatically
  • Use buffered streams for large files
  • Handle encoding exceptions gracefully
  • Check file existence before parsing

Avoid:

  • Manually specifying encoding unless absolutely necessary
  • Keeping streams open longer than needed
  • Ignoring encoding-related exceptions
  • Processing extremely large files without memory considerations
  • Assuming all InputStreams support mark/reset
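The last point matters whenever you inspect the start of a stream before handing it to a parser. Streams such as those from sockets or Files.newInputStream may not support mark/reset; wrapping them in BufferedInputStream, which does, makes peeking safe. A minimal sketch; Peek is a hypothetical helper.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for peeking at the start of a stream. */
final class Peek {
    private Peek() {}

    /** Ensures mark/reset is available, wrapping only when needed. */
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Reads up to n bytes and rewinds, so the stream can still be parsed from the start. */
    static byte[] peek(InputStream markable, int n) throws IOException {
        if (!markable.markSupported()) {
            throw new IllegalArgumentException("stream does not support mark/reset");
        }
        markable.mark(n);
        byte[] head = markable.readNBytes(n);
        markable.reset();
        return head;
    }
}
```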

Integration with Editor

Documents parsed from a stream are edited through the same Editor API, and processing instructions such as xml-stylesheet are preserved:

String xmlWithPIs = """
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/css" href="style.css"?>
    <root>
        <item>content</item>
    </root>
    """;
try (InputStream in = new ByteArrayInputStream(xmlWithPIs.getBytes(StandardCharsets.UTF_8))) {
    Editor editor = new Editor(Document.of(in));

    // PIs are automatically preserved during editing
    editor.addElement(editor.root(), "newElement", "content");

    // Original PIs remain in their positions with exact formatting
    String result = editor.toXml();
}

Input stream parsing in DomTrip provides a robust, efficient way to work with XML from any source while maintaining the library's core principles of lossless processing and formatting preservation.