Input Stream Parsing

DomTrip provides robust support for parsing XML from various input sources, including InputStreams, files, and network resources, with automatic encoding detection and byte order mark (BOM) handling.

Overview

Input stream parsing allows you to process XML from:

File systems - Local and network files
Network streams - HTTP responses, web services
Memory streams - ByteArrayInputStream, in-memory data
Compressed streams - ZIP, GZIP archives
Any InputStream - Database BLOBs, custom sources

Key Features

Automatic encoding detection - UTF-8, UTF-16, ISO-8859-1, and more
BOM handling - Byte Order Mark detection and processing
Large file support - Memory-efficient streaming for big documents
Error recovery - Graceful handling of encoding issues
Resource management - Automatic stream cleanup

Basic Usage

Parsing from File

// Parse XML file with automatic encoding detection
// Path xmlFile = Path.of("config.xml");
// Document doc = Document.of(xmlFile);

// For testing, use string content
String xmlContent = createConfigXml();
Document doc = Document.of(xmlContent);

Editor editor = new Editor(doc);
// File encoding is automatically detected and preserved

Parsing from InputStream

// Parse from any InputStream
String xmlContent = createTestXml("data");
try (InputStream inputStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(inputStream);
    Editor editor = new Editor(doc);

    // Process the document
    Element root = editor.root();
    // ... edit operations
}

Parsing from URL/Network

// Parse XML from network resource
// URL xmlUrl = URI.create("https://example.com/api/data.xml").toURL(); // new URL(String) is deprecated since Java 20
// try (InputStream stream = xmlUrl.openStream()) {
//     Document doc = Document.of(stream);
//     Editor editor = new Editor(doc);
//
//     // Network XML is parsed with encoding detection
// }

// For testing, simulate network content
String xmlContent = createTestXml("networkData");
try (InputStream stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(stream);
    Editor editor = new Editor(doc);

    // Network XML is parsed with encoding detection
}

Encoding Detection

Automatic Detection

DomTrip automatically detects the encoding from several sources:

// Encoding detected from:
// 1. Byte Order Mark (BOM)
// 2. XML declaration
// 3. Content analysis
// 4. Default fallback (UTF-8)

String xmlContent = createTestXml("root");
try (InputStream inputStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(inputStream);
    String detectedEncoding = doc.encoding(); // "UTF-8", "UTF-16", etc.
}
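The detection order above can be approximated with a few byte comparisons. The sketch below is a hypothetical `EncodingSniffer` that uses only the JDK and is not DomTrip's implementation. It checks for a BOM, then for the two-byte pattern of `<` in UTF-16 without a BOM (standing in for "content analysis"), then for an `encoding` pseudo-attribute in an ASCII-compatible XML declaration, and finally falls back to UTF-8. The UTF-16 pattern check runs before the declaration is read because the declaration can only be decoded once the byte width is known, loosely following Appendix F of the XML 1.0 specification; rarer cases such as UTF-32 and EBCDIC are ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not DomTrip's implementation: guesses an XML encoding
// from the first bytes of a document.
final class EncodingSniffer {

    // encoding="..." or encoding='...' inside a leading XML declaration
    private static final Pattern ENCODING_DECL = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static String sniff(byte[] head) {
        // Byte Order Mark
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        // Content analysis: '<' encoded as UTF-16 without a BOM
        if (startsWith(head, 0x00, 0x3C)) return "UTF-16BE";
        if (startsWith(head, 0x3C, 0x00)) return "UTF-16LE";
        // XML declaration in an ASCII-compatible encoding
        // (ISO-8859-1 maps every byte to a char, so this decoding never fails)
        Matcher m = ENCODING_DECL.matcher(new String(head, StandardCharsets.ISO_8859_1));
        if (m.find()) return m.group(1).toUpperCase(Locale.ROOT);
        // Default fallback
        return "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><a/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sniff(latin1));                                   // ISO-8859-1
        System.out.println(sniff("<a/>".getBytes(StandardCharsets.UTF_16))); // UTF-16BE (BOM)
        System.out.println(sniff("<a/>".getBytes(StandardCharsets.UTF_8)));  // UTF-8 (fallback)
    }
}
```

A real detector would also reconcile a BOM with a conflicting declaration; this sketch simply trusts the BOM.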

Supported Encodings

// UTF-8 (most common)
String xmlContent = createTestXml("utf8");
try (InputStream utf8Stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document utf8Doc = Document.of(utf8Stream);
}

// UTF-16 with BOM (Java's UTF_16 encoder writes a big-endian BOM, FE FF)
try (InputStream utf16Stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_16))) {
    Document utf16Doc = Document.of(utf16Stream);
}

// ISO-8859-1 (Latin-1)
try (InputStream isoStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.ISO_8859_1))) {
    Document isoDoc = Document.of(isoStream);
}

// All Java-supported encodings work
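The examples above rely on what Java's encoders actually write. This small JDK-only sketch (the `CharsetBytes` class name is illustrative) prints the leading bytes of the same text in each charset: `StandardCharsets.UTF_16` emits a big-endian BOM (`FE FF`), `UTF_8` writes no BOM at all, and `ISO_8859_1` uses exactly one byte per character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper: shows the raw bytes Java's encoders produce for a string.
final class CharsetBytes {

    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // '<', U+00E9 (e with acute accent), '/', '>' written as an escape
        // so the source file's own encoding does not matter
        String sample = "<\u00E9/>";
        System.out.println("UTF-8:      " + hex(sample, StandardCharsets.UTF_8));      // 3C C3 A9 2F 3E
        System.out.println("UTF-16:     " + hex(sample, StandardCharsets.UTF_16));     // FE FF 00 3C 00 E9 00 2F 00 3E
        System.out.println("ISO-8859-1: " + hex(sample, StandardCharsets.ISO_8859_1)); // 3C E9 2F 3E
    }
}
```

Because the same string is re-encoded three ways, keep any `encoding` named in the XML declaration consistent with the bytes actually produced: XML 1.0 treats a document presented in an encoding other than the one its declaration names as an error when no external encoding information is available.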

BOM Handling

// BOM is automatically detected and handled
byte[] utf8Bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
String xmlContent = createTestXml("root");
byte[] xmlBytes = xmlContent.getBytes(StandardCharsets.UTF_8);

// Prepend the UTF-8 BOM to the encoded XML
byte[] combined = new byte[utf8Bom.length + xmlBytes.length];
System.arraycopy(utf8Bom, 0, combined, 0, utf8Bom.length);
System.arraycopy(xmlBytes, 0, combined, utf8Bom.length, xmlBytes.length);

try (InputStream stream = new ByteArrayInputStream(combined)) {
    Document doc = Document.of(stream);
    // BOM is processed transparently, encoding correctly detected
}
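Handling the BOM matters because Java's UTF-8 decoder does not strip it: `new String(bytes, StandardCharsets.UTF_8)` keeps it as a leading U+FEFF character. A common JDK pattern is to read the first three bytes and push them back if they are not a BOM. The sketch below uses `PushbackInputStream` for this; `BomSkipper` and its methods are hypothetical names, and this is not necessarily how DomTrip handles the BOM internally. It only covers the UTF-8 BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not DomTrip's API: skips a leading UTF-8 BOM.
final class BomSkipper {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns a stream positioned after the BOM if present, otherwise at byte 0.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int read = pushback.readNBytes(head, 0, head.length);
        boolean isBom = read == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!isBom && read > 0) {
            pushback.unread(head, 0, read); // not a BOM: put the bytes back
        }
        return pushback;
    }

    // Convenience wrapper: decodes bytes as UTF-8 after removing any BOM.
    static String readAsUtf8(byte[] bytes) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(bytes))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<root/>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[UTF8_BOM.length + xml.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(xml, 0, withBom, UTF8_BOM.length, xml.length);
        System.out.println(readAsUtf8(withBom)); // <root/>
    }
}
```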

Advanced Features

Large File Processing

// Memory-efficient processing of large XML files
// Path largeXmlFile = Path.of("large-dataset.xml");

try {
    // For testing, use regular content
    String xmlContent = createTestXml("dataset");
    Document doc = Document.of(xmlContent);
    Editor editor = new Editor(doc);

    // Process in chunks or specific elements
    Element root = editor.root();

    // Modify only what's needed
    editor.setAttribute(root, "processed", "true");

    // Serialize the result; in a real scenario, write it back to disk,
    // e.g. Files.writeString(largeXmlFile, result)
    String result = editor.toXml();

} catch (Exception e) {
    System.err.println("Failed to process large file: " + e.getMessage());
}

Custom Stream Sources

// Parse from compressed streams
// try (InputStream gzipStream = new GZIPInputStream(
//         new FileInputStream("data.xml.gz"))) {
//     Document doc = Document.of(gzipStream);
//     // GZIPInputStream decompresses the bytes; DomTrip parses the decompressed XML
// }

// For testing, simulate compressed content
String xmlContent = createTestXml("compressed");
try (InputStream stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(stream);
    // Simulated compressed XML processing
}

// Parse from database BLOB (simulated)
// try (InputStream blobStream = resultSet.getBinaryStream("xml_data")) {
//     Document doc = Document.of(blobStream);
//     // Database XML content is parsed with encoding detection
// }
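Decompression is done by `java.util.zip.GZIPInputStream`, not by the parser: `Document.of(InputStream)` only ever sees the decompressed bytes. The JDK-only sketch below (the `GzipXml` class is hypothetical) round-trips XML through gzip in memory, so the compressed case can be exercised without a `.gz` file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip round trip for testing compressed XML input.
final class GzipXml {

    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // read only after the stream is closed
    }

    static byte[] gunzip(byte[] compressed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<compressed><item>1</item></compressed>".getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(xml);
        System.out.println(new String(gunzip(packed), StandardCharsets.UTF_8));
    }
}
```

In real code, the `GZIPInputStream` opened over the file would be passed straight to `Document.of`, as in the commented example above, rather than being read fully into a byte array first.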

Error Handling

try {
    String xmlContent = createTestXml("root");
    try (InputStream inputStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
        Document doc = Document.of(inputStream);
        Editor editor = new Editor(doc);
    }

} catch (Exception e) {
    // getMessage() can be null, and message text is not a stable contract,
    // so treat this classification as best effort
    String message = e.getMessage() == null ? "" : e.getMessage().toLowerCase(Locale.ROOT);
    if (message.contains("encoding")) {
        // Handle encoding-related errors
        System.err.println("Encoding issue: " + e.getMessage());
    } else if (message.contains("malformed")) {
        // Handle XML syntax errors
        System.err.println("XML syntax error: " + e.getMessage());
    } else {
        // Handle other parsing errors
        System.err.println("Parsing failed: " + e.getMessage());
    }
}
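Matching on message text is fragile. Where invalid bytes are a real risk, one option is to validate the input before parsing. This JDK-only sketch (the `Utf8Check` class is hypothetical and independent of DomTrip) decodes with a strict `CharsetDecoder`, which reports bad input as a typed `CharacterCodingException`. By contrast, `new String(bytes, StandardCharsets.UTF_8)` silently replaces invalid sequences with U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical pre-check, independent of DomTrip: validates UTF-8 with a strict decoder.
final class Utf8Check {

    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of inserting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // MalformedInputException (a subclass) signals an invalid byte sequence
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "<a>\u00E9</a>".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "<a>\u00E9</a>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false: 0xE9 is not followed by continuation bytes
    }
}
```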

Performance Optimization

Buffered Streams

// Wrap unbuffered sources (FileInputStream, socket streams) in a BufferedInputStream
String xmlContent = createTestXml("large");
try (InputStream buffered =
        new BufferedInputStream(new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8)), 8192)) {
    Document doc = Document.of(buffered);
    // Buffering helps when many small reads hit a slow source;
    // an in-memory ByteArrayInputStream like this one gains little from it
}
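`BufferedInputStream` also adds mark/reset support to any stream, which is why the best practices on this page warn against assuming every `InputStream` has it. With mark/reset available, code can peek at the first bytes (for example, to look for a BOM) and then hand the untouched stream onward. The sketch below shows the general JDK pattern, not what DomTrip does internally; `PeekingStreams` is a hypothetical name, and a `SequenceInputStream` stands in for a non-markable network stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers: make any InputStream peekable without consuming bytes.
final class PeekingStreams {

    // Reuse the stream if it already supports mark/reset; otherwise buffer it.
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in, 8192);
    }

    // Reads up to n leading bytes, then rewinds so later readers see them again.
    static byte[] peek(InputStream markable, int n) {
        if (!markable.markSupported()) {
            throw new IllegalArgumentException("stream does not support mark/reset");
        }
        try {
            markable.mark(n);
            byte[] head = markable.readNBytes(n);
            markable.reset();
            return head;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "<?xml version=\"1.0\"?><large/>".getBytes(StandardCharsets.UTF_8);
        // SequenceInputStream does not support mark/reset
        InputStream raw = new SequenceInputStream(
                new ByteArrayInputStream(data), new ByteArrayInputStream(new byte[0]));
        try (InputStream in = markable(raw)) {
            System.out.println(new String(peek(in, 5), StandardCharsets.US_ASCII)); // <?xml
            System.out.println(in.readAllBytes().length == data.length);           // true
        }
    }
}
```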

Memory Management

// For very large files, consider processing in sections
// Path hugeFile = Path.of("huge-dataset.xml");

// Check file size first (simulated; real code would use Files.size(hugeFile))
long fileSize = 50_000_000; // Simulated file size: 50 MB
if (fileSize > 100_000_000) { // 100 MB threshold
    System.out.println("Large file detected, using optimized processing");
}

// For testing, use regular content
String xmlContent = createTestXml("huge");
Document doc = Document.of(xmlContent);
// DomTrip handles memory efficiently even for large files
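The size check above is simulated. With a real file, `java.nio.file.Files.size` returns the length in bytes from file metadata without reading the content, so it is cheap to call before deciding how to parse. The `SizeCheck` class and its 100 MB threshold below are illustrative assumptions, not part of DomTrip.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: decides up front whether a file needs special handling.
final class SizeCheck {
    // Illustrative threshold; choose one that fits the heap available (-Xmx)
    static final long LARGE_FILE_BYTES = 100_000_000L; // 100 MB

    static boolean isLarge(Path file, long thresholdBytes) {
        try {
            return Files.size(file) > thresholdBytes; // metadata lookup, no content read
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("huge-dataset", ".xml");
        try {
            Files.writeString(tmp, "<huge/>", StandardCharsets.UTF_8);
            System.out.println(isLarge(tmp, LARGE_FILE_BYTES)); // false (7 bytes)
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```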

Common Use Cases

Configuration Files

// Load application configuration
// Path configPath = Path.of("app-config.xml");
// if (Files.exists(configPath)) {
//     Document config = Document.of(configPath);
//     Editor editor = new Editor(config);
//
//     // Read configuration values
//     String dbUrl = editor.root()
//         .descendant("database")
//         .flatMap(db -> db.child("url"))
//         .map(Element::textContent)
//         .orElse("default-url");
// }

// For testing, use simulated config
String configXml = createConfigXml();
Document config = Document.of(configXml);
Editor editor = new Editor(config);

// Read configuration values
String dbUrl = editor.root()
        .descendant("database")
        .flatMap(db -> db.child("url"))
        .map(Element::textContent)
        .orElse("default-url");

Web Service Responses

// Parse XML response from web service
// HttpURLConnection connection = (HttpURLConnection) url.openConnection();
// try (InputStream response = connection.getInputStream()) {
//     Document doc = Document.of(response);
//     Editor editor = new Editor(doc);
//
//     // Process response data
//     Element result = editor.root().child("result").orElse(null);
//     // ... extract data
// }

// For testing, simulate web service response
String responseXml = "<response><result>success</result></response>";
try (InputStream response = new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(response);
    Editor editor = new Editor(doc);

    // Process response data
    Element result = editor.root().child("result").orElse(null);
    // ... extract data
}

Batch Processing

// Process multiple XML files
// try (Stream<Path> entries = Files.list(Path.of("xml-data"))) { // Files.list must be closed
//     List<Path> xmlFiles = entries
//         .filter(path -> path.toString().endsWith(".xml"))
//         .collect(Collectors.toList());
// }

// For testing, simulate multiple files
String[] xmlContents = {createTestXml("file1"), createTestXml("file2"), createTestXml("file3")};

for (String xmlContent : xmlContents) {
    try {
        Document doc = Document.of(xmlContent);
        Editor editor = new Editor(doc);

        // Process each file
        processXmlDocument(editor);

        // Save processed result (simulated)
        String result = editor.toXml();

    } catch (Exception e) {
        System.err.println("Failed to process XML: " + e.getMessage());
    }
}
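`Files.list` returns a lazily populated stream that holds an open directory handle until it is closed, so it belongs in try-with-resources. The sketch below (the `XmlFileLister` class is hypothetical) lists the regular `.xml` files in a directory, sorted by name so batch runs are repeatable; each returned `Path` could then be handed to `Document.of(Path)` as in the file-parsing example earlier.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects the XML files of one directory for batch processing.
final class XmlFileLister {

    // Regular files directly inside dir whose names end in ".xml", sorted by path.
    static List<Path> listXmlFiles(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) { // closes the directory handle
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".xml"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("xml-data");
        Files.writeString(dir.resolve("file2.xml"), "<file2/>");
        Files.writeString(dir.resolve("file1.xml"), "<file1/>");
        Files.writeString(dir.resolve("readme.txt"), "not xml");
        for (Path xml : listXmlFiles(dir)) {
            System.out.println(xml.getFileName()); // file1.xml, then file2.xml
        }
    }
}
```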

Best Practices

✅ Do:

Always use try-with-resources for proper stream cleanup
Let DomTrip handle encoding detection automatically
Use buffered streams for large files
Handle encoding exceptions gracefully
Check file existence before parsing

❌ Avoid:

Manually specifying encoding unless absolutely necessary
Keeping streams open longer than needed
Ignoring encoding-related exceptions
Processing extremely large files without memory considerations
Assuming all InputStreams support mark/reset

Integration with Editor

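Input stream parsing integrates seamlessly with the Editor API: a document parsed from a stream is edited the same way as any other document.

// Parse a document from a stream, then edit it
String configXml = "<config/>";
try (InputStream stream = new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(stream);
    Editor editor = new Editor(doc);

    // Editor operations modify the document
    editor.addElement(editor.root(), "setting", "value");

    // Document reflects changes
    Element setting = doc.root().child("setting").orElse(null);
}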

Input stream parsing in DomTrip provides a robust, efficient way to work with XML from any source while maintaining the library's core principles of lossless processing and formatting preservation.