Input Stream Parsing

DomTrip parses XML from InputStreams, files, and network resources, with automatic encoding detection and byte order mark (BOM) handling.

Overview

Input stream parsing allows you to process XML from:

  • File systems - Local and network files
  • Network streams - HTTP responses, web services
  • Memory streams - ByteArrayInputStream, in-memory data
  • Compressed streams - ZIP and GZIP archives (wrapped in ZipInputStream or GZIPInputStream)
  • Any InputStream - Database BLOBs, custom sources

Key Features

  • Automatic encoding detection - UTF-8, UTF-16, ISO-8859-1, and more
  • BOM handling - Byte Order Mark detection and processing
  • Large file support - Memory-efficient streaming for big documents
  • Error recovery - Graceful handling of encoding issues
  • Resource management - Automatic stream cleanup

Basic Usage

Parsing from File

// Parse XML file with automatic encoding detection
// Path xmlFile = Path.of("config.xml");
// Document doc = Document.of(xmlFile);

// For testing, use string content (createConfigXml() is a helper that returns sample XML)
String xmlContent = createConfigXml();
Document doc = Document.of(xmlContent);

Editor editor = new Editor(doc);
// When parsing from a Path, the file's encoding is detected automatically and preserved
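When you parse a String, the bytes have already been decoded before DomTrip sees them, so the encoding choice was made by whoever built the String. The JDK-only sketch below shows why handing the parser a Path or InputStream is safer. Files.readString always decodes as UTF-8 and rejects a Latin-1 file, while the raw bytes stay intact for a parser that honors the XML declaration. DecodeYourselfPitfall and its methods are illustrative names, not DomTrip API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DecodeYourselfPitfall {

    /** Writes a Latin-1 encoded XML file to a temp location (illustrative helper). */
    static Path writeLatin1File() {
        try {
            Path file = Files.createTempFile("latin1-", ".xml");
            file.toFile().deleteOnExit();
            String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>caf\u00e9</root>";
            Files.write(file, xml.getBytes(StandardCharsets.ISO_8859_1));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if decoding the file with Files.readString (always UTF-8) fails. */
    static boolean utf8DecodeFails(Path file) {
        try {
            Files.readString(file); // ignores the XML declaration entirely
            return false;
        } catch (CharacterCodingException e) {
            return true; // byte 0xE9 ('é' in Latin-1) is not valid UTF-8 here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The raw bytes are intact; decoding them as declared recovers the text. */
    static String decodeAsDeclared(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In practice this means preferring Document.of(path) or Document.of(inputStream) over reading the file into a String yourself.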

Parsing from InputStream

// Parse from any InputStream
String xmlContent = createTestXml("data");
try (InputStream inputStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(inputStream);
    Editor editor = new Editor(doc);

    // Process the document
    Element root = editor.root();
    // ... edit operations
}

Parsing from URL/Network

// Parse XML from network resource
// URL xmlUrl = URI.create("https://example.com/api/data.xml").toURL(); // the URL(String) constructor is deprecated since Java 20
// try (InputStream stream = xmlUrl.openStream()) {
//     Document doc = Document.of(stream);
//     Editor editor = new Editor(doc);
//
//     // Network XML is parsed with encoding detection
// }

// For testing, simulate network content
String xmlContent = createTestXml("networkData");
try (InputStream stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(stream);
    Editor editor = new Editor(doc);

    // Network XML is parsed with encoding detection
}

Encoding Detection

Automatic Detection

DomTrip detects the encoding automatically from the byte order mark (BOM), when present, and from the encoding attribute of the XML declaration:

// Parse with automatic encoding detection
String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>content</root>";
try (InputStream inputStream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(inputStream);

    // Encoding is automatically detected and recorded on the document
    String detectedEncoding = doc.encoding(); // e.g., "UTF-8"
}
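For readers curious how this kind of detection works in general, here is a simplified JDK-only sketch of the approach described in Appendix F of the XML 1.0 specification. It checks for a BOM first, otherwise reads the encoding attribute from the ASCII-compatible prolog, and falls back to UTF-8. It illustrates the technique and is not DomTrip's implementation. EncodingSniffer is a hypothetical name, and the sketch deliberately ignores UTF-32 and BOM-less UTF-16.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodingSniffer {

    // Matches encoding="..." or encoding='...' inside a leading XML declaration.
    private static final Pattern DECL_ENCODING =
            Pattern.compile("^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Guesses the charset name of an XML document from its first bytes. */
    static String sniff(byte[] head) {
        // 1. A byte order mark decides the encoding.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";

        // 2. No BOM: the declaration is plain ASCII in UTF-8 and Latin-1 documents,
        //    so decoding a short prefix as ISO-8859-1 is safe for finding it.
        String prefix = new String(head, 0, Math.min(head.length, 200), StandardCharsets.ISO_8859_1);
        Matcher m = DECL_ENCODING.matcher(prefix);
        if (m.find()) {
            return m.group(1).toUpperCase(Locale.ROOT);
        }

        // 3. Neither BOM nor declaration: XML 1.0 defaults to UTF-8.
        return "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```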

Supported Encodings

// UTF-8 (most common)
String xmlContent = createTestXml("utf8");
try (InputStream utf8Stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document utf8Doc = Document.of(utf8Stream);
}

// UTF-16 with BOM (Java's UTF_16 encoder writes a big-endian BOM, FE FF)
try (InputStream utf16Stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_16))) {
    Document utf16Doc = Document.of(utf16Stream);
}

// ISO-8859-1 (Latin-1); any encoding="..." in the XML declaration should match the actual bytes
try (InputStream isoStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.ISO_8859_1))) {
    Document isoDoc = Document.of(isoStream);
}

// Other charsets supported by the Java runtime work the same way
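The byte sequences behind the three streams above differ in ways worth seeing. This small JDK-only helper (EncodingBytes is a hypothetical name) shows the bytes each charset produces. Java's UTF_16 encoder prepends the big-endian BOM FE FF, while UTF-8 and ISO-8859-1 agree on ASCII and differ only for characters such as é. That is also why re-encoding content whose declaration says UTF-8 into ISO-8859-1 goes unnoticed only while the content is pure ASCII.

```java
import java.nio.charset.Charset;

class EncodingBytes {

    /** Renders bytes as space-separated uppercase hex, e.g. "FE FF 00 3C". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes text with the given charset and returns its hex dump. */
    static String encodeToHex(String text, Charset charset) {
        return hex(text.getBytes(charset));
    }
}
```

For "<r>é</r>", UTF-8 gives 3C 72 3E C3 A9 3C 2F 72 3E, while ISO-8859-1 gives 3C 72 3E E9 3C 2F 72 3E.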

BOM Handling

// UTF-8 with BOM
String xmlString = "<root>content</root>";
byte[] bomBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
byte[] xmlBytes = xmlString.getBytes(StandardCharsets.UTF_8);
byte[] xmlWithBom = new byte[bomBytes.length + xmlBytes.length];
System.arraycopy(bomBytes, 0, xmlWithBom, 0, bomBytes.length);
System.arraycopy(xmlBytes, 0, xmlWithBom, bomBytes.length, xmlBytes.length);

try (InputStream inputStream = new ByteArrayInputStream(xmlWithBom)) {
    Document doc = Document.of(inputStream);
    // BOM is detected and UTF-8 encoding is used
}
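DomTrip handles the BOM itself, but the same bytes often reach other consumers too, and Java's UTF-8 decoder does not strip a BOM: new String(bytes, UTF_8) keeps U+FEFF as the first character. A minimal JDK-only sketch for those cases follows. Utf8Bom and its methods are hypothetical names. withBom also shows a shorter way to build the BOM-prefixed test bytes than two System.arraycopy calls.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Bom {
    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Builds BOM-prefixed UTF-8 bytes. */
    static byte[] withBom(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(BOM); // writeBytes(byte[]) is available since Java 11
        out.writeBytes(text.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Returns a stream positioned after a leading UTF-8 BOM, if one is present. */
    static InputStream skipBom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, BOM.length);
        byte[] head = new byte[BOM.length];
        int n = pushback.readNBytes(head, 0, head.length); // 3 bytes unless EOF comes first
        if (n != BOM.length || !Arrays.equals(head, BOM)) {
            pushback.unread(head, 0, n); // not a BOM: push the bytes back unread
        }
        return pushback;
    }

    /** Decodes UTF-8 bytes to a String, dropping a leading BOM. */
    static String decodeWithoutBom(byte[] data) {
        try (InputStream in = skipBom(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```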

Advanced Features

Large File Processing

// Memory-efficient processing of large XML files
// Path largeXmlFile = Path.of("large-dataset.xml");

try {
    // For testing, use regular content
    String xmlContent = createTestXml("dataset");
    Document doc = Document.of(xmlContent);
    Editor editor = new Editor(doc);

    // Locate and modify only the elements you need
    Element root = editor.root();

    // Modify only what's needed
    editor.setAttribute(root, "processed", "true");

    // Save back to file (in real scenario)
    String result = editor.toXml();

} catch (Exception e) {
    System.err.println("Failed to process large file: " + e.getMessage());
}

Custom Stream Sources

// Parse from compressed streams
// try (InputStream gzipStream = new GZIPInputStream(
//         new FileInputStream("data.xml.gz"))) {
//     Document doc = Document.of(gzipStream);
//     // GZIPInputStream decompresses on the fly, so the parser only sees plain XML bytes
// }

// For testing, simulate compressed content
String xmlContent = createTestXml("compressed");
try (InputStream stream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(stream);
    // Simulated compressed XML processing
}

// Parse from database BLOB (simulated)
// try (InputStream blobStream = resultSet.getBinaryStream("xml_data")) {
//     Document doc = Document.of(blobStream);
//     // Database XML content is parsed with encoding detection
// }
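The commented GZIP example can be exercised without files. In this JDK-only sketch, decompression is done entirely by java.util.zip: GZIPInputStream yields the original bytes of a .gz stream, and ZipInputStream reads one archive entry at a time, returning -1 at the end of the current entry. Either stream could then be handed to Document.of like any other InputStream. CompressedXml and its methods are hypothetical names; gzip and zip only stand in for real data.xml.gz or .zip files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class CompressedXml {

    /** GZIP-compresses UTF-8 XML (closing the stream writes the GZIP trailer). */
    static byte[] gzip(String xml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Builds a one-entry ZIP archive containing the given XML. */
    static byte[] zip(String entryName, String xml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Wraps compressed bytes; reads yield the decompressed XML bytes. */
    static InputStream gunzipStream(byte[] compressed) throws IOException {
        return new GZIPInputStream(new ByteArrayInputStream(compressed));
    }

    /**
     * Positions a ZipInputStream at the first entry whose name ends in ".xml",
     * or returns null (after closing the stream) if there is none.
     */
    static ZipInputStream openFirstXmlEntry(InputStream zipData) throws IOException {
        ZipInputStream zip = new ZipInputStream(zipData);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (!entry.isDirectory() && entry.getName().endsWith(".xml")) {
                return zip;
            }
        }
        zip.close();
        return null;
    }
}
```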

Error Handling

try {
    String xmlContent = createTestXml("root");
    try (InputStream inputStream = new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8))) {
        Document doc = Document.of(inputStream);
        Editor editor = new Editor(doc);
    }

} catch (Exception e) {
    // getMessage() may return null, so guard it before calling contains()
    String message = e.getMessage() != null ? e.getMessage() : "";
    if (message.contains("encoding")) {
        // Handle encoding-related errors
        System.err.println("Encoding issue: " + message);
    } else if (message.contains("malformed")) {
        // Handle XML syntax errors
        System.err.println("XML syntax error: " + message);
    } else {
        // Handle other parsing errors
        System.err.println("Parsing failed: " + message);
    }
}
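Matching on message text is brittle: messages vary between versions, and getMessage() may return null. A sturdier sketch classifies failures by walking the cause chain for JDK exception types. Whether DomTrip's own exceptions carry a CharacterCodingException as their cause is an assumption here, not documented behavior. ParseFailures and Kind are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.CharacterCodingException;

class ParseFailures {

    enum Kind { ENCODING, IO, OTHER }

    /** Classifies a failure by exception type anywhere in its cause chain. */
    static Kind classify(Throwable failure) {
        // Check encoding first: CharacterCodingException is itself an IOException.
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof CharacterCodingException) {
                return Kind.ENCODING; // e.g. MalformedInputException, UnmappableCharacterException
            }
        }
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof IOException) {
                return Kind.IO;
            }
        }
        return Kind.OTHER;
    }

    /** Null-safe description for logging. */
    static String describe(Throwable failure) {
        String message = failure.getMessage();
        return failure.getClass().getSimpleName() + (message != null ? ": " + message : "");
    }
}
```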

Performance Optimization

Buffered Streams

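// Wrap unbuffered sources (e.g., FileInputStream or socket streams) in a BufferedInputStream
String xmlContent = createTestXml("large");
try (InputStream buffered =
        new BufferedInputStream(new ByteArrayInputStream(xmlContent.getBytes(StandardCharsets.UTF_8)), 8192)) {
    Document doc = Document.of(buffered);
    // Buffering reduces small reads against the underlying source; a ByteArrayInputStream
    // is already in memory, so here the wrapper only illustrates the pattern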
}

Memory Management

// Optimize memory usage for long-running applications
XmlProcessor processor = new XmlProcessor(); // illustrative application helper (implementation not shown)
String xmlContent = createTestXml("root");
Document result = processor.processWithCaching(xmlContent);

// Periodic cleanup
processor.cleanup();

Common Use Cases

Configuration Files

// Load application configuration
// Path configPath = Path.of("app-config.xml");
// if (Files.exists(configPath)) {
//     Document config = Document.of(configPath);
//     Editor editor = new Editor(config);
//
//     // Read configuration values
//     String dbUrl = editor.root()
//         .descendant("database")
//         .flatMap(db -> db.child("url"))
//         .map(Element::textContent)
//         .orElse("default-url");
// }

// For testing, use simulated config
String configXml = createConfigXml();
Document config = Document.of(configXml);
Editor editor = new Editor(config);

// Read configuration values
String dbUrl = editor.root()
        .descendant("database")
        .flatMap(db -> db.child("url"))
        .map(Element::textContent)
        .orElse("default-url");

Web Service Responses

// Parse XML response from web service
// HttpURLConnection connection = (HttpURLConnection) url.openConnection();
// try (InputStream response = connection.getInputStream()) {
//     Document doc = Document.of(response);
//     Editor editor = new Editor(doc);
//
//     // Process response data
//     Element result = editor.root().child("result").orElse(null);
//     // ... extract data
// }

// For testing, simulate web service response
String responseXml = "<response><result>success</result></response>";
try (InputStream response = new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8))) {
    Document doc = Document.of(response);
    Editor editor = new Editor(doc);

    // Process response data
    Element result = editor.root().child("result").orElse(null);
    // ... extract data
}
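On Java 11 and later, java.net.http.HttpClient is an alternative to HttpURLConnection. The sketch below requests XML, rejects non-2xx responses so that an HTML error page never reaches the parser, and returns the body as an InputStream that could be passed to Document.of. XmlHttp is a hypothetical helper; the Accept header value and the 30-second timeout are example choices, not requirements.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class XmlHttp {

    /** Builds a GET request that asks the server for XML. */
    static HttpRequest xmlRequest(URI endpoint) {
        return HttpRequest.newBuilder(endpoint)
                .header("Accept", "application/xml")
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    /**
     * Sends the request and returns the response body as a stream.
     * Non-2xx responses are rejected; the caller must close the returned stream.
     */
    static InputStream fetchXml(HttpClient client, URI endpoint) throws IOException, InterruptedException {
        HttpResponse<InputStream> response =
                client.send(xmlRequest(endpoint), HttpResponse.BodyHandlers.ofInputStream());
        int status = response.statusCode();
        if (status < 200 || status >= 300) {
            response.body().close(); // release the connection before failing
            throw new IOException("Unexpected HTTP status " + status + " from " + endpoint);
        }
        return response.body();
    }
}
```

A caller would open the body with try-with-resources, as in the examples above, and pass it to Document.of inside that block.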

Batch Processing

// Process multiple XML files
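// List<Path> xmlFiles;
// try (Stream<Path> paths = Files.list(Path.of("xml-data"))) { // Files.list holds an open directory handle
//     xmlFiles = paths
//         .filter(path -> path.toString().endsWith(".xml"))
//         .collect(Collectors.toList());
// }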

// For testing, simulate multiple files
String[] xmlContents = {createTestXml("file1"), createTestXml("file2"), createTestXml("file3")};

for (String xmlContent : xmlContents) {
    try {
        Document doc = Document.of(xmlContent);
        Editor editor = new Editor(doc);

        // Process each file
        processXmlDocument(editor);

        // Save processed result (simulated)
        String result = editor.toXml();

    } catch (Exception e) {
        System.err.println("Failed to process XML: " + e.getMessage());
    }
}
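When the batch reads real files, the directory listing itself is a resource to close. This JDK-only sketch uses Files.newDirectoryStream with a "*.xml" glob inside try-with-resources and returns a sorted list, so files are processed in a stable order. XmlBatch is a hypothetical helper; each returned Path could then go to Document.of inside the per-file try/catch shown above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class XmlBatch {

    /** Lists the regular *.xml files directly inside dir, sorted by path. */
    static List<Path> listXmlFiles(Path dir) {
        List<Path> result = new ArrayList<>();
        // The directory stream holds an open OS handle, so close it promptly.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    result.add(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(result);
        return result;
    }
}
```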

Best Practices

Do:

  • Always use try-with-resources for proper stream cleanup
  • Let DomTrip handle encoding detection automatically
  • Use buffered streams for large files
  • Handle encoding exceptions gracefully
  • Check file existence before parsing

Avoid:

  • Manually specifying encoding unless absolutely necessary
  • Keeping streams open longer than needed
  • Ignoring encoding-related exceptions
  • Processing extremely large files without memory considerations
  • Assuming all InputStreams support mark/reset
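The last point is easy to check: the base InputStream class reports markSupported() as false, and code that peeks at leading bytes (for example, to look for a BOM) needs mark/reset to rewind afterwards. A minimal JDK-only sketch, with MarkSupport as a hypothetical helper, wraps a stream only when necessary, then peeks and rewinds so the parser still receives the full document.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkSupport {

    /** Returns a stream that supports mark/reset, wrapping only when needed. */
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Reads up to n leading bytes without consuming them. */
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream does not support mark/reset; wrap it with markable()");
        }
        in.mark(n);                      // remember the current position
        byte[] head = in.readNBytes(n);  // Java 11+
        in.reset();                      // rewind so later readers see every byte
        return head;
    }
}
```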

Integration with Editor

Whatever the input source, the resulting Document is edited through the same Editor API, and changes made through the Editor are reflected in the Document:

// Create document and edit
Document doc = Document.withRootElement("config");
Editor editor = new Editor(doc);

// Editor operations modify the document
editor.addElement(editor.root(), "setting", "value");

// Document reflects changes
Element setting = doc.root().child("setting").orElse(null);

Input stream parsing in DomTrip provides a robust, efficient way to work with XML from any source while maintaining the library's core principles of lossless processing and formatting preservation.