- Introduction and purpose
- Supported XML libraries
- Adapter comparison
- Getting started
- Real-world examples
- Working with documents
- XML objects and their methods
- Advanced features
- Error handling
- Configuration
- Thread safety
- Performance considerations
- Best practices
- Specific adapter limitations
- Round-trip XML Testing
- Development and testing
- Contributing
- License
Moxml provides a unified, modern XML processing interface for Ruby applications. It offers a consistent API that abstracts away the underlying XML implementation details while maintaining high performance through efficient node mapping and native XPath querying.
Key features:
- Intuitive, Ruby-idiomatic API for XML manipulation
- Consistent interface across different XML libraries
- Efficient node mapping for XPath queries
- Support for all XML node types and features
- Easy switching between XML processing engines
- Clean separation between interface and implementation
Moxml supports the following XML libraries:
- REXML: a pure Ruby XML parser distributed with standard Ruby. Not the fastest, but always available.
- Nokogiri (default): a widely used implementation that wraps the performant libxml2 C library.
- Oga: a pure Ruby XML parser. Recommended when you need a pure Ruby solution, for example for Opal.
- Ox: a fast XML parser.
- LibXML: libxml-ruby, Ruby bindings for the performant libxml2 C library. An alternative to Nokogiri with similar performance characteristics.
Moxml makes a best effort to provide a consistent interface for basic XML features, but the underlying XML libraries differ in features and capabilities.
The following table summarizes the features supported by each library.
NOTE: The checkmarks indicate support for the feature, while the numbered notes provide additional context for specific features.
| Feature | Nokogiri | Oga | REXML | LibXML | Ox | HeadedOx |
|---|---|---|---|---|---|---|
| Parsing, serializing | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SAX parsing | ✅ Full (10/10 events) | ✅ Full (10/10 events) | ✅ Full (10/10 events) | ✅ Full (10/10 events) | Core events only. See NOTE 4. | Core events only. See NOTE 4. |
| Node manipulation | ✅ | ✅ | ✅ | ✅ | ✅ See NOTE 1. | ✅ See NOTE 1. |
| Basic XPath | ✅ | ✅ | ✅ | ✅ | Uses Ox-specific API. See NOTE 2. | ✅ Full XPath 1.0. See NOTE 3. |
| XPath with namespaces | ✅ | ✅ | ❌ | ✅ | Uses Ox-specific API | |
NOTE 1: Ox/HeadedOx: Text node replacement may fail in some cases due to internal node structure.
NOTE 2: Ox: Limited XPath support via the locate() method. See the adapter limitations section.
NOTE 3: HeadedOx provides full XPath 1.0 support via a pure Ruby XPath engine layered on top of Ox’s C parser. See HeadedOx documentation for details.
NOTE 4: Ox/HeadedOx SAX: Only core events supported (start_element, end_element, characters, errors). No separate CDATA, comment, or processing instruction events.
| Feature/Operation | Nokogiri | Oga | REXML | LibXML | Ox | HeadedOx |
|---|---|---|---|---|---|---|
| Core Operations | | | | | | |
| Parse XML string | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Parse XML file/IO | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Serialize to XML | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Element Operations | | | | | | |
| Create elements | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Get/set attributes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Add/remove children | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Replace nodes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | See [1] | See [1] |
| Namespace Operations | | | | | | |
| Add namespaces | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Default namespaces | ✅ Full | ✅ Full | ✅ Full | ✅ Full | | |
| Namespace inheritance | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ❌ None [5] |
| Namespaced attributes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | | |
| XPath Queries | | | | | | |
| Basic paths | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Attribute predicates | ✅ Full | ✅ Full | ✅ Full | ✅ Full | See [2] | ✅ Full |
| Attribute values | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None [4] | ✅ Full |
| Logical operators | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Position predicates | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Text predicates | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Namespace-aware queries | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | |
| Parent axis | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Sibling axes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ❌ None [5] |
| XPath functions | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ All 27 |
| Special Content | | | | | | |
| CDATA sections | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Comments | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Processing instructions | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| DOCTYPE declarations | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Performance | | | | | | |
| Parse speed | Fast | Fast | Medium | Fast | Very Fast | Very Fast |
| Serialize speed | Fast | Fast | Medium | Medium | Very Fast | Very Fast |
| Memory usage | Good | Medium | Medium | Good | Excellent | Excellent |
| Thread safety | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
[1] Ox/HeadedOx: Text node replacement may fail in some cases due to internal node structure
[2] Ox: //book[@id] works (returns all book elements), but doesn’t filter by attribute existence
[3] HeadedOx: Full XPath 1.0 with all 27 functions and 6 axes. Pure Ruby XPath engine on Ox’s C parser. 99.20% pass rate. See docs/headed-ox.adoc
[4] Ox: Use .find { |el| el["id"] == "123" } instead of XPath attribute value predicates
[5] HeadedOx limitations: Namespace introspection and 7 axes not implemented. See docs/HEADED_OX_LIMITATIONS.md
Choose Nokogiri when:
- You need industry-standard compatibility
- Large community support is important
- C extension performance is acceptable
- Cross-platform deployment is required
Choose Oga when:
- Pure Ruby environment is required (JRuby, TruffleRuby)
- Best test coverage is needed (98%)
- No C extensions are allowed
- Memory usage is not the primary concern
Choose REXML when:
- Standard library only (no external gems)
- Maximum portability is required
- Small to medium documents
- Deployment simplicity is critical
Choose LibXML when:
- Alternative to Nokogiri is desired
- Full namespace support is required
- Good performance with correctness
- Native C extension is acceptable
Choose Ox when:
- Maximum parsing speed is critical
- Simple document structures (limited nesting)
- XPath usage is minimal or absent
- Memory efficiency is paramount
Choose HeadedOx when:
- Need Ox’s fast parsing with full XPath support
- Want comprehensive XPath 1.0 features (functions, predicates)
- Prefer pure Ruby XPath implementation for debugging
- Need more XPath capabilities than standard Ox provides
- Memory efficiency is important but XPath features are required
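The selection guidance above can be condensed into a small decision helper. This is only a sketch: `suggest_adapter` and its keyword arguments are hypothetical and not part of Moxml, and the symbols `:rexml` and `:ox` are assumed to follow the naming of `:nokogiri`, `:oga`, and `:headed_ox` used elsewhere in this README.

```ruby
# Hypothetical helper (not part of Moxml): maps a few coarse requirements
# to an adapter name, following the "Choose X when" lists.
# Assumption: :rexml and :ox are valid adapter symbols.
def suggest_adapter(stdlib_only: false, pure_ruby: false, max_speed: false, full_xpath: true)
  return :rexml if stdlib_only                            # no external gems allowed
  return :oga if pure_ruby                                # no C extensions allowed
  return (full_xpath ? :headed_ox : :ox) if max_speed     # Ox parsing speed
  :nokogiri                                               # the default adapter
end

puts suggest_adapter(max_speed: true, full_xpath: false)
```

The resulting symbol would then be passed to `Moxml.new`, as in `Moxml.new(:headed_ox)`.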
CAUTION: Ox’s custom XPath engine supports common patterns but cannot handle complex XPath expressions. Test thoroughly if your use case requires advanced XPath.
TODO: We should throw errors when unsupported XPath features are used with Ox or HeadedOx to prevent silent failures.
Install the gem and at least one supported XML library:
# In your Gemfile
gem 'moxml'
gem 'nokogiri' # Or 'oga', 'rexml', 'ox', or 'libxml-ruby'
doc = Moxml.new.create_document
# Add XML declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))
# Create root element with namespace
root = doc.create_element('book')
root.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
doc.add_child(root)
# Add content
title = doc.create_element('dc:title')
title.text = 'XML Processing with Ruby'
root.add_child(title)
# Output formatted XML
puts doc.to_xml(indent: 2)
Practical, runnable examples demonstrating Moxml usage in common scenarios are available in the examples directory.
These examples include:
- RSS Parser: Parse RSS/Atom feeds with XPath queries and namespace handling
- Web Scraper: Extract data from HTML/XML using DOM navigation and table parsing
- API Client: Build and parse XML API requests/responses with SOAP
Each example:
- Is fully documented with a detailed README
- Is self-contained and runnable
- Demonstrates best practices
- Includes sample data files
- Shows comprehensive error handling
Run any example directly:
ruby examples/rss_parser/rss_parser.rb
ruby examples/web_scraper/web_scraper.rb
ruby examples/api_client/api_client.rb
See the examples README for complete documentation and learning paths.
The builder pattern provides a clean DSL for creating XML documents:
doc = Moxml::Builder.new(Moxml.new).build do
  declaration version: "1.0", encoding: "UTF-8"
  element 'library', xmlns: 'http://example.org/library' do
    element 'book' do
      element 'title' do
        text 'Ruby Programming'
      end
      element 'author' do
        text 'Jane Smith'
      end
      comment 'Publication details'
      element 'published', year: '2024'
      cdata '<custom>metadata</custom>'
    end
  end
end
doc = Moxml.new.create_document
# Add declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))
# Create root with namespace
root = doc.create_element('library')
root.add_namespace(nil, 'http://example.org/library')
root.add_namespace("dc", "http://purl.org/dc/elements/1.1/")
doc.add_child(root)
# Add elements with attributes
book = doc.create_element('book')
book['id'] = 'b1'
book['type'] = 'technical'
root.add_child(book)
# Add mixed content
book.add_child(doc.create_comment('Book details'))
title = doc.create_element('title')
title.text = 'Ruby Programming'
book.add_child(title)
# Add entity reference (for declared entities)
book.add_child(doc.create_entity_reference('mdash'))
Moxml supports EntityReference nodes for preserving entity syntax in XML documents. This enables round-trip preservation of entity references such as &nbsp; and &copy;, as well as custom entities defined in the DOCTYPE.
# Create entity reference programmatically
ref = doc.create_entity_reference('nbsp')
element.add_child(ref)
# Or using the builder pattern
doc = Moxml::Builder.new(Moxml.new).build do
  element 'text' do
    entity_reference 'ndash'
    entity_reference 'copy'
  end
end
Parsing and Round-Trip:
When parsing XML with declared entities, Moxml preserves entity references:
# Parse document with custom entity
xml = <<-XML
<!DOCTYPE root [<!ENTITY nbsp "&#160;"> ]>
<root>hello&nbsp;world</root>
XML
doc = Moxml.new(:nokogiri).parse(xml)
doc.to_xml # => preserves entity reference
Adapter Notes:
- Nokogiri: Preserves custom declared entities as EntityReference nodes
- Ox, Oga: These adapters resolve entities during parsing and do not expose entity reference nodes. Use Nokogiri or LibXML for entity preservation.
Entity Loading Configuration:
Moxml provides configurable entity loading with four modes to balance between functionality, performance, and security:
# Default: Load all W3C entities (HTML + MathML + ISO entity sets)
# Raises error if entity data is unavailable
context = Moxml.new
# Optional: Load entities if available, silently skip if not
context = Moxml.new do |config|
  config.entity_load_mode = :optional
end
# Disabled: No entity loading (fastest, for controlled XML sources)
context = Moxml.new do |config|
  config.entity_load_mode = :disabled
end
# Custom: Load entities from your own source
context = Moxml.new do |config|
  config.entity_load_mode = :custom
  config.entity_provider = -> { MyEntitySource.all_entities }
end
The entity data comes from the W3C XML Core WG Character Entities specification (HTMLMathML set), bundled locally in data/w3c_entities.json for offline capability. Set the MOXML_ENTITY_DEFINITIONS_PATH environment variable to use a custom entity data source.
For backward compatibility, config.load_external_entities = false maps to :disabled mode, and config.load_external_entities = true maps to :required mode.
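The backward-compatibility rule can be pictured as a one-line mapping from the legacy boolean to the newer mode symbol. The sketch below is illustrative only: `entity_load_mode_for` and `validate_entity_load_mode!` are hypothetical names, not Moxml's internal implementation; only the four mode symbols come from the text above.

```ruby
# The four modes named in this section.
ENTITY_LOAD_MODES = %i[required optional disabled custom].freeze

# Hypothetical helper: translate the legacy load_external_entities flag
# into the corresponding entity_load_mode.
def entity_load_mode_for(load_external_entities)
  load_external_entities ? :required : :disabled
end

# Hypothetical helper: reject anything that is not one of the four modes.
def validate_entity_load_mode!(mode)
  return mode if ENTITY_LOAD_MODES.include?(mode)

  raise ArgumentError, "unknown entity_load_mode: #{mode.inspect}"
end

puts entity_load_mode_for(false) # prints "disabled"
puts entity_load_mode_for(true)  # prints "required"
```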
Moxml provides a fluent, chainable API for improved developer experience:
element = doc.create_element('book')
  .set_attributes(id: "123", type: "technical")
  .with_namespace("dc", "http://purl.org/dc/elements/1.1/")
  .with_child(doc.create_element("title"))
For complete fluent API documentation including all chainable methods, convenience methods, and practical examples, see the Working with Documents Guide.
SAX (Simple API for XML) provides memory-efficient, event-driven XML parsing for large documents.
When to use SAX:
-
Processing very large XML files (>100MB)
-
Memory-constrained environments
-
Streaming data extraction
-
Need to process data as it arrives
Quick example:
class BookExtractor < Moxml::SAX::ElementHandler
  attr_reader :books

  def initialize
    super
    @books = []
  end

  def on_start_element(name, attributes = {}, namespaces = {})
    super
    @books << { id: attributes["id"] } if name == "book"
  end
end
handler = BookExtractor.new
Moxml.new.sax_parse(xml_string, handler)
puts handler.books.inspect
For complete SAX documentation including all handler types, event methods, adapter support, and best practices, see the SAX Parsing Guide.
For complete node API reference including traversal methods, manipulation, queries, type checking, and node information, see Node API Reference.
Moxml provides a consistent #identifier method across all node types to safely identify nodes:
element = doc.at_xpath("//book")
puts element.identifier # => "book"
attr = element.attribute("id")
puts attr.identifier # => "id"
The #identifier method returns the primary identifier for each node type (tag name for elements, attribute name for attributes, target for processing instructions, or nil for content nodes).
IMPORTANT: Always use type-safe patterns when working with mixed node types. See the Node API Consistency Guide for complete documentation on safe coding patterns, API surface by node type, and migration guidelines.
Moxml provides efficient XPath querying with consistent node mapping:
# Find all book elements
books = doc.xpath('//book')
# Find with namespaces
titles = doc.xpath('//dc:title', 'dc' => 'http://purl.org/dc/elements/1.1/')
# Find first matching node
first_book = doc.at_xpath('//book')
# Add namespace to element
element.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
# Create element in namespace
title = doc.create_element('dc:title')
For complete documentation on XPath querying, namespace handling, and accessing native implementations, see the Advanced Features Guide.
Moxml provides comprehensive error classes with enhanced context for debugging:
begin
  doc = Moxml.new.parse(xml_string, strict: true)
  results = doc.xpath("//book[@id='123']")
rescue Moxml::ParseError => e
  puts "Parse failed at line #{e.line}: #{e.message}"
rescue Moxml::XPathError => e
  puts "XPath error: #{e.expression}"
rescue Moxml::Error => e
  puts "XML processing error: #{e.message}"
end
For complete error class hierarchy, error types, best practices, and debugging techniques, see the Error Handling Guide.
Moxml can be configured globally or per instance:
# Global configuration
Moxml.configure do |config|
  config.default_adapter = :nokogiri
  config.strict = true
  config.encoding = 'UTF-8'
end
# Instance configuration
context = Moxml.new do |config|
  config.adapter = :oga
  config.strict = false
end
Moxml validates namespace URIs against RFC 3986 by default, as required by the W3C Namespaces in XML specification.
For documents that use non-standard namespace identifiers, a lenient mode is available:
# Strict mode (default): rejects invalid URIs per RFC 3986
context = Moxml.new do |config|
  config.namespace_uri_mode = :strict
end
# Lenient mode: accepts any string as a namespace URI
context = Moxml.new do |config|
  config.namespace_uri_mode = :lenient
end
For all configuration options, adapter selection, serialization options, and environment-based configuration, see the Configuration Guide.
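To make the difference between strict and lenient namespace URI handling concrete, here is a minimal sketch that uses Ruby's standard `uri` library. The helper `namespace_uri_acceptable?` is hypothetical, and Ruby's RFC 3986 parser is used as a stand-in; Moxml's own validation may accept or reject different edge cases.

```ruby
require "uri"

# Hypothetical helper (not Moxml's API): in :lenient mode every string is
# accepted; in :strict mode the string must parse under Ruby's RFC 3986 parser.
def namespace_uri_acceptable?(uri, mode: :strict)
  return true if mode == :lenient

  URI::RFC3986_PARSER.parse(uri)
  true
rescue URI::Error
  false
end

puts namespace_uri_acceptable?("http://purl.org/dc/elements/1.1/")              # true
puts namespace_uri_acceptable?("http://example.org/my library")                 # false (space)
puts namespace_uri_acceptable?("http://example.org/my library", mode: :lenient) # true
```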
For complete information on thread-safe patterns, context management, and concurrent processing, see the Thread Safety Guide.
For detailed performance optimization strategies, memory management best practices, and efficient querying patterns, see the Performance Considerations Guide.
For comprehensive best practices covering XPath queries, adapter selection, error handling, namespace handling, memory management, thread safety, performance optimization, and testing strategies, see Best Practices Guide.
The Ox adapter provides maximum parsing speed but has XPath limitations.
XPath limitations:
- No attribute value predicates: //book[@id='123'] ❌
- No logical operators, position predicates, text predicates ❌
- No namespace queries, parent axis, sibling axes ❌
- No XPath functions ❌
Workaround: Use Ruby enumerable methods:
# Instead of: doc.xpath("//book[@id='123']")
doc.xpath("//book").find { |book| book["id"] == "123" }
For complete Ox adapter documentation including all limitations and workarounds, see the Ox Adapter Guide.
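The same Enumerable approach extends to the other unsupported predicate kinds. The sketch below uses plain hashes as stand-ins for the elements that `doc.xpath("//book")` would return (an assumption made so the example runs without any XML library); like Moxml elements, they answer `element["attr"]`.

```ruby
# Stand-in data: each hash plays the role of one <book> element.
books = [
  { "id" => "123", "lang" => "en" },
  { "id" => "456", "lang" => "fr" },
  { "id" => "789", "lang" => "en" },
]

# //book[@id='123']: attribute value predicate becomes #find
by_id = books.find { |book| book["id"] == "123" }

# //book[@lang='en' and @id!='123']: logical operators become Ruby booleans
other_english = books.select { |book| book["lang"] == "en" && book["id"] != "123" }

# //book[2]: position predicate becomes array indexing (XPath counts from 1)
second = books[1]

puts by_id["id"]                           # prints "123"
puts other_english.map { |b| b["id"] }.inspect # prints ["789"]
puts second["id"]                          # prints "456"
```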
The HeadedOx adapter combines Ox’s fast C-based XML parsing with Moxml’s comprehensive pure Ruby XPath 1.0 engine.
Status: Production-ready v1.2 (99.20% pass rate, 1,992/2,008 tests)
Key features:
- Fast XML parsing (Ox C extension)
- All 27 XPath 1.0 functions
- 6 XPath axes (child, descendant, parent, attribute, self, descendant-or-self)
- Expression caching for performance
- Pure Ruby XPath engine (debuggable)
When to use:
- Need Ox’s fast parsing with comprehensive XPath
- Want XPath functions (count, sum, contains, etc.)
- Prefer pure Ruby XPath for debugging
- Basic namespace queries are sufficient
# Use HeadedOx adapter
context = Moxml.new(:headed_ox)
doc = context.parse(xml_string)
# Full XPath 1.0 support
books = doc.xpath('//book[@price < 20]')
count = doc.xpath('count(//book)')
titles = doc.xpath('//book/title[contains(., "Ruby")]')
For complete HeadedOx documentation including architecture, XPath capabilities, known limitations, and usage examples, see the HeadedOx Adapter Guide and Limitations Documentation.
Moxml includes comprehensive round-trip testing to verify that XML documents remain semantically equivalent when parsed and serialized across different adapters.
Round-trip testing ensures:
- Cross-adapter compatibility - XML parsed with one adapter (e.g., Nokogiri) can be serialized and re-parsed with another adapter (e.g., Oga) while preserving content
- Structural fidelity - Element names, attributes, and document structure are maintained
- Content preservation - Text content and entity references survive multiple parse/serialize cycles
- Double round-trip verification - Source → Target → Source sequences produce semantically equivalent output
Round-trip tests use real-world XML documents organized into collections:
rfcxml - IETF RFC documents in XML format. These provide complex, standards-compliant XML with mixed content, namespaces, and attributes. The collection includes:
- Large documents (500KB-2.4MB) for stress testing
- Rich metadata and cross-references
- Various XML schema patterns
metanorma - Metanorma document processing XML. These test:
- Document structure preservation
- Nested elements and complex hierarchies
- Standard XML vocabularies
niso-jats - NISO Journal Article Tag Suite XML. These provide:
- Scholarly publishing XML schemas
- Rich bibliographic metadata
- Mixed content models
# Run all round-trip tests
bundle exec rake spec:consistency
# Exclude REXML for larger fixtures (faster, REXML is pure Ruby)
MOXML_ROUNDTRIP_REXML_MAX_SIZE=0 bundle exec rake spec:consistency
# Adjust the per-example timeout (default: 120 seconds)
MOXML_ROUNDTRIP_TIMEOUT=300 bundle exec rake spec:consistency
REXML is a pure Ruby XML parser and becomes very slow on large documents (500KB+). By default, REXML adapter pairs are skipped for fixtures exceeding 500KB. All other adapters (Nokogiri, Oga, Ox) are tested against every fixture.
For each fixture, tests run across all adapter pairs (4 adapters = 12 combinations):
- Parse with source adapter
- Serialize to XML string
- Parse serialized output with target adapter
- Compare semantic equivalence (element names, attributes, text content)
A "double round-trip" test additionally verifies: Source → Target → Source → Target produces consistent results.
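The count of 12 combinations follows from taking ordered (source, target) pairs of distinct adapters. A short sketch, assuming the four adapters are the ones named in this section (Nokogiri, Oga, Ox, REXML) and using illustrative symbol names:

```ruby
# Assumption: these four symbols stand for the adapters named in this section.
adapters = %i[nokogiri oga ox rexml]

# Ordered pairs without repetition: (source, target) and (target, source)
# are separate test cases, so 4 adapters yield 4 * 3 = 12 pairs.
pairs = adapters.permutation(2).to_a

pairs.each do |source, target|
  puts "#{source} -> #{target}"
end
puts pairs.size # prints 12
```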
NOTE: REXML is excluded from adapter pairs for fixtures larger than 500KB (configurable via MOXML_ROUNDTRIP_REXML_MAX_SIZE). This is because REXML is pure Ruby and cannot parse large XML documents in a practical timeframe. A per-example timeout (MOXML_ROUNDTRIP_TIMEOUT, default 120s) prevents tests from hanging indefinitely.
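The size-based exclusion can be sketched as a small predicate. This is an illustration, not the project's spec helper: `rexml_max_size` and `skip_pair?` are hypothetical names, and the default of 500 * 1024 bytes is an assumption, since the README states the default only as 500KB.

```ruby
# Assumption: the threshold is measured in bytes and defaults to 500 KB.
DEFAULT_REXML_MAX_SIZE = 500 * 1024

# Hypothetical helper: read the threshold from the environment.
def rexml_max_size(env = ENV)
  Integer(env.fetch("MOXML_ROUNDTRIP_REXML_MAX_SIZE", DEFAULT_REXML_MAX_SIZE))
end

# Hypothetical helper: a pair is skipped only when REXML is involved
# and the fixture exceeds the threshold.
def skip_pair?(source, target, fixture_bytes, env = ENV)
  return false unless [source, target].include?(:rexml)

  fixture_bytes > rexml_max_size(env)
end

puts skip_pair?(:nokogiri, :rexml, 600 * 1024, {}) # true: REXML pair, large fixture
puts skip_pair?(:nokogiri, :oga, 600 * 1024, {})   # false: no REXML involved
```

With this logic, setting the variable to 0 skips REXML for every non-empty fixture, which matches the MOXML_ROUNDTRIP_REXML_MAX_SIZE=0 invocation shown earlier.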
The Ox adapter produces elements in a different order than other adapters for certain
fixtures with complex nested structures (e.g., element_citation.xml,
collection1nested.xml, pnas_sample.xml). This causes the elements_with_attributes
comparison to fail with "Array length mismatch" even though the semantic equivalence
check (double round-trip) passes.
Round-trip tests automatically skip the elements_with_attributes comparison for these
known Ox ordering issues. The ruby-versions CI job tests only Nokogiri and Oga adapters;
the nokogiri-ox and nokogiri-rexml CI jobs test Ox and REXML respectively but are
marked as experimental since these adapters lack full XML feature support:
- Ox: Lacks proper namespace support, XPath with predicates, and uses a custom locate() method instead of standard XPath
- REXML: Pure Ruby, exponential time complexity with document size, impractical for documents over ~500KB
For production use, prefer Nokogiri or Oga which provide complete XML conformance.
To run tests with a specific adapter set locally:
# Nokogiri + Oga only (fast, full test suite)
MOXML_ROUNDTRIP_ADAPTERS=nokogiri,oga bundle exec rspec spec/consistency/ --tag round_trip
# Nokogiri × Ox only (experimental)
MOXML_ROUNDTRIP_ADAPTERS=nokogiri,ox MOXML_ROUNDTRIP_TIMEOUT=300 bundle exec rspec spec/consistency/ --tag round_trip
# Nokogiri × REXML only (experimental, small fixtures due to exponential complexity)
MOXML_ROUNDTRIP_ADAPTERS=nokogiri,rexml MOXML_ROUNDTRIP_TIMEOUT=300 MOXML_ROUNDTRIP_REXML_MAX_SIZE=50000 bundle exec rspec spec/consistency/ --tag round_trip
While a pure round-trip test with raw XML comparison would be ideal, different XML adapters have fundamentally different philosophies for handling:
- Element ordering - Some preserve document order, others sort alphabetically
- Whitespace handling - Some normalize spaces, others preserve exactly
- Attribute representation - Different data structures for the same attributes
- Text extraction - Varying approaches to concatenating text content
Instead of raw comparison, Moxml implements semantic equivalence testing that focuses on meaningful XML structure and content:
# Element name must match
expect(target_element.name).to eq(source_element.name)
# Attributes must be semantically equivalent
expect(target_attributes).to eq(source_attributes)
# Text content must be preserved (whitespace-normalized)
expect(normalized_text(target)).to eq(normalized_text(source))
# Document structure (element count) must match
expect(doc.xpath("//*").size).to eq(original.xpath("//*").size)
This approach tolerates adapter-specific serialization differences while ensuring the actual XML content remains intact.
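As a rough illustration of what whitespace-normalized comparison means, the sketch below defines a `normalized_text` helper that works on plain strings. This is an assumed implementation for illustration; the project's spec helper of the same name takes nodes and may normalize differently.

```ruby
# Assumed implementation: collapse every run of whitespace to one space
# and trim the ends, so indentation differences do not affect equality.
def normalized_text(text)
  text.to_s.gsub(/\s+/, " ").strip
end

source_text = "Ruby\n      Programming  "
target_text = "Ruby Programming"
same_text = normalized_text(source_text) == normalized_text(target_text)

# Attributes compared as hashes are equal regardless of the order in
# which an adapter happened to serialize them.
source_attrs = { "id" => "b1", "type" => "technical" }
target_attrs = { "type" => "technical", "id" => "b1" }
same_attrs = source_attrs == target_attrs

puts same_text  # prints true
puts same_attrs # prints true
```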
For complete information on development setup, testing strategies, benchmarking, and coverage reporting, see the Development and Testing Guide.
- Fork the repository
- Create your feature branch (git checkout -b feature/my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin feature/my-new-feature)
- Create a new Pull Request
Copyright Ribose.
This project is licensed under the Ribose 3-Clause BSD License. See the LICENSE.md file for details.