Ruby XML, XSLT and XPath Tutorial
XML (eXtensible Markup Language) is a widely used data exchange format, common in web services, configuration files, and data storage. Ruby offers several ways to process XML, including the REXML library that ships with Ruby (as a bundled gem since Ruby 3.0) and third-party libraries such as Nokogiri. This chapter explains how to parse, generate, and modify XML data in Ruby, and how to use XPath and XSLT for more advanced processing.
🎯 XML Basics
What is XML
XML (eXtensible Markup Language) is a markup language designed to store and transport data. It has the following characteristics:
- Self-descriptive: Tag names can describe the meaning of data
- Hierarchical structure: Supports nested data structures
- Platform-independent: Can exchange data between different systems
- Extensible: Can define your own tags
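These characteristics can be seen in a few lines of Ruby. The sketch below uses a made-up order document (the tag names and the describe helper are illustrative assumptions, not part of any standard) and prints its hierarchy with REXML, which is covered later in this chapter.

```ruby
require 'rexml/document'

# Hypothetical sample: the tag names are our own invention, which is what
# "extensible" means in practice.
xml = '<order id="42"><customer>Ann</customer><items><item sku="A1">Pen</item></items></order>'

# Recursively collect one line per element, indented by nesting depth.
def describe(element, depth = 0, lines = [])
  attrs = []
  element.attributes.each_attribute { |attr| attrs << "#{attr.name}=#{attr.value}" }
  label = attrs.empty? ? element.name : "#{element.name} (#{attrs.join(' ')})"
  lines << "#{'  ' * depth}#{label}"
  element.elements.each { |child| describe(child, depth + 1, lines) }
  lines
end

outline = describe(REXML::Document.new(xml).root)
puts outline
```

The tag names alone (order, customer, item) say what each value means, and the indentation of the printed outline mirrors the nesting of the document.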
XML Basic Structure
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book id="1">
    <title>Ruby Programming Basics</title>
    <author>John Smith</author>
    <price currency="CNY">59.00</price>
    <category>Programming</category>
  </book>
  <book id="2">
    <title>Web Development Practice</title>
    <author>Jane Doe</author>
    <price currency="CNY">79.00</price>
    <category>Web</category>
  </book>
</bookstore>
📖 Using REXML to Process XML
Parsing XML Documents
require 'rexml/document'
# XML data
xml_data = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <bookstore>
    <book id="1">
      <title>Ruby Programming Basics</title>
      <author>John Smith</author>
      <price currency="CNY">59.00</price>
      <category>Programming</category>
    </book>
    <book id="2">
      <title>Web Development Practice</title>
      <author>Jane Doe</author>
      <price currency="CNY">79.00</price>
      <category>Web</category>
    </book>
  </bookstore>
XML
# Parse XML
doc = REXML::Document.new(xml_data)
# Access root element
root = doc.root
puts "Root element: #{root.name}" # bookstore
# Iterate child elements
root.elements.each('book') do |book|
  id = book.attributes['id']
  title = book.elements['title'].text
  author = book.elements['author'].text
  price = book.elements['price'].text
  currency = book.elements['price'].attributes['currency']
  puts "Book ID: #{id}"
  puts "Title: #{title}"
  puts "Author: #{author}"
  puts "Price: #{price} #{currency}"
  puts "---"
end
Generating XML Documents
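When a document is built from Ruby strings, characters such as & and < must be escaped in the output. REXML appears to take care of this on serialization, so the raw string can be assigned directly. The short sketch below (the title string is a made-up example) shows the behavior before the full bookstore example.

```ruby
require 'rexml/document'

doc = REXML::Document.new
book = doc.add_element('book')

# Assign the raw string; escaping is applied when the document is serialized.
book.add_element('title').text = 'Tips & Tricks for <Ruby>'

xml = doc.to_s
puts xml

# Parsing the serialized output gives back the original, unescaped string.
round_trip = REXML::Document.new(xml).root.elements['title'].text
puts round_trip
```

Escaping the string by hand before assigning it is best avoided here, since the entity references would then be interpreted again when the text is read back.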
require 'rexml/document'
require 'stringio' # StringIO captures the serialized output below
# Create XML document
doc = REXML::Document.new
doc << REXML::XMLDecl.new('1.0', 'UTF-8')
# Create root element
bookstore = doc.add_element('bookstore')
# Add book elements
book1 = bookstore.add_element('book', {'id' => '1'})
book1.add_element('title').text = 'Ruby Programming Basics'
book1.add_element('author').text = 'John Smith'
price1 = book1.add_element('price', {'currency' => 'CNY'})
price1.text = '59.00'
book1.add_element('category').text = 'Programming'
book2 = bookstore.add_element('book', {'id' => '2'})
book2.add_element('title').text = 'Web Development Practice'
book2.add_element('author').text = 'Jane Doe'
price2 = book2.add_element('price', {'currency' => 'CNY'})
price2.text = '79.00'
book2.add_element('category').text = 'Web'
# Output XML
output = StringIO.new
doc.write(output, 2) # 2 indicates number of indent spaces
puts output.string
Modifying XML Documents
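Besides changing text and attributes, elements and attributes can also be removed. A minimal sketch, assuming the same single-book document used in the walkthrough that follows:

```ruby
require 'rexml/document'

doc = REXML::Document.new(<<~XML)
  <bookstore>
    <book id="1">
      <title>Old Title</title>
      <author>John Smith</author>
      <price currency="CNY">59.00</price>
    </book>
  </bookstore>
XML

book = doc.root.elements['book']

# Remove a child element by name.
book.delete_element('author')

# Remove an attribute from an element.
book.elements['price'].delete_attribute('currency')

# List the child elements that are left.
remaining = book.elements.map { |child| child.name }
puts remaining.inspect
```

Both calls change the document in place, so the removed nodes no longer appear when the document is written out.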
require 'rexml/document'
require 'stringio' # StringIO captures the serialized output below
# Parse existing XML
xml_data = <<~XML
  <bookstore>
    <book id="1">
      <title>Old Title</title>
      <author>John Smith</author>
      <price currency="CNY">59.00</price>
    </book>
  </bookstore>
XML
doc = REXML::Document.new(xml_data)
# Modify element text
book = doc.root.elements['book']
book.elements['title'].text = 'New Title'
# Modify attribute
book.elements['price'].attributes['currency'] = 'USD'
# Add new element
book.add_element('category').text = 'Programming'
# Delete an element (left commented out so the author stays in the output)
# book.delete_element('author')
# Output modified XML
output = StringIO.new
doc.write(output, 2)
puts output.string
🔍 Using XPath to Query XML
XPath Basics
XPath is a language for finding nodes in XML documents. REXML supports XPath queries:
require 'rexml/document'
require 'rexml/xpath'
xml_data = <<~XML
  <library>
    <book category="fiction" id="1">
      <title lang="zh">Novel A</title>
      <author>Author A</author>
      <year>2020</year>
      <price>29.99</price>
    </book>
    <book category="fiction" id="2">
      <title lang="en">Novel B</title>
      <author>Author B</author>
      <year>2021</year>
      <price>39.99</price>
    </book>
    <book category="technical" id="3">
      <title lang="zh">Technical Manual</title>
      <author>Technical Author</author>
      <year>2019</year>
      <price>49.99</price>
    </book>
  </library>
XML
doc = REXML::Document.new(xml_data)
# Basic XPath queries
# Find all book elements
books = REXML::XPath.match(doc, '//book')
puts "Total books: #{books.length}"
# Find elements with specific attribute
fiction_books = REXML::XPath.match(doc, '//book[@category="fiction"]')
puts "Fiction books: #{fiction_books.length}"
# Find book with specific ID
book1 = REXML::XPath.first(doc, '//book[@id="1"]')
puts "Book 1 title: #{book1.elements['title'].text}"
# Find books whose title has lang="zh" (this matches on the attribute, not on text content)
chinese_books = REXML::XPath.match(doc, '//book[title/@lang="zh"]')
puts "Chinese books: #{chinese_books.length}"
# Use axis queries
# Find books following the first book
following_books = REXML::XPath.match(doc, '//book[@id="1"]/following-sibling::book')
puts "Books after book 1: #{following_books.length}"
# Find parent element
book_parent = REXML::XPath.first(doc, '//book/parent::*')
puts "Parent element of book: #{book_parent.name}"
Advanced XPath Queries
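Predicates can combine several conditions, and XPath results are ordinary Ruby objects that can be processed further in Ruby. The sketch below is a small illustration using a hypothetical three-book catalog, separate from the company data used in the rest of this section.

```ruby
require 'rexml/document'
require 'rexml/xpath'

doc = REXML::Document.new(<<~XML)
  <library>
    <book category="fiction"><title>Novel A</title><price>29.99</price></book>
    <book category="fiction"><title>Novel B</title><price>39.99</price></book>
    <book category="technical"><title>Technical Manual</title><price>49.99</price></book>
  </library>
XML

# Combine two conditions in one predicate with "and".
cheap_fiction = REXML::XPath.match(doc, '//book[@category="fiction" and price < 35]/title')
cheap_titles = cheap_fiction.map(&:text)
puts "Cheap fiction: #{cheap_titles.join(', ')}"

# XPath.first returns nil when nothing matches, so check before calling methods on it.
missing = REXML::XPath.first(doc, '//book[@category="poetry"]')
puts "Poetry book found: #{!missing.nil?}"

# match returns an Array, so aggregation can be done in Ruby.
prices = REXML::XPath.match(doc, '//book/price').map { |price| price.text.to_f }
average = prices.sum / prices.size
puts "Average price: #{average.round(2)}"
```

Keeping the selection in XPath and the arithmetic in Ruby avoids relying on XPath numeric functions, whose return values through REXML::XPath are not covered in this chapter.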
require 'rexml/document'
require 'rexml/xpath'
# Complex XML data
xml_data = <<~XML
  <company>
    <department name="Engineering">
      <employee id="001">
        <name>John Smith</name>
        <position>Senior Engineer</position>
        <salary>15000</salary>
        <skills>
          <skill>Ruby</skill>
          <skill>JavaScript</skill>
          <skill>Python</skill>
        </skills>
      </employee>
      <employee id="002">
        <name>Jane Doe</name>
        <position>Junior Engineer</position>
        <salary>8000</salary>
        <skills>
          <skill>Java</skill>
          <skill>SQL</skill>
        </skills>
      </employee>
    </department>
    <department name="Design">
      <employee id="003">
        <name>Bob Wilson</name>
        <position>UI Designer</position>
        <salary>12000</salary>
        <skills>
          <skill>Photoshop</skill>
          <skill>Sketch</skill>
        </skills>
      </employee>
    </department>
  </company>
XML
doc = REXML::Document.new(xml_data)
# Query high-salary employees (salary > 10000)
high_salary_employees = REXML::XPath.match(doc, '//employee[salary > 10000]')
puts "High salary employees:"
high_salary_employees.each do |emp|
  name = emp.elements['name'].text
  salary = emp.elements['salary'].text
  puts " #{name}: #{salary}"
end
# Query employees with specific skill
ruby_developers = REXML::XPath.match(doc, '//employee[skills/skill="Ruby"]')
puts "\nRuby developers:"
ruby_developers.each do |emp|
  puts " #{emp.elements['name'].text}"
end
# Query employee count per department
departments = REXML::XPath.match(doc, '//department')
puts "\nDepartment employee statistics:"
departments.each do |dept|
  dept_name = dept.attributes['name']
  employee_count = REXML::XPath.match(dept, './/employee').length
  puts " #{dept_name}: #{employee_count} employees"
end
# Query all skills
all_skills = REXML::XPath.match(doc, '//skill')
unique_skills = all_skills.map { |skill| skill.text }.uniq
puts "\nAll skills: #{unique_skills.join(', ')}"
🛠️ Using Nokogiri to Process XML
Installation and Basic Use
Nokogiri is a third-party XML/HTML processing library that supports both CSS selectors and XPath and is generally faster than REXML. It is not part of Ruby, so install it first:
gem install nokogiri
require 'nokogiri'
# Parse XML
xml_string = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <bookstore>
    <book id="1">
      <title>Ruby Programming Basics</title>
      <author>John Smith</author>
      <price>59.00</price>
    </book>
  </bookstore>
XML
doc = Nokogiri::XML(xml_string)
# Use CSS selectors
doc.css('book').each do |book|
  puts "Book: #{book.at_css('title').text}"
end
# Use XPath
doc.xpath('//book').each do |book|
  puts "Author: #{book.at_xpath('author').text}"
end
# Modify XML
book = doc.at_xpath('//book[@id="1"]')
book.at_xpath('title').content = 'New Title'
puts doc.to_xml
📚 Next Steps
After working through XML processing, XPath, and XSLT in Ruby, good next topics are:
- Ruby JSON - Learn JSON data processing
- Ruby Web Services - Learn web service development
- Ruby Database Access - Learn database operations
Continue your Ruby learning journey!