Ruby XML, XSLT and XPath Tutorial

XML (eXtensible Markup Language) is a widely used data exchange format with important applications in web services, configuration files, and data storage. Ruby offers several ways to process XML, including the REXML library that ships with Ruby (as a bundled gem since Ruby 3.0) and third-party libraries such as Nokogiri. This chapter explains how to parse, generate, and modify XML data in Ruby, and how to use XPath and XSLT for more advanced processing.

🎯 XML Basics

What is XML

XML (eXtensible Markup Language) is a markup language designed to store and transport data. It has the following characteristics:

Self-descriptive: Tag names can describe the meaning of data
Hierarchical structure: Supports nested data structures
Platform-independent: Can exchange data between different systems
Extensible: Can define your own tags

XML Basic Structure

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book id="1">
    <title>Ruby Programming Basics</title>
    <author>John Smith</author>
    <price currency="CNY">59.00</price>
    <category>Programming</category>
  </book>
  <book id="2">
    <title>Web Development Practice</title>
    <author>Jane Doe</author>
    <price currency="CNY">79.00</price>
    <category>Web</category>
  </book>
</bookstore>
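
Every opening tag in a document like the one above needs a matching closing tag, and elements must nest properly. The following is a minimal sketch of how that could be checked with REXML; the helper name `well_formed?` is made up for illustration and is not part of any library.

```ruby
require 'rexml/document'

# Hypothetical helper: returns true when REXML can parse the string,
# false when the parser reports a structural error.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

good = '<bookstore><book id="1"><title>Ruby</title></book></bookstore>'
# <title> is never closed before </book>, so the nesting is broken.
bad  = '<bookstore><book id="1"><title>Ruby</book></bookstore>'

puts well_formed?(good)
puts well_formed?(bad)
```

This only checks well-formedness. It does not validate the document against a schema or DTD.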

📖 Using REXML to Process XML

Parsing XML Documents

require 'rexml/document'

# XML data
xml_data = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <bookstore>
    <book id="1">
      <title>Ruby Programming Basics</title>
      <author>John Smith</author>
      <price currency="CNY">59.00</price>
      <category>Programming</category>
    </book>
    <book id="2">
      <title>Web Development Practice</title>
      <author>Jane Doe</author>
      <price currency="CNY">79.00</price>
      <category>Web</category>
    </book>
  </bookstore>
XML

# Parse XML
doc = REXML::Document.new(xml_data)

# Access root element
root = doc.root
puts "Root element: #{root.name}"  # bookstore

# Iterate child elements
root.elements.each('book') do |book|
  id = book.attributes['id']
  title = book.elements['title'].text
  author = book.elements['author'].text
  price = book.elements['price'].text
  currency = book.elements['price'].attributes['currency']
  
  puts "Book ID: #{id}"
  puts "Title: #{title}"
  puts "Author: #{author}"
  puts "Price: #{price} #{currency}"
  puts "---"
end
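
In practice, some child elements may be missing, and `elements['name']` returns `nil` in that case. The sketch below collects each book into a Hash and guards against missing children. The helper `book_to_hash` and the trimmed sample data are illustrative assumptions, not part of the example above.

```ruby
require 'rexml/document'

xml_data = <<~XML
  <bookstore>
    <book id="1">
      <title>Ruby Programming Basics</title>
      <price currency="CNY">59.00</price>
    </book>
    <book id="2">
      <title>Web Development Practice</title>
    </book>
  </bookstore>
XML

doc = REXML::Document.new(xml_data)

# Hypothetical helper: convert one <book> element into a Hash.
# The safe navigation operator (&.) avoids NoMethodError when a child is absent.
def book_to_hash(book)
  price = book.elements['price']
  {
    id: book.attributes['id'],
    title: book.elements['title']&.text,
    price: price&.text&.to_f,
    currency: price ? price.attributes['currency'] : nil
  }
end

books = REXML::XPath.match(doc, '/bookstore/book').map { |book| book_to_hash(book) }
books.each { |b| p b }
```

The second book has no `<price>` element, so its `price` and `currency` values come back as `nil` instead of raising an error.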

Generating XML Documents

require 'rexml/document'
require 'stringio'

# Create XML document
doc = REXML::Document.new
doc << REXML::XMLDecl.new('1.0', 'UTF-8')

# Create root element
bookstore = doc.add_element('bookstore')

# Add book elements
book1 = bookstore.add_element('book', {'id' => '1'})
book1.add_element('title').text = 'Ruby Programming Basics'
book1.add_element('author').text = 'John Smith'
price1 = book1.add_element('price', {'currency' => 'CNY'})
price1.text = '59.00'
book1.add_element('category').text = 'Programming'

book2 = bookstore.add_element('book', {'id' => '2'})
book2.add_element('title').text = 'Web Development Practice'
book2.add_element('author').text = 'Jane Doe'
price2 = book2.add_element('price', {'currency' => 'CNY'})
price2.text = '79.00'
book2.add_element('category').text = 'Web'

# Output XML
output = StringIO.new
doc.write(output, 2)  # 2 indicates number of indent spaces
puts output.string
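
When the data already lives in Ruby objects, the repeated `add_element` calls can be driven by a loop. The following sketch assumes an array of hashes with made-up field names. It also shows that REXML escapes special characters such as `&` when text is assigned through `text=`.

```ruby
require 'rexml/document'
require 'stringio'

# Illustrative data; the field names mirror the bookstore example.
books = [
  { id: '1', title: 'Ruby & Rails Basics', author: 'John Smith', price: '59.00' },
  { id: '2', title: 'Web Development Practice', author: 'Jane Doe', price: '79.00' }
]

doc = REXML::Document.new
doc << REXML::XMLDecl.new('1.0', 'UTF-8')
bookstore = doc.add_element('bookstore')

books.each do |data|
  book = bookstore.add_element('book', { 'id' => data[:id] })
  book.add_element('title').text = data[:title]
  book.add_element('author').text = data[:author]
  book.add_element('price', { 'currency' => 'CNY' }).text = data[:price]
end

output = StringIO.new
doc.write(output, 2)
xml = output.string
puts xml

# Reading the document back returns the unescaped text.
# strip removes whitespace that pretty printing may add around text.
round_trip = REXML::Document.new(xml)
puts round_trip.root.elements['book/title'].text.strip
```

Because indented output can add whitespace around text nodes, it is safer to `strip` values when reading a pretty-printed document back in.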

Modifying XML Documents

require 'rexml/document'
require 'stringio'

# Parse existing XML
xml_data = <<~XML
  <bookstore>
    <book id="1">
      <title>Old Title</title>
      <author>John Smith</author>
      <price currency="CNY">59.00</price>
    </book>
  </bookstore>
XML

doc = REXML::Document.new(xml_data)

# Modify element text
book = doc.root.elements['book']
book.elements['title'].text = 'New Title'

# Modify attribute
book.elements['price'].attributes['currency'] = 'USD'

# Add new element
book.add_element('category').text = 'Programming'

# Delete element (left commented out so <author> stays in the output; uncomment to remove it)
# book.delete_element('author')

# Output modified XML
output = StringIO.new
doc.write(output, 2)
puts output.string
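
Deletion is left commented out in the example above, so here is a separate sketch of it. It uses a small made-up document and removes a child element by name, a whole element selected by an XPath expression, and an attribute.

```ruby
require 'rexml/document'

doc = REXML::Document.new(<<~XML)
  <bookstore>
    <book id="1"><title>Keep Me</title><author>John Smith</author></book>
    <book id="2"><title>Remove Me</title><author>Jane Doe</author></book>
  </bookstore>
XML

# Remove a single child element by name; the removed element is returned.
first_book = doc.root.elements['book']
removed = first_book.delete_element('author')
puts "Removed: #{removed.name}"

# Remove a whole <book> selected by an XPath expression relative to the root.
doc.root.delete_element('book[@id="2"]')

# Remove an attribute from the remaining book.
first_book.delete_attribute('id')

remaining = REXML::XPath.match(doc, '//book')
puts "Books left: #{remaining.length}"
```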

🔍 Using XPath to Query XML

XPath Basics

XPath is a language for finding nodes in XML documents. REXML supports XPath queries:

require 'rexml/document'
require 'rexml/xpath'

xml_data = <<~XML
  <library>
    <book category="fiction" id="1">
      <title lang="zh">Novel A</title>
      <author>Author A</author>
      <year>2020</year>
      <price>29.99</price>
    </book>
    <book category="fiction" id="2">
      <title lang="en">Novel B</title>
      <author>Author B</author>
      <year>2021</year>
      <price>39.99</price>
    </book>
    <book category="technical" id="3">
      <title lang="zh">Technical Manual</title>
      <author>Technical Author</author>
      <year>2019</year>
      <price>49.99</price>
    </book>
  </library>
XML

doc = REXML::Document.new(xml_data)

# Basic XPath queries
# Find all book elements
books = REXML::XPath.match(doc, '//book')
puts "Total books: #{books.length}"

# Find elements with specific attribute
fiction_books = REXML::XPath.match(doc, '//book[@category="fiction"]')
puts "Fiction books: #{fiction_books.length}"

# Find book with specific ID
book1 = REXML::XPath.first(doc, '//book[@id="1"]')
puts "Book 1 title: #{book1.elements['title'].text}"

# Find books whose title has a specific lang attribute
chinese_books = REXML::XPath.match(doc, '//book[title/@lang="zh"]')
puts "Chinese books: #{chinese_books.length}"

# Use axis queries
# Find sibling books that come after book 1
following_books = REXML::XPath.match(doc, '//book[@id="1"]/following-sibling::book')
puts "Books after book 1: #{following_books.length}"

# Find parent element
book_parent = REXML::XPath.first(doc, '//book/parent::*')
puts "Parent element of book: #{book_parent.name}"
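
Besides `match` and `first`, a few other access patterns are often convenient. The sketch below uses a condensed copy of the library data and shows `REXML::XPath.each`, a positional predicate, and the `elements[...]` shortcut, which also accepts an XPath expression.

```ruby
require 'rexml/document'

doc = REXML::Document.new(<<~XML)
  <library>
    <book category="fiction" id="1"><title lang="zh">Novel A</title><price>29.99</price></book>
    <book category="fiction" id="2"><title lang="en">Novel B</title><price>39.99</price></book>
    <book category="technical" id="3"><title lang="zh">Technical Manual</title><price>49.99</price></book>
  </library>
XML

# XPath.each yields every matching node to the block.
titles = []
REXML::XPath.each(doc, '//book[@category="fiction"]/title') { |t| titles << t.text }
puts "Fiction titles: #{titles.join(', ')}"

# A positional predicate selects by document order; positions are 1-based.
second = REXML::XPath.first(doc, '/library/book[2]')
puts "Second book ID: #{second.attributes['id']}"

# elements[...] with a String runs the XPath and returns the first match.
manual_title = doc.elements['//book[@id="3"]/title'].text
puts "Book 3 title: #{manual_title}"
```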

Advanced XPath Queries

require 'rexml/document'
require 'rexml/xpath'

# Complex XML data
xml_data = <<~XML
  <company>
    <department name="Engineering">
      <employee id="001">
        <name>John Smith</name>
        <position>Senior Engineer</position>
        <salary>15000</salary>
        <skills>
          <skill>Ruby</skill>
          <skill>JavaScript</skill>
          <skill>Python</skill>
        </skills>
      </employee>
      <employee id="002">
        <name>Jane Doe</name>
        <position>Junior Engineer</position>
        <salary>8000</salary>
        <skills>
          <skill>Java</skill>
          <skill>SQL</skill>
        </skills>
      </employee>
    </department>
    <department name="Design">
      <employee id="003">
        <name>Bob Wilson</name>
        <position>UI Designer</position>
        <salary>12000</salary>
        <skills>
          <skill>Photoshop</skill>
          <skill>Sketch</skill>
        </skills>
      </employee>
    </department>
  </company>
XML

doc = REXML::Document.new(xml_data)

# Query high-salary employees (salary > 10000)
high_salary_employees = REXML::XPath.match(doc, '//employee[salary > 10000]')
puts "High salary employees:"
high_salary_employees.each do |emp|
  name = emp.elements['name'].text
  salary = emp.elements['salary'].text
  puts "  #{name}: #{salary}"
end

# Query employees with specific skill
ruby_developers = REXML::XPath.match(doc, '//employee[skills/skill="Ruby"]')
puts "\nRuby developers:"
ruby_developers.each do |emp|
  puts "  #{emp.elements['name'].text}"
end

# Query employee count per department
departments = REXML::XPath.match(doc, '//department')
puts "\nDepartment employee statistics:"
departments.each do |dept|
  dept_name = dept.attributes['name']
  employee_count = REXML::XPath.match(dept, './/employee').length
  puts "  #{dept_name}: #{employee_count} employees"
end

# Query all skills
all_skills = REXML::XPath.match(doc, '//skill')
unique_skills = all_skills.map { |skill| skill.text }.uniq
puts "\nAll skills: #{unique_skills.join(', ')}"
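
XPath is good at selecting nodes, while aggregation is often simpler in plain Ruby once the nodes have been selected. This is one possible approach, sketched on a trimmed copy of the company data: it computes the average salary per department and finds the highest-paid employee.

```ruby
require 'rexml/document'

doc = REXML::Document.new(<<~XML)
  <company>
    <department name="Engineering">
      <employee id="001"><name>John Smith</name><salary>15000</salary></employee>
      <employee id="002"><name>Jane Doe</name><salary>8000</salary></employee>
    </department>
    <department name="Design">
      <employee id="003"><name>Bob Wilson</name><salary>12000</salary></employee>
    </department>
  </company>
XML

# Map each department name to the average salary of its employees.
averages = {}
REXML::XPath.each(doc, '//department') do |dept|
  salaries = REXML::XPath.match(dept, 'employee/salary').map { |s| s.text.to_i }
  averages[dept.attributes['name']] = salaries.sum.to_f / salaries.size
end

averages.each { |name, avg| puts "#{name}: #{avg}" }

# Find the highest-paid employee by comparing values in Ruby rather than in XPath.
top = REXML::XPath.match(doc, '//employee').max_by { |e| e.elements['salary'].text.to_i }
puts "Top earner: #{top.elements['name'].text}"
```

Element text is always a String, so it must be converted with `to_i` or `to_f` before doing arithmetic.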

🛠️ Using Nokogiri to Process XML

Installation and Basic Use

Nokogiri is a third-party XML/HTML processing library built on libxml2. It is generally faster than REXML and also supports CSS selectors and XSLT, but it must be installed first:

gem install nokogiri

require 'nokogiri'

# Parse XML
xml_string = <<~XML
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book id="1">
    <title>Ruby Programming Basics</title>
    <author>John Smith</author>
    <price>59.00</price>
  </book>
</bookstore>
XML

doc = Nokogiri::XML(xml_string)

# Use CSS selectors
doc.css('book').each do |book|
  puts "Book: #{book.at_css('title').text}"
end

# Use XPath
doc.xpath('//book').each do |book|
  puts "Author: #{book.at_xpath('author').text}"
end

# Modify XML
book = doc.at_xpath('//book[@id="1"]')
book.at_xpath('title').content = 'New Title'

puts doc.to_xml

📚 Next Steps

Once you are comfortable with XML processing in Ruby, we recommend continuing with:

Ruby JSON - Learn JSON data processing
Ruby Web Services - Learn web service development
Ruby Database Access - Learn database operations

Continue your Ruby learning journey!
