

# Ruby XML, XSLT, and XPath Tutorial

* * *

## What is XML?

XML stands for eXtensible Markup Language. It is a subset of the Standard Generalized Markup Language (SGML), a markup language used to tag electronic documents to give them structure.

XML can be used to tag data and define data types. It is a meta-language that allows users to define their own markup languages.

It is well suited to transmission over the World Wide Web, providing a unified method for describing and exchanging structured data independent of applications or vendors.

* * *

## XML Parser Structure and APIs

There are two main types of XML parsers: DOM and SAX.

* The SAX parser is event-based. It scans the XML document from beginning to end and, whenever it encounters a syntactic structure such as a start tag or a text node, calls the event handler for that structure, sending an event to the application.
* DOM stands for Document Object Model. A DOM parser builds the hierarchical structure of the document as a tree in memory, with each node represented as an object. Once parsing is complete, the entire DOM tree of the document resides in memory.

* * *

## Parsing and Creating XML in Ruby

Ruby can use the REXML library to parse XML documents. REXML is an XML toolkit written in pure Ruby and compliant with the XML 1.0 specification.

REXML has been included in the Ruby standard library since Ruby 1.8. From Ruby 3.0 onward it ships as a bundled gem, so projects managed with Bundler need to list `rexml` in their Gemfile.

The REXML library is loaded with `require 'rexml/document'`. All methods and classes are encapsulated within the `REXML` module.

The REXML parser has the following advantages over other parsers:

* Written 100% in Ruby.
* Supports both SAX-style and DOM-style parsing.
* Lightweight, with fewer than 2,000 lines of code.
* Easy-to-understand methods and classes.
* Based on the SAX2 API, with full XPath support.
* Comes with the Ruby installation; no separate installation is required.

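
Since this section also covers creating XML, here is a minimal sketch of building a small document with REXML, serializing it, and parsing it back. The helper name `build_collection_xml` and the sample values are illustrative assumptions and not part of the original tutorial.

```ruby
require 'rexml/document'

# Hypothetical helper: builds a tiny document in memory and returns it as a String.
def build_collection_xml(shelf, title, type)
  doc = REXML::Document.new
  root = doc.add_element('collection', 'shelf' => shelf)
  movie = root.add_element('movie', 'title' => title)
  movie.add_element('type').text = type

  output = +''        # unfrozen String that collects the serialized XML
  doc.write(output)   # write appends to any object that responds to <<
  output
end

xml = build_collection_xml('Demo Shelf', 'Sample Movie', 'Comedy')
puts xml

# Parse the generated string back into a DOM tree and read values from it.
parsed = REXML::Document.new(xml)
puts parsed.root.attributes['shelf']
puts parsed.elements['collection/movie/type'].text
```

The same `Document.new` call accepts either a String or an open File, which is why the examples that follow can pass `File.new("movies.xml")` directly.
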
Here is a sample XML document, saved as `movies.xml`:

```xml
<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A scientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>
```

* * *

## DOM Parser

Let's first parse the XML data. We start by requiring the `rexml/document` library. For convenience, we can include REXML in the top-level namespace:

## Example

```ruby
require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Get the root element and print its shelf attribute
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# Print the title attribute of every movie
xmldoc.elements.each("collection/movie") { |e|
  puts "Movie Title : " + e.attributes["title"]
}

# Print the type of every movie
xmldoc.elements.each("collection/movie/type") { |e|
  puts "Movie Type : " + e.text
}

# Print the description of every movie
xmldoc.elements.each("collection/movie/description") { |e|
  puts "Movie Description : " + e.text
}
```

The output of the above example is:

```
Root element : New Arrivals
Movie Title : Enemy Behind
Movie Title : Transformers
Movie Title : Trigun
Movie Title : Ishtar
Movie Type : War, Thriller
Movie Type : Anime, Science Fiction
Movie Type : Anime, Action
Movie Type : Comedy
Movie Description : Talk about a US-Japan war
Movie Description : A scientific fiction
Movie Description : Vash the Stampede!
Movie Description : Viewable boredom
```

* * *

## SAX Parser

This section processes the same data file, `movies.xml`. SAX parsing is not recommended for small files, so the following is only a simple demonstration:

## Example

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
include REXML

class MyListener
  include REXML::StreamListener

  # Called for every start tag with the tag name and its attribute hash
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end

  # Called for every text node
  def text(data)
    # Skip whitespace between tags; \w* also skips single-word values such as "DVD"
    return if data =~ /^\w*$/
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts " text : #{abbrev.inspect}"
  end
end

list = MyListener.new
xmlfile = File.new("movies.xml")
Document.parse_stream(xmlfile, list)
```

The output of the above example is as follows (Ruby 3.4 and later print the attribute hashes with spaces around `=>`):

```
tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
 text : "War, Thriller"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text : "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
 text : "Anime, Science Fiction"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text : "A scientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
 text : "Anime, Action"
tag_start: "format", {}
tag_start: "episodes", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text : "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
tag_start: "format", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text : "Viewable boredom"
```

* * *

## XPath and Ruby

We can use XPath to query XML. XPath, the XML Path Language, is a language for locating specific parts of an XML document (XML being a subset of the Standard Generalized Markup Language). Based on the tree structure of XML, XPath provides the ability to find nodes within that tree.

Ruby supports XPath through REXML's `XPath` class, which operates on the tree produced by DOM-style parsing.

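
To illustrate how an XPath expression narrows a search within the tree, the following sketch selects a child element of one specific movie using an attribute predicate. The inline `SAMPLE_XML` data and the helper name `format_of` are assumptions made for this illustration; they only mirror the shape of `movies.xml`.

```ruby
require 'rexml/document'

# Inline sample data (an assumption for this sketch, shaped like movies.xml).
SAMPLE_XML = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format><stars>10</stars></movie>
    <movie title="Ishtar"><format>VHS</format><stars>2</stars></movie>
  </collection>
XML

# Hypothetical helper: returns the format of the movie with the given title,
# or nil when no movie matches.
def format_of(xml, title)
  doc = REXML::Document.new(xml)
  # [@title='...'] is an XPath predicate that filters movie elements by attribute.
  # Naive string interpolation: titles containing a single quote would need escaping.
  node = REXML::XPath.first(doc, "//movie[@title='#{title}']/format")
  node&.text
end

p format_of(SAMPLE_XML, 'Enemy Behind')
p format_of(SAMPLE_XML, 'Ishtar')
p format_of(SAMPLE_XML, 'No Such Movie')
```

`XPath.first` returns `nil` when nothing matches, so the safe-navigation operator keeps the helper from raising on a missing title.
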
## Example

```ruby
require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Get the first movie element
movie = XPath.first(xmldoc, "//movie")
p movie

# Print the text of every type element
XPath.each(xmldoc, "//type") { |e| puts e.text }

# Collect the text of every format element into an array
names = XPath.match(xmldoc, "//format").map { |x| x.text }
p names
```

The output of the above example is:

```
<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
```

* * *

## XSLT and Ruby

There are two XSLT processors available for Ruby. Here is a brief description of each:

### Ruby-Sablotron

This processor was written and is maintained by Masayoshi Takahashi. It is written primarily for the Linux operating system and requires the following libraries:

* Sablot
* Iconv
* Expat

You can find these libraries at [Ruby Sablotron](http://www.rubycolor.org/sablot "Ruby Sablotron").

### XSLT4R

XSLT4R was written by Michael Neumann. It provides a simple command-line interface and can also be used by third-party applications to transform XML documents. XSLT4R requires XMLScan, which is included in the XSLT4R archive. It is a 100% Ruby module. These modules can be installed using the standard Ruby installation method (i.e., `ruby install.rb`).

The command-line syntax for XSLT4R is:

```
ruby xslt.rb stylesheet.xsl document.xml
```

If you want to use XSLT4R in an application, you can require `xslt` and pass the necessary arguments. Here is an example:

## Example

```ruby
require "xslt"

# File.read returns the whole file as one String
# (Array#to_s no longer joins lines in Ruby 1.9 and later)
stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")

arguments = { 'image_dir' => '/....' }
sheet = XSLT::Stylesheet.new(stylesheet, arguments)

# Output to standard output
sheet.apply(xml_doc)

# Output to the string 'str'
str = ""
sheet.output = [ str ]
sheet.apply(xml_doc)
```

* * *

## More Resources

* For complete details on the REXML parser, see the [REXML Parser documentation](http://www.germane-software.com/software/rexml/ "REXML Parser").
* You can download XSLT4R from [XSLT4R](http://raa.ruby-lang.org/project/xslt4r/ "XSLT4R").