In Haskell how do you extract strings from an XML document?


I've never actually bothered to figure out how to extract bits out of XML documents using HaXml; HXT has met all my needs.

    {-# LANGUAGE Arrows #-}
    import Data.Maybe (listToMaybe)
    -- hxt 9 and later; older releases exposed this module as Text.XML.HXT.Arrow
    import Text.XML.HXT.Core

    type Name = String
    type Value = String
    data LocalizedString = LS Name Value

    -- Parse the string and keep the first result, if any.
    getLocalizedStrings :: String -> Maybe [LocalizedString]
    getLocalizedStrings = listToMaybe . runLA (xread >>> getRoot)

    -- Match an element with the given name anywhere in the tree.
    atTag :: ArrowXml a => String -> a XmlTree XmlTree
    atTag tag = deep $ isElem >>> hasName tag

    getRoot :: ArrowXml a => a XmlTree [LocalizedString]
    getRoot = atTag "root" >>> listA getElem

    getElem :: ArrowXml a => a XmlTree LocalizedString
    getElem = atTag "elem" >>> proc x -> do
        name <- getAttrValue "name" -< x
        value <- getChildren >>> getText -< x
        returnA -< LS name value

You'd probably like a little more error-checking (i.e. don't just lazily use atTag like me; actually verify that <root> is the root, <elem> is a direct child, etc.), but this works just fine on your example.
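A stricter variant might look like the sketch below. It assumes hxt 9 or later (Text.XML.HXT.Core) and an input shaped like `<root><elem name="...">text</elem></root>`, which is only a guess at the question's example. The names getRootStrict, getElemStrict and getLocalizedStringsStrict are made up for illustration.

```haskell
{-# LANGUAGE Arrows #-}
import Data.Maybe (listToMaybe)
import Text.XML.HXT.Core

data LocalizedString = LS String String deriving (Eq, Show)

-- Accept only a top-level <root> element, and only its direct <elem> children
-- (no 'deep' search through the whole tree).
getRootStrict :: ArrowXml a => a XmlTree [LocalizedString]
getRootStrict =
    isElem >>> hasName "root" >>>
    listA (getChildren >>> isElem >>> hasName "elem" >>> getElemStrict)

getElemStrict :: ArrowXml a => a XmlTree LocalizedString
getElemStrict = proc x -> do
    -- getAttrValue0 fails when the attribute is missing,
    -- whereas getAttrValue would silently yield "".
    name  <- getAttrValue0 "name" -< x
    value <- getChildren >>> getText -< x
    returnA -< LS name value

getLocalizedStringsStrict :: String -> Maybe [LocalizedString]
getLocalizedStringsStrict = listToMaybe . runLA (xread >>> getRootStrict)

main :: IO ()
main = do
    -- Assumed sample input; the top-level element is <root>.
    print (getLocalizedStringsStrict "<root><elem name=\"hello\">Hello</elem></root>")
    -- Top-level element is not <root>, so nothing is returned.
    print (getLocalizedStringsStrict "<other><elem name=\"hello\">Hello</elem></other>")
```

With this version an <elem> nested deeper in the tree, or one without a name attribute, is skipped rather than picked up by accident.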


Now, if you need an introduction to Arrows, unfortunately I don't know of any good ones. I myself learned it the "thrown into the ocean to learn how to swim" way.

Something that may be helpful to keep in mind is that the proc/-< syntax is simply sugar for the basic arrow operations (arr, >>>, etc.), just like do/<- is simply sugar for the basic monad operations (return, >>=, etc.). The following are equivalent:

    getAttrValue "name" &&& (getChildren >>> getText) >>^ uncurry LS

    proc x -> do
        name <- getAttrValue "name" -< x
        value <- getChildren >>> getText -< x
        returnA -< LS name value
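To see the same equivalence without HXT, here is a small sketch using ordinary functions, which are also an instance of Arrow. It needs only base. The Elem pair and the names sugared and desugared are hypothetical stand-ins for an XML element and the two forms above.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow (returnA, (&&&), (>>>), (>>^))
import Data.Char (toUpper)

data LocalizedString = LS String String deriving (Eq, Show)

-- Hypothetical stand-in for an XML element: (name attribute, text content).
type Elem = (String, String)

-- proc / -< notation, specialised to the function arrow (->).
sugared :: Elem -> LocalizedString
sugared = proc x -> do
    name  <- fst -< x
    value <- snd >>> map toUpper -< x
    returnA -< LS name value

-- The same computation written with the basic arrow combinators.
desugared :: Elem -> LocalizedString
desugared = fst &&& (snd >>> map toUpper) >>^ uncurry LS

main :: IO ()
main = do
    let e = ("greeting", "hello")
    print (sugared e)                 -- LS "greeting" "HELLO"
    print (sugared e == desugared e)  -- True
```

Here &&& feeds the same input to both arrows and pairs the results, which is what binding name and value from the same x does in the proc block.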


Use one of the XML packages.

The most popular are, in order,

  1. HaXml
  2. hxt
  3. xml-light
  4. hexpat
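As a rough illustration with one of these packages, here is a sketch using xml-light. It assumes the document looks like `<root><elem name="...">text</elem></root>` (a guess at the question's example); the LocalizedString type and getLocalizedStrings are illustrative names, not part of the library.

```haskell
import Control.Monad (guard)
import Text.XML.Light

data LocalizedString = LS String String deriving (Eq, Show)

getLocalizedStrings :: String -> Maybe [LocalizedString]
getLocalizedStrings src = do
    root <- parseXMLDoc src                    -- Nothing if no element is found
    guard (qName (elName root) == "root")      -- the top-level element must be <root>
    mapM toLS (findChildren (unqual "elem") root)
  where
    -- Nothing if the name attribute is missing, which fails the whole parse.
    toLS e = do
        name <- findAttr (unqual "name") e
        return (LS name (strContent e))

main :: IO ()
main = print (getLocalizedStrings "<root><elem name=\"hello\">Hello</elem></root>")
```

findChildren only looks at direct children, so no separate check for nesting depth is needed here.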


FWIW, HXT seems like overkill where a simple TagSoup will do :)
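For comparison, a TagSoup version could be as short as the sketch below. It assumes flat `<elem name="...">text</elem>` entries (a guess at the question's example) and does no validation at all; getLocalizedStrings and LocalizedString are illustrative names.

```haskell
import Text.HTML.TagSoup

data LocalizedString = LS String String deriving (Eq, Show)

getLocalizedStrings :: String -> [LocalizedString]
getLocalizedStrings src =
    [ LS (fromAttrib "name" open) (innerText body)
    -- Each partition starts at an <elem> open tag and runs to the next one.
    | open : rest <- partitions (isTagOpenName "elem") (parseTags src)
    -- Keep only the tags before the matching </elem>.
    , let body = takeWhile (not . isTagCloseName "elem") rest
    ]

main :: IO ()
main = print (getLocalizedStrings "<root><elem name=\"hello\">Hello</elem></root>")
```

TagSoup never rejects malformed input, so this trades the error-checking of a real XML parser for brevity.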