In Haskell how do you extract strings from an XML document?
I've never actually bothered to figure out how to extract bits out of XML documents using HaXML; HXT has met all my needs.
    {-# LANGUAGE Arrows #-}
    import Data.Maybe
    import Text.XML.HXT.Arrow  -- in HXT 9 and later, import Text.XML.HXT.Core instead

    type Name = String
    type Value = String
    data LocalizedString = LS Name Value

    -- Equivalent to: listToMaybe . runLA (xread >>> getRoot)
    getLocalizedStrings :: String -> Maybe [LocalizedString]
    getLocalizedStrings = (.) listToMaybe . runLA $ xread >>> getRoot

    -- Match any element with the given name, at any depth.
    atTag :: ArrowXml a => String -> a XmlTree XmlTree
    atTag tag = deep $ isElem >>> hasName tag

    getRoot :: ArrowXml a => a XmlTree [LocalizedString]
    getRoot = atTag "root" >>> listA getElem

    getElem :: ArrowXml a => a XmlTree LocalizedString
    getElem = atTag "elem" >>> proc x -> do
        name <- getAttrValue "name" -< x
        value <- getChildren >>> getText -< x
        returnA -< LS name value
You'd probably want a little more error checking (e.g. don't just lazily use `atTag` like me; actually verify that `<root>` is the root, that `<elem>` is a direct descendant, etc.), but this works just fine on your example.
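As a rough sketch of what that stricter checking could look like, the variant below only accepts `<root>` as a top-level node of the parsed input and `<elem>` as a direct child of it, instead of searching at any depth with `deep`. The names `elemNamed`, `getElemStrict`, `getLocalizedStringsStrict`, and the `sample` input are made up for illustration, and the import assumes HXT 9 or later (older releases export the same arrows from `Text.XML.HXT.Arrow`).

```haskell
{-# LANGUAGE Arrows #-}
module Main where

import Data.Maybe (listToMaybe)
import Text.XML.HXT.Core  -- assumption: HXT 9+; older releases use Text.XML.HXT.Arrow

data LocalizedString = LS String String
  deriving (Show, Eq)

-- Match an element with the given name at the current node only (no descent).
elemNamed :: ArrowXml a => String -> a XmlTree XmlTree
elemNamed tag = isElem >>> hasName tag

-- Accept <elem> only as a direct child of the current node.
getElemStrict :: ArrowXml a => a XmlTree LocalizedString
getElemStrict = getChildren >>> elemNamed "elem" >>> proc x -> do
    name  <- getAttrValue "name" -< x
    value <- getChildren >>> getText -< x
    returnA -< LS name value

-- Accept <root> only as a top-level node of the parsed input.
-- Yields Nothing when no top-level <root> element is found.
getLocalizedStringsStrict :: String -> Maybe [LocalizedString]
getLocalizedStringsStrict =
    listToMaybe . runLA (xread >>> elemNamed "root" >>> listA getElemStrict)

-- Hypothetical sample input; not the document from the question.
sample :: String
sample = "<root><elem name=\"greeting\">Hello</elem><elem name=\"farewell\">Bye</elem></root>"

main :: IO ()
main = print (getLocalizedStringsStrict sample)
```

Note that `getAttrValue` returns an empty string when the attribute is missing, so an `<elem>` without a `name` attribute still produces a result here; whether that should count as an error is a design choice this sketch leaves open.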
Now, if you need an introduction to Arrows, unfortunately I don't know of a good one. I learned them the "thrown into the ocean to learn how to swim" way.
Something that may be helpful to keep in mind is that the `proc`/`-<` syntax is simply sugar for the basic arrow operations (`arr`, `>>>`, etc.), just as `do`/`<-` is simply sugar for the basic monad operations (`return`, `>>=`, etc.). The following are equivalent:
    getAttrValue "name" &&& (getChildren >>> getText) >>^ uncurry LS

    proc x -> do
        name <- getAttrValue "name" -< x
        value <- getChildren >>> getText -< x
        returnA -< LS name value
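To try this equivalence without installing HXT, the same shape can be reproduced with ordinary functions, since `(->)` is itself an `Arrow` instance. This is only an illustrative sketch: `Pair`, `viaCombinators`, and `viaProc` are hypothetical names, with `map toUpper` and `words >>> length` standing in for `getAttrValue "name"` and `getChildren >>> getText`.

```haskell
{-# LANGUAGE Arrows #-}
module Main where

import Control.Arrow (returnA, (&&&), (>>>), (>>^))
import Data.Char (toUpper)

-- Hypothetical stand-in for the LS constructor.
data Pair = Pair String Int
  deriving (Show, Eq)

-- Combinator form: feed the input to both arrows, then combine the two results.
viaCombinators :: String -> Pair
viaCombinators = map toUpper &&& (words >>> length) >>^ uncurry Pair

-- The same computation written with proc notation.
viaProc :: String -> Pair
viaProc = proc x -> do
    name  <- map toUpper -< x
    count <- words >>> length -< x
    returnA -< Pair name count

main :: IO ()
main = do
    let input = "hello arrow world"
    print (viaCombinators input)
    print (viaProc input)
    print (viaCombinators input == viaProc input)
```

Because `&&&` binds more tightly than `>>^`, the combinator form groups as `(f &&& g) >>^ uncurry Pair`, which is why no extra parentheses are needed around the fan-out.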