Parsing Mac XML PList into something readable


my first thought for this is just to use XSLT (XSL Transformations). i don't know exactly what format you are looking for based on your answer in the comments above, but i think i got the gist at least. unless there's something special you need that i didn't think of, i believe XSLT is powerful enough to do everything you need, with no need for a bunch of complicated looping constructs.

if you're not familiar, there's a lot of good information on XSLT on w3schools (probably start at the intro: http://www.w3schools.com/xsl/xsl_intro.asp) and also wikipedia has a decent writeup on it (http://en.wikipedia.org/wiki/XSLT).

it always takes me a while to get the rules working the way i want; it's a different way of thinking about this kind of transform and took me some getting used to. a decent understanding of XPath is necessary as well. i am constantly having to refer to both the XSLT spec (http://www.w3.org/TR/xslt) and the XPath spec (http://www.w3.org/TR/xpath/) since i've only had a small amount of experience with them. it probably goes more smoothly once you have worked with it a while.
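to get a feel for the kind of XPath the stylesheet further down leans on, here is a minimal sketch that evaluates a sibling-axis expression from C#. the plist fragment, the class name and the method name are all made up for illustration; they are not from the original question.

```csharp
using System;
using System.Xml;

static class XPathSketch
{
    // Hypothetical plist fragment, not the asker's data.
    public const string Sample =
        "<dict><key>Name</key><string>Some Song</string>" +
        "<key>Year</key><integer>2010</integer></dict>";

    // Returns "elementName=text" for the element that follows the given <key>,
    // or null when no such key exists.
    // Note: the key is spliced into the XPath string, so a key containing
    // an apostrophe would need escaping; this sketch ignores that case.
    public static string ValueAfterKey(string xml, string key)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // following-sibling::*[1] selects the first element after the <key>.
        XmlNode value = doc.SelectSingleNode(
            "/dict/key[. = '" + key + "']/following-sibling::*[1]");

        return value == null ? null : value.LocalName + "=" + value.InnerText;
    }

    static void Main()
    {
        Console.WriteLine(ValueAfterKey(Sample, "Name"));
        Console.WriteLine(ValueAfterKey(Sample, "Year"));
    }
}
```

in a plist the value belonging to a key is simply the next sibling element, which is why the sibling axis matters so much for this kind of transform.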

anyway, i have an app i wrote previously for playing around with these translations. it's a C# application with three textboxes: one for the XSLT, one for the source, and one for the output. i spent a few (okay, many) hours trying to get a first cut of an XSLT that would process your sample data, to get an idea of how hard it would be and what the structure of the transform would be. i think i finally figured out pretty much what was needed, but since i don't know exactly what format you need, i stopped there.

here's a link to the sample transformed output: http://pastebin.com/SMFxUdDK.

following is all the code to actually do the transform, included in a form that you can use to develop as you go. it's not fancy but it has worked well for me. the "heavy lifting" is all done in the "btnTransform_Click()" handler, plus i have implemented an XmlStringWriter to make it easy to output things the way i want. the main bit of the work here is just in coming up with the XSLT directives; the actual transform is fairly well handled for you by the .NET XslCompiledTransform class. however, i figured i had spent enough time working out all the little details when i wrote it that it was worth giving a working example...
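if you just want to see the XslCompiledTransform calls without the form around them, a stripped-down console sketch could look like the following. the stylesheet and input here are hypothetical cut-down stand-ins (not the full XSLT from this answer), and it writes through a plain XmlWriter over a StringBuilder rather than the XmlStringWriter helper. the input reader sets DtdProcessing to Ignore because XmlReader.Create rejects a DOCTYPE by default and plist files normally carry one.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

static class MinimalTransform
{
    // Hypothetical cut-down stylesheet: one element per <key>,
    // filled with the text of the element that follows it.
    public const string Xslt = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='/'>
    <root><xsl:apply-templates/></root>
  </xsl:template>
  <xsl:template match='key'>
    <xsl:element name=""{translate(., ' ', '-')}"">
      <xsl:value-of select='following-sibling::*[1]'/>
    </xsl:element>
  </xsl:template>
  <xsl:template match='text()'/>
</xsl:stylesheet>";

    // Hypothetical input, including the DOCTYPE that plist files usually have.
    public const string Input =
        "<!DOCTYPE plist PUBLIC '-//Apple//DTD PLIST 1.0//EN' " +
        "'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>" +
        "<plist version='1.0'><dict>" +
        "<key>Track ID</key><integer>7</integer>" +
        "<key>Name</key><string>Some Song</string>" +
        "</dict></plist>";

    public static string Run(string xslt, string input)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(xslt)))
            transform.Load(xsltReader);

        // Skip the DOCTYPE instead of throwing on it or fetching the DTD.
        var readerSettings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        var writerSettings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };

        var output = new StringBuilder();
        using (var inputReader = XmlReader.Create(new StringReader(input), readerSettings))
        using (var writer = XmlWriter.Create(output, writerSettings))
            transform.Transform(inputReader, writer);

        // The writer has been disposed (and so flushed) by this point.
        return output.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Run(Xslt, Input));
    }
}
```

with this input the result should be a root element containing a Track-ID element with 7 and a Name element with Some Song. the pattern is the same as in the form: load the stylesheet, then hand Transform an input and a writer.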

be aware i changed a couple of occurrences of a namespace here on-the-fly, and also added some light comments to the XSLT, so if there are issues let me know and i will correct them.

so, without further ado: ;)

the XSLT file:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="msxsl"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">

  <!-- this just says to output XML as opposed to HTML or raw text -->
  <xsl:output method="xml" indent="yes" xsi:type="xsl:output" />

  <!-- this matches the root element and then creates a root element -->
  <!-- with more templates applied as children -->
  <xsl:template match="/" priority="9" >
    <xsl:element name="root" xmlns="http://www.tempuri.org/plist">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- wasn't sure how you would want the dict and arrays handled -->
  <!-- for a final cut, so i just make them into parent nodes of -->
  <!-- the data underneath them, and then apply the templates -->
  <xsl:template match="dict" priority="3" >
    <xsl:element name="dictionary" xmlns="http://www.tempuri.org/plist">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="array" priority="5" >
    <xsl:element name="list" xmlns="http://www.tempuri.org/plist">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- actually, figuring the following step out is what hung me up; the -->
  <!-- issue here is that i'm taking the text out of the string/integer/date -->
  <!-- nodes and putting them into elements named after the 'key' nodes -->
  <!-- because of this, you actually have to have the template match the -->
  <!-- nodes you will be consuming and then just using the conditional -->
  <!-- to only process the 'key' nodes.  also, there were a couple of -->
  <!-- stray characters in the source XML; i think it was an encoding -->
  <!-- issue, so i just stripped them out with the "translate" call when -->
  <!-- creating the keyName variable. since those were the only two -->
  <!-- and because they looked to be strays, i did not worry about it -->
  <!-- further.  the only reason it is an issue is because i was -->
  <!-- creating elements out of the contents of the keys, and key names -->
  <!-- are restricted in what characters they can use. -->
  <xsl:template match="key|string|integer|date" priority="1" >
    <xsl:if test="local-name(self::node())='key'">
      <xsl:variable name="keyName" select="translate(child::text(),' €™','---')" />
      <xsl:element name="{$keyName}" xmlns="http://www.tempuri.org/plist" >
        <!-- removed on-the-fly; i had put this in while testing
          <xsl:if test="local-name(following-sibling::node())='string'">
        -->
          <xsl:value-of select="following-sibling::node()" />
        <!--
          </xsl:if>
        -->
      </xsl:element>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
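the comment in the stylesheet about key names being restricted can be checked from C# with XmlConvert. this is only a sketch of the restriction, not part of the original app; the class and method names are made up, and the sample key "Track ID" is an assumption about what plist keys tend to look like.

```csharp
using System;
using System.Xml;

static class ElementNameSketch
{
    // True when the string is usable as an element name without a prefix.
    public static bool IsValidName(string name)
    {
        try
        {
            XmlConvert.VerifyNCName(name);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Mirrors what the translate() call does for spaces: swap each one for '-'.
    public static string Sanitize(string key)
    {
        return key.Replace(' ', '-');
    }

    static void Main()
    {
        string key = "Track ID"; // hypothetical plist key
        Console.WriteLine(IsValidName(key));            // a space is not allowed in a name
        Console.WriteLine(IsValidName(Sanitize(key)));  // "Track-ID" is fine
        // A built-in alternative that escapes instead of replacing:
        Console.WriteLine(XmlConvert.EncodeLocalName(key));
    }
}
```

XmlConvert.EncodeLocalName turns the space into _x0020_, which is reversible with XmlConvert.DecodeName but less readable than the dash substitution used in the stylesheet.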

a little helper class i made (XmlStringWriter.cs) :

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;

namespace XSLTTest.Xml
{
    public class XmlStringWriter :
        XmlWriter
    {
        public static XmlStringWriter Create(XmlWriterSettings Settings)
        {
            return new XmlStringWriter(Settings);
        }

        public static XmlStringWriter Create()
        {
            return XmlStringWriter.Create(XmlStringWriter.XmlWriterSettings_display);
        }

        public static XmlWriterSettings XmlWriterSettings_display
        {
            get
            {
                XmlWriterSettings XWS = new XmlWriterSettings();
                XWS.OmitXmlDeclaration = false; // make a choice?
                XWS.NewLineHandling = NewLineHandling.Replace;
                XWS.NewLineOnAttributes = false;
                XWS.Indent = true;
                XWS.IndentChars = "\t";
                XWS.NewLineChars = Environment.NewLine;
                //XWS.ConformanceLevel = ConformanceLevel.Fragment;
                XWS.CloseOutput = false;
                return XWS;
            }
        }

        public override string ToString()
        {
            return myXMLStringBuilder.ToString();
        }

        //public static implicit operator XmlWriter(XmlStringWriter Me)
        //{
        //   return Me.myXMLWriter;
        //}

        //--------------

        protected StringBuilder myXMLStringBuilder = null;
        protected XmlWriter myXMLWriter = null;

        protected XmlStringWriter(XmlWriterSettings Settings)
        {
            myXMLStringBuilder = new StringBuilder();
            myXMLWriter = XmlWriter.Create(myXMLStringBuilder, Settings);
        }

        public override void Close()
        {
            myXMLWriter.Close();
        }

        public override void Flush()
        {
            myXMLWriter.Flush();
        }

        public override string LookupPrefix(string ns)
        {
            return myXMLWriter.LookupPrefix(ns);
        }

        public override void WriteBase64(byte[] buffer, int index, int count)
        {
            myXMLWriter.WriteBase64(buffer, index, count);
        }

        public override void WriteCData(string text)
        {
            myXMLWriter.WriteCData(text);
        }

        public override void WriteCharEntity(char ch)
        {
            myXMLWriter.WriteCharEntity(ch);
        }

        public override void WriteChars(char[] buffer, int index, int count)
        {
            myXMLWriter.WriteChars(buffer, index, count);
        }

        public override void WriteComment(string text)
        {
            myXMLWriter.WriteComment(text);
        }

        public override void WriteDocType(string name, string pubid, string sysid, string subset)
        {
            myXMLWriter.WriteDocType(name, pubid, sysid, subset);
        }

        public override void WriteEndAttribute()
        {
            myXMLWriter.WriteEndAttribute();
        }

        public override void WriteEndDocument()
        {
            myXMLWriter.WriteEndDocument();
        }

        public override void WriteEndElement()
        {
            myXMLWriter.WriteEndElement();
        }

        public override void WriteEntityRef(string name)
        {
            myXMLWriter.WriteEntityRef(name);
        }

        public override void WriteFullEndElement()
        {
            myXMLWriter.WriteFullEndElement();
        }

        public override void WriteProcessingInstruction(string name, string text)
        {
            myXMLWriter.WriteProcessingInstruction(name, text);
        }

        public override void WriteRaw(string data)
        {
            myXMLWriter.WriteRaw(data);
        }

        public override void WriteRaw(char[] buffer, int index, int count)
        {
            myXMLWriter.WriteRaw(buffer, index, count);
        }

        public override void WriteStartAttribute(string prefix, string localName, string ns)
        {
            myXMLWriter.WriteStartAttribute(prefix, localName, ns);
        }

        public override void WriteStartDocument(bool standalone)
        {
            myXMLWriter.WriteStartDocument(standalone);
        }

        public override void WriteStartDocument()
        {
            myXMLWriter.WriteStartDocument();
        }

        public override void WriteStartElement(string prefix, string localName, string ns)
        {
            myXMLWriter.WriteStartElement(prefix, localName, ns);
        }

        public override WriteState WriteState
        {
            get
            {
                return myXMLWriter.WriteState;
            }
        }

        public override void WriteString(string text)
        {
            myXMLWriter.WriteString(text);
        }

        public override void WriteSurrogateCharEntity(char lowChar, char highChar)
        {
            myXMLWriter.WriteSurrogateCharEntity(lowChar, highChar);
        }

        public override void WriteWhitespace(string ws)
        {
            myXMLWriter.WriteWhitespace(ws);
        }
    }
}

the windows forms designer class (frmXSLTTest.Designer.cs):

namespace XSLTTest
{
    partial class frmXSLTTest
    {
        /// <summary>
        /// Required designer variable.
        /// </summary>
        private System.ComponentModel.IContainer components = null;

        /// <summary>
        /// Clean up any resources being used.
        /// </summary>
        /// <param name="disposing">true if managed resources should be disposed; otherwise, false.</param>
        protected override void Dispose(bool disposing)
        {
            if (disposing && (components != null))
            {
                components.Dispose();
            }
            base.Dispose(disposing);
        }

        #region Windows Form Designer generated code

        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InitializeComponent()
        {
            this.splitContainer1 = new System.Windows.Forms.SplitContainer();
            this.btnTransform = new System.Windows.Forms.Button();
            this.groupBox1 = new System.Windows.Forms.GroupBox();
            this.txtStylesheet = new System.Windows.Forms.TextBox();
            this.splitContainer2 = new System.Windows.Forms.SplitContainer();
            this.groupBox2 = new System.Windows.Forms.GroupBox();
            this.txtInputXML = new System.Windows.Forms.TextBox();
            this.groupBox3 = new System.Windows.Forms.GroupBox();
            this.txtOutputXML = new System.Windows.Forms.TextBox();
            ((System.ComponentModel.ISupportInitialize)(this.splitContainer1)).BeginInit();
            this.splitContainer1.Panel1.SuspendLayout();
            this.splitContainer1.Panel2.SuspendLayout();
            this.splitContainer1.SuspendLayout();
            this.groupBox1.SuspendLayout();
            ((System.ComponentModel.ISupportInitialize)(this.splitContainer2)).BeginInit();
            this.splitContainer2.Panel1.SuspendLayout();
            this.splitContainer2.Panel2.SuspendLayout();
            this.splitContainer2.SuspendLayout();
            this.groupBox2.SuspendLayout();
            this.groupBox3.SuspendLayout();
            this.SuspendLayout();
            // 
            // splitContainer1
            // 
            this.splitContainer1.Dock = System.Windows.Forms.DockStyle.Fill;
            this.splitContainer1.Location = new System.Drawing.Point(0, 0);
            this.splitContainer1.Name = "splitContainer1";
            this.splitContainer1.Orientation = System.Windows.Forms.Orientation.Horizontal;
            // 
            // splitContainer1.Panel1
            // 
            this.splitContainer1.Panel1.Controls.Add(this.btnTransform);
            this.splitContainer1.Panel1.Controls.Add(this.groupBox1);
            // 
            // splitContainer1.Panel2
            // 
            this.splitContainer1.Panel2.Controls.Add(this.splitContainer2);
            this.splitContainer1.Size = new System.Drawing.Size(788, 363);
            this.splitContainer1.SplitterDistance = 194;
            this.splitContainer1.TabIndex = 0;
            // 
            // btnTransform
            // 
            this.btnTransform.Anchor = ((System.Windows.Forms.AnchorStyles)((System.Windows.Forms.AnchorStyles.Bottom | System.Windows.Forms.AnchorStyles.Left)));
            this.btnTransform.Location = new System.Drawing.Point(6, 167);
            this.btnTransform.Name = "btnTransform";
            this.btnTransform.Size = new System.Drawing.Size(75, 23);
            this.btnTransform.TabIndex = 1;
            this.btnTransform.Text = "Transform";
            this.btnTransform.UseVisualStyleBackColor = true;
            this.btnTransform.Click += new System.EventHandler(this.btnTransform_Click);
            // 
            // groupBox1
            // 
            this.groupBox1.Anchor = ((System.Windows.Forms.AnchorStyles)((((System.Windows.Forms.AnchorStyles.Top | System.Windows.Forms.AnchorStyles.Bottom) 
            | System.Windows.Forms.AnchorStyles.Left) 
            | System.Windows.Forms.AnchorStyles.Right)));
            this.groupBox1.Controls.Add(this.txtStylesheet);
            this.groupBox1.Location = new System.Drawing.Point(3, 3);
            this.groupBox1.Name = "groupBox1";
            this.groupBox1.Size = new System.Drawing.Size(782, 161);
            this.groupBox1.TabIndex = 0;
            this.groupBox1.TabStop = false;
            this.groupBox1.Text = "Stylesheet";
            // 
            // txtStylesheet
            // 
            this.txtStylesheet.Dock = System.Windows.Forms.DockStyle.Fill;
            this.txtStylesheet.Font = new System.Drawing.Font("Lucida Console", 7F, System.Drawing.FontStyle.Regular, System.Drawing.GraphicsUnit.Point, ((byte)(0)));
            this.txtStylesheet.Location = new System.Drawing.Point(3, 16);
            this.txtStylesheet.MaxLength = 1000000;
            this.txtStylesheet.Multiline = true;
            this.txtStylesheet.Name = "txtStylesheet";
            this.txtStylesheet.ScrollBars = System.Windows.Forms.ScrollBars.Both;
            this.txtStylesheet.Size = new System.Drawing.Size(776, 142);
            this.txtStylesheet.TabIndex = 0;
            // 
            // splitContainer2
            // 
            this.splitContainer2.Dock = System.Windows.Forms.DockStyle.Fill;
            this.splitContainer2.Location = new System.Drawing.Point(0, 0);
            this.splitContainer2.Name = "splitContainer2";
            // 
            // splitContainer2.Panel1
            // 
            this.splitContainer2.Panel1.Controls.Add(this.groupBox2);
            // 
            // splitContainer2.Panel2
            // 
            this.splitContainer2.Panel2.Controls.Add(this.groupBox3);
            this.splitContainer2.Size = new System.Drawing.Size(788, 165);
            this.splitContainer2.SplitterDistance = 395;
            this.splitContainer2.TabIndex = 0;
            // 
            // groupBox2
            // 
            this.groupBox2.Controls.Add(this.txtInputXML);
            this.groupBox2.Dock = System.Windows.Forms.DockStyle.Fill;
            this.groupBox2.Location = new System.Drawing.Point(0, 0);
            this.groupBox2.Name = "groupBox2";
            this.groupBox2.Size = new System.Drawing.Size(395, 165);
            this.groupBox2.TabIndex = 1;
            this.groupBox2.TabStop = false;
            this.groupBox2.Text = "Input XML";
            // 
            // txtInputXML
            // 
            this.txtInputXML.Dock = System.Windows.Forms.DockStyle.Fill;
            this.txtInputXML.Font = new System.Drawing.Font("Lucida Console", 7F, System.Drawing.FontStyle.Regular, System.Drawing.GraphicsUnit.Point, ((byte)(0)));
            this.txtInputXML.Location = new System.Drawing.Point(3, 16);
            this.txtInputXML.MaxLength = 1000000;
            this.txtInputXML.Multiline = true;
            this.txtInputXML.Name = "txtInputXML";
            this.txtInputXML.ScrollBars = System.Windows.Forms.ScrollBars.Both;
            this.txtInputXML.Size = new System.Drawing.Size(389, 146);
            this.txtInputXML.TabIndex = 1;
            // 
            // groupBox3
            // 
            this.groupBox3.Controls.Add(this.txtOutputXML);
            this.groupBox3.Dock = System.Windows.Forms.DockStyle.Fill;
            this.groupBox3.Location = new System.Drawing.Point(0, 0);
            this.groupBox3.Name = "groupBox3";
            this.groupBox3.Size = new System.Drawing.Size(389, 165);
            this.groupBox3.TabIndex = 1;
            this.groupBox3.TabStop = false;
            this.groupBox3.Text = "Output XML";
            // 
            // txtOutputXML
            // 
            this.txtOutputXML.Dock = System.Windows.Forms.DockStyle.Fill;
            this.txtOutputXML.Font = new System.Drawing.Font("Lucida Console", 7F, System.Drawing.FontStyle.Regular, System.Drawing.GraphicsUnit.Point, ((byte)(0)));
            this.txtOutputXML.Location = new System.Drawing.Point(3, 16);
            this.txtOutputXML.MaxLength = 1000000;
            this.txtOutputXML.Multiline = true;
            this.txtOutputXML.Name = "txtOutputXML";
            this.txtOutputXML.ScrollBars = System.Windows.Forms.ScrollBars.Both;
            this.txtOutputXML.Size = new System.Drawing.Size(383, 146);
            this.txtOutputXML.TabIndex = 1;
            // 
            // frmXSLTTest
            // 
            this.AutoScaleDimensions = new System.Drawing.SizeF(6F, 13F);
            this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;
            this.ClientSize = new System.Drawing.Size(788, 363);
            this.Controls.Add(this.splitContainer1);
            this.Name = "frmXSLTTest";
            this.Text = "frmXSLTTest";
            this.splitContainer1.Panel1.ResumeLayout(false);
            this.splitContainer1.Panel2.ResumeLayout(false);
            ((System.ComponentModel.ISupportInitialize)(this.splitContainer1)).EndInit();
            this.splitContainer1.ResumeLayout(false);
            this.groupBox1.ResumeLayout(false);
            this.groupBox1.PerformLayout();
            this.splitContainer2.Panel1.ResumeLayout(false);
            this.splitContainer2.Panel2.ResumeLayout(false);
            ((System.ComponentModel.ISupportInitialize)(this.splitContainer2)).EndInit();
            this.splitContainer2.ResumeLayout(false);
            this.groupBox2.ResumeLayout(false);
            this.groupBox2.PerformLayout();
            this.groupBox3.ResumeLayout(false);
            this.groupBox3.PerformLayout();
            this.ResumeLayout(false);
        }

        #endregion

        private System.Windows.Forms.SplitContainer splitContainer1;
        private System.Windows.Forms.Button btnTransform;
        private System.Windows.Forms.GroupBox groupBox1;
        private System.Windows.Forms.TextBox txtStylesheet;
        private System.Windows.Forms.SplitContainer splitContainer2;
        private System.Windows.Forms.GroupBox groupBox2;
        private System.Windows.Forms.GroupBox groupBox3;
        private System.Windows.Forms.TextBox txtInputXML;
        private System.Windows.Forms.TextBox txtOutputXML;
    }
}

the form class (frmXSLTTest.cs):

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Xml;
using System.Xml.Xsl;
using XSLTTest.Xml;

namespace XSLTTest
{
    public partial class frmXSLTTest : Form
    {
        public frmXSLTTest()
        {
            InitializeComponent();
        }

        private void btnTransform_Click(object sender, EventArgs e)
        {
            try
            {
                // temporary to copy from clipboard when pressing 
                // the button instead of using the text in the textbox
                //txtStylesheet.Text = Clipboard.GetText();

                XmlDocument Stylesheet = new XmlDocument();
                Stylesheet.InnerXml = txtStylesheet.Text;

                XslCompiledTransform XCT = new XslCompiledTransform(true);
                XCT.Load(Stylesheet);

                XmlDocument InputDocument = new XmlDocument();
                InputDocument.InnerXml = txtInputXML.Text;

                XmlStringWriter OutputWriter = XmlStringWriter.Create();
                XCT.Transform(InputDocument, OutputWriter);

                txtOutputXML.Text = OutputWriter.ToString();
            }
            catch (Exception Ex)
            {
                txtOutputXML.Text = Ex.Message;
            }
        }
    }
}