XML Data Type Methods

Chebykin D.N.

The XML data type was first introduced with SQL Server 2005.

This data type allows you to store XML data up to 2 GB.

There are five XML data type methods:

  • query() – Used to extract XML fragments from an XML data type.
  • value() – Used to extract a single value from an XML document.
  • exist() – Used to determine if a specified node exists. Returns 1 if yes and 0 if no.
  • modify() – Updates XML data in an XML data type.
  • nodes() – Shreds XML data into multiple rows by nodes.

XML data type methods require an XPath expression or XQuery query.

As an example we’ll use database with XML data type fields.

Table tArtist contains information about music bands, singers and their albums.

CREATE TABLE dbo.tArtist (
      artistId INT NOT NULL PRIMARY KEY
    , name VARCHAR(100) NOT NULL
    , xmlData XML NOT NULL
);

Let’s fill tables with sample data.

INSERT INTO dbo.tArtist (artistId, name, xmlData) VALUES
(1, 'Radiohead',  
'<albums>  
    <album title="The King of Limbs">  
        <labels>  
            <label>Self-released</label>  
        </labels>  
        <song title="Bloom" length="5:15"/>  
        <song title="Morning Mr Magpie" length="4:41"/>  
        <song title="Little by Little" length="4:27"/>  
        <song title="Feral" length="3:13"/>  
        <song title="Lotus Flower" length="5:01"/>  
        <song title="Codex" length="4:47"/>  
        <song title="Give Up the Ghost" length="4:50"/>  
        <song title="Separator" length="5:20"/>  
        <description link="http://en.wikipedia.org/wiki/The_King_of_Limbs">  
        The King of Limbs is the eighth studio album by English rock band Radiohead,  
        produced by Nigel Godrich. It was self-released on 18 February 2011 as a  
        download in MP3 and WAV formats, followed by physical CD and 12" vinyl  
        releases on 28 March, a wider digital release via AWAL, and a special  
        "newspaper" edition on 9 May 2011. The physical editions were released  
        through the band''s Ticker Tape imprint on XL in the United Kingdom,  
        TBD in the United States, and Hostess Entertainment in Japan.  
        </description>  
    </album>  
    <album title="OK Computer">  
        <labels>  
            <label>Parlophone</label>  
            <label>Capitol</label>  
        </labels>  
        <song title="Airbag" length="4:44"/>  
        <song title="Paranoid Android" length="6:23"/>  
        <song title="Subterranean Homesick Alien" length="4:27"/>  
        <song title="Exit Music (For a Film)" length="4:24"/>  
        <song title="Let Down" length="4:59"/>  
        <song title="Karma Police" length="4:21"/>  
        <song title="Fitter Happier" length="1:57"/>  
        <song title="Electioneering" length="3:50"/>  
        <song title="Climbing Up the Walls" length="4:45"/>  
        <song title="No Surprises" length="3:48"/>  
        <song title="Lucky" length="4:19"/>  
        <song title="The Tourist" length="5:24"/>  
        <description link="http://en.wikipedia.org/wiki/OK_Computer">  
        OK Computer is the third studio album by the English alternative rock band  
        Radiohead, released on 16 June 1997 on Parlophone in the United Kingdom and  
        1 July 1997 by Capitol Records in the United States. It marks a deliberate  
        attempt by the band to move away from the introspective guitar-oriented  
        sound of their previous album The Bends. Its layered sound and wide range  
        of influences set it apart from many of the Britpop and alternative rock  
        bands popular at the time and laid the groundwork for Radiohead''s later,  
        more experimental work.  
        </description>  
    </album>  
</albums>'),  
(2, 'Guns N'' Roses',  
'<albums>  
    <album title="Use Your Illusion I">  
        <labels>  
            <label>Geffen Records</label>  
        </labels>  
        <song title="Right Next Door to Hell" length="3:02"/>  
        <song title="Dust N'' Bones" length="4:58"/>  
        <song title="Live and Let Die (Paul McCartney and Wings cover)"  
        length="3:04"/>  
        <song title="Don''t Cry (original version)" length="4:44"/>  
        <song title="Perfect Crime" length="2:24"/>  
        <song title="You Ain''t the First" length="2:36"/>  
        <song title="Bad Obsession" length="5:28"/>  
        <song title="Back Off Bitch" length="5:04"/>  
        <song title="Double Talkin'' Jive" length="3:24"/>  
        <song title="November Rain" length="8:57"/>  
        <song title="The Garden (featuring Alice Cooper and Shannon Hoon)"  
        length="5:22"/>  
        <song title="Garden of Eden" length="2:42"/>  
        <song title="Don''t Damn Me" length="5:19"/>  
        <song title="Bad Apples" length="4:28"/>  
        <song title="Dead Horse" length="4:18"/>  
        <song title="Coma" length="10:13"/>  
        <description link="http://ru.wikipedia.org/wiki/Use_Your_Illusion_I">  
        Use Your Illusion I is the third studio album by GnR. It was the first of two  
        albums released in conjunction with the Use Your Illusion Tour, the other  
        being Use Your Illusion II. The two are thus sometimes considered a double album.  
        In fact, in the original vinyl releases, both Use Your Illusion albums are  
        double albums. Material for all two/four discs (depending on the medium) was  
        recorded at the same time and there was some discussion of releasing a  
        ''quadruple album''. The album debuted at No. 2 on the Billboard charts, selling  
        685,000 copies in its first week, behind Use Your Illusion II''s first week sales  
        of 770,000. Use Your Illusion I has sold 5,502,000 units in the U.S. as of 2010,  
        according to Nielsen SoundScan. It was nominated for a Grammy Award in 1992.  
        </description>  
    </album>  
    <album title="Use Your Illusion II">  
        <labels>  
            <label>Geffen Records</label>  
        </labels>  
        <song title="Civil War" length="7:42"/>  
        <song title="14 Years" length="4:21"/>  
        <song title="Yesterdays" length="3:16"/>  
        <song title="Knockin'' on Heaven''s Door (Bob Dylan cover)" length="5:36"/>  
        <song title="Get in the Ring" length="5:41"/>  
        <song title="Shotgun Blues" length="3:23"/>  
        <song title="Breakdown" length="7:05"/>  
        <song title="Pretty Tied Up" length="4:48"/>  
        <song title="Locomotive (Complicity)" length="8:42"/>  
        <song title="So Fine" length="4:06"/>  
        <song title="Estranged" length="9:24"/>  
        <song title="You Could Be Mine" length="5:43"/>  
        <song title="Don''t Cry (Alternate lyrics)" length="4:44"/>  
        <song title="My World" length="1:24"/>  
        <description link="http://ru.wikipedia.org/wiki/Use_Your_Illusion_II">  
        Use Your Illusion II is the fourth studio album by GnR. It was one of two albums  
        released in conjunction with the Use Your Illusion Tour, and as a result the two  
        albums are sometimes considered a double album. Bolstered by lead single ''You  
        Could Be Mine'', Use Your Illusion II was the slightly more popular of the two  
        albums, selling 770,000 copies its first week and debuting at No. 1 on the U.S.  
        charts, ahead of Use Your Illusion I''s first week sales of 685,000.  
        </description>  
    </album>  
</albums>');

The query() Method

This method basically needs an XPath expression in the XQuery parameter and returns an XML data type.

The XPath expression ‘/albums/album[2]/labels/label[1]’ specifies that we want to get the first label of the second album for every single artist.

The query() method returns the XML fragment containing everything between (and including) the beginning and ending tags of that “label” element.

select name, xmlData.query('/albums/album[2]/labels/label[1]') as SecondAlbumLabel
from dbo.tArtist;
nameSecondAlbumLabel
Radiohead<label>Parlophone</label>
Guns N’ Roses<label>Geffen Records</label>

Let’s consider another example. We need to find albums that have word “record” in description.

select name,
xmlData.query ('/albums/album[description[contains(., "record")]]') as ContainsRecord
from dbo.tArtist;

XPath expression in detail can be described with the following phrase: find in list of albums (albums) such an album (album), which has a description (description), that contains word “record” (contains(., “record”)). A dot in function contains() refers to the current element, in this case it is description element.

nameContainsRecord
Radiohead
Guns N’ Roses<album title="Use Your Illusion I">...</album>

We see that such an album is found only among Guns N’ Roses’ albums. But Radiohead has in a description of OK Computer album word “Records”. This album wasn’t found because XQuery is case-sensitive. To make this album be also found we’ll use lower-case() function that brings the processed text to lowercase.

select name,
xmlData.query ('/albums/album[description[contains(lower-case(.), "record")]]')
as ContainsRecord
from dbo.tArtist;
nameContainsRecord
Radiohead<album title="OK Computer">...</album>
Guns N’ Roses<album title="Use Your Illusion I">...</album>

The Value() Method

 The value() method allows you to pull out the data of a single element or a single attribute and specify it’s data type.

If provided XPath expression points to multiple nodes, an error message will be thrown.

select name,
xmlData.value('/albums/album[2]/labels/label[1]/text()', 'varchar(100)')
as SecondAlbumLabel
from dbo.tArtist;
XQuery [dbo.tArtist.xmlData.value()]: 'value()' requires a singleton (or empty sequence), found operand of type 'xdt:untypedAtomic *'

This error message tells that there can be more than one sequence found with provided XPath expression. For example, in case where there are many “albums” elements or if there would be more than one “labels” element.

In order to fix the error change the XPath expression on one of the following:

  • ‘/albums[1]/album[2]/labels[1]/label[1]/text()[1]’
  • ‘(/albums/album[2]/labels/label/text())[1]’

Result set will be the same:

nameSecondAlbumLabel
RadioheadParlophone
Guns N’ RosesGeffen Records

Here is an example data type “time”. Let’s assume that we want to get name of the first album, it’s first song name and this song’s length for each artist.

select name
    , xmlData.value('/albums[1]/album[1]/@title', 'varchar(100)') as FirstAlbum
    , xmlData.value('/albums[1]/album[1]/song[1]/@title', 'varchar(100)') as FirstSongTitle
    , xmlData.value('/albums[1]/album[1]/song[1]/@length', 'time(0)') as FirstSongLength
from dbo.tArtist;
nameFirstAlbumFirstSongTitleFirstSongLength
RadioheadThe King of LimbsBloom05:15:00
Guns N’ RosesUse Your Illusion IRight Next Door to Hell03:02:00

Unfortunately the “FirstSongLength” field is recognised as hh:mm:ss, whereas we a sure that our XML document contains time in mm:ss format. To solve this problem we can use standard T-SQL syntax or we may use XQuery:

select name as artist
  , xmlData.value('/albums[1]/album[1]/@title', 'varchar(100)') as FirstAlbum
  , xmlData.value('/albums[1]/album[1]/song[1]/@title', 'varchar(100)') as FirstSongTitle
  , xmlData.value('concat("00:", /albums[1]/album[1]/song[1]/@length)', 'time(0)')
as FirstSongLength
from dbo.tArtist;
artistFirstAlbumFirstSongTitleFirstSongLength
RadioheadThe King of LimbsBloom00:05:15
Guns N’ RosesUse Your Illusion IRight Next Door to Hell00:03:02

The exist() Method

In this method you do not want any data returned from the XML stream. You just want a simple check to know whether it is there. The exist() method will check for the existence and even the value of the XPath expression you specify.

In this example you will use the exist() method to determine if a particular song is in the track listing. The song we are looking for is called “Garden of Eden”.

SELECT name, xmlData.exist('/albums[1]/album/song[@title="Garden of Eden"]') AS SongExists
from dbo.tArtist;
nameSongExists
Radiohead0
Guns N’ Roses1

As you can see, song with this title does exist among Guns N’ Roses’ songs (1 is for ‘True’) and doesn’t exist among Radiohead’s songs (0 is for ‘False’).

Now let’s check if there is a song that lasts more than 10 minutes.

SELECT name
    , xmlData.exist('/albums[1]/album/song[@length>"10:00"]') AS LongSongExists
from dbo.tArtist;
nameSongExists
Radiohead1
Guns N’ Roses1

From the result set we can see that both artists have at least one song that lasts more than 10 minutes. But that’s not true. All Radiohead’s songs last less than 10 minutes.

Problem is that @length attribute value has string data type by default. That is we just compare strings, not time. In order to fix this problem we can convert @length attribute value and the value which it compares with to time data type. Both of them should be brought to the “hh:mm:ss” format before converting data types. We’ll use xs:time() function which takes a string as an input and returns a time as a result.

SELECT name
    , xmlData.exist('
        /albums[1]/album/song/@length[
                                    (
                                    if (string-length(.) = 4)
                                    then xs:time(concat("00:0", .))
                                    else xs:time(concat("00:", .))
                                    )
                                    > xs:time("00:10:00")
                                ]') AS LongSongExists
from dbo.tArtist;

Now we see that Radiohead doesn’t have a song that lasts more than 10 minutes, whereas Guns N’ Roses have at least one.

nameSongExists
Radiohead0
Guns N’ Roses1

In this example function time() is used in xs namespace. All built-in type conversion functions have to be used in this namespace (xs:string, xs:boolean, xs:decimal, xs:float, xs:double, xs:dateTime, xs:time, xs:date etc.). Other built-in functions are being used in fn namespace but its’ specification is not necessary. That means that string-length(.) and fn:string-length(.) are identical.

The modify() Method

The modify() method allows you to change values directly in your XML stream.

Like all other XML data type methods, it needs an XPath parameter to know which value to change. However, unlike the other methods, modify() works with an UPDATE statement (it will not work with a SELECT statement). Also, modify() can only work with one data value at a time, which is a mathematical and programming concept known as a singleton.

Since there is no limit to the number of elements which can be under another element, any given XPath may have many children. For example, if we have an XML like this:

<week>
    <day>Monday</day>
    <day>Tuesday</day>
    <day>Wednesday</day>
</week>

the XPath of /week/day shown below has three elements and, therefore, is not a singleton:

<day>Monday</day>
<day>Tuesday</day>
<day>Wednesday</day>

However, if you changed your XPath to (/week/day)[1], then you would only get Monday in your result.

<day>Monday</day>

Let’s try to get the name of the first label of Radiohead’s album “OK Computer”.

SELECT xmlData.query('(/albums/album[@title="OK Computer"]/labels/label/text())[1]')
AS FirstLabelText
from dbo.tArtist
WHERE name = 'Radiohead';
FirstLabelText
Parlophone

The modify() method has the sole purpose to change a value in an XML document, which is a helpful capability. Suppose an XML document is imported into SQL Server and you found a typo or need to update just one value. It is no longer necessary to rerun the step to bring in the XML document in order to make that change – you can simply use the modify() method and write the change directly to the XML stream contained in the SQL Server instance.

In this example we’ll swap labels of Radiohead’s album “OK Computer”. The first label needs to be changed to “Capitol”. And the second one needs to be changed to “Parlophone”.

The modify() method can be used in a SET clause of an UPDATE statement against a variable or a field of xml data type.

UPDATE dbo.tArtist
SET xmlData.modify('replace value of
(/albums/album[@title="OK Computer"]/labels/label/text())[1] with "Capitol"')
WHERE name = 'Radiohead';
UPDATE dbo.tArtist
SET xmlData.modify('replace value of
(/albums/album[@title="OK Computer"]/labels/label/text())[2] with "Parlophone"')
WHERE name = 'Radiohead';

Perfect – your revised code runs correctly, and you can see the confirmation:

(1 row(s) affected)

(1 row(s) affected)

Now return to your original query (using the SELECT statement) and run it to confirm that the labels were swapped correctly.

SELECT xmlData.query('/albums/album[@title="OK Computer"]/labels')
FROM dbo.tArtist
WHERE name = 'Radiohead';
<labels>
    <label>Capitol</label>
    <label>Parlophone</label>
</labels>

This task could be done in another way. We could swap elements “label” without replacing their values. We’ll just insert a copy of the first label at the end of labels list.

UPDATE dbo.tArtist
SET xmlData.modify('insert (/albums/album[@title="OK Computer"]/labels/label)[1] as last
into (/albums/album[@title="OK Computer"]/labels)[1]')
WHERE name = 'Radiohead';

And then delete the first label.

UPDATE dbo.tArtist
SET xmlData.modify('delete (/albums/album[@title="OK Computer"]/labels/label)[1]')
WHERE name = 'Radiohead';

To ensure that labels swapped correctly run again the following query:

SELECT xmlData.query('/albums/album[@title="OK Computer"]/labels')
FROM dbo.tArtist
WHERE name = 'Radiohead';
<labels>
    <label>Parlophone</label>
    <label>Capitol</label>
</labels>

Let’s try to flag the “Perfect Crime” song from Guns N’ Roses’ album “Use Your Illusion I” as popular one. For this we’ll add into XML document an attribute isPopular with value 1.

UPDATE dbo.tArtist
SET xmlData.modify('
    insert attribute isPopular { "1" }
    into (/albums[1]/album[@title="Use Your Illusion I"]/song[@title="Perfect Crime"])[1]
')
WHERE name = 'Guns N'' Roses';

As one more example let’s add information about release date and record date of “Estranged” song from Guns N’ Roses’ album “Use Your Illusion II”. Note, that in this example there are two attributes inserting into the element.

UPDATE dbo.tArtist
SET xmlData.modify('
    insert (
        attribute Recorded { "1991" },
        attribute Released { "1994-01-17" }
    )
    into (/albums[1]/album[@title="Use Your Illusion II"]/song[@title="Estranged"])[1]
')
WHERE name = 'Guns N'' Roses';

The nodes() Method

Breaks down XML structure on one or many subtrees according to XPath expression provided.

Let’s run the following query:

SELECT name as artist
    , album.query('.') as album
FROM dbo.tArtist A
CROSS APPLY A.xmlData.nodes('/albums[1]/album')col(album);

This query breaks down the original structure on rows by quantity of “album” elements and returns to rows for each artist:

artistalbum
Radiohead<album title="The King of Limbs"><labels><label>S...
Radiohead<album title="OK Computer"><labels><label>Parlop...
Guns N’ Roses<album title="Use Your Illusion I"><labels><label>G...
Guns N’ Roses<album title="Use Your Illusion II"><labels><label>G...

Let’s examine this query in detail.

CROSS APPLY A.xmlData.nodes(’/albums[1]/album’) – breaks down every table row by quantity of “album” elements found.

col is a name of derived table and album is a name of a column. They will be needed for further work with results.

album.query(’.’) – this is a query to every single row of results using an alias. This subquery just takes the whole subtree.

Let’s examine another example. Assume that we need to get first two songs from every album for every artist in a tabular form.

SELECT name as artist
    , song.value('../@title', 'varchar(50)') as album
    , song.value('@title', 'varchar(100)') as song
FROM dbo.tArtist A
CROSS APPLY A.xmlData.nodes('/albums[1]/album/song[position()<=2]')col(song);
artistalbumsong
RadioheadThe King of LimbsBloom
RadioheadThe King of LimbsMorning Mr Magpie
RadioheadOK ComputerAirbag
RadioheadOK ComputerParanoid Android
Guns N’ RosesUse Your Illusion IRight Next Door to Hell
Guns N’ RosesUse Your Illusion IDust N’ Bones
Guns N’ RosesUse Your Illusion IICivil War
Guns N’ RosesUse Your Illusion II14 Years

[position()<=2] – specifies that we need only two first elements “song” inside every “album” element.

‘../@title’ – refers to a parent element and takes it’s attribute @title.

Result set is correct but in case of big data such query would be extremely slow and ineffective. Problem is that for every song the runtime searches for the parent element and reads it’s attribute. Let’s rewrite the query in a following way:

SELECT name as artist
    , album.value('@title', 'varchar(50)') as album
    , song.value('@title', 'varchar(100)') as song
FROM dbo.tArtist A
CROSS APPLY A.xmlData.nodes('/albums[1]/album')c1(album)
CROSS APPLY c1.album.nodes('song[position()<=2]')c2(song);

Result set would be the same, but now query selects all the albums first and then for each of them searches their songs.

Since quantity of songs would always be much greater than quantity of albums, this query, this query will giv a a significant performance boost.