Wednesday, April 13, 2011

Performance optimization for XML query in SQL SERVER


As we all uses XML Query instead of row level query to insert/update bulk data into a database in order to achieve better performance. But when there is a large dataset having multilevel xml nodes, the performance degrade significantly.

While doing performance tuning in “AcceleRATE” I came across with “XML indexes” on an xml column.


Below is the step by step process to achieve this.

1. Create a temporary table with on identity column with primary key and one other column with xml datatype.

CREATE TABLE #OptimizedXML(id INT IDENTITY PRIMARY KEY, SurveyXML xml NOT NULL)



2. Create a primary XML index

CREATE PRIMARY XML INDEX PrimaryXMLIndex ON #OptimizedXML(SurveyXML)

3. Create a secondary XML index

CREATE XML INDEX

XMLDataStore_XmlCol_PATH ON #OptimizedXML(SurveyXML)

USING XML INDEX PrimaryXMLIndex FOR PATH

4. Now insert the values into a table
INSERT INTO #OptimizedXML VALUES(@SurveyXML)
5. Here we are ready to perform the same XML query with some tweak to achieve performance.

SELECT F.Manager.value('@fname','varchar(255)') as fname,

F.Manager.value('@lname', 'varchar(255)') as lname,

F.Manager.value('@title', 'varchar(255)') as title,

F.Manager.value('@email', 'varchar(255)') as email,

F.Manager.value('@domainid', 'int') as domainid,

F.Manager.value('@ID', 'int') as userid,

F.Manager.value('@encryptedmail', 'varchar(500)') as encryptedemail

FROM #OPtimizedXML X

cross apply X.SurveyXML.nodes('Surveys[1]/Survey/Manager') AS F(Manager)

By doing the above tweak, the query which was earlier taking more than 2 minutes to execute, now takes just 10-15 seconds.

Additionally, there are few points, which we need to keep in mind while using this approach.

We must have one primary key (clustered index) on a table; it is required to create a XML index.
There must be one primary xml index followed by one or more secondary xml indexes on xml column(s).
While querying the table, we have to use a cross apply on the node where in we want to fetch data.