Optimal way to concatenate/aggregate strings Optimal way to concatenate/aggregate strings sql-server sql-server

Optimal way to concatenate/aggregate strings


SOLUTION

The definition of optimal can vary, but here's how to concatenate strings from different rows using regular Transact SQL, which should work fine in Azure.

;WITH Partitioned AS(    SELECT         ID,        Name,        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,        COUNT(*) OVER (PARTITION BY ID) AS NameCount    FROM dbo.SourceTable),Concatenated AS(    SELECT         ID,         CAST(Name AS nvarchar) AS FullName,         Name,         NameNumber,         NameCount     FROM Partitioned     WHERE NameNumber = 1    UNION ALL    SELECT         P.ID,         CAST(C.FullName + ', ' + P.Name AS nvarchar),         P.Name,         P.NameNumber,         P.NameCount    FROM Partitioned AS P        INNER JOIN Concatenated AS C                 ON P.ID = C.ID                 AND P.NameNumber = C.NameNumber + 1)SELECT     ID,    FullNameFROM ConcatenatedWHERE NameNumber = NameCount

EXPLANATION

The approach boils down to three steps:

  1. Number the rows using OVER and PARTITION grouping and ordering them as needed for the concatenation. The result is Partitioned CTE. We keep counts of rows in each partition to filter the results later.

  2. Using recursive CTE (Concatenated) iterate through the row numbers (NameNumber column) adding Name values to FullName column.

  3. Filter out all results but the ones with the highest NameNumber.

Please keep in mind that in order to make this query predictable one has to define both grouping (for example, in your scenario rows with the same ID are concatenated) and sorting (I assumed that you simply sort the string alphabetically before concatenation).

I've quickly tested the solution on SQL Server 2012 with the following data:

INSERT dbo.SourceTable (ID, Name)VALUES (1, 'Matt'),(1, 'Rocks'),(2, 'Stylus'),(3, 'Foo'),(3, 'Bar'),(3, 'Baz')

The query result:

ID          FullName----------- ------------------------------2           Stylus3           Bar, Baz, Foo1           Matt, Rocks


Are methods using FOR XML PATH like below really that slow? Itzik Ben-Gan writes that this method has good performance in his T-SQL Querying book (Mr. Ben-Gan is a trustworthy source, in my view).

create table #t (id int, name varchar(20))insert into #tvalues (1, 'Matt'), (1, 'Rocks'), (2, 'Stylus')select  id        ,Names = stuff((select ', ' + name as [text()]        from #t xt        where xt.id = t.id        for xml path('')), 1, 2, '')from #t tgroup by id