Removing repeated duplicated characters Removing repeated duplicated characters sql sql

Removing repeated duplicated characters


This works for strings that are exclusively commas or have up to 398 contiguous commas.

 SELECT      CASE          WHEN TargetString NOT LIKE '%[^,]%'              THEN '' /*The string is exclusively commas*/         ELSE             REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TargetString,            REPLICATE(',',16),','), /*399/16 = 24 remainder 15*/            REPLICATE(',',8),','),  /* 39/ 8 =  4 remainder 7*/            REPLICATE(',',4),','),  /* 11/ 4 =  2 remainder 3*/            REPLICATE(',',2),','),  /*  5/ 2 =  2 remainder 1*/            REPLICATE(',',2),',')   /*  3/ 2 =  1 remainder 1*/         END FROM T    

Add extra powers of 2 at the top if you need more or remove from the top if you need less. The comments by each stage indicate the smallest number that this stage will not deal with successfully.

All the comment lines are in this format

/*  L/D    =  Q remainder R */D:    Corresponds to the length of the string generated by `REPLICATE`R:    Is always D-1Q+R:  Form L for the next step

So to extend the series upwards with another REPLICATE(',',32),',') stage

D = 32 R = 31Q = 368 (399-31)L = (368 * 32) + 31 = 11807

So that would deal with sections of commas up to 11,806 characters.


I would suggest a UDF to do this. Since the UDF I am about to suggest doesn't touch any tables, the performance should be pretty good.

CREATE Function [dbo].[CleanDuplicates](@Data VarChar(8000), @DuplicateChar VarChar(1))Returns VarChar(8000)WITH SCHEMABINDINGASBegin    Set @Data = @DuplicateChar + @Data    While PATINDEX('%' + @DuplicateChar + @DuplicateChar + '%',@Data) > 0        Set @Data = REPLACE(@Data, @DuplicateChar + @DuplicateChar,@DuplicateChar)    Return Right(@Data, Len(@Data)-1)End

You can test the function like this:

Select dbo.CleanDuplicates(',,,', ',')Select dbo.CleanDuplicates(',,,sam,,bob,', ',')


try this

SELECT @Parameter AS 'BEFORE'BEGINWHILE CHARINDEX(',,', @Parameter) > 0    BEGIN        SELECT @Parameter = REPLACE(@Parameter, ',,',',')     ENDSELECT @Parameter AS 'AFTER'END