How to replace every other instance of a particular character in a MySQL string?
You should consider to store your data in a normalized schema. In your case the table should look like:
| id | k | v ||----|---|----------|| 1 | A | 10 || 1 | B | 20 || 1 | C | 30 || 2 | A | Positive || 2 | B | Negative |
This schema is more flexible and you will see why.
So how to convert the given data into the new schema? You will need a helper table containing sequence numbers. Since your column is varchar(255)
you can only store 128 values (+ 127 delimiters) in it. But let's just create 1000 numbers. You can use any table with enough rows. But since any MySQL server has the information_schema.columns
table, I will use it.
drop table if exists helper_sequence;create table helper_sequence (i int auto_increment primary key) select null as i from information_schema.columns c1 join information_schema.columns c2 limit 1000;
We will use this numbers as position of the values in your string by joining the two tables.
To extract a value from a delimited string you can use the substring_index()
function. The value at position i
will be
substring_index(substring_index(t.options, '|', i ), '|', -1)
In your string you have a sequence of keys followed by its values. The position of a key is an odd number. So if the position of the key is i
, the position of the corresponding value will be i+1
To get the number of the delimiters in the string and limit our join we can use
char_length(t.options) - char_length(replace(t.options, '|', ''))
The query to store the data in a normalized form would be:
create table normalized_table select t.id , substring_index(substring_index(t.options, '|', i ), '|', -1) as k , substring_index(substring_index(t.options, '|', i+1), '|', -1) as v from old_table t join helper_sequence s on s.i <= char_length(t.options) - char_length(replace(t.options, '|', '')) where s.i % 2 = 1
Now run select * from normalized_table
and you will get this:
| id | k | v ||----|---|----------|| 1 | A | 10 || 1 | B | 20 || 1 | C | 30 || 2 | A | Positive || 2 | B | Negative |
So why is this format a better choice? Besides many other reasons, one is that you can easily convert it to your old schema with
select id, group_concat(concat(k, '|', v) order by k separator '|') as optionsfrom normalized_tablegroup by id;| id | options ||----|-----------------------|| 1 | A|10|B|20|C|30 || 2 | A|Positive|B|Negative |
or to your desired format
select id, group_concat(concat(k, '|', v) order by k separator ',') as optionsfrom normalized_tablegroup by id;| id | options ||----|-----------------------|| 1 | A|10,B|20,C|30 || 2 | A|Positive,B|Negative |
If you don't care about normalization and just want this task to be done, you can update your table with
update old_table ojoin ( select id, group_concat(concat(k, '|', v) order by k separator ',') as options from normalized_table group by id) n using (id)set o.options = n.options;
And drop the normalized_table
.
But then you won't be able to use simple queries like
select *from normalized_tablewhere k = 'A'
Not using stored procedures, I would do it in 2 steps:
Insert the comma at the second occurrence of the pipe character:
update options set options = insert(options, locate('|', options, locate('|', options) + 1), 1, ',');
Insert the remaining commas - execute the query N times:
update options set options = insert(options, locate('|', options, locate('|', options, length(options) - locate(',', reverse(options)) + 1) + 1), 1, ',');
where N =
select max(round(((length(options) - length(replace(options, '|', ''))) - 1 ) / 2) - 1) from options;
(or don't bother with counting and continue to execute the query as long as it doesn't tell you "0 rows affected")
Checked with this set of data:
id options1 A|10|B|20|C|302 A|Positive|B|Negative3 A|10|B|20|C|30|D|40|E|50|F|604 A|Positive|B|Negative|C|Neutral|D|Dunno
results in:
id options1 A|10,B|20,C|302 A|Positive,B|Negative3 A|10,B|20,C|30,D|40,E|50,F|604 A|Positive,B|Negative,C|Neutral,D|Dunno
(I'll provide an explanation later)
Demo
Explanation
This could be solved relatively easily if only MySQL had a regular expression replacement function but unfortunately it doesn't. So I wrote one - see this blog post. The "advanced version" is needed here to allows it to perform a recursive replace in the found match for the replacement. Then the following relatively simple SQL can be used:
SQL (function code omitted for brevity)
SELECT id, options AS `before`, reg_replace(options, '\\|.*\\|', -- 2 pipe symbols with any text in between '\\|$', -- Replace the second pipe symbol ',', -- Replace with a comma FALSE, -- Non-greedy matching 2, -- Min match length = 2 (2 pipe symbols) 0, -- No max match length 0, -- Min sub-match length = 1 (1 pipe symbol) 0 -- Max sub-match length = 1 (1 pipe symbol) ) AS `after`FROM tbl;