How to get the position of the first occurrence of a digit in a postgres select
Use the regex function substring()
:
with my_data(sdk) as (values ('JavaScript/2.3.4'), ('JavaScript/4.3.1'), ('Android4.6.5'), ('Android3.2.1'), ('Swift4.5.3'), ('Swift/3.1.1.5'))select substring(sdk from '[^\d/]*') as sdk_name, substring(sdk from '\d.*') as sdk_versionfrom my_data sdk_name | sdk_version ------------+------------- JavaScript | 2.3.4 JavaScript | 4.3.1 Android | 4.6.5 Android | 3.2.1 Swift | 4.5.3 Swift | 3.1.1.5(6 rows)
Update.
You can place your select
query in the with
part (instead of values
):
with my_data(sdk) as ( <select sdk from ...>)select substring(sdk from '[^\d/]*') as sdk_name, substring(sdk from '\d.*') as sdk_versionfrom my_data
or in the from
clause:
select substring(sdk from '[^\d/]*') as sdk_name, substring(sdk from '\d.*') as sdk_versionfrom ( <select sdk from ...>) my_data
You can use translate for this:
SELECT sdk, translate(sdk,'0123456789/.','') AS sdk_name, translate(lower(sdk),'abcdefghijklmnopqrstuvwxyz/','') AS sdk_version FROM table1;
Working fiddle
EDIT (by Gordon):
This is a good idea. I find it simpler to use regexp_replace()
:
select regexp_replace(sdk, '[0-9/.]', '', 'g') as sdk_name, regexp_replace(sdk, '[a-zA-Z/]', '', 'g') as sdk_version
Regular expression parsing is quite compute-intensive, so rather than using two function calls (as in the other answers), you should combine them into a single call and then extract the desired values from the result:
WITH d(sdk) AS ( VALUES ('JavaScript/2.3.4'), ('JavaScript/4.3.1'), ('Android4.6.5'), ('Android3.2.1'), ('Swift4.5.3'), ('Swift/3.1.1.5'), ('C#/23.1') )SELECT unq.sdk, re.match[1] AS sdk_name, re.match[2] AS sdk_versionFROM (SELECT DISTINCT sdk FROM d) unq, regexp_match(unq.sdk, '([^0-9/]*)/*([0-9.]*)') re (match);
The regular expression does the following:
([^0-9/]*)
Capture everything up to the first digit or forward slash. Note that this will also match SDK names that have characters other thanA-Za-z
./*
Jump over a forward slash, if present([0-9.]*)
Capture any following digits or dots. If you are confident that only digits and dots will follow then you can also do(*)
.
Note also that I put the DISTINCT
clause in a separate subquery. It is not very efficient to first process every row to then throw out any duplicates. Instead, get rid of the duplicates first.
Pre PG-10 versions
The function regexp_match()
was introduced in version 10. If you have an older version then you can use regexp_matches()
without the g
flag for the same result (PG8.3+).