How to get the position of the first occurrence of a digit in a postgres select How to get the position of the first occurrence of a digit in a postgres select postgresql postgresql

How to get the position of the first occurrence of a digit in a postgres select


Use the regex function substring():

with my_data(sdk) as (values    ('JavaScript/2.3.4'),    ('JavaScript/4.3.1'),    ('Android4.6.5'),    ('Android3.2.1'),    ('Swift4.5.3'),    ('Swift/3.1.1.5'))select     substring(sdk from '[^\d/]*') as sdk_name,     substring(sdk from '\d.*') as sdk_versionfrom my_data   sdk_name  | sdk_version ------------+------------- JavaScript | 2.3.4 JavaScript | 4.3.1 Android    | 4.6.5 Android    | 3.2.1 Swift      | 4.5.3 Swift      | 3.1.1.5(6 rows)

Update.

You can place your select query in the with part (instead of values):

with my_data(sdk) as (    <select sdk from ...>)select     substring(sdk from '[^\d/]*') as sdk_name,     substring(sdk from '\d.*') as sdk_versionfrom my_data 

or in the from clause:

select     substring(sdk from '[^\d/]*') as sdk_name,     substring(sdk from '\d.*') as sdk_versionfrom (    <select sdk from ...>) my_data


You can use translate for this:

SELECT sdk, translate(sdk,'0123456789/.','') AS sdk_name,        translate(lower(sdk),'abcdefghijklmnopqrstuvwxyz/','') AS sdk_version FROM   table1;

Working fiddle

EDIT (by Gordon):

This is a good idea. I find it simpler to use regexp_replace():

select regexp_replace(sdk, '[0-9/.]', '', 'g') as sdk_name,       regexp_replace(sdk, '[a-zA-Z/]', '', 'g') as sdk_version


Regular expression parsing is quite compute-intensive, so rather than using two function calls (as in the other answers), you should combine them into a single call and then extract the desired values from the result:

WITH d(sdk) AS (  VALUES    ('JavaScript/2.3.4'),    ('JavaScript/4.3.1'),    ('Android4.6.5'),    ('Android3.2.1'),    ('Swift4.5.3'),    ('Swift/3.1.1.5'),    ('C#/23.1') )SELECT unq.sdk, re.match[1] AS sdk_name, re.match[2] AS sdk_versionFROM (SELECT DISTINCT sdk FROM d) unq,     regexp_match(unq.sdk, '([^0-9/]*)/*([0-9.]*)') re (match);

The regular expression does the following:

  1. ([^0-9/]*) Capture everything up to the first digit or forward slash. Note that this will also match SDK names that have characters other than A-Za-z.
  2. /* Jump over a forward slash, if present
  3. ([0-9.]*) Capture any following digits or dots. If you are confident that only digits and dots will follow then you can also do (*).

Note also that I put the DISTINCT clause in a separate subquery. It is not very efficient to first process every row to then throw out any duplicates. Instead, get rid of the duplicates first.

Pre PG-10 versions

The function regexp_match() was introduced in version 10. If you have an older version then you can use regexp_matches() without the g flag for the same result (PG8.3+).