Hive: how to convert yyyy-mm-ddThh:mm:SS:sssZ into hour units Hive: how to convert yyyy-mm-ddThh:mm:SS:sssZ into hour units hadoop hadoop

Hive: how to convert yyyy-mm-ddThh:mm:SS:sssZ into hour units


Use unix_timestamp and from_unixtime functions to convert and format the timestamp required.

select from_unixtime(unix_timestamp(string("2020-03-09T07:34:06:825Z"),"yyyy-MM-dd'T'hh:mm:ss:SSS'Z'"),"yyyy-MM-dd'T'hh:00:00") as new_ts;+-------------------+|new_ts             |+-------------------+|2020-03-09T07:00:00|+-------------------+

Explanation:

unix_timestamp(string("2020-03-09T07:34:06:825Z"), --sample data"yyyy-MM-dd'T'hh:mm:ss:SSS'Z'") --match the data format

from_unixtime('unix_timestamp...etc',"yyyy-MM-dd'T'hh:00:00") --to format as required


Using regexp_replace:

with your_data as (select stack('2020-03-09T07:34:06:825Z','2020-03-09T07:54:12:220Z','2020-03-09T03:54:11:041Z','2020-03-09T09:22:10:220Z','2020-03-09T11:13:36:217Z','2020-03-09T11:23:26:040Z','2020-03-09T11:43:35:721Z') as str)select regexp_replace(str,'(\\d{4}-\\d{2}-\\d{2})T(\\d{2}).*','$1T$2:00:00')    from your_data;

Result:

2020-03-09T07:00:002020-03-09T07:00:002020-03-09T03:00:002020-03-09T09:00:002020-03-09T11:00:002020-03-09T11:00:002020-03-09T11:00:00

Explanation:

Regular expression defines two groups:

$1 is date part (\\d{4}-\\d{2}-\\d{2})

$2 is hour part after T '(\d{2})' everything else at the end .* is ignored.

You extract '$1T$2:00:00'