Pandas: how to change all the values of a column?
As @DSM points out, you can do this more directly using the vectorised string methods:
df['Date'].str[-4:].astype(int)
Or using extract (assuming there is only one set of digits of length 4 somewhere in each string):
df['Date'].str.extract(r'(?P<year>\d{4})').astype(int)
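To make the two vectorised approaches concrete, here is a minimal sketch on a made-up frame. The sample dates and their dd/mm/yyyy layout are assumptions for illustration, not taken from the question; the regex variant uses an unnamed group with expand=False so that it returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data: the date layout is an assumption for illustration.
df = pd.DataFrame({"Date": ["01/02/2010", "15/07/2011", "30/11/2012"]})

# Slice the last four characters of each string, then cast to int.
years_by_slice = df["Date"].str[-4:].astype(int)

# Extract the first run of four digits; expand=False returns a Series.
years_by_regex = df["Date"].str.extract(r"(\d{4})", expand=False).astype(int)

print(years_by_slice.tolist())
print(years_by_regex.tolist())
```

Slicing is the simpler choice when the year is always at the end of the string; the regex is more forgiving about where the year sits, as long as no other four-digit run comes first.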
An alternative, slightly more flexible way might be to use apply (or, equivalently, map) to do this:
df['Date'] = df['Date'].apply(lambda x: int(str(x)[-4:])) # converts the last 4 characters of the string to an integer
The lambda function takes each value from the Date column and converts it to a year.
You could (and perhaps should) write this more verbosely as:
def convert_to_year(date_in_some_format):
    date_as_string = str(date_in_some_format)  # cast to string
    year_as_string = date_as_string[-4:]  # last four characters
    return int(year_as_string)

df['Date'] = df['Date'].apply(convert_to_year)
Perhaps 'Year' is a better name for this column...
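Following that suggestion, here is a small sketch that writes the result to a new Year column and leaves Date untouched. The sample values are hypothetical, and the function body is condensed from the one above; the map call is included only to show that it gives the same result for an element-wise function.

```python
import pandas as pd


def convert_to_year(date_in_some_format):
    """Return the last four characters of the value as an int."""
    date_as_string = str(date_in_some_format)  # cast to string first
    return int(date_as_string[-4:])


# Hypothetical sample data for illustration only.
df = pd.DataFrame({"Date": ["01/02/2010", "15/07/2011"]})

# Store the years in a separate column so the original strings survive.
df["Year"] = df["Date"].apply(convert_to_year)

# For an element-wise function, Series.map behaves the same as Series.apply.
years_via_map = df["Date"].map(convert_to_year)

print(df)
```

Keeping the original column makes it easy to check the conversion against the source strings before dropping anything.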
You can transform a column by using apply.
Define a clean function that removes dollar signs, commas, and spaces, and converts the value to float.
def clean(x):
    x = x.replace("$", "").replace(",", "").replace(" ", "")
    return float(x)
Next, call it on your column like this.
data['Revenue'] = data['Revenue'].apply(clean)
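As a rough check, the sketch below runs clean on a few made-up revenue strings and compares it with a vectorised alternative built on Series.str.replace. The sample values are assumptions, and the vectorised version is a swapped-in technique rather than the method from the answer above.

```python
import pandas as pd


def clean(x):
    # Strip dollar signs, commas, and spaces, then convert to float.
    x = x.replace("$", "").replace(",", "").replace(" ", "")
    return float(x)


# Hypothetical sample data for illustration only.
data = pd.DataFrame({"Revenue": ["$1,200.50", "$ 300", "$45,000"]})

# Element-wise version, as in the answer.
applied = data["Revenue"].apply(clean)

# Vectorised alternative: remove the same three characters with one regex
# character class (inside [...] the dollar sign is a literal), then cast.
vectorised = data["Revenue"].str.replace(r"[$, ]", "", regex=True).astype(float)

print(applied.tolist())
print(vectorised.tolist())
```

Both versions assume every value is a string; missing values or already-numeric entries would need handling before either one is applied.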