While working at pandas, most common task is to create new column from existing data. Since pandas present data as X-Y grid (i.e. dataframe), so it is possible to determine new column content based on Row content as well as based on Column content. In this post, such concepts are shown with example.
For a given dataframe, create 2 new column where 1 column represents whether or not the "Year" column started with "19" and another column is negated value of this.
Year Freq 0 1972 9.0 1 2072 5.0 2 1372 1.0 3 3872 6.0
Approach 1: Using function over axis / column
Using lambda, if we mention in which axis we want to perform the operation and pass a series, then it would be sufficient.
def f_19(series): return 1 if str(series.Year).startswith('19') else 0 df1['is_19'] = df1.apply( lambda series: f_19(series), axis=1 )
Approach 2: Using list comprehension in rows
To be honest, this example is not a strong one because we can easily find negated values of
df1['is_19'] column by applying another operation on this column. Nonetheless, this post is about showing alternatives so I am mentioning how you can find negated values in another row by traversing rows:
df1['is_not_19'] = [0 if row == 1 else 1 for row in df1['is_19']]
Bonus: If we want only filtered result
In case, we want to apply filtering, then our approach should be finding the location (i.e. index of rows) where the condition is true and pick only those rows:
df4= df1.loc[df1.apply( lambda series: series.Year==1972,axis=1)]
Result dataframe will show 2 new column as follows:
Year Freq is_19 is_not_19 0 1972 9.0 1 0