[P00] Pandas: Create new column from existing

While working at pandas, most common task is to create new column from existing data. Since pandas present data as X-Y grid (i.e. dataframe), so it is possible to determine new column content based on Row content as well as based on Column content. In this post, such concepts are shown with example.

Proble definition

For a given dataframe, create 2 new column where 1 column represents whether or not the "Year" column started with "19" and another column is negated value of this.

Input data:

    Year    Freq
0   1972    9.0
1   2072    5.0
2   1372    1.0
3   3872    6.0

Approach 1: Using function over axis / column

Using lambda, if we mention in which axis we want to perform the operation and pass a series, then it would be sufficient.

def f_19(series):
    return 1 if str(series.Year).startswith('19') else 0

df1['is_19'] = df1.apply( lambda series: f_19(series), axis=1 )

Approach 2: Using list comprehension in rows

To be honest, this example is not a strong one because we can easily find negated values of df1['is_19'] column by applying another operation on this column. Nonetheless, this post is about showing alternatives so I am mentioning how you can find negated values in another row by traversing rows:

df1['is_not_19'] = [0 if row == 1 else 1 for row in df1['is_19']]

Bonus: If we want only filtered result

In case, we want to apply filtering, then our approach should be finding the location (i.e. index of rows) where the condition is true and pick only those rows:

df4= df1.loc[df1.apply( lambda series: series.Year==1972,axis=1)]


Result dataframe will show 2 new column as follows:

    Year    Freq    is_19   is_not_19
0   1972    9.0 1       0