This study plan in LeetCode aims to teach basics of Pandas with 15 simple questions.

I’m writing this note to remember the functions of Pandas and their abilities. In this note, I will simply name the questions, type my answers to the problems and explain the methods I used in the solutions.

I’m not going to give the details of the problems since they are available in a better format on LeetCode.

Create a DataFrame from List

df = pd.DataFrame(student_data)
df.columns = ["student_id", "age"]
return df

Creates a dataframe with specific column names. df.columns is used to name the columns of a dataframe.

Get the Size of a DataFrame

return [players.shape[0], players.shape[1]]

df.shape returns a tuple of rows and columns of df: (row_count, column_count)

Display the First Three Rows

return employees.head(3)

df.head(n) returns first n rows of df.

Select Data

return students.loc[students['student_id'] == 101, ['name', 'age']]

In df.loc, first parameter is the condition used for the search, second parameters is a list with desired columns.

Here is an example with multiple conditions:

students.loc[(students['student_id'] == 101) & (students['name'] == "Ulysses"), ['name', 'age']]

Create a New Column

bonus = []
    for s in employees["salary"]:
        bonus.append(s*2)
 
    result = employees.assign(bonus=bonus)
    return result

In df.assign, there is a column name and a list of values to be used in the column:

df.assign(column_name=[element1, element2, element3])

Drop Duplicate Rows

df = customers.drop_duplicates(subset=['email'])
    return df

df.drop_duplicates simply drop duplicates according to the values given in a column or columns.

dedup_df = df.drop_duplicates(subset=['A', 'B'])

Drop Missing Data

return students.dropna(subset=['name'])

df.dropna drops the rows with missing values. In this question, a column name is given to drop the rows with missing values if they are only in the given column. df.dropna can get various parameters to handle missing values in different ways.

Modify Columns

employees.salary = employees.salary*2
    return employees

In this question, it is asked to double the values of a column and I directly accessed the column with df.row_name and doubled its values.

Here is an additional example:

import numpyas np
 
# Step 1: Select the column
age_column= df['age']
 
# Step 2: Apply a function to each value
def sqrt(x):
    return np.sqrt(x)
 
new_age_column= age_column.apply(sqrt)
 
# Step 3: Assign the new values back to the column
df['age']= new_age_column

df.apply is used to apply a function to each value in the column.

Rename Columns

return students.rename(columns = {'id':'student_id', 
                                        'first':'first_name', 
                                        'last':'last_name',
                                        'age':'age_in_years'})

df.rename can be used to change names of index or columns like this case. With inplace=True parameter, df can be modified instead of creating a new one.

Change Data Type

return students.astype({'grade': int})

df.astype is used to change the data type of an object in the dataframe. It can be used for specific or all columns. To solve this question, different approaches can be used such as df.apply to all elements in a column or df.to_numeric to convert non-numeric objects into numeric ones if possible.

Fill Missing Data

products['quantity'] = products['quantity'].fillna(0)
return products

In this case, it is asked to fill missing data in a single column. That’s why I operated on “quantity” column. df.fillna(x) can be used to replace all missing values with given parameter x.

To achieve the same result, df.replace could be used too:

df['DataFrame Column'] = df['DataFrame Column'].replace(np.nan, 0)

Reshape Data: Concatenate

return pd.concat([df1,df2], axis = 0)

pd.concat can be used to concatenate 2 dataframes horizontally (same rows, new columns) or vertically (same columns, new rows). axis = 0 is for vertical and axis = 1 is for horizontal concatenation. Other than this, pd.merge, df.append and df.join can be used for concatenation.

#Concatenation with pd.merge
result = pd.merge(df, df1, on='Courses', how='outer', suffixes=('_df1', '_df2')).fillna(0)
 
result['Fee'] = result['Fee_df1'] + result['Fee_df2']
result = result[['Courses', 'Fee']]
 
#Concatenation with df.join
result = df.join(df1)
 
#Concatenation with df.append (only vertical concatenation)
result = df.append(df1, ignore_index=True

Reshape Data: Pivot

return weather.pivot(index = 'month', columns = 'city', values = 'temperature')

df.pivot is used to pivot a dataframe with 3 columns. This function is used to reshape to a simpler, smaller dataframe that the same meaning can be deduced from it. With this function, the index and columns of a dataframe can be set and the new dataframe can be filled with desired values.

Reshape Data: Melt

return pd.melt(report, id_vars=['product'], 
                value_vars=['quarter_1', 'quarter_2', 'quarter_3', 'quarter_4'],
                var_name='quarter', value_name='sales')

pd.melt reshapes a dataframe to be more computer friendly. In this problem, pd.melt is used to merge values of multiple columns to a single column. The names of the columns are also used as variable names for the values.

Method Chaining

return animals[animals['weight'] > 100].sort_values(['weight'], ascending = False,)[['name']]

Method chaining is a newer approach to data manipulation by allowing for the execution of multiple operations in a single line of code. With method chaining, each operation is chained together using the dot notation.