This study plan in LeetCode aims to teach basics of Pandas with 15 simple questions.
I’m writing this note to remember the functions of Pandas and their abilities. In this note, I will simply name the questions, type my answers to the problems and explain the methods I used in the solutions.
I’m not going to give the details of the problems since they are available in a better format on LeetCode.
Create a DataFrame from List
Creates a dataframe with specific column names. df.columns
is used to name the columns of a dataframe.
Get the Size of a DataFrame
df.shape
returns a tuple of rows and columns of df: (row_count, column_count)
Display the First Three Rows
df.head(n)
returns first n rows of df.
Select Data
In df.loc
, first parameter is the condition used for the search, second parameters is a list with desired columns.
Here is an example with multiple conditions:
Create a New Column
In df.assign
, there is a column name and a list of values to be used in the column:
Drop Duplicate Rows
df.drop_duplicates
simply drop duplicates according to the values given in a column or columns.
Drop Missing Data
df.dropna
drops the rows with missing values. In this question, a column name is given to drop the rows with missing values if they are only in the given column. df.dropna
can get various parameters to handle missing values in different ways.
Modify Columns
In this question, it is asked to double the values of a column and I directly accessed the column with df.row_name
and doubled its values.
Here is an additional example:
df.apply
is used to apply a function to each value in the column.
Rename Columns
df.rename
can be used to change names of index or columns like this case. With inplace=True
parameter, df can be modified instead of creating a new one.
Change Data Type
df.astype
is used to change the data type of an object in the dataframe. It can be used for specific or all columns. To solve this question, different approaches can be used such as df.apply
to all elements in a column or df.to_numeric
to convert non-numeric objects into numeric ones if possible.
Fill Missing Data
In this case, it is asked to fill missing data in a single column. That’s why I operated on “quantity” column. df.fillna(x)
can be used to replace all missing values with given parameter x
.
To achieve the same result, df.replace
could be used too:
Reshape Data: Concatenate
pd.concat
can be used to concatenate 2 dataframes horizontally (same rows, new columns) or vertically (same columns, new rows). axis = 0
is for vertical and axis = 1
is for horizontal concatenation. Other than this, pd.merge
, df.append
and df.join
can be used for concatenation.
Reshape Data: Pivot
df.pivot
is used to pivot a dataframe with 3 columns. This function is used to reshape to a simpler, smaller dataframe that the same meaning can be deduced from it. With this function, the index and columns of a dataframe can be set and the new dataframe can be filled with desired values.
Reshape Data: Melt
pd.melt
reshapes a dataframe to be more computer friendly. In this problem, pd.melt is used to merge values of multiple columns to a single column. The names of the columns are also used as variable names for the values.
Method Chaining
Method chaining is a newer approach to data manipulation by allowing for the execution of multiple operations in a single line of code. With method chaining, each operation is chained together using the dot notation.