In real-world data analysis, your data will likely:
Fortunately, pandas can help you with all of this!
.filter() method
.loc[] method (note the brackets)
.query() will “query” your dataset based on an expression; combine conditions with & (and) and | (or)
.assign() method
dtype conversion: .astype() method
.dropna() method: delete all rows (or columns) that have any missing values (NaN in pandas)
.fillna() method: fill in missing data with a specified value

cols_to_keep = ['INSTNM', 'STABBR', 'GRAD_DEBT_MDN_SUPP']
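states = ['OK', 'NM', 'TX', 'LA']
sw_debt_clean = (
    full_df
    .filter(cols_to_keep)  # keep only the three columns of interest
    .set_axis(['name', 'state', 'debt'], axis='columns')  # rename the columns
    # drop privacy-suppressed rows and keep only the states listed above
    .query("debt != 'PrivacySuppressed' & state in @states")
    .assign(debtnum=lambda x: x.debt.astype(float))  # numeric copy of debt
    .dropna()  # remove rows with missing values
)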
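To try the chain above without the full dataset, here is a minimal sketch on a made-up table. The name toy_df, its values, and the EXTRA column are assumptions invented for illustration; only the column names and the method chain come from the example above. It also shows .loc[] and .fillna(), which the chain does not use.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for full_df; every value here is invented
toy_df = pd.DataFrame({
    "INSTNM": ["College A", "College B", "College C", "College D", "College E"],
    "STABBR": ["OK", "TX", "CA", "NM", "LA"],
    "GRAD_DEBT_MDN_SUPP": ["15000", "PrivacySuppressed", "21000", "18500.5", np.nan],
    "EXTRA": [1, 2, 3, 4, 5],
})

cols_to_keep = ["INSTNM", "STABBR", "GRAD_DEBT_MDN_SUPP"]
states = ["OK", "NM", "TX", "LA"]

toy_clean = (
    toy_df
    .filter(cols_to_keep)                                   # drops EXTRA
    .set_axis(["name", "state", "debt"], axis="columns")    # rename columns
    .query("debt != 'PrivacySuppressed' & state in @states")
    .assign(debtnum=lambda x: x.debt.astype(float))         # text to float
    .dropna()                                               # drops the NaN row
)
print(toy_clean)

# .loc[] uses square brackets: rows by condition, columns by label
tx_only = toy_df.loc[toy_df["STABBR"] == "TX", ["INSTNM", "STABBR"]]

# .fillna() keeps the row and replaces the missing value instead of dropping it
debt_filled = toy_df["GRAD_DEBT_MDN_SUPP"].fillna("PrivacySuppressed")
```

Whether to drop or fill missing values depends on the analysis; treating a missing debt value as suppressed, as the last line does, is only one possible choice.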
pandas: .groupby() method!
Process:
.groupby() in pandas
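As a rough illustration of .groupby(), the sketch below splits a small made-up table by state, computes a summary for each group, and combines the results into one table. The DataFrame debt, its values, and the output column names mean_debt and n_schools are assumptions for illustration only.

```python
import pandas as pd

# Invented example data: one row per school
debt = pd.DataFrame({
    "state": ["OK", "OK", "TX", "TX", "NM"],
    "debtnum": [15000.0, 17000.0, 20000.0, 24000.0, 18500.0],
})

state_summary = (
    debt
    .groupby("state")                        # split rows into one group per state
    .agg(
        mean_debt=("debtnum", "mean"),       # apply: average debt in each group
        n_schools=("debtnum", "size"),       # apply: number of rows in each group
    )
    .reset_index()                           # combine: state becomes a column again
)
print(state_summary)
```

By default the groups come back sorted by the grouping key, so the rows here are ordered NM, OK, TX.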
seaborn
seaborn
.merge() method in pandas
Join types in pandas (how parameter): 'inner' (default), 'left', 'right', and 'outer'
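To see how the how parameter changes the result, here is a small sketch that merges two made-up tables on a shared state column. The tables schools and regions and their contents are assumptions for illustration.

```python
import pandas as pd

# Invented tables: CA appears only in schools, NM only in regions
schools = pd.DataFrame({
    "state": ["OK", "TX", "CA"],
    "name": ["College A", "College B", "College C"],
})
regions = pd.DataFrame({
    "state": ["OK", "TX", "NM"],
    "region": ["Southwest", "Southwest", "Southwest"],
})

inner = schools.merge(regions, on="state")               # default: keys in both
left = schools.merge(regions, on="state", how="left")    # all keys from schools
right = schools.merge(regions, on="state", how="right")  # all keys from regions
outer = schools.merge(regions, on="state", how="outer")  # keys from either table

for label, result in [("inner", inner), ("left", left),
                      ("right", right), ("outer", outer)]:
    print(label, len(result))
```

Rows with no match on the other side are kept by 'left', 'right', and 'outer' joins, with NaN filled in for the missing columns.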
.pivot() method in pandas
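A minimal sketch of .pivot(), reshaping a made-up long table (one row per state and year) into a wide one (one row per state, one column per year). The table and its values are invented for illustration.

```python
import pandas as pd

# Invented long-format data: each row is one state-year observation
long_df = pd.DataFrame({
    "state": ["OK", "OK", "TX", "TX"],
    "year": [2020, 2021, 2020, 2021],
    "debt": [15000, 15500, 20000, 21000],
})

# index -> row labels, columns -> new column labels, values -> cell contents
wide_df = long_df.pivot(index="state", columns="year", values="debt")
print(wide_df)
```

Note that .pivot() raises an error if any index/columns pair appears more than once, since it would not know which value to place in the cell.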
pd.melt() function in pandas
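pd.melt() goes the other way, from wide to long. The sketch below uses a made-up wide table; the column names debt_2020 and debt_2021 and all values are assumptions for illustration.

```python
import pandas as pd

# Invented wide-format data: one column per year
wide_df = pd.DataFrame({
    "state": ["OK", "TX"],
    "debt_2020": [15000, 20000],
    "debt_2021": [15500, 21000],
})

long_df = pd.melt(
    wide_df,
    id_vars="state",                          # column(s) to keep as identifiers
    value_vars=["debt_2020", "debt_2021"],    # columns to stack into rows
    var_name="year",                          # name for the old column labels
    value_name="debt",                        # name for the stacked values
)
print(long_df)
```

Each row of the result holds one state, one former column name, and the matching value, which is often the shape that plotting libraries expect.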