Includes:
Your core Python tools for EDA: NumPy, pandas, and seaborn/matplotlib
import statement
stdlib, the standard library that ships with Python
re for regular expressions; os for operating system functions; random for random-number generation; and many more. Full list: https://docs.python.org/3/library/
import numpy as np
import pandas as pd
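As a rough illustration of the import statement with the standard-library modules named above, here is a minimal sketch. The sample string and the file name grad.csv are made up for the example and do not come from the course material.

```python
import os
import random
import re

# re: pull every run of digits out of a string (the text is a made-up example)
numbers = re.findall(r"\d+", "cohort 2019: 84 of 120 graduated")

# os: build a file path portably (the file name is hypothetical)
path = os.path.join("data", "grad.csv")

# random: seed the generator so the draw is reproducible, then roll a die
random.seed(0)
draw = random.randint(1, 6)

print(numbers)
print(path)
print(draw)
```

Each module becomes available only after its import line runs; nothing from the standard library needs to be installed separately.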
grad['rate'], or as attributes of the data frame, e.g. grad.rate
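The two access styles can be compared side by side. This is only a sketch: the grad data frame below is a small hypothetical stand-in with made-up values, not the course data set.

```python
import pandas as pd

# Hypothetical stand-in for the grad data frame; the values are made up
grad = pd.DataFrame({"school": ["A", "B", "C"], "rate": [0.5, 0.75, 1.0]})

# Bracket access works for any column name
by_key = grad["rate"]

# Attribute access works only when the name is a valid Python identifier
# and does not clash with an existing DataFrame attribute or method
by_attr = grad.rate

# Both return the same Series
same = by_key.equals(by_attr)
print(same)
```

Bracket access is the safer default, since it keeps working for column names that contain spaces or that shadow a data frame method.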
del statement
Make sure you know your column types (dtypes) and levels of measurement before doing analysis!
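A short sketch of both points, again using a hypothetical grad data frame with made-up columns: del drops a column in place, and dtypes shows how pandas stores each remaining column.

```python
import pandas as pd

# Hypothetical data frame; the column names and values are assumptions
grad = pd.DataFrame({
    "school": ["A", "B", "C"],   # nominal: labels with no order
    "rate": [0.5, 0.75, 1.0],    # ratio: a mean is meaningful here
    "notes": ["", "", ""],       # a column we do not need
})

# del removes the column from the data frame in place
del grad["notes"]

# dtypes lists the storage type of every remaining column
print(grad.dtypes)

# Check which columns are numeric before computing statistics on them
rate_is_numeric = pd.api.types.is_numeric_dtype(grad["rate"])
school_is_numeric = pd.api.types.is_numeric_dtype(grad["school"])
```

A dtype describes storage only. A column of numeric codes can still be nominal, so the level of measurement has to be checked against what the column means.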
The mean of a sample (\(\overline{x}\)) is calculated as follows:
\[\overline{x} = \dfrac{x_1 + x_2 + ... + x_n}{n}\]
where \(n\) is the number of elements in the sample.
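The formula translates directly into code. The sample values below are made up for illustration, and the standard library's statistics.mean is used only as a cross-check.

```python
import statistics

# A made-up sample of n = 4 elements
sample = [2, 4, 6, 8]

n = len(sample)          # number of elements in the sample
x_bar = sum(sample) / n  # (x_1 + x_2 + ... + x_n) / n

print(x_bar)
print(statistics.mean(sample))
```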
pandas as data frame methods, e.g. grad.mean(), grad.std()
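A minimal sketch of these data frame methods, assuming a hypothetical grad data frame that holds only numeric columns (the column names and values are made up):

```python
import pandas as pd

# Hypothetical numeric-only data frame; the values are made up
grad = pd.DataFrame({
    "rate": [0.5, 0.75, 1.0],
    "enrolled": [100, 200, 300],
})

# Each method returns one value per column
means = grad.mean()
stds = grad.std()  # sample standard deviation (ddof=1 by default)

print(means)
print(stds)
```

Note that std() divides by n - 1 by default, so it reports the sample standard deviation rather than the population one.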
.describe() will give you back a number of important descriptive stats at once
matplotlib
seaborn: extension to matplotlib to make your graphics look nicer! Standard import: import seaborn as sns
pandas
pandas and seaborn