Includes:
Your core Python tools for EDA: NumPy, pandas, and seaborn/matplotlib
Libraries are loaded with the import statement.
stdlib is the standard library that ships with Python. It includes re for regular expressions; os for operating system functions; random for random-number generation; and many more. Full list: https://docs.python.org/3/library/
Standard imports: import numpy as np and import pandas as pd
Columns of a data frame can be accessed by name, e.g. grad['rate'], or as attributes of the data frame, e.g. grad.rate
Columns can be removed with the del statement
Make sure you know your column types (dtypes) and levels
of measurement before doing analysis!
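A minimal sketch of the import, column-access, and dtype points above. The grad data frame built here is a hypothetical stand-in with made-up values and made-up column names (school, notes); only grad and rate appear in these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `grad` data frame; all values are invented.
grad = pd.DataFrame({
    "school": ["A", "B", "C", "D"],
    "rate": np.array([0.61, 0.74, 0.58, 0.80]),
    "notes": ["", "", "", ""],
})

# Two equivalent ways to pull out one column as a Series.
rate_by_key = grad["rate"]
rate_by_attr = grad.rate

# Remove a column in place with the del statement.
del grad["notes"]

# Check the column types before doing any analysis.
print(grad.dtypes)
```

Bracket access works for any column name, while attribute access only works when the name is a valid Python identifier that does not clash with an existing data frame method.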
The mean of a sample (\(\overline{x}\)) is calculated as follows:
\[\overline{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}\]
where \(n\) is the number of elements in the sample.
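As a small illustration of the formula, the sketch below computes the mean of a made-up three-element sample by hand and compares it with NumPy's np.mean. The sample values are assumptions chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Made-up sample, for illustration only.
x = [2.0, 4.0, 9.0]

# Apply the formula directly: sum of the elements divided by n.
n = len(x)
mean_by_hand = sum(x) / n

# The same quantity computed by NumPy.
mean_numpy = np.mean(x)

print(mean_by_hand, mean_numpy)
```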
Descriptive statistics are available in pandas as data frame methods, e.g. grad.mean(), grad.std()
.describe() will give you back a number of important descriptive stats at once
matplotlib: the plotting library that seaborn builds on
seaborn: extension to matplotlib to make your graphics look nicer! Standard import: import seaborn as sns
Plots can be drawn with pandas and seaborn