Contingency table
- plot_utils.contingency_table(array_horizontal, array_vertical, fig=None, ax=None, figsize='auto', dpi=100, color_map='auto', xlabel=None, ylabel=None, dropna=False, rot=45, normalize=True, symm_cbar=True, show_stats=True)
Calculate and visualize the contingency table from two categorical arrays. Also perform a Pearson’s chi-squared test to evaluate whether the two arrays are independent.
- Parameters:
array_horizontal (list, numpy.ndarray, or pandas.Series) – Array to show as the horizontal margin in the contingency table (i.e., its categories are the column headers).
array_vertical (list, numpy.ndarray, or pandas.Series) – Array to show as the vertical margin in the contingency table (i.e., its categories are the row names).
fig (matplotlib.figure.Figure or None) – Figure object. If None, a new figure will be created.
ax (matplotlib.axes._subplots.AxesSubplot or None) – Axes object. If None, a new axes object will be created.
figsize ((float, float)) – Figure size in inches, as a tuple of two numbers. The figure size of fig (if not None) will override this parameter.
dpi (float) – Figure resolution. The dpi of fig (if not None) will override this parameter.
color_map (str or matplotlib.colors.Colormap) – The color scheme specification. Valid names are listed in https://matplotlib.org/users/colormaps.html. If normalize is True, use diverging color maps (e.g., PiYG, PRGn, BrBG, PuOr, RdGy, RdBu, RdYlBu, RdYlGn, Spectral, coolwarm, bwr, seismic). Otherwise, use sequential color maps (e.g., viridis, jet).
xlabel (str) – The label for the horizontal axis. If None and array_horizontal is a pandas Series, use the ‘name’ attribute of array_horizontal as xlabel.
ylabel (str) – The label for the vertical axis. If None and array_vertical is a pandas Series, use the ‘name’ attribute of array_vertical as ylabel.
dropna (bool) – If True, ignore entries (in both arrays) where there are missing values in at least one array. If False, the missing values are treated as a new category: “N/A”.
rot (float or 'vertical' or 'horizontal') – The rotation of the x axis labels (in degrees).
normalize (bool) – If True, plot the contingency table as the relative difference between the observed and the expected frequencies, i.e., (obs. - exp.)/exp. If False, plot the original “observed frequency”.
symm_cbar (bool) – If True, the limits of the color bar are symmetric. Otherwise, the limits are the natural minimum/maximum of the table to be plotted. It has no effect if “normalize” is set to False.
show_stats (bool) – Whether or not to show the statistical test results (chi2 statistic and p-value) on the figure.
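The relative difference that normalize=True refers to, (obs. - exp.)/exp., can be sketched with the standard library alone. This is only an illustration of the quantity, not the function's actual implementation: the helper name relative_difference_table and the sample data are hypothetical, categories are assumed to be shown in sorted order, and missing values (the dropna option) are not handled.

```python
from collections import Counter


def relative_difference_table(array_horizontal, array_vertical):
    """Cross-tabulate two equal-length sequences and return
    (row_labels, col_labels, observed, relative_difference)."""
    if len(array_horizontal) != len(array_vertical):
        raise ValueError("both arrays must have the same length")

    # Assumption: categories are listed in sorted order.
    col_labels = sorted(set(array_horizontal))  # horizontal margin -> columns
    row_labels = sorted(set(array_vertical))    # vertical margin -> rows

    # Count how often each (row category, column category) pair occurs.
    counts = Counter(zip(array_vertical, array_horizontal))
    observed = [[counts[(r, c)] for c in col_labels] for r in row_labels]

    n = len(array_horizontal)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]

    relative = []
    for i, row in enumerate(observed):
        rel_row = []
        for j, obs in enumerate(row):
            # Expected count if the two arrays were independent.
            expected = row_totals[i] * col_totals[j] / n
            rel_row.append((obs - expected) / expected)
        relative.append(rel_row)
    return row_labels, col_labels, observed, relative


# Hypothetical sample data: eight paired observations.
answers = ["yes", "yes", "yes", "no", "yes", "no", "no", "no"]
pets = ["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog"]

rows, cols, observed, relative = relative_difference_table(answers, pets)
print(rows, cols)
print(observed)
print(relative)
```

Positive cells occur more often than independence would predict and negative cells less often, which is why a diverging color map with symmetric color bar limits suits this view.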
- Returns:
fig (matplotlib.figure.Figure) – The figure object being created or being passed into this function.
ax (matplotlib.axes._subplots.AxesSubplot) – The axes object being created or being passed into this function.
chi2_results (tuple<float>) – A tuple in the order of (chi2, p_value, degree_of_freedom).
correlation_metrics (tuple<float>) – A tuple in the order of (phi coef., coeff. of contingency, Cramer’s V).
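For orientation, the returned statistics can be reproduced by hand from a table of observed counts. The sketch below uses the textbook definitions (uncorrected Pearson chi-squared, phi coefficient, coefficient of contingency, Cramér's V); whether the library applies exactly these formulas, or a continuity correction for 2×2 tables, is an assumption not confirmed by this page. The helper name chi2_and_association and the counts are hypothetical. The p-value is omitted because it requires the chi-squared survival function (available as scipy.stats.chi2.sf, for example).

```python
import math


def chi2_and_association(observed):
    """Return (chi2, degrees_of_freedom, (phi, contingency_coef, cramers_v))
    for a table of observed counts given as a list of rows."""
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]

    # Pearson's chi-squared: sum of (obs - exp)^2 / exp over all cells.
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected

    n_rows, n_cols = len(observed), len(observed[0])
    dof = (n_rows - 1) * (n_cols - 1)

    # Textbook association measures derived from chi2.
    phi = math.sqrt(chi2 / n)
    contingency_coef = math.sqrt(chi2 / (chi2 + n))
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return chi2, dof, (phi, contingency_coef, cramers_v)


# Hypothetical 2x3 table of observed counts (120 observations in total).
table = [[10, 20, 30],
         [30, 20, 10]]

chi2, dof, (phi, contingency_coef, cramers_v) = chi2_and_association(table)
print(chi2, dof)
print(phi, contingency_coef, cramers_v)
```

In this table every expected count is 20, so the chi-squared statistic is 20 with 2 degrees of freedom.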