Contingency table

plot_utils.contingency_table(array_horizontal, array_vertical, fig=None, ax=None, figsize='auto', dpi=100, color_map='auto', xlabel=None, ylabel=None, dropna=False, rot=45, normalize=True, symm_cbar=True, show_stats=True)

Calculate and visualize the contingency table of two categorical arrays, and perform Pearson’s chi-squared test to evaluate whether the two arrays are independent.
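The computation described above can be sketched with pandas and SciPy. This is not the plot_utils source: the helper name chi2_independence is hypothetical, and passing correction=False (no Yates continuity correction on 2×2 tables) is an assumption about how the plain Pearson statistic is obtained. Missing values are handled as the dropna parameter documented below describes.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def chi2_independence(array_horizontal, array_vertical, dropna=False):
    """Hypothetical helper: cross-tabulate two categorical arrays and run
    Pearson's chi-squared test of independence.

    Rows are the categories of array_vertical; columns are the categories
    of array_horizontal.
    """
    # Reset the index so lists, arrays, and Series line up by position.
    h = pd.Series(array_horizontal, dtype="object").reset_index(drop=True)
    v = pd.Series(array_vertical, dtype="object").reset_index(drop=True)
    if len(h) != len(v):
        raise ValueError("the two arrays must have the same length")

    if dropna:
        # Keep only positions where neither array has a missing value.
        keep = h.notna() & v.notna()
        h, v = h[keep], v[keep]
    else:
        # Treat missing values as an extra category named "N/A".
        h, v = h.fillna("N/A"), v.fillna("N/A")

    observed = pd.crosstab(v, h)  # rows: vertical array, columns: horizontal array
    # correction=False gives the plain Pearson statistic (assumption, see above).
    chi2, p_value, dof, expected = chi2_contingency(observed.to_numpy(), correction=False)
    return observed, expected, (chi2, p_value, dof)
```

The returned triple follows the same (chi2, p_value, degrees of freedom) order as the chi2_results value described under Returns.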

Parameters:
  • array_horizontal (list, numpy.ndarray, or pandas.Series) – Array to show as the horizontal margin in the contingency table (i.e., its categories are the column headers).

  • array_vertical (list, numpy.ndarray, or pandas.Series) – Array to show as the vertical margin in the contingency table (i.e., its categories are the row names).

  • fig (matplotlib.figure.Figure or None) – Figure object. If None, a new figure is created.

  • ax (matplotlib.axes._subplots.AxesSubplot or None) – Axes object. If None, a new Axes object is created.

  • figsize ((float, float) or 'auto') – Figure size in inches, as a tuple of two numbers. If fig is provided, its size overrides this parameter.

  • dpi (float) – Figure resolution. If fig is provided, its dpi overrides this parameter.

  • color_map (str or matplotlib.colors.Colormap) – The colormap to use. Valid names are listed in https://matplotlib.org/users/colormaps.html. If normalize is True, use a diverging colormap (e.g., PiYG, PRGn, BrBG, PuOr, RdGy, RdBu, RdYlBu, RdYlGn, Spectral, coolwarm, bwr, seismic). Otherwise, use a sequential colormap (e.g., viridis, jet).

  • xlabel (str) – The label for the horizontal axis. If None and array_horizontal is a pandas Series, use the ‘name’ attribute of array_horizontal as xlabel.

  • ylabel (str) – The label for the vertical axis. If None and array_vertical is a pandas Series, use the ‘name’ attribute of array_vertical as ylabel.

  • dropna (bool) – If True, ignore entries (in both arrays) where there are missing values in at least one array. If False, the missing values are treated as a new category: “N/A”.

  • rot (float or 'vertical' or 'horizontal') – Rotation of the x-axis tick labels: an angle in degrees, or 'vertical' / 'horizontal'.

  • normalize (bool) – If True, plot the contingency table as the relative difference between the observed and expected frequencies, i.e., (observed - expected) / expected. If False, plot the observed frequencies.

  • symm_cbar (bool) – If True, the limits of the color bar are symmetric. Otherwise, the limits are the natural minimum/maximum of the table to be plotted. It has no effect if “normalize” is set to False.

  • show_stats (bool) – Whether to show the test results (chi-squared statistic and p-value) on the figure.
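The normalize and symm_cbar options above can be illustrated with NumPy alone. This sketch assumes the expected frequencies are the usual independence expectations used by Pearson's test (row total × column total / grand total); the function names are hypothetical and not part of plot_utils.

```python
import numpy as np


def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)  # shape (r, 1)
    col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, c)
    return row_totals @ col_totals / observed.sum()   # outer product, shape (r, c)


def relative_difference(observed):
    """(observed - expected) / expected, the quantity plotted when normalize=True."""
    observed = np.asarray(observed, dtype=float)
    expected = expected_frequencies(observed)
    return (observed - expected) / expected


def color_limits(table, symm_cbar=True):
    """Color-bar limits: symmetric about zero, or the table's own min/max."""
    table = np.asarray(table, dtype=float)
    if symm_cbar:
        bound = np.abs(table).max()
        return -bound, bound
    return table.min(), table.max()
```

The limits could then be passed to Matplotlib, e.g. ax.imshow(table, cmap="RdBu", vmin=lo, vmax=hi), so that zero (no deviation from independence) sits at the center of a diverging colormap.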

Returns:

  • fig (matplotlib.figure.Figure) – The figure object, either newly created or the one passed in.

  • ax (matplotlib.axes._subplots.AxesSubplot) – The axes object, either newly created or the one passed in.

  • chi2_results (tuple<float>) – A tuple in the order of (chi2, p_value, degrees_of_freedom).

  • correlation_metrics (tuple<float>) – A tuple in the order of (phi coefficient, contingency coefficient, Cramér’s V).
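The three association measures are conventionally derived from the chi-squared statistic χ², the sample size n, and the table shape (r rows, c columns): φ = √(χ²/n), the contingency coefficient C = √(χ²/(χ² + n)), and Cramér’s V = √(χ²/(n·(min(r, c) - 1))). The sketch below assumes plot_utils uses these textbook definitions; the helper name association_measures is hypothetical.

```python
import math


def association_measures(chi2, n, n_rows, n_cols):
    """Return (phi, contingency coefficient, Cramer's V) from a chi-squared statistic.

    Textbook definitions; assumed here, not taken from plot_utils.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    k = min(n_rows, n_cols) - 1
    if k < 1:
        raise ValueError("Cramer's V needs at least 2 rows and 2 columns")
    phi = math.sqrt(chi2 / n)                         # generalized phi coefficient
    contingency_coef = math.sqrt(chi2 / (chi2 + n))   # Pearson's contingency coefficient
    cramers_v = math.sqrt(chi2 / (n * k))             # Cramer's V
    return phi, contingency_coef, cramers_v
```

For a 2×2 table, min(r, c) - 1 = 1, so Cramér’s V equals φ. SciPy 1.7 and later also provide scipy.stats.contingency.association, whose method argument selects Cramér’s V ("cramer") or Pearson’s contingency coefficient ("pearson").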