Positive rate

plot_utils.positive_rate(categorical_array, two_classes_array, fig=None, ax=None, figsize=None, dpi=100, barh=True, top_n=None, dropna=False, xlabel=None, ylabel=None, show_stats=True)

Calculate the proportions of the different categories in categorical_array that fall into class “1” (or True) in two_classes_array, and optionally show a figure.

In addition, a Pearson’s chi-squared test is performed to test the independence between categorical_array and two_classes_array. The chi-squared statistic, p-value, and degrees of freedom are returned.
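To make the two computations concrete, here is a minimal pure-Python sketch of a per-category positive rate and a Pearson chi-squared statistic. It is an illustration under stated assumptions, not the library's implementation: the helper names positive_rate_sketch and chi2_statistic_sketch are hypothetical, and the positive class is passed in explicitly.

```python
from collections import Counter


def positive_rate_sketch(categories, labels, positive):
    """Return {category: share of its entries whose label equals `positive`}."""
    totals = Counter(categories)
    positives = Counter(c for c, y in zip(categories, labels) if y == positive)
    return {c: positives[c] / totals[c] for c in totals}


def chi2_statistic_sketch(categories, labels):
    """Pearson chi-squared statistic and degrees of freedom
    for the category-by-class contingency table."""
    n = len(categories)
    row_totals = Counter(categories)            # counts per category
    col_totals = Counter(labels)                # counts per class
    observed = Counter(zip(categories, labels)) # counts per (category, class) cell
    chi2 = 0.0
    for c, row in row_totals.items():
        for y, col in col_totals.items():
            expected = row * col / n            # expected count under independence
            chi2 += (observed[(c, y)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Toy data, made up for illustration only.
categories = ["a", "a", "b", "b", "b", "c"]
labels = [1, 0, 1, 1, 0, 0]

rates = positive_rate_sketch(categories, labels, positive=1)
chi2, dof = chi2_statistic_sketch(categories, labels)
```

The p-value is not computed here; given the statistic and the degrees of freedom, it can be obtained from the chi-squared survival function, for example scipy.stats.chi2.sf(chi2, dof).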

Parameters:
  • categorical_array (list, numpy.ndarray, or pandas.Series) – An array of categorical values.

  • two_classes_array (list, numpy.ndarray, or pandas.Series) – The target variable containing two classes. Each value in this parameter corresponds to the value in categorical_array at the same index, so it must have the same length as categorical_array. The second unique value in this parameter will be considered as the positive class (for example, “True” in [True, False, True], or “3” in [1, 1, 3, 3, 1]).

  • fig (matplotlib.figure.Figure or None) – Figure object. If None, a new figure will be created.

  • ax (matplotlib.axes._subplots.AxesSubplot or None) – Axes object. If None, a new axes will be created.

  • figsize ((float, float)) – Figure size in inches, as a tuple of two numbers. The figure size of fig (if not None) will override this parameter.

  • dpi (float) – Figure resolution. The dpi of fig (if not None) will override this parameter.

  • barh (bool) – Whether or not to show the bars as horizontal (otherwise, vertical).

  • top_n (int) – Show only the top_n categories (ranked by their positive rate) in the figure. Useful when there are too many categories. If None, show all categories.

  • dropna (bool) – If True, ignore entries (in both arrays) where there are missing values in at least one array. If False, the missing values are treated as a new category: “N/A”.

  • xlabel (str) – X-axis label.

  • ylabel (str) – Y-axis label.

  • show_stats (bool) – Whether or not to show the statistical test results (chi2 statistic and p-value) on the figure.
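The data-handling rules described for two_classes_array, dropna, and top_n can be sketched in plain Python as follows. This is one possible reading, not the library's code. All helper names are hypothetical. Two assumptions are made: "second unique value" is taken to mean the second value in sorted order (which matches both documented examples), and with dropna=False only missing categories are relabeled as "N/A".

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def positive_class(labels):
    """Pick the second of the two sorted unique values as the positive class."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    return classes[1]


def handle_missing(categories, labels, dropna):
    """Drop pairs with a missing value, or relabel missing categories as 'N/A'."""
    if dropna:
        pairs = [(c, y) for c, y in zip(categories, labels)
                 if not is_missing(c) and not is_missing(y)]
    else:
        pairs = [("N/A" if is_missing(c) else c, y)
                 for c, y in zip(categories, labels)]
    return [c for c, _ in pairs], [y for _, y in pairs]


def top_n_rates(rates, top_n=None):
    """Rank categories by positive rate (highest first) and keep the first top_n."""
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]
```

Sorting the unique values, rather than using order of first appearance, is what makes True the positive class in [True, False, True].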

Returns:

  • fig (matplotlib.figure.Figure) – The figure object being created or being passed into this function.

  • ax (matplotlib.axes._subplots.AxesSubplot) – The axes object being created or being passed into this function.

  • pos_rate (pandas.Series) – The positive rate of each category in categorical_array.

  • chi2_results (tuple<float>) – A tuple in the order of (chi2, p_value, degree_of_freedom).
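As a rough sketch of how the returned chi2_results tuple might be consumed, the hypothetical helper below unpacks it in the documented order and compares the p-value with a significance level. The name summarize_chi2, the default alpha of 0.05, and the commented-out call (which assumes the four return values come back as one tuple in the listed order) are illustrative assumptions, not part of the library.

```python
def summarize_chi2(chi2_results, alpha=0.05):
    """Turn (chi2, p_value, degree_of_freedom) into a one-line summary."""
    chi2, p_value, dof = chi2_results
    verdict = "reject" if p_value < alpha else "fail to reject"
    return (f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}: "
            f"{verdict} independence at alpha={alpha}")


# Assumed usage with the function documented above (not executed here):
# fig, ax, pos_rate, chi2_results = plot_utils.positive_rate(categories, labels)
# print(summarize_chi2(chi2_results))
```

A small p-value suggests that the positive rate differs across categories; a large one means the data show no evidence against independence.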