Positive rate
- plot_utils.positive_rate(categorical_array, two_classes_array, fig=None, ax=None, figsize=None, dpi=100, barh=True, top_n=None, dropna=False, xlabel=None, ylabel=None, show_stats=True)
Calculate the proportion of each category in categorical_array that falls into class “1” (or True) in two_classes_array, and optionally show a figure.
Also, a Pearson’s chi-squared test is performed to test the independence between categorical_array and two_classes_array. The chi-squared statistic, p-value, and degrees of freedom are returned.
- Parameters:
categorical_array (list, numpy.ndarray, or pandas.Series) – An array of categorical values.
two_classes_array (list, numpy.ndarray, or pandas.Series) – The target variable containing two classes. Each value in this parameter corresponds to the value in categorical_array at the same index, so it must have the same length as categorical_array. The second unique value in this parameter will be considered the positive class (for example, “True” in [True, False, True], or “3” in [1, 1, 3, 3, 1]).
fig (matplotlib.figure.Figure or None) – Figure object. If None, a new figure will be created.
ax (matplotlib.axes._subplots.AxesSubplot or None) – Axes object. If None, a new axes will be created.
figsize ((float, float)) – Figure size in inches, as a tuple of two numbers. The figure size of fig (if not None) will override this parameter.
dpi (float) – Figure resolution. The dpi of fig (if not None) will override this parameter.
barh (bool) – Whether to show the bars as horizontal (otherwise, vertical).
top_n (int) – Only show the top_n categories (ranked by their positive rate) in the figure. Useful when there are too many categories. If None, show all categories.
dropna (bool) – If True, ignore entries (in both arrays) where there are missing values in at least one array. If False, the missing values are treated as a new category: “N/A”.
xlabel (str) – X-axis label.
ylabel (str) – Y-axis label.
show_stats (bool) – Whether to show the statistical test results (chi2 statistic and p-value) on the figure.
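The positive-class rule and the dropna behavior described above can be illustrated with a small standard-library sketch. This is not the plot_utils implementation: positive_class and prepare_pairs are hypothetical helper names, None stands in for a missing value, and the unique values are assumed to be taken in sorted order (which is consistent with both examples given for two_classes_array). Only missing values in the categorical array are relabeled when dropna is False; how missing targets are handled is not stated here.

```python
def positive_class(two_classes):
    """Return the value treated as the positive class (hypothetical helper).

    Assumes "second unique value" means second in sorted order.
    """
    classes = sorted(set(two_classes))
    if len(classes) != 2:
        raise ValueError(f"expected exactly two classes, got {len(classes)}")
    return classes[1]


def prepare_pairs(categories, two_classes, dropna=False):
    """Pair up the two arrays and handle missing values (None in this sketch)."""
    if len(categories) != len(two_classes):
        raise ValueError("both arrays must have the same length")
    pairs = list(zip(categories, two_classes))
    if dropna:
        # Drop an entry if either array has a missing value at that index.
        return [(c, y) for c, y in pairs if c is not None and y is not None]
    # Otherwise, missing categories become their own "N/A" category.
    return [("N/A" if c is None else c, y) for c, y in pairs]


print(positive_class([True, False, True]))
print(positive_class([1, 1, 3, 3, 1]))
print(prepare_pairs(["a", None, "b"], [1, 0, 1], dropna=True))
print(prepare_pairs(["a", None, "b"], [1, 0, 1], dropna=False))
```

With these inputs, the first two calls pick True and 3 as the positive class, matching the examples in the parameter description.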
- Returns:
fig (matplotlib.figure.Figure) – The figure object being created or being passed into this function.
ax (matplotlib.axes._subplots.AxesSubplot) – The axes object being created or being passed into this function.
pos_rate (pandas.Series) – The positive rate of each category in categorical_array.
chi2_results (tuple<float>) – A tuple in the order (chi2, p_value, degree_of_freedom).
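To make the returned quantities concrete, here is a minimal standard-library sketch of how a per-category positive rate and a Pearson chi-squared statistic can be computed from the two arrays. It is an illustration under stated assumptions, not the plot_utils code: positive_rate_sketch is a hypothetical name, it returns a plain dict rather than a pandas.Series, it draws no figure, and it omits the p-value (which needs the chi-squared distribution, for example from SciPy).

```python
from collections import Counter


def positive_rate_sketch(categories, two_classes):
    """Return (pos_rate, chi2, dof) for two equal-length sequences.

    Illustrative only; every category is assumed to be non-missing.
    """
    if len(categories) != len(two_classes):
        raise ValueError("both arrays must have the same length")
    classes = sorted(set(two_classes))
    if len(classes) != 2:
        raise ValueError("two_classes must contain exactly two classes")
    positive = classes[1]  # assumed: second unique value in sorted order

    totals = Counter(categories)
    positives = Counter(
        c for c, y in zip(categories, two_classes) if y == positive
    )
    pos_rate = {c: positives[c] / totals[c] for c in totals}

    # Pearson chi-squared statistic on the (category x class) contingency table.
    n = len(categories)
    n_pos = sum(positives.values())
    chi2 = 0.0
    for c, total in totals.items():
        cells = (
            (positives[c], n_pos),              # positive-class cell
            (total - positives[c], n - n_pos),  # negative-class cell
        )
        for observed, class_total in cells:
            expected = total * class_total / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(totals) - 1) * (len(classes) - 1)
    return pos_rate, chi2, dof


cats = ["a", "a", "a", "a", "b", "b", "b", "b"]
target = [1, 1, 1, 0, 1, 0, 0, 0]
print(positive_rate_sketch(cats, target))
# → ({'a': 0.75, 'b': 0.25}, 2.0, 1)
```

In practice, scipy.stats.chi2_contingency can compute the statistic, p-value, and degrees of freedom from a contingency table in one call. Note that it applies Yates' continuity correction to 2x2 tables by default, so its statistic can differ from the uncorrected value above; whether positive_rate relies on it is not stated in this description.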