Category means
- plot_utils.category_means(categorical_array, continuous_array, fig=None, ax=None, figsize=None, dpi=100, title=None, xlabel=None, ylabel=None, rot=0, dropna=False, show_stats=True, sort_by='name', vert=True, plot_violins=True, **extra_kwargs)
Summarize the mean values of the entries of continuous_array corresponding to each distinct category in categorical_array, and show a violin plot to visualize them. The violin plot shows the distribution of values in continuous_array corresponding to each category in categorical_array.

Also, a one-way ANOVA test (H0: different categories in categorical_array yield the same average values in continuous_array) is performed, and the F statistic and p-value are returned.
- Parameters:
categorical_array (list, numpy.ndarray, or pandas.Series) – A vector of categorical values.
continuous_array (list, numpy.ndarray, or pandas.Series) – The target variable whose values correspond to the values in categorical_array. Must have the same length as categorical_array. continuous_array naturally contains continuous values, but the function should also work if it contains categorical values expressed as integers (not strings).
fig (matplotlib.figure.Figure or None) – Figure object. If None, a new figure will be created.
ax (matplotlib.axes._subplots.AxesSubplot or None) – Axes object. If None, a new axes will be created.
figsize ((float, float)) – Figure size in inches, as a tuple of two numbers. The figure size of fig (if not None) will override this parameter.
dpi (float) – Figure resolution. The dpi of fig (if not None) will override this parameter.
title (str) – The title of the violin plot, usually the name of categorical_array.
xlabel (str) – The label for the x axis (i.e., categories) of the violin plot. If None and categorical_array is a pandas Series, use the ‘name’ attribute of categorical_array as xlabel.
ylabel (str) – The label for the y axis (i.e., average continuous_array values) of the violin plot. If None and continuous_array is a pandas Series, use the ‘name’ attribute of continuous_array as ylabel.
rot (float) – The rotation (in degrees) of the x axis labels.
dropna (bool) – Whether or not to exclude N/A records in the data.
show_stats (bool) – Whether or not to show the statistical test results (F statistic and p-value) on the figure.
sort_by ({'name', 'mean', 'median', None}) – Option to arrange the different categories in categorical_array in the violin plot. None means no sorting, i.e., using the hashed order of the category names; ‘mean’ and ‘median’ mean sorting the violins according to the mean/median values of each category; ‘name’ means sorting the violins according to the category names.
vert (bool) – Whether to draw the violins vertically.
plot_violins (bool) – If True, use violin plots to illustrate the distribution of groups. Otherwise, use multi-histogram (hist_multi()).
**extra_kwargs – Keyword arguments to be passed to plt.violinplot() or hist_multi() (https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.violinplot.html). Note that this subroutine overrides the default behavior of violinplot: showmeans is overridden to True and showextrema to False.
- Returns:
fig (matplotlib.figure.Figure) – The figure object being created or being passed into this function.
ax (matplotlib.axes._subplots.AxesSubplot) – The axes object being created or being passed into this function.
mean_values (dict) – A dictionary whose keys are the categories in categorical_array, and whose values are the mean values of the corresponding entries in continuous_array.
F_test_result (tuple<float>) – A tuple in the order of (F_stat, p_value), where F_stat is the computed F-value of the one-way ANOVA test, and p_value is the associated p-value from the F-distribution.
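
The following is a minimal sketch of the quantities described above, not the library's actual implementation. It uses only the Python standard library to compute the per-category means (analogous to mean_values) and the one-way ANOVA F statistic (analogous to F_stat). The helper name group_means_and_f_stat and the sample data are hypothetical. The p-value is omitted because it requires the F-distribution; scipy.stats.f_oneway is one existing function that returns both values, though whether category_means uses it internally is an assumption.

```python
from collections import defaultdict
from statistics import fmean


def group_means_and_f_stat(categories, values):
    """Return ({category: mean}, F statistic) for a one-way ANOVA.

    Hypothetical helper for illustration only. Assumes at least two
    categories and some within-category variation (otherwise the
    F statistic is undefined and a ZeroDivisionError is raised).
    """
    # Collect the continuous values belonging to each category.
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    # Mean of the continuous values within each category.
    mean_values = {cat: fmean(vals) for cat, vals in groups.items()}

    n_total = sum(len(vals) for vals in groups.values())
    k = len(groups)
    grand_mean = fmean(v for vals in groups.values() for v in vals)

    # Between-group and within-group sums of squares.
    ss_between = sum(
        len(vals) * (mean_values[cat] - grand_mean) ** 2
        for cat, vals in groups.items()
    )
    ss_within = sum(
        (v - mean_values[cat]) ** 2
        for cat, vals in groups.items()
        for v in vals
    )

    # F = (between-group mean square) / (within-group mean square).
    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return mean_values, f_stat


# Made-up sample data: three categories with three observations each.
categories = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
values = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

mean_values, f_stat = group_means_and_f_stat(categories, values)
print(mean_values)
print(round(f_stat, 3))
```

Based on the Returns list above, a call to the documented function would presumably be unpacked as fig, ax, mean_values, F_test_result = plot_utils.category_means(categorical_array, continuous_array). A large F statistic with a small p-value suggests that at least one category mean differs from the others.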