Multiple histograms
- plot_utils.hist_multi(X, bins=10, fig=None, ax=None, figsize=None, dpi=100, nan_warning=False, showmeans=True, showmedians=False, vert=True, data_names=[], rot=45, name_ax_label=None, data_ax_label=None, sort_by=None, title=None, show_vals=True, show_pct_diff=False, baseline_data_index=0, legend_loc='best', show_counts_on_data_ax=True, **extra_kwargs)
Generate multiple histograms, one for each data set within X.
- Parameters:
X (pandas.DataFrame, pandas.Series, numpy.ndarray, or dict) –
The data to be visualized. It can be of the following types:
- pandas.DataFrame:
Each column contains a set of data
- pandas.Series:
Contains only one set of data
- numpy.ndarray:
1D numpy array: only one set of data
2D numpy array: each column contains a set of data
Higher dimensional numpy array: not allowed
- dict:
Each key-value pair is one set of data
- list of lists:
Each sub-list is a data set
Note that the NaN values in the data are implicitly excluded.
bins (int or sequence or str) – If an integer is given, the whole range of data (i.e., all the numbers within X) is divided into bins segments. If a sequence or str is given, it is passed to the bins argument of matplotlib.pyplot.hist().
fig (matplotlib.figure.Figure or None) – Figure object. If None, a new figure will be created.
ax (matplotlib.axes._subplots.AxesSubplot or None) – Axes object. If None, a new axes will be created.
figsize ((float, float)) – Figure size in inches, as a tuple of two numbers. The figure size of fig (if not None) will override this parameter.
dpi (float) – Figure resolution. The dpi of fig (if not None) will override this parameter.
nan_warning (bool) – Whether to show a warning if there are NaN values in the data.
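As a rough illustration of the integer bins behavior and the implicit NaN exclusion described above, the sketch below drops NaN values and then splits the overall range of all data sets into equal segments, so every data set shares the same bin edges. It uses only the standard library. The helper names drop_nans and shared_bin_edges are hypothetical and are not part of plot_utils; the actual implementation may differ.

```python
import math


def drop_nans(values):
    """Return the values with NaN entries removed."""
    return [v for v in values if not math.isnan(v)]


def shared_bin_edges(data_sets, bins=10):
    """Split the overall range of all data sets into `bins` equal segments.

    `data_sets` is a dict mapping a name to a list of floats.
    Assumption: edges are evenly spaced between the global min and max.
    """
    cleaned = {name: drop_nans(vals) for name, vals in data_sets.items()}
    all_values = [v for vals in cleaned.values() for v in vals]
    lo, hi = min(all_values), max(all_values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


data = {"a": [0.0, 1.0, float("nan")], "b": [2.0, 4.0]}
edges = shared_bin_edges(data, bins=4)
print(edges)
```

Because the edges come from the combined range rather than from each data set separately, histograms of different data sets stay directly comparable along the data axis.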
showmeans (bool) – Whether to show the mean values of each data group.
showmedians (bool) – Whether to show the median values of each data group.
vert (bool) – Whether to show the “base” of the histograms as vertical.
data_names (list<str>, [], or None) – The names of each data set, to be shown as the axis tick label of each data set. If [] or None, it will be determined automatically. If X is a:
- numpy.ndarray:
data_names = [‘data_0’, ‘data_1’, ‘data_2’, …]
- pandas.Series:
data_names = X.name
- pandas.DataFrame:
data_names = list(X.columns)
- dict:
data_names = list(X.keys())
rot (float) – The rotation (in degrees) of the data_names when shown as the tick labels. If vert is False, rot has no effect.
name_ax_label (str) – The label of the “name axis”. (“Name axis” is the axis along which different histograms are presented.)
data_ax_label (str) – The label of the “data axis”. (“Data axis” is the axis along which the data values are presented.)
sort_by ({‘name’, ‘mean’, ‘median’, None}) – Option to sort the different data groups in X in the plot. None means no sorting, keeping the order as provided; ‘mean’ and ‘median’ mean sorting the histograms according to the mean/median values of each data group; ‘name’ means sorting the histograms according to the names of the groups.
title (str) – The title of the plot.
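The sort_by options can be pictured with a small standard-library sketch. The helper sort_groups is hypothetical and not part of plot_utils, and ascending order is an assumption, since the description above does not state a sort direction.

```python
import statistics


def sort_groups(data_sets, sort_by=None):
    """Return group names in display order for a dict of name -> values.

    Assumption: sorting is ascending; the documented options do not
    specify a direction.
    """
    names = list(data_sets.keys())
    if sort_by is None:
        return names  # keep the order as provided
    if sort_by == "name":
        return sorted(names)
    if sort_by == "mean":
        return sorted(names, key=lambda n: statistics.mean(data_sets[n]))
    if sort_by == "median":
        return sorted(names, key=lambda n: statistics.median(data_sets[n]))
    raise ValueError("sort_by must be 'name', 'mean', 'median', or None")


groups = {"c": [1, 2, 30], "a": [5, 6, 7], "b": [3, 4, 5]}
for option in (None, "name", "mean", "median"):
    print(option, sort_groups(groups, sort_by=option))
```

The group "c" has a large outlier, so it sorts last by mean but first by median, which shows why the two options can give different orders.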
show_vals (bool) – Whether to show mean and/or median values along the mean/median bars. Only effective if showmeans and/or showmedians are turned on.
show_pct_diff (bool) – Whether to show percent difference of mean and/or median values between different data sets. Only effective when show_vals is set to True.
baseline_data_index (int) – Which data set is considered the “baseline” when showing percent differences.
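To make the role of baseline_data_index concrete, the following sketch computes a percent difference of means against a chosen baseline data set. The formula used, 100 * (mean - baseline_mean) / baseline_mean, is an assumption about what “percent difference” means here, and the helper pct_diff_of_means is hypothetical rather than part of plot_utils.

```python
import statistics


def pct_diff_of_means(data_sets, baseline_data_index=0):
    """Percent difference of each group's mean relative to a baseline group.

    Assumption: percent difference is 100 * (mean - baseline) / baseline.
    `data_sets` is a dict of name -> values; the baseline is picked by
    its position in the dict's insertion order.
    """
    names = list(data_sets.keys())
    baseline = statistics.mean(data_sets[names[baseline_data_index]])
    return {
        name: 100.0 * (statistics.mean(data_sets[name]) - baseline) / baseline
        for name in names
    }


groups = {"control": [10, 10, 10], "treated": [12, 12, 12]}
diffs = pct_diff_of_means(groups, baseline_data_index=0)
print(diffs)
```

Under this assumed formula the baseline data set always reports a difference of zero, and changing baseline_data_index changes every other value.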
legend_loc (str) – The location specification for the legend.
show_counts_on_data_ax (bool) – Whether to show counts beside the histograms.
**extra_kwargs (dict) – Other keyword arguments to be passed to matplotlib.pyplot.bar().
- Returns:
fig (matplotlib.figure.Figure) – The figure object being created or being passed into this function.
ax (matplotlib.axes._subplots.AxesSubplot) – The axes object being created or being passed into this function.