Correlation matrix

plot_utils.correlation_matrix(X, color_map='RdBu_r', fig=None, ax=None, figsize=None, dpi=100, variable_names=None, rot=45, scatter_plots=False)

Plot the correlation matrix of a dataset X, in which each column is a different variable (that is, a sample of one random variable).

Parameters:
  • X (numpy.ndarray or pandas.DataFrame) – The data set.

  • color_map (str or matplotlib.colors.Colormap) – The color scheme used to distinguish strongly positive, weak, and strongly negative correlations. Valid names are listed at https://matplotlib.org/users/colormaps.html. Diverging color maps are recommended: PiYG, PRGn, BrBG, PuOr, RdGy, RdBu, RdYlBu, RdYlGn, Spectral, coolwarm, bwr, seismic.

  • fig (matplotlib.figure.Figure or None) – Figure object. If None, a new figure will be created.

  • ax (matplotlib.axes._subplots.AxesSubplot or None) – Axes object. If None, a new axes will be created.

  • figsize ((float, float)) – Figure size in inches, as a tuple of two numbers. The figure size of fig (if not None) will override this parameter.

  • dpi (float) – Figure resolution. The dpi of fig (if not None) will override this parameter.

  • variable_names (list<str>) – Names of the variables in X. If X is a pandas DataFrame, this argument is not needed: the column names of X are automatically used as variable names. If X is a numpy array and this argument is not provided, X’s column indices are used. The length of variable_names should match the number of columns in X; if it does not, a warning (not an error) is issued.

  • rot (float) – The rotation of the x axis labels, in degrees.

  • scatter_plots (bool) – Whether or not to show the scatter plots of pairs of variables.
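
The naming rules described for variable_names can be sketched as a small helper. This is only an illustration, not the library's code: resolve_variable_names is a hypothetical name, and falling back to column indices after the length-mismatch warning is an assumption, since the documentation states only that a warning is issued.

```python
import warnings

import numpy as np
import pandas as pd


def resolve_variable_names(X, variable_names=None):
    """Hypothetical helper: choose one label per column of X."""
    if isinstance(X, pd.DataFrame):
        # For a DataFrame, the column names are used automatically.
        return [str(c) for c in X.columns]

    n_cols = np.asarray(X).shape[1]
    if variable_names is None:
        # For a numpy array without names, use the column indices.
        return [str(i) for i in range(n_cols)]

    if len(variable_names) != n_cols:
        # A warning, not an error, on a length mismatch.
        warnings.warn(
            "Length of variable_names does not match the number of columns in X."
        )
        # Assumption: fall back to column indices after warning.
        return [str(i) for i in range(n_cols)]

    return list(variable_names)
```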

Returns:

  • correlations (pandas.DataFrame) – The correlation matrix.

  • fig (matplotlib.figure.Figure) – The figure object that was created by, or passed into, this function.

  • ax (matplotlib.axes._subplots.AxesSubplot) – The axes object that was created by, or passed into, this function.
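
To give a feel for the three return values, here is a rough stand-in built only on pandas and matplotlib. It is not the implementation of plot_utils.correlation_matrix: the name correlation_matrix_sketch is hypothetical, the use of Pearson correlation (the pandas default) is an assumption, and the scatter_plots, fig, ax and variable_names options are left out.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def correlation_matrix_sketch(X, color_map="RdBu_r", figsize=(5, 5), dpi=100, rot=45):
    """Hypothetical stand-in: returns (correlations, fig, ax)."""
    df = X if isinstance(X, pd.DataFrame) else pd.DataFrame(np.asarray(X))

    # Assumption: Pearson correlation between every pair of columns.
    correlations = df.corr()

    fig, ax = plt.subplots(figsize=figsize, dpi=dpi)
    # Pin the color scale to [-1, 1] so that the midpoint of a
    # diverging color map corresponds to zero correlation.
    image = ax.imshow(correlations.to_numpy(), cmap=color_map, vmin=-1, vmax=1)
    fig.colorbar(image, ax=ax)

    labels = [str(c) for c in correlations.columns]
    ticks = np.arange(len(labels))
    ax.set_xticks(ticks)
    ax.set_yticks(ticks)
    ax.set_xticklabels(labels, rotation=rot)
    ax.set_yticklabels(labels)
    return correlations, fig, ax


# Example data: column "d" is constructed to be strongly anti-correlated with "a".
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
data["d"] = -data["a"] + 0.1 * rng.normal(size=200)

correlations, fig, ax = correlation_matrix_sketch(data)
```

With the library installed, and assuming the three documented values come back as a tuple in the order listed, the corresponding call would be correlations, fig, ax = plot_utils.correlation_matrix(data).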