
Stats API Reference

Statistical transformations for data.

stat_identity

ggplotly.stats.stat_identity.stat_identity

Bases: Stat

Identity statistical transformation (no transformation).

This stat passes data through unchanged. It's the default stat for most geoms when you want to display raw data values without any aggregation or transformation.

Parameters:

    data (DataFrame, default None): Data to use for this stat.
    mapping (dict, default None): Aesthetic mappings.
    **params: Additional parameters for the stat.

Examples:

>>> ggplot(df, aes(x='x', y='y')) + geom_point(stat='identity')
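The pass-through behaviour can be sketched in a few lines. `identity_compute` below is an illustrative name, not part of the ggplotly API; it mirrors what the Returns section describes:

```python
import pandas as pd

def identity_compute(data: pd.DataFrame, mapping: dict):
    # Identity transformation: data and mapping leave exactly as they arrived.
    return data, mapping

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
out, m = identity_compute(df, {"x": "x", "y": "y"})
```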

__init__(data=None, mapping=None, **params)

Initialize the stat_identity.

Parameters:

    data (DataFrame, default None): Data to use for this stat.
    mapping (dict, default None): Aesthetic mappings.
    **params: Additional parameters.

compute(data)

Return the data unchanged.

Parameters:

    data (DataFrame, required): The input data.

Returns:

    tuple: (unchanged data, unchanged mapping)

stat_count

ggplotly.stats.stat_count.stat_count

Bases: Stat

Count the number of observations in each group.

This stat is used internally by geom_bar when you want to display counts of categorical data.

Parameters:

    data (DataFrame, default None): Data to use for this stat.
    mapping (dict, default None): Aesthetic mappings.
    **params: Additional parameters.

Examples:

>>> ggplot(df, aes(x='category')) + geom_bar(stat='count')
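The counting step can be sketched with a pandas groupby. `count_compute` is an illustrative helper, not the library's API; it produces one row per unique x value with a `count` column, as the description above implies:

```python
import pandas as pd

def count_compute(data: pd.DataFrame, x: str) -> pd.DataFrame:
    # One row per unique x value; 'count' holds the number of observations.
    return (data.groupby(x, sort=False)
                .size()
                .reset_index(name="count"))

df = pd.DataFrame({"category": ["a", "b", "a", "a", "b"]})
counts = count_compute(df, "category")
```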

__init__(data=None, mapping=None, **params)

Initialize the stat_count.

Parameters:

    data (DataFrame, default None): Data to use for this stat.
    mapping (dict, default None): Aesthetic mappings.
    **params: Additional parameters.

compute(data)

Compute counts for each group in the data.

Parameters:

    data (DataFrame, required): The input data.

Returns:

    tuple: (transformed DataFrame, updated mapping dict)

stat_bin

ggplotly.stats.stat_bin.stat_bin

Bases: Stat

Bin continuous data for histograms.

This stat divides continuous data into bins and counts the number of observations in each bin. It's used internally by geom_histogram.

Parameters:

    bins (int, default 30): Number of bins to create. Ignored if binwidth is specified.
    binwidth (float, default None): Width of each bin. Overrides bins.
    boundary (float, default None): Bin boundary. One bin edge will be at this value.
    center (float, default None): Bin center. One bin center will be at this value. Mutually exclusive with boundary.
    breaks (array-like, default None): Explicit bin breaks. Overrides bins and binwidth.
    closed (str, default 'right'): Which side of bins is closed.
      • 'right' (default): bins are (a, b]
      • 'left': bins are [a, b)
    pad (bool, default False): If True, add empty bins at start and end.
    na_rm (bool, default False): If True, remove NA values.
Computed variables
  • count: Number of observations in bin
  • density: Density of observations (count / total / width)
  • ncount: Count scaled to maximum of 1
  • ndensity: Density scaled to maximum of 1
  • width: Width of each bin
  • x: Bin center
  • xmin: Bin left edge
  • xmax: Bin right edge

Examples:

>>> ggplot(df, aes(x='value')) + geom_histogram(bins=20)
>>> ggplot(df, aes(x='value')) + geom_histogram(binwidth=0.5)
>>> ggplot(df, aes(x='value')) + geom_histogram(boundary=0)
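The bins/binwidth/boundary rules above can be sketched with numpy. `bin_compute` is an illustrative re-implementation, not the ggplotly function itself; note that `np.histogram` closes bins on the left, unlike the 'right' default documented above:

```python
import numpy as np

def bin_compute(x, bins=30, binwidth=None, boundary=None):
    # Sketch of the binning rules: binwidth overrides bins, and boundary
    # shifts the grid so one bin edge lands exactly on that value.
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if binwidth is None:
        binwidth = (hi - lo) / bins
    if boundary is not None:
        lo = boundary + np.floor((lo - boundary) / binwidth) * binwidth
    edges = np.arange(lo, hi + binwidth, binwidth)
    count, edges = np.histogram(x, bins=edges)   # left-closed bins
    width = np.diff(edges)
    center = edges[:-1] + width / 2
    density = count / count.sum() / width        # integrates to 1
    return {"x": center, "count": count, "density": density,
            "xmin": edges[:-1], "xmax": edges[1:], "width": width}

res = bin_compute([0.1, 0.4, 0.5, 0.9, 1.2], binwidth=0.5, boundary=0)
```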

__init__(data=None, mapping=None, bins=30, binwidth=None, boundary=None, center=None, breaks=None, closed='right', pad=False, na_rm=False, **params)

Initialize the binning stat.

Parameters:

    data (DataFrame, default None): Data to use.
    mapping (dict, default None): Aesthetic mappings.
    bins (int, default 30): Number of bins.
    binwidth (float, default None): Width of bins.
    boundary (float, default None): Bin boundary position.
    center (float, default None): Bin center position.
    breaks (array-like, default None): Explicit bin breaks.
    closed (str, default 'right'): Which side is closed ('right' or 'left').
    pad (bool, default False): Add empty edge bins.
    na_rm (bool, default False): Remove NA values.
    **params: Additional parameters.

compute(data, bins=None)

Compute bin counts for the data.

Parameters:

    data (DataFrame, required): Data containing the variable to bin.
    bins (int, default None): Number of bins. Defaults to self.bins.

Returns:

    DataFrame: Data with binning information including:
      • x: bin centers
      • count: counts per bin
      • density: density per bin
      • ncount: normalized count
      • ndensity: normalized density
      • width: bin width
      • xmin: bin left edge
      • xmax: bin right edge

stat_density

ggplotly.stats.stat_density.stat_density

Bases: Stat

Compute kernel density estimate for continuous data.

This stat performs kernel density estimation, useful for visualizing the distribution of a continuous variable as a smooth curve.

Parameters:

    data (DataFrame, default None): Data to use for this stat.
    mapping (dict, default None): Aesthetic mappings.
    bw (str or float, default 'nrd0'): Bandwidth method or value.
      • 'nrd0' (default): Silverman's rule-of-thumb (R default)
      • 'nrd': Scott's variation of Silverman's rule
      • 'scott': Scott's rule
      • 'silverman': Silverman's rule
      • float: Explicit bandwidth value
    adjust (float, default 1): Bandwidth adjustment multiplier. Larger values produce smoother curves.
    kernel (str, default 'gaussian'): Kernel function. Note: scipy only supports the gaussian kernel.
    n (int, default 512): Number of equally spaced points for density evaluation (matching R's default).
    trim (bool, default False): If True, trim the density curve to the data range; otherwise extend slightly beyond it.
    na_rm (bool, default False): If True, remove NA values.
    **params: Additional parameters for the stat.
Computed variables
  • x: Evaluation points
  • y: Density estimates (integrate to 1)
  • density: Same as y
  • count: Density * n (useful for histograms)
  • scaled: Density scaled to maximum of 1
  • ndensity: Alias for scaled

Examples:

>>> stat_density()  # Default: nrd0 bandwidth, 512 points
>>> stat_density(bw='scott', adjust=0.5)  # Narrower bandwidth
>>> stat_density(n=256, trim=True)  # Fewer points, trimmed to data range
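The KDE step can be sketched with scipy. `density_compute` is an illustrative helper, not the library's code, and scipy's `gaussian_kde` defaults to Scott's rule rather than R's nrd0, so bandwidths will differ slightly from the defaults described above:

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_compute(x, n=512, adjust=1.0):
    # Gaussian KDE evaluated on an evenly spaced grid over the data range.
    x = np.asarray(x, dtype=float)
    kde = gaussian_kde(x)
    kde.set_bandwidth(kde.factor * adjust)   # 'adjust' multiplies the bandwidth
    grid = np.linspace(x.min(), x.max(), n)
    y = kde(grid)
    return {"x": grid, "y": y, "scaled": y / y.max()}

rng = np.random.default_rng(0)
res = density_compute(rng.normal(size=200), n=256)
```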

__init__(data=None, mapping=None, bw='nrd0', adjust=1, kernel='gaussian', n=512, trim=False, na_rm=False, **params)

Initialize the density stat.

Parameters:

    data (DataFrame, default None): Data to use for this stat.
    mapping (dict, default None): Aesthetic mappings.
    bw (str or float, default 'nrd0'): Bandwidth method or value.
    adjust (float, default 1): Bandwidth adjustment multiplier.
    kernel (str, default 'gaussian'): Kernel function.
    n (int, default 512): Number of evaluation points.
    trim (bool, default False): Trim to data range.
    na_rm (bool, default False): Remove NA values.
    **params: Additional parameters.

compute(data)

Estimates density for density plots.

Parameters:

    data (DataFrame or array-like, required): Data for density estimation. If a DataFrame, uses the column specified in mapping['x'].

Returns:

    tuple: (DataFrame with density data, updated mapping dict)

compute_array(x)

Compute density for a given array (backward compatibility).

Parameters:

    x (array-like, required): Data for density estimation.

Returns:

    dict: Contains 'x', 'y', 'density', 'count', 'scaled', 'ndensity'.

stat_smooth

ggplotly.stats.stat_smooth.stat_smooth

Bases: Stat

Stat for computing smoothed lines (LOESS, linear regression, etc.).

Handles the computation of smoothed values, which are then passed to geom_smooth for visualization.

Parameters:

    data (DataFrame, default None): Data to use for this stat.
    mapping (dict, default None): Aesthetic mappings.
    method (str, default 'loess'): The smoothing method.
      • 'loess': Custom LOESS with degree-2 polynomials (default, matches R)
      • 'lowess': statsmodels lowess (degree-1, faster)
      • 'lm': Linear regression
    span (float, default 2/3): The smoothing parameter for LOESS (fraction of points to use), matching R's loess default.
    se (bool, default True): Whether to compute standard errors.
    level (float, default 0.95): Confidence level for intervals (95% CI), matching R's ggplot2 default.
    degree (int, default 2): Polynomial degree for LOESS fitting (1 or 2).
    **params: Additional parameters for the stat.
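The method='lm' path can be sketched as ordinary least squares with a pointwise confidence band. `lm_smooth` is an illustrative helper under the standard OLS interval formula, not ggplotly's implementation:

```python
import numpy as np
from scipy import stats

def lm_smooth(x, y, level=0.95):
    # Fit y = intercept + slope * x, then build a pointwise CI using the
    # usual OLS standard error: s * sqrt(1/n + (x - xbar)^2 / Sxx).
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)
    yhat = intercept + slope * x
    resid = y - yhat
    s = np.sqrt(resid @ resid / (n - 2))              # residual std error
    sxx = ((x - x.mean()) ** 2).sum()
    se = s * np.sqrt(1 / n + (x - x.mean()) ** 2 / sxx)
    t = stats.t.ppf((1 + level) / 2, df=n - 2)        # two-sided critical value
    return yhat, yhat - t * se, yhat + t * se

x = np.arange(10.0)
y = 2 * x + 1          # exact line, so the band collapses onto the fit
yhat, ymin, ymax = lm_smooth(x, y)
```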

__init__(data=None, mapping=None, method='loess', span=2 / 3, se=True, level=0.95, degree=2, **params)

Initializes the smoothing stat.

Parameters:

    data (DataFrame, default None): Data to use for this stat.
    mapping (dict, default None): Aesthetic mappings.
    method (str, default 'loess'): The smoothing method.
      • 'loess': Custom LOESS with degree-2 polynomials (default, matches R)
      • 'lowess': statsmodels lowess (degree-1, faster)
      • 'lm': Linear regression
    span (float, default 2/3): The smoothing parameter for LOESS (fraction of points to use), matching R's loess default.
    se (bool, default True): Whether to compute standard errors.
    level (float, default 0.95): Confidence level for intervals (95% CI), matching R's ggplot2 default.
    degree (int, default 2): Polynomial degree for LOESS fitting (1 or 2).
    **params: Additional parameters.

apply_smoothing(x, y, return_hat_diag=False)

Applies smoothing based on the chosen method.

Parameters:

    x (array-like, required): The x-values.
    y (array-like, required): The y-values.
    return_hat_diag (bool, default False): If True, also return the diagonal of the hat matrix (LOESS only).

Returns:

    Smoothed y-values, or a tuple (smoothed_y, hat_diag) if return_hat_diag=True.

compute(data)

Compute smoothed values for the data.

Parameters:

    data (DataFrame, required): Input data with x and y columns.

Returns:

    tuple: (DataFrame with smoothed values, updated mapping dict)

compute_confidence_intervals(x, y, smoothed_y, hat_diag=None)

Compute confidence intervals for the smoothed line.

Parameters:

    x (array-like, required): The x-values.
    y (array-like, required): The original y-values.
    smoothed_y (array-like, required): The smoothed y-values.
    hat_diag (array-like, default None): Diagonal of the hat matrix (for LOESS with exact CI).

Returns:

    tuple: (ymin, ymax) arrays for confidence interval bounds.

compute_stat(data, x_col='x', y_col='y')

Computes the stat for smoothing, modifying the data with smoothed values.

Parameters:

    data (DataFrame, required): The input data containing x and y columns.
    x_col (str, default 'x'): Name of the x column.
    y_col (str, default 'y'): Name of the y column.

Returns:

    DataFrame: Modified data with smoothed 'y' values and optional confidence intervals.

stat_summary

ggplotly.stats.stat_summary.stat_summary

Bases: Stat

Summarize y values at each unique x.

Computes summary statistics (mean, median, etc.) of y for each x value. Can compute central tendency and error bars in one step.

Parameters:

    fun (str or callable, default 'mean'): Function for the central value.
      • 'mean' (default), 'median', 'min', 'max', 'sum'
      • Or a custom function that takes a Series and returns a scalar
      Alias: fun_y (deprecated, for backward compatibility).
    fun_min (str or callable, default None): Function for the lower error bar. Alias: fun_ymin (deprecated, for backward compatibility).
    fun_max (str or callable, default None): Function for the upper error bar. Alias: fun_ymax (deprecated, for backward compatibility).
    fun_data (str or callable, default None): Function that returns y, ymin, ymax together. Built-in options:
      • 'mean_se': mean +/- standard error
      • 'mean_cl_normal': mean +/- 95% CI (t-distribution)
      • 'mean_sdl': mean +/- 1 SD
      • 'median_hilow': median with 95% quantile range
    fun_args (dict, default None): Additional arguments passed to fun/fun_min/fun_max.
    geom (str, default 'pointrange'): Default geom to use. Options: 'pointrange', 'errorbar', 'point'.
    na_rm (bool, default False): If True, remove NA values before computation.
Aesthetics computed
  • y: The central summary value
  • ymin: Lower bound (if fun_min or fun_data provided)
  • ymax: Upper bound (if fun_max or fun_data provided)

Examples:

Mean with standard error bars

geom_pointrange(stat='summary', fun_data='mean_se')

Median with 95% quantile range

stat_summary(fun_data='median_hilow')

Custom: mean with min/max range (R-style parameter names)

stat_summary(fun='mean', fun_min='min', fun_max='max')

Custom: mean with min/max range (legacy parameter names, still supported)

stat_summary(fun_y='mean', fun_ymin='min', fun_ymax='max')

Custom function

stat_summary(fun=lambda x: x.quantile(0.75))
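The built-in 'mean_se' summary can be sketched with pandas. `mean_se` below is an illustrative re-implementation of that option (mean +/- one standard error), not ggplotly's internal function:

```python
import numpy as np
import pandas as pd

def mean_se(s: pd.Series) -> pd.Series:
    # Central value plus/minus one standard error of the mean.
    m = s.mean()
    se = s.std(ddof=1) / np.sqrt(len(s))
    return pd.Series({"y": m, "ymin": m - se, "ymax": m + se})

df = pd.DataFrame({"x": ["a", "a", "a", "b", "b", "b"],
                   "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
# One y/ymin/ymax row per unique x, as stat_summary's output describes.
summary = df.groupby("x")["y"].apply(mean_se).unstack().reset_index()
```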

fun property writable

R-style alias for fun_y.

fun_max property writable

R-style alias for fun_ymax.

fun_min property writable

R-style alias for fun_ymin.

compute(data)

Compute summary statistics for each x value.

Parameters:

    data (DataFrame, required): Input data with x and y columns.

Returns:

    tuple: (summarized DataFrame, updated mapping)

stat_ecdf

ggplotly.stats.stat_ecdf.stat_ecdf

Bases: Stat

Compute the empirical cumulative distribution function.

The ECDF shows the proportion of data points less than or equal to each value. It is useful for visualizing distributions without the binning that histograms require.

Parameters:

    data (DataFrame, default None): Data to use for this stat.
    mapping (dict, default None): Aesthetic mappings.
    n (int, default None): Number of points to evaluate. Default uses all unique values.
    pad (bool, default False): If True, pad the ECDF with (min-eps, 0) and (max+eps, 1).
    **params: Additional parameters for the stat.

Examples:

>>> ggplot(df, aes(x='value')) + geom_step(stat='ecdf')
>>> ggplot(df, aes(x='value')) + geom_line(stat='ecdf')
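The underlying computation is small enough to sketch directly. `ecdf_compute` is an illustrative helper, not the library's API; at each sorted value it reports the fraction of observations less than or equal to it:

```python
import numpy as np

def ecdf_compute(x):
    # ECDF: sorted values on x, cumulative proportion on y.
    x = np.sort(np.asarray(x, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

xs, ys = ecdf_compute([3.0, 1.0, 2.0, 2.0])
```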

__init__(data=None, mapping=None, n=None, pad=False, **params)

Initialize the stat_ecdf.

Parameters:

    data (DataFrame, default None): Data to use for this stat.
    mapping (dict, default None): Aesthetic mappings.
    n (int, default None): Number of evaluation points.
    pad (bool, default False): Whether to pad the ECDF at both ends.
    **params: Additional parameters.

compute(data)

Compute the ECDF for the given data.

Parameters:

    data (DataFrame or array-like, required): The input data. If a DataFrame, uses the column specified in mapping['x'].

Returns:

    tuple: (DataFrame with 'x' and 'y' columns, updated mapping dict)

compute_array(x)

Compute the ECDF values for a given array of x values.

This is a convenience method for direct array computation.

Parameters:

    x (array-like, required): The input data values.

Returns:

    tuple: (x_sorted, y_values) arrays

stat_stl

ggplotly.stats.stat_stl.stat_stl

Bases: Stat

Stat for STL (Seasonal-Trend decomposition using Loess).

Decomposes a time series into observed, trend, seasonal, and residual components. Returns a stacked DataFrame with a 'component' column suitable for use with facet_wrap().

Parameters

period : int, optional
    Seasonal period. Required unless data has DatetimeIndex with frequency.
seasonal : int, optional
    Length of the seasonal smoother. Must be odd. Default is 7.
trend : int, optional
    Length of the trend smoother. Default is auto-calculated.
robust : bool, optional
    Use robust fitting to downweight outliers. Default is False.

Examples

STL with faceting

>>> (ggplot(df, aes(x='date', y='value'))
...  + stat_stl(period=12)
...  + geom_line()
...  + facet_wrap('component', ncol=1, scales='free_y'))

Color by component

>>> (ggplot(df, aes(x='date', y='value', color='component'))
...  + stat_stl(period=12)
...  + geom_line())
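The stacked long-format output can be illustrated without a real STL fit. The sketch below uses a naive moving-average decomposition (not LOESS-based STL, and `naive_decompose` is a hypothetical helper, not part of ggplotly) purely to show the 'component' column layout that facet_wrap('component') relies on:

```python
import numpy as np
import pandas as pd

def naive_decompose(df, value="value", period=4):
    # NOT real STL: rolling-mean trend, mean-by-phase seasonal, residual rest.
    y = df[value].astype(float)
    trend = y.rolling(period, center=True, min_periods=1).mean()
    seasonal = (y - trend).groupby(np.arange(len(y)) % period).transform("mean")
    resid = y - trend - seasonal
    parts = {"observed": y, "trend": trend,
             "seasonal": seasonal, "remainder": resid}
    # Stack into long format with a 'component' column, as stat_stl returns.
    return pd.concat(
        [pd.DataFrame({"x": df.index, "value": v, "component": k})
         for k, v in parts.items()],
        ignore_index=True)

df = pd.DataFrame({"value": [1, 2, 3, 4] * 3})
stacked = naive_decompose(df, period=4)
```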

compute(data)

Compute STL decomposition and return stacked DataFrame.

stat_fanchart

ggplotly.stats.stat_fanchart.stat_fanchart

Bases: Stat

Stat for computing percentile bands from T×N matrices.

Computes percentiles across columns at each row (time point) and returns a DataFrame with percentile columns suitable for ribbon plotting.

Parameters

columns : list, optional
    Specific columns to include in percentile calculation. Default is all numeric columns.
percentiles : list, optional
    Percentile levels to compute. Default is [10, 25, 50, 75, 90].

Returns

DataFrame with columns:
  • x (from index or x aesthetic)
  • p{N} for each percentile (e.g., p10, p25, p50, p75, p90)
  • median (alias for p50)

Examples

Use with geom_ribbon

>>> (ggplot(df)
...  + stat_fanchart()
...  + geom_ribbon(aes(ymin='p10', ymax='p90'), alpha=0.3)
...  + geom_ribbon(aes(ymin='p25', ymax='p75'), alpha=0.3)
...  + geom_line(aes(y='median')))
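The row-wise percentile computation can be sketched with numpy. `fanchart_compute` is an illustrative helper, not the library's code; it builds the p{N} and median columns described in the Returns section:

```python
import numpy as np
import pandas as pd

def fanchart_compute(df, percentiles=(10, 25, 50, 75, 90)):
    # Percentiles across columns at each row (time point); p50 is
    # duplicated as 'median' for convenience.
    vals = df.to_numpy(dtype=float)
    out = pd.DataFrame({"x": df.index})
    for p in percentiles:
        out[f"p{p}"] = np.percentile(vals, p, axis=1)
    out["median"] = out["p50"]
    return out

rng = np.random.default_rng(1)
sims = pd.DataFrame(rng.normal(size=(20, 100)))   # T=20 rows, N=100 draws
bands = fanchart_compute(sims)
```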

compute(data)

Compute percentiles across columns.

stat_function

ggplotly.stats.stat_function.stat_function

Bases: Stat

Stat for computing y values from a function over the x range.

Evaluates a user-provided function over a grid of x values within the data range, returning x and y columns for line plotting.

Default geom: line

Parameters

fun : callable
    Function that takes an array of x values and returns y values.
n : int, optional
    Number of points to evaluate. Default is 101.
xlim : tuple, optional
    (min, max) range for x values. If None, uses data range.
args : tuple, optional
    Additional positional arguments to pass to fun.

Examples

from scipy import stats

Standard normal distribution

stat_function(fun=lambda x: stats.norm.pdf(x, loc=0, scale=1))

Normal with custom mean and std

stat_function(fun=lambda x: stats.norm.pdf(x, loc=5, scale=2))

Exponential distribution (lambda=1)

stat_function(fun=lambda x: stats.expon.pdf(x, scale=1))

Gamma distribution (shape=2, scale=1)

stat_function(fun=lambda x: stats.gamma.pdf(x, a=2, scale=1))

Beta distribution (a=2, b=5)

stat_function(fun=lambda x: stats.beta.pdf(x, a=2, b=5), xlim=(0, 1))

Student's t distribution (df=3)

stat_function(fun=lambda x: stats.t.pdf(x, df=3))

Chi-squared distribution (df=5)

stat_function(fun=lambda x: stats.chi2.pdf(x, df=5))

Custom polynomial function

stat_function(fun=lambda x: x**2 - 2*x + 1, n=50)

Plot without data - must provide xlim (uses geom_line by default)

>>> (ggplot()
...  + stat_function(fun=lambda x: stats.norm.pdf(x), xlim=(-4, 4)))
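The evaluation step itself is a one-liner over a grid. `function_compute` is an illustrative helper mirroring the fun/n/xlim/args parameters above, not ggplotly's implementation:

```python
import numpy as np
from scipy import stats

def function_compute(fun, xlim, n=101, args=()):
    # Evaluate 'fun' on n evenly spaced points over the xlim range.
    x = np.linspace(xlim[0], xlim[1], n)
    return x, fun(x, *args)

x, y = function_compute(lambda t: stats.norm.pdf(t), xlim=(-4, 4))
```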

compute(data)

Compute function values over x range.

stat_qq

ggplotly.stats.stat_qq.stat_qq

Bases: Stat

Stat for computing theoretical vs sample quantiles for Q-Q plots.

Computes sample quantiles against theoretical quantiles from a specified distribution. By default uses the standard normal distribution.

Default geom: point

Parameters

distribution : scipy.stats distribution, optional
    A scipy.stats distribution object with a ppf method. Default is scipy.stats.norm.
dparams : dict, optional
    Additional parameters to pass to the distribution's ppf method. For example, {'df': 5} for a t-distribution.

Aesthetics

sample : str (required)
    Column name containing the sample data to compare against the theoretical distribution.
color : str, optional
    Grouping variable for colored points.
group : str, optional
    Grouping variable for separate Q-Q plots.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from scipy import stats

Basic Q-Q plot against normal distribution

>>> df = pd.DataFrame({'values': np.random.randn(100)})
>>> (ggplot(df, aes(sample='values'))
...  + stat_qq())

Q-Q plot against t-distribution

>>> (ggplot(df, aes(sample='values'))
...  + stat_qq(distribution=stats.t, dparams={'df': 5}))

Q-Q plot with reference line

>>> (ggplot(df, aes(sample='values'))
...  + stat_qq()
...  + stat_qq_line())
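The quantile pairing can be sketched directly. `qq_compute` is an illustrative helper using the common (i - 0.5)/n plotting positions (one convention among several; the library's exact choice is not documented here):

```python
import numpy as np
from scipy import stats

def qq_compute(sample, distribution=stats.norm, dparams=None):
    # Sorted sample values against theoretical quantiles at (i - 0.5)/n.
    dparams = dparams or {}
    s = np.sort(np.asarray(sample, dtype=float))
    p = (np.arange(1, len(s) + 1) - 0.5) / len(s)
    return distribution.ppf(p, **dparams), s

theo, samp = qq_compute([0.5, -1.2, 0.3, 2.0, -0.1])
```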

compute(data)

Compute theoretical vs sample quantiles.

stat_qq_line

ggplotly.stats.stat_qq_line.stat_qq_line

Bases: Stat

Stat for computing the Q-Q reference line through specified quantiles.

Computes a reference line for Q-Q plots that passes through the points where the sample and theoretical quantiles match at specified probability levels (default: Q1 and Q3, i.e., 25th and 75th percentiles).

Default geom: line

Parameters

distribution : scipy.stats distribution, optional
    A scipy.stats distribution object with a ppf method. Default is scipy.stats.norm.
dparams : dict, optional
    Additional parameters to pass to the distribution's ppf method.
line_p : tuple of float, optional
    Two probability values (between 0 and 1) specifying which quantiles to use for fitting the line. Default is (0.25, 0.75) for Q1 and Q3.

Aesthetics

sample : str (required)
    Column name containing the sample data.
color : str, optional
    Line color.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from scipy import stats

Q-Q plot with default reference line (through Q1 and Q3)

>>> df = pd.DataFrame({'values': np.random.randn(100)})
>>> (ggplot(df, aes(sample='values'))
...  + stat_qq()
...  + stat_qq_line())

Reference line through 10th and 90th percentiles

>>> (ggplot(df, aes(sample='values'))
...  + stat_qq()
...  + stat_qq_line(line_p=(0.10, 0.90)))

Q-Q plot against t-distribution

>>> (ggplot(df, aes(sample='values'))
...  + stat_qq(distribution=stats.t, dparams={'df': 5})
...  + stat_qq_line(distribution=stats.t, dparams={'df': 5}))
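Fitting the line through the two quantile pairs reduces to a slope and intercept. `qq_line` is an illustrative helper implementing the rule described above, not the library's code:

```python
import numpy as np
from scipy import stats

def qq_line(sample, distribution=stats.norm, line_p=(0.25, 0.75), dparams=None):
    # Line through the points where sample and theoretical quantiles match
    # at line_p (Q1 and Q3 by default).
    dparams = dparams or {}
    s = np.asarray(sample, dtype=float)
    sq = np.quantile(s, line_p)                    # sample quantiles
    tq = distribution.ppf(line_p, **dparams)       # theoretical quantiles
    slope = (sq[1] - sq[0]) / (tq[1] - tq[0])
    intercept = sq[0] - slope * tq[0]
    return slope, intercept

# For a normal sample, slope/intercept recover roughly scale and loc.
rng = np.random.default_rng(2)
slope, intercept = qq_line(rng.normal(loc=3.0, scale=2.0, size=5000))
```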

compute(data)

Compute Q-Q reference line through specified quantiles.

stat_contour

ggplotly.stats.stat_contour.stat_contour

Bases: Stat

Compute 2D density or interpolate gridded data for contour plots.

This stat performs either 2D kernel density estimation (when only x, y are provided) or interpolation of irregular z data to a regular grid (when x, y, z are provided).

Attributes:

    gridsize (int): Resolution of the output grid (number of points per axis).
    bw_method: Bandwidth method for KDE ('scott', 'silverman', or scalar).
    na_rm (bool): Whether to remove NA values before computation.

Computed Variables

  • x: Grid x coordinates (1D array of length gridsize)
  • y: Grid y coordinates (1D array of length gridsize)
  • z: 2D density/interpolated values (2D array of shape gridsize x gridsize)
  • density: Alias for z

Examples:

>>> # 2D density estimation from scatter points
>>> ggplot(df, aes(x='x', y='y')) + geom_contour()
>>> # Contours from explicit z values (e.g., elevation data)
>>> ggplot(df, aes(x='lon', y='lat', z='elevation')) + geom_contour()
>>> # With custom grid resolution and bandwidth
>>> stat = stat_contour(gridsize=50, bw_method='silverman')
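The 2D-KDE path can be sketched with scipy. `kde_grid` is an illustrative helper, not ggplotly's implementation; it evaluates a Gaussian KDE on a regular gridsize x gridsize grid, matching the computed variables listed above:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_grid(x, y, gridsize=50, bw_method=None):
    # 2D Gaussian KDE from scatter points, evaluated on a regular grid.
    kde = gaussian_kde(np.vstack([x, y]), bw_method=bw_method)
    gx = np.linspace(min(x), max(x), gridsize)
    gy = np.linspace(min(y), max(y), gridsize)
    xx, yy = np.meshgrid(gx, gy)
    z = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(gridsize, gridsize)
    return {"x": gx, "y": gy, "z": z, "density": z}

rng = np.random.default_rng(3)
grid = kde_grid(rng.normal(size=300), rng.normal(size=300), gridsize=40)
```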

__init__(data=None, mapping=None, gridsize=100, bw_method=None, na_rm=False, **params)

Initialize the contour stat.

Parameters:

    data (DataFrame, default None): Data to use for this stat. If None, data will be provided by the geom or plot.
    mapping (dict, default None): Aesthetic mappings. Required: 'x', 'y'. Optional: 'z' (if provided, interpolates instead of computing KDE).
    gridsize (int, default 100): Resolution of the output grid. Higher values give smoother contours but take longer to compute (100 produces a 100x100 grid = 10,000 points).
    bw_method (str or float, default None): Bandwidth method for KDE. Only used when computing KDE (when z is not provided).
      • 'scott': Scott's rule of thumb (default if None)
      • 'silverman': Silverman's rule of thumb
      • float: Manual bandwidth factor
    na_rm (bool, default False): If True, remove NA values before computation; otherwise NA values will cause errors.
    **params: Additional parameters passed to the Stat base class.

compute(data)

Compute 2D density or interpolate z values to a grid.

This is the main computation method. It:
  1. Extracts x, y (and optionally z) from the data
  2. Optionally removes NA values
  3. Calls either _compute_kde or _compute_from_z
  4. Returns the grid data and updated mapping

Parameters:

    data (DataFrame, required): Data containing the columns specified in mapping. Must have x and y columns; may optionally have a z column.

Returns:

    tuple: (result_dict, new_mapping)
      • result_dict: Contains 'x', 'y', 'z', 'density' arrays
      • new_mapping: Updated mapping pointing to result columns

Raises:

    ValueError: If x or y aesthetics are not specified in mapping.

compute_grid(data, x_col='x', y_col='y', z_col=None)

Convenience method to compute grid directly from column names.

This is a simpler interface when you want to call stat_contour directly without setting up mapping first.

Parameters:

    data (DataFrame, required): Input data containing the specified columns.
    x_col (str, default 'x'): Name of x column.
    y_col (str, default 'y'): Name of y column.
    z_col (str, default None): Name of z column. If None, computes KDE.

Returns:

    dict: Contains 'x', 'y', 'z' grid arrays.

Example

>>> stat = stat_contour(gridsize=50)
>>> grid = stat.compute_grid(df, x_col='longitude', y_col='latitude')

grid['z'] contains the 2D KDE values.