Stats API Reference¶
Statistical transformations for data.
stat_identity¶
ggplotly.stats.stat_identity.stat_identity¶
Bases: Stat
Identity statistical transformation (no transformation).
This stat passes data through unchanged. It's the default stat for most geoms when you want to display raw data values without any aggregation or transformation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data to use for this stat. | None |
| mapping | dict | Aesthetic mappings. | None |
| **params | | Additional parameters for the stat. | {} |
__init__(data=None, mapping=None, **params)¶
Initialize the stat_identity.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data to use for this stat. | None |
| mapping | dict | Aesthetic mappings. | None |
| **params | | Additional parameters. | {} |
compute(data)¶
Return the data unchanged.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | The input data. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| | tuple | (unchanged data, unchanged mapping) |
stat_count¶
ggplotly.stats.stat_count.stat_count¶
Bases: Stat
Count the number of observations in each group.
This stat is used internally by geom_bar when you want to display counts of categorical data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data to use for this stat. | None |
| mapping | dict | Aesthetic mappings. | None |
| **params | | Additional parameters. | {} |
__init__(data=None, mapping=None, **params)¶
Initialize the stat_count.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data to use for this stat. | None |
| mapping | dict | Aesthetic mappings. | None |
| **params | | Additional parameters. | {} |
compute(data)¶
Compute counts for each group in the data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | The input data. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| | tuple | (transformed DataFrame, updated mapping dict) |
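The transformation stat_count performs is equivalent to a grouped size count in pandas. A minimal sketch of that computation (illustrative only, not the library's internal code):

```python
import pandas as pd

# A toy categorical column, as geom_bar would receive it
df = pd.DataFrame({'cut': ['Ideal', 'Good', 'Ideal', 'Fair', 'Ideal']})

# stat_count amounts to a grouped size(): one row per unique x value,
# with a new 'count' column holding the number of observations
counts = df.groupby('cut', sort=True).size().reset_index(name='count')
```

The resulting frame has one row per category, which geom_bar then maps to bar heights.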
stat_bin¶
ggplotly.stats.stat_bin.stat_bin¶
Bases: Stat
Bin continuous data for histograms.
This stat divides continuous data into bins and counts the number of observations in each bin. It's used internally by geom_histogram.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| bins | int | Number of bins to create. Default is 30. Ignored if binwidth is specified. | 30 |
| binwidth | float | Width of each bin. Overrides bins. | None |
| boundary | float | Bin boundary. One bin edge will be at this value. | None |
| center | float | Bin center. One bin center will be at this value. Mutually exclusive with boundary. | None |
| breaks | array-like | Explicit bin breaks. Overrides bins and binwidth. | None |
| closed | str | Which side of the bins is closed: 'right' (default) gives bins (a, b]; 'left' gives bins [a, b). | 'right' |
| pad | bool | If True, add empty bins at start and end. Default is False. | False |
| na_rm | bool | If True, remove NA values. Default is False. | False |
Computed variables
- count: Number of observations in bin
- density: Density of observations (count / total / width)
- ncount: Count scaled to maximum of 1
- ndensity: Density scaled to maximum of 1
- width: Width of each bin
- x: Bin center
- xmin: Bin left edge
- xmax: Bin right edge
Examples:
>>> ggplot(df, aes(x='value')) + geom_histogram(bins=20)
>>> ggplot(df, aes(x='value')) + geom_histogram(binwidth=0.5)
>>> ggplot(df, aes(x='value')) + geom_histogram(boundary=0)
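The binning arithmetic behind the computed variables above can be sketched with NumPy alone (an illustrative equivalent under default settings, not the library's internal code):

```python
import numpy as np

values = np.array([0.5, 1.2, 1.9, 2.1, 2.2, 3.8, 4.0, 4.1])

# Histogram binning as stat_bin performs it, plus the documented
# computed variables derived from the raw counts
count, edges = np.histogram(values, bins=4)
width = np.diff(edges)                  # width: bin widths
x = edges[:-1] + width / 2              # x: bin centers
xmin, xmax = edges[:-1], edges[1:]      # xmin/xmax: bin edges
density = count / count.sum() / width   # density: integrates to 1
ncount = count / count.max()            # ncount: count scaled to max of 1
ndensity = density / density.max()      # ndensity: density scaled to max of 1
```

Note how density divides by both the total count and the bin width, matching the "count / total / width" definition given above.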
__init__(data=None, mapping=None, bins=30, binwidth=None, boundary=None, center=None, breaks=None, closed='right', pad=False, na_rm=False, **params)¶
Initialize the binning stat.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data to use. | None |
| mapping | dict | Aesthetic mappings. | None |
| bins | int | Number of bins. Default is 30. | 30 |
| binwidth | float | Width of bins. | None |
| boundary | float | Bin boundary position. | None |
| center | float | Bin center position. | None |
| breaks | array-like | Explicit bin breaks. | None |
| closed | str | Which side is closed ('right' or 'left'). Default is 'right'. | 'right' |
| pad | bool | Add empty edge bins. Default is False. | False |
| na_rm | bool | Remove NA values. Default is False. | False |
| **params | | Additional parameters. | {} |
compute(data, bins=None)¶
Compute bin counts for the data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data containing the variable to bin. | required |
| bins | int | Number of bins. Default is self.bins. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| | DataFrame | Data with binning information: x (bin centers), count (counts per bin), density (density per bin), ncount (normalized count), ndensity (normalized density), width (bin width), xmin (bin left edge), xmax (bin right edge). |
stat_density¶
ggplotly.stats.stat_density.stat_density¶
Bases: Stat
Compute kernel density estimate for continuous data.
This stat performs kernel density estimation, useful for visualizing the distribution of a continuous variable as a smooth curve.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data to use for this stat. | None |
| mapping | dict | Aesthetic mappings. | None |
| bw | str or float | Bandwidth method or value: 'nrd0' (default, Silverman's rule of thumb, the R default), 'nrd' (Scott's variation of Silverman's rule), 'scott' (Scott's rule), 'silverman' (Silverman's rule), or a float giving an explicit bandwidth value. | 'nrd0' |
| adjust | float | Bandwidth adjustment multiplier. Default is 1. Larger values produce smoother curves. | 1 |
| kernel | str | Kernel function. Default is 'gaussian'. Note: scipy only supports the gaussian kernel. | 'gaussian' |
| n | int | Number of equally spaced points for density evaluation. Default is 512 (matching R's default). | 512 |
| trim | bool | If True, trim the density curve to the data range. Default is False (extend slightly beyond the data range). | False |
| na_rm | bool | If True, remove NA values. Default is False. | False |
| **params | | Additional parameters for the stat. | {} |
Computed variables
- x: Evaluation points
- y: Density estimates (integrate to 1)
- density: Same as y
- count: Density * n (useful for histograms)
- scaled: Density scaled to maximum of 1
- ndensity: Alias for scaled
Examples:
>>> stat_density() # Default: nrd0 bandwidth, 512 points
>>> stat_density(bw='scott', adjust=0.5) # Narrower bandwidth
>>> stat_density(n=256, trim=True) # Fewer points, trimmed to data range
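The computed variables above can be illustrated with a hand-rolled Gaussian KDE in NumPy. This is a simplified sketch: the bandwidth rule below uses only the standard deviation, whereas 'nrd0' also considers the IQR, and the library's actual evaluation goes through scipy.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=200)

# Simplified rule-of-thumb bandwidth (nrd0 also takes min with IQR/1.34)
bw = 0.9 * values.std(ddof=1) * len(values) ** (-1 / 5)

# Evaluate a Gaussian KDE on n equally spaced points, extending past the data
x = np.linspace(values.min() - 3 * bw, values.max() + 3 * bw, 512)
y = np.exp(-0.5 * ((x[:, None] - values[None, :]) / bw) ** 2).sum(axis=1)
y /= len(values) * bw * np.sqrt(2 * np.pi)   # 'density': integrates to ~1

scaled = y / y.max()        # 'scaled' / 'ndensity': max of 1
count = y * len(values)     # 'count': density * n
# Trapezoid check that the density integrates to roughly 1
integral = ((y[1:] + y[:-1]) / 2 * np.diff(x)).sum()
```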
__init__(data=None, mapping=None, bw='nrd0', adjust=1, kernel='gaussian', n=512, trim=False, na_rm=False, **params)¶
Initialize the density stat.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data to use for this stat. | None |
| mapping | dict | Aesthetic mappings. | None |
| bw | str or float | Bandwidth method or value. Default is 'nrd0'. | 'nrd0' |
| adjust | float | Bandwidth adjustment multiplier. Default is 1. | 1 |
| kernel | str | Kernel function. Default is 'gaussian'. | 'gaussian' |
| n | int | Number of evaluation points. Default is 512. | 512 |
| trim | bool | Trim to data range. Default is False. | False |
| na_rm | bool | Remove NA values. Default is False. | False |
| **params | | Additional parameters. | {} |
compute(data)¶
Estimate the density for density plots.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame or array-like | Data for density estimation. If a DataFrame, uses the column specified in mapping['x']. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| | tuple | (DataFrame with density data, updated mapping dict) |
compute_array(x)¶
Compute density for a given array (backward compatibility).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | array-like | Data for density estimation. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| | dict | Contains 'x', 'y', 'density', 'count', 'scaled', 'ndensity'. |
stat_smooth¶
ggplotly.stats.stat_smooth.stat_smooth¶
Bases: Stat
Stat for computing smoothed lines (LOESS, linear regression, etc.).
Handles the computation of smoothed values, which are then passed to geom_smooth for visualization.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data to use for this stat. | None |
| mapping | dict | Aesthetic mappings. | None |
| method | str | The smoothing method: 'loess' (custom LOESS with degree-2 polynomials; default, matches R), 'lowess' (statsmodels lowess, degree-1, faster), or 'lm' (linear regression). | 'loess' |
| span | float | The smoothing parameter for LOESS (fraction of points to use). Default is 2/3 to match R's loess default. | 2/3 |
| se | bool | Whether to compute standard errors. Default is True. | True |
| level | float | Confidence level for intervals. Default is 0.95 (95% CI), matching R's ggplot2 default. | 0.95 |
| degree | int | Polynomial degree for LOESS fitting (1 or 2). Default is 2. | 2 |
| **params | | Additional parameters for the stat. | {} |
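The simplest of the three methods, 'lm', reduces to ordinary least squares with a pointwise confidence band. A NumPy sketch of that computation (illustrative; the library computes the band with the t distribution, so the 1.96 normal quantile here is an approximation for level=0.95):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)

# method='lm': a straight-line least-squares fit
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept

# Pointwise standard error of the fitted mean at each x
resid = y - fitted
s2 = (resid ** 2).sum() / (len(x) - 2)           # residual variance
sxx = ((x - x.mean()) ** 2).sum()
se = np.sqrt(s2 * (1.0 / len(x) + (x - x.mean()) ** 2 / sxx))

# 95% band (normal approximation to the t quantile)
ymin, ymax = fitted - 1.96 * se, fitted + 1.96 * se
```

geom_smooth then draws the fitted line and shades the (ymin, ymax) ribbon.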
__init__(data=None, mapping=None, method='loess', span=2/3, se=True, level=0.95, degree=2, **params)¶
Initialize the smoothing stat.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data to use for this stat. | None |
| mapping | dict | Aesthetic mappings. | None |
| method | str | The smoothing method: 'loess' (custom LOESS with degree-2 polynomials; default, matches R), 'lowess' (statsmodels lowess, degree-1, faster), or 'lm' (linear regression). | 'loess' |
| span | float | The smoothing parameter for LOESS (fraction of points to use). Default is 2/3 to match R's loess default. | 2/3 |
| se | bool | Whether to compute standard errors. Default is True. | True |
| level | float | Confidence level for intervals. Default is 0.95 (95% CI), matching R's ggplot2 default. | 0.95 |
| degree | int | Polynomial degree for LOESS fitting (1 or 2). Default is 2. | 2 |
| **params | | Additional parameters. | {} |
apply_smoothing(x, y, return_hat_diag=False)¶
Apply smoothing based on the chosen method.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | array-like | The x-values. | required |
| y | array-like | The y-values. | required |
| return_hat_diag | bool | If True, also return the diagonal of the hat matrix (LOESS only). | False |

Returns:

| Type | Description |
|---|---|
| | Smoothed y-values, or a tuple (smoothed_y, hat_diag) if return_hat_diag=True. |
compute(data)¶
Compute smoothed values for the data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Input data with x and y columns. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| | tuple | (DataFrame with smoothed values, updated mapping dict) |
compute_confidence_intervals(x, y, smoothed_y, hat_diag=None)¶
Compute confidence intervals for the smoothed line.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | array-like | The x-values. | required |
| y | array-like | The original y-values. | required |
| smoothed_y | array-like | The smoothed y-values. | required |
| hat_diag | array-like | Diagonal of the hat matrix (for LOESS with exact CI). | None |

Returns:

| Name | Type | Description |
|---|---|---|
| | tuple | (ymin, ymax) arrays for confidence interval bounds. |
compute_stat(data, x_col='x', y_col='y')¶
Compute the stat for smoothing, modifying the data with smoothed values.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | The input data containing x and y columns. | required |
| x_col | str | Name of the x column. Default is 'x'. | 'x' |
| y_col | str | Name of the y column. Default is 'y'. | 'y' |

Returns:

| Name | Type | Description |
|---|---|---|
| | DataFrame | Modified data with smoothed 'y' values and optional confidence intervals. |
stat_summary¶
ggplotly.stats.stat_summary.stat_summary¶
Bases: Stat
Summarize y values at each unique x.
Computes summary statistics (mean, median, etc.) of y for each x value. Can compute central tendency and error bars in one step.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| fun | str or callable | Function for the central value: 'mean' (default), 'median', 'min', 'max', 'sum', or a custom function that takes a Series and returns a scalar. Alias: fun_y (deprecated, for backward compatibility). | 'mean' |
| fun_min | str or callable | Function for the lower error bar. Alias: fun_ymin (deprecated, for backward compatibility). | None |
| fun_max | str or callable | Function for the upper error bar. Alias: fun_ymax (deprecated, for backward compatibility). | None |
| fun_data | str or callable | Function that returns y, ymin, ymax together. Built-in options: 'mean_se' (mean +/- standard error), 'mean_cl_normal' (mean +/- 95% CI, t-distribution), 'mean_sdl' (mean +/- 1 SD), 'median_hilow' (median with 95% quantile range). | None |
| fun_args | dict | Additional arguments passed to fun/fun_min/fun_max. | None |
| geom | str | Default geom to use: 'pointrange', 'errorbar', or 'point'. | 'pointrange' |
| na_rm | bool | If True, remove NA values before computation. Default is False. | False |
Aesthetics computed
- y: The central summary value
- ymin: Lower bound (if fun_min or fun_data provided)
- ymax: Upper bound (if fun_max or fun_data provided)
Examples:
Mean with standard error bars¶
geom_pointrange(stat='summary', fun_data='mean_se')
Median with 95% quantile range¶
stat_summary(fun_data='median_hilow')
Custom: mean with min/max range (R-style parameter names)¶
stat_summary(fun='mean', fun_min='min', fun_max='max')
Custom: mean with min/max range (legacy parameter names, still supported)¶
stat_summary(fun_y='mean', fun_ymin='min', fun_ymax='max')
Custom function¶
stat_summary(fun=lambda x: x.quantile(0.75))
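The 'mean_se' summary above is equivalent to a grouped mean-plus-standard-error computation in pandas. A sketch of that equivalence (illustrative, not the library's internal code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'y': [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]})

# fun_data='mean_se': central value y plus ymin/ymax at one standard error
def mean_se(s):
    m = s.mean()
    se = s.std(ddof=1) / np.sqrt(len(s))
    return pd.Series({'y': m, 'ymin': m - se, 'ymax': m + se})

# One row per unique x, with the three computed aesthetics as columns
summary = df.groupby('x')['y'].apply(mean_se).unstack().reset_index()
```

The result has the y/ymin/ymax columns that geom_pointrange or geom_errorbar consume.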
fun property writable¶
R-style alias for fun_y.
fun_max property writable¶
R-style alias for fun_ymax.
fun_min property writable¶
R-style alias for fun_ymin.
compute(data)¶
Compute summary statistics for each x value.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Input data with x and y columns. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| | tuple | (summarized DataFrame, updated mapping) |
stat_ecdf¶
ggplotly.stats.stat_ecdf.stat_ecdf¶
Bases: Stat
Compute the empirical cumulative distribution function.
The ECDF shows the proportion of data points less than or equal to each value. It is useful for visualizing distributions without the binning a histogram requires.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data to use for this stat. | None |
| mapping | dict | Aesthetic mappings. | None |
| n | int | Number of points to evaluate. Default uses all unique values. | None |
| pad | bool | If True, pad the ECDF with (min-eps, 0) and (max+eps, 1). Default is False. | False |
| **params | | Additional parameters for the stat. | {} |
Examples:
>>> ggplot(df, aes(x='value')) + geom_step(stat='ecdf')
>>> ggplot(df, aes(x='value')) + geom_line(stat='ecdf')
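The ECDF computation itself is two lines of NumPy: sort the values and assign each sorted value the cumulative proportion of observations at or below it (a sketch of the equivalent computation, not the library's internal code):

```python
import numpy as np

values = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# ECDF: y[i] is the proportion of observations <= x_sorted[i]
x_sorted = np.sort(values)
y = np.arange(1, len(values) + 1) / len(values)
```

Plotting (x_sorted, y) with a step geom reproduces the classic ECDF staircase; pad=True would add the (min-eps, 0) and (max+eps, 1) endpoints described above.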
__init__(data=None, mapping=None, n=None, pad=False, **params)¶
Initialize the stat_ecdf.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data to use for this stat. | None |
| mapping | dict | Aesthetic mappings. | None |
| n | int | Number of evaluation points. | None |
| pad | bool | Whether to pad the ECDF at the ends. Default is False. | False |
| **params | | Additional parameters. | {} |
compute(data)¶
Compute the ECDF for the given data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame or array-like | The input data. If a DataFrame, uses the column specified in mapping['x']. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| | tuple | (DataFrame with 'x' and 'y' columns, updated mapping dict) |
compute_array(x)¶
Compute the ECDF values for a given array of x values.
This is a convenience method for direct array computation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | array-like | The input data values. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| | tuple | (x_sorted, y_values) arrays |
stat_stl¶
ggplotly.stats.stat_stl.stat_stl¶
Bases: Stat
Stat for STL (Seasonal-Trend decomposition using Loess).
Decomposes a time series into observed, trend, seasonal, and residual components. Returns a stacked DataFrame with a 'component' column suitable for use with facet_wrap().
Parameters¶
period : int, optional
    Seasonal period. Required unless the data has a DatetimeIndex with a frequency.
seasonal : int, optional
    Length of the seasonal smoother. Must be odd. Default is 7.
trend : int, optional
    Length of the trend smoother. Default is auto-calculated.
robust : bool, optional
    Use robust fitting to downweight outliers. Default is False.
Examples¶
STL with faceting¶
>>> (ggplot(df, aes(x='date', y='value'))
...  + stat_stl(period=12)
...  + geom_line()
...  + facet_wrap('component', ncol=1, scales='free_y'))
Color by component¶
>>> (ggplot(df, aes(x='date', y='value', color='component'))
...  + stat_stl(period=12)
...  + geom_line())
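The stacked output shape described above (one 'component' column, ready for facet_wrap) can be illustrated with pandas. The component values here are synthetic placeholders, not an actual STL fit:

```python
import numpy as np
import pandas as pd

# Hypothetical decomposition output: one column per component
n = 24
date = pd.date_range('2020-01-01', periods=n, freq='MS')
trend = np.linspace(10, 20, n)
seasonal = np.tile([1.0, -1.0, 0.5, -0.5], n // 4)
resid = np.zeros(n)
wide = pd.DataFrame({'date': date, 'observed': trend + seasonal + resid,
                     'trend': trend, 'seasonal': seasonal, 'resid': resid})

# Stack to the long format stat_stl returns: one row per (date, component)
stacked = wide.melt(id_vars='date', var_name='component', value_name='value')
```

Faceting on 'component' then gives one panel per decomposition component.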
compute(data)¶
Compute STL decomposition and return stacked DataFrame.
stat_fanchart¶
ggplotly.stats.stat_fanchart.stat_fanchart¶
Bases: Stat
Stat for computing percentile bands from T×N matrices.
Computes percentiles across columns at each row (time point) and returns a DataFrame with percentile columns suitable for ribbon plotting.
Parameters¶
columns : list, optional
    Specific columns to include in the percentile calculation. Default is all numeric columns.
percentiles : list, optional
    Percentile levels to compute. Default is [10, 25, 50, 75, 90].
Returns¶
DataFrame with columns:
- x (from the index or the x aesthetic)
- p{N} for each percentile (e.g., p10, p25, p50, p75, p90)
- median (alias for p50)
Examples¶
Use with geom_ribbon¶
>>> (ggplot(df)
...  + stat_fanchart()
...  + geom_ribbon(aes(ymin='p10', ymax='p90'), alpha=0.3)
...  + geom_ribbon(aes(ymin='p25', ymax='p75'), alpha=0.3)
...  + geom_line(aes(y='median')))
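The row-wise percentile computation can be sketched directly with NumPy (an illustrative equivalent of the stat's output columns, not the library's internal code):

```python
import numpy as np
import pandas as pd

# T x N matrix: rows are time points, columns are e.g. simulation draws
rng = np.random.default_rng(2)
draws = pd.DataFrame(rng.normal(size=(100, 500)))

# Percentiles across columns at each row, one output column per level
levels = [10, 25, 50, 75, 90]
bands = pd.DataFrame(
    {f'p{p}': np.percentile(draws.to_numpy(), p, axis=1) for p in levels})
bands['median'] = bands['p50']   # alias, matching the documented output
bands['x'] = draws.index         # x taken from the index
```

The p10/p90 and p25/p75 pairs feed the two geom_ribbon layers in the example above.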
compute(data)¶
Compute percentiles across columns.
stat_function¶
ggplotly.stats.stat_function.stat_function¶
Bases: Stat
Stat for computing y values from a function over the x range.
Evaluates a user-provided function over a grid of x values within the data range, returning x and y columns for line plotting.
Default geom: line
Parameters¶
fun : callable
    Function that takes an array of x values and returns y values.
n : int, optional
    Number of points to evaluate. Default is 101.
xlim : tuple, optional
    (min, max) range for x values. If None, uses the data range.
args : tuple, optional
    Additional positional arguments to pass to fun.
Examples¶
>>> from scipy import stats
Standard normal distribution¶
>>> stat_function(fun=lambda x: stats.norm.pdf(x, loc=0, scale=1))
Normal with custom mean and std¶
>>> stat_function(fun=lambda x: stats.norm.pdf(x, loc=5, scale=2))
Exponential distribution (lambda=1)¶
>>> stat_function(fun=lambda x: stats.expon.pdf(x, scale=1))
Gamma distribution (shape=2, scale=1)¶
>>> stat_function(fun=lambda x: stats.gamma.pdf(x, a=2, scale=1))
Beta distribution (a=2, b=5)¶
>>> stat_function(fun=lambda x: stats.beta.pdf(x, a=2, b=5), xlim=(0, 1))
Student's t distribution (df=3)¶
>>> stat_function(fun=lambda x: stats.t.pdf(x, df=3))
Chi-squared distribution (df=5)¶
>>> stat_function(fun=lambda x: stats.chi2.pdf(x, df=5))
Custom polynomial function¶
>>> stat_function(fun=lambda x: x**2 - 2*x + 1, n=50)
Plot without data - must provide xlim (uses geom_line by default)¶
>>> (ggplot()
...  + stat_function(fun=lambda x: stats.norm.pdf(x), xlim=(-4, 4)))
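Under the hood, the stat just evaluates fun on a linspace grid over xlim. A NumPy sketch of that core step (the xlim values here are arbitrary for illustration):

```python
import numpy as np

# Evaluate a simple polynomial on n points spanning xlim, as stat_function does
def fun(x):
    return x ** 2 - 2 * x + 1

xlim = (-2.0, 4.0)
n = 101                               # the documented default
x = np.linspace(xlim[0], xlim[1], n)
y = fun(x)
```

The resulting x and y columns are what the default line geom draws.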
compute(data)¶
Compute function values over the x range.
stat_qq¶
ggplotly.stats.stat_qq.stat_qq¶
Bases: Stat
Stat for computing theoretical vs sample quantiles for Q-Q plots.
Computes sample quantiles against theoretical quantiles from a specified distribution. By default uses the standard normal distribution.
Default geom: point
Parameters¶
distribution : scipy.stats distribution, optional
    A scipy.stats distribution object with a ppf method. Default is scipy.stats.norm.
dparams : dict, optional
    Additional parameters to pass to the distribution's ppf method. For example, {'df': 5} for a t-distribution.
Aesthetics¶
sample : str (required)
    Column name containing the sample data to compare against the theoretical distribution.
color : str, optional
    Grouping variable for colored points.
group : str, optional
    Grouping variable for separate Q-Q plots.
Examples¶
>>> import numpy as np
>>> import pandas as pd
>>> from scipy import stats
Basic Q-Q plot against normal distribution¶
>>> df = pd.DataFrame({'values': np.random.randn(100)})
>>> (ggplot(df, aes(sample='values'))
...  + stat_qq())
Q-Q plot against t-distribution¶
>>> (ggplot(df, aes(sample='values'))
...  + stat_qq(distribution=stats.t, dparams={'df': 5}))
Q-Q plot with reference line¶
>>> (ggplot(df, aes(sample='values'))
...  + stat_qq()
...  + stat_qq_line())
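The quantile pairing behind a Q-Q plot can be sketched with the standard library's NormalDist (the stat itself uses scipy.stats.norm.ppf by default). The (i - 0.5)/n plotting positions below are one common convention; the library's exact choice may differ:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
sample = np.sort(rng.normal(size=200))   # sample quantiles = sorted data

# Theoretical quantiles at plotting positions (i - 0.5) / n
n = len(sample)
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
# Plotting (theoretical, sample) pairs gives the Q-Q scatter
```

If the sample really is normal, the points fall close to a straight line.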
compute(data)¶
Compute theoretical vs sample quantiles.
stat_qq_line¶
ggplotly.stats.stat_qq_line.stat_qq_line¶
Bases: Stat
Stat for computing the Q-Q reference line through specified quantiles.
Computes a reference line for Q-Q plots that passes through the points where the sample and theoretical quantiles match at specified probability levels (default: Q1 and Q3, i.e., the 25th and 75th percentiles).
Default geom: line
Parameters¶
distribution : scipy.stats distribution, optional
    A scipy.stats distribution object with a ppf method. Default is scipy.stats.norm.
dparams : dict, optional
    Additional parameters to pass to the distribution's ppf method.
line_p : tuple of float, optional
    Two probability values (between 0 and 1) specifying which quantiles to use for fitting the line. Default is (0.25, 0.75) for Q1 and Q3.
Aesthetics¶
sample : str (required)
    Column name containing the sample data.
color : str, optional
    Line color.
Examples¶
>>> import numpy as np
>>> import pandas as pd
>>> from scipy import stats
Q-Q plot with default reference line (through Q1 and Q3)¶
>>> df = pd.DataFrame({'values': np.random.randn(100)})
>>> (ggplot(df, aes(sample='values'))
...  + stat_qq()
...  + stat_qq_line())
Reference line through 10th and 90th percentiles¶
>>> (ggplot(df, aes(sample='values'))
...  + stat_qq()
...  + stat_qq_line(line_p=(0.10, 0.90)))
Q-Q plot against t-distribution¶
>>> (ggplot(df, aes(sample='values'))
...  + stat_qq(distribution=stats.t, dparams={'df': 5})
...  + stat_qq_line(distribution=stats.t, dparams={'df': 5}))
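The line-fitting step reduces to a slope and intercept through two quantile pairs. A sketch using the stdlib NormalDist in place of scipy's ppf (for a sample drawn from N(5, 2), the fitted slope approximates the scale and the intercept the location):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(4)
sample = rng.normal(loc=5.0, scale=2.0, size=500)

# Reference line through the quantile pairs at line_p = (0.25, 0.75)
line_p = (0.25, 0.75)
sample_q = np.quantile(sample, line_p)
theor_q = np.array([NormalDist().inv_cdf(p) for p in line_p])
slope = (sample_q[1] - sample_q[0]) / (theor_q[1] - theor_q[0])
intercept = sample_q[0] - slope * theor_q[0]
```

Drawing y = slope * x + intercept over the theoretical-quantile axis gives the reference line the points are judged against.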
compute(data)¶
Compute Q-Q reference line through specified quantiles.
stat_contour¶
ggplotly.stats.stat_contour.stat_contour¶
Bases: Stat
Compute 2D density or interpolate gridded data for contour plots.
This stat performs either 2D kernel density estimation (when only x and y are provided) or interpolation of irregular z data to a regular grid (when x, y, and z are provided).
Attributes:

| Name | Type | Description |
|---|---|---|
| gridsize | int | Resolution of the output grid (number of points per axis) |
| bw_method | str or float | Bandwidth method for KDE ('scott', 'silverman', or scalar) |
| na_rm | bool | Whether to remove NA values before computation |
Computed Variables
- x: Grid x coordinates (1D array of length gridsize)
- y: Grid y coordinates (1D array of length gridsize)
- z: 2D density/interpolated values (2D array of shape gridsize x gridsize)
- density: Alias for z
Examples:
>>> # Contours from explicit z values (e.g., elevation data)
>>> ggplot(df, aes(x='lon', y='lat', z='elevation')) + geom_contour()
>>> # With custom grid resolution and bandwidth
>>> stat = stat_contour(gridsize=50, bw_method='silverman')
__init__(data=None, mapping=None, gridsize=100, bw_method=None, na_rm=False, **params)¶
Initialize the contour stat.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data to use for this stat. If None, data will be provided by the geom or plot. | None |
| mapping | dict | Aesthetic mappings. Required: 'x', 'y'. Optional: 'z' (if provided, interpolates instead of computing KDE). | None |
| gridsize | int | Resolution of the output grid. Higher values give smoother contours but take longer to compute. Default is 100 (produces a 100x100 grid = 10,000 points). | 100 |
| bw_method | str or float | Bandwidth method for KDE: 'scott' (Scott's rule of thumb, the default if None), 'silverman' (Silverman's rule of thumb), or a float giving a manual bandwidth factor. Only used when computing KDE (when z is not provided). | None |
| na_rm | bool | If True, remove NA values before computation. Default is False (NA values will cause errors). | False |
| **params | | Additional parameters passed to the Stat base class. | {} |
compute(data)¶
Compute 2D density or interpolate z values to a grid.
This is the main computation method. It:
1. Extracts x, y (and optionally z) from the data
2. Optionally removes NA values
3. Calls either _compute_kde or _compute_from_z
4. Returns the grid data and updated mapping
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data containing the columns specified in mapping. Must have x and y columns; may optionally have a z column. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| | tuple | (result_dict, new_mapping): result_dict contains the 'x', 'y', 'z', 'density' arrays; new_mapping is the updated mapping pointing to the result columns. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the x or y aesthetics are not specified in mapping. |
compute_grid(data, x_col='x', y_col='y', z_col=None)¶
Convenience method to compute the grid directly from column names.
This is a simpler interface for calling stat_contour directly without setting up a mapping first.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Input data containing the specified columns. | required |
| x_col | str | Name of the x column. Default is 'x'. | 'x' |
| y_col | str | Name of the y column. Default is 'y'. | 'y' |
| z_col | str | Name of the z column. If None, computes KDE. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| | dict | Contains 'x', 'y', 'z' grid arrays. |
Example
>>> stat = stat_contour(gridsize=50)
>>> grid = stat.compute_grid(df, x_col='longitude', y_col='latitude')
>>> # grid['z'] contains the 2D KDE values
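The KDE branch (no z aesthetic) can be sketched in plain NumPy: evaluate a product-Gaussian kernel density on a gridsize x gridsize grid. This sketch uses a fixed bandwidth for simplicity, whereas the stat defaults to Scott's rule:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=300)
y = 0.5 * x + rng.normal(scale=0.5, size=300)

gridsize = 50
bw = 0.3   # fixed bandwidth for the sketch; the stat uses Scott's rule by default
gx = np.linspace(x.min(), x.max(), gridsize)   # grid x coordinates
gy = np.linspace(y.min(), y.max(), gridsize)   # grid y coordinates

# Separable Gaussian kernels per data point: (gridsize, n) each
kx = np.exp(-0.5 * ((gx[:, None] - x[None, :]) / bw) ** 2)
ky = np.exp(-0.5 * ((gy[:, None] - y[None, :]) / bw) ** 2)

# z[i, j] is the density at (gx[j], gy[i]); the matrix product sums over points
z = ky @ kx.T / (len(x) * 2 * np.pi * bw ** 2)
```

The resulting 1D gx/gy and 2D z arrays match the computed-variable shapes documented above and feed directly into a contour geom.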