stage(start= None , after_stat= None , after_scale= None )
Stage allows you evaluating mapping at more than one stage
You can evaluate an expression of a variable in a dataframe, and later evaluate an expression that modifies the values mapped to the scale.
Parameters
start : str | array_like | scalar = None
Aesthetic expression using primary variables from the layer data.
after_stat : str = None
Aesthetic expression using variables calculated by the stat.
after_scale : str = None
Aesthetic expression using aesthetics of the layer.
Examples
% load_ext autoreload
% autoreload 2
% aimport plotnine
import pandas as pd
import numpy as np
from plotnine import (
ggplot,
aes,
after_stat,
stage,
geom_bar,
geom_text,
geom_bin_2d,
stat_bin_2d,
)
stage
df = pd.DataFrame({
"var1" : list ("abbcccddddeeeee" ),
"cat" : list ("RSRSRSRRRSRSSRS" )
})
(
ggplot(df, aes("var1" ))
+ geom_bar()
)
Add the corresponding count on top of each bar.
(
ggplot(df, aes("var1" ))
+ geom_bar()
+ geom_text(aes(label= after_stat("count" )), stat= "count" )
)
Adjust the y
position so that the counts do not overlap the bars.
(
ggplot(df, aes("var1" ))
+ geom_bar()
+ geom_text(
aes(label= after_stat("count" ), y= stage(after_stat= "count" , after_scale= "y+.1" )),
stat= "count" ,
)
)
Note that this will work even nicely for stacked bars where adjustig the position with nudge_y=0.1
would not.
(
ggplot(df, aes("var1" , fill= "cat" ))
+ geom_bar()
+ geom_text(
aes(label= after_stat("count" ), y= stage(after_stat= "count" , after_scale= "y+.1" )),
stat= "count" ,
position= "stack" ,
)
)
Create a binned 2d plot with counts
np.random.seed(123 )
df = pd.DataFrame({
"col_1" : np.random.rand(1000 ),
"col_2" : np.random.rand(1000 )
})
(
ggplot(df, aes(x= "col_1" , y= "col_2" ))
+ geom_bin_2d(position= "identity" , binwidth= 0.1 )
)
Add counts to the bins. stat_bin_2d
bins are specified using retangular minimum and maximum end-points for dimension; we use these values to compute the mid-points at which to place the counts.
First x
and y
aesthetics are mapped to col_1 and col_2 variables, then after the statistic consumes them and creates xmin
, xmax
, ymin
& ymax
values for each bin along with associated count
. After the statistic computation the x
and y
aesthetics do not exist, but we create meaningful values using the minimum and maximum end-points.
Note that the binning parameters for the geom
and stat
combination must be the same. In this case it is the binwidth
.
(
ggplot(df, aes(x= "col_1" , y= "col_2" ))
+ geom_bin_2d(position= "identity" , binwidth= 0.1 )
+ stat_bin_2d(
aes(
x= stage(start= "col_1" , after_stat= "(xmin+xmax)/2" ),
y= stage(start= "col_2" , after_stat= "(ymin+ymax)/2" ),
label= after_stat("count" ),
),
binwidth= 0.1 ,
geom= "text" ,
format_string= " {:.0f} " ,
size= 10 ,
)
)
Source: stage