import pandas as pd
from plotnine import (
    ggplot,
    aes,
    after_stat,
    stage,
    geom_bar,
    geom_text,
    geom_label,
    position_dodge2,
    facet_wrap,
)
from plotnine.data import mtcars
Show counts and percentages for bar plots
We can plot a bar graph and easily show the counts for each bar.
(
    ggplot(mtcars, aes("factor(cyl)", fill="factor(cyl)"))
    + geom_bar()
    + geom_text(
        aes(label=after_stat("count")), stat="count", nudge_y=0.125, va="bottom"
    )
)
stat_count also calculates proportions (as prop), and a proportion can be converted to a percentage.
(
    ggplot(mtcars, aes("factor(cyl)", fill="factor(cyl)"))
    + geom_bar()
    + geom_text(
        aes(label=after_stat("prop*100")),
        stat="count",
        nudge_y=0.125,
        va="bottom",
        format_string="{:.1f}% ",
    )
)
These percentages are clearly wrong. By default, each bar is placed in its own group, so each proportion is computed relative to that bar alone. We need to put all the bars in the panel into a single group so that the percentages are what we expect.
(
    ggplot(mtcars, aes("factor(cyl)", fill="factor(cyl)"))
    + geom_bar()
    + geom_text(
        aes(label=after_stat("prop*100"), group=1),
        stat="count",
        nudge_y=0.125,
        va="bottom",
        format_string="{:.1f}%",
    )
)
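To see why the grouping matters, here is a minimal pure-Python sketch of a groupwise proportion. It assumes, as a simplification and not as a description of plotnine's internals, that prop is each count divided by the total count of its group. The helper name groupwise_prop is hypothetical; the counts are the number of cars in mtcars with 4, 6 and 8 cylinders.

```python
from collections import defaultdict


def groupwise_prop(counts, groups):
    """Divide each count by the total count of its group.

    Hypothetical helper used only for illustration; it is not part of
    plotnine and only models a per-group proportion.
    """
    totals = defaultdict(int)
    for c, g in zip(counts, groups):
        totals[g] += c
    return [c / totals[g] for c, g in zip(counts, groups)]


counts = [11, 7, 14]  # cars in mtcars with 4, 6 and 8 cylinders

# One group per bar, which is what fill="factor(cyl)" implies:
# every bar is compared only with itself.
separate = groupwise_prop(counts, groups=["4", "6", "8"])

# A single group for the whole panel, as with group=1:
# all bars share one total.
single = groupwise_prop(counts, groups=[1, 1, 1])

print([f"{p * 100:.1f}%" for p in separate])
print([f"{p * 100:.1f}%" for p in single])
```

With one group per bar every proportion is 1, which matches the 100% labels seen in the plot without group=1.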
Without group=1, you can calculate the proportion or percentage after the statistics have been computed. This works because mapping expressions are evaluated across the whole panel. It can also work when you have more than one categorical variable.
(
    ggplot(mtcars, aes("factor(cyl)", fill="factor(cyl)"))
    + geom_bar()
    + geom_text(
        aes(label=after_stat("count / sum(count) * 100")),
        stat="count",
        nudge_y=0.125,
        va="bottom",
        format_string="{:.1f}%",
    )
)
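The label expression and the format string can be mimicked outside a plot. The following is a rough sketch, assuming the three bars hold the mtcars cylinder counts (11, 7 and 14) and that format_string is applied to each label value with str.format; the variable names are illustrative only.

```python
# Counts per bar: cars in mtcars with 4, 6 and 8 cylinders.
counts = [11, 7, 14]

# Equivalent of the expression "count / sum(count) * 100",
# evaluated over all bars in the panel at once.
total = sum(counts)
percentages = [c / total * 100 for c in counts]

# Equivalent of format_string="{:.1f}%" applied to every label value.
format_string = "{:.1f}%"
labels = [format_string.format(p) for p in percentages]

print(labels)
```

Because the denominator is the sum over the whole panel, the percentages add up to 100 without any explicit grouping.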
For more on why automatic grouping may not work the way you want, see this tutorial.
We can get the counts and we can get the percentages; now we need to print both. We can do that in two ways:
- Using two geom_text layers.
(
    ggplot(mtcars, aes("factor(cyl)", fill="factor(cyl)"))
    + geom_bar()
    + geom_text(
        aes(label=after_stat("count")),
        stat="count",
        nudge_x=-0.14,
        nudge_y=0.125,
        va="bottom",
    )
    + geom_text(
        aes(label=after_stat("prop*100"), group=1),
        stat="count",
        nudge_x=0.14,
        nudge_y=0.125,
        va="bottom",
        format_string="({:.1f}%)",
    )
)
- Using a function to combine the counts and percentages.
def combine(counts, percentages):
    fmt = "{} ({:.1f}%)".format
    return [fmt(c, p) for c, p in zip(counts, percentages)]

(
    ggplot(mtcars, aes("factor(cyl)", fill="factor(cyl)"))
    + geom_bar()
    + geom_text(
        aes(label=after_stat("combine(count, prop*100)"), group=1),
        stat="count",
        nudge_y=0.125,
        va="bottom",
    )
)
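The combine function can be tried on its own to preview the labels it builds. This is only a sketch: the function is repeated so the snippet is self-contained, and the inputs are assumed to be the mtcars cylinder counts (11, 7, 14) with their shares of the 32 cars.

```python
def combine(counts, percentages):
    # Pair each count with its percentage, e.g. "<count> (<pct>%)".
    fmt = "{} ({:.1f}%)".format
    return [fmt(c, p) for c, p in zip(counts, percentages)]


counts = [11, 7, 14]  # cars in mtcars with 4, 6 and 8 cylinders
percentages = [c / sum(counts) * 100 for c in counts]

labels = combine(counts, percentages)
print(labels)
```

Since the function returns finished strings, no format_string is needed in the geom_text layer.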
It also works with faceting.
(
    ggplot(mtcars, aes("factor(cyl)", fill="factor(cyl)"))
    + geom_bar()
    + geom_text(
        aes(label=after_stat("combine(count, prop*100)"), group=1),
        stat="count",
        nudge_y=0.125,
        va="bottom",
        size=9,
    )
    + facet_wrap("am")
)
Credit: This example was motivated by the GitHub user Fandekasp (Adrien Lemaire) and the difficulty he faced in displaying percentages on bar plots.
Percentages when you have more than one categorical.
group=1 does not work when you have more than one category per x location.
(
    ggplot(mtcars, aes("factor(cyl)", fill="factor(am)"))
    + geom_bar(position="dodge2")
    + geom_text(
        aes(label=after_stat("prop * 100"), group=1),
        stat="count",
        position=position_dodge2(width=0.9),
        format_string="{:.1f}%",
        size=9,
    )
)
You have to calculate the percentages after statistics for the panel have been calculated.
(
    ggplot(mtcars, aes("factor(cyl)", fill="factor(am)"))
    + geom_bar(position="dodge2")
    + geom_text(
        aes(
            label=after_stat("count / sum(count) * 100"),
            y=stage(after_stat="count", after_scale="y + 0.25"),
        ),
        stat="count",
        position=position_dodge2(width=0.9),
        format_string="{:.1f}%",
        size=9,
    )
)
For percentages per bar at each x location, you have to group the counts by location and compute the proportions within each group.
Bars with Group Percentages
# Gallery, bars
def prop_per_x(x, count):
    """
    Compute the proportion of the counts for each value of x
    """
    df = pd.DataFrame({"x": x, "count": count})
    prop = df["count"] / df.groupby("x")["count"].transform("sum")
    return prop

(
    ggplot(mtcars, aes("factor(cyl)", fill="factor(am)"))
    + geom_bar(position="dodge2")
    + geom_text(
        aes(
            label=after_stat("prop_per_x(x, count) * 100"),
            y=stage(after_stat="count", after_scale="y+.25"),
        ),
        stat="count",
        position=position_dodge2(width=0.9),
        format_string="{:.1f}%",
        size=9,
    )
)
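The per-location calculation can be checked without drawing anything. The sketch below repeats prop_per_x so it runs on its own, and feeds it values that are assumptions for illustration: x positions 1, 2 and 3 standing in for the three cylinder categories, and the mtcars counts for each cyl and am combination (3 and 8, 4 and 3, 12 and 2).

```python
import pandas as pd


def prop_per_x(x, count):
    """Compute the proportion of the counts for each value of x."""
    df = pd.DataFrame({"x": x, "count": count})
    # transform("sum") broadcasts each x group's total back to its rows.
    return df["count"] / df.groupby("x")["count"].transform("sum")


# Assumed inputs: one entry per bar, two bars (am = 0, 1) per x location.
x = [1, 1, 2, 2, 3, 3]
count = [3, 8, 4, 3, 12, 2]

pct = prop_per_x(x, count) * 100

# Sum the percentages at each x location as a sanity check.
check = pct.groupby(pd.Series(x)).sum()

print(pct.round(1).tolist())
print(check.tolist())
```

Each x location sums to 100, which is the property the dodged and stacked plots rely on.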
Stacked Bars with Group Percentages
# Gallery, bars
(
    ggplot(mtcars, aes("factor(cyl)", fill="factor(am)"))
    + geom_bar(position="fill")
    + geom_label(
        aes(label=after_stat("prop_per_x(x, count) * 100")),
        stat="count",
        position="fill",
        format_string="{:.1f}%",
        size=9,
    )
)
NOTE
With more categories, if it becomes harder to get the right groupings within plotnine, the solution is to do all (or most) of the data manipulation in pandas and then plot using geom_col + geom_text.
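One way the pandas-first approach might look is sketched below. The helper names summarise and plot_summary, and the column names n, pct and label, are hypothetical and chosen only for this illustration. plotnine is imported inside plot_summary so that the pandas step can be run and inspected by itself.

```python
import pandas as pd


def summarise(df, x, fill):
    """Count rows per (x, fill) pair and add the percentage within each x.

    Hypothetical helper: all grouping happens here, in pandas.
    """
    out = df.groupby([x, fill]).size().reset_index(name="n")
    out["pct"] = out["n"] / out.groupby(x)["n"].transform("sum") * 100
    out["label"] = out["pct"].map("{:.1f}%".format)
    return out


def plot_summary(summary, x, fill):
    """Draw precomputed counts with geom_col and their labels with geom_text."""
    # Imported lazily so summarise() can be used without plotnine.
    from plotnine import ggplot, aes, geom_col, geom_text, position_dodge

    return (
        ggplot(summary, aes(f"factor({x})", "n", fill=f"factor({fill})"))
        + geom_col(position="dodge")
        + geom_text(
            aes(label="label"),
            position=position_dodge(width=0.9),
            va="bottom",
            size=9,
        )
    )
```

With the data used in this tutorial, this could be called as summarise(mtcars, "cyl", "am") followed by plot_summary(summary, "cyl", "am"). Because the labels are ordinary columns, no after_stat expression or group aesthetic is involved.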