plotnine.geom_boxplot

geom_boxplot(mapping=None, data=None, **kwargs)

Box and whiskers plot

Parameters

width : float = None

Box width. If None, the width is set to 90% of the resolution of the data. Note that if the stat has a width parameter, that takes precedence over this one.
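
As a rough illustration of the default, the sketch below treats the "resolution" of the data as the smallest gap between distinct x positions and takes 90% of it. The helpers `resolution` and `default_box_width` are hypothetical names written for this example; they are a simplified reading of the rule above, not plotnine's internal code, which may special-case some inputs.

```python
def resolution(values):
    """Smallest gap between distinct values (1 if there is only one value)."""
    unique = sorted(set(values))
    if len(unique) < 2:
        return 1
    return min(b - a for a, b in zip(unique, unique[1:]))


def default_box_width(x_positions, width=None):
    """Return `width` if it is given, else 90% of the data resolution."""
    if width is not None:
        return width
    return 0.9 * resolution(x_positions)


# Discrete x values sit at positions 1, 2, 3, ... so the smallest gap is 1
unit_spaced = default_box_width([1, 2, 3, 3, 2, 1])
# Continuous x values spaced 0.5 apart give narrower boxes
half_spaced = default_box_width([0, 0.5, 1.0, 2.0])
# An explicit width is used as-is
explicit = default_box_width([1, 2, 3], width=0.4)
print(unit_spaced, half_spaced, explicit)
```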

outlier_alpha : float = 1

Transparency of the outlier points.

outlier_color : str | tuple = None

Color of the outlier points.

outlier_shape : str = "o"

Shape of the outlier points. An empty string hides the outliers.

outlier_size : float = 1.5

Size of the outlier points.

outlier_stroke : float = 0.5

Stroke size of the outlier points.

notch : bool = False

Whether the boxes should have a notch.
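
For orientation, notches in boxplots are conventionally drawn at median ± 1.58 · IQR / √n, the rule used by R's boxplot statistics. The sketch below computes those limits with the standard library; it assumes this conventional rule and linear-interpolation quartiles, and the function name `notch_limits` is made up for this example.

```python
import math
import statistics


def notch_limits(values):
    """Return (lower, upper) notch limits: median -/+ 1.58 * IQR / sqrt(n)."""
    # method="inclusive" interpolates linearly between data points
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


lower, upper = notch_limits([2, 4, 4, 5, 7, 8, 9, 11, 12])
print(lower, upper)
```

When the notches of two boxes do not overlap, that is usually read as a hint that the two medians differ.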

varwidth : bool = False

If True, boxes are drawn with widths proportional to the square roots of the number of observations in each group.
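
A small sketch of what that scaling means in practice. It assumes the largest group receives the full box width and the other groups are scaled relative to it; that normalisation and the helper name `relative_widths` are assumptions made for this example.

```python
import math


def relative_widths(group_sizes, max_width=0.9):
    """Scale box widths by sqrt(n), giving the largest group `max_width`."""
    roots = {group: math.sqrt(n) for group, n in group_sizes.items()}
    largest = max(roots.values())
    return {group: max_width * root / largest for group, root in roots.items()}


# Group "a" has 4x the observations of "b" but only 2x the width
widths = relative_widths({"a": 100, "b": 25, "c": 4})
print(widths)
```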

notchwidth : float = 0.5

Width of notch relative to the body width.

fatten : float = 2

A multiplicative factor applied to the thickness of the median bar drawn across the box.

See Also

stat_boxplot

The default stat for this geom.

Examples

import numpy as np
import pandas as pd

from plotnine import (
    ggplot,
    aes,
    geom_boxplot,
    geom_jitter,
    scale_x_discrete,
    coord_flip,
)
from plotnine.data import pageviews

A box and whiskers plot

The boxplot compactly displays the distribution of a continuous variable.

Read more: Wikipedia, ggplot2 docs

flights = pd.read_csv("data/flights.csv")
flights.head()
   year     month  passengers
0  1949   January         112
1  1949  February         118
2  1949     March         132
3  1949     April         129
4  1949       May         121

Basic boxplot

months = [month[:3] for month in flights.month[:12]]
print(months)
['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

A Basic Boxplot

# Gallery, distributions
(
    ggplot(flights)
    + geom_boxplot(aes(x="factor(month)", y="passengers"))
    + scale_x_discrete(labels=months, name="month")  # change tick labels on the x-axis
)

Horizontal boxplot

(
    ggplot(flights)
    + geom_boxplot(aes(x="factor(month)", y="passengers"))
    + coord_flip()
    + scale_x_discrete(
        labels=months[::-1],
        limits=flights.month[11::-1],
        name="month",
    )
)

Boxplot with jittered points:

(
    ggplot(flights, aes(x="factor(month)", y="passengers"))
    + geom_boxplot()
    + geom_jitter()
    + scale_x_discrete(labels=months, name="month")  # change tick labels on the x-axis
)

Precomputed boxplots

For datasets that do not fit in memory, you can precompute the boxplot metrics (for example by aggregating the statistics using database queries) and then use geom_boxplot with stat="identity".

# Precompute the metrics
def q25(x):
    return x.quantile(0.25)

def q75(x):
    return x.quantile(0.75)
    
pageviews["hour"] = pageviews.date_hour.dt.hour
precomputed_metrics = pageviews.groupby("hour").agg(
    {"pageviews": ["min", q25, "median", q75, "max"]}
)
# Flatten the (column, statistic) MultiIndex, keeping only the statistic name
precomputed_metrics.columns = [col_name[1] for col_name in precomputed_metrics.columns]
precomputed_metrics = precomputed_metrics.reset_index()
precomputed_metrics.head()
   hour          min          q25        median           q75           max
0     0  8437.500380  8842.109077   9297.046035   9600.362430  11762.446233
1     1  8852.123978  9177.938537   9457.821814  10530.072887  11974.437292
2     2  8793.076686  9176.462389   9704.885172  10446.315276  12105.406628
3     3  8683.606449  9574.722286  10615.670464  11290.246605  11651.443193
4     4  8252.974951  9898.998785  10959.909095  11409.657288  11603.711837

(
    ggplot(precomputed_metrics)
    + geom_boxplot(
        aes(x="factor(hour)", ymin="min", lower="q25", middle="median", upper="q75", ymax="max"),
        stat="identity"
    )
)
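
In the plot above the whiskers run to the minimum and maximum of each group, because those are the values mapped to ymin and ymax. If you would rather precompute whiskers with the common 1.5 × IQR rule, a sketch along the following lines could be used. The function name `tukey_box_stats`, its `coef` argument and the sample data are assumptions made for this example, and the quartiles use linear interpolation.

```python
import statistics


def tukey_box_stats(values, coef=1.5):
    """Five-number summary with whiskers at the most extreme points
    lying within `coef` * IQR of the box."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - coef * iqr, q3 + coef * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "ymin": inside[0],  # lower whisker end
        "lower": q1,
        "middle": median,
        "upper": q3,
        "ymax": inside[-1],  # upper whisker end
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


stats = tukey_box_stats([1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 20])
print(stats)
```

The keys mirror the aesthetics used with stat="identity" above. The outliers are returned separately and would need their own point layer if you want them drawn.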
