import pandas as pd
from plotnine import (
    ggplot,
    aes,
    geom_path,
    geom_line,
    labs,
    scale_color_continuous,
    element_text,
    theme,
)
from plotnine.data import economics
Path plots
geom_path() connects the observations in the order in which they appear in the data. This differs from geom_line(), which connects observations in order of the variable on the x axis.
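The ordering difference can be seen without drawing anything. The sketch below uses a made-up data frame (the `toy` frame and its `step` column are illustrative, not part of the economics data) and mimics the two connection orders with plain pandas: row order for geom_path(), and rows sorted by x for geom_line().

```python
import pandas as pd

# Hypothetical toy data: rows are in observation order, x is not monotonic.
toy = pd.DataFrame(
    {"step": [1, 2, 3, 4], "x": [1, 3, 2, 4], "y": [1, 2, 3, 4]}
)

# geom_path() joins points in the order the rows appear.
path_order = toy["step"].tolist()

# geom_line() joins points in order of x; sorting by x mimics that order.
line_order = toy.sort_values("x", kind="stable")["step"].tolist()

print(path_order)  # [1, 2, 3, 4]
print(line_order)  # [1, 3, 2, 4]
```

When the rows are already sorted by the x variable, the two orders coincide and both geoms draw the same line.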
economics.head(10)  # notice the rows are ordered by date
|   | date | pce | pop | psavert | uempmed | unemploy |
|---|---|---|---|---|---|---|
| 0 | 1967-07-01 | 507.4 | 198712 | 12.5 | 4.5 | 2944 |
| 1 | 1967-08-01 | 510.5 | 198911 | 12.5 | 4.7 | 2945 |
| 2 | 1967-09-01 | 516.3 | 199113 | 11.7 | 4.6 | 2958 |
| 3 | 1967-10-01 | 512.9 | 199311 | 12.5 | 4.9 | 3143 |
| 4 | 1967-11-01 | 518.1 | 199498 | 12.5 | 4.7 | 3066 |
| 5 | 1967-12-01 | 525.8 | 199657 | 12.1 | 4.8 | 3018 |
| 6 | 1968-01-01 | 531.5 | 199808 | 11.7 | 5.1 | 2878 |
| 7 | 1968-02-01 | 534.2 | 199920 | 12.2 | 4.5 | 3001 |
| 8 | 1968-03-01 | 544.9 | 200056 | 11.6 | 4.1 | 2877 |
| 9 | 1968-04-01 | 544.6 | 200208 | 12.2 | 4.6 | 2709 |
Because the data is in date order, geom_path() (in pink) produces the same result as geom_line() (in black):
(
    ggplot(economics, aes(x="date", y="unemploy"))
    + geom_line(size=5)  # plot geom_line as the first layer
    + geom_path(
        colour="#ff69b4",  # plot a path - colour pink
        size=1,
    )
    + labs(x="date", y="unemployment (,000)")  # label x & y-axis
)
Plotting unemployment (scaled by population) versus savings rate shows how geom_path() differs from geom_line(). Because geom_path() connects the observations in the order in which they appear in the data, this line is like a “journey through time”:
(
    ggplot(economics, aes(x="unemploy/pop", y="psavert"))
    + geom_path()  # plot geom path
    + labs(x="scaled unemployment", y="saving rate")  # label x & y-axis
)
Comparing geom_line() (black) to geom_path() (pink) shows how these two plots differ in what they can show: geom_path() shows that the savings rate has gone down over time, which is not evident with geom_line().
(
    ggplot(economics, aes(x="unemploy/pop", y="psavert"))
    + geom_path(
        colour="#ff69b4",  # plot geom_path as the first layer - colour pink
        alpha=0.5,  # line transparency
        size=2.5,  # line thickness
    )
    + geom_line()  # layer geom_line
    + labs(x="scaled unemployment", y="saving rate")  # label x & y-axis
)
A geom_path() plot can be easier to interpret if the path is coloured by time. First convert the date to a number, then use this number to colour the path:
# convert date to a number
economics["date_as_number"] = pd.to_numeric(economics["date"])
# inspect new column
economics.head()
|   | date | pce | pop | psavert | uempmed | unemploy | date_as_number |
|---|---|---|---|---|---|---|---|
| 0 | 1967-07-01 | 507.4 | 198712 | 12.5 | 4.5 | 2944 | -79056000000000000 |
| 1 | 1967-08-01 | 510.5 | 198911 | 12.5 | 4.7 | 2945 | -76377600000000000 |
| 2 | 1967-09-01 | 516.3 | 199113 | 11.7 | 4.6 | 2958 | -73699200000000000 |
| 3 | 1967-10-01 | 512.9 | 199311 | 12.5 | 4.9 | 3143 | -71107200000000000 |
| 4 | 1967-11-01 | 518.1 | 199498 | 12.5 | 4.7 | 3066 | -68428800000000000 |
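The date_as_number values are large, and negative for early rows, because pd.to_numeric() returns a datetime column as integer nanoseconds counted from 1970-01-01. The sketch below illustrates this on three hand-picked dates. It assumes the values are stored as datetime64[ns], which is forced explicitly here; the `dates` series is illustrative and not taken from the dataset.

```python
import pandas as pd

# Illustrative dates; force nanosecond resolution so the unit is unambiguous.
dates = pd.Series(
    pd.to_datetime(["1967-07-01", "1970-01-01", "2014-06-01"])
).astype("datetime64[ns]")

# Integer nanoseconds since 1970-01-01; dates before 1970 are negative.
as_number = pd.to_numeric(dates)
print(as_number.tolist())

# Converting back with unit="ns" recovers the original dates.
round_trip = pd.to_datetime(as_number, unit="ns")
print(round_trip.tolist())
```

Because the conversion is reversible, any value on the numeric colour scale can be translated back into a date when choosing legend labels.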
Mapping aes(colour="date_as_number") within geom_path() colours the path so that it changes with time.
# input
legend_breaks = [
    -79056000000000000,
    709948800000000000,
    1401580800000000000,
]  # used to modify colour-graded legend
legend_labels = ["1967-Jul", "1992-Jul", "2014-Jun"]

# plot
(
    ggplot(economics, aes(x="unemploy/pop", y="psavert"))
    + geom_path(
        aes(colour="date_as_number")
        # colour geom_path using time variable "date_as_number"
    )
    + labs(x="scaled unemployment", y="saving rate")
    + scale_color_continuous(
        breaks=legend_breaks,  # set legend breaks (where labels will appear)
        labels=legend_labels,  # set labels on legend
    )
    + theme(legend_title=element_text(text="Date"))  # set title of legend
)
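The legend breaks are nanosecond timestamps on the same scale as date_as_number, so they can be derived from date strings rather than typed by hand. A minimal sketch, assuming pandas' Timestamp.value (integer nanoseconds since 1970-01-01). The `legend_dates` list, the `date_to_break` helper and the "%Y-%b" label format are illustrative choices, not part of plotnine.

```python
import pandas as pd


def date_to_break(date_string):
    """Hypothetical helper: nanoseconds since 1970-01-01 for a date string."""
    return pd.Timestamp(date_string).value


# Dates at which legend labels should appear (illustrative choice).
legend_dates = ["1967-07-01", "1992-07-01", "2014-06-01"]

# Numeric breaks on the same scale as the date_as_number column.
legend_breaks = [date_to_break(d) for d in legend_dates]

# Year-month labels; the month abbreviation from %b depends on the locale.
legend_labels = [pd.Timestamp(d).strftime("%Y-%b") for d in legend_dates]

print(legend_breaks)
print(legend_labels)
```

The two lists can then be passed to scale_color_continuous() through its breaks and labels arguments, as in the plot above.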