Plotly 3.0.0 in Jupyter Notebook

Plotly.py 3.0.0 was recently released, and I finally got a chance to tinker with it! This release includes features designed specifically for Jupyter Notebooks. Namely, the JavaScript is embedded directly in the figure, which you can now access straight from your notebook. Exciting!

If you haven’t installed plotly or need to upgrade, open your Anaconda command prompt (as Administrator) and follow these directions. After you install plotly, launch Jupyter Notebook (by typing “jupyter notebook” into your Anaconda command prompt or by opening Jupyter Notebook from your computer’s menu). Next, enter your plotly username and API key in your notebook. You can sign up for plotly here, and directions for generating an API key are here.

#first import plotly and provide username and api key
import plotly
plotly.tools.set_credentials_file(username='UserName', api_key='XXXXX')

Now load the following:

import plotly.plotly as py
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import numpy as np
import pandas as pd

init_notebook_mode(connected=True) #tells the notebook to load figures in offline mode

Plotly should now work within your notebook.

Here’s an example of a 2D plot:

x=np.random.randn(1000)
y=np.random.randn(1000)
go.FigureWidget(
    data=[
        {'x': x, 'y': y, 'type': 'histogram2dcontour'}
    ]
)

Example of a 2D plot with markers:

x = np.random.randn(2000)
y = np.random.randn(2000)
iplot([go.Histogram2dContour(x=x, y=y, contours=dict(coloring='heatmap')),
       go.Scatter(x=x, y=y, mode='markers', marker=dict(color='white', size=3, opacity=0.3))], show_link=False)

Example of a 3D plot:

s = np.linspace(0, 2 * np.pi, 240)
t = np.linspace(0, np.pi, 240)
tGrid, sGrid = np.meshgrid(s, t)

r = 2 + np.sin(7 * sGrid + 5 * tGrid)  # r = 2 + sin(7s+5t)
x = r * np.cos(sGrid) * np.sin(tGrid)  # x = r*cos(s)*sin(t)
y = r * np.sin(sGrid) * np.sin(tGrid)  # y = r*sin(s)*sin(t)
z = r * np.cos(tGrid)                  # z = r*cos(t)

surface = go.Surface(x=x, y=y, z=z)
data = [surface]

layout = go.Layout(
    title='Parametric Plot',
    scene=dict(
        xaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='jupyter-parametric_plot')
Interact with it here
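
The comments in the 3D example describe a sphere whose radius r changes with the two angles. As a rough sanity check (my own sketch, not part of the plotly example), you can confirm that every point on the surface sits at distance r from the origin. The names `distance` and `radius_matches` are just illustrative.

```python
import numpy as np

# Same grid and formulas as the 3D example above
s = np.linspace(0, 2 * np.pi, 240)
t = np.linspace(0, np.pi, 240)
tGrid, sGrid = np.meshgrid(s, t)

r = 2 + np.sin(7 * sGrid + 5 * tGrid)
x = r * np.cos(sGrid) * np.sin(tGrid)
y = r * np.sin(sGrid) * np.sin(tGrid)
z = r * np.cos(tGrid)

# cos^2 + sin^2 = 1 collapses x^2 + y^2 + z^2 to r^2,
# so the distance from the origin should equal r everywhere.
distance = np.sqrt(x**2 + y**2 + z**2)
radius_matches = np.allclose(distance, r)
print(radius_matches)
```

Because r always stays between 1 and 3, the surface never passes through the origin, which is why the plot looks like a bumpy ball rather than something self-intersecting at the center.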

Lastly, an animated plot:

from plotly.offline import init_notebook_mode, iplot
from IPython.display import display, HTML


init_notebook_mode(connected=True)

url = 'https://raw.githubusercontent.com/plotly/datasets/master/gapminderDataFiveYear.csv'
dataset = pd.read_csv(url)

years = ['1952', '1962', '1967', '1972', '1977', '1982', '1987', '1992', '1997', '2002', '2007']


# make list of continents
continents = []
for continent in dataset['continent']:
    if continent not in continents:
        continents.append(continent)
# make figure
figure = {
    'data': [],
    'layout': {},
    'frames': []
}

# fill in most of layout
figure['layout']['xaxis'] = {'range': [30, 85], 'title': 'Life Expectancy'}
figure['layout']['yaxis'] = {'title': 'GDP per Capita', 'type': 'log'}
figure['layout']['hovermode'] = 'closest'
figure['layout']['sliders'] = {
    'args': [
        'transition', {
            'duration': 400,
            'easing': 'cubic-in-out'
        }
    ],
    'initialValue': '1952',
    'plotlycommand': 'animate',
    'values': years,
    'visible': True
}
figure['layout']['updatemenus'] = [
    {
        'buttons': [
            {
                'args': [None, {'frame': {'duration': 500, 'redraw': False},
                         'fromcurrent': True, 'transition': {'duration': 300, 'easing': 'quadratic-in-out'}}],
                'label': 'Play',
                'method': 'animate'
            },
            {
                'args': [[None], {'frame': {'duration': 0, 'redraw': False}, 'mode': 'immediate',
                'transition': {'duration': 0}}],
                'label': 'Pause',
                'method': 'animate'
            }
        ],
        'direction': 'left',
        'pad': {'r': 10, 't': 87},
        'showactive': False,
        'type': 'buttons',
        'x': 0.1,
        'xanchor': 'right',
        'y': 0,
        'yanchor': 'top'
    }
]
#custom colors
custom_colors = {
    'Asia': 'rgb(171, 99, 250)',
    'Europe': 'rgb(230, 99, 250)',
    'Africa': 'rgb(99, 110, 250)',
    'Americas': 'rgb(25, 211, 243)',
    'Oceania': 'rgb(50, 170, 255)'
}
sliders_dict = {
    'active': 0,
    'yanchor': 'top',
    'xanchor': 'left',
    'currentvalue': {
        'font': {'size': 20},
        'prefix': 'Year:',
        'visible': True,
        'xanchor': 'right'
    },
    'transition': {'duration': 300, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9,
    'x': 0.1,
    'y': 0,
    'steps': []
}

# make data
year = 1952
for continent in continents:
    dataset_by_year = dataset[dataset['year'] == year]
    dataset_by_year_and_cont = dataset_by_year[dataset_by_year['continent'] == continent]

    data_dict = {
        'x': list(dataset_by_year_and_cont['lifeExp']),
        'y': list(dataset_by_year_and_cont['gdpPercap']),
        'mode': 'markers',
        'text': list(dataset_by_year_and_cont['country']),
        'marker': {
            'sizemode': 'area',
            'sizeref': 200000,
            'size': list(dataset_by_year_and_cont['pop'])
        },
        'name': continent
    }
    figure['data'].append(data_dict)
    
# make frames
for year in years:
    frame = {'data': [], 'name': str(year)}
    for continent in continents:
        dataset_by_year = dataset[dataset['year'] == int(year)]
        dataset_by_year_and_cont = dataset_by_year[dataset_by_year['continent'] == continent]

        data_dict = {
            'x': list(dataset_by_year_and_cont['lifeExp']),
            'y': list(dataset_by_year_and_cont['gdpPercap']),
            'mode': 'markers',
            'text': list(dataset_by_year_and_cont['country']),
            'marker': {
                'sizemode': 'area',
                'sizeref': 200000,
                'size': list(dataset_by_year_and_cont['pop'])
            },
            'name': continent
        }
        frame['data'].append(data_dict)

    figure['frames'].append(frame)
    slider_step = {
        'args': [
            [year],
            {'frame': {'duration': 300, 'redraw': False},
             'mode': 'immediate',
             'transition': {'duration': 300}}
        ],
        'label': year,
        'method': 'animate'
    }
    sliders_dict['steps'].append(slider_step)

    
figure['layout']['sliders'] = [sliders_dict]

iplot(figure)
Interact with it here
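
The animated example builds the same trace dictionary twice: once for the starting data and once for every frame. One way to cut down on the repetition is to move that into small helper functions. This is only a sketch: `make_trace` and `make_slider_step` are hypothetical names of mine, and the rows below are made-up stand-ins for the Gapminder data so the snippet runs without downloading anything.

```python
def make_trace(rows, continent):
    """Build one scatter trace (a plain dict) for a single continent."""
    selected = [row for row in rows if row['continent'] == continent]
    return {
        'x': [row['lifeExp'] for row in selected],
        'y': [row['gdpPercap'] for row in selected],
        'mode': 'markers',
        'text': [row['country'] for row in selected],
        'marker': {
            'sizemode': 'area',
            'sizeref': 200000,
            'size': [row['pop'] for row in selected],
        },
        'name': continent,
    }


def make_slider_step(year):
    """Build the slider step that jumps to the frame named after `year`."""
    return {
        'args': [
            [year],
            {'frame': {'duration': 300, 'redraw': False},
             'mode': 'immediate',
             'transition': {'duration': 300}},
        ],
        'label': year,
        'method': 'animate',
    }


# Made-up rows, only here to exercise the helpers
rows = [
    {'country': 'A', 'continent': 'Asia', 'year': 1952,
     'lifeExp': 40.0, 'gdpPercap': 800.0, 'pop': 1000000},
    {'country': 'B', 'continent': 'Europe', 'year': 1952,
     'lifeExp': 65.0, 'gdpPercap': 5000.0, 'pop': 2000000},
    {'country': 'A', 'continent': 'Asia', 'year': 1962,
     'lifeExp': 45.0, 'gdpPercap': 900.0, 'pop': 1200000},
    {'country': 'B', 'continent': 'Europe', 'year': 1962,
     'lifeExp': 68.0, 'gdpPercap': 7000.0, 'pop': 2100000},
]
years = ['1952', '1962']
continents = ['Asia', 'Europe']

frames = []
steps = []
for year in years:
    rows_for_year = [row for row in rows if row['year'] == int(year)]
    frames.append({
        'name': year,
        'data': [make_trace(rows_for_year, c) for c in continents],
    })
    steps.append(make_slider_step(year))

print(len(frames), len(steps))
```

With helpers like these, the starting data would just be the traces for the first year, and the frames loop shrinks to a few lines.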

Neat, right?!

 

Overall, everything ran smoothly except the last plot. I actually initially tried to make this one:

gapminder_custom
From: https://plot.ly/python/gapminder-example/ (scroll to the bottom)

but I kept getting an error:

error

Update: Jon commented and pointed out that I was using an older version of plotly (3.0.0rc10) instead of 3.0.0rc11. You can check which version you have by typing the following:

import plotly
plotly.__version__

After I updated plotly, I successfully made the last graph!
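
If you want the notebook itself to tell you whether an upgrade is needed, here is a minimal sketch that compares release-candidate version strings like the two above. `parse_version` is a hypothetical helper of mine and it only understands versions shaped like 3.0.0 or 3.0.0rc10; for anything more general, the `packaging` library is probably the safer choice.

```python
import re


def parse_version(version):
    """Turn a string such as '3.0.0rc10' into a tuple that sorts correctly."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: " + version)
    major, minor, patch, rc = match.groups()
    # A final release sorts after every release candidate of the same version.
    rc_number = float("inf") if rc is None else int(rc)
    return (int(major), int(minor), int(patch), rc_number)


installed = "3.0.0rc10"  # the version I had when I hit the error
required = "3.0.0rc11"   # the version Jon pointed me to
needs_upgrade = parse_version(installed) < parse_version(required)
print(needs_upgrade)
```

Comparing the parsed tuples rather than the raw strings matters once the numbers have different lengths: as plain text, "3.0.0rc9" would sort after "3.0.0rc10".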

newplot(4)
Interact with it here

Special thanks to Jon! I sincerely appreciate your help!

US Fertility Heat Map DIY

The US fertility heat maps that I made a couple of weeks ago received a lot of attention, and one of the questions I’ve been asked is how I produced them, which I describe in this post.

As I mentioned in my previous post, I simply followed the directions specified in this article, but I limited the UN data to the US. Overall, I think the article does a good job of explaining how they created their heat map in Tableau. I remade the heat map in R because I was frustrated with the process of trying to embed the visualization into WordPress: both Tableau and WordPress charge you to embed visualizations in a format that is aesthetically pleasing. Luckily, recreating the heat map in R was extremely easy, and the result is just as pretty, at least in my opinion. Here’s how I did it:

First, download the data from the UN website and limit it to the US only. Alternatively, I’ve linked to the (formatted) data on my OSF account, which also provides access to my code.

Now type the following in RStudio:


#load libraries:
#if you need to install first, type: install.packages("package_name",dependencies=TRUE)
library(tidyverse)
library(viridis)
library(ggthemes) #provides theme_tufte(), used in the plot below

#set your working directory to the folder your data is stored in
setwd("C:/Users/Stella/Documents/blog/US birth Map")
#if you don't know what directory is currently set to, type: getwd()

#now import your data
us_fertility<-read.csv("USBirthscsv.csv", header=TRUE) #change the file name if you did not use the data I provided (osf.io/h9ta2)

#limit to relevant data
dta <- us_fertility %>% select(Year, January:December)

#gather (i.e., reshape to one row per year and month) data of interest, in preparation for graphing
dta <- dta %>%
gather(Month, births, January:December) %>%
arrange(Year)

#ordering the data by most frequent incidence of births
bb2 <- dta %>%
group_by(Year) %>%
mutate(rank=dense_rank(desc(births)))

#plot the data
plot<- ggplot(bb2, aes(x =fct_rev(Month),
y = Year,
fill=rank)) +
scale_x_discrete(name="Months", labels=c("Jan", "Feb", "Mar",
"Apr", "May","Jun",
"Jul", "Aug", "Sep",
"Oct", "Nov", "Dec")) +
scale_fill_viridis(name = "Births", option="magma") + #optional command to change the colors of the heat map
geom_tile(colour = "White", size = 0.4) +
labs(title = "Heat Map of US Births",
subtitle = "Frequency of Births from 1969-2014",
x = "Month",
y = "Year",
caption = "Source: UN Data") +
theme_tufte()

plot+ aes(x=fct_inorder(Month))

#if you want to save the graph
dev.copy(png, "births.png")
dev.off()

 
And that’s it! Simple, right?!
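
If you’re following along in Python rather than R, the ranking step is the part most worth understanding: within each year, the month with the most births gets rank 1, ties share a rank, and no rank numbers are skipped. Here is a plain-Python sketch of that idea. `dense_rank_desc` is a hypothetical helper of mine (not a library function), and the birth counts are made up.

```python
def dense_rank_desc(values):
    """Rank values from largest (rank 1) to smallest; ties share a rank, no gaps."""
    distinct = sorted(set(values), reverse=True)
    rank_of = {value: position + 1 for position, value in enumerate(distinct)}
    return [rank_of[value] for value in values]


# Made-up birth counts for part of one year
births = {"January": 300, "February": 280, "March": 300, "April": 310}
ranks = dict(zip(births, dense_rank_desc(list(births.values()))))
print(ranks)
```

In this toy year April ranks first, January and March tie for second, and February is third rather than fourth, which is what makes the ranking "dense".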

Gapminder gif with Rstudio

I decided to remake the Gapminder gif that I made the other day in Python, but in RStudio this time. I’ll probably continue doing this for a while, as I try to figure out the advantages of using one program over the other. Here’s a walk-through of what I did to recreate it:

#install these packages if you haven't already
install.packages(c("devtools", "dplyr", "ggplot2", "readr", "viridis", "animation"))
devtools::install_github("dgrtwo/gganimate",force=TRUE)

library(devtools)
library(dplyr)
library(readr)
library(viridis)
library(ggplot2)
library(gganimate)
library(animation)

#Set up ImageMagick --for gifs
install.packages("installr",dependencies = TRUE)
library(installr)

#Configure your environment--change the location
Sys.setenv(PATH = paste("C:/Program Files/ImageMagick-7.0.7-Q16", Sys.getenv("PATH"), sep = ";")) #change the path to where you installed ImageMagick
#Again, change the location:
magickPath <- shortPathName("C:/Program Files/ImageMagick-7.0.7-Q16/magick.exe")
#ani.options(convert=magickPath)

If you need to download ImageMagick, go to this link

Load data and create plot

Once you’ve installed the appropriate packages and configured ImageMagick to work with Rstudio, you can load your data and plot as usual.

gapminder_data<-read.csv("https://python-graph-gallery.com/wp-content/uploads/gapminderData.csv", header=TRUE)

glimpse(gapminder_data) #print to make sure it loaded correctly
## Observations: 1,704
## Variables: 6
## $ country    Afghanistan, Afghanistan, Afghanistan, Afghanistan, ...
## $ year       1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992...
## $ pop        8425333, 9240934, 10267083, 11537966, 13079460, 1488...
## $ continent  Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia...
## $ lifeExp    28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.8...
## $ gdpPercap  779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 78...

# Helper function for string wrapping.
# Default 40 character target width.
swr = function(string, nwrap=40) {
  paste(strwrap(string, width=nwrap), collapse="\n")
}
swr = Vectorize(swr)

gapminder_plot<-ggplot(gapminder_data) +
  aes(x = gdpPercap,
      y = lifeExp,
      colour = continent,
      size = pop, 
      frame=year) +
      scale_x_log10() +
  scale_size_continuous(guide =FALSE) + #suppresses the second legend (size=pop)
  geom_point() +
  scale_color_viridis(discrete=TRUE)+ #optional way to change colors of the plot
  theme_bw() +
  labs(title=swr("Relationship Between Life Expectancy and GDP per Capita"),
       x= "GDP Per Capita",
       y= "Life expectancy",
      caption="Data: Gapminder") +
  theme(legend.position = "none",
        axis.title.x=element_text(size=.2),
        axis.title.y=element_text(size=.2),
        plot.caption = element_text(size=.1))

#getOption("device") #try running this if your plot doesn't immediately show
gapminder_plot

#if you want to save the plot:
ggsave("title.png", 
       plot = last_plot(), # or give ggplot object name as in myPlot,
       width = 5, height = 5, 
       units = "in", # other options c("in", "cm", "mm"), 
       dpi = 300)

Notice that I created the swr function to wrap the title text. If I don’t include that function, the title runs off the plot, like this:

gapminderplot
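
For the Python version of this plot, the same title-wrapping trick can be sketched with the standard library’s textwrap module. `wrap_title` is a hypothetical name of mine, and the 40-character width simply mirrors the R helper; Python may break the line at a slightly different spot than strwrap does.

```python
import textwrap


def wrap_title(title, width=40):
    """Insert line breaks so that no line of the title is longer than `width`."""
    return textwrap.fill(title, width=width)


title = "Relationship Between Life Expectancy and GDP per Capita"
wrapped = wrap_title(title)
print(wrapped)
```

The wrapped string can then be passed to plt.title(), since Matplotlib draws each newline as a new line of the title.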

Animate the plot

Now you can animate the plot using gganimate. If you want to change the axis titles or any other feature of the plot, I like to reference STHDA.

#remember to assign a working directory first:
#setwd() <--use this to change the working directory, if needed
gganimate(gapminder_plot,interval=.5,"gapminderplot.gif")

 

All in all, I’d say that creating the gif was equally easy in Python and R. I had more trouble initially configuring ImageMagick with Python, but I might have found it easier in R simply because I had already figured it out once in Python. On the other hand, I like the way the Python gif looks much more than the gif that RStudio rendered.

animated_gapminder

Looks like I’ll have to continue experimenting.

Data Visualization in Python

Sharing a visualization that I made with Python, in Jupyter Notebook.

First, import the following libraries:

# Set up libraries
%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set_style("white")
import pandas as pd
my_dpi=96

Then import data and make scatter plots for each year of life expectancy data, courtesy of Gapminder:

# Get Gapminder Life Expectancy data (csv file is hosted on the web)
url = 'https://python-graph-gallery.com/wp-content/uploads/gapminderData.csv'
data = pd.read_csv(url)
 
# Transform Continent into numerical values group1->1, group2->2...
data['continent']=pd.Categorical(data['continent'])
 
# For each year:
for i in data.year.unique():

    # Initialize a figure
    fig = plt.figure(figsize=(680/my_dpi, 480/my_dpi), dpi=my_dpi)

    # Plot that year's rows: bubble size tracks population, color tracks continent
    tmp = data[data.year == i]
    plt.scatter(tmp['lifeExp'], tmp['gdpPercap'], s=tmp['pop']/200000, c=tmp['continent'].cat.codes, cmap="Accent", alpha=0.6, edgecolors="white", linewidth=2)

    # Add titles (main and on axis)
    plt.yscale('log')
    plt.xlabel("Life Expectancy")
    plt.ylabel("GDP per Capita")
    plt.title("Year: " + str(i))
    plt.ylim(0,100000)
    plt.xlim(30, 90)

    # Save the results
    filename = 'Gapminder_step' + str(i) + '.png'
    plt.savefig(filename, dpi=96)
    plt.gca()

Next, download and install ImageMagick. To make the following gif, type this in your (Windows 10) command prompt:

magick convert Gapminder*.png animated_gapminder.gif

If you have any issues configuring ImageMagick, like I did, you may find this link useful.

animated_gapminder

Note: You can make a gif using Matplotlib or moviepy, but I couldn’t quite figure it out. I will update once I do.
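
In the meantime, here is one possible way to do it with Matplotlib alone, sketched under a few assumptions: it needs the Pillow package installed for the gif writer, it uses random made-up numbers in place of the Gapminder data so it runs offline, and the names `draw_frame` and `gif_path` are mine. I haven’t polished it into a match for the gif above.

```python
import os

import matplotlib
matplotlib.use("Agg")  # draw off-screen, so no notebook backend is required
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter

# Made-up stand-in data: 5 "years" with 30 random points each
rng = np.random.default_rng(0)
years = [1952, 1957, 1962, 1967, 1972]
life_exp = {year: rng.uniform(30, 90, 30) for year in years}
gdp = {year: rng.uniform(300, 50000, 30) for year in years}

fig, ax = plt.subplots(figsize=(6, 4))


def draw_frame(year):
    """Clear the axes and redraw the scatter plot for one year."""
    ax.clear()
    ax.scatter(life_exp[year], gdp[year], alpha=0.6, edgecolors="white")
    ax.set_yscale("log")
    ax.set_xlim(30, 90)
    ax.set_ylim(100, 100000)
    ax.set_xlabel("Life Expectancy")
    ax.set_ylabel("GDP per Capita")
    ax.set_title("Year: " + str(year))


# One animation frame per year, written straight to a gif (2 frames per second)
anim = FuncAnimation(fig, draw_frame, frames=years, repeat=False)
gif_path = "animated_gapminder_sketch.gif"
anim.save(gif_path, writer=PillowWriter(fps=2))
plt.close(fig)
```

The appeal of this route is that it skips both the intermediate png files and the ImageMagick setup; swapping the random numbers for the real data frame would mean filtering by year inside draw_frame.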

US Fertility Trends by Month and Year

I came across this beautiful data visualization heat map of live births by month and country/region, and I decided to recreate it for the US but by year. The figure below shows the frequency of births from 1972 to 2014, with darker boxes indicating higher incidences of fertility. The data is from the UN. (PS: Tableau and WordPress are annoying if you don’t want to pay for extras. I can’t embed my table without the play sign in front of it, AND it has a broken link *sigh*. So, please see the Twitter link ^_^).

I was somewhat surprised by how consistent the timing of fertility by month has stayed over the past 40 years. I expected more variability starting around the 1980s, as marriage rates declined and nonmarital fertility increased. However, the most common birth months in the US have consistently remained July through October, suggesting most babies are conceived in late fall and early winter. Looks like holidays are good for baby-making.
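
The conception timing is just calendar arithmetic: count back roughly nine months from the birth month. Here is a tiny sketch of that, assuming a flat nine-month pregnancy (a simplification on my part) and using a hypothetical helper named `conception_month`.

```python
import calendar


def conception_month(birth_month, months_before=9):
    """Return the name of the month `months_before` months before `birth_month` (1-12)."""
    index = (birth_month - 1 - months_before) % 12
    return calendar.month_name[index + 1]


# July through October are the most common birth months in the heat map
for birth_month in range(7, 11):
    print(calendar.month_name[birth_month], "<-", conception_month(birth_month))
```

Under that assumption, July through October births line up with conceptions from October through January.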