Data Visualization in Python

Here is a visualization that I made with Python in a Jupyter Notebook.

First, import the following libraries:

# Set up libraries
%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set_style("white")
import pandas as pd
my_dpi=96

Then load the data and make one scatter plot for each year of life expectancy data, courtesy of Gapminder:

# Get Gapminder life expectancy data (the CSV file is hosted on the web)
url = 'https://python-graph-gallery.com/wp-content/uploads/gapminderData.csv'
data = pd.read_csv(url)

# Make 'continent' categorical so that .cat.codes gives one integer per continent (0, 1, 2, ...)
data['continent'] = pd.Categorical(data['continent'])

# For each year:
for i in data.year.unique():

    # Initialize a figure
    fig = plt.figure(figsize=(680/my_dpi, 480/my_dpi), dpi=my_dpi)

    # Subset the data to this year; bubble size tracks population, color tracks continent
    tmp = data[data.year == i]
    plt.scatter(tmp['lifeExp'], tmp['gdpPercap'], s=tmp['pop']/200000,
                c=tmp['continent'].cat.codes, cmap="Accent", alpha=0.6,
                edgecolors="white", linewidth=2)

    # Log-scale the y-axis, then add axis labels and a title
    plt.yscale('log')
    plt.xlabel("Life Expectancy")
    plt.ylabel("GDP per Capita")
    plt.title("Year: " + str(i))
    # A log axis cannot start at 0, so use a positive lower limit
    plt.ylim(100, 100000)
    plt.xlim(30, 90)

    # Save this year's frame, then close the figure to free memory
    filename = 'Gapminder_step' + str(i) + '.png'
    plt.savefig(filename, dpi=my_dpi)
    plt.close(fig)
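
In case the categorical step is unclear, here is a small sketch of what it does. The `continents` series is a toy stand-in that I made up for illustration, not the Gapminder column itself. As far as I can tell, pandas sorts the categories alphabetically by default and numbers them from 0, so the codes are 0, 1, 2, ... rather than 1, 2, 3, ...

```python
import pandas as pd

# Toy stand-in for the 'continent' column (made-up values, not the real data)
continents = pd.Series(["Asia", "Europe", "Africa", "Asia", "Americas"])

# Convert to a categorical; string categories are sorted alphabetically by default
cat = pd.Categorical(continents)

# Each category gets an integer code, starting at 0
code_for = {name: code for code, name in enumerate(cat.categories)}
print(code_for)

# The per-row codes are what the scatter plot receives as its color values
print(cat.codes.tolist())
```

Those integer codes are what `c=tmp['continent'].cat.codes` hands to the colormap, so each continent ends up with its own color.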

Next, download and install ImageMagick, then make the following gif by typing this in your (Windows 10) command prompt, from the folder that holds the PNG files:

magick convert Gapminder*.png animated_gapminder.gif

If you have any issues configuring ImageMagick, like I did, you may find this link useful.

[Image: animated_gapminder.gif]

Note: You can make a gif using Matplotlib or moviepy, but I couldn’t quite figure it out. I will update once I do.
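
For anyone who wants to try the Matplotlib route, here is a rough sketch of how it could look, using `FuncAnimation` with the Pillow-based GIF writer. This is not the method used for the gif above. The data is random placeholder points, and the names `frames_by_year`, `update`, and `animated_sketch.gif` are my own inventions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter

# Placeholder data: 5 "years" of 20 random (x, y) points each (not Gapminder)
rng = np.random.default_rng(0)
years = [2000, 2001, 2002, 2003, 2004]
frames_by_year = {y: rng.uniform(0, 1, size=(20, 2)) for y in years}

fig, ax = plt.subplots(figsize=(4, 3), dpi=96)
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
first = frames_by_year[years[0]]
scat = ax.scatter(first[:, 0], first[:, 1])

def update(year):
    # Move the existing markers instead of building a new figure per frame
    scat.set_offsets(frames_by_year[year])
    ax.set_title("Year: " + str(year))
    return (scat,)

# One animation frame per year, written straight to a GIF at 2 frames per second
anim = FuncAnimation(fig, update, frames=years)
out_file = "animated_sketch.gif"
anim.save(out_file, writer=PillowWriter(fps=2))
plt.close(fig)
```

The appeal of this approach is that there are no intermediate PNG files and no external tool. Adapting it to the bubble chart would presumably also mean updating the marker sizes and colors in `update`, which I have not tried.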

US Fertility Trends by Month and Year

I came across this beautiful heat map of live births by month and country/region, and I decided to recreate it for the US, but by year instead of by country. The figure below shows the frequency of births from 1972 to 2014, with darker boxes indicating more births. The data is from the UN. (PS: Tableau and WordPress are annoying if you don’t want to pay for extras. I can’t embed my table without the play sign in front of it. AND it has a broken link *sigh*. So, please see the Twitter link ^_^).

I was somewhat surprised by how consistent the monthly timing of births has stayed over the past 40 years. I expected more variability starting around the 1980s, as marriage rates declined and nonmarital fertility increased. However, the most common birth months in the US have consistently remained July through October, suggesting that most babies are conceived in late fall and early winter. Looks like holidays are good for baby-making.
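
The figure itself was made in Tableau, but a similar month-by-year heat map can be sketched in Python. The version below is only an illustration: the counts are random placeholders rather than the UN data, it covers four made-up years instead of 1972 to 2014, and the column names `year`, `month`, and `births` are my assumptions about how such a table might be laid out.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder long-format table: one row per (year, month), random counts
rng = np.random.default_rng(0)
years = range(1972, 1976)
months = range(1, 13)
births = pd.DataFrame(
    [(y, m, int(rng.integers(250_000, 350_000))) for y in years for m in months],
    columns=["year", "month", "births"],
)

# Reshape to a grid with months as rows and years as columns
grid = births.pivot(index="month", columns="year", values="births")

# Draw the grid; with the "Blues" colormap, darker cells mean more births
fig, ax = plt.subplots(figsize=(6, 4))
im = ax.imshow(grid.to_numpy(), cmap="Blues", aspect="auto")
ax.set_xticks(range(len(grid.columns)))
ax.set_xticklabels(grid.columns)
ax.set_yticks(range(len(grid.index)))
ax.set_yticklabels(grid.index)
ax.set_xlabel("Year")
ax.set_ylabel("Month")
fig.colorbar(im, ax=ax, label="Live births")
fig.savefig("births_heatmap_sketch.png", dpi=96)
plt.close(fig)
```

With real data, the pivot step is the part that matters: once births are arranged as a month-by-year grid, any heat map function can draw it.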

An effort to blog

I was inspired to begin blogging, despite my strong reservations about putting too much information about myself on the internet, because I was humbled by the number of people who put in the time and effort to create and share valuable resources with others. Most importantly, I realized that I was benefiting from the generosity of the sharing community while neglecting to contribute in return. Therefore, in the spirit of helping others, I will begin sharing what I find, in addition to the resources I’ve developed over the years.

While I sort through my old programming files and clean them up in preparation for sharing, here is a list of blogs and resources that inspired this post:

Data Analytics/Data Science

Demography & Sociology & Economics

More to come!