Article & Media Roundup 7/8/18-7/15/18

^image: changing patterns in interracial/interethnic marriage by the U.S. Census Bureau. See link below.


Areas at highest risk of flooding in the U.S.

Research linking air pollution to diabetes


The extinction of the middle child

Free eBook! — Visualizing Mortality Dynamics in the Lexis Diagram

A recent study on the water crisis in Flint, Michigan, showed that fertility rates dropped by 12% and fetal deaths rose by 58% after lead contamination spiked in the city’s drinking water.

U.S. Urban Rural Divide in Marriage

“As our nation becomes more racially and ethnically diverse, so are married couples.”

A peek into Japanese dating apps

Interactive tool that shows how much you make depending on where you live

8% gap in home ownership between Millennials and Gen X, which is unlikely to close. Student debt is part of the issue.

Visual breakdown of US gayborhoods

New CDC report on US Fertility

New York cemeteries are running out of space, and the cost of burials is rising as a result

The number of families with small children who rent now outnumbers the number of families with small children who own their homes

Tech, Science, & Privacy

Consumer behaviors can tell researchers a lot about a person

The hyper-integration of Chinese businesses, tech, and government, and what it means for privacy


Teachers rated as more attractive tend to be rated higher on other metrics like quality, clarity and helpfulness.

Race & Inequality

Negative spillover effects for the mental health of Black Americans after police shootings of unarmed Black victims


2018 World Population Day!

Happy World Population Day!

In case you’re unfamiliar with World Population Day, it was established in 1989 by the Governing Council of the United Nations Development Programme. July 11th was chosen because it marks the approximate date in 1987 on which the world’s population reached 5 billion people. The purpose of World Population Day is to draw attention to issues related to the global population, including the implications of population growth for the environment, economic development, gender equality, education, poverty, and human rights. The last of these is this year’s theme: specifically, family planning as a human right. This year marks the 50th anniversary of the 1968 International Conference on Human Rights, where family planning was globally affirmed to be a human right for the first time.

This year, the world population is estimated to be around 7,632,819,325 people. If you want to see estimates of the global population in real time, you can visit the Worldometers website, which will also show other interesting estimates of population-related statistics, such as healthcare expenditures, energy consumption, and water use. If you’re interested in where they get their data and their methods, you can visit their FAQ section.

World Population #Rstats Edition

In celebration of World Population Day, I thought I would share an R program that pulls data from the Worldometers site:


and creates a world map that highlights the top 10 countries with the largest total populations:

The top 10 countries with the largest total populations are highlighted in dark green.

R code below:

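#Load libraries
library(rvest)     #read_html(), html_nodes(), html_table()
library(magrittr)  #extract2()
library(dplyr)     #recode(), left_join(), mutate()
library(ggplot2)   #ggplot(), map_data() (requires the maps package)
library(viridis)   #scale_fill_viridis()

#Retrieve data: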
html.global_pop <- read_html("")

#Create dataframe
df.global_pop_RAW <- html.global_pop %>%
  html_nodes("table") %>%
  extract2(1) %>%
  html_table()

#Check data

#Check for unnecessary spaces in values

#Check if country names match those in the map package
as.factor(df.global_pop_RAW$`Country (or dependency)`) %>% levels()

#Renaming countries to match how they are named in the package
df.global_pop_RAW$`Country (or dependency)` <- recode(df.global_pop_RAW$`Country (or dependency)`
                                   ,'U.S.' = 'USA'
                                   ,'U.K.' = 'UK')

#Convert population to numeric--you have to remove the "," before converting 
df.global_pop_RAW$`Population (2018)`<-as.numeric(as.vector(unlist(gsub(",", "",df.global_pop_RAW$`Population (2018)` ))))
sapply(df.global_pop_RAW,class) #Check that it worked

#Generate a world map
world_map<- map_data('world')

#Join map data with our data
map.world_joined <- left_join(world_map, df.global_pop_RAW, 
                              by = c('region' = 'Country (or dependency)'))

#Take only top 10 countries
df.global_pop10 <- df.global_pop_RAW %>%
  arrange(desc(`Population (2018)`)) %>%
  head(10)

#Printing to check

#Change data to numeric
df.global_pop10$`Population (2018)`<-as.numeric(as.vector(unlist(gsub(",", "",df.global_pop10$`Population (2018)` ))))

#Check if worked correctly

#Join map data to our data
map.world_joined2 <- left_join(world_map, df.global_pop10, 
                              by = c('region' = 'Country (or dependency)'))

#Create Flag to indicate that it will be colored in for the map
map.world_joined2 <- map.world_joined2 %>%
  mutate(tofill2 = ifelse(is.na(`#`), F, T))

#Now generate the map
ggplot() +
  geom_polygon(data = map.world_joined2, 
               aes(x = long, y = lat, group = group, fill = tofill2)) +
  scale_fill_manual(values = c("lightcyan2","darkturquoise")) +
  labs(title =  'Top 10 Countries with Largest populations (2018)'
       ,caption = "source:") +
  theme_minimal() +
  theme(text = element_text(family = "Gill Sans")
        ,plot.title = element_text(size = 16)
        ,plot.caption = element_text(size = 5)
        ,axis.text = element_blank()
        ,axis.title = element_blank()
        ,axis.ticks = element_blank()
        ,legend.position = "none")
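
For readers who work in Python, the cleaning steps above (stripping commas, renaming countries to match the map data, and flagging the largest populations) can be sketched with the standard library alone. The rows below are a hypothetical stand-in for the scraped table, with rounded figures used purely for illustration, and the function names are made up for this sketch.

```python
# Hypothetical stand-in for the scraped table: (country, population string).
# Figures are rounded for illustration, not actual Worldometers values.
raw_rows = [
    ("China", "1,415,000,000"),
    ("India", "1,354,000,000"),
    ("U.S.", "327,000,000"),
    ("U.K.", "67,000,000"),
]

# Same idea as the recode() call: match the names used by the map data.
RENAME = {"U.S.": "USA", "U.K.": "UK"}


def clean_rows(rows):
    """Rename countries and convert '1,234'-style strings to integers."""
    cleaned = []
    for country, population in rows:
        name = RENAME.get(country, country)
        cleaned.append((name, int(population.replace(",", ""))))
    return cleaned


def top_n_flags(rows, n):
    """Map each country to True if it is among the n largest populations."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    top = {name for name, _ in ranked[:n]}
    return {name: name in top for name, _ in rows}


cleaned = clean_rows(raw_rows)
flags = top_n_flags(cleaned, n=3)
print(flags)
```

The flag dictionary plays the same role as the tofill2 column: it only decides which countries get the highlight color.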

Alternatively, you could include all the countries and use a gradient to indicate population size. However, China’s and India’s populations are so large relative to other countries that it becomes difficult to see any real comparison.

#Generate map data (again)
world_map<- map_data('world')

#re-join with data
map.world_joined <- left_join(world_map, df.global_pop_RAW, 
                              by = c('region' = 'Country (or dependency)'))

#flag to fill ALL countries that match with the map package
map.world_joined <- map.world_joined %>%
  mutate(tofill = ifelse(is.na(`#`), F, T))

#Check that it worked correctly

#Then generate new map
ggplot(data = map.world_joined, aes(x = long, y = lat, group = group), color="white", size=.001) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = `Population (2018)`)) +
  scale_fill_viridis(option = 'magma') +
  labs(title =  'Top 10 Countries with Largest populations'
       ,caption = "source:") +
  theme_minimal() +
  theme(text = element_text(family = "Gill Sans")
        ,plot.title = element_text(size = 18)
        ,plot.caption = element_text(size = 5)
        ,axis.text = element_blank()
        ,axis.title = element_blank()
        ,axis.ticks = element_blank())

Which should produce this map:
You can see that the other countries in the top 10 are not black (the color that marks the smallest populations), but this map really just highlights how large China’s and India’s populations are relative to those of other countries.
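
One common way to soften this effect is to color by the logarithm of population rather than the raw count. The sketch below is only an illustration in Python: the population figures (in millions) are rounded assumptions rather than the scraped values, and the rescale helper is a made-up name. It compares where the USA lands on a 0-to-1 color scale with and without the log transform.

```python
import math

# Rounded, illustrative populations in millions (assumed, not scraped).
populations = {
    "China": 1415,
    "India": 1354,
    "USA": 327,
    "Indonesia": 267,
    "Brazil": 211,
    "Iceland": 0.34,
}


def rescale(values):
    """Min-max rescale a dict of numbers onto the 0-1 range."""
    lo, hi = min(values.values()), max(values.values())
    return {key: (value - lo) / (hi - lo) for key, value in values.items()}


linear = rescale(populations)
logged = rescale({key: math.log10(value) for key, value in populations.items()})

# On the raw scale the USA sits near the bottom of the gradient;
# on the log scale it sits much closer to China and India.
print(round(linear["USA"], 2), round(logged["USA"], 2))
```

With a log scale the mid-sized countries spread out across the gradient instead of collapsing into the darkest colors.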

More population data and viz

If you want to know more about the global population and how it has changed over time, here are some great resources:

Our World in Data— see also their estimates for future population growth

8 min PBS video

Hans Rosling Tedx video (10 min)

20 min Hans Rosling video –this uses the gapminder data I often code with in Python and in R

Kurzgesagt animated video (6.5 min)

If you’re interested in theories and analytical concepts of demography, here are some links to free online class material:

Johns Hopkins Demographic Methods –or here

Johns Hopkins Principles of Population Change



Around the Web 7/1/18-7/8/18

^image: Regional water use from Reddit’s Data Is Beautiful


A visual explanation of the Census

Unmarried women over 55 are one of the largest, and fastest-growing, demographics of home buyers

Economic barriers are among the top reasons for postponing motherhood or having fewer babies

Resources & Environment

Informative 9 min. video on Plastic Pollution

We are running out of sand! –7.5min video

Global climate change may accelerate gentrification.

As Americans Age, Their Support for Environmentalism Declines

Science, Privacy, & Tech

The rise of surveillance capitalism

Silicon Valley’s Exclusive Salary Database

Technology and language

A project to encourage researchers to state that they’ve lost confidence in their previous study

Health & Well-being

Research indicates that expanding access to Medicare has overall been beneficial

Empirical research on the effects of family separation on children’s well-being

Diabetes linked to air pollution


“There are now more women in law school than ever before, but men still lead the pack when it comes to private practice, making up about two-thirds of attorneys in this sector of the legal profession with gender-discrimination suits being filed by women at breakneck speeds.”


For the first time in the 18 years that Gallup has asked the question, a majority of Americans did not say they were “extremely proud” to be American.

“America is a nation of immigrants, and its economy is propelled and activated by its openness to immigration and the new ideas and entrepreneurial energy that immigrants provide.”

Data Science

Python v. (and) R for Data Science


Publishing Opportunities for Demographers

Applied Demography News Letter

Population Research and Policy Review is now accepting research briefs

Plotly 3.0.0 in Jupyter Notebook

Plotly 3.0.0 was recently released, and I finally got a chance to tinker with it! This is exciting because this release includes features that are specifically designed for Jupyter Notebooks. Namely, JavaScript is embedded directly in the figure, which you can now access through your notebook. Exciting!

If you haven’t installed plotly or need to upgrade, open your Anaconda command prompt (as Administrator) and follow these directions. After you install plotly, launch Jupyter Notebook (by typing “Jupyter Notebook” into your Anaconda command prompt or by opening Jupyter Notebook using your computer menu). Next, enter your plotly username and api key in your notebook. You can sign up for plotly here. Directions for generating an api key here.

#first import plotly and provide username and api key
import plotly
plotly.tools.set_credentials_file(username='UserName', api_key='XXXXX')

Now load the following:

import plotly.plotly as py
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import numpy as np
import pandas as pd

init_notebook_mode(connected=True) #tells the notebook to load figures in offline mode

Plotly should now work within your notebook.

Here’s an example of a 2D plot:

x, y = np.random.randn(2000), np.random.randn(2000)
iplot([{'x': x, 'y': y, 'type': 'histogram2dcontour'}], show_link=False)

Example of a 2D plot with markers:

x = np.random.randn(2000)
y = np.random.randn(2000)
iplot([go.Histogram2dContour(x=x, y=y, contours=dict(coloring='heatmap')),
       go.Scatter(x=x, y=y, mode='markers', marker=dict(color='white', size=3, opacity=0.3))], show_link=False)

Example of a 3D plot:

s = np.linspace(0, 2 * np.pi, 240)
t = np.linspace(0, np.pi, 240)
tGrid, sGrid = np.meshgrid(s, t)

r = 2 + np.sin(7 * sGrid + 5 * tGrid)  # r = 2 + sin(7s+5t)
x = r * np.cos(sGrid) * np.sin(tGrid)  # x = r*cos(s)*sin(t)
y = r * np.sin(sGrid) * np.sin(tGrid)  # y = r*sin(s)*sin(t)
z = r * np.cos(tGrid)                  # z = r*cos(t)

surface = go.Surface(x=x, y=y, z=z)
data = [surface]

layout = go.Layout(
    title='Parametric Plot',
    scene=dict(
        xaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            backgroundcolor='rgb(230, 230,230)'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='jupyter-parametric_plot')
Interact with it here
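
To make the parametric equations in the code comments a little more concrete, here is a small standard-library sketch that evaluates them for a single (s, t) pair instead of a whole grid. The function name surface_point is made up for this illustration; the formulas are the ones written in the comments above.

```python
import math


def surface_point(s, t):
    """Evaluate the parametric surface at one (s, t) pair."""
    r = 2 + math.sin(7 * s + 5 * t)      # r = 2 + sin(7s+5t)
    x = r * math.cos(s) * math.sin(t)    # x = r*cos(s)*sin(t)
    y = r * math.sin(s) * math.sin(t)    # y = r*sin(s)*sin(t)
    z = r * math.cos(t)                  # z = r*cos(t)
    return x, y, z


# At s = 0, t = 0 the sine term vanishes, so the point sits on the z axis.
pole = surface_point(0.0, 0.0)

# At s = 0, t = pi/2 the point lies in the z = 0 plane.
equator = surface_point(0.0, math.pi / 2)
print(pole, equator)
```

The NumPy version does the same arithmetic, just for every combination of s and t at once.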

Lastly, an animated plot:

from plotly.offline import init_notebook_mode, iplot
from IPython.display import display, HTML


url = ''
dataset = pd.read_csv(url)

years = ['1952', '1962', '1967', '1972', '1977', '1982', '1987', '1992', '1997', '2002', '2007']

# make list of continents
continents = []
for continent in dataset['continent']:
    if continent not in continents:
        continents.append(continent)

# make figure
figure = {
    'data': [],
    'layout': {},
    'frames': []
}

# fill in most of layout
figure['layout']['xaxis'] = {'range': [30, 85], 'title': 'Life Expectancy'}
figure['layout']['yaxis'] = {'title': 'GDP per Capita', 'type': 'log'}
figure['layout']['hovermode'] = 'closest'
figure['layout']['sliders'] = {
    'args': [
        'transition', {
            'duration': 400,
            'easing': 'cubic-in-out'
        }
    ],
    'initialValue': '1952',
    'plotlycommand': 'animate',
    'values': years,
    'visible': True
}
figure['layout']['updatemenus'] = [
    {
        'buttons': [
            {
                'args': [None, {'frame': {'duration': 500, 'redraw': False},
                         'fromcurrent': True, 'transition': {'duration': 300, 'easing': 'quadratic-in-out'}}],
                'label': 'Play',
                'method': 'animate'
            },
            {
                'args': [[None], {'frame': {'duration': 0, 'redraw': False}, 'mode': 'immediate',
                'transition': {'duration': 0}}],
                'label': 'Pause',
                'method': 'animate'
            }
        ],
        'direction': 'left',
        'pad': {'r': 10, 't': 87},
        'showactive': False,
        'type': 'buttons',
        'x': 0.1,
        'xanchor': 'right',
        'y': 0,
        'yanchor': 'top'
    }
]

#custom colors
custom_colors = {
    'Asia': 'rgb(171, 99, 250)',
    'Europe': 'rgb(230, 99, 250)',
    'Africa': 'rgb(99, 110, 250)',
    'Americas': 'rgb(25, 211, 243)',
    'Oceania': 'rgb(50, 170, 255)'
}

sliders_dict = {
    'active': 0,
    'yanchor': 'top',
    'xanchor': 'left',
    'currentvalue': {
        'font': {'size': 20},
        'prefix': 'Year:',
        'visible': True,
        'xanchor': 'right'
    },
    'transition': {'duration': 300, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9,
    'x': 0.1,
    'y': 0,
    'steps': []
}

# make data (the traces shown before the animation starts)
year = 1952
for continent in continents:
    dataset_by_year = dataset[dataset['year'] == year]
    dataset_by_year_and_cont = dataset_by_year[dataset_by_year['continent'] == continent]

    data_dict = {
        'x': list(dataset_by_year_and_cont['lifeExp']),
        'y': list(dataset_by_year_and_cont['gdpPercap']),
        'mode': 'markers',
        'text': list(dataset_by_year_and_cont['country']),
        'marker': {
            'sizemode': 'area',
            'sizeref': 200000,
            'size': list(dataset_by_year_and_cont['pop'])
        },
        'name': continent
    }
    figure['data'].append(data_dict)

# make frames (one frame per year, one trace per continent)
for year in years:
    frame = {'data': [], 'name': str(year)}
    for continent in continents:
        dataset_by_year = dataset[dataset['year'] == int(year)]
        dataset_by_year_and_cont = dataset_by_year[dataset_by_year['continent'] == continent]

        data_dict = {
            'x': list(dataset_by_year_and_cont['lifeExp']),
            'y': list(dataset_by_year_and_cont['gdpPercap']),
            'mode': 'markers',
            'text': list(dataset_by_year_and_cont['country']),
            'marker': {
                'sizemode': 'area',
                'sizeref': 200000,
                'size': list(dataset_by_year_and_cont['pop'])
            },
            'name': continent
        }
        frame['data'].append(data_dict)

    figure['frames'].append(frame)
    slider_step = {'args': [
        [year],
        {'frame': {'duration': 300, 'redraw': False},
         'mode': 'immediate',
       'transition': {'duration': 300}}
     ],
     'label': year,
     'method': 'animate'}
    sliders_dict['steps'].append(slider_step)

figure['layout']['sliders'] = [sliders_dict]

iplot(figure)

Interact with it here

Neat, right?!
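
If the structure of the animated figure feels opaque, it may help to see it built from a tiny dataset without pandas or plotly. The sketch below uses hypothetical toy rows and made-up helper names (make_trace, build_frames); it only mirrors the shape of the dictionaries above, with one frame per year, one trace per continent, and one slider step per frame.

```python
# Hypothetical toy rows: (year, continent, country, lifeExp, gdpPercap, pop).
rows = [
    (1952, "Asia", "A-land", 40.0, 800.0, 1_000_000),
    (1952, "Europe", "B-land", 65.0, 5000.0, 2_000_000),
    (1957, "Asia", "A-land", 42.0, 900.0, 1_100_000),
    (1957, "Europe", "B-land", 66.0, 5500.0, 2_100_000),
]


def make_trace(rows, year, continent):
    """Build one scatter trace for a single year and continent."""
    subset = [r for r in rows if r[0] == year and r[1] == continent]
    return {
        "x": [r[3] for r in subset],
        "y": [r[4] for r in subset],
        "mode": "markers",
        "text": [r[2] for r in subset],
        "marker": {"sizemode": "area", "sizeref": 200000,
                   "size": [r[5] for r in subset]},
        "name": continent,
    }


def build_frames(rows, years, continents):
    """Return (frames, slider steps): one of each per year."""
    frames, steps = [], []
    for year in years:
        frames.append({"name": str(year),
                       "data": [make_trace(rows, year, c) for c in continents]})
        # Each slider step points at the frame with the matching name.
        steps.append({
            "args": [[str(year)], {"frame": {"duration": 300, "redraw": False},
                                   "mode": "immediate",
                                   "transition": {"duration": 300}}],
            "label": str(year),
            "method": "animate",
        })
    return frames, steps


years = sorted({r[0] for r in rows})
continents = sorted({r[1] for r in rows})
frames, steps = build_frames(rows, years, continents)
print(len(frames), len(steps))
```

The key link is the frame name: a slider step animates to whichever frame shares the name listed in its args.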


Overall, everything ran smoothly except the last plot. I actually initially tried to make this one:

From: (scroll to the bottom)

but I kept getting an error:


Update: Jon commented and pointed out that I was using an older version of plotly (3.0.0rc10) instead of 3.0.0rc11. You can check which version you have by typing the following:

import plotly
plotly.__version__

After I updated plotly, I successfully made the last graph!
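
Release-candidate version strings are easy to misread, and comparing them as plain text can mislead (for example, "3.0.0rc9" sorts after "3.0.0rc10" alphabetically). Here is a minimal sketch, assuming versions shaped like "3.0.0rc10" or "3.0.0", that turns the string reported by plotly.__version__ into something comparable. The version_key helper is a made-up name, not part of plotly.

```python
import re

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?")


def version_key(version):
    """Return a sortable tuple for strings like '3.0.0rc10' or '3.0.0'."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, rc = match.groups()
    # A final release (no rc suffix) sorts after every release candidate.
    rc_rank = float("inf") if rc is None else int(rc)
    return (int(major), int(minor), int(patch), rc_rank)


installed = "3.0.0rc10"  # e.g. the value of plotly.__version__
required = "3.0.0rc11"
needs_upgrade = version_key(installed) < version_key(required)
print(needs_upgrade)
```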

Interact with it here

Special thanks to Jon! I sincerely appreciate your help!

Around the web 6/25/18-7/1/18

^image: Priceonomics analyzed the gender gap in earnings at America’s universities. The figure shows the disparity between male and female graduate students.


Millennials are happier in urban areas.

Sharp decline in migration to Europe.

“If there’s a fixed biological limit [for human age], we are not close to it”


The Gender Wage Gap at America’s Top Colleges

An Australian study showed that women leaned in just as often as men, but they were still less likely to get a raise.


The Remaking of Class in America


Democrats and Republicans don’t know each other that well

In 2018, 67 percent of Republicans and 20 percent of Democrats said the U.S. set a good moral example, compared to 2015, when 49 percent of Democrats and 44 percent of Republicans said the U.S. was a good moral example.

Where the trade war hits the hardest

Disease & Health

The CDC collects less information about ticks and their activities than about mosquitoes, but the burden of tick-borne disease in the U.S. is much higher than that of diseases carried by mosquitoes.

The incidence of shingles is increasing across all age groups

Families and Education

How transparent is school data when parents can’t find it or understand it?

While the share of children living in households with food insecurity has continued to fall in the national numbers, only four states are doing statistically significantly better than pre-recession levels (2005-07 to 2014-16).


Tips for Conversational Writing


In my two previous posts, I’ve been sharing some tidbits that I learned at the PRB Policy Communication Workshop. In my first post, I aimed to motivate you to think about the broader impacts of research, especially considering the unique role researchers play within the process of policy formation or change. In my second post, I discussed three different outlets–aside from academic journals–where researchers can share their findings with the public. This week, in my third and final post about policy communication, I will share some tips that I learned about conversational writing. Special thanks to Craig Storti for his enlightening presentation about some bad habits that I picked up in grad school!

Disclaimer: This blog post contains several cat puns. This may result in audible groaning and face-palming. Reader discretion is advised.


Academic Jargon and Dense Prose

It may seem obvious that we should avoid academic jargon when writing for non-technical audiences. As I said previously, abstract concepts such as macro- and micro-level processes or statistical methods are not well understood outside a specific discipline. We are also often told that we should stop using words such as ‘utilize’ when we could easily substitute ‘use.’ But even if we are acutely aware of these bad habits, here are two other occupational hazards that I did not consider before the workshop: 1) Nominalization and 2) Noun Compounds:

Nominalization is when we transform a verb into a noun. For example, nominalization itself is a noun that was derived from a verb–i.e., ‘nominalize.’ Another example is the word ‘investigation’, which is from ‘investigate.’ Sentences that contain nominalized verbs can be weaker and less concise than sentences that use the actual verb.

A Noun Compound is when we use a consecutive string of two or more nouns in a sentence. For example, ‘Policy Communication Workshop Fellowship’ or ‘national community health operations research technical working group.’ Excessive use of noun compounds can result in dense writing that is difficult to understand.


To demonstrate how easy it can be to both nominalize our verbs and string several nouns together, I wrote a hypothetical introduction to the cat meme inequality study<–noun compound!–that I used as an example in my previous post. Nominalizations are underlined; noun compounds are in red (excluding the phrase ‘cat meme’ alone); and jargon is in blue. Puns are italicized 🙂 :

Differences in purr household consumption of cat memes have been dramatically increasing over the past half-century, and research suggests that this growing disparity is due to incongrooment access to cat memes. Informed by this body of research, my study utilized data from the Cat Meme Survey of Households and Families and found that legislative pawlicies have, in part, catapulted these cat meme inequality access issues. Right meow, cat meme pawlicies are littered with supurrrfluous loopholes fur the rich and privileged. However, my research indicates that these catastrophic inequalities in cat meme access can be mitigated if pawlicymakers consider the implementation of laws or clawses that focus on the inadequacy of cat meme access fur more disadvantaged households through the creation of cat meme inclusion zones, which would allow fur the dissemination of more provisions fur those who are in need.

Tips for avoiding dense prose

The simplest way to avoid nominalizations is by restoring the verb. For instance, the first sentence of my example could be changed to “Rich households consume more cat memes than poor households…” Alternatively, the sentence could be changed to “Households are consuming cat memes at a different rate…” The latter example uses the present participle (the -ing form) of the verb.

The benefit of correcting nominalizations is that you will likely break up noun compounds, like I did in my first example:

Original: Differences in purr household consumption of cat memes have been dramatically increasing over the past half-century…

Corrected: Rich households consume more cat memes than poor households, which is a trend that has been increasing over the past half-century.

Another way to fix noun compounds is by including a preposition such as ‘of’, ‘in’, ‘to’, and ‘for’:

Original: However, my research indicates that these catastrophic inequalities in cat meme access can be mitigated if pawlicymakers consider the implementation of laws or clawses that focus on the inadequacy of cat meme access fur more disadvantaged households through the creation of cat meme inclusion zones, which would allow fur the dissemination of more provisions fur those who are in need.

Corrected: My research indicates that access to cat memes across households is inadequate. Policymakers should consider implementing laws that help more disadvantaged households gain access to cat memes. For example, they could create incentives that encourage builders and investors to provide more households with equal access to cat memes, or restrict builders and investors from accessing permits unless they agree to these terms, an approach often referred to as “inclusionary zoning.”

It gets better with practice

I was surprised by how difficult it was to correct nominalizations and (especially) noun compounds at the workshop. I found that some of my resistance to removing noun compounds is that it can result in longer sentences. But unless I am writing for an academic journal, the value of writing more concisely is lost when my audience does not understand what I am writing about. It’s a skill that I will have to continue to practice and be more thoughtful about in the future. I encourage you to do the same!


From the Net 6/17/18-6/24/18

^image from the Joint Center for Housing Studies at Harvard University. See link below.

Housing Affordability & Inequality

A new report from the Joint Center for Housing Studies at Harvard University compiled hundreds of metrics on the health of America’s housing sector and finds that, despite some short-term progress since the recession, the long-term prognosis is grim.

While the national prison admissions rate has decreased by 24 percent in the last 12 years, that number has been driven largely by 10 states: California alone accounts for 37 percent of the decline. In contrast, Florida’s prison rate is 13 percent higher than it was in 2000.

In New York, how much bail you owe when you’re arrested — and whether you owe it at all — can depend on who hears your case the day you’re arraigned.

The Brookings Institution’s assessment of the US housing market: finding the Goldilocks metros


According to the Bureau of Labor Statistics, for the average worker in “production and nonsupervisory” positions, paychecks have declined in the past year after accounting for inflation.


Why people choose to stay in areas vulnerable to natural disasters


A new report from the U.N. High Commissioner for Refugees found that 68.5 million people were forcibly displaced in 2017, setting a new record for the fifth straight year.

How cities can plan for a decreasing population

In 26 states in the U.S., non-Hispanic white people are dying faster than they are being born. Two years ago, that number was only 17.

NY Times data viz on immigration (and attitudes about it) around the world


“There are so few positive variations on what a “real man” can look like, that when the youngest generations show signs of reshaping masculinity, the only word that exists for them is nonconforming.”


Best Mario Kart character, according to data science