Happy World Population Day!

In case you’re unfamiliar with World Population Day, it was established in 1989 by the Governing Council of the United Nations Development Programme. July 11th was chosen because it was the approximate date in 1987 on which the world’s population reached 5 billion people. The purpose of World Population Day is to draw attention to issues related to the global population, including the implications of population growth for the environment, economic development, gender equality, education, poverty, and human rights. Human rights is the theme this year, specifically family planning as a human right: this year marks the 50th anniversary of the 1968 International Conference on Human Rights, where family planning was affirmed globally as a human right for the first time.

This year, the world population is estimated to be around 7,632,819,325 people. If you want to see estimates of the global population in real time, you can visit the Worldometers website, which will also show other interesting estimates of population-related statistics, such as healthcare expenditures, energy consumption, and water use. If you’re interested in where they get their data and their methods, you can visit their FAQ section.

World Population #Rstats Edition

In celebration of World Population Day, I thought I would share an R program that pulls data from the Worldometers site:

[Image: worldometer-countries]

and creates a world map that highlights the top 10 countries with the largest total populations:

[Figure: WorldPop]
The top 10 countries with the largest total populations are highlighted in dark green.

R code below:

#Load libraries
library(tidyverse)
library(rvest)
library(magrittr)
library(ggmap)
library(stringr)
library(viridis)
library(scales)
#Retrieve data:
html.global_pop <- read_html("http://www.worldometers.info/world-population/population-by-country/")

#Create dataframe
df.global_pop_RAW <- html.global_pop %>%
  html_nodes("table") %>%
  extract2(1) %>%
  html_table()

#Check data
head(df.global_pop_RAW) 

#Check for unnecessary spaces in values
glimpse(df.global_pop_RAW)

#Check if country names match those in the map package
as.factor(df.global_pop_RAW$`Country (or dependency)`) %>% levels()

#Renaming countries to match how they are named in the package
df.global_pop_RAW$`Country (or dependency)` <- recode(df.global_pop_RAW$`Country (or dependency)`
                                   ,'U.S.' = 'USA'
                                   ,'U.K.' = 'UK')

#Convert population to numeric (the "," separators must be removed first)
df.global_pop_RAW$`Population (2018)` <- as.numeric(gsub(",", "", df.global_pop_RAW$`Population (2018)`))
sapply(df.global_pop_RAW, class) #Check that it worked

#Generate a world map (map_data() needs the maps package to be installed)
world_map<- map_data('world')

#Join map data with our data
map.world_joined <- left_join(world_map, df.global_pop_RAW, 
                              by = c('region' = 'Country (or dependency)'))

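#Take only top 10 countries by population
#(without wt, top_n() ranks by the last column of the table, not by population)
df.global_pop10 <- df.global_pop_RAW %>%
  top_n(10, wt = `Population (2018)`)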

#Printing to check
df.global_pop10 

#Change data to numeric (already numeric if converted above; repeating it is harmless)
df.global_pop10$`Population (2018)` <- as.numeric(gsub(",", "", df.global_pop10$`Population (2018)`))

#Check if worked correctly
sapply(df.global_pop10,class)

#Join map data to our data
map.world_joined2 <- left_join(world_map, df.global_pop10, 
                              by = c('region' = 'Country (or dependency)'))

#Create flag to indicate which countries will be colored in on the map
map.world_joined2 <- map.world_joined2 %>%
  mutate(tofill2 = !is.na(`#`))


#Now generate the map
ggplot() +
  geom_polygon(data = map.world_joined2, 
               aes(x = long, y = lat, group = group, fill = tofill2)) +
  scale_fill_manual(values = c("lightcyan2","darkturquoise")) +
  labs(title =  'Top 10 Countries with Largest Populations (2018)'
       ,caption = "source: http://www.worldometers.info/world-population/population-by-country/") +
  theme_minimal() +
  theme(text = element_text(family = "Gill Sans")
        ,plot.title = element_text(size = 16)
        ,plot.caption = element_text(size = 5)
        ,axis.text = element_blank()
        ,axis.title = element_blank()
        ,axis.ticks = element_blank()
        ,legend.position = "none"
  )
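
Before joining, it can help to list which country names in the table have no match in the map data, since unmatched names silently drop out of the map. Here is a rough sketch of that check, not part of the program above. The two small tables are hypothetical stand-ins for the scraped data and for the region names in map_data('world'):

```r
library(dplyr)

# Hypothetical stand-in for the scraped table (names only)
df.toy <- tibble::tibble(
  `Country (or dependency)` = c("China", "U.S.", "U.K.", "India")
)

# Hypothetical stand-in for the unique region names in map_data('world')
map_regions <- tibble::tibble(region = c("China", "USA", "UK", "India"))

# Keep the rows of the table that have no matching region in the map data
unmatched <- anti_join(df.toy, map_regions,
                       by = c("Country (or dependency)" = "region"))

# Names that would need recoding before the join
unmatched$`Country (or dependency)`
```

With the real data, you would swap in df.global_pop_RAW and distinct(world_map, region), then add any names that come back to the recode() step.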

Alternatively, you could include all the countries and use a gradient to indicate population size. However, China’s and India’s populations are so large relative to those of other countries that it becomes difficult to see any real comparison.

#Generate map data (again)
world_map<- map_data('world')

#re-join with data
map.world_joined <- left_join(world_map, df.global_pop_RAW, 
                              by = c('region' = 'Country (or dependency)'))

#Flag to fill ALL countries that match with the map package
map.world_joined <- map.world_joined %>%
  mutate(tofill = !is.na(`#`))

#Check that it worked correctly
head(map.world_joined,12)

#Then generate new map
#(color and size are ignored when passed to ggplot(), so they are set on the polygon layer)
ggplot(data = map.world_joined, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = `Population (2018)`), color = "white", size = .001) +
  scale_fill_viridis(option = 'magma') +
  labs(title =  'World Population by Country (2018)'
       ,caption = "source: http://www.worldometers.info/world-population/population-by-country/") +
  theme_minimal() +
  theme(text = element_text(family = "Gill Sans")
        ,plot.title = element_text(size = 18)
        ,plot.caption = element_text(size = 5)
        ,axis.text = element_blank()
        ,axis.title = element_blank()
        ,axis.ticks = element_blank()
  )

Which should produce this map:
[Figure: PopMap2]
Black marks the smallest populations, and you can see that the other countries that made the top 10 list are not black. Still, this map mostly highlights how large China’s and India’s populations are relative to those of other countries.
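
One possible way to make the gradient easier to read is to color by the base-10 logarithm of population, so that each step in color corresponds to an order of magnitude. This is only a sketch of the idea and was not used for the map above. The data frame and its values are made up for illustration:

```r
library(ggplot2)
library(viridis)

# Made-up values for illustration only (not real population figures)
df.toy <- data.frame(
  x   = 1:4,
  pop = c(1e6, 1e7, 1e8, 1e9)
)

# Mapping fill to log10(pop) spaces the colors evenly across orders of magnitude
p.log <- ggplot(df.toy, aes(x = x, y = 1, fill = log10(pop))) +
  geom_tile() +
  scale_fill_viridis(option = "magma", name = "log10(population)")
```

Calling print(p.log) draws the toy plot. For the world map, the equivalent change would be fill = log10(`Population (2018)`) inside geom_polygon(), with the legend then reading in powers of ten.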

More population data and viz

If you want to know more about the global population and how it has changed over time, here are some great resources:

Our World in Data (see also their estimates for future population growth)

8 min PBS video

Hans Rosling TEDx video (10 min)

20 min Hans Rosling video (this uses the gapminder data I often code with in Python and in R)

Kurzgesagt animated video (6.5 min)

If you’re interested in theories and analytical concepts of demography, here are some links to free online class material:

Johns Hopkins Demographic Methods (or here)

Johns Hopkins Principles of Population Change

 

 
