Plotly 3.0.0 in Jupyter Notebook

Plotly.py 3.0.0 was recently released, and I finally got a chance to tinker with it! This is exciting because this release includes features that are specifically designed for Jupyter Notebooks. Namely, JavaScript is directly embedded in the figure that you can now access directly through your notebook. Exciting!

If you haven’t installed plotly or need to upgrade, open your Anaconda command prompt (as Administrator) and follow these directions. After you install plotly, launch Jupyter Notebook (by typing “Jupyter Notebook” into your Anaconda command prompt or by opening Jupyter Notebook using your computer menu). Next, enter your plotly username and api key in your notebook. You can sign up for plotly here. Directions for generating an api key here.

#first import plotly and provide username and api key
import plotly
plotly.tools.set_credentials_file(username='UserName', api_key='XXXXX')

Now load the following:

import plotly.plotly as py
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import numpy as np
import pandas as pd

init_notebook_mode(connected=True) #tells the notebook to load figures in offline mode

Plotly should now work within your notebook.

Here’s an example of a 2D plot:

x=np.random.randn(1000)
y=np.random.randn(1000)
go.FigureWidget(
    data=[
        {'x': x, 'y': y, 'type': 'histogram2dcontour'}
    ]
)

Example of a 2D plot with markers:

x = np.random.randn(2000)
y = np.random.randn(2000)
iplot([go.Histogram2dContour(x=x, y=y, contours=dict(coloring='heatmap')),
       go.Scatter(x=x, y=y, mode='markers', marker=dict(color='white', size=3, opacity=0.3))], show_link=False)

Example of a 3D plot:

s = np.linspace(0, 2 * np.pi, 240)
t = np.linspace(0, np.pi, 240)
tGrid, sGrid = np.meshgrid(s, t)

r = 2 + np.sin(7 * sGrid + 5 * tGrid)  # r = 2 + sin(7s+5t)
x = r * np.cos(sGrid) * np.sin(tGrid)  # x = r*cos(s)*sin(t)
y = r * np.sin(sGrid) * np.sin(tGrid)  # y = r*sin(s)*sin(t)
z = r * np.cos(tGrid)                  # z = r*cos(t)

surface = go.Surface(x=x, y=y, z=z)
data = [surface]

layout = go.Layout(
    title='Parametric Plot',
    scene=dict(
        xaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='jupyter-parametric_plot')

Lastly, an animated plot:

from plotly.offline import init_notebook_mode, iplot
from IPython.display import display, HTML


init_notebook_mode(connected=True)

url = 'https://raw.githubusercontent.com/plotly/datasets/master/gapminderDataFiveYear.csv'
dataset = pd.read_csv(url)

years = ['1952', '1962', '1967', '1972', '1977', '1982', '1987', '1992', '1997', '2002', '2007']


# make list of continents
continents = []
for continent in dataset['continent']:
    if continent not in continents:
        continents.append(continent)
# make figure
figure = {
    'data': [],
    'layout': {},
    'frames': []
}

# fill in most of layout
figure['layout']['xaxis'] = {'range': [30, 85], 'title': 'Life Expectancy'}
figure['layout']['yaxis'] = {'title': 'GDP per Capita', 'type': 'log'}
figure['layout']['hovermode'] = 'closest'
figure['layout']['sliders'] = {
    'args': [
        'transition', {
            'duration': 400,
            'easing': 'cubic-in-out'
        }
    ],
    'initialValue': '1952',
    'plotlycommand': 'animate',
    'values': years,
    'visible': True
}
figure['layout']['updatemenus'] = [
    {
        'buttons': [
            {
                'args': [None, {'frame': {'duration': 500, 'redraw': False},
                         'fromcurrent': True, 'transition': {'duration': 300, 'easing': 'quadratic-in-out'}}],
                'label': 'Play',
                'method': 'animate'
            },
            {
                'args': [[None], {'frame': {'duration': 0, 'redraw': False}, 'mode': 'immediate',
                'transition': {'duration': 0}}],
                'label': 'Pause',
                'method': 'animate'
            }
        ],
        'direction': 'left',
        'pad': {'r': 10, 't': 87},
        'showactive': False,
        'type': 'buttons',
        'x': 0.1,
        'xanchor': 'right',
        'y': 0,
        'yanchor': 'top'
    }
]
#custom colors
custom_colors = {
    'Asia': 'rgb(171, 99, 250)',
    'Europe': 'rgb(230, 99, 250)',
    'Africa': 'rgb(99, 110, 250)',
    'Americas': 'rgb(25, 211, 243)',
    'Oceania': 'rgb(50, 170, 255)'
}
sliders_dict = {
    'active': 0,
    'yanchor': 'top',
    'xanchor': 'left',
    'currentvalue': {
        'font': {'size': 20},
        'prefix': 'Year:',
        'visible': True,
        'xanchor': 'right'
    },
    'transition': {'duration': 300, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9,
    'x': 0.1,
    'y': 0,
    'steps': []
}

# make data
year = 1952
for continent in continents:
    dataset_by_year = dataset[dataset['year'] == year]
    dataset_by_year_and_cont = dataset_by_year[dataset_by_year['continent'] == continent]

    data_dict = {
        'x': list(dataset_by_year_and_cont['lifeExp']),
        'y': list(dataset_by_year_and_cont['gdpPercap']),
        'mode': 'markers',
        'text': list(dataset_by_year_and_cont['country']),
        'marker': {
            'sizemode': 'area',
            'sizeref': 200000,
            'size': list(dataset_by_year_and_cont['pop'])
        },
        'name': continent
    }
    figure['data'].append(data_dict)
    
# make frames
for year in years:
    frame = {'data': [], 'name': str(year)}
    for continent in continents:
        dataset_by_year = dataset[dataset['year'] == int(year)]
        dataset_by_year_and_cont = dataset_by_year[dataset_by_year['continent'] == continent]

        data_dict = {
            'x': list(dataset_by_year_and_cont['lifeExp']),
            'y': list(dataset_by_year_and_cont['gdpPercap']),
            'mode': 'markers',
            'text': list(dataset_by_year_and_cont['country']),
            'marker': {
                'sizemode': 'area',
                'sizeref': 200000,
                'size': list(dataset_by_year_and_cont['pop'])
            },
            'name': continent
        }
        frame['data'].append(data_dict)

    figure['frames'].append(frame)
    slider_step = {'args': [
        [year],
        {'frame': {'duration': 300, 'redraw': False},
         'mode': 'immediate',
       'transition': {'duration': 300}}
     ],
     'label': year,
     'method': 'animate'}
    sliders_dict['steps'].append(slider_step)

    
figure['layout']['sliders'] = [sliders_dict]

iplot(figure)

Neat, right?!

Overall, everything ran smoothly except the last plot. I ~~actually~~ initially tried to make this one:

gapminder_custom — From: https://plot.ly/python/gapminder-example/ (scroll to the bottom)

but I kept getting an error:

Update: Jon commented and pointed out that I was using an older version of plotly (3.0.0rc10) instead of 3.0.0rc11. You can check which version you have by typing the following:

import plotly
plotly.__version__

After I updated plotly, I successfully made the last graph!

Special thanks to Jon! I sincerely appreciate your help!

Mining Data from Twitter (and replies to Tweets) with Tweepy

I recently met someone who is interested in mining data from Twitter. In addition to mining data from Twitter however, they’re also interested in collecting all of the replies. I thought that I would try giving it a shot and sharing what I learn.

Note: This post assumes that Python is installed on your computer. If you haven’t installed Python, this Python Wiki walks you through the process.

To scrape tweets from Twitter, I recommend using Tweepy, but there are several other options. To install tweepy:

pip install tweepy

*Note: If your environments are configured like mine, you may need to type: conda install -c conda-forge tweepy

Now, go to Twitter’s developer page to register your app (you will have to sign in with your username and password, or sign up with a new username). You should see a button on the right-hand side of the page that says “Create New App.” Fill out the necessary fields (i.e. name of the app; it’s description; your website) and then check the box that says you agree to their terms, which I linked to above. If you don’t have a publicly accessible website, just list the web address that is hosting your app (e.g., link to your school profile; link to your work website). You can likely ignore the Callback URL field, unless you are allowing users to log into your app to authenticate themselves. In which case, enter the URL where they would be returned after they’ve given permission to Twitter to use your app.

After registering your app, you should see a page where you can create your access token. Click the “Create my access token” button. If you don’t see this button after a few seconds, refresh the page. The next page will ask you what type of access you need. For this example, we will need Read, Write, and Access Direct Messages. Now, note your OAuth settings, particularly your Consumer Key, Consumer Secret, OAuth Access Token, and OAuth Access Token Secret. Don’t share this information with anyone!

Next, import tweepy and use the OAuth interface to collect data.

import tweepy
from tweepy import OAuthHandler

consumer_key = 'YOUR-CONSUMER-KEY'
consumer_secret = 'YOUR-CONSUMER-SECRET'
access_token = 'YOUR-ACCESS-TOKEN'
access_secret = 'YOUR-ACCESS-SECRET'

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth,  wait_on_rate_limit=True)

To collect tweets that are currently being shared on Twitter. For example, tweets that are under “StarwarsDay”. Note that I’ve asked Python to format the tweets by first listing the the user and screen name, and then their tweet. If you want ALL the meta data, remove the specifications that I wrote.

#you'll need to import json to run this script
import json
class PrintListener(tweepy.StreamListener):
    def on_data(self, data):
        # Decode the JSON data
        tweet = json.loads(data)

        # Print out the Tweet
        print('@%s: %s' % (tweet['user']['screen_name'], tweet['text'].encode('ascii', 'ignore')))

    def on_error(self, status):
        print(status)


if __name__ == '__main__':
    listener = PrintListener()

    # Show system message
    print('I will now print Tweets containing "StarWarsDay"! ==>')

    # Authenticate
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)

    # Connect the stream to our listener
    stream = tweepy.Stream(auth, listener)
    stream.filter(track=['StarWarsDay'], async=True)

Here’s a brief glimpse of what I got:

I will now print Tweets containing "StarWarsDay"! ==>
@Amanda33441401: b'RT @givepennyuk: This #StarWarsDay we want you to feel inspired by the force to think up a Star Wars fundraiser!\n\n  Star Wars Movie Marat'
@jin_keapjjang: b'RT @HamillHimself: People around the world are marking #StarWarsDay in spectacular style. #MayThe4thBeWithYou https://t.co/BM02D965Xa via @'
@Bradleyg1996G: b'RT @NHLonNBCSports: MAY THE PORGS BE WITH YOU\n\n#StarWarsDay #MayThe4thBeWithYou https://t.co/HWAmptYND5'
@thays_jeronimo: b"RT @g1: 'May the 4th'  celebrado por fs de 'Star Wars' https://t.co/ggNhaEQCPV #MayThe4thBeWithYou #StarWarsDay #G1 https://t.co/wUY74DZL"
@DF_SomersetKY: b"If you're a fan of the franchise, you're going to love all of this Star Wars gear for your car! Tweet us your favor https://t.co/PteAtqS1Ui"
@zakrhssn: b'RT @williamvercetti: #StarWarsDay https://t.co/fgHZzTZ0Fm'
@kymaticaa: b'RT @Electric_Forest: May The Forest be with you.  #ElectricForest #StarWarsDay #StarWars https://t.co/bfQnZHI8eX'
@hullodave: b'"Only Imperial Stormtroopers are this precise" How precise? Not very? But why? Science! #StarWarsDay https://t.co/niZ2h6ssnp'

To store the data you just collected, rather than printing it, you’ll have to add some extra code (see text in red)

import csv
import json
class PrintListener(tweepy.StreamListener):
    def on_data(self, data):
        # Decode the JSON data
        tweet = json.loads(data)

        # Print out the Tweet
        print('@%s: %s' % (tweet['user']['screen_name'], tweet['text'].encode('ascii', 'ignore')))

        with open('StarWarsDay.csv','a') as f:
                f.write(data)
        except:
            pass

    def on_error(self, status):
        print(status)


if __name__ == '__main__':
    listener = PrintListener()

    # Show system message
    print('I will now print Tweets containing "StarWarsDay"! ==>')

    # Authenticate
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)

    # Connect the stream to our listener
    stream = tweepy.Stream(auth, listener)
    stream.filter(track=['StarWarsDay'], async=True)

Now, let’s see if it’s possible to get replies to a tweet. I checked these users (listed above) and none of them have any replies. So I decided to just search #StarWarsDay on Twitter instead. I immediately found the twitter handle for Arrested Development, which has replies (at the time of this writing 9) to their tweet that includes #StarWarsDay:

Go see a Star War #MayTheForceBeWithYou #StarWarsDay pic.twitter.com/OLcmCAEl30

— Arrested Development (@bluthquotes) May 4, 2018

Look at the hyperlink: [“https://twitter.com/bluthquotes/status/992433028155654144”%5D
You can see that the user we are interested in is @bluthquotes and that the id for this particular tweet is “992433028155654144”

To get tweets from just @bluthquotes, you would type

bluthquotes_tweets = api.user_timeline(screen_name = 'bluthquotes', count = 100)

for status in bluthquotes_tweets:
print(status)

To see all the replies to @bluthquotes tweet posted above:


replies=[] 
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)  

for full_tweets in tweepy.Cursor(api.user_timeline,screen_name='bluthquotes',timeout=999999).items(10):
  for tweet in tweepy.Cursor(api.search,q='to:bluthquotes', since_id=992433028155654144, result_type='recent',timeout=999999).items(1000):
    if hasattr(tweet, 'in_reply_to_status_id_str'):
      if (tweet.in_reply_to_status_id_str==full_tweets.id_str):
        replies.append(tweet.text)
  print("Tweet :",full_tweets.text.translate(non_bmp_map))
  for elements in replies:
       print("Replies :",elements)
  replies.clear()

*Note: change the .items(10) line to get more replies. Remember that Twitter limits you to 100 per hour (at least at the time of this writing).

This is what I got:

Tweet : @HulkHogan You ok hermano?
Tweet : Go see a Star War #MayTheForceBeWithYou #StarWarsDay https://t.co/OLcmCAEl30
Replies : @bluthquotes @mlot11
Replies : @bluthquotes Oh my yes
Replies : @bluthquotes Star Wars needs more gay characters #jarjarXobama
Replies : @bluthquotes @TheSAPeacock May the 4th be with you!!!!
Replies : @bluthquotes @SaraAnneGill
Replies : @bluthquotes @kurkobains Por q xuxa la sacaron de #Netflix
Replies : @bluthquotes  https://t.co/Qv4KJJ6dFU
Replies : @bluthquotes @hiagorecanello
Replies : @bluthquotes No it’s #CincoDeCuatro
Replies : @bluthquotes @auburnhays 😂
Replies : @bluthquotes Go see a Star War on Cinco de Quatro! 🤠🌶️🍹
Replies : @bluthquotes @jmdroberts
Tweet : RT @arresteddev: Hey, hermanos! It's Cinco de Cuatro! Season 4 Remix is now streaming. https://t.co/Alw0Z2Zwlm
Tweet : Keep fighting little guy #StarWarsDay  #MayThe4thBeWithYou https://t.co/Uim4D2BP49
Replies : @bluthquotes "worth every penny"
Replies : @bluthquotes You’re still doin’ that?
Replies : @bluthquotes Im crying 😭
Replies : @bluthquotes @theJdog 😂😂😂😂😂
Tweet : @JTHM8008 @herooine @JeffEisenband @ZachAJacobson I say HUZZAH! like this at least 5 times a week.
Tweet : @herooine @JeffEisenband @ZachAJacobson Checks out ✅
Tweet : @gjb512 It’s there already. Huzzah!
Tweet : @drkatiemd_ @MitchHurwitz It’s a wonderful program!
Tweet : I prematurely blue myself  #EmbarrassmentIn4Words https://t.co/QYUFeSKFT2
Tweet : @sebastrivi @VICE @arresteddev I’m not on board

You can see that it’s not quite what I wanted, which is just the responses to the Star Wars tweet (see red text). According to the API reference page, there should be a way to limit the returned text to the replies to the specific tweet we are interested, but I will have to continue tinkering with it. I’ll post an update when I figure it out.

Data Visualization in Python

Sharing a visualization that I made with Python, in Jupyter Notebook.

First, import the following libraries:

# Set up libraries
%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set_style("white")
import pandas as pd
my_dpi=96

Then import data and make scatter plots for each year of life expectancy data, courtesy of Gapminder:

# Get Gapminder Life Expectancy data (csv file is hosted on the web)
url = 'https://python-graph-gallery.com/wp-content/uploads/gapminderData.csv'
data = pd.read_csv(url)
 
# Transform Continent into numerical values group1->1, group2->2...
data['continent']=pd.Categorical(data['continent'])
 
# For each year:
for i in data.year.unique():
 
# initialize a figure
fig = plt.figure(figsize=(680/my_dpi, 480/my_dpi), dpi=my_dpi)
 
# Change color for the x-axis values
tmp=data[ data.year == i ]
plt.scatter(tmp['lifeExp'], tmp['gdpPercap'] , s=tmp['pop']/200000 , c=tmp['continent'].cat.codes, cmap="Accent", alpha=0.6, edgecolors="white", linewidth=2)
 
# Add titles (main and on axis)
plt.yscale('log')
plt.xlabel("Life Expectancy")
plt.ylabel("GDP per Capita")
plt.title("Year: "+str(i) )
plt.ylim(0,100000)
plt.xlim(30, 90)
 
# Save the results
filename='Gapminder_step'+str(i)+'.png'
plt.savefig(filename, dpi=96)
plt.gca()

Next, download and install ImageMagick to make the following gif, by typing in your (Windows 10) command prompt:

magick convert.exeGapminder*.png animated_gapminder.gif

–If you have any issues configuring ImageMagick, like I did, you may find this link useful.

Note: You can make a gif using Matplotlib or moviepy, but I couldn’t quite figure it out. I will update once I do.

Stella Min

Demography

Tag: Python

Plotly 3.0.0 in Jupyter Notebook

Mining Data from Twitter (and replies to Tweets) with Tweepy

Data Visualization in Python

Menu