Twitter API
API Authentication
The package tweepy is great at handling all the Twitter API OAuth authentication details for you. All you need to do is pass it your authentication credentials. In this interactive exercise, we have created some mock authentication credentials (if you wanted to replicate this at home, you would need to create a Twitter App as Hugo detailed in the video). Your task is to pass these credentials to tweepy's OAuth handler.
- Import the package tweepy.
- Pass the parameters consumer_key and consumer_secret to the function tweepy.OAuthHandler().
- Complete the passing of OAuth credentials to the OAuth handler auth by applying to it the method set_access_token(), along with arguments access_token and access_token_secret.
With authentication in place, the next step is to stream tweets:
- Create your Stream object with authentication by passing tweepy.Stream() the authentication handler auth and the Stream listener l.
- To filter Twitter streams, pass to the track argument in stream.filter() a list containing the desired keywords 'clinton', 'trump', 'sanders', and 'cruz'.
Now that you've got your Twitter data sitting locally in a text file, it's time to explore it! This is what you'll do in the next few interactive exercises. In this exercise, you'll read the Twitter data into a list: tweets_data.
Be aware that this is real data from Twitter and as such there is always a risk that it may contain profanity or other offensive content (in this exercise, and any following exercises that also use real Twitter data).
- Assign the filename 'tweets.txt' to the variable tweets_data_path.
- Initialize tweets_data as an empty list to store the tweets in.
- Within the for loop initiated by for line in tweets_file:, load each tweet into a variable, tweet, using json.loads(), then append tweet to tweets_data using the append() method.
- Hit submit and check out the keys of the first tweet dictionary printed to the shell.
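The loading steps above can be sketched as follows. To keep the example self-contained, it first writes two made-up tweets to a temporary tweets.txt; the sample records and the helper name load_tweets are assumptions for illustration, and real tweets carry many more keys. It assumes the file holds one JSON object per line.

```python
import json
import os
import tempfile


def load_tweets(path):
    """Read one JSON object per line and return the tweets as a list of dicts."""
    tweets = []
    with open(path, "r", encoding="utf-8") as tweets_file:
        for line in tweets_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines between tweets
            tweet = json.loads(line)
            tweets.append(tweet)
    return tweets


# Hypothetical sample data standing in for the streamed tweets.
sample_lines = [
    '{"text": "Example tweet about trump", "lang": "en"}',
    '{"text": "Example tweet about sanders", "lang": "en"}',
]

with tempfile.TemporaryDirectory() as tmp_dir:
    tweets_data_path = os.path.join(tmp_dir, "tweets.txt")
    with open(tweets_data_path, "w", encoding="utf-8") as sample_file:
        sample_file.write("\n".join(sample_lines) + "\n")
    tweets_data = load_tweets(tweets_data_path)

# Inspect the keys of the first tweet dictionary.
print(tweets_data[0].keys())
```

Skipping blank lines is a defensive choice: a file written by a stream can contain empty lines, and json.loads() raises an error on an empty string.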
Now that you have your DataFrame of tweets set up, you're going to do a bit of text analysis to count how many tweets contain the words 'clinton', 'trump', 'sanders' and 'cruz'. In the pre-exercise code, we have defined the following function word_in_text(), which will tell you whether the first argument (a word) occurs within the second argument (a tweet).
import re

def word_in_text(word, text):
    word = word.lower()
    text = text.lower()
    match = re.search(word, text)
    if match:
        return True
    return False
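A short sketch of how word_in_text() behaves may help here. Because it relies on re.search(), the match is case-insensitive after lowercasing but also matches substrings, and the word is interpreted as a regular expression. The second function, whole_word_in_text(), is a hypothetical variant that is not part of the exercise; it is shown only to contrast substring matching with whole-word matching. The sample strings are made up.

```python
import re


def word_in_text(word, text):
    """Return True if word occurs anywhere in text, ignoring case."""
    word = word.lower()
    text = text.lower()
    match = re.search(word, text)
    if match:
        return True
    return False


def whole_word_in_text(word, text):
    """Hypothetical variant: match word only as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(word_in_text("trump", "TRUMP leads in a new poll"))   # case is ignored
print(word_in_text("cruz", "Paid in cruzeiros"))            # substring also matches
print(whole_word_in_text("cruz", "Paid in cruzeiros"))      # whole-word check rejects it
```

For the four candidate names used in this exercise the substring behavior rarely matters, but it is worth keeping in mind when reusing the function with shorter keywords.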
You're going to iterate over the rows of the DataFrame and calculate how many tweets contain each of our keywords! The count for each candidate has been initialized to 0.
- Within the for loop for index, row in df.iterrows():, the code currently increases the value of clinton by 1 each time a tweet (text row) mentioning 'Clinton' is encountered; complete the code so that the same happens for trump, sanders and cruz.

# Initialize list to store tweet counts
[clinton, trump, sanders, cruz] = [0, 0, 0, 0]

# Iterate through df, counting the number of tweets in which
# each candidate is mentioned
for index, row in df.iterrows():
    clinton += word_in_text('clinton', row['text'])
    trump += word_in_text('trump', row['text'])
    sanders += word_in_text('sanders', row['text'])
    cruz += word_in_text('cruz', row['text'])
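The counting loop can be tried end to end on a small DataFrame, as in the sketch below. It assumes pandas is installed; the four sample tweets are made up, and storing the counts in a dictionary instead of four separate variables is a design choice of this sketch, not the exercise's own layout. Adding the boolean result of word_in_text() to an integer works because True counts as 1 and False as 0. The last lines show an optional vectorized equivalent using the pandas string method str.contains().

```python
import re

import pandas as pd


def word_in_text(word, text):
    """Return True if word occurs anywhere in text, ignoring case."""
    return re.search(word.lower(), text.lower()) is not None


# Hypothetical sample tweets standing in for the real DataFrame.
df = pd.DataFrame({
    "text": [
        "Clinton and Trump debate tonight",
        "Sanders rally draws a crowd",
        "TRUMP leads in a new poll",
        "Nothing political here",
    ]
})

keywords = ["clinton", "trump", "sanders", "cruz"]
counts = {word: 0 for word in keywords}

# Iterate through df, counting the tweets in which each candidate is mentioned.
for index, row in df.iterrows():
    for word in keywords:
        counts[word] += word_in_text(word, row["text"])

print(counts)

# Optional vectorized equivalent: one pass per keyword, no explicit row loop.
vectorized = {
    word: int(df["text"].str.contains(word, case=False).sum())
    for word in keywords
}
print(vectorized)
```

On larger DataFrames the vectorized form is usually faster than iterrows(), though both give the same counts here.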