Posts

Showing posts from October, 2020

Twitter API

API Authentication

The package tweepy is great at handling all the Twitter API OAuth authentication details for you. All you need to do is pass it your authentication credentials. In this interactive exercise, we have created some mock authentication credentials (if you wanted to replicate this at home, you would need to create a Twitter App as Hugo detailed in the video). Your task is to pass these credentials to tweepy's OAuth handler.

Import the package tweepy.
Pass the parameters consumer_key and consumer_secret to the function tweepy.OAuthHandler().
Complete the passing of OAuth credentials to the OAuth handler auth by applying to it the method set_access_token(), along with the arguments access_token and access_token_secret.

import tweepy

# Store OAuth authentication credentials in relevant variables
access_token = "1092294848-a...
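A minimal sketch of the completed exercise, assuming placeholder credential strings (the real values are truncated in the excerpt above):

import tweepy

# Placeholder credentials; substitute the values from your own Twitter App
access_token = "ACCESS_TOKEN"
access_token_secret = "ACCESS_TOKEN_SECRET"
consumer_key = "CONSUMER_KEY"
consumer_secret = "CONSUMER_SECRET"

# Pass OAuth details to tweepy's OAuth handler
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)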

Web Scraping using JSON and API

JSON: from the web to Python

Wow, congrats! You've just queried your first API programmatically in Python and printed the text of the response to the shell. However, as you know, your response is actually a JSON, so you can do one step better and decode the JSON. You can then print the key-value pairs of the resulting dictionary. That's what you're going to do now!

Pass the variable url to the requests.get() function in order to send the relevant request and catch the response, assigning the resultant response message to the variable r.
Apply the json() method to the response object r and store the resulting dictionary in the variable json_data.

# Import package
import requests

# Assign URL to variable: url
url = 'http://www.omdbapi.com/?apikey=72bc447a&t=social+network'

# Package the request, send the request and catch the...
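A sketch of the completed request-and-decode step, assuming the OMDb response body is valid JSON:

import requests

url = 'http://www.omdbapi.com/?apikey=72bc447a&t=social+network'

# Package the request, send it, and catch the response
r = requests.get(url)

# Decode the JSON payload into a Python dictionary
json_data = r.json()

# Print each key-value pair in the dictionary
for key, value in json_data.items():
    print(key + ': ', value)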

Web Scraping using BeautifulSoup

Assign the URL of interest to the variable url.
Package the request to the URL, send the request and catch the response with a single function requests.get(), assigning the response to the variable r.
Use the text attribute of the object r to return the HTML of the webpage as a string; store the result in a variable html_doc.
Create a BeautifulSoup object soup from the resulting HTML using the function BeautifulSoup().
Use the method prettify() on soup and assign the result to pretty_soup.

# Import packages
import requests
from bs4 import BeautifulSoup

# Specify url: url
url = 'https://www.python.org/~guido/'

# Package the request, send the request and catch the response: r
r = requests.get(url)

# Extracts the response as...
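A sketch of the full pipeline, assuming Python's built-in html.parser backend for BeautifulSoup:

import requests
from bs4 import BeautifulSoup

# Specify the url and fetch the page
url = 'https://www.python.org/~guido/'
r = requests.get(url)

# Extract the response as an HTML string
html_doc = r.text

# Parse the HTML into a BeautifulSoup object
soup = BeautifulSoup(html_doc, 'html.parser')

# Prettify the parse tree and print it
pretty_soup = soup.prettify()
print(pretty_soup)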

Web Scraping using urllib.request

Now that you know the basics behind HTTP GET requests, it's time to perform some of your own. In this interactive exercise, you will ping our very own DataCamp servers to perform a GET request to extract information from the first coding exercise of this course, "https://campus.datacamp.com/courses/1606/4135?ex=2".

Import the functions urlopen and Request from the subpackage urllib.request.
Package the request to the url "https://campus.datacamp.com/courses/1606/4135?ex=2" using the function Request() and assign it to request.
Send the request and catch the response in the variable response with the function urlopen().
Run the rest of the code to see the datatype of response and to close the connection!

Code:

# Import packages
from urllib.request import urlopen, Request

# Specify the url
url = "https://campus.datac...
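A sketch of the completed exercise; the full url string appears in the instructions above:

from urllib.request import urlopen, Request

# Specify the url
url = "https://campus.datacamp.com/courses/1606/4135?ex=2"

# Package the GET request
request = Request(url)

# Send the request and catch the response
response = urlopen(request)

# Print the datatype of response: an http.client.HTTPResponse object
print(type(response))

# Close the connection
response.close()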

Importing Data from the Web in Python

# Import package
from urllib.request import urlretrieve

# Import pandas
import pandas as pd

# Assign url of file: url
url = 'https://s3.amazonaws.com/assets.datacamp.com/production/course_1606/datasets/winequality-red.csv'

# Save file locally
urlretrieve(url, 'winequality-red.csv')

# Read file into a DataFrame and print its head
df = pd.read_csv('winequality-red.csv', sep=';')
print(df.head())

Importing an Excel file online:

# Import package
import pandas as pd

# Assign url of file: url
url = 'http://s3.amazonaws.com/assets.datacamp.com/course/importing_data_into_r/latitude.xls'

# Read in all sheets of Excel file: xls
xls = pd.read_excel(url, sheet_name=None...
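A sketch of how the truncated Excel step plausibly finishes; sheet_name=None loads every sheet into a dict of DataFrames keyed by sheet name (the sheet name '1700' below is an assumption, so check xls.keys() on your own file first):

import pandas as pd

url = 'http://s3.amazonaws.com/assets.datacamp.com/course/importing_data_into_r/latitude.xls'

# sheet_name=None reads all sheets into a dict: {sheet_name: DataFrame}
xls = pd.read_excel(url, sheet_name=None)

# Inspect the available sheet names
print(xls.keys())

# Print the head of one sheet (assumed name '1700')
print(xls['1700'].head())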

Using Pandas to connect to SQL and write SQL queries

Exercise: Pandas and the Hello World of SQL Queries!

Here, you'll take advantage of the power of pandas to write the results of your SQL query to a DataFrame in one swift line of Python code! You'll first import pandas and create the SQLite 'Chinook.sqlite' engine. Then you'll query the database to select all records from the Album table. Recall that to select all records from the Orders table in the Northwind database, Hugo executed the following command:

df = pd.read_sql_query("SELECT * FROM Orders", engine)

# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Execute query and store records in DataFrame: df
df = pd.read_sql_query("SELECT * FROM Album", en...
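The excerpt cuts off mid-call; a sketch of the complete one-liner, assuming Chinook.sqlite sits in the working directory:

from sqlalchemy import create_engine
import pandas as pd

# Create the engine for the Chinook SQLite database
engine = create_engine('sqlite:///Chinook.sqlite')

# Execute the query and store the records in a DataFrame in one line
df = pd.read_sql_query("SELECT * FROM Album", engine)

# Print the head of the DataFrame
print(df.head())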

Connecting to a SQL Database with Python

Here, you're going to fire up your very first SQL engine. You'll create an engine to connect to the SQLite database 'Chinook.sqlite', which is in your working directory. Remember that to create an engine to connect to 'Northwind.sqlite', Hugo executed the command:

engine = create_engine('sqlite:///Northwind.sqlite')

Here, 'sqlite:///Northwind.sqlite' is called the connection string to the SQLite database Northwind.sqlite. A little bit of background on the Chinook database: the Chinook database contains information about a semi-fictional digital media store in which media data is real and customer, employee and sales data has been manually created.

The Hello World of SQL Queries!

Now, it's time for liftoff! In this exercise, you'll perform the Hello World of SQL queries, SELECT, in order to retrieve all columns of the table Album in the Chinook database. Recall that the query SELECT * selects all c...
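A sketch of firing up the engine and running the Hello World query; sqlalchemy's text() and inspect() are used here so the snippet runs on both SQLAlchemy 1.x and 2.x (the course itself may show older calls such as engine.table_names()):

from sqlalchemy import create_engine, inspect, text

# Create the engine: the connection string names the SQLite file
engine = create_engine('sqlite:///Chinook.sqlite')

# Confirm the connection by listing the tables
print(inspect(engine).get_table_names())

# Perform the Hello World query: SELECT * retrieves all columns
with engine.connect() as con:
    rs = con.execute(text("SELECT * FROM Album"))
    print(rs.fetchall()[:3])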

Python Analysing Police Activity with Pandas

# Create a DataFrame of female drivers stopped for speeding
female_and_speeding = ri[(ri.driver_gender == 'F') & (ri.violation == 'Speeding')]

# Create a DataFrame of male drivers stopped for speeding
male_and_speeding = ri[(ri.driver_gender == 'M') & (ri.violation == 'Speeding')]

# Compute the stop outcomes for female drivers (as proportions)
print(female_and_speeding.stop_outcome.value_counts(normalize=True))

# Compute the stop outcomes for male drivers (as proportions)
print(male_and_speeding.stop_outcome.value_counts(normalize=True))

Python: Writing an Iterator and a Function to Add Columns Automatically

# Define plot_pop()
def plot_pop(filename, country_code):
    # Initialize reader object: urb_pop_reader
    urb_pop_reader = pd.read_csv(filename, chunksize=1000)

    # Initialize empty DataFrame: data
    data = pd.DataFrame()

    # Iterate over each DataFrame chunk
    for df_urb_pop in urb_pop_reader:
        # Check out specific country: df_pop_ceb
        df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == country_code]

        # Zip DataFrame columns of interest: pops
        pops = zip(df_pop_ceb['Total Population'],
                   df_pop_ceb['Urban population (% of total)'])

        # Turn zip object into list: pops_list
        pops_list = list(pops)

        # Use list comprehension to create new Dat...
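The body is cut off at the list comprehension; a sketch of how it plausibly continues, assuming the goal is a 'Total Urban Population' column and a scatter plot (pd.concat is used here in place of the long-deprecated DataFrame.append):

import pandas as pd
import matplotlib.pyplot as plt

def plot_pop(filename, country_code):
    # Read the file in chunks of 1000 rows
    urb_pop_reader = pd.read_csv(filename, chunksize=1000)
    data = pd.DataFrame()

    for df_urb_pop in urb_pop_reader:
        # Keep only the rows for the requested country
        df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == country_code].copy()

        # Pair total population with the urban percentage
        pops_list = list(zip(df_pop_ceb['Total Population'],
                             df_pop_ceb['Urban population (% of total)']))

        # List comprehension: absolute urban population per row
        df_pop_ceb['Total Urban Population'] = [
            int(tot * pct * 0.01) for tot, pct in pops_list
        ]

        # Append the chunk's rows to the running DataFrame
        data = pd.concat([data, df_pop_ceb])

    # Scatter plot of urban population over time
    data.plot(kind='scatter', x='Year', y='Total Urban Population')
    plt.show()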

Python Generators

Writing a generator to load data in chunks (2)

In the previous exercise, you processed a file line by line for a given number of lines. What if, however, you want to do this for the entire file? In this case, it would be useful to use generators. Generators allow users to lazily evaluate data. This concept of lazy evaluation is useful when you have to deal with very large datasets because it lets you generate values in an efficient manner by yielding only chunks of data at a time instead of the whole thing at once.

In this exercise, you will define a generator function read_large_file() that produces a generator object which yields a single line from a file each time next() is called on it. The csv file 'world_dev_ind.csv' is in your current directory for your use. Note that when you open a connection to a file, the resulting file object is already a generator! So out in the wild, you won't have to ex...
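A sketch of read_large_file() as described above, with a small usage example; the loop reads one line per next() call and stops at end of file:

# Define read_large_file()
def read_large_file(file_object):
    """Generator that yields one line of a file at a time."""
    while True:
        # readline() returns an empty string at end of file
        data = file_object.readline()
        if not data:
            break
        yield data

# Usage: pull the first two lines lazily with next()
with open('world_dev_ind.csv') as file:
    gen_file = read_large_file(file)
    print(next(gen_file))
    print(next(gen_file))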