Posts

Showing posts from August, 2020

Slicing and Indexing in Python Pandas

# Make a list of cities to subset on
cities = ["Moscow", "Saint Petersburg"]

# Subset temperatures using square brackets
print(temperatures[temperatures["city"].isin(cities)])

# Subset temperatures_ind using .loc[]
print(temperatures_ind.loc[cities])

temperatures_ind = temperatures.set_index(['country','city'])

# List of tuples: Brazil, Rio De Janeiro & Pakistan, Lahore
rows_to_keep = [('Brazil','Rio De Janeiro'),('Pakistan','Lahore')]

# Subset for rows to keep
print(temperatures_ind.loc[rows_to_keep])

# Sort temperatures_ind by index values
print(temperatures_ind.sort_index())

# Sort temperatures_ind by index values at the city level
print(temperatures_ind.sort_index(level="city"))

# Sort temperatures_ind by country then descending city
print(temperatures_ind.sort_index(level=["country", "city"], ascending=[True, False]))

Slice rows with code like  df.loc[("a", ...
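The multi-level indexing steps in this excerpt can be tried on a small table. The sketch below is only an illustration: the `avg_temp_c` column and all of its values are made up, and the tuple slice at the end is one plausible reading of the truncated `df.loc[("a", ...` hint, not the post's own code.

```python
import pandas as pd

# Hypothetical stand-in for the temperatures data; the values are invented.
temperatures = pd.DataFrame({
    "country": ["Russia", "Russia", "Brazil", "Pakistan"],
    "city": ["Moscow", "Saint Petersburg", "Rio De Janeiro", "Lahore"],
    "avg_temp_c": [4.7, 5.0, 25.5, 24.3],
})

# Two-level index: country is the outer level, city the inner level.
temperatures_ind = temperatures.set_index(["country", "city"])

# Select specific (country, city) pairs with a list of tuples.
rows_to_keep = [("Brazil", "Rio De Janeiro"), ("Pakistan", "Lahore")]
subset = temperatures_ind.loc[rows_to_keep]

# Sort by country ascending, then by city descending within each country.
sorted_ind = temperatures_ind.sort_index(
    level=["country", "city"], ascending=[True, False]
)

# Slicing a MultiIndex with .loc needs a sorted index first.
temperatures_srt = temperatures_ind.sort_index()
sliced = temperatures_srt.loc[("Brazil", "Rio De Janeiro"):("Pakistan", "Lahore")]

print(subset)
print(sorted_ind)
print(sliced)
```

Sorting before slicing matters here: pandas can only slice between two index tuples when the index is in sorted order.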

Python Syntax and Functions Part2 (Summary Statistics)

The .agg() method allows you to apply your own custom functions to a DataFrame, as well as apply functions to more than one column of a DataFrame at once, making your aggregations super efficient. In the custom function for this exercise, "IQR" is short for inter-quartile range, which is the 75th percentile minus the 25th percentile. It's an alternative to standard deviation that is helpful if your data contains outliers.

import numpy as np

def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)

# Update to print IQR and median of temperature_c, fuel_price_usd_per_l, & unemployment
print(sales[["temperature_c", "fuel_price_usd_per_l", "unemployment"]].agg([iqr, np.median]))

Calculating Cumulative Statistics

Syntax = df.drop_duplicates

sales_1_1 = sales_1_1.sort_values("date")

# Get the cumulative sum of weekly_sales, add as cum_weekly_sales col
sales_1_1["cum_weekly_sales"] = sales_1_1["weekly_sales...
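To see the custom aggregation and the cumulative sum side by side, here is a minimal sketch on a toy table. The `sales` rows below are invented for illustration, and the string `"median"` is used in place of `np.median` simply to keep the example free of a NumPy import.

```python
import pandas as pd

# Hypothetical miniature of the sales data; all values are invented.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2010-02-19", "2010-02-05", "2010-02-12", "2010-02-26"]),
    "weekly_sales": [100.0, 300.0, 200.0, 400.0],
    "temperature_c": [1.0, 2.0, 3.0, 4.0],
})

def iqr(column):
    """Inter-quartile range: 75th percentile minus 25th percentile."""
    return column.quantile(0.75) - column.quantile(0.25)

# Apply a custom function and a built-in aggregation to two columns at once.
summary = sales[["temperature_c", "weekly_sales"]].agg([iqr, "median"])

# Cumulative statistics only make sense once the rows are in date order.
sales_sorted = sales.sort_values("date")
sales_sorted["cum_weekly_sales"] = sales_sorted["weekly_sales"].cumsum()

print(summary)
print(sales_sorted[["date", "weekly_sales", "cum_weekly_sales"]])
```

The result of `.agg()` with a list has one row per function, labelled by the function name, and one column per input column.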

Python Pandas Function and Syntax 1 (Filtering and Sorting)

.values : A two-dimensional NumPy array of values.
.columns : An index of columns: the column names.
.index : An index for the rows: either row numbers or row names.

#print(homelessness.values)

# Print the column index of homelessness
print(homelessness.columns)

# Print the row index of homelessness
print(homelessness.index)

Sorting in Pandas

one column
df.sort_values("breed")

multiple columns
df.sort_values(["breed", "weight_kg"])

homelessness_reg_fam = homelessness.sort_values(["region", "family_members"], ascending=[True, False])

Subsetting Rows

dogs[dogs["height_cm"] > 60]
dogs[dogs["color"] == "tan"]

You can filter for multiple conditions at once by using the "logical and" operator, & .

dogs[(dogs["height_cm"] > 60) & (dogs["col_b"] == "tan")]

homelessness is available and pandas is loaded as pd .

fam_lt_1k_pac = homelessness[(homelessness['fami...
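The sorting and row-subsetting patterns above can be run end to end on a small table. This is a sketch only: the `dogs` rows are invented, with column names borrowed from the snippets in the excerpt.

```python
import pandas as pd

# Hypothetical dogs table; the rows are invented for illustration.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador"],
    "color": ["tan", "black", "tan", "brown"],
    "height_cm": [56, 43, 62, 61],
    "weight_kg": [25, 23, 22, 29],
})

# Sort by breed ascending, then by weight descending within each breed.
by_breed_weight = dogs.sort_values(["breed", "weight_kg"], ascending=[True, False])

# Each condition needs its own parentheses because & binds tighter than > and ==.
tall_tan = dogs[(dogs["height_cm"] > 60) & (dogs["color"] == "tan")]

print(by_breed_weight)
print(tall_tan)
```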

Data Science Syntax from DataCamp

for key, value in europe.items():

Use .items() to iterate over a dictionary in a for loop.

If you're dealing with a 2D NumPy array, it's more complicated. A 2D array is built up of multiple 1D arrays. To explicitly iterate over all separate elements of a multi-dimensional array, you'll need this syntax:

for x in np.nditer(my_array):

for a in np.nditer(np_baseball):
    print(a)

Iterating over a Pandas DataFrame is typically done with the iterrows() method. Used in a for loop, every observation is iterated over and on every iteration the row label and actual row contents are available. For example:

for lab, row in cars.iterrows():
    print(lab)
    print(row)

# Code for loop that adds COUNTRY column
for lab, row in cars.iterrows():
    cars.loc[lab, 'COUNTRY'] = row['country'].upper()

# Use .apply(str.upper) instead; it works on the whole column, so no loop is needed
cars["COUNTRY"] = cars['country'].apply(str.upper)
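The three iteration styles described above can be compared in one short script. It is a sketch with made-up contents for `europe`, `np_baseball`, and `cars`; the `COUNTRY_LOOP` column name is hypothetical and exists only so the loop result can be compared with the `.apply()` result.

```python
import numpy as np
import pandas as pd

# Dictionary: .items() yields (key, value) pairs.
europe = {"spain": "madrid", "france": "paris"}
pairs = [(key, value) for key, value in europe.items()]

# 2D NumPy array: np.nditer visits every element, row by row.
np_baseball = np.array([[180, 78], [215, 102]])
flat = [int(x) for x in np.nditer(np_baseball)]

# DataFrame: iterrows() yields (row label, row contents as a Series).
cars = pd.DataFrame({"country": ["United States", "Japan"]}, index=["US", "JPN"])
for lab, row in cars.iterrows():
    cars.loc[lab, "COUNTRY_LOOP"] = row["country"].upper()

# Same result without an explicit loop.
cars["COUNTRY"] = cars["country"].apply(str.upper)

print(pairs)
print(flat)
print(cars)
```

The `.apply()` version is usually preferred because it expresses the transformation once for the whole column instead of assigning cell by cell.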

Thread Shed Project String Method Codecademy

daily_sales  =  \ """Edith Mcbride   ;,;$1.21   ;,;   white ;,;  09/15/17   ,Herbert Tran   ;,;   $7.29;,;  white&blue;,;   09/15/17 ,Paul Clarke ;,;$12.52  ;,;   white&blue ;,; 09/15/17 ,Lucille Caldwell    ;,;   $5.13   ;,; white   ;,; 09/15/17, Eduardo George   ;,;$20.39;,; white&yellow  ;,;09/15/17   ,   Danny Mclaughlin;,;$30.82;,;    purple ;,;09/15/17 ,Stacy Vargas;,; $1.85   ;,;  purple&yellow ;,;09/15/17,   Shaun Brock;,;  $17.98;,;purple&yellow ;,; 09/15/17 ,  Erick Harper ;,;$17.41;,; blue ;,; 09/15/17,...

String methods python Codecademy

highlighted_poems = "Afterimages:Audre Lorde:1997,   The Shadow:William Carlos Williams:1915, Ecstasy:Gabriela Mistral:1925,    Georgia Dusk:Jean Toomer:1923,   Parting Before Daybreak:An Qi:2014, The Untold Want:Walt Whitman:1871, Mr. Grumpledump's Song:Shel Silverstein:2004,  Angel Sound Mexico City:Carmen Boullosa:2013, In Love:Kamala Suraiyya:1965, Dream Variations:Langston Hughes:1994, Dreamwood:Adrienne Rich:1987"

# print(highlighted_poems)

highlighted_poems_list = highlighted_poems.split(',')

# print(highlighted_poems_list)

highlighted_poems_stripped = []
for poem in highlighted_poems_list:
    highlighted_poems_stripped.append(poem.strip())

# print(highlighted_poems_strip...
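The split-then-strip pattern in this excerpt can be carried one step further to separate each entry into its fields. The sketch below uses only the first two poems from the string above; the second split on `":"` and the names `raw`, `entries`, and `details` are assumptions added for illustration, since the excerpt is cut off before that point.

```python
# Shortened copy of the poem string: the first two entries only.
raw = "Afterimages:Audre Lorde:1997,   The Shadow:William Carlos Williams:1915"

# Split on commas, then strip the stray whitespace around each entry.
entries = [entry.strip() for entry in raw.split(",")]

# Split each cleaned entry into [title, poet, year].
details = [entry.split(":") for entry in entries]

for title, poet, year in details:
    print("The poem {} was published by {} in {}.".format(title, poet, year))
```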

SQL Date and time Function

The HH is the hour of the day, from 0 to 23.
The MM is the minute of the hour, from 0 to 59.
The SS is the seconds within a minute, from 0 to 59.

This returns the seconds, SS , of the timestamp column!

For strftime(__, timestamp) :

%Y returns the year (YYYY)
%m returns the month (01-12)
%d returns the day of the month (01-31)
%H returns the hour on a 24-hour clock (00-23)
%M returns the minute (00-59)
%S returns the seconds (00-59)

These apply if the sign_up_at format is YYYY-MM-DD HH:MM:SS .
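The format codes listed above can be checked quickly from Python. This sketch assumes the post is describing SQLite's strftime(format, timestamp) function and uses Python's built-in sqlite3 module; the `users` table and the sample timestamp are hypothetical, and only the `sign_up_at` column name comes from the excerpt.

```python
import sqlite3

# In-memory database with one hypothetical row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (sign_up_at TEXT)")
conn.execute("INSERT INTO users VALUES ('2020-08-15 09:05:42')")

# Pull each part of the timestamp out with its own format code.
row = conn.execute(
    """
    SELECT strftime('%Y', sign_up_at),
           strftime('%m', sign_up_at),
           strftime('%d', sign_up_at),
           strftime('%H', sign_up_at),
           strftime('%M', sign_up_at),
           strftime('%S', sign_up_at)
    FROM users
    """
).fetchone()
conn.close()

print(row)
```

Each value comes back as zero-padded text, so a numeric comparison would need a cast.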

PYTHON FUNCTION WITH RETURN

train_mass = 22680
train_acceleration = 10
train_distance = 100
bomb_mass = 1

def f_to_c(f_temp):
    c_temp = (f_temp - 32) * 5 / 9
    return c_temp

f100_in_celsius = f_to_c(100)

def c_to_f(c_temp):
    f_temp = c_temp * (9 / 5) + 32
    return f_temp

c0_in_fahrenheit = c_to_f(0)
print(c0_in_fahrenheit)

def get_force(mass, acceleration):
    return mass * acceleration

train_force = get_force(train_mass, train_acceleration)
#print("The GE train supplies " + str(train_force) + " Newtons of force.")

def get_energy(mass, c=3 * 10 ** 8):
    # Energy is mass times the speed of light squared (E = m * c^2)
    return mass * c ** 2

bomb_energy = get_energy(bomb_mass)...