Slicing and Indexing in Python Pandas

# Make a list of cities to subset on

cities = ["Moscow", "Saint Petersburg"]


# Subset temperatures using square brackets

print(temperatures[temperatures["city"].isin(cities)])


# Set city as the index, then subset temperatures_ind using .loc[]

temperatures_ind = temperatures.set_index("city")

print(temperatures_ind.loc[cities])
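
To compare the two approaches side by side, here is a minimal sketch. The small temperatures DataFrame and its values are made up for illustration and only stand in for the real dataset; the column name avg_temp_c is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the temperatures dataset (values are invented)
temperatures = pd.DataFrame({
    "city": ["Moscow", "Saint Petersburg", "Lahore", "Moscow"],
    "country": ["Russia", "Russia", "Pakistan", "Russia"],
    "avg_temp_c": [-7.1, -5.2, 12.8, -3.4],
})

cities = ["Moscow", "Saint Petersburg"]

# Approach 1: Boolean mask on the "city" column
by_mask = temperatures[temperatures["city"].isin(cities)]

# Approach 2: make "city" the index, then look rows up by label
temperatures_ind = temperatures.set_index("city")
by_label = temperatures_ind.loc[cities]

print(by_mask)
print(by_label)
```

Both approaches should select the same rows; the difference is that the first keeps city as a regular column, while the second moves it into the index.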



# Index temperatures by country & city

temperatures_ind = temperatures.set_index(["country", "city"])


# List of tuples: Brazil, Rio De Janeiro & Pakistan, Lahore

rows_to_keep = [("Brazil", "Rio De Janeiro"), ("Pakistan", "Lahore")]


# Subset for rows to keep

print(temperatures_ind.loc[rows_to_keep])
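
A self-contained sketch of the same idea, assuming a tiny made-up DataFrame in place of the real temperatures data (the avg_temp_c column and its values are invented). Each tuple holds one label per index level, outer level first.

```python
import pandas as pd

# Hypothetical stand-in for the temperatures dataset (values are invented)
temperatures = pd.DataFrame({
    "country": ["Brazil", "Brazil", "Pakistan", "Pakistan"],
    "city": ["Rio De Janeiro", "Salvador", "Lahore", "Karachi"],
    "avg_temp_c": [25.5, 26.0, 24.1, 26.3],
})

# Two-level index: country is the outer level, city the inner level
temperatures_ind = temperatures.set_index(["country", "city"])

# One (country, city) tuple per row to keep
rows_to_keep = [("Brazil", "Rio De Janeiro"), ("Pakistan", "Lahore")]

subset = temperatures_ind.loc[rows_to_keep]
print(subset)
```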

# Sort temperatures_ind by index values

print(temperatures_ind.sort_index())


# Sort temperatures_ind by index values at the city level

print(temperatures_ind.sort_index(level="city"))


# Sort temperatures_ind by country then descending city

print(temperatures_ind.sort_index(level=["country", "city"], ascending=[True, False]))
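
The three sorts above can be tried on a small example. This is only a sketch: the DataFrame below is made up, and avg_temp_c is an assumed column name.

```python
import pandas as pd

# Hypothetical stand-in for the temperatures dataset (values are invented)
temperatures = pd.DataFrame({
    "country": ["Russia", "Brazil", "Russia", "Brazil"],
    "city": ["Moscow", "Salvador", "Saint Petersburg", "Rio De Janeiro"],
    "avg_temp_c": [-7.1, 26.0, -5.2, 25.5],
})
temperatures_ind = temperatures.set_index(["country", "city"])

# Sort by all index levels, outer to inner, ascending
by_all = temperatures_ind.sort_index()

# Sort by the city level
by_city = temperatures_ind.sort_index(level="city")

# Sort by country ascending, then city descending within each country
by_country_city_desc = temperatures_ind.sort_index(
    level=["country", "city"], ascending=[True, False]
)

print(by_all)
print(by_city)
print(by_country_city_desc)
```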



  • Slice rows with code like df.loc[("a", "b"):("c", "d")].
  • Slice columns with code like df.loc[:, "e":"f"].
  • Slice both ways with code like df.loc[("a", "b"):("c", "d"), "e":"f"].
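
The three slicing patterns above can be sketched on a small example. The data is invented, and the station column is a hypothetical extra column added only so that the column slice leaves something out. The index is sorted first, since slicing a MultiIndex by label ranges relies on a sorted index, and both endpoints of a .loc[] slice are included.

```python
import pandas as pd

# Hypothetical stand-in data (values and the "station" column are invented)
df = pd.DataFrame({
    "country": ["Russia", "Brazil", "Russia", "Pakistan"],
    "city": ["Moscow", "Rio De Janeiro", "Saint Petersburg", "Lahore"],
    "date": pd.to_datetime(["2010-01-01"] * 4),
    "avg_temp_c": [-7.1, 25.5, -5.2, 12.8],
    "station": ["A", "B", "C", "D"],
}).set_index(["country", "city"]).sort_index()

# Slice rows: from (Pakistan, Lahore) through (Russia, Moscow), inclusive
rows = df.loc[("Pakistan", "Lahore"):("Russia", "Moscow")]

# Slice columns: from "date" through "avg_temp_c", inclusive
cols = df.loc[:, "date":"avg_temp_c"]

# Slice both ways at once
both = df.loc[("Pakistan", "Lahore"):("Russia", "Moscow"), "date":"avg_temp_c"]

print(rows)
print(cols)
print(both)
```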

Slicing for Time Series Analysis (Date)

  • Subsetting via Boolean conditions takes the form df[(condition1) & (condition2)].
  • Dates in 2010-2011 are between 2010-01-01 and 2011-12-31.
  • Use .set_index() to set the index.
  • Subsetting via .loc[] takes the form df.loc["first":"last"].

# Use Boolean conditions to subset temperatures for rows in 2010 and 2011
temperatures_bool = temperatures[(temperatures["date"] >= "2010-01-01") & (temperatures["date"] <= "2011-12-31")]
print(temperatures_bool)

# Set date as an index and sort it (slicing by date range needs a sorted index)
temperatures_ind = temperatures.set_index("date").sort_index()

# Use .loc[] to subset temperatures_ind for rows in 2010 and 2011
print(temperatures_ind.loc["2010":"2011"])

# Use .loc[] to subset temperatures_ind for rows from Aug 2010 to Feb 2011
print(temperatures_ind.loc["2010-08":"2011-02"])
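
Here is a minimal sketch that puts the Boolean and .loc[] date subsets next to each other. It assumes the date column holds real datetime values (here built with pd.to_datetime) and uses made-up rows; avg_temp_c is an assumed column name.

```python
import pandas as pd

# Hypothetical stand-in for the temperatures dataset (values are invented)
temperatures = pd.DataFrame({
    "date": pd.to_datetime([
        "2009-12-01", "2010-03-01", "2010-08-01",
        "2011-02-01", "2011-12-01", "2012-01-01",
    ]),
    "avg_temp_c": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Boolean conditions: both endpoints spelled out as full dates
temperatures_bool = temperatures[
    (temperatures["date"] >= "2010-01-01") & (temperatures["date"] <= "2011-12-31")
]

# Date index, sorted so that label slices are well defined
temperatures_ind = temperatures.set_index("date").sort_index()

# Partial date strings: "2010":"2011" spans all of 2010 through all of 2011
years = temperatures_ind.loc["2010":"2011"]

# "2010-08":"2011-02" spans August 2010 through the end of February 2011
months = temperatures_ind.loc["2010-08":"2011-02"]

print(temperatures_bool)
print(years)
print(months)
```

With a datetime index, a partial string such as "2011" stands for the whole period, so the slice includes every row up to the end of that year.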
