Pandas in 10 Minutes: Part 1
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
Pandas is built on top of NumPy, and its plotting features rely on Matplotlib.
Installing Pandas
If you have Anaconda installed on your system, you can install pandas from your terminal or command prompt using:
conda install pandas
Otherwise, if pip is installed on your system, you can install pandas from your terminal or command prompt using:
pip install pandas
Importing pandas
import pandas as pd
Importing pandas under the alias "pd" lets us write "pd." instead of "pandas." every time we call something from the library.
Data Structures in pandas
Series
A Series is a 1-dimensional labeled data structure, very similar to an array.
Here is a plain Python list:
a = [3, 5, 2.71, -9.4, 8.432]
print(type(a))
a
Creating a Series
1. From a list
s = pd.Series(a)
print(type(s))
s
2. In a Series, the index labels can be any values, as long as there is one label for each data value.
idx = ['a', 'd', 'f', 'h', 'i']
idx
s1 = pd.Series(a, idx) #Series(data, index)
s1
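The length rule for the index can be checked with a small sketch. The variable names (values, labels, length_error) are made up for this illustration, and the exact error message may differ between pandas versions.

```python
import pandas as pd

values = [3, 5, 2.71, -9.4, 8.432]
labels = ['a', 'd', 'f', 'h', 'i']  # one label per value

s1 = pd.Series(values, index=labels)
print(s1)

# A label list of a different length is rejected with a ValueError
# instead of being padded or truncated.
try:
    pd.Series(values, index=labels + ['t'])
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(length_error)
```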
3. Using a dictionary
dic = {"a" : 6, "b": 7, "c": "Disha", "d" : 30}
dic
s2 = pd.Series(dic)
s2
Accessing the elements of Series
s[3]
s2["c"]
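Square brackets work here, but it can be unclear whether a number means a label or a position. A hedged sketch of the more explicit accessors: .loc looks up by index label and .iloc by integer position. The result names (by_label, by_position, fourth, subset) are chosen only for this example.

```python
import pandas as pd

s = pd.Series([3, 5, 2.71, -9.4, 8.432])
s2 = pd.Series({"a": 6, "b": 7, "c": "Disha", "d": 30})

by_label = s2.loc["c"]        # look up by index label
by_position = s2.iloc[0]      # look up by integer position (first element)
fourth = s.loc[3]             # with the default index, label 3 is the 4th element
subset = s2.loc[["a", "d"]]   # a list of labels returns a smaller Series

print(by_label, by_position, fourth)
print(subset)
```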
Arithmetic Operations on Series
s1 + s2
s2 + s2
s1 - s2
Pandas aligns the two Series by index label before operating. A label that appears in only one of them gives NaN in the result.
s - s
s * s
s / s
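Where the NaN values come from is easier to see with two small numeric Series. This is a minimal sketch: left and right are made-up examples, and the add method with fill_value is one possible way to treat a missing partner as 0.

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
right = pd.Series([10.0, 20.0], index=['b', 'c'])

# Values are matched by label, not by position.
# 'a' exists only in left, so its result is NaN.
total = left + right

# add() with fill_value substitutes 0 for the missing partner.
filled = left.add(right, fill_value=0)

print(total)
print(filled)
```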
Data Frame
A data frame is a 2-dimensional data structure in which the data is arranged in rows and columns, like a table.
It is the most commonly used data structure in pandas.
Creating a data frame
import numpy as np #NumPy, another widely used Python library
df = pd.DataFrame(np.random.randn(5, 2), columns = list('AB'))
df
sample = {'name' : ['Riya', 'Sandy', 'Tonny', 'Alex'],
'age' : [14, 24, 30, 38],
'country' : ['India', 'New Zealand', 'Russia', 'Bangladesh']}
#creating a dataframe using dictionary
df1 = pd.DataFrame(sample)
df1
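After building a data frame it is usually worth a quick look at its size and columns. A short sketch using the same sample data; shape, columns and head are standard pandas attributes and methods.

```python
import pandas as pd

sample = {'name': ['Riya', 'Sandy', 'Tonny', 'Alex'],
          'age': [14, 24, 30, 38],
          'country': ['India', 'New Zealand', 'Russia', 'Bangladesh']}
df1 = pd.DataFrame(sample)

n_rows, n_cols = df1.shape          # (number of rows, number of columns)
column_names = df1.columns.tolist() # dictionary keys become column names

print(n_rows, n_cols)
print(column_names)
print(df1.head(2))                  # first two rows
```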
Selecting columns from a data frame
df1['name']
df1[['name', 'age']]
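The brackets matter here: a single column name returns a Series, while a list of names (note the double brackets) returns a data frame. A minimal sketch with the same sample data; the names one_column and two_columns are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Riya', 'Sandy', 'Tonny', 'Alex'],
                    'age': [14, 24, 30, 38],
                    'country': ['India', 'New Zealand', 'Russia', 'Bangladesh']})

one_column = df1['name']             # a single label gives a Series
two_columns = df1[['name', 'age']]   # a list of labels gives a DataFrame

print(type(one_column))
print(type(two_columns))
```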
Adding columns
df1['year'] = [2006, 1996, 1990, 1982]
df1
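A new column does not have to come from a hand-written list. As a sketch, it can also be a single value repeated for every row, or be computed from an existing column. The column names active, is_adult and age_in_months are hypothetical and used only in this example.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Riya', 'Sandy', 'Tonny', 'Alex'],
                    'age': [14, 24, 30, 38]})

# Assigning a single value repeats it for every row.
df1['active'] = True

# A column can be computed from an existing column.
df1['is_adult'] = df1['age'] >= 18

# assign() returns a new data frame and leaves df1 unchanged.
df2 = df1.assign(age_in_months=df1['age'] * 12)

print(df1)
print(df2)
```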
Removing or dropping a column
df1 = df1.drop('year', axis = 1)
df1
To drop multiple columns, pass their names in a list (column1 and column2 are placeholders for real column names):
df1 = df1.drop(['column1', 'column2'], axis = 1)
Removing or dropping rows
df1 = df1.drop(3)
df1
axis = 1 refers to columns and axis = 0 refers to rows. The default value of axis is 0, which is why drop(3) removes the row with index label 3.
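If the axis numbers are hard to remember, drop also accepts the keyword arguments columns and index, which say directly what is being removed. A small sketch on the sample data; the result names are made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Riya', 'Sandy', 'Tonny', 'Alex'],
                    'age': [14, 24, 30, 38],
                    'country': ['India', 'New Zealand', 'Russia', 'Bangladesh']})

# Same effect as df1.drop(['age', 'country'], axis = 1)
without_columns = df1.drop(columns=['age', 'country'])

# Same effect as df1.drop(3), which uses axis = 0
without_row = df1.drop(index=3)

# drop returns a new data frame, so df1 itself is unchanged
# until the result is assigned back to it.
print(without_columns)
print(without_row)
print(df1.shape)
```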
Access elements
1. Accessing one column
df1['name']
2. Accessing multiple columns
df1[['name', 'age']]
3. Selecting rows of a data frame based on a condition
df1[df1['age'] > 18] # df1[condition]
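The condition inside the brackets is a Series of True and False values, one per row, and only the True rows are kept. As a hedged sketch, several conditions can be combined with & (and) or | (or), with each condition wrapped in parentheses. The second condition on country is just an example.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Riya', 'Sandy', 'Tonny', 'Alex'],
                    'age': [14, 24, 30, 38],
                    'country': ['India', 'New Zealand', 'Russia', 'Bangladesh']})

# One condition: rows where age is greater than 18
adults = df1[df1['age'] > 18]

# Two conditions combined with &, each in its own parentheses
mask = (df1['age'] > 18) & (df1['country'] != 'Russia')

# .loc can filter rows and pick columns in one step
selected = df1.loc[mask, ['name', 'age']]

print(adults)
print(selected)
```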