Pandas in 10 Minutes: Part 2

VARSHITHA GUDIMALLA
May 5, 2021
Image by Mika Baumeister on Unsplash

Reading input files

1. CSV file
import pandas as pd
df = pd.read_csv('file.csv')
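A minimal sketch of read_csv on a small CSV. The column names and rows are invented for illustration, and io.StringIO stands in for a file on disk, since read_csv accepts a file-like object as well as a path.

```python
import io

import pandas as pd

# Hypothetical CSV content; normally you would pass a path such as 'file.csv'
csv_text = """name,age,city
Asha,29,Hyderabad
Ravi,35,Chennai
Meena,41,Pune
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 3)
print(df.head())
```

Other commonly used read_csv parameters include sep, usecols and index_col.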

2. Excel file

i. Reading a file containing a single sheet

df = pd.read_excel('file.xlsx')

ii. Reading a file containing multiple sheets

# Open the workbook once, then read individual sheets by name
excel = pd.ExcelFile('file.xls')
df = pd.read_excel(excel, sheet_name='Sheet1')

Retrieving basic information about a Series/DataFrame

1. Shape
df.shape

Returns a tuple: (rows, columns)
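A small sketch showing that shape is an attribute (no parentheses) that unpacks into row and column counts. The DataFrame below is made up for illustration.

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({'product': ['pen', 'book', 'bag'],
                   'price': [10, 250, 800]})

rows, cols = df.shape      # attribute, not a method call
print(rows, cols)          # 3 2
print(df['price'].shape)   # a Series has a one-element shape: (3,)
```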

2. Index (row labels)

df.index

3. Column labels

df.columns
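A sketch of the index and column labels on a made-up DataFrame. A freshly built DataFrame gets a default RangeIndex; set_index swaps in one of the columns as the row labels.

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({'product': ['pen', 'book', 'bag'],
                   'price': [10, 250, 800]})

print(df.index)    # RangeIndex(start=0, stop=3, step=1)
print(df.columns)

# Use a column as the row labels instead of the default 0..n-1
df2 = df.set_index('product')
print(list(df2.index))    # ['pen', 'book', 'bag']
print(list(df2.columns))  # ['price']
```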

4. Concise summary of the DataFrame (column dtypes, non-null counts, memory usage)

df.info()
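info() prints its report rather than returning it. A minimal sketch on invented data, including the buf parameter for capturing the text as a string.

```python
import io

import pandas as pd

# Illustrative data with one missing value
df = pd.DataFrame({'product': ['pen', 'book', None],
                   'price': [10.0, 250.0, 800.0]})

df.info()  # prints to stdout and returns None

# Capture the same report in a string buffer
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
```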

5. Descriptive statistics of the DataFrame

By default it summarizes only the numeric columns; pass include='all' to describe every column.

df.describe()
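A sketch contrasting the default describe() with include='all', using made-up data. The default keeps only the numeric column. include='all' adds count/unique/top/freq rows for the text column.

```python
import pandas as pd

# Illustrative data: one text column, one numeric column
df = pd.DataFrame({'product': ['pen', 'book', 'bag', 'pen'],
                   'price': [10, 250, 800, 20]})

# Default: only numeric columns are summarized
numeric_summary = df.describe()
print(list(numeric_summary.columns))  # ['price']

# include='all' also summarizes the non-numeric column
full_summary = df.describe(include='all')
print(full_summary)
```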

6. Sum

df.sum()

7. Minimum

df.min()

8. Maximum

df.max()
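These reductions work column by column by default (axis=0) and row by row with axis=1. A sketch on invented numeric data. On frames with text columns, sum() concatenates the strings, and numeric_only=True restricts the reduction to numeric columns.

```python
import pandas as pd

# Illustrative quarterly figures
df = pd.DataFrame({'q1': [10, 20, 30],
                   'q2': [5, 15, 25]})

col_totals = df.sum()        # one value per column
row_totals = df.sum(axis=1)  # one value per row
col_min = df.min()
col_max = df.max()

print(row_totals.tolist())   # [15, 35, 55]
```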

Most commonly used DataFrame methods

1. Detecting missing values
DataFrame.isnull()
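isnull() (alias isna()) returns a boolean DataFrame of the same shape. Summing it counts missing values per column, because True counts as 1. A sketch on made-up data.

```python
import numpy as np
import pandas as pd

# Illustrative data: NaN and None are both treated as missing
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [None, 'x', 'y']})

mask = df.isnull()                   # True where a value is missing
missing_per_column = mask.sum()
print(missing_per_column.tolist())   # [1, 1]
```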

2. Filling missing values

DataFrame.fillna([value, axis, …])
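A sketch of fillna with a single scalar and with a per-column dict. The data and fill values are arbitrary. fillna returns a new DataFrame and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

# Illustrative data with three missing values
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, 5.0, np.nan]})

filled_zero = df.fillna(0)                       # same value everywhere
filled_by_col = df.fillna({'a': df['a'].mean(),  # column mean for 'a'
                           'b': -1})             # constant for 'b'

print(filled_zero.values.tolist())   # [[1.0, 0.0], [0.0, 5.0], [3.0, 0.0]]
print(filled_by_col['a'].tolist())   # [1.0, 2.0, 3.0]
```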

3. Removing missing values

DataFrame.dropna()
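By default dropna() removes any row containing at least one missing value. A sketch of the how and subset parameters on invented data. axis=1 drops columns instead of rows.

```python
import numpy as np
import pandas as pd

# Illustrative data: row 1 is partly missing, row 2 is entirely missing
df = pd.DataFrame({'a': [1.0, np.nan, np.nan],
                   'b': [4.0, 5.0, np.nan]})

any_missing = df.dropna()            # drop rows with any NaN
all_missing = df.dropna(how='all')   # drop rows only if every value is NaN
by_column = df.dropna(subset=['b'])  # consider only column 'b'
```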

4. Rounding a DataFrame to a variable number of decimal places

DataFrame.round([decimals])
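The "variable number of decimal places" comes from passing a dict that maps column names to precisions. A sketch with made-up values.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'price': [10.456, 3.14159],
                   'rate': [0.12345, 0.98765]})

one_place = df.round(1)                          # same precision for every column
per_column = df.round({'price': 2, 'rate': 3})   # different precision per column

print(per_column['price'].tolist())  # [10.46, 3.14]
```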

5. Returning the first 5 rows of a DataFrame

DataFrame.head()

6. Returning the last 5 rows of a DataFrame

DataFrame.tail()
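head() and tail() return a new DataFrame. They don't print anything themselves; a notebook only displays the result when it is the last expression in a cell. Both take an optional n. A sketch on a synthetic column.

```python
import pandas as pd

# Illustrative data: ten rows numbered 0..9
df = pd.DataFrame({'n': range(10)})

first_five = df.head()   # default n=5
last_three = df.tail(3)  # pass n to change the count

print(first_five['n'].tolist())  # [0, 1, 2, 3, 4]
print(last_three['n'].tolist())  # [7, 8, 9]
```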

7. Checking whether each element in the DataFrame is contained in values

DataFrame.isin(values)
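isin returns a boolean mask of the same shape. Passing a dict checks a different set of values in each column. It is most often used on a single column to filter rows. A sketch on invented data.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'city': ['Pune', 'Delhi', 'Goa'],
                   'visits': [3, 8, 1]})

# Dict form: allowed values per column
mask = df.isin({'city': ['Goa'], 'visits': [8]})
print(mask.values.tolist())  # [[False, False], [False, True], [True, False]]

# Common use: keep rows whose city is in a list
rows = df[df['city'].isin(['Pune', 'Goa'])]
```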

8. Applying a function along an axis of a DataFrame

DataFrame.apply(func[, axis, raw, …])
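With the default axis=0, the function receives each column as a Series. With axis=1, it receives each row. A sketch on made-up numbers.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [10, 20, 30]})

col_range = df.apply(lambda col: col.max() - col.min())      # per column
row_sum = df.apply(lambda row: row['a'] + row['b'], axis=1)  # per row

print(col_range.tolist())  # [2, 20]
print(row_sum.tolist())    # [11, 22, 33]
```

For simple arithmetic like the row sum, a vectorized expression such as df['a'] + df['b'] is usually faster than apply with axis=1.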

9. Grouping a DataFrame using a mapper or by a Series of columns

DataFrame.groupby(by)
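groupby is usually followed by an aggregation, which produces one row per group. A sketch with invented team scores. Group keys are sorted by default.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'team': ['A', 'B', 'A', 'B', 'A'],
                   'points': [10, 7, 3, 5, 8]})

totals = df.groupby('team')['points'].sum()                   # single aggregation
stats = df.groupby('team')['points'].agg(['mean', 'count'])  # several at once

print(totals.tolist())  # [21, 12]
```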

10. Sorting

i. Sort by labels along an axis

df.sort_index()

ii. Sort by the values along an axis

df.sort_values(by=column)
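A sketch of both sorting methods on made-up data with a shuffled index. Passing a list to by breaks ties in the first column using the second. sort_index(axis=1) sorts the column labels instead of the rows.

```python
import pandas as pd

# Illustrative data with a non-sorted index
df = pd.DataFrame({'name': ['Ravi', 'Asha', 'Meena'],
                   'score': [72, 91, 72]},
                  index=[2, 0, 1])

by_label = df.sort_index()                              # rows ordered 0, 1, 2
by_score = df.sort_values(by='score', ascending=False)  # highest score first
by_two = df.sort_values(by=['score', 'name'])           # score ties broken by name

print(by_two['name'].tolist())  # ['Meena', 'Ravi', 'Asha']
```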

VARSHITHA GUDIMALLA

Computer Science Engineering Graduate || Machine Learning enthusiast.