Pandas in 10 Minutes — part2
Reading input files
1. CSV file
pd.read_csv('file.csv')
2. Excel file
i. Reading a file containing a single sheet
pd.read_excel('file.xlsx')
ii. Reading a file containing multiple sheets
excel = pd.ExcelFile('file.xls')
df = pd.read_excel(excel, 'Sheet1')
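The reading calls above can be sketched as follows. This is only an illustration: the CSV text, the column names, and the helper load_all_sheets are made up for the example, and the Excel helper assumes an engine such as openpyxl is installed, so it is defined but not called here.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory so the sketch needs no file on disk.
csv_text = "name,score\nana,9.5\nben,7.0\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)


def load_all_sheets(path):
    """Return {sheet name: DataFrame} for every sheet in an Excel workbook.

    `path` is a hypothetical workbook path; reading .xlsx files needs an
    Excel engine such as openpyxl to be installed.
    """
    with pd.ExcelFile(path) as workbook:
        return {
            name: pd.read_excel(workbook, sheet_name=name)
            for name in workbook.sheet_names
        }
```

Opening the workbook once with pd.ExcelFile and reading each sheet from it avoids re-parsing the file for every sheet.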
Retrieving basic information about a Series/DataFrame
1. Shape
df.shape
(rows, columns)
2. Describe index
df1.index
3. Describe Data Frame columns
df1.columns
4. Information about the Data Frame
df1.info()
5. Summary statistics of the Data Frame
By default it summarizes only columns with a numeric data type; pass include='all' to cover the other columns as well
df1.describe()
6. Sum
df1.sum()
7. Minimum
df1.min()
8. Maximum
df1.max()
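A minimal sketch of the inspection calls above, run on a small made-up table. The column names and values are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data used only for illustration.
df1 = pd.DataFrame(
    {"item": ["pen", "book", "bag"], "price": [1.5, 12.0, 30.0], "qty": [10, 3, 1]}
)

print(df1.shape)    # (rows, columns)
print(df1.index)    # row labels
print(df1.columns)  # column labels
df1.info()          # dtypes, non-null counts, memory usage

# describe() summarizes only the numeric columns unless told otherwise.
print(df1.describe())
print(df1.describe(include="all"))

# Restrict the reductions to numeric columns so strings are not concatenated.
numeric = df1[["price", "qty"]]
print(numeric.sum())
print(numeric.min())
print(numeric.max())
```

Selecting the numeric columns before calling sum() is a design choice for this sketch: on a string column, sum() concatenates the values, which is rarely what is wanted.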
Most commonly used functions from the pandas library
1. Detecting missing values
DataFrame.isnull()
2. Fill null values
DataFrame.fillna([value, axis, …])
3. Remove missing values.
DataFrame.dropna()
4. Round a Data Frame to a variable number of decimal places.
DataFrame.round([decimals])
5. Printing the first 5 rows of a Data Frame
DataFrame.head()
6. Printing the last 5 rows of a Data Frame
DataFrame.tail()
7. Checking whether each element in the Data Frame is contained in the values
DataFrame.isin(values)
8. Applying a function along an axis of the Data Frame
DataFrame.apply(func[, axis, raw, …])
9. Group Data Frame using a mapper or by a Series of columns.
DataFrame.groupby(by)
10. Sorting
i. Sort by labels along an axis
df.sort_index()
ii. Sort by the values along an axis
df.sort_values(by=column)
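The functions listed above can be tried together on one small table. This is a sketch under assumptions: the team and score columns and their values are invented for the example, and the arguments shown are just one reasonable way to call each function.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame(
    {
        "team": ["a", "b", "a", "b"],
        "score": [1.234, np.nan, 3.456, 2.0],
    }
)

missing = df.isnull()                 # boolean mask, True where a value is missing
filled = df.fillna({"score": 0.0})    # replace missing scores with 0.0
dropped = df.dropna()                 # drop rows that contain any missing value
rounded = filled.round({"score": 1})  # round the score column to 1 decimal place

first_two = df.head(2)                # head/tail default to 5 rows; n overrides it
last_two = df.tail(2)

in_team_a = df["team"].isin(["a"])    # element-wise membership test

# apply runs a function on each column (axis=0) or each row (axis=1).
doubled = filled[["score"]].apply(lambda col: col * 2)

# groupby splits rows by key; an aggregation then combines each group.
totals = filled.groupby("team")["score"].sum()

by_label = df.sort_index(ascending=False)  # sort by row labels
by_value = filled.sort_values(by="score")  # sort by column values
```

None of these calls modify df in place; each returns a new object, which is why the results are bound to separate names.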