image by Mika Baumeister on unsplash

Reading Input Files

  1. CSV file

  2. Excel file

i. Reading a file containing a single sheet

ii. Reading a file containing multiple sheets

excel = pd.ExcelFile('file.xls')
df = pd.read_excel(excel, 'Sheet1')
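
As a minimal sketch of the CSV case listed above, the snippet below parses CSV text with pd.read_csv. The in-memory text and its column names are made up for illustration; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,score\nasha,81\nravi,92\n"

# read_csv accepts a path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```

For the Excel case, passing sheet_name=None to pd.read_excel returns a dictionary of DataFrames keyed by sheet name, which can be convenient when a workbook has several sheets.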

Retrieving basic information about a Series/DataFrame

  1. Shape
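
A small sketch of what the shape attribute reports, assuming a made-up DataFrame (the column names and values are hypothetical): a DataFrame gives a (rows, columns) tuple, while a Series gives a one-element tuple.

```python
import pandas as pd

# Hypothetical data used only to illustrate the attribute.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Kochi"],
    "temp_c": [31.5, 38.0, 29.2],
})
s = df["temp_c"]

print(df.shape)  # (number of rows, number of columns)
print(s.shape)   # (number of rows,)
```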

image by Ilona Froehlich on unsplash

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Pandas is built on top of the NumPy library and uses Matplotlib for its plotting methods.

Installing Pandas

If you have Anaconda installed on your system, you can install pandas from your terminal or command prompt using:

conda install pandas

Otherwise, if pip is installed on your system, you can install pandas from your terminal or command prompt using:

pip install pandas

Importing pandas

import pandas as pd

Instead of writing “pandas.” and using the method inside pandas…

image by PhotosTheArt from Unsplash

Importing the required modules

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt
import joblib
from sklearn.preprocessing import LabelEncoder
import seaborn as sns
import pandas as pd
import numpy as np

Loading the dataset

df = pd.read_csv("datasets/weatherHistory.csv")

photo by Isaac Smith on Unsplash

Linear regression is one of the most well-known and well-understood algorithms in statistics and machine learning. Before turning to linear regression, let’s understand what regression is.

What is Regression?

Regression falls under the supervised learning category. Its main goal is to build an efficient model that predicts a dependent variable from a set of independent (attribute) variables. Regression is used when the output variable is a real or continuous value, e.g. salary, score, or weight. It tries to draw the line that best fits the gathered data points.
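
To make the idea of a best-fit line concrete, here is a minimal sketch using NumPy's least-squares polynomial fit. The experience and salary figures are invented for illustration and are not taken from any dataset in this post.

```python
import numpy as np

# Hypothetical data: years of experience vs. salary (in thousands).
years = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = np.array([35.0, 40.0, 45.0, 50.0, 55.0])

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(years, salary, deg=1)

# Use the fitted line to predict the salary for an unseen input.
predicted = slope * 6.0 + intercept
print(slope, intercept, predicted)
```

The output variable here (salary) is continuous, which is what makes this a regression problem rather than a classification one.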

Common Types Of Regression

The following are common types of regression.

photo by Armands Brants on Unsplash

Importing the required modules

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
from sklearn.svm import SVC
import pandas as pd
import numpy as np

Loading dataset

df = pd.read_csv("datasets/winequality-red.csv")

Understanding the dataset
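
A first look at a dataset usually involves a handful of pandas calls. The sketch below runs them on a tiny made-up frame; the rows and column names are hypothetical stand-ins, not values from the wine-quality file.

```python
import pandas as pd

# Hypothetical rows standing in for the real data; names are illustrative.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2],
    "pH": [3.51, 3.20, 3.26, 3.16],
    "quality": [5, 5, 6, 7],
})

print(df.head())          # first few rows
print(df.shape)           # (rows, columns)
print(df.dtypes)          # data type of each column
print(df.isnull().sum())  # missing values per column
summary = df.describe()   # count, mean, std, min, quartiles, max
print(summary)
```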


photo by Edgar Soto on Unsplash

Importing Required Libraries.

from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor, ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn import ensemble
import sklearn.naive_bayes
import seaborn as sns
import pandas as pd
import sklearn

Loading the Dataset

data_path = 'datasets/diamonds.csv'
diamonds_org = pd.read_csv(data_path)

photo by Michael Browning on Unsplash

The Boston housing data was collected in 1978, and each of the 506 entries represents aggregated data about 14 features for homes from various suburbs in Boston, Massachusetts.

The Boston housing price dataset is available in the sklearn package (in scikit-learn versions before 1.2, which removed load_boston), so we simply import and use it.

import pandas as pd
from sklearn.datasets import load_boston

Loading the dataset

boston = load_boston()


{'data': array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02, 4.9800e+00], [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02, 9.1400e+00], [2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9283e+02, 4.0300e+00], ..., [6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02, 5.6400e+00], [1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9345e+02, 6.4800e+00], [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01…

photo by Owen Beard on unsplash

Simple Definition of Machine Learning:

Machine learning is an application of artificial intelligence (AI) that gives devices the ability to learn from experience and improve without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI.

What is Machine Learning?

Arthur Samuel coined the term Machine Learning in the year 1959. He was a…


Pursuing 3rd year of computer science engineering. Machine learning enthusiast.
