Weather Prediction — 1. Deploying a Machine Learning Model Locally using Flask
Importing the required modules
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt
import joblib
from sklearn.preprocessing import LabelEncoder
import seaborn as sns
import pandas as pd
import numpy as np
Loading the dataset
df = pd.read_csv("datasets/weatherHistory.csv")
df.head(3)
Preprocessing the data
Checking the data type of each column
df.dtypes
Checking for the null values in the data frame
df.isnull().sum()
As we can see, the "Precip Type" column contains 517 null rows, so we simply drop that column. We also drop some of the other columns because they are irrelevant to the model we want to develop.
df = df.drop(['Precip Type', 'Formatted Date', 'Apparent Temperature (C)', 'Daily Summary'], axis=1)
df.head()
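To make the null check and the column drop concrete, here is a minimal sketch on a tiny made-up frame. The values are invented for illustration and are not taken from weatherHistory.csv; only the column names mirror the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the weather data; the values are made up.
toy = pd.DataFrame({
    "Precip Type": ["rain", np.nan, "snow", np.nan],
    "Humidity": [0.89, 0.86, 0.83, 0.80],
    "Daily Summary": ["Cloudy", "Cloudy", "Clear", "Clear"],
})

# Count the missing values in each column.
null_counts = toy.isnull().sum()
print(null_counts)

# Drop the columns we do not want to pass to the model.
trimmed = toy.drop(["Precip Type", "Daily Summary"], axis=1)
print(trimmed.columns.tolist())
```

Dropping the whole column is the simplest option. Dropping only the affected rows with dropna() would be an alternative if the column were needed later.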
Data Visualization
We plot a heatmap of the correlation matrix to check how the columns relate to each other.
f,ax = plt.subplots(figsize=(8, 8))
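# numeric_only=True skips text columns such as "Summary", which recent pandas versions cannot correlate
sns.heatmap(df.corr(numeric_only=True), annot=True, ax=ax)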
plt.show()
df['Temperature (C)'].hist()
df['Humidity'].hist()
df['Wind Speed (km/h)'].hist()
df['Loud Cover'].hist()
As the plot above shows, the value of "Loud Cover" never changes, so we drop this column.
df = df.drop(['Loud Cover'], axis = 1)
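The same check can be done without a plot. The sketch below, on made-up values, uses nunique() to find columns that hold a single distinct value and drops them. The variable name constant_cols is ours and does not appear in the original walkthrough.

```python
import pandas as pd

# Toy frame with one constant column; the values are made up.
toy = pd.DataFrame({
    "Humidity": [0.89, 0.86, 0.83],
    "Loud Cover": [0.0, 0.0, 0.0],
})

# nunique() counts the distinct values in a column; 1 means it never changes.
constant_cols = [col for col in toy.columns if toy[col].nunique() == 1]
print(constant_cols)

# A constant column carries no information for the model, so drop it.
toy = toy.drop(constant_cols, axis=1)
print(toy.columns.tolist())
```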
Now we will draw a boxplot of the pressure.
plt.boxplot(df['Pressure (millibars)'])
In the graph above, we can see that some pressure values are 0, which is invalid, so let's replace them with the median value.
pressure_median = df['Pressure (millibars)'].median()
def pressure(x):
    # Replace an invalid zero reading with the median pressure
    if x == 0:
        return pressure_median
    else:
        return x
df["Pressure (millibars)"] = df["Pressure (millibars)"].apply(pressure)
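For comparison, the same replacement can be written without a helper function. This is a minimal sketch on made-up readings, using Series.mask as a swapped-in alternative to the apply call above. As in the code above, the median is computed before the zeros are removed.

```python
import pandas as pd

# Made-up pressure readings; 0.0 marks an invalid measurement.
pressure = pd.Series([1015.1, 0.0, 1016.4, 0.0, 1014.9], name="Pressure (millibars)")

# Median of all readings, zeros included, matching the approach above.
median = pressure.median()

# mask() replaces the entries where the condition is True.
cleaned = pressure.mask(pressure == 0, median)
print(cleaned.tolist())
```

On a large data frame the vectorized form usually runs faster than calling a Python function once per row.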
Now let's look at the boxplot again.
plt.boxplot(df['Pressure (millibars)'])
df.head()
The values in the data frame have many decimal places, so let's round them.
df = df.round({"Temperature (C)" : 3, "Wind Speed (km/h)" : 0, "Pressure (millibars)" : 0, "Visibility (km)" : 0})
df.head()
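A small sketch of what the dictionary passed to round() does, on made-up values: each listed column is rounded to its own number of decimals, and columns that are not listed stay untouched.

```python
import pandas as pd

# Made-up sample rows with the same column names as the weather data.
toy = pd.DataFrame({
    "Temperature (C)": [9.472222, 9.355556],
    "Wind Speed (km/h)": [14.1197, 14.2646],
    "Visibility (km)": [15.8263, 15.8263],
    "Humidity": [0.89, 0.86],
})

# Keys are column names, values are the number of decimals to keep.
rounded = toy.round({"Temperature (C)": 3, "Wind Speed (km/h)": 0, "Visibility (km)": 0})
print(rounded)
```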
The model only takes numerical values as input, but the "Summary" column is not numeric, so let's label encode it.
le = LabelEncoder()
df["Summary"] = le.fit_transform(df['Summary'])
df.head()
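To see what label encoding produces, here is a minimal sketch on a few example summary strings. The strings are illustrative and may not match the exact categories in the dataset. LabelEncoder sorts the distinct labels and assigns each one an integer code.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative summary values; the real column may hold other categories.
summaries = ["Partly Cloudy", "Clear", "Overcast", "Clear"]

le = LabelEncoder()
codes = le.fit_transform(summaries)

# classes_ holds the sorted labels; a label's position is its code.
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform turns the codes back into the original strings.
decoded = le.inverse_transform(codes)
print(list(decoded))
```

If the deployed app later receives the summary as text, it would need the same fitted encoder at prediction time, so it may be worth saving the encoder alongside the model.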
Splitting the DataFrame
X = df.drop(['Temperature (C)'], axis=1)
y = df['Temperature (C)']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=21)
X_test.head()
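Because no test_size is passed, train_test_split falls back to its default and holds out 25% of the rows for testing. The sketch below shows this on made-up arrays, with random_state=21 as in the code above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 20 made-up samples with 2 features each.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Default split: 75% of the rows for training, 25% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=21)
print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split reproducible, so the same rows land in the test set on every run.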
model = RandomForestRegressor()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_pred
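The walkthrough stops at printing the predictions. One possible way to judge them, shown here as a sketch on synthetic data rather than on the weather dataset, is to compare y_pred with y_test using the mean absolute error and the R² score from sklearn.metrics. The choice of these two metrics is our assumption, not part of the original.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the weather features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 20 * X[:, 0] - 5 * X[:, 1] + rng.normal(0, 0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=21)

# Fixed random_state so the forest is reproducible.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Lower MAE is better; an R² close to 1 means most of the variance is explained.
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.3f}, R2: {r2:.3f}")
```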
Saving the model
joblib.dump(model, 'model.pkl')
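The saved file can be read back with joblib.load, which is what a serving script would typically do at startup. This is a minimal round-trip sketch on a small synthetic model. It writes to a temporary directory, which is our choice so that the example leaves no files behind.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic model standing in for the trained weather model.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))
y = X[:, 0] + X[:, 1]
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    joblib.dump(model, path)       # save the fitted model to disk
    restored = joblib.load(path)   # load it back into memory

# The restored model should give the same predictions as the original.
same = bool(np.allclose(model.predict(X), restored.predict(X)))
print(same)
```

A pickle file should only be loaded from a trusted source, and ideally with the same scikit-learn version that created it.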