Weather Prediction — 1. Deploying a Machine Learning Model Locally using Flask

VARSHITHA GUDIMALLA
May 1, 2021
image by PhotosTheArt from Unsplash

Importing the required modules

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt
import joblib
from sklearn.preprocessing import LabelEncoder
import seaborn as sns
import pandas as pd
import numpy as np

Loading the dataset

df = pd.read_csv("datasets/weatherHistory.csv")
df.head(3)

Preprocessing the data

Checking the data type of each column

df.dtypes

Checking for the null values in the data frame

df.isnull().sum()

As we can see, the “Precip Type” column contains 517 null values, so we simply drop that column. We also drop a few other columns because they are irrelevant to the model we want to develop.

df = df.drop(['Precip Type', 'Formatted Date', 'Apparent Temperature (C)', 'Daily Summary'], axis=1)
df.head()
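To make the null check and the drop concrete, here is a minimal sketch on a tiny made-up frame. The `toy` data below is an assumption for illustration only; it is not the real weather dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the weather data; the values are made up for illustration.
toy = pd.DataFrame({
    "Precip Type": ["rain", np.nan, "snow", np.nan],
    "Humidity": [0.89, 0.86, 0.83, 0.95],
    "Daily Summary": ["Partly cloudy."] * 4,
})

# Count the nulls per column and show only the columns that have any.
null_counts = toy.isnull().sum()
print(null_counts[null_counts > 0])

# Drop the column with nulls plus one that is not useful as a feature.
toy = toy.drop(columns=["Precip Type", "Daily Summary"])
print(toy.columns.tolist())
```

Dropping is the simplest option. If the column mattered for the model, filling the missing values would be an alternative worth considering.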

Data Visualization

We plot a heatmap to check the correlation between the variables (the columns).

f,ax = plt.subplots(figsize=(8, 8))
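# "Summary" is still text at this point, so correlate the numeric columns only
sns.heatmap(df.select_dtypes(include="number").corr(), annot=True, ax=ax)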
plt.show()
df['Temperature (C)'].hist()
df['Humidity'].hist()
df['Wind Speed (km/h)'].hist()
df['Loud Cover'].hist()

As the plot above shows, the value of “Loud Cover” never changes, so we drop this column.

df = df.drop(['Loud Cover'], axis = 1)
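A histogram is one way to spot a constant column. As a hedged alternative, the same check can be done in code by counting distinct values. The `toy` frame below is made up for illustration.

```python
import pandas as pd

# Toy frame with one constant column; the values are made up for illustration.
toy = pd.DataFrame({
    "Humidity": [0.89, 0.86, 0.83],
    "Loud Cover": [0.0, 0.0, 0.0],
})

# A column with a single distinct value carries no information for the model.
constant_cols = [col for col in toy.columns if toy[col].nunique() == 1]
print(constant_cols)

toy = toy.drop(columns=constant_cols)
```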

Now we plot a boxplot of the pressure

plt.boxplot(df['Pressure (millibars)'])

In the above graph, we can see that some pressure values are 0, which is invalid, so let’s replace them with the median value.

pressure_median = df['Pressure (millibars)'].median()

def pressure(x):
    # Replace an invalid zero reading with the median pressure
    if x == 0:
        return pressure_median
    else:
        return x

df["Pressure (millibars)"] = df["Pressure (millibars)"].apply(pressure)
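The same replacement can also be written without a helper function. The sketch below uses `Series.mask` on a made-up `toy` frame. Note one deliberate difference from the code above: here the median is computed from the non-zero readings only, which is an assumption on our part and not what the article's code does.

```python
import pandas as pd

# Toy pressure readings; the two zeros stand for invalid measurements.
toy = pd.DataFrame({"Pressure (millibars)": [1015.1, 0.0, 1016.4, 1014.9, 0.0]})
col = "Pressure (millibars)"

# Median of the valid (non-zero) readings only, so the zeros do not pull it down.
pressure_median = toy.loc[toy[col] != 0, col].median()

# mask() replaces values where the condition is True and keeps the rest.
toy[col] = toy[col].mask(toy[col] == 0, pressure_median)
print(toy)
```

On a large frame this vectorized form is usually faster than calling a Python function once per row.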

Now let’s look at the boxplot again

plt.boxplot(df['Pressure (millibars)'])
df.head()

The values in the data frame have many decimal places, so let’s round them.

df = df.round({"Temperature (C)" : 3, "Wind Speed (km/h)" : 0, "Pressure (millibars)" : 0, "Visibility (km)" : 0})
df.head()

The model only accepts numerical input, but the “Summary” column is not numeric, so let’s label encode it.

le = LabelEncoder()
df["Summary"] = le.fit_transform(df['Summary'])
df.head()
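To see what label encoding does, here is a small sketch on a hand-written list of summaries (made up for illustration). `LabelEncoder` sorts the distinct labels and assigns each one an integer, and `inverse_transform` maps the integers back.

```python
from sklearn.preprocessing import LabelEncoder

# A few example summaries; made up for illustration.
summaries = ["Partly Cloudy", "Clear", "Overcast", "Clear"]

le = LabelEncoder()
codes = le.fit_transform(summaries)

# classes_ holds the sorted labels; the position of a label is its code.
print(list(le.classes_))
print(list(codes))

# Map the integer codes back to the original text labels.
decoded = le.inverse_transform(codes)
print(list(decoded))
```

Keep in mind that these integers imply an order that the weather summaries do not really have. Tree-based models usually cope with this, but it is worth remembering when trying other model types.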

Splitting the data frame and training the model

X = df.drop(['Temperature (C)'], axis=1)
y = df['Temperature (C)']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=21)
X_test.head()
model = RandomForestRegressor()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_pred
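The predictions on their own do not tell us how good the model is. A minimal sketch of one way to score them follows, using mean absolute error and R² from `sklearn.metrics`. The synthetic data and the `n_estimators=50` setting are assumptions chosen to keep the example self-contained; they are not the weather data or the settings used above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the weather features (assumption).
rng = np.random.default_rng(21)
X = rng.uniform(0, 1, size=(500, 3))
y = 30 * X[:, 0] - 10 * X[:, 1] + rng.normal(0, 0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=21)

model = RandomForestRegressor(n_estimators=50, random_state=21)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Compare the predictions with the held-out targets.
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.3f}  R^2: {r2:.3f}")
```

With the real data, the same two calls can be applied directly to the `y_test` and `y_pred` from the code above.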

Saving the model

joblib.dump(model, 'model.pkl')
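Later, the saved file can be read back with `joblib.load`. The sketch below checks that a loaded model gives the same predictions as the original. It trains a small model on made-up data and writes to a temporary directory, both of which are assumptions made to keep the example self-contained.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small made-up dataset, only to have a fitted model to save.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 2))
y = X[:, 0] + X[:, 1]
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Save to a temporary directory and load the model back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    joblib.dump(model, path)
    loaded = joblib.load(path)

# The loaded model should predict exactly like the original one.
same = np.allclose(model.predict(X[:5]), loaded.predict(X[:5]))
print(same)
```

Only load pickle files from sources you trust, and ideally load them with the same scikit-learn version that was used to save them.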
