Introduction to Machine Learning
Simple Definition of Machine Learning:
Machine Learning is an application of Artificial Intelligence (AI) that gives devices the ability to learn from their experience and improve themselves without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI.
What is Machine Learning?
Arthur Samuel coined the term Machine Learning in 1959. He was a pioneer in Artificial Intelligence and computer gaming, and defined Machine Learning as a “Field of study that gives computers the capability to learn without being explicitly programmed”.
Machine Learning is a subset of Artificial Intelligence. It is the study of making machines more human-like in their behavior and decision-making by giving them the ability to learn and develop their own programs. This is done with minimal human intervention, i.e., no explicit programming. The learning process is automated and improves based on the machines' experience throughout the process. Good-quality data is fed to the machines, and different algorithms are used to build ML models that are trained on this data. The choice of algorithm depends on the type of data at hand and the type of activity that needs to be automated.
Traditional programming vs Machine learning
In traditional programming, we feed input data and a program (a set of hand-written rules) into a machine to generate output. In machine learning, we feed the input data along with the desired output into the machine during the learning phase, and it works out a program for itself.
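To make the contrast concrete, here is a minimal sketch; the spam-length rule, the toy data, and the use of scikit-learn are illustrative assumptions, not something prescribed by this article. The same decision is first hand-coded as a rule, then learned from labeled examples.

```python
# Illustrative sketch (made-up data): a hand-written rule vs. a learned one.
from sklearn.tree import DecisionTreeClassifier

# Traditional programming: we write the rule ourselves.
def is_spam_rule(num_links: int) -> bool:
    return num_links > 3  # hand-coded threshold

# Machine learning: we give inputs and desired outputs, and the
# algorithm works out the decision rule from the data.
X = [[0], [1], [2], [4], [5], [7]]   # number of links in each email
y = [0, 0, 0, 1, 1, 1]               # 0 = not spam, 1 = spam (labels)

model = DecisionTreeClassifier().fit(X, y)
print(is_spam_rule(6), model.predict([[6]]))  # both flag the email as spam
```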
Types of Machine Learning
- Supervised Machine Learning
- Unsupervised Machine Learning
- Reinforcement Learning
1. Supervised Machine Learning
Here, the training data we feed to the algorithm includes the desired solutions, or labels.
A typical supervised learning task is classification. The spam filter is a good example of this: it is trained with many example emails along with their labels (spam or not spam), and the machine must learn how to classify new emails.
Another typical task is to predict a target numeric value, such as the price of a car, given a set of features. This sort of task is called regression. To train the system, you need to give it many examples of cars, including both their predictors and their labels (their prices).
Some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, since it can output a value that corresponds to the probability of belonging to a given class.
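A minimal regression sketch might look like the following; the car data and the use of scikit-learn's LinearRegression are illustrative assumptions. The model is trained on predictors and price labels, then asked to predict the price of an unseen car.

```python
# Regression sketch with made-up car data: features are
# (mileage in thousands of km, age in years), the label is the price.
from sklearn.linear_model import LinearRegression

X = [[30, 2], [60, 4], [90, 6], [120, 8]]   # predictors (features)
y = [18000, 15000, 12000, 9000]             # labels (prices)

reg = LinearRegression().fit(X, y)
print(reg.predict([[75, 5]]))   # predicted price for an unseen car
```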
Here are some of the most important supervised learning algorithms:
• k-Nearest Neighbors
• Linear Regression
• Logistic Regression
• Support Vector Machines (SVMs)
• Decision Trees and Random Forests
• Neural networks
2. Unsupervised Machine Learning
Here, as you might guess, the training data is unlabeled. The system tries to learn without a teacher.
Some unsupervised learning techniques:
1. Clustering:
For example, say we have a lot of data about customers. We may want to run a clustering algorithm to try to detect groups of similar customers. At no point do we tell the algorithm which group a customer belongs to: it finds those connections without our help. For example, it might notice that 25% of customers are female and generally shop in the evening, while 40% are young workers who visit on weekends, and so on.
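A minimal clustering sketch, assuming scikit-learn and made-up customer data, could look like this: the algorithm is given only the customer features and discovers the groups on its own.

```python
# Clustering sketch on made-up customer data: (age, average basket size).
# We never tell the algorithm which group a customer belongs to.
from sklearn.cluster import KMeans

customers = [[22, 15], [25, 18], [24, 20],    # younger, smaller baskets
             [48, 60], [52, 65], [50, 58]]    # older, larger baskets

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # cluster assignment found by the algorithm
print(kmeans.cluster_centers_)  # the two group "profiles" it discovered
```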
2. Anomaly detection and novelty detection:
For example, detecting unusual logins to an account to prevent fraud, automatically removing outliers from a dataset before feeding it to another learning algorithm, or catching manufacturing defects. The system is shown mostly normal data during training, so it learns to recognize normal instances; when it sees a new instance, it can tell whether it looks like a normal one or whether it is likely an anomaly.
A very similar task is novelty detection. The difference is that novelty detection algorithms expect to see only normal data during training, while anomaly detection algorithms are usually more tolerant: they can often perform well even with a small percentage of outliers in the training set.
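A minimal anomaly-detection sketch, assuming scikit-learn's IsolationForest and made-up sensor-like readings, could look like this: the detector is trained on mostly normal values and then asked to judge a new, unusual one.

```python
# Anomaly-detection sketch: train on mostly normal data, then score new points.
from sklearn.ensemble import IsolationForest

normal_data = [[10.0], [10.2], [9.9], [10.1], [9.8], [10.3]]  # mostly normal
detector = IsolationForest(contamination=0.1, random_state=0).fit(normal_data)

print(detector.predict([[10.0], [25.0]]))  # 1 = looks normal, -1 = likely anomaly
```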
3. Visualization and dimensionality reduction:
Visualization algorithms are also good examples of unsupervised learning algorithms: we feed them a lot of complex and unlabeled data, and the output will be a 2D or 3D representation of our data that can easily be plotted. These algorithms try to preserve as much structure as they can, so you can understand how the data is organized and perhaps identify unsuspected patterns.
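A minimal dimensionality-reduction sketch, assuming scikit-learn's PCA and made-up 4-dimensional points, could look like this: the data is projected down to 2D so it can be plotted.

```python
# Dimensionality-reduction sketch: project made-up 4-D points down to 2-D
# so they can be plotted, keeping as much variance (structure) as possible.
from sklearn.decomposition import PCA

X = [[2.5, 2.4, 0.5, 1.0],
     [0.5, 0.7, 2.2, 1.9],
     [2.2, 2.9, 0.6, 1.1],
     [0.3, 0.4, 2.5, 2.0]]

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                      # each row is now a 2-D point
print(X_2d, pca.explained_variance_ratio_)       # how much structure was kept
```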
4. Association rule learning:
Another common unsupervised task is association rule learning, in which the goal is to dig into large amounts of data and discover interesting relations between attributes. For example, suppose we own a supermarket. Running an association rule learning algorithm on our sales logs may reveal that people who purchase barbecue sauce and potato chips also tend to buy steak. Thus, we may want to place these items close to each other.
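As a simplified sketch of the core idea, the snippet below counts frequent item pairs in made-up sales logs with plain Python. This is only the frequent-itemset counting step that algorithms like Apriori build on, not the full algorithm.

```python
# Association-rule sketch (toy sales logs): count how often pairs of items
# are bought together; frequent pairs hint at rules such as
# {barbecue sauce, potato chips} -> steak.
from itertools import combinations
from collections import Counter

baskets = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "soda"},
    {"potato chips", "soda"},
    {"barbecue sauce", "steak"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs that appear in at least half of the baskets ("frequent itemsets").
min_support = len(baskets) / 2
print([pair for pair, count in pair_counts.items() if count >= min_support])
```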
Some of the most important unsupervised learning algorithms:
• Clustering
— K-Means
— DBSCAN
— Hierarchical Cluster Analysis (HCA)
• Anomaly detection and novelty detection
— One-class SVM
— Isolation Forest
• Visualization and dimensionality reduction
— Principal Component Analysis (PCA)
— Kernel PCA
— Locally-Linear Embedding (LLE)
— t-distributed Stochastic Neighbor Embedding (t-SNE)
• Association rule learning
— Apriori
— Eclat
3. Reinforcement Learning
Reinforcement learning is probably the closest to how we as humans learn. The learning system, called an agent, can observe the environment, select and perform actions, and get rewards in return. It must then learn by itself the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
Common algorithms include Q-Learning and temporal-difference (TD) learning.
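As an illustration, here is a minimal tabular Q-Learning sketch on a made-up "corridor" environment; the environment, reward, and hyperparameters are all illustrative assumptions, not taken from this article. The agent tries actions, receives rewards, and gradually learns a policy that walks toward the rewarding state.

```python
# Tabular Q-Learning sketch on a tiny made-up "corridor" environment:
# states 0..4, actions 0 (left) and 1 (right); reaching state 4 gives reward 1.
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]   # the policy is derived from Q
alpha, gamma, epsilon = 0.1, 0.9, 0.2              # learning rate, discount, exploration

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = Q[state].index(max(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-Learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print([q.index(max(q)) for q in Q])  # learned policy: best action in each state
```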