Machine Learning Terminologies Demystified.

Chibuzo Ugonabo
Published in The Startup
6 min read · Aug 6, 2020


To this day, my favorite definition of a machine is: something that makes work easier. At its simplest, a machine is an invention that does a job better, faster, and more powerfully than a human being. For machine learning, this is the why: there is a need to perform a task more efficiently and at a faster rate. What is the task? To make decisions. So what, then, is machine learning?

Before I answer that, a quick introduction. On my journey to becoming a data scientist, I found myself having to learn a lot of new terminology. Even certain terms that already existed in my vocabulary took on new meanings. Many of these terms can be wordy and somewhat intimidating. My aim in this write-up is to provide, as far as possible, layman’s definitions of the basic machine learning terms I have come across.

Data science, in essence, is the skill of using available information to gain insight and improve processes. It does this using a blend of machine learning algorithms, statistics, business intelligence, and programming. It aims to discover patterns in raw data, which in turn provide insight into the underlying processes.

Now back to the question, what is machine learning?

Machine learning is a field of technology that allows machines to learn from data and improve themselves. Machine learning algorithms use statistics and other mathematical tools to find patterns in data.

Machine Learning can be separated into three groups:

Supervised learning is a type of machine learning where the data is labeled to tell the machine exactly what patterns it should look for. Under the umbrella of supervised learning:

  • Classification: In classification tasks, the machine learning program must draw a conclusion from observed values and determine to what category new observations belong.
  • Regression: In regression tasks, the machine learning program must estimate and understand the relationships among variables. Regression analysis focuses on one dependent variable and a series of other changing (independent) variables.
  • Forecasting: Forecasting is the process of making predictions about the future based on past and present data.
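
To make the idea of labeled data concrete, here is a minimal sketch of a regression task in plain Python. The study hours, the exam scores, and the fit_line helper are all made up for illustration, and the least squares formula is just one common way to fit a line.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up labeled data: each number of study hours comes with its known score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted_score = slope * 6 + intercept  # estimate for an unseen input
print(slope, intercept, predicted_score)
```

Each (hours, score) pair is a labeled example: the score is the answer the machine is told to learn from, and the fitted line is what it uses to estimate the score for a new number of hours.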

In unsupervised learning, the data has no labels. The machine just looks for whatever patterns it can find. Under the umbrella of unsupervised learning:

  • Clustering: Clustering involves grouping sets of similar data (based on defined criteria), after which you can analyze each group and find patterns.
  • Dimension reduction: Dimension reduction reduces the number of variables being considered, keeping only those that carry the information required.
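
As a rough illustration of clustering, the sketch below groups unlabeled numbers around two centers, in the spirit of k-means. The ages, the starting centers, and the k_means_1d helper are assumptions made up for this example, and it only handles one-dimensional data.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group 1-D points around the given starting centroids."""
    groups = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        centroids = [
            sum(g) / len(g) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, groups


# Made-up, unlabeled data: nobody told the machine there are two age groups.
ages = [18, 19, 21, 22, 58, 60, 63, 65]
centroids, groups = k_means_1d(ages, centroids=[18.0, 65.0])
print(centroids)
print(groups)
```

Notice that no labels are supplied anywhere. The two groups emerge purely from how close the values are to each other.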

Reinforcement learning learns by trial and error to achieve a clear objective. The machine tries out lots of different things and is rewarded or penalized depending on whether its behaviors help or hinder it in reaching its objective.
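
Here is a toy sketch of that trial-and-error loop. The three actions, their rewards, and the learn_by_trial_and_error function are hypothetical, and the "mostly pick the best known action, sometimes try a random one" rule (often called epsilon-greedy) is just one simple strategy among many.

```python
import random

# Hypothetical environment: one action helps (+1), the others hinder (-1).
REWARDS = {"left": -1.0, "stay": -1.0, "right": 1.0}


def learn_by_trial_and_error(steps=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = list(REWARDS)
    value = {a: 0.0 for a in actions}  # running average reward per action
    count = {a: 0 for a in actions}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore: try something random
        else:
            action = max(actions, key=value.get)  # exploit: use the best so far
        reward = REWARDS[action]                  # reward or penalty
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value


values = learn_by_trial_and_error()
best_action = max(values, key=values.get)
print(best_action, values)
```

The machine is never told which action is correct. It only sees rewards and penalties, and its estimates of each action settle on the one that helps it reach the objective.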

Machine Learning Algorithm

An ‘algorithm’ is a series of steps to complete a task.

An algorithm in machine learning is a procedure that is run on data to create a machine learning “model.”

Machine learning algorithms perform “pattern recognition.” Algorithms “learn” from data, or are “fit” on a dataset.

A “Model” in machine learning is the output of a machine learning algorithm run on data.

A model represents what was learned by a machine learning algorithm.
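
The difference between an algorithm and a model is easier to see in code. In this minimal sketch (the fit_threshold and predict names and the numbers are made up for illustration), the algorithm is the procedure, and the model is the small piece of learned information it returns.

```python
def fit_threshold(values, labels):
    """The 'algorithm': a procedure that is run on labeled data."""
    zeros = [v for v, label in zip(values, labels) if label == 0]
    ones = [v for v, label in zip(values, labels) if label == 1]
    midpoint = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    # The 'model': what was learned, here a single number.
    return {"threshold": midpoint}


def predict(model, value):
    """Use the learned model on a new observation."""
    return 1 if value > model["threshold"] else 0


model = fit_threshold([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1])
print(model)
print(predict(model, 6), predict(model, 4))
```

Run the same algorithm on different data and you get a different model, even though the procedure itself has not changed.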

Popular Machine Learning Algorithms

  • Linear regression (Supervised Learning: Regression): Linear regression is the most basic type of regression. Simple linear regression allows us to understand the relationship between two continuous variables.
  • Logistic regression (Supervised Learning: Classification): Logistic regression estimates the probability of an event occurring based on the data previously provided. It is used to model a binary dependent variable, that is, one where only two values, 0 and 1, represent the outcomes.
  • Naive Bayes (Supervised Learning: Classification): The Naive Bayes classifier is based on Bayes’ theorem and treats every feature as independent of every other feature. It allows us to predict a class/category, based on a given set of features, using probability.
  • K-nearest neighbor algorithm (Supervised Learning): The k-nearest neighbor algorithm estimates how likely a data point is to be a member of one group or another. It essentially looks at the data points around a single data point to determine what group that point belongs to.
  • Decision trees (Supervised Learning: Classification/Regression): A decision tree is a flow-chart-like tree structure that uses a branching method to illustrate every possible outcome of a decision. Each node within the tree represents a test on a specific variable and each branch is the outcome of that test.
  • Random Forests (Supervised Learning: Classification/Regression): Random forest, like its name implies, consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the random forest spits out a class prediction, and the class with the most votes becomes our model’s prediction.
  • Support Vector Machines (Supervised Learning: Classification/Regression): Support Vector Machine algorithms are supervised learning models that analyze data used for classification and regression analysis. They essentially filter data into categories, which is achieved by providing a set of training examples, each example marked as belonging to one or the other of two categories. The algorithm then works to build a model that assigns new values to one category or the other.
  • K Means Clustering Algorithm (Unsupervised Learning: Clustering): The algorithm works by finding groups within the data, with the number of groups represented by the variable K. It then works iteratively to assign each data point to one of the K groups based on the features provided.
  • Artificial Neural Networks (used in Supervised, Unsupervised, and Reinforcement Learning): An artificial neural network (ANN) comprises ‘units’ arranged in a series of layers, each of which connects to layers on either side. ANNs are inspired by biological systems, such as the brain, and how they process information. ANNs are essentially a large number of interconnected processing elements, working in unison to solve specific problems.
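
To show how small one of these algorithms can be, here is a rough sketch of the k-nearest neighbor idea from the list above. The colored points and the knn_predict function are made up for illustration, and real projects would normally lean on a library rather than hand-rolled code like this.

```python
import math
from collections import Counter


def knn_predict(training_points, query, k=3):
    """Label a new point by a vote among its k closest labeled neighbors."""
    by_distance = sorted(
        training_points,
        key=lambda item: math.dist(item[0], query),  # straight-line distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up labeled points: (coordinates, group).
training_points = [
    ((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
    ((6, 6), "blue"), ((7, 6), "blue"), ((6, 7), "blue"),
]

print(knn_predict(training_points, (2, 2)))
print(knn_predict(training_points, (6, 5)))
```

The new point simply takes the label that most of its closest neighbors carry, which is exactly the "look at the data points around it" idea.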

Other useful terminologies when talking about machine learning include:

An ensemble learning method combines multiple algorithms to generate better results for classification, regression, and other tasks. Each individual classifier may be weak, but when combined with others, it can produce excellent results.
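
A simple way to picture an ensemble is a majority vote. In this sketch the three spam rules and the majority_vote helper are hypothetical, deliberately crude classifiers invented for the example. Voting is only one of several ways to combine models.

```python
from collections import Counter


# Three deliberately weak, made-up rules for spotting spam.
def has_link(message):
    return "spam" if "http" in message else "ham"


def shouts(message):
    return "spam" if message.isupper() else "ham"


def mentions_prize(message):
    return "spam" if "prize" in message.lower() else "ham"


def majority_vote(classifiers, message):
    """Combine the classifiers: the label with the most votes wins."""
    votes = Counter(classify(message) for classify in classifiers)
    return votes.most_common(1)[0][0]


rules = [has_link, shouts, mentions_prize]
print(majority_vote(rules, "WIN A PRIZE NOW"))
print(majority_vote(rules, "Lunch at noon? http://menu.example"))
```

The second message fools the link rule on its own, but the other two rules outvote it, which is the whole point of combining them.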

Artificial Intelligence (AI) refers to machines that can learn, reason, and act for themselves. They can make their own decisions when faced with new situations, in the same way that humans and animals can.

Data are characteristics or information that are collected through observation.

Data cleaning refers to the steps you need to take to prepare your data for use. Here you detect incomplete, incorrect, inaccurate, or irrelevant data in your dataset and then choose to replace, modify, or delete it as needed.
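
Below is a small sketch of what those cleaning steps can look like. The records, the 0 to 120 age range, and the choice to fill bad ages with the median are all assumptions made for this example. The right choices always depend on your own data.

```python
from statistics import median

# Made-up raw records with typical problems.
raw_records = [
    {"name": "Ada", "age": 36},
    {"name": "Ben", "age": None},  # incomplete: age is missing
    {"name": "Cy", "age": -5},     # incorrect: age cannot be negative
    {"name": "", "age": 41},       # irrelevant here: no name to attach it to
    {"name": "Dee", "age": 29},
]


def is_valid_age(age):
    return age is not None and 0 <= age <= 120


def clean(records):
    # Delete: drop rows that have no name at all.
    rows = [dict(r) for r in records if r["name"]]
    # Replace: fill missing or impossible ages with the median of the valid ones.
    fill_value = median(r["age"] for r in rows if is_valid_age(r["age"]))
    for row in rows:
        if not is_valid_age(row["age"]):
            row["age"] = fill_value
    return rows


cleaned = clean(raw_records)
print(cleaned)
```

The original records are left untouched, so you can always compare the cleaned version against what you started with.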

Exploratory data analysis (EDA) refers to the critical process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
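
A first pass at EDA can be as simple as a handful of summary statistics. In the sketch below, the order counts are made up, and flagging anything more than two standard deviations from the mean is only a rule of thumb I am assuming for the example, not a universal definition of an anomaly.

```python
from statistics import mean, median, stdev

# Made-up daily order counts; one of them looks suspicious.
daily_orders = [12, 14, 13, 15, 14, 13, 95]


def summarize(values):
    average = mean(values)
    spread = stdev(values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": average,
        "median": median(values),
        # Assumed rule of thumb: far from the mean means worth a closer look.
        "anomalies": [v for v in values if abs(v - average) > 2 * spread],
    }


summary = summarize(daily_orders)
print(summary)
```

The large gap between the mean and the median is already a hint that something unusual is pulling the average up, and the anomaly check points to the value responsible.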

Training data is the main and most important data, which helps machines learn and make predictions. This dataset is used by the machine learning engineer to develop the model, and it typically makes up the largest share of the total data in a project, often 70% or more.

Validation data is the second type of dataset, used to validate the machine learning model before final delivery of the project. Model validation is important to ensure the accuracy of the model’s predictions so that the right application is built. Using this data helps you know whether the model can correctly identify new examples.

Testing data is the final type of dataset. It helps to check the predictive performance of the finished machine learning model on examples it has never seen.
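
The three datasets usually come from splitting one collection of examples. Here is a minimal sketch, assuming a 70/15/15 split and a made-up split_dataset helper. The exact proportions vary from project to project.

```python
import random


def split_dataset(rows, train_fraction=0.70, validation_fraction=0.15, seed=42):
    """Shuffle the rows, then cut them into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_fraction)
    n_validation = round(len(shuffled) * validation_fraction)
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    testing = shuffled[n_train + n_validation:]  # whatever is left over
    return training, validation, testing


rows = list(range(100))  # stand-in for 100 real examples
training, validation, testing = split_dataset(rows)
print(len(training), len(validation), len(testing))
```

Every example lands in exactly one of the three sets, so the model is never tested on data it was trained on.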

The world of machine learning and data science is vast and ever-growing. It is easy to view it as an insurmountable endeavor. I’d like to encourage anyone wishing to take this path not to be intimidated. A lot of these terms only sound incomprehensible; once you discover their essence, everything becomes clear. Again, good things take time and great ones take even more time, so do not grow weary, and keep pushing forward.
