Why do we need Machine Learning? Machine learning supports the development of the current era of digitalization. It assists the decision-making process, provides information through highly interactive visualizations, and can predict events, with competing models compared by their accuracy. Machine learning can also automate jobs and perform model analysis on big data with good accuracy. Figure 1 illustrates the Machine Learning Workflow.
In 1959, Arthur Samuel defined machine learning as follows: "Field of study that gives computers the ability to learn without being explicitly programmed". This means that a machine learning system acquires knowledge from data by itself, instead of following rules that a programmer has written explicitly for the task.
Figure. 1. Machine Learning Workflow
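To make Samuel's idea concrete, here is a minimal sketch in which the decision rule is learned from labeled examples rather than hand-coded. The data and the function names `learn_threshold` and `predict` are hypothetical, chosen only for illustration: the program tries each observed value as a threshold and keeps the one with the highest accuracy.

```python
def learn_threshold(values, labels):
    """Return (t, accuracy) for the threshold t that best fits the rule: predict 1 if value > t."""
    best_t, best_acc = None, -1.0
    # Try every observed value as a candidate threshold.
    for t in sorted(set(values)):
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:  # keep the first threshold reaching the best accuracy
            best_t, best_acc = t, acc
    return best_t, best_acc


def predict(value, threshold):
    """Apply the learned rule to a new value."""
    return 1 if value > threshold else 0


# Hypothetical labeled examples: nobody wrote "value > 48" by hand.
values = [12, 25, 31, 48, 52, 67, 73, 90]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, accuracy = learn_threshold(values, labels)
print(threshold, accuracy)  # 48 1.0
print(predict(60, threshold))  # 1
```

Real learning algorithms search far richer families of rules, but the principle is the same: the rule comes from the data.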
Another supervised learning method is Classification. Classification analyzes the relationship between several predictor variables and one categorical response variable (label). It solves the problem of identifying the category to which a new data point belongs. It is used extensively in spam identification, face recognition, recommendation engines, and so on. A classification algorithm learns, from the labeled training data, the criteria that separate the data into the given number of classes.
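As an illustration, the sketch below implements one simple classification algorithm, k-nearest neighbors (k-NN); it is chosen only as an example, not as the method this chapter prescribes. The toy points, the "ham"/"spam" labels and the function names are hypothetical. A new point receives the majority class among its k closest labeled training points.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training points."""
    # Sort the labeled training points by distance to the query and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two predictor variables per point, one class label each.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 6.0), (6.5, 7.5)]
train_y = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(train_X, train_y, (1.8, 1.2)))  # ham
print(knn_predict(train_X, train_y, (6.2, 6.8)))  # spam
```

An odd k avoids ties between two classes; with more classes, ties are possible and `most_common` then returns whichever tied label was counted first.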
Figure 2 shows a graphical summary of Supervised Learning.
Figure. 2. Supervised Learning
The next type is Unsupervised Learning. In contrast to supervised learning, this method does not require a label/outcome (y). The most commonly used unsupervised learning method in machine learning is Cluster analysis. Clustering groups objects that share similar characteristics; it is also useful for reducing problem size and complexity in data mining methods (dimension reduction). Figure 3 illustrates Unsupervised Learning using Cluster Analysis.
Figure. 3. Unsupervised Learning
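To show clustering without labels, here is a minimal sketch of k-means (Lloyd's algorithm), one common clustering method; the chapter does not name a specific algorithm, so this choice, the toy data, the function names and the parameter defaults are all assumptions. Note that only the points are passed in, never a label y.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Group unlabeled points into k clusters using Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points

    def nearest(p):
        # Index of the centroid currently closest to point p.
        return min(range(k), key=lambda i: euclidean(p, centroids[i]))

    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids, and therefore assignments, stopped changing
        centroids = new_centroids

    labels = [nearest(p) for p in points]
    return centroids, labels


# Hypothetical unlabeled data with two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # the first three points share one cluster, the last three the other
```

The cluster numbers themselves (0 or 1) depend on the random start; only the grouping is meaningful. The number of clusters k must be chosen by the analyst.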
The purpose of cluster analysis is to group similar objects together. Objects separated by a small distance are considered similar and placed in the same group, whereas objects separated by a greater distance are placed in different groups (Gower et al., 2010).
Several distance measures are commonly used to quantify how far apart two objects are, each computed over the p measurement variables that describe the objects.
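As an illustration, assuming the usual textbook choices, three distances often used in cluster analysis for two objects x and y measured on p variables are the Euclidean distance sqrt(sum over j = 1..p of (x_j - y_j)^2), the Manhattan distance sum over j = 1..p of |x_j - y_j|, and the Minkowski distance of order r, (sum over j = 1..p of |x_j - y_j|^r)^(1/r). The order is written r here (an assumed symbol) so it does not clash with p. The sketch below computes them; the function names and example objects are hypothetical.

```python
import math


def _check(x, y):
    # zip() would silently truncate, so insist both objects have the same p variables.
    if len(x) != len(y):
        raise ValueError("both objects must be measured on the same p variables")


def euclidean(x, y):
    """d(x, y) = sqrt(sum_{j=1..p} (x_j - y_j)^2)"""
    _check(x, y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """d(x, y) = sum_{j=1..p} |x_j - y_j|"""
    _check(x, y)
    return sum(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, r):
    """d(x, y) = (sum_{j=1..p} |x_j - y_j|^r)^(1/r); r = 1 gives Manhattan, r = 2 gives Euclidean."""
    _check(x, y)
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1 / r)


# Two hypothetical objects measured on p = 3 variables.
x = (1.0, 2.0, 3.0)
y = (4.0, 6.0, 3.0)
print(euclidean(x, y))     # 5.0
print(manhattan(x, y))     # 7.0
print(minkowski(x, y, 2))  # 5.0
```

Because the variables are summed directly, a variable measured on a large scale dominates the distance; standardizing the variables first is a common precaution.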
We hope the explanation above adds to your insight into the criteria for choosing machine learning methods. See you in the next chapter.
Jamilatuzzahro, Caraka R, Riki H (2018) Aplikasi Generalized Linear Model pada R. Innosain, Yogyakarta
Caraka RE, Lee Y, Chen RC, et al (2021) Cluster around Latent Variable for Vulnerability towards Natural Hazards, Non-Natural Hazards, Social Hazards in West Papua. IEEE Access 9:1972–1986. https://doi.org/10.1109/ACCESS.2020.3038883
Gower J, Lubbe S, le Roux N (2010) Biplot Basics. In: Understanding Biplots. pp 11–66