Distance Measures for Machine Learning
What is a distance measure?
- A distance measure is simply a way of calculating how far apart two points or objects are: an objective score that summarizes the relative difference between two objects in a domain space.
Why is it important to know what a distance measure is?
- Distance measures have many applications in machine learning, for both supervised learning and unsupervised learning.
Two such instances are:
- K Nearest Neighbors (supervised)
- K-Means Clustering (unsupervised)
In the KNN algorithm, a prediction is made for a new point by calculating the distance between the new example (in this case, a row) and all other examples (rows) in the dataset, then using the k closest examples to make the prediction.
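The distance step of KNN can be sketched in a few lines. This is a minimal illustration, not a full classifier: the helper name `k_nearest`, the toy `rows`, and the choice of Euclidean distance via the standard library's `math.dist` are all assumptions made for the example.

```python
import math

def k_nearest(rows, new_row, k=3):
    """Return the k (distance, row_index) pairs closest to new_row."""
    # Distance from the new example to every existing row in the dataset.
    scored = sorted((math.dist(row, new_row), i) for i, row in enumerate(rows))
    return scored[:k]

# Hypothetical toy dataset: each row is a point with two features.
rows = [(0, 0), (3, 4), (1, 1), (6, 8)]
neighbors = k_nearest(rows, new_row=(2, 2), k=2)
print(neighbors)  # closest rows first
```

A real KNN model would then vote (classification) or average (regression) over the labels of the returned rows.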
1)Euclidean Metric
Euclidean distance is the straight-line distance between two points: the length of the hypotenuse of the right triangle whose legs are the horizontal and vertical differences between those points.
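The Euclidean distance between two elements $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ is given by
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$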
Euclidean Distance usage:
- L2 Normalization
- Ridge
- Root Mean Squared Error
The calculation for Euclidean distance is simple: measure the shortest distance between two points. However, this metric is not applicable in all scenarios. Imagine the instance below: you are trying to get from point 1 to point 2. Now, unless you are a bird or a genius with a hoverboard, Euclidean distance is not a practical calculation for the shortest distance to get from p1 to p2. Why?
Euclidean distance in this example measures A, the hypotenuse of the triangle between the two points p1 and p2: a straight line that ignores any streets or obstacles in between.
If we imagine the origin point p1 as (0,0) and the destination point p2 as (3,4), then we can calculate the Euclidean distance as follows:
||X|| = √((3-0)² + (4-0)²) = √25 = 5
Naturally, no one wants to calculate these formulas by hand so below is an example of the SciPy way of handling this:
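A minimal sketch of that calculation, assuming SciPy is installed and reusing the example points p1 = (0,0) and p2 = (3,4); the variable names are chosen for illustration only:

```python
from scipy.spatial import distance

p1 = (0, 0)  # origin point
p2 = (3, 4)  # destination point

# Straight-line (L2) distance between the two points.
euclidean_distance = distance.euclidean(p1, p2)
print(euclidean_distance)  # 5.0
```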
2)Manhattan Metric
The sum of the absolute differences between points across all dimensions.
For two points $x$ and $y$ with $n$ dimensions, it is given by the formula:
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
Manhattan Distance usage:
- L1 Normalization
- Lasso
- Mean Absolute Error
The Manhattan metric does not use diagonals, only horizontal and vertical measures.
Here the distance is calculated by adding B and C, the two legs of the triangle.
If we imagine the origin point p1 as (0,0) and the destination point p2 as (3,4), then we can calculate the Manhattan distance as follows:
||X|| = |3-0| + |4-0| = 7
As you can see in the examples above, the Manhattan distance is calculated in a way that replicates how a taxicab drives between Manhattan city blocks to reach its final destination!
Again, here is the SciPy example:
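A minimal sketch, assuming SciPy is installed; SciPy exposes the Manhattan distance under the name `cityblock`, and the points are the same illustrative p1 and p2 used above:

```python
from scipy.spatial import distance

p1 = (0, 0)  # origin point
p2 = (3, 4)  # destination point

# Sum of absolute differences along each axis (L1 distance).
manhattan_distance = distance.cityblock(p1, p2)
print(manhattan_distance)  # |3-0| + |4-0| = 7
```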
3)Minkowski Metric
The generalized form of Euclidean and Manhattan Distance.
We typically think about distance when it comes to measuring the distance between or within cities. The Minkowski distance measures distance in a normed vector space, that is, “in a space where distances can be represented as a vector that has length.”
Naturally, one wonders: is a map a vector space? Let’s use this definition and start by going back to our map example. The distance between the two points is the vector A, which connects p1 and p2. We could also combine multiple vectors to create a route that connects more than two points (imagine an Uber Eats ride that makes another stop between the pickup point and your house, no priority delivery here). The "normed" part simply means that every vector has a length, and a vector wouldn’t be a vector if it had no length or a negative length. So in this sense, we have met the requirements of the definition.
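For reference, the standard form of the Minkowski distance of order $p$ between two points $x$ and $y$ with $n$ dimensions (the symbols $x$, $y$, $n$ and $d$ are notation assumed here) is:

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

Setting $p$ to particular values recovers the familiar metrics: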
p = 1, Manhattan Distance
p = 2, Euclidean Distance
p = ∞, Chebyshev Distance
In using the formula for Minkowski distance, we simply insert p=1 for Manhattan or p=2 for Euclidean distance. Although the formula can be evaluated for any p > 0 (and is a true metric for p ≥ 1), values of p other than 1, 2, or infinity are rarely used.
The image shows how Minkowski metrics of different orders measure the distance between two objects with three variables, displayed in a coordinate system with x, y, and z axes.
Again, let us cheat a bit and employ the Minkowski method in SciPy:
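A minimal sketch, assuming SciPy is installed and reusing the illustrative points p1 and p2. The p = ∞ case is computed here with SciPy's dedicated `chebyshev` function rather than by passing infinity to `minkowski`:

```python
from scipy.spatial import distance

p1 = (0, 0)  # origin point
p2 = (3, 4)  # destination point

# p = 1 reproduces the Manhattan distance.
minkowski_p1 = distance.minkowski(p1, p2, p=1)
# p = 2 reproduces the Euclidean distance.
minkowski_p2 = distance.minkowski(p1, p2, p=2)
# p = infinity corresponds to the Chebyshev distance (largest axis difference).
chebyshev_distance = distance.chebyshev(p1, p2)

print(minkowski_p1, minkowski_p2, chebyshev_distance)
```

For these points the three results are 7, 5, and 4 respectively, matching the hand calculations for the Manhattan and Euclidean cases.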
Now that we have covered the Euclidean Distance, the Manhattan Metric, and the Minkowski Distance (including its Chebyshev case), let’s do a short recap:
- Euclidian Distance
- Manhattan Metric
- Minkowski Distance