Distance Measures for Machine Learning

Lera Tsayukova
5 min read · Jun 30, 2021


What is a distance measure?

  • A distance measure is a way of calculating how far apart two points or objects are: an objective score that summarizes the relative difference between two objects in a domain space.

Why is it important to know what a distance measure is?

  • Distance measures have many applications in machine learning, in both supervised and unsupervised learning.

Two such instances are:

  1. K-Nearest Neighbors (supervised)
  2. K-Means Clustering (unsupervised)

In the KNN algorithm, a prediction is made for a new point by calculating the distance between the new example (in this case, a row) and every other example (row) in the dataset, then looking at the k closest ones.
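
As a rough illustration of that step, the sketch below computes the distance from one new row to every row of a small dataset and takes a majority vote among the k closest. The data, labels, and variable names are all made up for this example, and SciPy's `cdist` is just one convenient way to get the distances.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical toy dataset: each row is one example with two features.
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [6.0, 8.0]])
y = np.array([0, 0, 1, 1])           # made-up class labels
new_point = np.array([[2.0, 3.0]])   # the new example to classify

# Distance from the new example to every row in the dataset.
distances = cdist(new_point, X, metric="euclidean")[0]

# Indices of the k closest rows, then a majority vote over their labels.
k = 3
nearest = np.argsort(distances)[:k]
prediction = np.bincount(y[nearest]).argmax()
print(distances, nearest, prediction)
```

Swapping `metric="euclidean"` for `metric="cityblock"` would rank the neighbors by Manhattan distance instead, which is why the choice of distance measure matters for KNN.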

  1. Euclidean Metric

Euclidean distance takes two points and measures the shortest straight-line distance between them: the hypotenuse of the right triangle formed by those points.

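The Euclidean distance between two elements $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ is given by

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$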

Euclidean Distance usage:

  • L2 Normalization
  • Ridge
  • Root Mean Squared Error

The calculation for Euclidean distance is simple: measure the shortest straight-line distance between two points. However, this metric is not applicable in all scenarios. Imagine the instance below: you are trying to get from point 1 to point 2. Unless you are a bird or a genius with a hoverboard, Euclidean distance is not a practical measure of the shortest route from p1 to p2. Why?

Portland Grid (image from Google Earth) using Euclidean Metric

Euclidean distance in this example measures A, the hypotenuse of the triangle between the two points p1 and p2, which cuts straight through the city blocks instead of following the streets.

If we imagine the origin point p1 as (0,0) and the destination point p2 as (3,4), then we can calculate the Euclidean distance as follows:

$$d(p_1, p_2) = \sqrt{(3 - 0)^2 + (4 - 0)^2} = \sqrt{25} = 5$$

Naturally, no one wants to calculate these formulas by hand, so below is an example of the SciPy way of handling this:

Calculating Euclidean Distance using SciPy
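
A minimal sketch of such a call, assuming SciPy's `scipy.spatial.distance.euclidean` function and reusing the points (0,0) and (3,4) from the worked example; the variable names are only illustrative.

```python
from scipy.spatial import distance

p1 = (0, 0)  # origin point from the example
p2 = (3, 4)  # destination point from the example

# Straight-line (L2) distance between the two points.
euclidean_distance = distance.euclidean(p1, p2)
print(euclidean_distance)  # 5.0
```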

  2. Manhattan Metric

Manhattan distance is the sum of the absolute differences between two points across all dimensions.

For two points $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$, it is given by the formula:

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

Manhattan Distance usage:

  • L1 Normalization
  • Lasso
  • Mean Absolute Error

Portland Grid (image from Google Earth) using Manhattan Metric

The Manhattan metric does not use diagonals, only horizontal and vertical moves.

Here the distance is the sum of the two legs, B and C.

If we imagine the origin point p1 as (0,0) and the destination point p2 as (3,4), then we can calculate the Manhattan distance as follows:

$$d(p_1, p_2) = |3 - 0| + |4 - 0| = 7$$

As you can see in the example above, the Manhattan distance is calculated in a way that replicates how a taxicab drives between Manhattan city blocks to reach its final destination!

Again, here is the SciPy example:

Calculating Manhattan Distance using SciPy
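
A minimal sketch of such a call, assuming SciPy's `scipy.spatial.distance.cityblock` function (SciPy's name for the Manhattan metric) and the same points (0,0) and (3,4); the variable names are only illustrative.

```python
from scipy.spatial import distance

p1 = (0, 0)  # origin point from the example
p2 = (3, 4)  # destination point from the example

# Sum of absolute coordinate differences (L1 distance): |3 - 0| + |4 - 0|.
manhattan_distance = distance.cityblock(p1, p2)
print(manhattan_distance)
```

For these two points the result equals 7, matching the hand calculation.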

  3. Minkowski Metric

Minkowski distance is the generalized form of Euclidean and Manhattan distance.

German mathematician Hermann Minkowski

We typically think about distance when it comes to measuring the distance between or within cities. The Minkowski distance measures distance in a normed vector space, that is, “in a space where distances can be represented as a vector that has length.”

Naturally, one wonders: is a map a vector space? Let’s use this definition and go back to our map example. The distance between the two points is the length of the vector A, which connects p1 and p2. We could also combine multiple vectors to create a route that connects more than two points (imagine an Uber Eats ride that makes another stop between the pickup point and your house, no priority delivery here). The “normed” part simply means that every vector has a length, and that length is never negative. So in this sense, we have met the requirements of the definition.

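The Minkowski distance of order p between two points $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ is

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$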

p = 1, Manhattan Distance

p = 2, Euclidean Distance

p = ∞, Chebyshev Distance

To use the formula for Minkowski distance, we simply insert p=1 for Manhattan or p=2 for Euclidean distance. Although it is a valid metric for any p ≥ 1, values of p other than 1, 2, or infinity are rarely used.
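
To make the role of p concrete, here is a small from-scratch sketch in plain Python. The function names `minkowski_distance` and `chebyshev_distance` are made up for this illustration, and the points (0,0) and (3,4) are reused from the earlier examples.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def chebyshev_distance(x, y):
    """Limit of the Minkowski distance as p goes to infinity."""
    return max(abs(a - b) for a, b in zip(x, y))


p1, p2 = (0, 0), (3, 4)

manhattan = minkowski_distance(p1, p2, 1)  # |3| + |4| = 7
euclidean = minkowski_distance(p1, p2, 2)  # sqrt(9 + 16) = 5
large_p = minkowski_distance(p1, p2, 50)   # already very close to 4
chebyshev = chebyshev_distance(p1, p2)     # max(3, 4) = 4

print(manhattan, euclidean, large_p, chebyshev)
```

As p grows, the largest coordinate difference dominates the sum, which is why the p = ∞ case reduces to taking the maximum.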

Minkowski Distance measurement

How the Minkowski metric of different orders measures the distance between two objects with three variables (shown in the image in a coordinate system with x, y, and z axes).

Again, let us cheat a bit and use the Minkowski function in SciPy:

Calculating Minkowski Distance using SciPy
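
A minimal sketch of such a call, assuming SciPy's `scipy.spatial.distance.minkowski` function with its `p` parameter, plus `scipy.spatial.distance.chebyshev` for the p = ∞ case; the points and variable names are only illustrative.

```python
from scipy.spatial import distance

p1 = (0, 0)  # origin point from the earlier examples
p2 = (3, 4)  # destination point from the earlier examples

# p=1 reproduces the Manhattan distance, p=2 the Euclidean distance.
manhattan = distance.minkowski(p1, p2, p=1)
euclidean = distance.minkowski(p1, p2, p=2)

# The p = infinity limit is exposed as a separate function.
chebyshev = distance.chebyshev(p1, p2)

print(manhattan, euclidean, chebyshev)
```

For these points the three results equal 7, 5, and 4, which matches the hand calculations for the Manhattan and Euclidean cases.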
