Edit Template

CL002 – Essential Functions for ML

A formal definition of functions. And when do we call a function linear, affine, or quadratic?

What are Functions?

A function is defined between two sets, \(A\) and \(B\). It is just a mapping with some unique properties. We write \(f: A \rightarrow B\) to mean that \(f\) is a function on the set \(\mathcal{D} \subseteq A\) into the set \(\mathcal{R} \subseteq B\). The set \(\mathcal{D}\) is the domain, the set \(B\) is the codomain, and the set \(\mathcal{R}\) is the range of the function.

  • Domain: The set \(\mathcal{D}\) contains all inputs \(\mathbf{x}\) for which the function is defined. In particular, we can have \(\mathcal{D}\) a proper subset of the set \(A\).

  • Codomain: The set \(B\) contains all potential outputs of the function, even if some elements of \(B\) are never actually attained by \(f\).

  • Range: The set \(\mathcal{R}\) contains the actual outputs attained by \(f\).

Thus, the notation \(f: \mathbb{R}^n \to \mathbb{R}\) means that \(f\) maps (some) \(n-\) vectors into (some) \(\mathbb{R}\); it does not mean that \(f(\mathbf{x})\) is defined for every \(\mathbf{x} \in \mathbb{R}^n\), and the \(f\) outputs all scalar in \(\mathbb{R}\).

Scalar-valued functions:

Let \(\mathcal{D} \subseteq \mathbb{R}^n\). A real valued scalar function \(f: \mathbb{R}^n \to \mathbb{R}\) on \(\mathcal{D}\) is a rule that assigns a real number

\[y = f(x_1, \dots, x_n), \, x_i \in \mathbb{R}, \, \text{ for } i = 1,\dots,n\]

for all element in \(\mathcal{D}\). Here \(\mathbf{x} = (x_1, \dots, x_n)\) are independent variables and \(y\) is a dependent variable.

Vector-valued functions:

A function \(f\) that carries every element of \(\mathcal{D}\) to some element in \(\mathbb{R}^m\), \(f: \mathbb{R}^n \to \mathbb{R}^m\), that is, the function takes a \(n-\) vector and outputs an \(m-\) vector.

  • The domain of \(f\) is \(\mathcal{D}\) which is a set of all points \(\mathbf{x}\) in \(\mathbb{R}^n\) for which \(f(\mathbf{x})\) is defined.

  • The range of \(f\) is \(\mathcal{R}\) which is a set of all points \(y\) in \(\mathbb{R}\) (in case of scalar-valued functions) or \(\mathbf{y}\) in \(\mathbb{R}^m\) (in case of vector-valued functions) such that \(\mathbf{y}=f(\mathbf{x})\) for all \(\mathbf{x} \in \mathcal{D}\).

Properties of Functions

Let \(f: \mathbb{R}^n \to \mathbb{R}\)

  • The graph of \(f\) is defined as

\[gr(f) \equiv \{(x_1, \dots, x_n,y) \,|\, \mathbf{x} \in \mathcal{D}, y=f(\mathbf{x})\}\]
  • The epi-graph of \(f\) is the set of all points including the graph and above the function.

    \[epi(f) \equiv \{(x_1, \dots, x_n,y) \,|\, \mathbf{x} \in \mathcal{D}, y \geq f(\mathbf{x})\}\]
  • The level-set (contour) of \(f\) at some level \(\alpha\) is the set of points in the domain of the function where the function takes on a constant value \(\alpha\)

    \[\mathcal{L}_f(\alpha) \equiv \{\mathbf{x} \in \mathcal{D} \,|\, f(\mathbf{x}) = \alpha\}\]

Example 1:

Consider the function \(f(x_1, x_2) = x_1^2 + x_2^2\). Cut the \(f(\mathbf{x})\) axis at \(\alpha=1\). And look for the points that achieve \(f(\mathbf{x}) = \alpha\), that is, \(x_1^2 + x_2^2=1\) to get the level set corresponding to value \(\alpha\). It is a subset of the domain.

contour 01
Figure 1. Function \(f(\mathbf{x}) = x_1^2 + x_2^2\) (left) and the contour curves corresponding to \(\alpha = 1,4,\dots\) (right).

\(\mathcal{L}_f(4)\) is the set of all points on the boundary of the circle of radius 2. As we increase \(\alpha\), we get concentric circles.

Example 2:

Consider the function \(f(x_1, x_2) = \cos x_1 \cdot \sin x_2\).

contour 02
Figure 2. Function \(f(x_1, x_2) = \cos x_1 \cdot \sin x_2\) (left) and the contour curves corresponding to various \(\alpha\) (right). Image Source: [KhanAcademy].
Caution
Peaks and valleys can easily look very similar on a contour plot, and can only be distinguished by reading the labels.

Example 3:

Consider a linear function \(f(x_1, x_2) = x_1 + 2x_2\)

contour 03
Figure 3. Function \(f(x_1, x_2) = x_1 + 2x_2\) (left) and the contour lines corresponding to various \(\alpha\) (right). Image Source: [KhanAcademy]

For functions of two variables, contour plots make it much easier to understand the "shape" of the function. From a contour plot

  • We can tell how steep a portion of the graph is by how close the contour curves are to one another. When they are far apart, it takes a lot of lateral distance to increase the function value, but when they are close, the function value increases quickly for small lateral increments.

  • The level sets associated with heights that approach a peak of the graph will look like smaller and smaller closed loops, each one encompassing the next. We can spot the maximum or minimum of a function using its contour plot.

Linear Functions

A function \(f: \mathbb{R}^n \to \mathbb{R}\) is said to be linear if it satifies

  • \(f(\mathbf{x}_1 + \mathbf{x}_2)= f(\mathbf{x}_1) + f(\mathbf{x}_2)\) for any \(\mathbf{x}_1, \mathbf{x}_2\) in the domain of the function.

  • For any \(\alpha \in \mathbb{R}\), \(f(\alpha \mathbf{x}) = \alpha f(\mathbf{x})\).

Example 01:

Consider the function \(f(\mathbf{x}) = w_1x_1 + w_2x_2 + \dots + w_n x_n = \mathbf{w}^\top \mathbf{x}\) where \(x_1, \dots, x_n\) are the elements of the vector \(\mathbf{x}\) and \(w_i \in \mathbb{R}\) for \(i=1,\dots,n\) are the elements of the vector \(\mathbf{w}\). Note here that \(\mathbf{w}\) is a constant and \(\mathbf{x}\) is a variable. Prove that \(f\) is a linear function in terms of \(\mathbf{x}\).

The domain of this function is \(\mathbb{R}^n\). Let \(\mathbf{x}_1 = (x_{11}, \dots, x_{1n}), \mathbf{x}_2 = (x_{21}, \dots, x_{2n}) \in \mathbb{R}^n\). Then, \(\mathbf{x}_1 + \mathbf{x}_2 = (x_{11} + x_{21}, \dots, x_{1n} + x_{2n}) \in \mathbb{R}^n\).

\[\begin{align*} f(\mathbf{x}_1 + \mathbf{x}_2) & = w_1 (x_{11} + x_{21}) + \dots + w_n (x_{1n} + x_{2n}) \\ & = w_1 x_{11} + \dots + w_n x_{1n} + w_2 x_{21} + \dots + w_n x_{2n} \\ & = f(\mathbf{x}_1) + f(\mathbf{x}_2) \\ \\ f(\alpha \mathbf{x}) &= w_1 \alpha x_1 + \dots + w_n \alpha x_n \\ & = \alpha (w_1 x_1 + \dots + w_n x_n) \\ & = \alpha f(\mathbf{x}) \end{align*}\]

This function satisfies both the conditions, hence it is a linear function in \(\mathbf{x}\).

  • When \(n=1\), then the function is \(f(x) = w_1x\), which is a straight line passing through the origin.

  • when \(n=2\), then the function is \(f(\mathbf{x}) = w_1x_1 + w_2x_2\), which is a plane passing through the origin.

    linear functions 01
    Figure 4. Linear functions \(f(x) = w_1x\) (left) and \(f(\mathbf{x}) = w_1x_1 + w_2x_2\) (right).
  • When \(n=3\), then the function is \(f(\mathbf{x}) = w_1x_1 + w_2x_2 + w_3x_3\). We can’t visualize this graph as it should be in 4D.

The definition of linear functions can be extended to vector-valued functions \(f: \mathbb{R}^n \to \mathbb{R}^m\), where each component \(f_i(\mathbf{x})\) is a scalar linear function. The function takes a vector \(\mathbf{x} \in \mathbb{R}^n\) and outputs a vector \(\mathbf{y} \in \mathbb{R}^m\).

\[\begin{align*} \mathbf{y} = f(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \\ \dots \\ f_m(\mathbf{x}) \\ \end{bmatrix} & = \begin{bmatrix} a_{11}x_1 + \dots + a_{1n}x_n \\ a_{21}x_1 + \dots + a_{2n}x_n \\ \dots \\ a_{m1}x_1 + \dots + a_{mn}x_n \\ \end{bmatrix} \\ \\ & = \begin{bmatrix} a_{11} & \dots & a_{1n} \\ a_{21} & \dots & a_{2n} \\ \vdots & \dots & \vdots \\ a_{m1} & \dots & a_{mn} \\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \dots \\ x_n \\ \end{bmatrix} \\ & = A\mathbf{x} \end{align*}\]

This function also satisfies both the conditions and thus it is a linear function in \(\mathbf{x}\).

Warning
A linear function has no intercept, it passes through the origin.

Affine Functions

We say a function \(g: \mathbb{R}^n \rightarrow \mathbb{R}^m\) is affine if there is a linear function \(f: \mathbb{R}^n \rightarrow \mathbb{R}^m\) and a vector \(\mathbf{b} \in \mathbb{R}^m\) such that \(g(\mathbf{x}) = f(\mathbf{x}) + \mathbf{b}\) for all \(\mathbf{x} \in \mathbb{R}^n\). An affine function is just a linear function plus an intercept.

For example

  • Scalar-valued function: \(g(\mathbf{x}) = w_1x_1 + w_2x_2 + \dots + w_n x_n + b = \mathbf{w}^\top \mathbf{x} + b\), where \(b \in \mathbb{R}\).

  • Vector-valued function: \(g(\mathbf{x}) = \mathbf{Ax} + \mathbf{b}\), where \(b \in \mathbb{R}^m\).

Quadratic Functions

A quadratic function is a polynomial function of degree 2. A scalar-valued function \(f: \mathbb{R}^n \rightarrow \mathbb{R}\) is said to be a quadratic function when it is of the form

\[f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^\top \mathbf{Q} \mathbf{x} + \mathbf{b}^\top \mathbf{x} + c\]

where

  • \(\mathbf{Q} \in \mathbb{R}^{n \times n}\) is a symmetric matrix.

  • \(\mathbf{b} \in \mathbb{R}^n\) is a vector of coefficients.

  • \(c \in \mathbb{R}\) is a scalar.

This form generalizes the univariate quadratic \(\frac{1}{2} ax^2 + bx + c\). This can be extended to vector-valued functions \(f: \mathbb{R}^n \to \mathbb{R}^m\), where each component \(f_i(\mathbf{x})\) is a scalar quadratic function.

\[\begin{align*} \mathbf{y} = f(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \\ \end{bmatrix} & = \begin{bmatrix} \frac{1}{2} \mathbf{x}^\top \mathbf{Q}_1 \mathbf{x} + \mathbf{b}_1^\top \mathbf{x} + c_1 \\ \frac{1}{2} \mathbf{x}^\top \mathbf{Q}_2 \mathbf{x} + \mathbf{b}_2^\top \mathbf{x} + c_2 \\ \vdots \\ \frac{1}{2} \mathbf{x}^\top \mathbf{Q}_m \mathbf{x} + \mathbf{b}_m^\top \mathbf{x} + c_m \\ \end{bmatrix} \end{align*}\]

where for \(i=1, \dots, m\)

  • \(\mathbf{Q}_i \in \mathbb{R}^{n \times n}\) are symmetric matrices.

  • \(\mathbf{b}_i \in \mathbb{R}^n\) are vector of coefficients.

  • \(c_i \in \mathbb{R}\) are scalars.

Note
The factor \(\frac{1}{2}\) in the quadratic form is a convention that simplifies gradient and Hessian expressions. It is not required for the definition of quadratic function, but a matter of convention for cleaner derivatives.

Share Article:

Leave a Reply

Your email address will not be published. Required fields are marked *

You May Also Like:

From Equations to Insights: The Science Behind data-driven AI and ML Models

Quick Links