摘要:Laplace 用於 Laplace 分佈的概率統計與隨機取樣。
本文分享自華為雲社群《Laplace分佈運算元開發經驗分享》,作者:李長安。
詳細描述:
Laplace 用於 Laplace 分佈的概率統計與隨機取樣, 此任務的目標是在 Paddle 框架中,基於現有概率分佈方案進行擴充套件,新增 Laplace API,呼叫路徑為:paddle.distribution.Laplace 。類簽名及各個方法簽名,請通過調研 Paddle 及業界實現慣例進行設計。要求程式碼風格及設計思路與已有概率分佈保持一致。
實際上說了一大堆,就是一件事:實現Laplace分佈運算元,那麼首先我們需要知道什麼是 Laplace 分佈,在概率論和統計學中,拉普拉斯分佈是一種連續概率分佈。由於它可以看作是兩個不同位置的指數分佈背靠背拼在一起,所以它也叫雙指數分佈。與正態分佈對比,正態分佈是用相對於μ平均值的差的平方來表示,而拉普拉斯概率密度用相對於差的絕對值來表示。如下面的程式碼所示,Laplace 分佈的影象和正態分佈實際上是有點類似的,所以它的公式也與正態分佈的公式類似的。
%matplotlib inline import matplotlib.pyplot as plt import numpy as np def laplace_function(x, lambda_): return (1/(2*lambda_)) * np.e**(-1*(np.abs(x)/lambda_)) x = np.linspace(-5,5,10000) y1 = [laplace_function(x_,1) for x_ in x] y2 = [laplace_function(x_,2) for x_ in x] y3 = [laplace_function(x_,0.5) for x_ in x] plt.plot(x, y1, color='r', label="lambda:1") plt.plot(x, y2, color='g', label="lambda:2") plt.plot(x, y3, color='b', label="lambda:0.5") plt.title("Laplace distribution") plt.legend() plt.show()
設計檔案是我們API設計思路的體現,是整個開發工作中必要的部分。通過上述任務簡介,我們可以知道此API的開發主要為Laplace分佈的開發,需要包含一些相應的方法。首先我們需要弄清楚Laplace分佈的數學原理,這裡建議去維基百科檢視Laplace分佈的數學原理,弄明白數學原理。此外,我們可以參考Numpy、Scipy、Pytorch、Tensorflow的程式碼實現,進行設計檔案的撰寫。
首先,我們應該知道Laplace分佈的概率密度函數公式、累積分佈函數、逆累積分佈函數,並且根據公式開發出程式碼,公式如下所示:
參考Numpy、Scipy、Pytorch、Tensorflow的程式碼實現,我們這裡可以很容易的實現公式對應的程式碼,其實現方案如下3.1小節所示。
該 API 實現於 paddle.distribution.Laplace。
基於paddle.distribution API基礎類別進行開發。
class API 中的具體實現(部分方法已完成開發,故直接使用原始碼),該api有兩個引數:位置引數self.loc, 尺度引數self.scale。包含以下方法:
self.loc
(2 ** 0.5) * self.scale;
self.stddev.pow(2)
self.rsample(shape)
self.loc - self.scale * u.sign() * paddle.log1p(-u.abs())
其中 u = paddle.uniform(shape=shape, min=eps - 1, max=1); eps根據dtype決定;
self.log_prob(value).exp()
直接繼承父類別實現
-paddle.log(2 * self.scale) - paddle.abs(value - self.loc) / self.scale
1 + paddle.log(2 * self.scale)
0.5 - 0.5 * (value - self.loc).sign() * paddle.expm1(-(value - self.loc).abs() / self.scale)
self.loc - self.scale * (value - 0.5).sign() * paddle.log1p(-2 * (value - 0.5).abs())
(self.scale * paddle.exp(paddle.abs(self.loc - other.loc) / self.scale) + paddle.abs(self.loc - other.loc)) / other.scale + paddle.log(other.scale / self.scale) - 1
同時在paddle/distribution/kl.py 中註冊_kl_laplace_laplace函數,使用時可直接呼叫kl_divergence計算laplace分佈之間的kl散度。
在我們開發完對應的程式碼後,我們應該如何證明我們所開發出來的程式碼是正確的呢?這時候就需要單元測試的程式碼來證明我們的程式碼是正確的。那麼什麼是單元測試呢?單元測試的用例其實是一個「輸入資料」和「預計輸出」的集合。你需要跟你輸入資料,根據邏輯功能給出預計輸出,這裡所說的根據邏輯功能是指,通過需求檔案就能給出的預計輸出。而非我們通過已經實現的程式碼去推匯出的預計輸出。這也是最容易被忽視的一點。你要去做單元測試,然後還要通過程式碼去推斷出預計輸出,如果你的程式碼邏輯本來就實現錯了,給出的預計輸出也是錯的,那麼你的單元測試將沒有意義。實際上,這部分可以說是整個工作中最重要的部分也是比較難的部分,我們需要想出預計輸出,並且如何通過已經實現的程式碼去推匯出預計輸出,只有單元測試通過了,我們的開發任務才算基本完成了。
根據api類各個方法及特性傳參的不同,把單測分成三個部分:測試分佈的特性(無需額外引數)、測試分佈的概率密度函數(需要傳值)以及測試KL散度(需要傳入一個範例)。
1、測試Lapalce分佈的特性
2、測試Lapalce分佈的概率密度函數
3、測試Lapalce分佈之間的KL散度
程式碼的開發主要參考Pytorch,此處涉及到單元測試程式碼的開發,kl散度註冊等程式碼,需要仔細閱讀PaddlePaddle中其他分佈程式碼的實現形式。
import numbers import numpy as np import paddle from paddle.distribution import distribution from paddle.fluid import framework as framework class Laplace(distribution.Distribution): r""" Creates a Laplace distribution parameterized by :attr:`loc` and :attr:`scale`. Mathematical details The probability density function (pdf) is .. math:: pdf(x; \mu, \sigma) = \frac{1}{2 * \sigma} * e^{\frac {-|x - \mu|}{\sigma}} In the above equation: * :math:`loc = \mu`: is the location parameter. * :math:`scale = \sigma`: is the scale parameter. Args: loc (scalar|Tensor): The mean of the distribution. scale (scalar|Tensor): The scale of the distribution. name(str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`. Examples: .. code-block:: python import paddle m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0])) m.sample() # Laplace distributed with loc=0, scale=1 # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True, # [3.68546247]) """ def __init__(self, loc, scale): if not isinstance(loc, (numbers.Real, framework.Variable)): raise TypeError( f"Expected type of loc is Real|Variable, but got {type(loc)}") if not isinstance(scale, (numbers.Real, framework.Variable)): raise TypeError( f"Expected type of scale is Real|Variable, but got {type(scale)}" ) if isinstance(loc, numbers.Real): loc = paddle.full(shape=(), fill_value=loc) if isinstance(scale, numbers.Real): scale = paddle.full(shape=(), fill_value=scale) if (len(scale.shape) > 0 or len(loc.shape) > 0) and (loc.dtype == scale.dtype): self.loc, self.scale = paddle.broadcast_tensors([loc, scale]) else: self.loc, self.scale = loc, scale super(Laplace, self).__init__(self.loc.shape) @property def mean(self): """Mean of distribution. Returns: Tensor: The mean value. """ return self.loc @property def stddev(self): """Standard deviation. The stddev is .. math:: stddev = \sqrt{2} * \sigma In the above equation: * :math:`scale = \sigma`: is the scale parameter. Returns: Tensor: The std value. """ return (2**0.5) * self.scale @property def variance(self): """Variance of distribution. The variance is .. math:: variance = 2 * \sigma^2 In the above equation: * :math:`scale = \sigma`: is the scale parameter. Returns: Tensor: The variance value. """ return self.stddev.pow(2) def _validate_value(self, value): """Argument dimension check for distribution methods such as `log_prob`, `cdf` and `icdf`. Args: value (Tensor|Scalar): The input value, which can be a scalar or a tensor. Returns: loc, scale, value: The broadcasted loc, scale and value, with the same dimension and data type. """ if isinstance(value, numbers.Real): value = paddle.full(shape=(), fill_value=value) if value.dtype != self.scale.dtype: value = paddle.cast(value, self.scale.dtype) if len(self.scale.shape) > 0 or len(self.loc.shape) > 0 or len( value.shape) > 0: loc, scale, value = paddle.broadcast_tensors( [self.loc, self.scale, value]) else: loc, scale = self.loc, self.scale return loc, scale, value def log_prob(self, value): """Log probability density/mass function. The log_prob is .. math:: log\_prob(value) = \frac{-log(2 * \sigma) - |value - \mu|}{\sigma} In the above equation: * :math:`loc = \mu`: is the location parameter. * :math:`scale = \sigma`: is the scale parameter. Args: value (Tensor|Scalar): The input value, can be a scalar or a tensor. Returns: Tensor: The log probability, whose data type is same with value. Examples: .. code-block:: python import paddle m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0])) value = paddle.to_tensor([0.1]) m.log_prob(value) # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True, # [-0.79314721]) """ loc, scale, value = self._validate_value(value) log_scale = -paddle.log(2 * scale) return (log_scale - paddle.abs(value - loc) / scale) def entropy(self): """Entropy of Laplace distribution. The entropy is: .. math:: entropy() = 1 + log(2 * \sigma) In the above equation: * :math:`scale = \sigma`: is the scale parameter. Returns: The entropy of distribution. Examples: .. code-block:: python import paddle m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0])) m.entropy() # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True, # [1.69314718]) """ return 1 + paddle.log(2 * self.scale) def cdf(self, value): """Cumulative distribution function. The cdf is .. math:: cdf(value) = 0.5 - 0.5 * sign(value - \mu) * e^\frac{-|(\mu - \sigma)|}{\sigma} In the above equation: * :math:`loc = \mu`: is the location parameter. * :math:`scale = \sigma`: is the scale parameter. Args: value (Tensor): The value to be evaluated. Returns: Tensor: The cumulative probability of value. Examples: .. code-block:: python import paddle m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0])) value = paddle.to_tensor([0.1]) m.cdf(value) # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True, # [0.54758132]) """ loc, scale, value = self._validate_value(value) iterm = (0.5 * (value - loc).sign() * paddle.expm1(-(value - loc).abs() / scale)) return 0.5 - iterm def icdf(self, value): """Inverse Cumulative distribution function. The icdf is .. math:: cdf^{-1}(value)= \mu - \sigma * sign(value - 0.5) * ln(1 - 2 * |value-0.5|) In the above equation: * :math:`loc = \mu`: is the location parameter. * :math:`scale = \sigma`: is the scale parameter. Args: value (Tensor): The value to be evaluated. Returns: Tensor: The cumulative probability of value. Examples: .. code-block:: python import paddle m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0])) value = paddle.to_tensor([0.1]) m.icdf(value) # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True, # [-1.60943794]) """ loc, scale, value = self._validate_value(value) term = value - 0.5 return (loc - scale * (term).sign() * paddle.log1p(-2 * term.abs())) def sample(self, shape=()): """Generate samples of the specified shape. Args: shape(tuple[int]): The shape of generated samples. Returns: Tensor: A sample tensor that fits the Laplace distribution. Examples: .. code-block:: python import paddle m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0])) m.sample() # Laplace distributed with loc=0, scale=1 # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True, # [3.68546247]) """ if not isinstance(shape, tuple): raise TypeError( f'Expected shape should be tuple[int], but got {type(shape)}') with paddle.no_grad(): return self.rsample(shape) def rsample(self, shape): """Reparameterized sample. Args: shape(tuple[int]): The shape of generated samples. Returns: Tensor: A sample tensor that fits the Laplace distribution. Examples: .. code-block:: python import paddle m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0])) m.rsample((1,)) # Laplace distributed with loc=0, scale=1 # Tensor(shape=[1, 1], dtype=float32, place=Place(cpu), stop_gradient=True, # [[0.04337667]]) """ eps = self._get_eps() shape = self._extend_shape(shape) or (1, ) uniform = paddle.uniform(shape=shape, min=float(np.nextafter(-1, 1)) + eps / 2, max=1. - eps / 2, dtype=self.loc.dtype) if len(self.scale.shape) == 0 and len(self.loc.shape) == 0: loc, scale, uniform = paddle.broadcast_tensors( [self.loc, self.scale, uniform]) else: loc, scale = self.loc, self.scale return (loc - scale * uniform.sign() * paddle.log1p(-uniform.abs())) def _get_eps(self): """ Get the eps of certain data type. Note: Since paddle.finfo is temporarily unavailable, we use hard-coding style to get eps value. Returns: Float: An eps value by different data types. """ eps = 1.19209e-07 if (self.loc.dtype == paddle.float64 or self.loc.dtype == paddle.complex128): eps = 2.22045e-16 return eps def kl_divergence(self, other): """Calculate the KL divergence KL(self || other) with two Laplace instances. The kl_divergence between two Laplace distribution is .. math:: KL\_divergence(\mu_0, \sigma_0; \mu_1, \sigma_1) = 0.5 (ratio^2 + (\frac{diff}{\sigma_1})^2 - 1 - 2 \ln {ratio}) .. math:: ratio = \frac{\sigma_0}{\sigma_1} .. math:: diff = \mu_1 - \mu_0 In the above equation: * :math:`loc = \mu`: is the location parameter of self. * :math:`scale = \sigma`: is the scale parameter of self. * :math:`loc = \mu_1`: is the location parameter of the reference Laplace distribution. * :math:`scale = \sigma_1`: is the scale parameter of the reference Laplace distribution. * :math:`ratio`: is the ratio between the two distribution. * :math:`diff`: is the difference between the two distribution. Args: other (Laplace): An instance of Laplace. Returns: Tensor: The kl-divergence between two laplace distributions. Examples: .. code-block:: python import paddle m1 = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0])) m2 = paddle.distribution.Laplace(paddle.to_tensor([1.0]), paddle.to_tensor([0.5])) m1.kl_divergence(m2) # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True, # [1.04261160]) """ var_ratio = other.scale / self.scale t = paddle.abs(self.loc - other.loc) term1 = ((self.scale * paddle.exp(-t / self.scale) + t) / other.scale) term2 = paddle.log(var_ratio) return term1 + term2 - 1
目前,該API已經鎖定貢獻。回顧API的開發過程,實際上該API的開發並不難,主要的問題在於如何進行單元測試,證明開發的API是正確的,並且還有一些相關的細節點,比如KL散度的註冊等。還有就是最開始走了彎路,參照了Normal的開發風格,將API寫成了2.0風格的,影響了一些時間,並且在最後的單測中,發現了Uniform實現方式的一些Bug,此處Debug花費了一些時間,整體來看,花時間的部分是在單測部分。