Laplace分佈運算元開發經驗分享

摘要：Laplace 用於 Laplace 分佈的概率統計與隨機取樣。

本文分享自華為雲社群《Laplace分佈運算元開發經驗分享》，作者：李長安。

1、任務解析

詳細描述：

Laplace 用於 Laplace 分佈的概率統計與隨機取樣，此任務的目標是在 Paddle 框架中，基於現有概率分佈方案進行擴充套件，新增 Laplace API，呼叫路徑為：paddle.distribution.Laplace 。類簽名及各個方法簽名，請通過調研 Paddle 及業界實現慣例進行設計。要求程式碼風格及設計思路與已有概率分佈保持一致。

實際上說了一大堆，就是一件事：實現Laplace分佈運算元，那麼首先我們需要知道什麼是 Laplace 分佈，在概率論和統計學中，拉普拉斯分佈是一種連續概率分佈。由於它可以看作是兩個不同位置的指數分佈背靠背拼在一起，所以它也叫雙指數分佈。與正態分佈對比，正態分佈是用相對於μ平均值的差的平方來表示，而拉普拉斯概率密度用相對於差的絕對值來表示。如下面的程式碼所示，Laplace 分佈的影象和正態分佈實際上是有點類似的，所以它的公式也與正態分佈的公式類似的。

%matplotlib inline 
import matplotlib.pyplot as plt
import numpy as np
def laplace_function(x, lambda_):
 return (1/(2*lambda_)) * np.e**(-1*(np.abs(x)/lambda_))
x = np.linspace(-5,5,10000)
y1 = [laplace_function(x_,1) for x_ in x]
y2 = [laplace_function(x_,2) for x_ in x]
y3 = [laplace_function(x_,0.5) for x_ in x]
plt.plot(x, y1, color='r', label="lambda:1")
plt.plot(x, y2, color='g', label="lambda:2")
plt.plot(x, y3, color='b', label="lambda:0.5")
plt.title("Laplace distribution")
plt.legend()
plt.show()

2、設計檔案撰寫

設計檔案是我們API設計思路的體現，是整個開發工作中必要的部分。通過上述任務簡介，我們可以知道此API的開發主要為Laplace分佈的開發，需要包含一些相應的方法。首先我們需要弄清楚Laplace分佈的數學原理，這裡建議去維基百科檢視Laplace分佈的數學原理，弄明白數學原理。此外，我們可以參考Numpy、Scipy、Pytorch、Tensorflow的程式碼實現，進行設計檔案的撰寫。

首先，我們應該知道Laplace分佈的概率密度函數公式、累積分佈函數、逆累積分佈函數，並且根據公式開發出程式碼，公式如下所示：

參考Numpy、Scipy、Pytorch、Tensorflow的程式碼實現，我們這裡可以很容易的實現公式對應的程式碼，其實現方案如下3.1小節所示。

2.1 API 實現方案

該 API 實現於 paddle.distribution.Laplace。

基於paddle.distribution API基礎類別進行開發。

class API 中的具體實現（部分方法已完成開發，故直接使用原始碼），該api有兩個引數：位置引數self.loc, 尺度引數self.scale。包含以下方法：

mean 計算均值:

 self.loc

stddev 計算標準差:

 (2 ** 0.5) * self.scale;

variance 計算方差:

 self.stddev.pow(2)

sample 隨機取樣(參考pytorch複用重引數化取樣結果):

 self.rsample(shape)

rsample 重引數化取樣:

 self.loc - self.scale * u.sign() * paddle.log1p(-u.abs())

其中 u = paddle.uniform(shape=shape, min=eps - 1, max=1); eps根據dtype決定;

prob 概率密度(包含傳參value):

 self.log_prob(value).exp()

直接繼承父類別實現

log_prob 對數概率密度(value):

 -paddle.log(2 * self.scale) - paddle.abs(value - self.loc) / self.scale

entropy 熵計算:

 1 + paddle.log(2 * self.scale)

cdf 累積分佈函數(value):

 0.5 - 0.5 * (value - self.loc).sign() * paddle.expm1(-(value - self.loc).abs() / self.scale)

icdf 逆累積分佈函數(value):

 self.loc - self.scale * (value - 0.5).sign() * paddle.log1p(-2 * (value - 0.5).abs())

kl_divergence 兩個Laplace分佈之間的kl散度(other–Laplace類的一個範例):

 (self.scale * paddle.exp(paddle.abs(self.loc - other.loc) / self.scale) + paddle.abs(self.loc - other.loc)) / other.scale + paddle.log(other.scale / self.scale) - 1

參考文獻：https://openaccess.thecvf.com/content/CVPR2021/supplemental/Meyer_An_Alternative_Probabilistic_CVPR_2021_supplemental.pdf

同時在paddle/distribution/kl.py 中註冊_kl_laplace_laplace函數，使用時可直接呼叫kl_divergence計算laplace分佈之間的kl散度。

2.2 測試和驗收的考量

在我們開發完對應的程式碼後，我們應該如何證明我們所開發出來的程式碼是正確的呢？這時候就需要單元測試的程式碼來證明我們的程式碼是正確的。那麼什麼是單元測試呢？單元測試的用例其實是一個「輸入資料」和「預計輸出」的集合。你需要跟你輸入資料，根據邏輯功能給出預計輸出，這裡所說的根據邏輯功能是指，通過需求檔案就能給出的預計輸出。而非我們通過已經實現的程式碼去推匯出的預計輸出。這也是最容易被忽視的一點。你要去做單元測試，然後還要通過程式碼去推斷出預計輸出，如果你的程式碼邏輯本來就實現錯了，給出的預計輸出也是錯的，那麼你的單元測試將沒有意義。實際上，這部分可以說是整個工作中最重要的部分也是比較難的部分，我們需要想出預計輸出，並且如何通過已經實現的程式碼去推匯出預計輸出，只有單元測試通過了，我們的開發任務才算基本完成了。

根據api類各個方法及特性傳參的不同，把單測分成三個部分：測試分佈的特性（無需額外引數）、測試分佈的概率密度函數（需要傳值）以及測試KL散度（需要傳入一個範例）。

1、測試Lapalce分佈的特性

測試方法：該部分主要測試分佈的均值、方差、熵等特徵。類TestLaplace繼承unittest.TestCase，分別實現方法setUp（初始化），test_mean（mean單測），test_variance（variance單測），test_stddev（stddev單測），test_entropy（entropy單測），test_sample（sample單測）。

均值、方差、標準差通過Numpy計算相應值，對比Laplace類中相應property的返回值，若一致即正確；
取樣方法除驗證其返回的資料型別及資料形狀是否合法外，還需證明取樣結果符合laplace分佈。驗證策略如下：隨機取樣30000個laplace分佈下的樣本值，計算取樣樣本的均值和方差，並比較同分佈下scipy.stats.laplace返回的均值與方差，檢查是否在合理誤差範圍內；同時通過Kolmogorov-Smirnov test進一步驗證取樣是否屬於laplace分佈，若計算所得ks值小於0.02，則拒絕不一致假設，兩者屬於同一分佈；
熵計算通過對比scipy.stats.laplace.entropy的值是否與類方法返回值一致驗證結果的正確性。

測試用例：單測需要覆蓋單一維度的Laplace分佈和多維度分佈情況，因此使用兩種初始化引數

‘one-dim’: loc=parameterize.xrand((2, )), scale=parameterize.xrand((2, ));
‘multi-dim’: loc=parameterize.xrand((5, 5)), scale=parameterize.xrand((5, 5))。

2、測試Lapalce分佈的概率密度函數

測試方法：該部分主要測試分佈各種概率密度函數。類TestLaplacePDF繼承unittest.TestCase，分別實現方法setUp（初始化），test_prob（prob單測），test_log_prob（log_prob單測），test_cdf（cdf單測），test_icdf（icdf）。以上分佈在scipy.stats.laplace中均有實現，因此給定某個輸入value，對比相同引數下Laplace分佈的scipy實現以及paddle實現的結果，若誤差在容忍度範圍內則證明實現正確。
測試用例：為不失一般性，測試使用多維位置引數和尺度引數初始化Laplace類，並覆蓋int型輸入及float型輸入。

‘value-float’: loc=np.array([0.2, 0.3]), scale=np.array([2, 3]), value=np.array([2., 5.]); * ‘value-int’: loc=np.array([0.2, 0.3]), scale=np.array([2, 3]), value=np.array([2, 5]);
‘value-multi-dim’: loc=np.array([0.2, 0.3]), scale=np.array([2, 3]), value=np.array([[4., 6], [8, 2]])。

3、測試Lapalce分佈之間的KL散度

測試方法：該部分測試兩個Laplace分佈之間的KL散度。類TestLaplaceAndLaplaceKL繼承unittest.TestCase，分別實現setUp（初始化），test_kl_divergence（kl_divergence）。在scipy中scipy.stats.entropy可用來計算兩個分佈之間的散度。因此對比兩個Laplace分佈在paddle.distribution.kl_divergence下和在scipy.stats.laplace下計算的散度，若結果在誤差範圍內，則證明該方法實現正確。
測試用例：分佈1：loc=np.array([0.0]), scale=np.array([1.0]), 分佈2: loc=np.array([1.0]), scale=np.array([0.5])

3、程式碼開發

程式碼的開發主要參考Pytorch，此處涉及到單元測試程式碼的開發，kl散度註冊等程式碼，需要仔細閱讀PaddlePaddle中其他分佈程式碼的實現形式。

import numbers
import numpy as np
import paddle
from paddle.distribution import distribution
from paddle.fluid import framework as framework
class Laplace(distribution.Distribution):
    r"""
    Creates a Laplace distribution parameterized by :attr:`loc` and :attr:`scale`.
    Mathematical details
    The probability density function (pdf) is
 .. math::
 pdf(x; \mu, \sigma) = \frac{1}{2 * \sigma} * e^{\frac {-|x - \mu|}{\sigma}}
    In the above equation:
 * :math:`loc = \mu`: is the location parameter.
 * :math:`scale = \sigma`: is the scale parameter.
 Args:
 loc (scalar|Tensor): The mean of the distribution.
 scale (scalar|Tensor): The scale of the distribution.
 name(str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.
    Examples:
 .. code-block:: python
 import paddle
                        m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
 m.sample()  # Laplace distributed with loc=0, scale=1
                        # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True, 
                        # [3.68546247])
 """
    def __init__(self, loc, scale):
 if not isinstance(loc, (numbers.Real, framework.Variable)):
            raise TypeError(
 f"Expected type of loc is Real|Variable, but got {type(loc)}")
 if not isinstance(scale, (numbers.Real, framework.Variable)):
            raise TypeError(
 f"Expected type of scale is Real|Variable, but got {type(scale)}"
 )
 if isinstance(loc, numbers.Real):
            loc = paddle.full(shape=(), fill_value=loc)
 if isinstance(scale, numbers.Real):
            scale = paddle.full(shape=(), fill_value=scale)
 if (len(scale.shape) > 0 or len(loc.shape) > 0) and (loc.dtype
 == scale.dtype):
 self.loc, self.scale = paddle.broadcast_tensors([loc, scale])
 else:
 self.loc, self.scale = loc, scale
 super(Laplace, self).__init__(self.loc.shape)
    @property
    def mean(self):
 """Mean of distribution.
        Returns:
            Tensor: The mean value.
 """
 return self.loc
    @property
    def stddev(self):
 """Standard deviation.
        The stddev is 
 .. math::
 stddev = \sqrt{2} * \sigma
        In the above equation:
 * :math:`scale = \sigma`: is the scale parameter.
        Returns:
            Tensor: The std value.
 """
 return (2**0.5) * self.scale
    @property
    def variance(self):
 """Variance of distribution.
        The variance is 
 .. math::
            variance = 2 * \sigma^2
        In the above equation:
 * :math:`scale = \sigma`: is the scale parameter.
        Returns:
            Tensor: The variance value.
 """
 return self.stddev.pow(2)
    def _validate_value(self, value):
 """Argument dimension check for distribution methods such as `log_prob`,
 `cdf` and `icdf`. 
 Args:
 value (Tensor|Scalar): The input value, which can be a scalar or a tensor.
        Returns:
          loc, scale, value: The broadcasted loc, scale and value, with the same dimension and data type.
 """
 if isinstance(value, numbers.Real):
            value = paddle.full(shape=(), fill_value=value)
 if value.dtype != self.scale.dtype:
            value = paddle.cast(value, self.scale.dtype)
 if len(self.scale.shape) > 0 or len(self.loc.shape) > 0 or len(
 value.shape) > 0:
            loc, scale, value = paddle.broadcast_tensors(
 [self.loc, self.scale, value])
 else:
            loc, scale = self.loc, self.scale
 return loc, scale, value
    def log_prob(self, value):
 """Log probability density/mass function.
        The log_prob is
 .. math::
            log\_prob(value) = \frac{-log(2 * \sigma) - |value - \mu|}{\sigma}
        In the above equation:
 * :math:`loc = \mu`: is the location parameter.
 * :math:`scale = \sigma`: is the scale parameter.
 Args:
 value (Tensor|Scalar): The input value, can be a scalar or a tensor.
        Returns:
          Tensor: The log probability, whose data type is same with value.
        Examples:
 .. code-block:: python
 import paddle
                            m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                            value = paddle.to_tensor([0.1])
 m.log_prob(value) 
                            # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                            # [-0.79314721])
 """
        loc, scale, value = self._validate_value(value)
 log_scale = -paddle.log(2 * scale)
 return (log_scale - paddle.abs(value - loc) / scale)
    def entropy(self):
 """Entropy of Laplace distribution.
        The entropy is:
 .. math::
 entropy() = 1 + log(2 * \sigma)
        In the above equation:
 * :math:`scale = \sigma`: is the scale parameter.
        Returns:
            The entropy of distribution.
        Examples:
 .. code-block:: python
 import paddle
                            m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
 m.entropy()
                            # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                            # [1.69314718])
 """
 return 1 + paddle.log(2 * self.scale)
    def cdf(self, value):
 """Cumulative distribution function.
        The cdf is
 .. math::
 cdf(value) = 0.5 - 0.5 * sign(value - \mu) * e^\frac{-|(\mu - \sigma)|}{\sigma}
        In the above equation:
 * :math:`loc = \mu`: is the location parameter.
 * :math:`scale = \sigma`: is the scale parameter.
 Args:
 value (Tensor): The value to be evaluated.
        Returns:
            Tensor: The cumulative probability of value.
        Examples:
 .. code-block:: python
 import paddle
                            m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                            value = paddle.to_tensor([0.1])
 m.cdf(value)
                            # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                            # [0.54758132])
 """
        loc, scale, value = self._validate_value(value)
 iterm = (0.5 * (value - loc).sign() *
 paddle.expm1(-(value - loc).abs() / scale))
 return 0.5 - iterm
    def icdf(self, value):
 """Inverse Cumulative distribution function.
        The icdf is 
 .. math::
 cdf^{-1}(value)= \mu - \sigma * sign(value - 0.5) * ln(1 - 2 * |value-0.5|)
        In the above equation:
 * :math:`loc = \mu`: is the location parameter.
 * :math:`scale = \sigma`: is the scale parameter.
 Args:
 value (Tensor): The value to be evaluated.
        Returns:
            Tensor: The cumulative probability of value.
        Examples:
 .. code-block:: python
 import paddle
                            m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                            value = paddle.to_tensor([0.1])
 m.icdf(value)
                            # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                            # [-1.60943794])
 """
        loc, scale, value = self._validate_value(value)
        term = value - 0.5
 return (loc - scale * (term).sign() * paddle.log1p(-2 * term.abs()))
    def sample(self, shape=()):
 """Generate samples of the specified shape.
 Args:
 shape(tuple[int]): The shape of generated samples.
        Returns:
            Tensor: A sample tensor that fits the Laplace distribution.
        Examples:
 .. code-block:: python
 import paddle
                            m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
 m.sample()  # Laplace distributed with loc=0, scale=1
                            # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                            # [3.68546247])
 """
 if not isinstance(shape, tuple):
            raise TypeError(
 f'Expected shape should be tuple[int], but got {type(shape)}')
 with paddle.no_grad():
 return self.rsample(shape)
    def rsample(self, shape):
 """Reparameterized sample.
 Args:
 shape(tuple[int]): The shape of generated samples.
        Returns:
            Tensor: A sample tensor that fits the Laplace distribution.
        Examples:
 .. code-block:: python
 import paddle
                            m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
 m.rsample((1,))  # Laplace distributed with loc=0, scale=1
                            # Tensor(shape=[1, 1], dtype=float32, place=Place(cpu), stop_gradient=True,
                            # [[0.04337667]])
 """
        eps = self._get_eps()
        shape = self._extend_shape(shape) or (1, )
        uniform = paddle.uniform(shape=shape,
                                 min=float(np.nextafter(-1, 1)) + eps / 2,
                                 max=1. - eps / 2,
 dtype=self.loc.dtype)
 if len(self.scale.shape) == 0 and len(self.loc.shape) == 0:
            loc, scale, uniform = paddle.broadcast_tensors(
 [self.loc, self.scale, uniform])
 else:
            loc, scale = self.loc, self.scale
 return (loc - scale * uniform.sign() * paddle.log1p(-uniform.abs()))
    def _get_eps(self):
 """
        Get the eps of certain data type.
        Note: 
            Since paddle.finfo is temporarily unavailable, we 
            use hard-coding style to get eps value.
        Returns:
            Float: An eps value by different data types.
 """
        eps = 1.19209e-07
 if (self.loc.dtype == paddle.float64
                or self.loc.dtype == paddle.complex128):
            eps = 2.22045e-16
 return eps
    def kl_divergence(self, other):
 """Calculate the KL divergence KL(self || other) with two Laplace instances.
        The kl_divergence between two Laplace distribution is
 .. math::
 KL\_divergence(\mu_0, \sigma_0; \mu_1, \sigma_1) = 0.5 (ratio^2 + (\frac{diff}{\sigma_1})^2 - 1 - 2 \ln {ratio})
 .. math::
            ratio = \frac{\sigma_0}{\sigma_1}
 .. math::
            diff = \mu_1 - \mu_0
        In the above equation:
 * :math:`loc = \mu`: is the location parameter of self.
 * :math:`scale = \sigma`: is the scale parameter of self.
 * :math:`loc = \mu_1`: is the location parameter of the reference Laplace distribution.
 * :math:`scale = \sigma_1`: is the scale parameter of the reference Laplace distribution.
 * :math:`ratio`: is the ratio between the two distribution.
 * :math:`diff`: is the difference between the two distribution.
 Args:
 other (Laplace): An instance of Laplace.
        Returns:
            Tensor: The kl-divergence between two laplace distributions.
        Examples:
 .. code-block:: python
 import paddle
                            m1 = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                            m2 = paddle.distribution.Laplace(paddle.to_tensor([1.0]), paddle.to_tensor([0.5]))
                            m1.kl_divergence(m2)
                            # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                            # [1.04261160])
 """
 var_ratio = other.scale / self.scale
        t = paddle.abs(self.loc - other.loc)
        term1 = ((self.scale * paddle.exp(-t / self.scale) + t) / other.scale)
        term2 = paddle.log(var_ratio)
 return term1 + term2 - 1

4、總結

目前，該API已經鎖定貢獻。回顧API的開發過程，實際上該API的開發並不難，主要的問題在於如何進行單元測試，證明開發的API是正確的，並且還有一些相關的細節點，比如KL散度的註冊等。還有就是最開始走了彎路，參照了Normal的開發風格，將API寫成了2.0風格的，影響了一些時間，並且在最後的單測中，發現了Uniform實現方式的一些Bug，此處Debug花費了一些時間，整體來看，花時間的部分是在單測部分。

點選關注，第一時間瞭解華為雲新鮮技術~