\]
First, let's write out the log-likelihood function. Suppose \(\mathbf{X} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}\) are the observations, and \(\boldsymbol{\pi} = (\pi_1, \pi_2, \ldots, \pi_K)\) are the mixing coefficients for the Gaussian distributions. The log-likelihood function is:
\[
\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^N \ln \left( \sum_{k=1}^K \pi_k \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right)
\]
To enforce the constraint \(\sum_{k=1}^K \pi_k = 1\), we form the objective function with a Lagrange multiplier \(\lambda\):
\[
\mathcal{L} = \sum_{n=1}^N \ln \left( \sum_{k=1}^K \pi_k \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right) + \lambda \left( \sum_{k=1}^K \pi_k - 1 \right)
\]
We take the derivative of \(\mathcal{L}\) with respect to \(\pi_k\):
\[
\frac{\partial \mathcal{L}}{\partial \pi_k} = \sum_{n=1}^N \frac{\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^K \pi_j \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)} + \lambda
\]
Define \(\gamma(z_{nk})\) as the posterior probability that data point \(\mathbf{x}_n\) belongs to the \(k\)-th Gaussian component:
\[
\gamma(z_{nk}) = \frac{\pi_k \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^K \pi_j \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}
\]
Thus, we can rewrite the derivative as:
\[
\frac{\partial \mathcal{L}}{\partial \pi_k} = \sum_{n=1}^N \frac{\gamma(z_{nk})}{\pi_k} + \lambda
\]
To find the optimal value, set the derivative to zero:
\[
\sum_{n=1}^N \frac{\gamma(z_{nk})}{\pi_k} + \lambda = 0
\]
Multiplying both sides by \(\pi_k\) gives:
\[
\sum_{n=1}^N \gamma(z_{nk}) = -\lambda \pi_k
\]
We know that \(\sum_{k=1}^K \pi_k = 1\), hence:
\[
\sum_{k=1}^K \sum_{n=1}^N \gamma(z_{nk}) = -\lambda \sum_{k=1}^K \pi_k = -\lambda
\]
Since \(\sum_{k=1}^K \gamma(z_{nk}) = 1\) for every \(n\), the left-hand side equals \(N\), and therefore
\[
-\lambda = N
\]
Substitute \(-\lambda\) back into the previous equation:
\[
\sum_{n=1}^N \gamma(z_{nk}) = N \pi_k
\]
Solve for \(\pi_k\):
\[
\pi_k = \frac{\sum_{n=1}^N \gamma(z_{nk})}{N}
\]
Define \(N_k = \sum_{n=1}^N \gamma(z_{nk})\) as the effective number of samples belonging to the \(k\)-th Gaussian component. We thus arrive at the desired expression:
\[
\pi_k = \frac{N_k}{N}
\]
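As a quick sanity check with purely hypothetical responsibilities (not computed from any particular data set), suppose \(N = 4\) and \(K = 2\), with
\[
\gamma(z_{n1}) = (0.9,\ 0.8,\ 0.3,\ 0.2), \qquad \gamma(z_{n2}) = (0.1,\ 0.2,\ 0.7,\ 0.8),
\]
so that each point's responsibilities sum to one. Then \(N_1 = 2.2\), \(N_2 = 1.8\), and the formula gives \(\pi_1 = 2.2/4 = 0.55\) and \(\pi_2 = 1.8/4 = 0.45\), which indeed satisfy \(\pi_1 + \pi_2 = 1\).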
\chapter{Parameter estimation methods}\label{cap2}
\section{Overview of Parameter Estimation}
Accurate parameter estimation is crucial for effectively utilizing the Pearson Type III distribution in practical applications. This section provides an overview of the primary methods used for estimating the parameters of this distribution, emphasizing their theoretical foundations, advantages, and limitations.
\section{The method of moments estimation (MoM)}
The method of moments is one of the earliest and simplest techniques for parameter estimation. It equates the sample moments (e.g., mean, variance) to the corresponding theoretical moments of the distribution and solves the resulting equations for the parameters.
Raw moments are the averages of powers of the observed data points. The $k$-th order raw moment ($m_k$) is calculated as
\begin{equation}
m_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k
\end{equation}
where $X_i$ represents the observed data points and $n$ is the sample size.\\
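For example, the first raw moment is simply the sample mean, $m_1 = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}$, while the second raw moment, $m_2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2$, is the average of the squared observations.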
\vspace{0.1cm}
\vspace{0.1cm}
Central moments measure the variability of the data points around their sample mean. The $k$-th order central moment ($t_k$) is given by
\begin{equation}
t_k = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^k
\end{equation}
where $\bar{X}$ represents the sample mean.\\
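For example, $t_1 = 0$ by construction, and the second central moment, $t_2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$, is the (biased) sample variance; the third central moment $t_3$ captures the asymmetry of the sample about its mean.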
\vspace{0.1cm}
In the method of moments, we equate the sample moments (either raw or central) to their corresponding population moments and solve the resulting system for the parameters of interest; this yields estimates based directly on the sample data, as illustrated below.
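As an illustration, assume the common three-parameter (shifted-gamma) form of the Pearson Type III distribution with location $a_0$, scale $\beta > 0$, and shape $\alpha$; this parameterization is adopted here only for the example and may differ from the one used elsewhere in this work. Its theoretical mean, variance, and skewness coefficient are
\[
\mu = a_0 + \alpha\beta, \qquad \sigma^2 = \alpha\beta^2, \qquad C_s = \frac{2}{\sqrt{\alpha}} ,
\]
so equating these to the sample mean $\bar{X}$, sample standard deviation $s$, and sample skewness coefficient $\hat{C}_s$ gives the moment estimators
\[
\hat{\alpha} = \frac{4}{\hat{C}_s^{\,2}}, \qquad \hat{\beta} = \frac{s\,\hat{C}_s}{2}, \qquad \hat{a}_0 = \bar{X} - \frac{2s}{\hat{C}_s} .
\]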
\section{The maximum likelihood estimation method (ML)}
Maximum Likelihood Estimation (MLE) is widely regarded for its efficiency and robustness. It estimates the parameters of a statistical model by finding the values that maximize the likelihood function, i.e., the probability of observing the given data. Suppose we have a random sample $X_1, X_2, \ldots, X_n$ from a probability distribution with probability density function (pdf) or probability mass function (pmf) $f(x; \theta)$, where $\theta$ represents the parameter(s) to be estimated.
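As a brief illustration of the mechanics (using the one-parameter exponential distribution rather than the Pearson Type III, purely to keep the algebra short), let $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$. The likelihood of the sample is $L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda X_i}$, so the log-likelihood is
\[
\ell(\lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} X_i .
\]
Setting $\mathrm{d}\ell/\mathrm{d}\lambda = n/\lambda - \sum_{i=1}^{n} X_i = 0$ gives the maximum likelihood estimate $\hat{\lambda} = n / \sum_{i=1}^{n} X_i = 1/\bar{X}$.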