In statistics, an expectation–maximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate of the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found in the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step.

The EM algorithm was explained and given its name in a classic 1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin. They pointed out that the method had been "proposed many times in special circumstances" by earlier authors. A very detailed treatment of the EM method for exponential families was published by Rolf Sundberg in his thesis and in several papers following his collaboration with Per Martin-Löf and Anders Martin-Löf. The 1977 Dempster–Laird–Rubin paper generalized the method and sketched a convergence analysis for a wider class of problems. Regardless of these earlier inventions, the innovative Dempster–Laird–Rubin paper in the Journal of the Royal Statistical Society received an enthusiastic discussion at the Royal Statistical Society meeting, with Sundberg calling the paper "brilliant". The paper established the EM method as an important tool of statistical analysis.

The convergence analysis in the Dempster–Laird–Rubin paper was flawed; a correct convergence analysis was published by C. F. Jeff Wu in 1983. Wu's proof established the EM method's convergence outside of the exponential family, as claimed by Dempster–Laird–Rubin.

The EM algorithm is used to find (local) maximum likelihood parameters of a statistical model in cases where the equations cannot be solved directly. Typically these models involve latent variables in addition to unknown parameters and known data observations. That is, either missing values exist among the data, or the model can be formulated more simply by assuming the existence of further unobserved data points. For example, a mixture model can be described more simply by assuming that each observed data point has a corresponding unobserved data point, or latent variable, specifying the mixture component to which it belongs.
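The alternation between the two steps described above is commonly written in the following form, where $X$ denotes the observed data, $Z$ the latent variables, and $\theta$ the parameters (the symbols are assumed here, since the text does not fix any notation):

```latex
% E step: the expected complete-data log-likelihood, taken over the
% distribution of the latent variables Z given X and the current
% parameter estimate theta^(t)
Q(\theta \mid \theta^{(t)}) = \operatorname{E}_{Z \mid X, \theta^{(t)}}\!\left[ \log L(\theta; X, Z) \right]

% M step: choose the parameters that maximize this expectation
\theta^{(t+1)} = \operatorname*{arg\,max}_{\theta} \, Q(\theta \mid \theta^{(t)})
```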
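As a concrete instance of the mixture-model case, the following is a minimal sketch of EM fitting a two-component univariate Gaussian mixture. Python with NumPy is assumed, and all names (`weights`, `mus`, `sigmas`, `resp`) are illustrative rather than taken from the text above:

```python
# Minimal EM sketch for a two-component 1-D Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two Gaussian clusters whose (unobserved) component
# labels play the role of the latent variables.
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])

# Initial parameter guesses.
weights = np.array([0.5, 0.5])   # mixing proportions
mus = np.array([-1.0, 1.0])      # component means
sigmas = np.array([1.0, 1.0])    # component standard deviations

def normal_pdf(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for _ in range(100):
    # E step: posterior probability ("responsibility") that each point
    # belongs to each component, under the current parameter estimates.
    dens = np.stack([w * normal_pdf(x, m, s)
                     for w, m, s in zip(weights, mus, sigmas)])  # shape (2, n)
    resp = dens / dens.sum(axis=0, keepdims=True)

    # M step: re-estimate parameters by weighted maximum likelihood,
    # which maximizes the expected complete-data log-likelihood.
    nk = resp.sum(axis=1)
    weights = nk / x.size
    mus = (resp @ x) / nk
    sigmas = np.sqrt((resp * (x - mus[:, None]) ** 2).sum(axis=1) / nk)

print(weights, mus, sigmas)
```

The responsibilities computed in the E step stand in for the unobserved component labels; the M step then treats them as soft assignments and re-estimates the parameters in closed form, which is exactly why the mixture model becomes tractable once the latent variables are introduced.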
