Statistical Machine Translation
Statistical Machine Translation (SMT) is a machine translation paradigm that uses statistical models to translate text from one natural language to another. SMT follows the same general approach as many other statistical NLP problems: translation is treated as a noisy channel problem, in which the observed input sentence is regarded as a corrupted version of the desired output sentence, and the goal is to find the most likely output given the input. The main difference from other statistical NLP problems is that the output is a sentence in a different language, so the space of possible outputs is enormous, which makes the search for the most likely output computationally difficult.
The general approach in SMT is to use a large parallel corpus: a collection of sentences in one language paired with their translations in the other. A statistical model is estimated from the corpus and then used to translate new sentences. The SMT paradigm stands in contrast to rule-based machine translation, in which the translation rules are written by hand.
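To make the idea of estimating a model from a parallel corpus concrete, the following sketch runs IBM Model 1 expectation-maximization on a tiny, made-up three-sentence corpus. The corpus, the vocabulary, and the iteration count are all illustrative assumptions, not part of any real system; the point is only to show how word-translation probabilities t(f | e) emerge from co-occurrence counts.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: pairs of
# (observed sentence f, source sentence e) as token lists.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

f_vocab = {f for fs, _ in corpus for f in fs}
e_vocab = {e for _, es in corpus for e in es}

# IBM Model 1: start from a uniform table t(f | e) and refine with EM.
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

for _ in range(50):  # EM iterations (50 is plenty for this toy corpus)
    count = defaultdict(float)  # expected co-occurrence counts c(f, e)
    total = defaultdict(float)  # expected counts c(e)
    # E-step: distribute each observed word f over the source words e
    # in its sentence pair, in proportion to the current t(f | e).
    for fs, es in corpus:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                delta = t[(f, e)] / norm
                count[(f, e)] += delta
                total[e] += delta
    # M-step: re-normalize the expected counts into probabilities.
    for (f, e) in t:
        if total[e] > 0:
            t[(f, e)] = count[(f, e)] / total[e]

print(t[("haus", "house")])
```

After a few iterations the probability mass concentrates on the intuitively correct word pairs (e.g. "haus" given "house"), even though the corpus never says which words correspond to which.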
Contents
- History
- Models
  - Noisy channel model
  - Bayes' rule
- Training
  - Training the translation model
  - Training the language model
- Decoding
- Example
  - Parallel corpus
  - Translation model
  - Language model
History
The origins of SMT can be traced back to the 1950s, when researchers began experimenting with statistical methods for machine translation. However, SMT became more popular with the advent of large bilingual corpora in the 1990s.
The first SMT system to achieve good performance was developed by IBM in the early 1990s. Their models were based on the noisy channel model and on Bayes' rule. The IBM models were originally developed for word-based translation, but were later extended to phrase-based translation and syntax-based translation.
Models
The most common SMT models are based on the noisy channel model and Bayes' rule.
Noisy channel model
The noisy channel model is a probabilistic model for explaining how a message is corrupted during transmission. It is a common model for explaining how errors are introduced in written text. In SMT, the noisy channel model is used to model how a sentence is translated from one language to another.
The noisy channel model assumes that there is a "true" sentence in the source language, e, and that the sentence that is actually observed, f, is a corrupted version of e. The goal of the model is to infer the most likely source sentence e given the observed sentence f, that is, to find the e that maximizes P(e | f).
The model is based on two main assumptions:
- The observed sentence depends only on the source sentence: the corruption (translation) process is fully described by the conditional probability P(f | e).
- The source sentence is generated independently of the corruption process, according to a prior probability P(e).
These assumptions, together with Bayes' rule, allow the most likely source sentence to be found as:

ê = argmax_e P(e | f) = argmax_e P(f | e) · P(e) / P(f)
The probability P(e) is called the language model, and the probability P(f | e) is called the translation model. The denominator P(f) is the same for every candidate source sentence, so it can be ignored when choosing the most likely one.
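The decision rule above can be sketched with made-up numbers. The candidate sentences and both probability tables below are purely hypothetical; the sketch only illustrates how the language model penalizes fluent-looking nonsense while the denominator P(f) plays no role in the comparison.

```python
import math

# Hypothetical translation model: P(f | e) for the observed sentence f
# given each candidate source sentence e (made-up numbers).
translation_model = {
    "the house is small": 0.20,
    "the home is little": 0.18,
    "house the small is": 0.22,
}

# Hypothetical language model: prior P(e) for each candidate.
# The ungrammatical candidate gets a very low prior.
language_model = {
    "the house is small": 0.010,
    "the home is little": 0.004,
    "house the small is": 0.00001,
}

def best_source(candidates):
    """Noisy channel decision rule: argmax_e P(f | e) * P(e).

    P(f) is constant across candidates, so it is simply dropped.
    Log probabilities are summed to avoid floating-point underflow.
    """
    return max(
        candidates,
        key=lambda e: math.log(translation_model[e]) + math.log(language_model[e]),
    )

print(best_source(translation_model))  # prints "the house is small"
```

Note that the word-salad candidate has the highest translation-model score, yet loses overall because its language-model prior is tiny; this division of labor is exactly what the noisy channel decomposition buys.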