word by word. When a word did not match, we used the preceding n-gram to predict the new word, subject to a word penalty. In this context, we assigned a reordering (distortion) cost that depends on the number of words skipped, either forward or backward.
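As an illustration of a distortion cost that depends on the number of words skipped, the following is a minimal sketch. It assumes the exponential penalty commonly used in phrase-based decoding, alpha raised to the number of skipped source words; the text above does not state which formula was actually used. The function names, the value alpha = 0.5, and the span representation are hypothetical choices made for this example.

```python
from typing import List, Tuple


def distortion_cost(prev_end: int, next_start: int, alpha: float = 0.5) -> float:
    """Penalty for jumping from one source phrase to the next.

    Assumed form: alpha ** |next_start - prev_end - 1|, so a monotone
    step (no words skipped) costs 1.0 and every skipped word, forward
    or backward, multiplies the cost by alpha.
    """
    skipped = abs(next_start - prev_end - 1)
    return alpha ** skipped


def total_distortion(spans: List[Tuple[int, int]], alpha: float = 0.5) -> float:
    """Product of distortion costs over source spans taken in target order.

    Each span is an inclusive (start, end) pair of source word positions.
    """
    cost = 1.0
    prev_end = -1  # a first phrase starting at position 0 counts as monotone
    for start, end in spans:
        cost *= distortion_cost(prev_end, start, alpha)
        prev_end = end
    return cost


# Monotone translation: phrases are consumed left to right, no penalty.
monotone = total_distortion([(0, 1), (2, 2), (3, 4)])

# Reordered translation: jump forward over one word, then back over three.
reordered = total_distortion([(0, 1), (3, 4), (2, 2)])
```

With these assumptions the monotone ordering keeps a cost of 1.0, while the reordered one is penalized for both the forward and the backward jump. In a log-linear decoder this value would normally enter as a weighted log-score alongside the translation model, language model, and word penalty.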
The phrase-based model is one of the most successful approaches to Statistical Machine Translation, but it cannot handle the syntactic and semantic information of the target language [1, 5]. Thus, the main focus of this paper is to build an SMT system based on the phrase-based model and to use it to translate any available text or documents from English to Manipuri. This requires a large parallel corpus (bitext) of sentence-aligned text. If the corpus contains ill-formed input, that input will not be translated correctly, and this in turn degrades the translation model. Beyond this, the main aim of this paper is to increase the fluency of the translated output. Electronically available parallel text corpora for this language pair are very limited. Our work shows that, even without much parallel training data, the fluency of the translated output can still be improved.
This has been achieved by incorporating a monolingual corpus on the target side of the language pair during language model training. We also show that incorporating the monolingual corpus significantly improves the Bilingual Evaluation Understudy (BLEU) score, which is one of the key results of our experiments.

In the succeeding section, we discuss the methodology and system architecture of the PBSMT system. We then evaluate and compare the performance of our improved PBSMT system and the baseline PBSMT system using various automatic and human evaluation metrics, and subsequently analyze the various types of error generated by the PBSMT system. In Sect. 5, we conclude with a discussion of the results and observations obtained from our research findings.