Quality Estimation for Machine Translation

Author: Lucia Specia
Publisher: Springer Nature
Total Pages: 148
Release: 2022-05-31
Genre: Computers
ISBN: 3031021681

Many applications within natural language processing involve performing text-to-text transformations, i.e., given a text in natural language as input, systems are required to produce a version of this text (e.g., a translation), also in natural language, as output. Automatically evaluating the output of such systems is an important component in developing text-to-text applications. Two approaches have been proposed for this problem: (i) comparing the system outputs against one or more reference outputs using string-matching-based evaluation metrics, and (ii) building models based on human feedback to predict the quality of system outputs without reference texts. Despite their popularity, reference-based evaluation metrics face the challenge that multiple good (and bad) quality outputs can be produced by text-to-text approaches for the same input. This variation is very hard to capture, even with multiple reference texts. In addition, reference-based metrics cannot be used in production (e.g., online machine translation systems), where systems are expected to produce outputs for any unseen input. In this book, we focus on the second set of metrics, so-called Quality Estimation (QE) metrics, where the goal is to provide an estimate of how good or reliable the texts produced by an application are without access to gold-standard outputs. QE enables different types of evaluation that can target different types of users and applications. Machine learning techniques are used to build QE models with various types of quality labels and explicit features or learnt representations, which can then predict the quality of unseen system outputs. This book describes the topic of QE for text-to-text applications, covering quality labels, features, algorithms, evaluation, uses, and state-of-the-art approaches. It focuses on machine translation as the application, since this represents most of the QE work done to date. It also briefly describes QE for several other applications, including text simplification, text summarization, grammatical error correction, and natural language generation.
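To make the kind of QE model described above concrete, the sketch below trains a sentence-level quality predictor from a few explicit features and human quality labels, and uses it to score an unseen output with no reference translation. The feature set, the toy training data and the choice of scikit-learn's Ridge regression are illustrative assumptions, not the book's own system.

# A minimal sentence-level QE sketch: explicit features plus a regression
# model trained on human quality labels (illustrative data and features).
import numpy as np
from sklearn.linear_model import Ridge

def features(src, mt):
    """Tiny explicit feature vector: source length, output length, ratio."""
    s, t = src.split(), mt.split()
    return [len(s), len(t), len(t) / max(len(s), 1)]

# Toy training data: (source, MT output, human quality label in [0, 1]).
train = [
    ("the cat sat on the mat", "le chat était assis sur le tapis", 0.9),
    ("he opened the old wooden door", "il ouvrit la porte", 0.4),
    ("good morning", "bonjour tout le monde à tous ici présents", 0.3),
]

X = np.array([features(s, t) for s, t, _ in train])
y = np.array([label for _, _, label in train])
model = Ridge(alpha=1.0).fit(X, y)

# Predict quality for an unseen output, without any reference translation.
print(model.predict(np.array([features("see you tomorrow", "à demain")])))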

Machine Translation

Author: Junhui Li
Publisher: Springer Nature
Total Pages: 154
Release: 2021-01-13
Genre: Computers
ISBN: 981336162X

This book constitutes the refereed proceedings of the 16th China Conference on Machine Translation, CCMT 2020, held in Hohhot, China, in October 2020. The 13 papers presented in this volume were carefully reviewed and selected from 78 submissions and cover all aspects of machine translation, including preprocessing, neural machine translation models, hybrid models, evaluation methods, and post-editing.

Continuous Space Models with Neural Networks in Natural Language Processing

Author: Hai Son Le
Publisher:
Total Pages: 0
Release: 2012
Genre:
ISBN:

The purpose of language models is, in general, to capture and model regularities of language, thereby capturing morphological, syntactic and distributional properties of word sequences in a given language. They play an important role in many successful applications of Natural Language Processing, such as Automatic Speech Recognition, Machine Translation and Information Extraction. The most successful approaches to date are based on the n-gram assumption and on adjusting statistics estimated from the training data with smoothing and back-off techniques, notably the Kneser-Ney technique introduced twenty years ago. In this way, language models predict a word based on its n-1 previous words. In spite of their prevalence, conventional n-gram based language models still suffer from several limitations that could intuitively be overcome by consulting human expert knowledge. One critical limitation is that, ignoring all linguistic properties, they treat each word as a discrete symbol with no relation to the others. Another is that, even with a huge amount of data, data sparsity always has an important impact, so the optimal value of n in the n-gram assumption is often 4 or 5, which is insufficient in practice. Such models are constructed from n-gram counts in the training data, so their pertinence depends entirely on the characteristics of the training text (its quantity, how representative it is in terms of theme and date).

Recently, one of the most successful attempts to directly learn word similarities is the use of distributed word representations in language modeling, where words that are distributionally similar, i.e., that share semantic and syntactic properties, are expected to be represented as neighbors in a continuous space. These representations and the associated objective function (the likelihood of the training data) are jointly learned using a multi-layer neural network architecture, so word similarities are learned automatically. This approach has shown significant and consistent improvements when applied to automatic speech recognition and statistical machine translation tasks. A major difficulty with the continuous-space neural network approach remains the computational burden, which does not scale well to the massive corpora that are nowadays available.

For this reason, the first contribution of this dissertation is the definition of a neural architecture based on a tree representation of the output vocabulary, namely the Structured OUtput Layer (SOUL), which makes such models well suited to large-scale frameworks. The SOUL model combines the neural network approach with the class-based approach, and achieves significant improvements on state-of-the-art large-scale automatic speech recognition and statistical machine translation tasks. The second contribution is a set of insightful analyses of these models' performance, their pros and cons, and the word-space representations they induce. Finally, the third contribution is the successful adoption of continuous-space neural networks into a machine translation framework: new translation models are proposed and reported to achieve significant improvements over state-of-the-art baseline systems.
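To ground the n-gram side of this discussion, here is a minimal count-based trigram model with a simple back-off scheme (in the spirit of "stupid backoff" rather than the Kneser-Ney smoothing mentioned above, which is considerably more involved). The toy corpus and the back-off constant are illustrative assumptions.

# A minimal trigram language model estimated from counts, with back-off
# to lower-order statistics when a trigram is unseen (illustrative only).
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
total = sum(unigrams.values())

def prob(w3, w1, w2, alpha=0.4):
    """P(w3 | w1, w2): use the trigram count if seen, otherwise back off."""
    if trigrams[(w1, w2, w3)] > 0:
        return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]
    if bigrams[(w2, w3)] > 0:
        return alpha * bigrams[(w2, w3)] / unigrams[w2]
    return alpha * alpha * unigrams[w3] / total

print(prob("sat", "the", "cat"))   # seen trigram: estimated directly from counts
print(prob("mat", "the", "dog"))   # unseen trigram: backs off to bigram/unigram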

Neural Machine Translation

Author: Philipp Koehn
Publisher: Cambridge University Press
Total Pages: 409
Release: 2020-06-18
Genre: Computers
ISBN: 1108497322

Learn how to build machine translation systems with deep learning from the ground up, from basic concepts to cutting-edge research.

Linguistically Motivated Statistical Machine Translation

Author: Deyi Xiong
Publisher: Springer
Total Pages: 159
Release: 2015-02-11
Genre: Language Arts & Disciplines
ISBN: 9812873562

This book provides a wide variety of algorithms and models for integrating linguistic knowledge into Statistical Machine Translation (SMT). It helps advance conventional SMT towards linguistically motivated SMT by enhancing three essential components: the translation, reordering and bracketing models. It also promotes the in-depth study of the impact of linguistic knowledge on machine translation. Finally, it provides a systematic introduction to Bracketing Transduction Grammar (BTG) based SMT, one of the state-of-the-art SMT formalisms, as well as a case study of linguistically motivated SMT on a BTG-based platform.

Machine Translation with Minimal Reliance on Parallel Resources

Author: George Tambouratzis
Publisher: Springer
Total Pages: 92
Release: 2017-08-09
Genre: Computers
ISBN: 3319631071

This book provides a unified view of a new methodology for Machine Translation (MT). The methodology extracts information from widely available resources (extensive monolingual corpora) while assuming only the existence of a very limited parallel corpus, a starting point that sets it apart from conventional Statistical Machine Translation (SMT). A detailed presentation of the methodology's principles and system architecture is followed by a series of experiments in which the proposed system is compared to other MT systems using a set of established metrics, including BLEU, NIST, Meteor and TER. Additionally, free-to-use code is available that allows the creation of new MT systems. The volume is addressed to both language professionals and researchers. Prerequisites for readers are very limited: a basic understanding of machine translation and of the basic tools of natural language processing.
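As an illustration of how such metric-based comparisons are typically run, the snippet below scores a toy set of hypotheses against references with BLEU and TER using the sacrebleu library; this is one possible tooling choice, not the one used in the book, and NIST and Meteor would require other packages. The hypotheses and references are invented for the example.

# Scoring MT output with BLEU and TER via sacrebleu (toy data).
import sacrebleu

hypotheses = ["the cat is on the mat", "he opened the door"]
references = [["the cat sat on the mat", "he opened the old door"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
ter = sacrebleu.corpus_ter(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}, TER = {ter.score:.2f}")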

Syntax-based Statistical Machine Translation

Author: Philip Williams
Publisher: Springer Nature
Total Pages: 190
Release: 2022-05-31
Genre: Computers
ISBN: 3031021649

This unique book provides a comprehensive introduction to the most popular syntax-based statistical machine translation models, filling a gap in the current literature for researchers and developers in human language technologies. While phrase-based models have previously dominated the field, syntax-based approaches have proved a popular alternative, as they elegantly solve many of the shortcomings of phrase-based models. The heart of this book is a detailed introduction to decoding for syntax-based models. The book begins with an overview of synchronous context-free grammar (SCFG) and synchronous tree-substitution grammar (STSG), along with their associated statistical models. It also describes how three popular instantiations (Hiero, SAMT, and GHKM) are learned from parallel corpora. It introduces and details hypergraphs and associated general algorithms, as well as algorithms for decoding with both tree and string input. Special attention is given to efficiency, including search approximations such as beam search and cube pruning, data structures, and parsing algorithms. The book consistently highlights the strengths (and limitations) of syntax-based approaches, including their ability to generalize phrase-based translation units, their modeling of specific linguistic phenomena, and their role in structuring the search space.
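To illustrate the core idea behind SCFG-based translation, the toy sketch below shows how a synchronous derivation rewrites source and target sides in parallel, so that reordering (as in Hiero-style rules) falls out of the shared nonterminals. The rules and the example sentence are invented for illustration and do not come from the book.

# A minimal SCFG derivation: each rule has a source side and a target side;
# integers refer to the same child sub-derivation on both sides, so
# reordering is expressed directly in the rule (toy, Hiero-style example).
RULES = {
    "R1": (["la", 0, 1], ["the", 1, 0]),   # X -> <la X1 X2, the X2 X1>
    "R2": (["maison"], ["house"]),         # X -> <maison, house>
    "R3": (["bleue"], ["blue"]),           # X -> <bleue, blue>
}

def realize(node):
    """Expand a derivation node into (source tokens, target tokens)."""
    rule_id, children = node
    src_side, tgt_side = RULES[rule_id]
    child_pairs = [realize(c) for c in children]
    src = [tok for sym in src_side
           for tok in (child_pairs[sym][0] if isinstance(sym, int) else [sym])]
    tgt = [tok for sym in tgt_side
           for tok in (child_pairs[sym][1] if isinstance(sym, int) else [sym])]
    return src, tgt

# R1 applied to the sub-derivations for "maison" and "bleue".
derivation = ("R1", [("R2", []), ("R3", [])])
src, tgt = realize(derivation)
print(" ".join(src), "->", " ".join(tgt))  # la maison bleue -> the blue house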

Neural Machine Translation

Author: Philipp Koehn
Publisher: Cambridge University Press
Total Pages: 410
Release: 2020-06-18
Genre: Computers
ISBN: 1108601766

Deep learning is revolutionizing how machine translation systems are built today. This book introduces the challenge of machine translation and evaluation, including historical, linguistic, and applied context, and then develops the core deep learning methods used for natural language applications. Code examples in Python give readers a hands-on blueprint for understanding and implementing their own machine translation systems. The book also provides extensive coverage of machine learning tricks, issues involved in handling various forms of data, model enhancements, and current challenges and methods for analysis and visualization. Summaries of the current research in the field make this a state-of-the-art textbook for undergraduate and graduate classes, as well as an essential reference for researchers and developers interested in other applications of neural methods in the broader field of human language processing.