Updated Proposal

Date: May 6, 2015

Project: TMS_Pilot_Samsung_Project

Content: Samsung Products Manuals

Languages: English to Simplified Chinese

From: Junpeng Qiao, Guanghan Liu, Yu Liang, Xinhui Du

Project Objective

  • Quality Goals

Zero tolerance:

1) Terminology Accuracy

Terminology accuracy comes first when measuring the quality of the machine translation of our user manuals. By accuracy we mean both the correctness of each term’s translation and the consistency of terms throughout the output of the trained machine translation system. We adopt zero tolerance for terminology errors.

2) Information Completeness

For the complete representation of the information in the original texts, our goal is that there be no omissions, additions or untranslated text. This is distinct from the question of whether content should be translated or information should be added for culture-bound texts. Because the texts are mostly technical, none of the above errors should occur.

Tolerance:

3) Register

Under register we measure whether slang or taboo words are used in the translation. We tolerate no such words in the machine translation output.

4) Style

The texts to be translated in the system are user manuals, so in general we require a formal tone in the translation, with formal vocabulary and concise sentence structure. Still, it is acceptable if 2-3 paragraphs are not translated in a formal manner.

5) Design

By design we mean factors such as layout, local formatting, graphics and tables. Generally speaking, if post-editors break the consistency of fonts, indentation or leading, we will not count these as severe errors and will fix them later.

  • Timing Goal

Our tentative timing goal for this project was to reduce the required translation time by 30% – 50%. However, after conducting the pilot project, we found that the actual post-editing speed is much higher (75% faster).

  • Pricing Goal

There are several factors to consider when developing pricing goals for a PEMT project:

  1. Purpose

Since this project translates product manuals in the form of a large knowledge database, we expect a fast post-edit rather than a full one.

  2. Language pair

Unlike Western languages, Chinese differs greatly from English in grammar, and therefore places higher demands on the MT output.

  3. MT system

The content we will translate and the content we import as training data are closely related, so we expect a better result within this specific domain.

In light of these factors and a human editing test, we suggest pricing PEMT at around 50% of the human translation rate.

  • Tentative PEMT timing and pricing goals vs. human translation:

        Daily Workload                  Price
HT      2,500 source words              $0.2 / word
PEMT    10,000 source words (400%)      $0.1 / word (50%)
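
The figures in the comparison can be cross-checked with a short calculation. The following is a minimal sketch in Python; the function names and the 200,000-word example volume used with it are illustrative assumptions, and only the daily workloads and per-word prices come from this proposal.

```python
import math

# Figures from the HT vs. PEMT comparison in this proposal.
HT_WORDS_PER_DAY = 2_500
PEMT_WORDS_PER_DAY = 10_000
HT_PRICE_PER_WORD = 0.2     # USD
PEMT_PRICE_PER_WORD = 0.1   # USD


def throughput_ratio(pemt_words: int, ht_words: int) -> float:
    """Daily PEMT workload relative to HT (4.0 means 400%)."""
    return pemt_words / ht_words


def time_reduction(pemt_words: int, ht_words: int) -> float:
    """Fraction of working time saved per source word."""
    return 1 - ht_words / pemt_words


def days_needed(total_words: int, words_per_day: int) -> int:
    """Whole working days one linguist needs for a given volume."""
    return math.ceil(total_words / words_per_day)


def client_price(total_words: int, price_per_word: float) -> float:
    """Price charged to the client, rounded to cents."""
    return round(total_words * price_per_word, 2)


if __name__ == "__main__":
    volume = 200_000  # hypothetical project size, for illustration only
    print(throughput_ratio(PEMT_WORDS_PER_DAY, HT_WORDS_PER_DAY))
    print(time_reduction(PEMT_WORDS_PER_DAY, HT_WORDS_PER_DAY))
    print(days_needed(volume, HT_WORDS_PER_DAY), days_needed(volume, PEMT_WORDS_PER_DAY))
    print(client_price(volume, HT_PRICE_PER_WORD), client_price(volume, PEMT_PRICE_PER_WORD))
```

Under these figures, a fourfold daily workload corresponds to a 75% reduction in working time per word, which matches the speed observed in the pilot.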

Process

  • Pilot Process and Timeline

[Figure: pilot process timeline]

Notes: by “segmentation” we mean dividing our original source files, which are large PDF files of over 150 pages, into smaller files for training, testing and tuning.

The wrap-up and conclusion phase also includes a simple human editing test to see how effective PEMT is relative to its cost.

  • Costs

We have agreed on an hourly rate of $20. Multiplying this rate by the total project hours and by the 4 team members gives the total fixed cost of the project. The calculation is shown below.

$20 * 8h * 9 days * 4 members = $5,760

Based on the pricing and the fixed cost, a client project that uses PEMT needs to exceed 200,000 words in total for us to break even and also generate enough revenue for company development; below that volume, an extra charge will apply.
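
The relationship between the fixed cost and the word-count threshold can be sketched as follows. This is a minimal illustration, not the pricing model itself: it assumes that revenue comes only from the PEMT rate of $0.1 per word and that there are no per-word variable costs (such as post-editor pay), neither of which is stated in this proposal. The function names are hypothetical.

```python
import math

# Figures stated in this proposal.
HOURLY_RATE_USD = 20
HOURS_PER_DAY = 8
PROJECT_DAYS = 9
TEAM_MEMBERS = 4
PEMT_PRICE_CENTS_PER_WORD = 10   # $0.1 / word, kept in cents to avoid float drift
MIN_PROJECT_WORDS = 200_000


def fixed_cost_usd() -> int:
    """Total fixed cost of the pilot: rate x hours x days x members."""
    return HOURLY_RATE_USD * HOURS_PER_DAY * PROJECT_DAYS * TEAM_MEMBERS


def words_to_cover_fixed_cost() -> int:
    """Word count at which PEMT revenue equals the fixed cost
    (assumption: no per-word variable costs)."""
    return math.ceil(fixed_cost_usd() * 100 / PEMT_PRICE_CENTS_PER_WORD)


def surplus_usd(words: int) -> float:
    """PEMT revenue minus fixed cost for a given volume (same assumption)."""
    return words * PEMT_PRICE_CENTS_PER_WORD / 100 - fixed_cost_usd()


if __name__ == "__main__":
    print(fixed_cost_usd())
    print(words_to_cover_fixed_cost())
    print(surplus_usd(MIN_PROJECT_WORDS))
```

Under these simplifying assumptions the fixed cost alone is recovered well below 200,000 words, so the remainder of the threshold would correspond to the revenue reserved for company development and to any costs not modeled here.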

Deliverables

We have seen that adding training data generally increases the BLEU score, and each round of training takes approximately 2.5 hours. After 5 rounds of training, the output quality is quite satisfactory. We therefore expect a fully trained engine to bring a substantial reduction in both time and cost.
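
For readers unfamiliar with BLEU, the sketch below shows the idea behind the score: clipped n-gram precision combined with a brevity penalty. It is a simplified, single-reference, unsmoothed version written for illustration and is not the scoring used by the MT system in the pilot. Splitting Chinese text into individual characters and the example sentences are assumptions made here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU for one candidate against one reference.

    Both arguments are token lists. No smoothing is applied, so the
    score is 0.0 as soon as any n-gram order has no match.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = list("请按电源键开机")    # character-level tokens (assumption)
    candidate = list("请按电源按钮开机")  # hypothetical MT output
    print(sentence_bleu(candidate, reference))
```

An output identical to the reference scores 1.0 and an output with no shared characters scores 0.0; trained engines land in between, and the score tends to rise as more in-domain training data is added.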

For further training, we would manually align documents before importing them into the system, and add more specific data for tuning.