Quality Management in Data Collection for NLP: 3 Things to Consider

In the last few years, we’ve seen a growing demand for data collection projects. These projects typically involve translating large quantities of language data, which is then used to train Natural Language Processing (NLP) or machine learning engines. Data collection projects need to be managed according to specific requirements, and at Argos we have adapted our language quality assurance (LQA) strategies to meet this need. Below, we’ve collected several tips for data collection quality management that we have found effective.

Translation quality management 101

Most localization experts are familiar with translation quality concepts, but the process of running a language quality program can often be unclear to their clients. By analogy with manufacturing, LQA can be described as a sampling process whose goal is to prevent defects from reaching customers. A sample of a finished product is tested, and non-conformances may be detected. This is usually followed by an attempt to identify the causes of any non-conformances and the adoption of preventive measures. Replace a batch sample of a product with a selection from a translated text and you have an idea of how LQA works. In an LQA, a text sample is reviewed by an independent assessor who reports errors (non-conformances) and assigns an error category and severity to each. This is where the manufacturing analogy ends: LQA is an art as much as a science.

LQA can be described as an attempt to operationalize quality in the very human activity of translation. It proceeds on the assumption that language quality is something that can be assessed and evaluated: that any two (or more) professional linguists can agree on whether a translation meets certain criteria.

As in manufacturing, quality assurance in translation is about strategically managing risk. Language quality managers typically focus on:

  • Appropriate sample size and consistent quality. A sample that’s too small may not represent the general quality of the project. For example, it might miss the more challenging parts of the document. On the other hand, a sample that’s too large will affect the deadline and project margin, and still may not provide a clear picture of quality.
  • Quality of the LQA review. Finding an experienced LQA reviewer is key. It’s important that the reviewer understands the subject, is briefed on all client instructions, works thoroughly, and can strike a good balance between harshness and leniency in their assessment.

Often, we manage quality by working with trusted vendors who ensure quality at the source, and through effective project management techniques such as content analysis and query handling.

The internal risks of a project, however, can be managed through quality management strategies including:

  • Sampling frequency
  • Picking representative samples. The sample should cover all content types included in the project. When translating a larger file, the sample should be drawn from different parts of the file.
  • Careful selection, vetting and training of LQA reviewers
  • LQA follow-up. Following up on issues that arise in an LQA review will make that review far more effective than an LQA with no follow-up. In order for quality review to be effective in the long term, any issues in the translation need to be addressed and preventive actions need to be taken. LQAs sometimes even detect issues in the workflow or process being employed. They can reveal issues that go beyond translation and can be addressed, for example, by updating the project instructions and reference material or by working with the client to eliminate issues in the source text.
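The "representative sample" point in the list above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not a description of any actual tooling: the segment structure, the `content_type` field and the function name `pick_representative_sample` are all hypothetical. The sketch first guarantees one segment per content type, then spreads the remaining picks evenly across the file.

```python
import random
from collections import defaultdict


def pick_representative_sample(segments, sample_size, seed=0):
    """Return sorted indices of segments to review.

    `segments` is a list of dicts in file order, each with a hypothetical
    "content_type" key. Coverage of every content type takes priority, so
    the result can exceed `sample_size` if there are more types than that.
    """
    rng = random.Random(seed)

    # Group segment positions by content type.
    by_type = defaultdict(list)
    for index, segment in enumerate(segments):
        by_type[segment["content_type"]].append(index)

    # Step 1: at least one segment from every content type.
    chosen = {rng.choice(indices) for indices in by_type.values()}

    # Step 2: split the file into equal slices and draw one segment that
    # has not been chosen yet from each slice, so the sample is spread
    # over the beginning, middle and end of the file.
    remaining = sample_size - len(chosen)
    total = len(segments)
    for i in range(max(remaining, 0)):
        start = i * total // remaining
        end = (i + 1) * total // remaining
        candidates = [j for j in range(start, end) if j not in chosen]
        if candidates:
            chosen.add(rng.choice(candidates))

    return sorted(chosen)
```

A fixed `seed` makes the selection reproducible, which can help when a sample needs to be re-created during LQA follow-up.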

Adapting LQA to data collection projects

Data collection projects pose a number of challenges. They often come with complex instructions from the clients, who want the translations to benefit machine learning models. The source sentences are often incomplete, fragmented and usually contain a wide range of subject matter, requiring the translators to perform extensive research to become familiar with the context. Text selections are often presented out of context and can be ungrammatical, colloquial, slangy or very technical with a specialized vocabulary. Finally, most data collection projects are translation-only, meaning that language service providers can’t rely on the usual 3-step workflow of translation, editing and proofreading by a second linguist.

As a result, the LQA process needs to be able to flag gaps in the translators’ understanding of project objectives and instructions, as well as gaps in the research they have done. And in determining quality standards, LQA metrics need to take into account the specifics of data collection projects (for example, standards for AI/ML training rather than for publication), the production process and the vast number of linguists involved.

The Argos advantage

As Argos has worked with data collection clients, we’ve adopted the same LQA metrics that our clients use to evaluate deliveries from suppliers. These metrics are specific to data collection requirements and are different from the ones applied to typical translation projects for reader consumption. For example, they typically discourage reviewers from logging stylistic improvements, which don’t add much benefit given the purpose and size of these projects. In some cases, Argos has developed hybrid models that draw on the client’s LQA metrics but weigh certain error categories or severities differently.
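To make the idea of weighting error categories and severities concrete, here is a minimal Python sketch of a weighted LQA score. Every number and name in it is an assumption made for illustration: the category names, the severity weights, the zero weight for style (mirroring the point that stylistic improvements are not logged) and the "penalty per 1,000 words" normalization are hypothetical, and a real project would take these values from the client's metric.

```python
from dataclasses import dataclass

# Hypothetical weights, for illustration only.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0, "critical": 10.0}
CATEGORY_WEIGHTS = {
    "accuracy": 1.0,
    "instructions": 1.5,  # deviations from client instructions count more
    "style": 0.0,         # stylistic preferences do not affect the score
}


@dataclass(frozen=True)
class LqaError:
    category: str
    severity: str


def penalty_per_1000_words(errors, sample_word_count):
    """Sum weighted penalties and normalize by the size of the sample."""
    if sample_word_count <= 0:
        raise ValueError("sample_word_count must be positive")
    total = sum(
        SEVERITY_WEIGHTS[e.severity] * CATEGORY_WEIGHTS[e.category]
        for e in errors
    )
    return total * 1000 / sample_word_count


if __name__ == "__main__":
    logged = [
        LqaError("accuracy", "major"),
        LqaError("style", "minor"),
        LqaError("instructions", "minor"),
    ]
    print(penalty_per_1000_words(logged, sample_word_count=500))
```

Changing a single entry in `CATEGORY_WEIGHTS` is enough to produce a hybrid variant of the same metric, which is the kind of adjustment described above.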

To ensure that everyone is on the same page about the parameters and objectives of each project, all LQA reviewers attend training sessions alongside the translators. We’ve found that this initial step results in a translation and review team that is aligned and thoroughly understands all client instructions, goals and the LQA process.

In our experience, managing data collection LQAs requires a very hands-on approach. The quality team often needs to provide input at different stages of the LQA process, train the reviewers further on specific aspects of the process, correct misperceptions and act as moderators when the translators and reviewers have differing interpretations of the client instructions. Argos has built up in-house expertise in this area, and we’re familiar with the typical issues and questions that arise during translation and LQA of data collection projects. With this bank of experience, we can guide the teams to resolve issues in line with the client’s expectations.

When it comes to sampling, we distinguish between data collection projects that relate to training AI-powered virtual assistants and projects aimed at training machine translation engines. The data for virtual assistants can usually be sorted according to scenarios (weather, planning, shopping lists, driving directions, music, TV, smart home, and more) and we make sure the LQA samples cover as many of these scenarios as possible. Preparing samples for virtual assistant data can require a lot of up-front work, but we can be fairly confident that the sample is representative—the non-LQAed content will consist of variations of the same commands and responses. In projects related to training machine translation engines, the language data is typically random, so the samples can be randomly selected.
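The two sampling approaches described above can be contrasted in a short Python sketch. It is a simplified illustration, not the actual selection process: the `scenario` field, the function names and the per-scenario quota are assumptions. Virtual assistant data is sampled scenario by scenario (stratified sampling), while machine translation training data is sampled at random.

```python
import random
from collections import defaultdict


def sample_virtual_assistant(utterances, per_scenario, seed=0):
    """Stratified sample: up to `per_scenario` items from every scenario.

    Each utterance is a dict with a hypothetical "scenario" key.
    """
    rng = random.Random(seed)
    by_scenario = defaultdict(list)
    for item in utterances:
        by_scenario[item["scenario"]].append(item)

    sample = []
    for scenario in sorted(by_scenario):
        items = by_scenario[scenario]
        # Small scenarios contribute everything they have.
        sample.extend(rng.sample(items, min(per_scenario, len(items))))
    return sample


def sample_mt_data(segments, sample_size, seed=0):
    """Simple random sample without replacement for unstructured MT data."""
    rng = random.Random(seed)
    return rng.sample(segments, min(sample_size, len(segments)))
```

The stratified version needs the data to be labeled by scenario first, which is where the up-front preparation work comes from. In exchange, a rare scenario cannot be missed by chance.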

Due to the huge volumes of data involved, a large number of linguists work on these projects. The variations in translators’ work quality and the levels of difficulty in the content, however, are within reasonable limits that satisfy the clients’ quality requirements. And we can use the results of an LQA to provide extra training or clarity to a translator with a higher error rate.
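As a rough illustration of how LQA results could point to translators who may benefit from extra training, the Python sketch below computes a per-translator error rate from reviewed segments. The input format, the 5% threshold and the minimum of 20 reviewed segments are assumptions chosen for the example, not values from any real project.

```python
from collections import Counter


def translators_needing_training(reviewed_segments, max_error_rate=0.05,
                                 min_reviewed=20):
    """Return translators whose share of segments with errors is too high.

    `reviewed_segments` is an iterable of (translator, has_error) pairs.
    Translators with fewer than `min_reviewed` reviewed segments are
    skipped, because a rate based on a handful of segments is unreliable.
    """
    reviewed = Counter()
    with_errors = Counter()
    for translator, has_error in reviewed_segments:
        reviewed[translator] += 1
        if has_error:
            with_errors[translator] += 1

    return sorted(
        translator
        for translator, count in reviewed.items()
        if count >= min_reviewed
        and with_errors[translator] / count > max_error_rate
    )
```

The minimum-count check ties back to the sample size point made earlier: too few reviewed segments per translator would make the comparison unfair.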

On top of all this, a successful quality management program for data collection projects needs to be cost-effective and reliable in detecting issues, especially deviations from client instructions. Several factors need to align for this to work. It’s a very hands-on process where all parties involved (supply chain, quality, production, translators and LQA reviewers) must be flexible and open to learning and improving.
