How to Auto-Label Your Data Using Transformer Models

Walid Amamou
Jun 13, 2022

While many applications use off-the-shelf pre-trained models for tasks such as content generation, question answering, or generic named entity recognition, less attention has been paid to creating business-specific training datasets that make it possible to fine-tune large models on specific business problems. For AI to have a real and long-lasting impact, it has to be adopted by small and medium businesses with very distinct business problems. Using a one-size-fits-all model has proven unworkable and unrealistic.

Creating a custom training dataset is easier said than done: it requires high-quality labeled data, which is usually expensive and time-consuming to produce. Finding ways to automate the labeling process is therefore an active area of AI research. Although programmatic labeling approaches such as weak labeling have been proposed recently, their output quality remains questionable and they still require strong human supervision. For more information, check out my previous article, “Can Weak Labeling Replace Human Labeled Data?”

In this article, we will use a transformer model and a small seed annotated dataset to auto-label our data. We will then review the model’s annotations and correct any wrong labels.
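
To make the auto-label-then-review loop concrete, here is a minimal Python sketch. It is not the exact pipeline used in this article: the predictor is a hypothetical stand-in for a transformer model trained on the seed dataset, the labels SKILL and LOCATION are invented for illustration, and the confidence threshold used to prioritize review is an assumption. The prediction dictionaries mirror the keys returned by a Hugging Face token-classification pipeline with aggregation enabled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A predictor maps a text to a list of entity dicts with the keys
# "entity_group", "score", "word", "start" and "end". With Hugging Face
# transformers, one way to obtain such a callable would be
# pipeline("token-classification", model=..., aggregation_strategy="simple"),
# where the model is your own checkpoint trained on the seed annotations.
Predictor = Callable[[str], List[Dict]]


@dataclass
class PreAnnotation:
    text: str           # the document the span belongs to
    start: int          # character offset where the span starts
    end: int            # character offset where the span ends (exclusive)
    label: str          # label proposed by the model
    score: float        # model confidence for this span
    needs_review: bool  # True when a human should look at it first


def pre_annotate(texts: List[str], predict: Predictor,
                 review_threshold: float = 0.8) -> List[PreAnnotation]:
    """Turn model predictions into pre-annotations and flag uncertain ones.

    The threshold is an illustrative assumption: every label can still be
    reviewed, the flag only helps decide where to look first.
    """
    annotations = []
    for text in texts:
        for ent in predict(text):
            score = float(ent["score"])
            annotations.append(PreAnnotation(
                text=text,
                start=ent["start"],
                end=ent["end"],
                label=ent["entity_group"],
                score=score,
                needs_review=score < review_threshold,
            ))
    return annotations


def fake_predict(text: str) -> List[Dict]:
    """Hypothetical stand-in for a trained model so the sketch runs offline."""
    hits = []
    for word, label, score in (("Python", "SKILL", 0.97),
                               ("Berlin", "LOCATION", 0.55)):
        idx = text.find(word)
        if idx != -1:
            hits.append({"entity_group": label, "score": score, "word": word,
                         "start": idx, "end": idx + len(word)})
    return hits


if __name__ == "__main__":
    docs = ["Python developer based in Berlin"]
    for ann in pre_annotate(docs, fake_predict):
        status = "REVIEW" if ann.needs_review else "ok"
        print(ann.text[ann.start:ann.end], ann.label, ann.score, status)
```

Swapping `fake_predict` for a real model leaves the rest of the loop unchanged, and the flagged spans give the reviewer a natural place to start correcting labels.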

Written by Walid Amamou

Founder and CEO of UBIAI | PhD in Physics.