Imagen, Data Labeling

May 30, 2022

Tweet of the Week

I came across this tweet showing Imagen, a text-to-image generation model created by the Brain team at Google Research. The team found that large language models’ ability to encode the meaning of text transfers well to generating corresponding images. My personal favorite is the bottom-right image: an autogenerated picture of a strawberry mug.

What I Learned this Week

I learned that clever UX and UI can speed up data labeling and spare teams from having to outsource the most important step in building high-quality machine learning models. To build a customized NLP model, data teams need labeled data: thousands of text examples, each assigned to an explicit category such as positive or negative sentiment.

Since I’m an engineering team of one, I needed an efficient way to label ~10K text snippets. I used a program called Prodigy, which provides a simple UX and keyboard shortcuts. Once I was confident in my label definitions, I labeled more than 2K snippets on a single LIRR train ride! I was astonished, because it’s been drilled into my head that data labeling is an expensive, weeks-long process that you outsource to a service like Mechanical Turk. With a proper labeling UI, I saved at least 5 hours of labeling and unlocked a ton of productivity.
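To make the keyboard-shortcut idea concrete, here is a minimal sketch of a one-keystroke labeling loop in Python. It is not how Prodigy works internally: the function name `label_stream`, the key bindings, and the POSITIVE/NEGATIVE labels are all hypothetical. The point is only that each example costs a single keypress, with no mouse and no typing.

```python
from typing import Callable, Dict, Iterable, List, Mapping

# Hypothetical key bindings (not Prodigy's): one keystroke per decision.
DEFAULT_KEYMAP: Mapping[str, str] = {"p": "POSITIVE", "n": "NEGATIVE"}
SKIP_KEY = "s"


def label_stream(
    texts: Iterable[str],
    read_key: Callable[[str], str],
    keymap: Mapping[str, str] = DEFAULT_KEYMAP,
) -> List[Dict[str, str]]:
    """Show each text once and record the label chosen by a single keystroke."""
    labeled: List[Dict[str, str]] = []
    for text in texts:
        while True:
            key = read_key(text).strip().lower()
            if key == SKIP_KEY:
                break  # leave this example unlabeled and move on
            if key in keymap:
                labeled.append({"text": text, "label": keymap[key]})
                break
            # Unrecognized key: prompt again for the same text.
    return labeled


# Usage with a scripted key sequence standing in for real keypresses.
keys = iter(["p", "x", "n", "s"])
result = label_stream(["great mug", "slow train", "meh"], lambda _text: next(keys))
print(result)
```

Injecting `read_key` keeps the loop testable; in a terminal you could pass a function that wraps `input()` instead of the scripted iterator.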

Anish's Newsletter
