Why some NLP products take off, Semantic Search changing SEO
Tweet of the Week
This app powered by GPT-3 blew up on Twitter. I’m a developer and I absolutely hate writing regular expressions. In fact, I don’t know of a single developer who enjoys it.
In the first edition of this newsletter, I said I was most excited about applying Large Language Models to boring and narrow use cases. This app, which interprets natural language and generates RegEx, fits that description very well. There have been plenty of demos and apps built on top of GPT-3, but I haven’t seen any take off quite like this.
My sense is that there are a couple of reasons:
Writing RegEx is a frustrating experience. The syntax is complicated and ugly to look at. It’s also an incredibly useful and non-opinionated tool for parsing out patterns in text. Because RegEx is powerful but difficult to use, any large language model that improves the developer experience stands to be a major hit.
RegEx does not have any dependencies. The RegEx string that describes the patterns of text you want to extract is the only thing needed to build the program. SQL queries are also valuable and can be complicated to write, but my hunch is that LLMs aren’t as useful there because you have to know too much about the database schema and how the data was collected in the first place. Querying a database involves many tasks and dependencies beyond writing the SQL query itself.
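To make the "no dependencies" point concrete, here is a minimal sketch in Python. The date pattern and the sample sentence are my own illustrative assumptions, not output from the app in the tweet. Notice that the pattern string carries the entire logic, and nothing about a schema or the data's origin is needed:

```python
import re

# Hypothetical pattern for illustration: dates written as YYYY-MM-DD.
# The pattern string is the whole "program"; no schema or setup is required.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_dates(text: str) -> list[tuple[str, str, str]]:
    """Return a (year, month, day) tuple for every YYYY-MM-DD date in text."""
    return DATE_PATTERN.findall(text)


# Made-up sample text.
sample = "Invoice issued 2021-03-15, due 2021-04-14."
print(extract_dates(sample))
```

A natural-language request such as "find every date formatted like 2021-03-15" maps onto that single pattern string, which is why the task is so easy to hand to an LLM.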
To summarize:
Popularity of LLM application = (value of the automated task) x (difficulty of the task) x (how isolated and narrow the task is)
Based on this formula, I predict that an LLM application that helps build web scrapers will also take off. Given an entire dump of a web page’s HTML, everything you need to scrape is there (isolation) and it’s time-consuming to write the parsing code to traverse the HTML (difficulty). It would be neat to ask an LLM to generate the code to extract data from a table rather than getting lost in the HTML.
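As a rough idea of the parsing code such a tool would need to generate, here is a sketch that pulls the rows out of an HTML table using only Python's standard library. The class name, function name, and sample page are hypothetical, and the sketch assumes a simple, well-formed table with no nesting:

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_table(html: str) -> list[list[str]]:
    """Parse an HTML string and return its table rows as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up page used purely as a placeholder.
page = """
<html><body>
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>4.50</td></tr>
  <tr><td>Gadget</td><td>12.00</td></tr>
</table>
</body></html>
"""
print(extract_table(page))
```

Even for a table this small, the bookkeeping is tedious to write by hand, which is the "difficulty" factor in the formula.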
Use case of the Week - SEO
Traditional SEO involves conducting keyword research, figuring out every possible way to include those keywords on web pages, and aiming to be the top result for that keyword on Google searches.
However, Google’s search is now based on semantic search, meaning the underlying concepts of the text, the user’s intent, and the query context are far more important than they used to be. Website content that’s written to answer the intention behind user queries will be rewarded. For example, you should aim to have the same content returned when a user searches “NYC apartments available for rent” and “Apartments in New York City currently open for renting”. Creating separate content to target singular, plural, and other variations of keywords helps less than it did years ago.
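To see why literal keyword matching struggles with that pair of queries, here is a toy sketch that measures how many words they share. It is not how Google ranks anything. The function names and the choice of Jaccard similarity are my own assumptions, used only to show how little the two queries have in common on the surface:

```python
import re


def tokens(query: str) -> set[str]:
    """Lowercase the query and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", query.lower()))


def keyword_overlap(a: str, b: str) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


q1 = "NYC apartments available for rent"
q2 = "Apartments in New York City currently open for renting"

print(sorted(tokens(q1) & tokens(q2)))    # ['apartments', 'for']
print(round(keyword_overlap(q1, q2), 2))  # 0.17
```

The two queries mean the same thing yet share only two of twelve distinct words, so a purely keyword-based system would treat them as mostly unrelated. Semantic search is meant to close that gap.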
Google’s push to deeply understand user query intent started in 2015 with RankBrain, and in 2019 Google rolled out BERT to improve that capability. Keywords still matter, but gaming tactics such as keyword stuffing will be rewarded less as more queries are handled by Google’s semantic search technology.
For anyone who manages SEO strategy or implementation, this means understanding the user intent around your product and brand is crucial. For example, is the user trying to get to a specific product page on your site or researching a specific topic? Think about the questions or Google searches your users will enter to find your site online, and structure your content to be answer-based and easily understandable, like an FAQ.