The Need for Domain-Specific Search Engines
This weekend I was back home celebrating Father’s Day with my family. My father’s a physician and he was casually scouring through medical studies and literature on the relationship between exercise capacity and cardiovascular risk as one does on a Saturday afternoon. Like many doctors, nurses, and pharmacologists, my father was researching articles on UpToDate.
UpToDate’s an app that’s one of the most trusted repositories for recent and relevant clinical information within the broader medical community. I spend most of my day building search software so naturally, I was curious about how he uses it and what he thinks of it.
After watching him use UpToDate I’m convinced that this world needs domain-specific semantic search engines.
My father was trying to search for a specific statistic - “percent improvement in survival rate for a 1 unit increase in METs?” (METs or metabolic equivalents are a measurement for exercise capacity and resting oxygen uptake in a sitting position). He knew he had seen the statistic in an article before but couldn’t remember the specific article and what section had the statistic. UpToDate’s search returns links to entire articles which can be long and dense.
It took him over 25 min to find the statistic he was looking for.
He had access to curated information he trusted, but he still couldn’t get the value from it quickly because the search couldn’t return the actual statistic only the document that might contain it. I asked, “Why not just Google it?” He responded that while Google might respond with a statistic he’d have no idea which articles the Google search algorithm pulls from.
Can Google automatically rank and curate the most relevant web pages and documents and have the same quality as thousands of volunteer community experts in a nuanced domain such as medicine? Can Google overcome not having access to non-public documents while also dealing with spam and clickbait?
Maybe but I don’t think those are the right questions to ask.
The question should be how do we help people quickly get value from the knowledge repositories they already trust. This can be UpToDate, community forums, Discord chats, a collection of PDFs, books, etc in any domain. High quality and trust are subjective and depend on the individual, company, or industry. But a universal truth is that at the end of the day people want answers from their knowledge quickly.
Luckily, the Google-like ability to ask questions of your knowledge and extract the answer or the most relevant paragraph from a large corpus of text is accessible outside of Google and available to augment nearly any text search application.
Case in point I fed some UpToDate articles and the statistic question my father had to GPT-3 and out came the answer “For each 1 MET increase in exercise capacity, there was a 12 percent improvement in survival.”
His reaction could be pretty much summarized as