August 23, 2023

The Little Search Engine That Couldn't

A couple of ex-Googlers set out to create the search engine of the future. They built something faster, simpler, and ad-free. So how come you’ve never heard of Neeva?

Sridhar Ramaswamy didn’t leave Google to build another search engine. At least not at first. At the close of his 15-year tenure at Google, Ramaswamy was running the company’s entire advertising division, overseeing more than 10,000 people — he knew better than most exactly how much work it took to do search well.

You almost can’t overstate how dominant Google is in search. Most studies put Google at about 90 percent of the global search market, and that number has been steadily climbing for 20 years. Google is the default search engine in almost every browser, on almost every device. We don’t search the internet; we Google it. Bing and Yahoo are the second and third largest players, and when was the last time you Binged or Yahooed anything? Google has spent its enormous political, engineering, and financial capital to keep it that way.

But what Ramaswamy also knew better than most were all the things Google couldn’t or wouldn’t do to its search engine. With billions of users and hundreds of billions of dollars to protect, Google was unlikely to ever explore huge changes to its results page, new business models, or any kind of products that might make users search less. (Ramaswamy had actually tested a feature called Google Contributor that let people pay for an ad-free experience on some sites. It didn’t work.) There was an opportunity here to make something that Google simply couldn’t or wouldn’t. So when he left the company in 2018, Ramaswamy and Vivek Raghunathan — a longtime Google and YouTube executive — co-founded a company called Neeva to build the search engine of the future.

The road was rocky, but the team at Neeva ended up building a search engine they were proud of, a search engine that came close to beating Google both by Neeva’s internal metrics and in user studies. People who tried it liked it, and Neeva had a long road map filled with ideas on how to make search even better. A little more time, and they may very well have built the future of search. But only four years in, Neeva shut down.

In a way, the brief flicker of Neeva’s existence tells everything you need to know about the last 20 years of search-engine supremacy. Building a search engine is hard. Building one better than Google is even harder. But if you want to beat Google, a better search engine is only the very beginning. And it only gets harder from there.

A search engine is both an enormously complex thing and a fairly simple idea.

All a search engine is doing, really, is compiling a database of webpages — known as the “search index” — then looking through that database every time you issue a query and serving the best and most relevant set of those pages. That’s the whole job.
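To make that "whole job" concrete, here is a minimal sketch of the classic way to build such a database, an inverted index, which maps each word to the pages that contain it. The article doesn't say how Neeva or Google structure their indexes, so treat this as an illustration only. The page URLs and text are invented, and the sketch only matches pages without trying to rank them.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into words on whitespace."""
    return text.lower().split()

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query, sorted by URL."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

# Hypothetical pages, invented for illustration.
PAGES = {
    "example.com/pasta": "easy weeknight pasta recipe",
    "example.com/bread": "easy sourdough bread recipe",
    "example.com/scores": "live football scores",
}

INDEX = build_index(PAGES)
```

Here `search(INDEX, "easy recipe")` returns both recipe pages and `search(INDEX, "pizza")` returns an empty list. Looking up a handful of words is far cheaper than rereading every page on every query, which is the point of building the index ahead of time.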

At every tiny step of that journey, though, there are huge complications that require critical and complex tradeoffs. Most of them boil down to two things: time and money.

Even if you could hypothetically build a constantly updating database of all of the untold billions of pages on the internet, the storage and bandwidth costs alone would bankrupt practically any company on the planet. And that’s not even counting the cost of searching that database millions or billions of times a day. Add in the fact that every millisecond matters — Google still advertises how long every query took at the top of your results — and you don’t have time to look over the whole database, anyway.

Building your own search engine thus starts with a surprisingly philosophical question: what makes a webpage good? You have to decide what counts as reasonable disagreement and what’s just misinformation. You have to figure out how many ads are too many ads. Sites clearly written by AI and rife with SEO garbage: bad. Recipe blogs written by a person and rife with SEO garbage: mostly fine. Porn? Sometimes okay, sometimes not.

Once you’ve had all these discussions and set your boundaries, you might identify, say, a few thousand domains that you definitely want in your search engine. You’ll include news sites from CNN to Breitbart, popular discussion boards like Reddit and Stack Overflow and Twitter, useful services like Wikipedia and Craigslist, sprawling platforms like YouTube and Amazon, and all the best recipe / sports / shopping / everything else sites on the web. Sometimes, you can partner with those sites to get that data in a structured way without having to look at each page individually; lots of big platforms make this easy and occasionally even free.

Then it’s time to turn the spiders loose. These are bots that grab the content on a given webpage, then find and follow every link on the page, index all those pages, find and follow every link, index, find, follow. (They’re called spiders because they crawl the web — get it?) Every time the spider lands on a page, it evaluates it against the criteria you set for a good page. Anything that passes gets downloaded onto servers somewhere, and your search index begins to grow.
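The grab, follow, index loop can be sketched as a breadth-first crawl. This is an assumption-heavy toy: `FAKE_WEB` is an invented, in-memory stand-in for the internet (a real spider would fetch pages over HTTP), and `is_good_page` is a placeholder for the quality criteria discussed above, not anything Neeva is known to have used.

```python
from collections import deque

# A hypothetical, in-memory stand-in for the web: each URL maps to the
# page's text and the links found on it.
FAKE_WEB = {
    "a.example": ("home page", ["b.example", "c.example"]),
    "b.example": ("recipes", ["a.example", "d.example"]),
    "c.example": ("buy cheap pills now", ["d.example"]),
    "d.example": ("sports scores", []),
}

def is_good_page(text):
    """Placeholder quality check; real criteria are far more involved."""
    return "cheap pills" not in text

def crawl(seed, web, max_pages=100):
    """Breadth-first crawl from seed; return {url: text} for pages that pass."""
    index = {}
    seen = {seed}              # URLs already queued, so none is visited twice
    queue = deque([seed])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if url not in web:     # dead link: nothing to fetch
            continue
        text, links = web[url]
        fetched += 1
        if is_good_page(text):
            index[url] = text  # only pages that pass get stored
        for link in links:     # links are followed even from rejected pages
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index
```

Starting from `a.example`, the crawl visits all four pages but stores only three, because `c.example` fails the quality check. The `seen` set is what keeps the spider from looping forever when pages link back to each other.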

Spiders aren’t welcome everywhere, though. Every time a crawler opens a webpage, the provider incurs a bandwidth cost; now imagine a search engine that is trying to load and save every single page on your website, once a second, just to make sure they’re up to date. The bill adds up.

So most sites have a file called robots.txt that defines which bots can and cannot access their content and which URLs they’re allowed to crawl. Search engines don’t technically have to respect the wishes of robots.txt, but doing so is part of the fabric and culture of the web. Nearly all sites allow Google and Bing because discoverability outweighs the bandwidth costs. Many will block specific providers, such as shopping sites that don’t want Amazon crawling and analyzing their websites. Others will set blanket rules: nobody in but Google and Bing.
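For a sense of what those blanket rules look like in practice, here is a small sketch using the robots.txt parser in Python's standard library. The file contents, the site name, and the `ExampleBot` crawler are all made up for illustration; the file lets Googlebot and Bingbot crawl everything except one directory and shuts every other bot out.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: Googlebot and Bingbot may crawl everything
# except /private/, and every other bot is blocked entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private/

User-agent: *
Disallow: /
"""

def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this file, `allowed("Googlebot", "https://shop.example/products")` is `True`, while the same URL for `ExampleBot` is `False`. Nothing enforces the answer, though. A polite crawler checks first and stays out; the file itself cannot stop a rude one.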

It doesn’t take long for your crawlers to come back with a pretty broad snapshot of the internet. As the Neeva team was in the midst of its transition away from Bing, its spiders were crawling about 200 million URLs a day.

Next, the job is to rank all those pages, in order, for every single query your search engine might get. You might sort your pages by topics, into smaller and more searchable indexes rather than a single giant behemoth: local results go with local results, shopping with shopping, news with news. You’ll use a lot of machine learning to glean the topics and content of a given page, plus a lot of human help. You’ll bring in teams of raters, show them a query and a result, and ask them to rate from zero to 10 how good a result it really is. Sometimes it’ll be obvious: if someone searches “Facebook” and the first result isn’t Facebook, something is clearly wrong. But most times, you’re merging the ratings from lots of inputs, feeding them back into your index and your topic model, and repeating the process all over again.
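The simplest possible version of merging those ratings is to average them and sort. The sketch below does exactly that with invented scores. It is a deliberately naive stand-in: a production engine would more plausibly use rater scores as training data for a ranking model, and nothing here reflects how Neeva actually combined its inputs.

```python
from collections import defaultdict
from statistics import mean

def rank_results(ratings):
    """ratings: (query, url, score from 0 to 10) tuples.

    Return {query: [urls, best average score first]}.
    """
    scores = defaultdict(lambda: defaultdict(list))
    for query, url, score in ratings:
        scores[query][url].append(score)
    ranked = {}
    for query, by_url in scores.items():
        averages = {url: mean(vals) for url, vals in by_url.items()}
        # Highest average first; ties broken alphabetically by URL.
        ranked[query] = sorted(averages, key=lambda u: (-averages[u], u))
    return ranked

# Hypothetical rater scores, invented for illustration.
RATINGS = [
    ("facebook", "facebook.com", 10),
    ("facebook", "facebook.com", 9),
    ("facebook", "en.wikipedia.org/wiki/Facebook", 6),
    ("facebook", "example.com/fb-tips", 2),
    ("facebook", "example.com/fb-tips", 4),
]
```

Here `rank_results(RATINGS)["facebook"]` puts `facebook.com` first, with an average of 9.5, ahead of the Wikipedia page (6) and the tips page (3).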

All this is really only half the problem, too. You have to simultaneously improve what’s known as “query understanding” so that you know people who search for “The Rock” and “Dwayne Johnson” are looking for the same thing, but those who search for “the rock” and “rock” probably are not. You’ll end up with a huge library of synonyms and similarities and ways to rewrite queries to be more searchable. But Google likes to say that every day, 15 percent of searches are brand new, and so you’ll forever be learning new things about how people look for things online.
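A toy version of that synonym library might look like the sketch below: a lookup table that rewrites known aliases to one canonical query and leaves everything else alone. The table and its case-sensitive matching are assumptions made for illustration. Real query understanding leans on much richer signals, not least because most people type in lowercase.

```python
# Hypothetical alias table: exact, case-sensitive phrases that name an entity.
ALIASES = {
    "The Rock": "Dwayne Johnson",
    "Dwayne Johnson": "Dwayne Johnson",
}

def rewrite_query(query):
    """Return the canonical entity for a known alias, else the lowercased query."""
    cleaned = " ".join(query.split())  # collapse stray whitespace
    return ALIASES.get(cleaned, cleaned.lower())
```

With this table, “The Rock” and “Dwayne Johnson” both rewrite to the same query, while “the rock” and “rock” stay separate, generic searches.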

You’ll launch to the public after a while and start getting even more data on what people click on and care about. (A clicked link, followed by no more immediate searches and clicked links, is the best signal in the biz.) The more they click, the more you know about what they’re actually looking for.

To run a search engine is to constantly triangulate between speed, cost, and quality. You could search the whole database every time someone types “YouTube” and hits enter, but that search will take too long and use too much bandwidth and storage. You could have a database the size of the internet, but it would be far too expensive to store and too slow to search. You could limit yourself to the 100 most popular sites on the web, but that’s not much use to anyone. Websites change all the time, too, so your crawlers and ranking systems have to be constantly adapting.

It’s hard and expensive to build a search engine from scratch. That’s why many don’t — they license Bing’s data for between $10 and $25 per 1,000 transactions, add their own features and interface, and call it a day. That’s what DuckDuckGo, Yahoo, and most other smaller search engines do because Bing is pretty good and managing your own search system is a huge amount of work. It’s what Neeva did, too, at the start.
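Those per-transaction prices add up quickly. As a rough back-of-the-envelope sketch, assume a hypothetical engine handling one million queries a day, with each query counting as one transaction. Both the volume and that one-to-one mapping are assumptions, not figures from Neeva or Bing.

```python
def daily_license_cost(queries_per_day, price_per_thousand):
    """Dollar cost of licensing results for one day's worth of queries."""
    return queries_per_day / 1000 * price_per_thousand

# Hypothetical volume: one million queries a day.
LOW = daily_license_cost(1_000_000, 10)   # at $10 per 1,000 transactions
HIGH = daily_license_cost(1_000_000, 25)  # at $25 per 1,000 transactions
```

Under those assumptions, the bill runs from $10,000 to $25,000 a day, before a single engineer is paid.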

But Neeva had so many ideas about how to overhaul search that it ultimately decided it needed to control the underlying data, too. “Faster search, rich previews, preferred providers, personal search, all hit walls,” Raghunathan says. The links that came from Bing’s API didn’t allow for these extra features, and so Neeva couldn’t build them. If Neeva wanted to be a better search engine, at some point, it was going to have to build its very own better search engine.

After two years of building, training, refining, re-training, and re-refining, Neeva’s search engine was finally powered entirely by its own technology. To be clear, Neeva didn’t yet think it had built an unambiguously better search engine: at one point, the company took 500 or so queries of different types, asked human raters to compare the results, and discovered that Google still came out slightly ahead. But Neeva was getting close and was confident that it had a big lead in user experience.

Neeva’s plan started from a single insight: Google’s business model was the problem. The advertising model, Ramaswamy thought, would not produce good content in the long term.

Think about it — if a search engine works really well, you’re only searching once (and being served ads once). The ads, too, dilute the quality of a search. When you type something into Google, you’re looking for something. Google’s first order of business is to show you something someone else wants you to see; its second order of business is to show you what you want.

Making a better search engine meant changing the incentives. Ramaswamy figured that if you weren’t focused on showing as many ads as possible, you could put the user experience first. You wouldn’t need to keep people typing queries, and you wouldn’t need to collect user data for advertisers. You could just help people get where they’re going and get out of the way.

The Neeva team built shopping pages with bigger images and helpful comparison information. They prioritized human-created results from places like Reddit and Quora. Sports searches became beautiful, full-screen scoreboards. They made it so that if you were searching for “Brad Pitt IMDB” or “WhatsApp web,” Neeva’s autocomplete would take you right to the website without landing on a results page at all. Neeva was clean and simple, and early users said they liked not being tricked into looking at ads.

Over the two years it took Neeva to build its own search index, it also continued work on its browser for mobile devices and began investing heavily in AI. A side effect of building your own search index is that you’ve also just collected a hugely useful set of training data for large language models. Neeva was among the first companies to launch an AI search companion, known as NeevaAI, that would summarize search results and sometimes attempt to answer your question right at the top of the page.

But it’s one thing to build a good product; it’s entirely something else to get users to try it — especially if they have to quit the easiest and most ingrained thing on the internet to do so.

It’s a long-stated and well-earned cliche in the tech industry that people don’t change their default settings. Whether it’s privacy controls, system features, or apps, there’s nothing more powerful than whatever’s already there. And in many cases, the companies that control those default slots will do almost anything to stay there.

“Solving the default use case is one of the biggest hurdles we have,” Ramaswamy told me early on. “People forget that Google’s success was not a result of only having a better product. There were an incredible number of shrewd distribution decisions made to make that happen.”

Google reportedly pays Apple as much as $15 billion a year to be the default search engine in Apple’s Safari browser on various devices. Google also pays Mozilla to be the primary search engine in the Firefox browser — reported to be upwards of $450 million a year. It has similar deals with other device makers and browser developers, even with wireless carriers. Samsung briefly explored ending its deal with Google in 2023 but decided against it for various reasons, including “the impact on its wide-ranging business relations with Google,” The Wall Street Journal reported.

Google’s real advantage is its other products. Android is the most popular mobile operating system on Earth, commanding about 78 percent market share. Chrome is the most popular browser, at about 62 percent. Google is the near-impenetrable default search engine on both platforms.

For years, any company that wanted to make a phone or tablet that could run Google apps like Maps and YouTube had to sign a contract known as the Mobile Application Distribution Agreement. (In practice, this covers pretty much all Android phones.) The MADA governed how Google’s apps were to be loaded and shown on any covered Android device, and it always gave Search pride of place.

“Google Phone-top Search must be set as the default search provider for all Web search access points on the Device” unless Google gave explicit approval otherwise, said one agreement with HTC that was entered into evidence in Oracle’s 2010 lawsuit against the company. HTC was also required to place a search widget no more than one page away from its devices’ homescreen.

“[Former Google CEO] Eric Schmidt said ‘competition is one click away,’” says Josep Pujol, the head of search at Brave, another company building its own search engine from scratch. “But it’s not. It’s one click and $14 billion away.”

This state of affairs has come under serious regulatory scrutiny in recent years. In 2018, the European Commission fined Google €4.34 billion for breaching EU antitrust rules, citing what the EC called “illegal restrictions on Android device manufacturers and mobile network operators to cement its dominant position in general internet search.”

Following that ruling, a new screen appears for most users in Europe and the UK when they first set up an Android phone or tablet. “Choose your search provider,” it says before offering a list of available options.

Most of the search engines that made it onto this list — a list, by the way, controlled by Google, which initially charged companies that wanted to appear on it — saw no meaningful increase in users. People trying to get through setup as quickly as possible tend to pick the most familiar option — like the option that already has a 90 percent market share.

It’s difficult to overcome that inertia, even without additional friction. And there’s plenty of that to go around. DuckDuckGo once found that it took 15 taps to switch the default search engine on Android.

Similarly, on iOS, a search engine provider can’t just add itself to Safari’s list of search engine options. If you’re anyone other than the five built-in options (Google, Yahoo, Bing, DuckDuckGo, and Ecosia), the only way to get onto the iPhone is to build your own browser app. Building a mobile browser, of course, is a huge allocation of resources when you’re a small startup like Neeva. And once you have the browser, you have another problem. Convincing users to switch their default settings is already hard, but on mobile, you also have to convince users to download an app to replace an app they already have.

The process should have been easier on desktops, where there are fewer platform restrictions. Neeva tried to make switching as simple as possible: on a Mac or PC, all a user had to do was install a browser extension, and Neeva would become the default search engine. (The extension also provided tracking protection and other features.) Other search engine providers have tried building their own extensions as well. But users who install these extensions in Chrome get a pop-up asking if they want to “Change back to Google Search?” The “Change it back” button is a bright blue, “Keep it” a dim white.

Early on, Neeva discovered that if it could get a new user to get past that scary pop-up and actually start using the search engine, they were overwhelmingly likely to be still using it three months later. Some users who tried Neeva were even willing to pay a few bucks a month for a saner search experience.

If people went through all the bother of switching, they became converts; the problem was that very few of them managed to make it past the thicket of default settings and redirections. Ramaswamy and his team tried many times to find the thing that would convince users to get through the initial hassle. The privacy-focused pitch worked for a few users but was never going to be a mainstream win. The AI features garnered some buzz, but that faded as Bing and Google and others rolled out similar stuff.

Ultimately, Neeva was a product you had to try to understand. I used it as my primary search engine for a few years and really appreciated things like the redesigned sports score pages and the prioritization of Reddit and other sources. (Also, no ads. Loved that.) But it was hard to explain to others how nice it felt to go straight to a website from the autocomplete window instead of having to run your query or how much better its rich recipe pages were than the infinitely identical links on a Google page. Seeing is believing, and the state of the search market had successfully kept Neeva in the dark.

If anything changes, it’ll probably start with the regulators. Since the EC’s judgment in 2018, the US Justice Department has also sued Google on anti-competitive grounds, alleging that Google’s distribution agreements with device manufacturers and browser developers “foreclose distribution to Google’s search rivals, weakening them as competitive alternatives for consumers and advertisers by denying them scale.”

Google has argued in response that users and partners choose Google because it’s the best product available and that default choices are not exclusionary. “We compete fiercely in a fast-moving and dynamic space, investing billions of dollars in research and development and making thousands of quality improvements every year to ensure we’re delivering the most helpful results, free to everyone,” says Ned Adriance, Google’s policy communications manager. “Like countless other businesses, we pay to promote our services, just as a cereal brand might pay a supermarket to stock its products at the end of a row or on a shelf at eye level. But in each case, consumers can and do easily access alternatives if that’s what they want.”

If Google’s default dominance does come undone, competitors like DuckDuckGo and Brave think they’ll grow fast. Many of those competitors think there’s nothing to do but wait. “If we are able to survive long enough, there will be a tipping point where the distribution of Google will break or be broken,” Brave’s Pujol says. “Whenever this condition happens, we must be ready.”

Neeva couldn’t afford to wait. In May of 2023, the company announced it was closing down its search engine for good. As the economy soured and investment dollars dried up, Ramaswamy and his team decided that “there is no longer a path towards creating a sustainable business in consumer search.” This is, of course, not strictly true: Google’s consumer search business generated about $160 billion in revenue last year. The problem for Neeva and every other would-be competitor is that there’s simply no room left for anyone else. (Neeva was ultimately acquired by the business software giant Snowflake, pivoting to AI entirely.)

Neeva had done the hard work. It was running an AI product, a full-stack search engine, and a privacy-first browser, all on a startup’s budget. But it wasn’t enough.

Because even if you make every correct decision, take no shortcuts, nail the criteria, perfect the index, and build the best search engine ever created, it probably wouldn’t matter. Right now, at least, you still can’t beat Google.

Copyright 2023 Vox Media, LLC. All rights reserved. By David Pierce.
