Ecommerce Site Search: How it Works and Why It’s Not Good Enough
Online shopping is becoming more difficult for shoppers as more products and options are available than ever before. People now rely on search more than they did years ago, and have higher expectations for how it will work. Open source search technology designed decades ago has flooded the market with ineffective options. What are the drawbacks of this dated technology and what methods exist that are superior?
Method 1: Basic Keyword Matching
Keyword matching is the oldest and most basic form of search and retrieval available to retailers. Using this method, the search engine scans an index for matches to the keywords in a user’s query. Anything it finds is surfaced in the results.
Solutions that use this method are abundant, and many of them are inexpensive and easy to implement because the technology is old and simple.
This method produces a large number of matches for each query, but there are major drawbacks.
- Poor ranking of results (irrelevant products at or near the top)
- Mix of relevant and irrelevant products
- Complex user navigation
- Manual data optimization requirements
Drawback 1: Poor ranking of product results
Using the example of a search for “red nike running shoes”, this method would scan the index for matches to each keyword in the query.
It would return Nike running shoes, but would generally also return Nike-branded items (backpacks, t-shirts, hats, etc.), shoes from other brands, and other assorted items (laces, shoe cleaner, etc.).
Typically, search engines using this method rank products by the total number of keyword matches in the product data. Products that use the words “Nike”, “running”, and “shoes” the most times appear at the top, products with only one match appear at the bottom, and products with no matches do not appear at all.
The context matters much more than the number of occurrences. A shoe isn’t more of a shoe when the word “shoe” appears more frequently in the product data.
But laces, insoles, or shoe cleaner for running shoes may use those keywords more times than the most relevant products, and thus appear above them.
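To make the counting behavior concrete, here is a minimal Python sketch of count-based ranking. It is an illustration under stated assumptions, not the code of any particular search product: the product names and texts are invented, and keywords are matched exactly with no stemming, so “shoe” does not count as a match for “shoes”.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def count_score(query, product_text):
    """Total number of times any query keyword occurs in the product text."""
    counts = Counter(tokenize(product_text))
    return sum(counts[word] for word in set(tokenize(query)))

def rank_by_count(query, products):
    """Return (score, name) pairs with at least one match, highest count first."""
    scored = [(count_score(query, text), name) for name, text in products.items()]
    return sorted((pair for pair in scored if pair[0] > 0), key=lambda p: -p[0])

# Hypothetical product data, invented for illustration only.
products = {
    "nike-shoe": "Nike road shoe. A responsive shoe for your daily run, from Nike.",
    "laces": "Running shoes laces. Replacement laces for running shoes, fits Nike running shoes.",
    "rug": "Jay teal area rug for living rooms.",
}

ranking = rank_by_count("red nike running shoes", products)
print(ranking)  # the laces (7 matches) rank above the Nike shoe (2 matches)
```

In this toy data the laces repeat “running” and “shoes” three times each, so they outrank the actual shoe, which only matches on “Nike”.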
This does actually occur quite often as product data is typically not written to be friendly to this search method. It’s generally more important for retailers to ensure that this data is optimized for Google, and to be understandable and engaging to shoppers.
If a retailer used the default product data from Nike in this example (see above), the words “shoe” and “run” appear only once, and “Nike” appears only twice in the visible product data. Chances are fairly high that irrelevant products have a similar or even higher count of the correct keywords. For example, an Adidas running shoe using the words “run” and “shoe” two or three times each would outrank this Nike shoe.
Drawback 2: Mix of relevant and irrelevant products
In this example, an area rug was found in a search for “yellow table lamp”. Obviously, this is not a table lamp. It’s not any kind of lamp, and none of the words in the search query appear in the product title. It appears here because the words “table lamp” appear in the product’s description. Complementary items and accessories often include broader keywords in their titles and descriptions.
While relevant products were also available in the search results, this mix of relevant and irrelevant products makes it difficult for the shopper to understand whether they have seen all the yellow table lamps or not. Should they keep scrolling to find more table lamps, or will that be a waste of time?
Drawback 3: Complex user navigation
Because so many products are deemed “matches” to the user’s query, faceting and filtering options end up with a lot of noise. Often, dozens of facet groups are generated for simple searches, making navigation confusing and difficult.
And since accessories and other irrelevant products are mixed in, sorting options often become useless. Sorting by price will usually bring accessories to the top, making it hard to find less expensive versions of the product the shopper wants.
In this example, when a user searches for “macbook laptop” on BestBuy.com, the wide range of products results in cluttered faceting with options that don’t match their search. In a best case scenario, sorting by price low to high would show the least expensive Apple laptops first. Instead, they see pages and pages of accessories before they get to the laptops.
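The effect on facets and sorting can be sketched in a few lines of Python. The products, prices, and attribute names below are invented for illustration and are not taken from any real store; the point is only that every matched product contributes its own facet groups, and that a price sort puts the cheapest match first regardless of what it is.

```python
# Hypothetical matches for "macbook laptop"; names, prices and attributes are invented.
matches = [
    {"name": "MacBook Air laptop", "price": 999.0,
     "attrs": {"Screen size": "13 in", "Storage": "256 GB"}},
    {"name": "USB-C cable for MacBook", "price": 12.0,
     "attrs": {"Cable length": "2 m"}},
    {"name": "Laptop sleeve for MacBook", "price": 25.0,
     "attrs": {"Material": "Neoprene"}},
]

def facet_groups(products):
    """Collect every attribute name found on any matched product."""
    return sorted({attr for p in products for attr in p["attrs"]})

def sort_by_price(products):
    """Sort matches from the lowest price to the highest."""
    return sorted(products, key=lambda p: p["price"])

facets = facet_groups(matches)
cheapest_first = [p["name"] for p in sort_by_price(matches)]
print(facets)          # laptop facets mixed with cable and sleeve facets
print(cheapest_first)  # accessories come before the laptop
```

With only three matches, half of the facet groups already have nothing to do with laptops, and the laptop is last in the price sort.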
Drawback 4: Manual data optimization
What this ultimately means is that retailers need to manually optimize their product data for the search engine to work properly. By adding “running shoes” to the title of the Nike shoes mentioned earlier, a retailer could make it more likely that they would appear at the top. Conversely, removing the keywords “macbook” and “laptop” from electronic accessories would remove them from these results.
While this can declutter search results, it is extremely time-consuming and creates new problems. For example, if an electronics retailer were to remove mentions of “macbook” from all of their USB-C cables, people searching for “macbook cable” would have a harder time finding what they need.
Method 2: Keyword matching with custom weighting
Currently the most popular method for returning results in retail combines keyword matching with a couple of additional layers of intelligence. By requiring that a certain percentage of the query’s keywords match the product data, some products are eliminated. The remaining products are then ranked based on which data fields contain keyword matches.
For example, having matching keywords in the product title is considered to be far more important than having matching keywords in the description.
The area rug mentioned earlier would still appear in results for “table lamp”, but would appear below any products with those keywords in the title. This improves search result rankings dramatically, but doesn’t entirely solve the problem.
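The two layers described above, a minimum match percentage and per-field weights, can be sketched as follows. This is a simplified illustration, not the scoring formula of any specific engine: the weights, the 60% threshold, and the product data are all assumed values chosen for the example.

```python
import re

# Assumed values for illustration; real engines expose their own settings.
FIELD_WEIGHTS = {"title": 5.0, "description": 1.0}
MIN_MATCH = 0.6  # fraction of query keywords that must match somewhere

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def weighted_score(query, product):
    """Return a field-weighted score, or None if too few keywords match."""
    terms = tokenize(query)
    if not terms:
        return None
    field_tokens = {f: tokenize(product.get(f, "")) for f in FIELD_WEIGHTS}
    matched = {t for t in terms if any(t in toks for toks in field_tokens.values())}
    if len(matched) / len(terms) < MIN_MATCH:
        return None  # eliminated by the minimum-match requirement
    return sum(FIELD_WEIGHTS[f] * len(terms & toks) for f, toks in field_tokens.items())

# Hypothetical catalog, invented for illustration only.
catalog = {
    "lamp": {"title": "Yellow ceramic table lamp",
             "description": "A bright lamp for any room."},
    "rug": {"title": "Jay teal area rug",
            "description": "Pairs well with a side table lamp."},
    "pillow": {"title": "Yellow throw pillow",
               "description": "Soft cotton cover."},
}

scores = {name: weighted_score("yellow table lamp", p) for name, p in catalog.items()}
print(scores)
```

Here the pillow is dropped because it matches only one of three keywords, and the rug stays in the results but scores far below the lamp because its matches are in the description rather than the title.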
Certain related but irrelevant products may no longer show at the top of search results since primary keywords do not appear in the title. However, this method still has major drawbacks:
- Unpredictably mixes relevant and irrelevant products
- Requires manual data optimization
- Data optimization can create new relevancy problems
Drawback 1: Unpredictably mixes relevant and irrelevant products together
Many products, particularly accessories, place important keywords in their titles. For example, headphones, controllers, and cables show up in searches for “Nintendo Switch” on BestBuy.com because “Nintendo Switch” has purposefully been placed in the titles of compatible products in an effort to increase exposure.
So while this method does increase the relevance of search results some of the time, it’s definitely not a guarantee.
We used this method for years, but found that it was unsatisfactory for many of our clients despite the many custom modifications we made for individual stores.
Drawback 2: Requires product data optimization
For this method to work properly on most stores, product data needs to be manipulated to prevent irrelevant products from being mixed with relevant products. In the example shown above, the words “Nintendo Switch” would need to be removed from the titles of the compatible accessories.
Additional “tweaks” could be made, such as adding extra instances of the words “nintendo switch” to the product data of the console and first-party accessories to make them appear above third-party accessories, cables, and games. Managing this can become very complicated, very quickly.
Drawback 3: Data optimization can create new problems
If you do decide to manually optimize your data in order to improve results, this can create new problems. For example, if you were to weight the product title field more heavily, relevant products without the correct keywords in the title may appear below accessories. Many apparel retailers know that the product type (e.g. shoes, jeans, etc.) doesn’t always appear in the product title. Nike shoes often don’t contain the word “shoes” in their name, so you will have to add it manually. If you’re adding new products to your store consistently, data optimization will have to be part of the initial merchandising process.
Method 3: Product awareness
The final method is the newest approach to relevant ecommerce search. It uses custom weighted keyword matching (method 2) as a baseline, but improves upon it with an additional layer that aims to understand product data and searches on a deeper level.
Benefit 1: Greatly improved relevancy
In addition to keyword matching, this method scans and interprets product data and search queries.
In this example, it would read the title of the Jay Teal Area Rug (pictured above) and understand that its product type is “area rug”. Search queries are read in a similar manner, so that when someone searches for “yellow table lamp”, the search engine understands that they want a table lamp.
Combining the language understanding in both the product data and the search query means that irrelevant products are eliminated from the search results or pushed to the very bottom. There are times when anomalous product data can create problems, but with properly formatted data, no additional management is required for this method to work properly.
In cases where product titles are not formatted properly, such as when the type of product is not in the title, the engine falls back to keyword matching with custom weighting. During our integration process, we do look for this and recommend changes to the data structure in order to improve relevancy. In other cases, we will use other data fields or create custom data fields in order to make this method work best.
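As a rough illustration of the idea, the sketch below detects a product type in both the query and each product title, then reorders an already keyword-ranked list. It is a toy under loud assumptions: the article does not describe how the engine actually interprets language, so the fixed `PRODUCT_TYPES` vocabulary and the phrase lookup here are hypothetical stand-ins, and the titles other than the Jay Teal Area Rug are invented.

```python
import re

# Hypothetical vocabulary of product types; a real engine would derive this differently.
PRODUCT_TYPES = ["table lamp", "floor lamp", "area rug", "lamp", "rug"]

def detect_type(text):
    """Return the longest known product type that appears as a phrase in the text."""
    words = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    padded = " " + words + " "
    for ptype in sorted(PRODUCT_TYPES, key=len, reverse=True):
        if " " + ptype + " " in padded:
            return ptype
    return None

def rerank(query, keyword_ranked):
    """Reorder keyword-ranked products using the product type the query asks for."""
    wanted = detect_type(query)
    if wanted is None:
        return list(keyword_ranked)  # nothing understood: keep the keyword ranking

    def bucket(product):
        found = detect_type(product["title"])
        if found == wanted:
            return 0                      # the type the shopper asked for
        return 1 if found is None else 2  # unknown type keeps keyword order; wrong type goes last

    return sorted(keyword_ranked, key=bucket)  # stable sort preserves order within buckets

# Hypothetical keyword-ranked results for "yellow table lamp".
keyword_ranked = [
    {"title": "Jay Teal Area Rug"},
    {"title": "Yellow Ceramic Table Lamp"},
    {"title": "Sunny Glow"},
]

reranked = [p["title"] for p in rerank("yellow table lamp", keyword_ranked)]
print(reranked)
```

The product with no recognizable type in its title (“Sunny Glow”) is neither promoted nor pushed to the bottom, which mirrors the fallback to weighted keyword matching described above.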
Benefit 2: Greatly improved merchandising
Since search engines using this method are able to determine which products match the user’s query, merchandising is greatly improved.
In cases where relevant and irrelevant products are mixed by other methods, such as when accessories for laptops are mixed with the laptops themselves, merchandising is not really feasible. Automatically merchandising that page so that the best rated products are at the top may result in the top products being cables, laptop bags, or wireless mice instead of laptops.
Since this method is able to determine which products match the product type the user wants, it is able to separate out the accessories, making it possible to merchandise based on rules without the risk of making the page completely irrelevant.
This can be applied to individual search queries, category pages, or even globally across the entire store.
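A small Python sketch shows why separating product types makes rule-based merchandising safe. It assumes each product already carries a detected `type` field, which is a hypothetical field name, and the products and ratings are invented for illustration.

```python
def merchandise(products, wanted_type, rule_key):
    """Apply a merchandising rule to products of the wanted type; list the rest afterwards."""
    primary = [p for p in products if p["type"] == wanted_type]
    rest = [p for p in products if p["type"] != wanted_type]
    return sorted(primary, key=rule_key, reverse=True) + rest

def by_rating(product):
    """Merchandising rule used in this example: best rated first."""
    return product["rating"]

# Hypothetical results for a laptop search; ratings are invented.
results = [
    {"name": "Laptop A", "type": "laptop", "rating": 4.2},
    {"name": "USB-C cable", "type": "cable", "rating": 4.9},
    {"name": "Laptop B", "type": "laptop", "rating": 4.6},
    {"name": "Wireless mouse", "type": "mouse", "rating": 4.8},
]

# Rule applied to everything: accessories take over the top of the page.
naive = [p["name"] for p in sorted(results, key=by_rating, reverse=True)]

# Rule applied only to the product type the shopper wants.
type_aware = [p["name"] for p in merchandise(results, "laptop", by_rating)]

print(naive)
print(type_aware)
```

The same “best rated first” rule produces a page led by a cable and a mouse in the naive case, and a page led by laptops once the accessories are separated out.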