Treasure Hunting: Digging for Gold with Enterprise Search Engines


21 August 2009

Lately I've been reading about the Money Pit of Oak Island off the southern shore of Nova Scotia. For more than 200 years, treasure seekers have been trying to access the ancient tunnels and caverns that lie hundreds of feet below the island's surface. Every tool and technique known to man has been used to search the tiny island for treasure, from oil drilling rigs to sonar and massive excavations to cofferdams.

Treasure hunting is a lot like finding the information you need to make an important business decision. When your island is a large mass of corporate documents, you're going to need help digging through various document types, formats and languages. Think of all the uncharted volumes that professionals had to dig through just a decade ago, when entire rooms were filled with boxes of paper, and teams of lawyers and accountants sifted through them manually!

Today, of course, we have enterprise search engines to help us explore the depths of business information. As with treasure hunting, the better your tools are, the faster and easier the hunt will be. Popular web sites like Google, Amazon and eBay have set the standard for online searching and have made it much easier to find exactly what we're looking for. Internet search engines such as Google are invaluable because their spiders index so much of the public content on the internet. This places much of the world's knowledge at our fingertips.

Enterprise search engines are often far less thorough, indexing only a portion of each document (typically the first 10k tokens). Developers do this so that the search engine can ingest documents more quickly, forming a smaller index and therefore improving query performance. The justification is that users generally search for the "essence" of a document, which is usually established in its first few thousand words. This is the equivalent of a developer giving you a shovel for your work on the premise that most pirates were lazy and buried their treasure only a few feet below the surface. The problem is that you'll never find the lost treasure of the Knights Templar this way (or anything else hidden more than a few feet down, for that matter).
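To see what the shovel misses, here is a minimal Python sketch of an inverted index with an optional token cutoff. The names build_index, search and max_tokens are made up for this illustration, the cutoff is shrunk from 10k tokens to 10 so the effect is easy to see, and no particular product is claimed to work this way.

```python
from collections import defaultdict


def build_index(docs, max_tokens=None):
    """Map each term to the set of document ids that contain it.

    max_tokens=None indexes every token; an integer indexes only that
    many leading tokens of each document (the "shovel" approach).
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        if max_tokens is not None:
            tokens = tokens[:max_tokens]
        for token in tokens:
            index[token].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents whose indexed tokens include term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical document: twelve filler words, then the terms we care about.
docs = {"contract.txt": "summary " * 12 + "indemnification clause"}

shallow = build_index(docs, max_tokens=10)  # indexes only the first 10 tokens
deep = build_index(docs)                    # indexes the whole document

print(search(shallow, "indemnification"))  # []
print(search(deep, "indemnification"))     # ['contract.txt']
```

The shallow index is smaller and quicker to build, but anything buried past the cutoff simply does not exist as far as a query is concerned.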

So what should we look for when choosing a treasure-hunting tool? Beyond thoroughness, the tool should provide a way to drill down into initial search results. It should allow you to narrow your existing search by categorizing or classifying the results you have. If you brace the walls of your mine shaft, you can methodically dig your way down toward the "treasure" without having to start at the top again. You can also avoid being buried by all the information you're not looking for.

In the consumer world, websites like Amazon and eBay provide faceted navigation to help you home in on your treasure. Faceted navigation aggregates search results according to various facets, or classifications. Metadata from each result is used to group the results, and each group is shown with a count of the results it contains. You can click a grouping to restrict the results to those matching that particular facet value. If you search Amazon for "Oak Island" you'll see faceted navigation at the left for the "Department" metadata. In this case, you can click to drill into 2,601 results from the "Books" department, 3 from the "DVD" department or 230 from "Home & Garden."
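The grouping and drill-down described above can be sketched in a few lines of Python. This is only an illustration under assumed data: the result records, the "department" field and the function names facet_counts and drill_down are invented here, and real sites compute facets inside their search engines rather than over an in-memory list.

```python
from collections import Counter

# Hypothetical search results, each carrying a "department" metadata field.
results = [
    {"title": "The Oak Island Mystery", "department": "Books"},
    {"title": "Oak Island Gold", "department": "Books"},
    {"title": "Oak Island Documentary", "department": "DVD"},
    {"title": "Oak Garden Bench", "department": "Home & Garden"},
]


def facet_counts(results, field):
    """Count how many results carry each value of the given metadata field."""
    return Counter(r[field] for r in results if field in r)


def drill_down(results, field, value):
    """Keep only the results whose metadata field equals the chosen facet value."""
    return [r for r in results if r.get(field) == value]


counts = facet_counts(results, "department")
for value, count in counts.most_common():
    print(f"{value} ({count})")

# "Clicking" the Books facet narrows the existing result set in place of a new search.
books = drill_down(results, "department", "Books")
print([r["title"] for r in books])
```

Because drill_down works on the results you already have, you can apply one facet after another and keep digging down the same shaft.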

Faceted navigation is a fun and efficient way to learn more about your data, enabling you to browse information within a category. Intralinks provides faceted navigation and also uses it to power its "type ahead" feature. A user can type a couple of letters into the search field and get back a list of matching facet values. The user can then type additional letters to narrow this list further, browsing document titles with keystrokes alone.
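As a rough idea of how typing letters can narrow a list of facet values, here is a small Python sketch. It is not Intralinks' implementation: the suggest function, its limit parameter and the sample titles are assumptions made for this example, and it uses a simple case-insensitive prefix match.

```python
def suggest(facet_values, typed, limit=5):
    """Return up to `limit` facet values that start with the typed letters.

    Matching is case-insensitive; suggestions come back in sorted order.
    """
    prefix = typed.lower()
    matches = sorted(
        value for value in set(facet_values) if value.lower().startswith(prefix)
    )
    return matches[:limit]


# Hypothetical facet values: document titles in a workspace.
titles = ["Merger Agreement", "Memo to Board", "Meeting Minutes", "NDA Draft"]

print(suggest(titles, "me"))   # ['Meeting Minutes', 'Memo to Board', 'Merger Agreement']
print(suggest(titles, "mer"))  # ['Merger Agreement']
```

Each extra keystroke lengthens the prefix, so the list can only shrink as the user keeps typing.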

When evaluating an online document management solution, find out how exhaustive its indexing is and whether the provider offers faceted navigation. It's one thing to store your information efficiently, but it's another to find it again after it's buried more than a few folders deep. There are many other features to look for when considering a search engine, and I hope to cover them in future blog posts. In the meantime, don't settle for a shovel when your competition is using an oil drilling rig.

By the way, they haven't found the treasure on Oak Island (yet), but if you choose your search solution wisely I'm sure you'll get better results. Happy Hunting...