
Overview
Last Updated (Friday, 29 February 2008 23:44)
SOPHIA: the new paradigm in enterprise-level search
SOPHIA Search heralds a new generation of search and retrieval technologies that supersedes legacy ideas relying on outdated approaches, such as “keyword matching”, to determine relevancy. Instead, SOPHIA searches based on meaning, giving much more insightful and relevant results.
The problem with conventional approaches
The fundamental flaw in “conventional” approaches to search is that they all make two basic assumptions:
1) They assume that we always know what we are looking for. This is a flawed premise: often it is not until we see an example of what we need that we realise exactly what we are searching for.
2) They assume that we can form good descriptors of our information needs. In reality we are particularly bad at this. Ask yourself: how often do you rephrase a query to try to improve your search results? Research has shown the average is five. We need to do this because we are poor at creating queries that adequately express our information needs. Worse, if a document doesn’t contain the particular term we used in the query, it isn’t returned, and we cannot retrieve it even if we wade through every document on the result list.
As a result, the end product is a long list of topically diverse documents, many of them irrelevant, tied together simply by the fact that they contain one or more of our query terms. Just because documents have one or two words in common with a query does not mean they are relevant to our needs. What does a document of travel tips for visiting the island of Java have in common with web programming? Very little, yet this is typical of the topically diverse lists we get from commonly used search engines. The onus is then on the user to wade through these lists, deciding which information is relevant and worth keeping and which is irrelevant and can be discarded. We have all experienced this problem.
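To make this failure mode concrete, here is a minimal Python sketch of conventional keyword matching, assuming a simple scheme that returns any document sharing at least one term with the query and ranks results by the number of shared terms. The sample documents and the function names (tokenize, keyword_search) are hypothetical illustrations, not part of SOPHIA or of any real search engine's API.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query: str, docs: dict[str, str]) -> list[str]:
    """Return IDs of documents sharing at least one term with the query,
    ranked by the number of shared terms (ties broken alphabetically by ID)."""
    query_terms = tokenize(query)
    scored = []
    for doc_id, text in docs.items():
        overlap = len(query_terms & tokenize(text))
        if overlap > 0:  # a document containing no query term is never returned
            scored.append((-overlap, doc_id))
    return [doc_id for _, doc_id in sorted(scored)]


# Hypothetical sample collection.
docs = {
    "travel": "Travel tips for visiting the island of Java in Indonesia",
    "servlets": "Writing web applications with Java servlets",
    "jsp": "Server side web programming with JSP pages",
    "php": "Building dynamic websites with PHP scripts",
}

print(keyword_search("java web programming", docs))
# → ['jsp', 'servlets', 'travel']
```

The travel article is returned alongside the two programming articles only because it shares the word "java", while the PHP article, which is clearly about web programming, is never returned because it says "websites" rather than "web".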
At SOPHIA we believe such techniques are no longer an acceptable solution in today’s marketplace. In recognition of this, some search companies have tried to improve search technology through browsing techniques that present us with many options (often based on extracted metadata) for filtering our initial results, thereby reducing the space we need to sift through. Such approaches are fine if we know what we are looking for: was the year of publication 2004, 2005 or 2006? Was the author’s name Smith, Jones or Adams? But if we don’t know this information, such techniques are of little value to us when searching.
One other problem is worth mentioning at this point, and it relates specifically to the second flawed assumption made by conventional approaches. When writing, authors use different terms to express similar ideas. For example, reporters at a football game will each describe the game in their own way, using different terminology. Unless we know all of these terms in advance when searching (which is impossible), we will inevitably miss important information about the game. Conventional approaches create and use taxonomies and ontologies to overcome this problem, but these are expensive and time-consuming to build by hand and to maintain over time. SOPHIA provides an intelligent solution that overcomes all of these problems.
The SOPHIA Solution
SOPHIA solves these problems in a very clever and intuitive way. Instead of returning results simply because they contain words in common with the query, it organises documents into thematically relevant groups based on their content. For a query such as “Java”, you can therefore see one theme related to Java the island and another for Java the programming language (and another for coffee, if your dataset also contains that information!). In other words, SOPHIA understands the different meanings or contexts that exist within your information and presents them to you, so that you can make informed, focused decisions based on your current information needs. This is very different from, and much more sophisticated and useful than, the “state of the art” browsing and filtering approaches mentioned previously, because SOPHIA presents you with semantically rich, meaningful themes that you can relate to, rather than obscure metadata that you may know little about.
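SOPHIA's own grouping algorithm is not described here. As a rough, hypothetical stand-in, the Python sketch below groups search results into themes with single-link clustering: two documents are linked when the Jaccard similarity of their word sets (ignoring stopwords and the query term itself) reaches a threshold, and linked documents form one theme. The 0.2 threshold, the stopword list, the function names and the sample documents are all assumptions chosen for illustration.

```python
import re
from itertools import combinations

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "of", "in", "and", "for", "with", "to", "on", "is"}


def content_terms(text: str, exclude: set[str]) -> set[str]:
    """Word set of a document, minus stopwords and the query's own terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS and w not in exclude}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_theme(query: str, docs: dict[str, str], threshold: float = 0.2) -> list[list[str]]:
    """Single-link clustering of result documents using union-find."""
    exclude = set(re.findall(r"[a-z]+", query.lower()))
    terms = {doc_id: content_terms(text, exclude) for doc_id, text in docs.items()}
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x: str) -> str:
        # Follow parent links to the group's representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of documents whose vocabularies overlap enough.
    for a, b in combinations(sorted(docs), 2):
        if jaccard(terms[a], terms[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for doc_id in sorted(docs):
        groups.setdefault(find(doc_id), []).append(doc_id)
    return sorted(groups.values())


# Hypothetical results for the query "java".
docs = {
    "bali": "Java island travel: beaches, volcanoes and temples near Bali",
    "jakarta": "Java island travel guide: Jakarta temples and volcanoes",
    "servlets": "Java programming: servlets, classes and the virtual machine",
    "generics": "Java programming with generics, classes and interfaces",
    "espresso": "Java coffee beans: roasting and brewing espresso",
}

print(group_by_theme("java", docs))
# → [['bali', 'jakarta'], ['espresso'], ['generics', 'servlets']]
```

Excluding the query term matters: every result contains "java", so leaving it in would inflate the similarity between unrelated senses. With it removed, the island, programming and coffee documents fall into separate themes.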
By presenting us with themes related to our query, SOPHIA removes the need to sift through irrelevant information (we can simply ignore themes that are not relevant), so we find useful information faster.
One of the most important aspects of SOPHIA is that it enables you to understand the meaning of your retrieved information in the correct context, ensuring that you interpret it correctly. That context is determined by your information collection as a whole and is therefore specific to your needs. The same information, if it were present in another person’s collection, would have a very different context and meaning. Conventional search engines do not have the ability to extract contextual meaning from your information.
SOPHIA also uncovers the implicit semantic links and complex interconnections among topically similar documents. As a result, it recognises that if you find one document interesting, you are likely to want to read others that are closely similar in semantic context. In this way SOPHIA can recommend documents to you even if they do not contain any of your query terms. This both ameliorates the problem of poorly chosen query terms and overcomes the issue of different authors using different terms to express similar ideas.
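Likewise, the following is only a hypothetical sketch, not SOPHIA's method: it recommends documents by cosine similarity between simple term-frequency vectors, so that a document the user found interesting can surface related documents that never mention the query term. The sample texts, the stopword list and the function names (term_vector, cosine, recommend) are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list.
STOPWORDS = {"the", "a", "in", "on", "for", "of", "and"}


def term_vector(text: str) -> Counter:
    """Term-frequency vector of a document, ignoring stopwords."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(seed_id: str, docs: dict[str, str], top_n: int = 2) -> list[str]:
    """Rank other documents by similarity to one the user found interesting."""
    vectors = {doc_id: term_vector(text) for doc_id, text in docs.items()}
    scores = [(cosine(vectors[seed_id], vectors[d]), d) for d in docs if d != seed_id]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first
    return [doc_id for score, doc_id in scores[:top_n] if score > 0]


# Hypothetical collection: two reports on the same game, one unrelated story.
docs = {
    "report": "Late goal seals football victory for United",
    "match": "United striker scores late goal in derby victory",
    "markets": "Stock markets fall on interest rate fears",
}

print(recommend("report", docs))
# → ['match']
```

If a keyword search for "football" finds only the "report" article, recommending from it still surfaces the "match" article (cosine similarity about 0.62), even though that article never uses the word "football"; the unrelated markets story scores zero and is filtered out.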
SOPHIA is therefore not simply a search technology: it builds meaning from your information and helps you understand it.
SOPHIA also eliminates the artificial boundaries imposed on content by engineered background knowledge structures such as taxonomies. Instead, it automatically discovers, reveals and presents the naturally occurring thematic structure that exists within a corpus. By functioning thematically, SOPHIA understands the context and intent of a user's query, delivering much more targeted and personally useful information.