Some people really love using big words. I am not sure what drives this passion, whether it is genuinely the way their minds operate or whether they are just trying to impress. I always suspect it is the latter. I prefer simple language that everyone can understand. That said, explaining “complex things simply” is very difficult to do. John Bohannon famously gave a TED talk on Dancing your PhD where he proposes that in order to really be able to explain your PhD thesis you should be able to boil it down to really simple ideas that can be portrayed through dance movements! Quite a challenge, but I strongly recommend you watch the talk.
I have firsthand experience of this challenge at Sophia, where we have developed software that can read and understand the meaning within written text (such as web pages and Word files) and uncover unknown relationships that exist. The ideas behind the software are based on two things: 1) semiotics and 2) intertextuality. Huuhhhh?? These are not words most people can identify with, so let me try to explain.
The science behind semiotics was originally developed to explain how we understand the meaning of signs and symbols. The meaning of any symbol depends on its setting. For example, the thumbs-up gesture in most Western cultures is a positive sign and means good or well done. However, you don’t want to make this gesture in some Middle Eastern countries, where it is extremely insulting. Same sign but different setting equals different meaning. Linguists then took these principles and applied them to language to explain how meaning is assigned to words and how the same word can take on different meanings depending on its context within a sentence. Java is a good example of this. It can be an island, a beverage, or a programming language depending on the setting. Sophia draws on these concepts in its complex algorithms to understand individual word meaning within documents.
What is intertextuality? Intertextuality is all about understanding the meaning of a text at a higher (document) level. To truly understand the meaning of a text you need to have an understanding of other texts that are similar, or that are either directly or indirectly referred to by it. For example, research papers often reference other published papers that serve as background or context for the new study. To fully understand the meaning of the new paper you need to be familiar with the older studies.
Intertextuality is everywhere. A great example is seen in an episode of the Simpsons where they did a spoof of Hitchcock’s movie Psycho. They did a reenactment of the famous shower scene with Homer replacing Janet Leigh. What has this to do with intertextuality? Well, if you hadn’t seen the original movie the humor of the entire episode would have been lost on you. In order to understand the meaning and the humor of the episode you needed to have seen the movie. This is intertextuality. Now how do I explain this in dance?
As many of you know, we launched Sophia For Advertising at ad:tech SF 2013. ad:tech San Francisco 2013 may be over, but the conference session videos are now available online! Revisit your favorite sessions or view the sessions you weren't able to attend. All of the sessions are at your disposal, complete with PowerPoint presentations.
Accessing the videos is easy.
Enjoy the presentations, and let us know what you thought of them.
Here is a new video on Sophia, its investment, and our friends at the Northern Ireland Science Park in lovely Belfast, Northern Ireland, featuring David Patterson and Stephen Houston.
I'll let the video speak for itself.
JOIN US for the Sophia for Advertising Webinar on May 8th. Register Today.
This webinar will introduce you to Sophia and its latest product, Sophia For Advertising. Sophia’s advertising technology provides a new approach to building target audiences, by understanding the content that users are interested in.
Ad Networks can then blend Sophia’s content-based categories with their existing predictive analytics to further enhance audience relevance. No cookies required.
Publisher networks can deliver high-performance, targeted audiences, and have superior control of their inventory.
This webinar will give you an overview of the product, its features, benefits and how it works.
Title: Sophia For Advertising: An Introduction
Date: Wednesday, May 8, 2013
Time: 9:00 AM - 10:00 AM PDT
After registering you will receive a confirmation email containing information about joining the Webinar.
Required: Windows® 7, Vista, XP or 2003 Server
Required: Mac OS® X 10.6 or newer
Required: iPhone®, iPad®, Android™ phone or Android tablet
Space is limited.
Who is WorldIrish?
WorldIrish is an online social network for anyone that has an affinity to Ireland or Irishness. It is a digital platform that allows its users to discover and engage, create and share with each other.
Founded by Riverdance’s John McColgan, WorldIrish attracts everyone from recent emigrants to people with Irish ancestry, from those who appreciate the Irish culture and way of life to those who simply want to learn more about Ireland. All of them find something of interest on WorldIrish. For more information, see http://www.worldirish.com.
The Hyper-Personalization Challenge: Delivering relevancy
Members of WorldIrish each have different interests, views and opinions. They can’t be treated as a group but as individuals. Sophia helps WorldIrish serve each member with connections and content that is relevant to them. It enables WorldIrish to offer a hyper-personalized service to each member.
In the words of WorldIrish CEO Michael Branagan, “As part of our business proposition we recognized early that delivering relevancy to a wide audience was going to be a cornerstone of our offering. In this regard we were looking for an edge in differentiating us from other social media sites.”
They chose Sophia’s technology to help them achieve this.
How does Sophia solve the broad problem?
WorldIrish looked to Sophia and its Contextual Discovery Platform to help them with visibility into the content on their web site, in terms of categorizing the content, grouping it into related topics, and then helping them determine the most popular content. The implementation of Sophia has helped WorldIrish to deliver a much richer experience than they could have provided with traditional technology.
To read the rest of the case study, click below:
Sophia fans: Thanks for following our blog. We have a bunch of exciting announcements coming up, and we want to keep you up to date. We're doing everything we can on social media, so please follow us on:
Be the first to get our latest news. Sign up today.
(ed. note: this is the first in a series of posts describing our technology.)
Textual analytics is the process of software determining the content of a document so that human “consumers” can more readily find and/or work with that content. Textual analytics software takes many forms. Most people are familiar with the generic phrase search engine, for example, which is a very specific type of analytics software designed to allow users to find information within a larger document set.
Regardless of its form or function, any analytics software must ingest the textual portion of the content within a collection and understand it. Otherwise it provides no useful information to the user. This ingestion process is called indexing, and it is similar to the activity a human would have to go through in order to understand textual content. It would just take humans a much longer time to accomplish, with far more inconsistencies and errors in reading and interpreting it.
Many different indexing approaches exist. Some of the first analytics tools used a more simplistic form of indexing—basically, determining all the words used in all the texts in the set—but these tools only allowed for simple keyword (Boolean) searching. A keyword search allows a user to specify a word or group of words (known as the query) for the software to compare against all the content in the dataset. If a text contains the single- or multiple-word query, then it is a “hit” and is returned in a flat list of all texts matching the query. Ranking in that flat result list is usually based on the number of times a query appears in a text—the higher the number, the greater the ranking or implied relevance. This is largely what Google does.
Keyword indexing and querying served (and still serves) a useful purpose; however, as users’ needs became more sophisticated, analytics technologies had to evolve past keyword-based indexing in order to keep pace. The problem is that keyword-based analytics does not necessarily guarantee the true relevancy of the results. It only guarantees a match between the query and any resultant text. Just because a text contains the term “bank” does not mean that it is relevant to a user who is looking for information about financial institutions. The term “bank” has many meanings besides referring to a place where a person can cash a check. Therefore, more intelligent indexing software is required in order to provide more relevant information to the user.
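To make the keyword approach concrete, here is a minimal sketch of the kind of search described above: a Boolean match on the query words, with results ranked by how often those words occur. The function names and the three sample documents are made up for this illustration and do not come from any particular product.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_search(query, documents):
    """Return (doc_id, score) pairs for documents containing every query word.

    The score is the total number of times the query words occur in the
    document, so a higher count means a higher rank.
    """
    query_terms = tokenize(query)
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        # Boolean AND: every query word must be present for a "hit".
        if query_terms and all(counts[term] > 0 for term in query_terms):
            score = sum(counts[term] for term in query_terms)
            results.append((doc_id, score))
    # Flat result list, most occurrences first; ties are broken by doc_id.
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Hypothetical sample documents, invented for this example.
documents = {
    "finance": "The bank approved the loan. The bank also cashed a check.",
    "river": "We had a picnic on the bank of the river.",
    "travel": "Flights to Dublin are cheap in May.",
}

hits = keyword_search("bank", documents)
print(hits)
```

Both the finance text and the river text come back as hits for “bank”, because the search only checks that the word is present. Nothing in the code knows which meaning the user had in mind, which is exactly the relevancy problem described above.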
Next-generation Analytics Tools
The trade-off is that these next-generation analytics tools—which provide a more sophisticated analysis of documents—require more time and hardware resources to perform the indexing process. Just how much more time or hardware depends on the particular proprietary indexing technology in question. These proprietary technologies can be based in linguistics, statistics, mathematics, or some combination of the three in order to determine the meaning within content. Regardless of approach, advanced analytics software is only useful if:
- It can accurately understand textual content, and ultimately meaning and relevance, with little or no human intervention or upkeep.
- It can index content quickly enough to keep up with the explosive growth of data populations.
- It can be used to solve real data-centric problems.
Sophia Contextual Discovery Platform
Sophia’s research and development has resulted in the Sophia Contextual Discovery Platform (CDP), which is an advanced analytics technology designed to meet the above conditions using a novel approach to indexing text-based content. It leverages the science of semiotics in order to determine meaning within texts.
Semiotics is a model of linguistics that explains how humans understand meaning when communicating. It focuses on words in order to determine how meaning is constructed and understood within a text. At the core of semiotics is the premise that a word must be analyzed in light of its context. Meaning is conveyed by words as they relate to other words within a localized context.
Furthermore, our model stresses not only the importance of intra-textual context (the words within a single text that combine to create meaning within that localized context) but also the effect of inter-textual context (that a text’s meaning is affected by other texts within the same dataset). This is called intertextuality and means that Sophia’s CDP can understand meaning and context at both a local (individual text) and global (dataset) level.
Take, for example, the term window. What does it mean? Alone, it has the potential to mean quite a few things. By seeing it within the larger context of other terms around it, its localized meaning becomes apparent:
“Close the popup window before continuing to format the document.”
Clearly, the term window in the above sentence means a framed box generated within a graphical user interface on a computer monitor. This meaning is fully conveyed through the other meaningful context words around it such as popup, format, and document. We know it does not refer to an opening in a building structure, as in the sentence below:
“Ensure that the ventilation slot is open on the window to allow fresh air into the room.”
We are able to distinguish this meaning of window due to the significant context words around it such as ventilation, air, and room.
Over time, though, a community may adopt a new understanding of a word that is not imparted by the localized context:
“Our window is closing rapidly.”
While this word has little localized context to convey meaning, a broader analysis of the dataset and its inter-textual communications can reveal the meaning. Knowing that this is a transmission in a larger discussion between a pilot and ground control during a thunderstorm enables us to determine the meaning as a window of opportunity in which to perform something. Therefore, the inter-textual context of a dataset is also important in semiotics-based textual analysis.
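The three window sentences above can be turned into a toy experiment. The sketch below guesses a sense by counting how many hand-picked cue words appear around the term. This is not how CDP works: the cue lists are hypothetical and written by hand purely for illustration, whereas CDP derives meaning from the content itself.

```python
import re

# Hypothetical, hand-written cue words for two senses of "window".
# They exist only for this illustration.
SENSE_CUES = {
    "GUI window": {"popup", "format", "document", "click", "screen"},
    "building window": {"ventilation", "air", "room", "glass", "wall"},
}


def guess_sense(sentence, sense_cues=SENSE_CUES):
    """Pick the sense whose cue words overlap most with the sentence.

    Returns None when no cue word appears, meaning the local context
    alone is not enough to decide.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    overlaps = {sense: len(words & cues) for sense, cues in sense_cues.items()}
    best_sense = max(overlaps, key=overlaps.get)
    return best_sense if overlaps[best_sense] > 0 else None


gui = guess_sense("Close the popup window before continuing to format the document.")
building = guess_sense(
    "Ensure that the ventilation slot is open on the window to allow fresh air into the room."
)
unclear = guess_sense("Our window is closing rapidly.")
print(gui, building, unclear)
```

The first two sentences resolve from their local context words. The third returns None because it carries no local cues, which is the case where the wider, inter-textual context of the dataset has to supply the meaning.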
The indexing process involves in-depth analysis (both intra-textual and inter-textual) of the dataset in order to determine semantic relationships and larger thematic elements within it. Each word is examined and analyzed to reveal its semantic meaning in order to understand its context. That information is then used in defining larger conceptual themes. Although all words have some sort of context, not all of them are interesting or useful from a thematic perspective.
Contextually Relevant Information
CDP leverages a patented algorithmic approach to intelligently differentiate among contexts that are interesting and those which are not. This process enables CDP to provide contextually relevant information when a user requests it, regardless of what specific words the user employs to frame that request.
Next-Generation Textual Analytics
While CDP employs statistical and mathematical algorithms to analyze intra-textual and inter-textual contexts, it does not leverage static dictionaries, thesauri, or other linguistic references that must be updated as language changes. CDP derives meaning out of the content itself and never needs language updates or other modes of upkeep. This drastically reduces maintenance costs while still providing the benefits of next-generation text analytics.
And that’s the real value of CDP—providing textual analytics that derives meaning from words and documents in the same way humans do. Because semiotics as a science explores how humans combine words to convey meaning within context, the semiotics-driven indexing technology within CDP approaches text analytics in a more natural, human-like manner. The result is contextually relevant information that makes sense to users, helping them more naturally solve real problems within their organization.
Like this article and want a PDF version? Click below.
Tired of being annoyed by online ads targeted at you that just make no sense? Have you evolved to unconsciously ignore those ad windows that often evoke a response best described as “huh?”? Do any of these feelings sound familiar?
- I looked for airline tickets/hotel reservation two days ago, even completed my booking. Why am I still being shown travel ads for days after?
- Why is Facebook targeting me with random ads that have nothing to do with me? Why does Facebook think I’m overweight and looking to lose weight?
- I think they think I’m a gender and sexual orientation different from what I really am!
And the list goes on!
If you’re looking to demystify why these things are happening to you, read on. Take consolation in knowing that you’re not alone. It happens to millions of online users every day. Here are some Ad Targeting methods that are predominant in the industry today:
1) Audience Profiling: Cookie tracking is used to profile audience members based on demographic information such as age, sex, address, marital status, disposable income, etc. The user’s behavior (aka your behavior) is tracked by means of cookies that monitor the sites you visit and products you searched for. This information is used at a later date to display an ad for a product even if it has no relation to your current browsing context. Ad targeting that’s based on user behavior can be creepy.
2) Contextual Targeting: A true contextual targeting system needs to be able to zero in on the interest of the user based on the content that the user is reading. Many methods take a simplistic approach by focusing on the word or keywords in the content. Many times, the true context gets lost in translation. This simplistic approach can often be dangerous and/or hilarious!
I’ve compiled my ‘Mis’-Oscar list for my top five genres of Ads that ‘Mis’-fired right here!
- Mis- Placed
- Tragic (truly!)
Jokes apart, these examples demonstrate that keywords are a poor first approximation of interest or personalization. However, our problems do not end there. Growing concern around privacy (or lack thereof) and Personally Identifiable Information (PII) has led to recent EU legislation that restricts cookie tracking and collection. This is having a big impact on the quality and quantity of data now available for analysis and is impacting the revenue generated. And there’s more. A massive challenge for the mobile advertising sector is that most smartphones do not support cookies, so targeting techniques that are based on them are largely irrelevant. This is compounded further by the growing data consumption on smartphones.
We can and must do better. Here’s why: today’s digital consumer has just come to expect more, in every dimension. The overall Digital experience of a user is correlated to:
- Quality of content they are exposed to
- Quality of Ads they are exposed to
These dimensions are driving User Engagement metrics, which directly impact Online Sales (for Retailers), Brand Effectiveness (for Advertisers), and Ad Revenue (for Publishers).
So how do we do better? Where the conventional approaches fail is that they have no appreciation for what the user is actually interested in at any specific point in time. They may know that a user lives in a fancy house and earns $1M a year, but they don’t know if that adds up to an interest in buying a Rolex or a Jaguar. Another way in which these approaches fail is by working off the model that an ad only has value on a website that specifically caters to the category that the ad represents. For example, the value of a camera ad is not solely on a camera website. It has as much value on a tech review page of a media news site (such as CNET).
Here is where Semantic Targeting plays a ‘Lead Role’. A key difference between a true contextual advertising system and a conventional one is that instead of scanning a page for keywords with bids, the former examines all the words and identifies their meaning in context – e.g. they can understand that the term jaguar refers to a car as opposed to an animal. In this way it can really understand the user’s interests at a point in time and tailor contextually relevant ads as a result.
Semantic targeting technology also distills the emotion and sentiment that is housed in the text. This will be a big step towards avoiding the ‘mis’-fired ads. And as for the privacy and PII concerns, no cookies are needed for a semantic targeting technology. All of this means it is a perfect fit for the Mobile platform or to complement traditional analytic approaches online.
Did Semantic Targeting just sweep up all the Oscars?
Editor’s Note: Yes it did
We've announced our Series A financing from the folks at Atlantic Bridge:
San Jose, Calif. – March 04, 2013 – Sophia Corp, the innovation leader in semantic content analysis, chaired by industry veteran Chris Horn, today announced that it has closed a $3.7 million Series A funding round. The investment was led by Atlantic Bridge and will be used to increase marketing and sales efforts in North America and to accelerate product roll-out.
For more information, read the press release on our News section.
If I ran an advertising network, I would be really worried. There have been two developments recently which would kill my business. Luckily, there is hope.
The first major development is the EU Cookie legislation, described on the UK information commissioner’s office here. As we have all experienced, this has led to EU websites having to ask users if they want to enable cookies. (Most of these are in a manner that defies basic Human Factors standards, IMHO.) This essentially makes it easy for consumers to opt out of being tracked.
Now, I am all for privacy, but I have to make a living. This will greatly reduce the effectiveness of most advertising, since without cookies, advertising networks cannot build effective user models. (Actually, there is a way, but hold that thought.) This will mean that most advertisers will see a more pronounced decrease in Clickthrough Ratio than they are already seeing.
The second major development is the latest Firefox patch to allow the blocking of 3rd party cookies by default. The IAB's chief lawyer went on to say in CNET that this was the equivalent of a "Nuclear First Strike":
This move would make it all but impossible to track users and to build models of relevant content for them. When the IAB’s lawyer gets involved, you know it’s serious.
However, there is hope. Sophia takes a different approach to contextual advertising. Rather than attempting to build models of people on the internet, we look at the "meaning" of the content that is being read. Once we know the meaning of the content, and its basic "topics", Sophia makes it easy to find appropriate ads to display.
Why does this work, and why is it a good thing? It is safe to assume that if someone has found a particular article on the internet and is reading it, they have some interest in it and would like more content on the same subject.
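As a rough illustration of the data flow only, here is a small sketch that picks an ad from the text of the page being read, with no cookie or user profile as input. The word-overlap scoring is a deliberately crude stand-in and is not Sophia's analysis of meaning; the ad names, descriptions, and page text are all invented for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count the lowercase word tokens in a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_ad(page_text, ads):
    """Return the name of the ad whose description best matches the page.

    Only the page content is used; nothing about the reader is needed.
    Returns None when there are no ads or no ad shares a word with the page.
    """
    if not ads:
        return None
    page = bag_of_words(page_text)
    scores = {
        name: cosine_similarity(page, bag_of_words(description))
        for name, description in ads.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical ad inventory and page text, invented for this example.
ads = {
    "camera_ad": "digital camera lens photo zoom",
    "travel_ad": "flights hotel holiday booking",
}
page_text = "Our review compares the zoom lens and photo quality of this camera."

chosen = pick_ad(page_text, ads)
print(chosen)
```

A real system would replace the word-overlap score with a genuine understanding of topics and meaning, but the inputs would stay the same: the content on the page and the available ads.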
We'll have more to say about this in the coming weeks, so subscribe to this blog (or our social media feeds) to keep up on the latest news, or contact us today.