Syllabus SEO Training


Tel: +34 693 475 142

Syllabus Public SEO Training Seminars


Syllabus SEO Training,
25 Daisy Street, Glasgow,
Scotland, UK.


3.3   Latent Semantic Analysis

Latent Semantic Indexing (LSI) is based on a theory called Latent Semantic Analysis. This theory was devised in 1990 by Susan Dumais, George Furnas, Scott Deerwester, Thomas Landauer, and Richard Harshman.

According to Landauer, Foltz and Laham, Latent Semantic Analysis, or LSA, is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text.

In other words, LSA is a statistical and mathematical method for finding the contextual meaning of words in a large collection of documents. Such a collection could be something like the Internet, which contains a vast corpus of text-based documents in the form of web pages.

If this begins to sound like advanced mathematics meets advanced linguistics, that’s because it is! (LSA even borders on cognitive science.) This method, however, has immediate applicability to search engines, because we are dealing with the problem of making a mathematical machine, or computer, ‘understand’ or analyse the meaning of words (semantics is the study of word meaning, hence Latent Semantic Analysis).

Unlike most humans, who usually acquire the ability to use and understand language at an early age, computers cannot understand what words mean. The same holds for search engines. Despite their sophisticated mathematical algorithms, and despite the fact that these algorithms ‘read’ the text on web pages to some extent, search engines are actually rather stupid and cannot form even the most basic understanding of what words mean.

What is the ‘contextual-usage meaning’ of words? To explain this we have to look at two features of everyday language which cause particular problems for computers and search engines.

  1. synonymy.
  2. polysemy.

A synonym is a word that has roughly the same meaning as another word. To find synonyms for a word you simply have to consult a Thesaurus, where you will find a list of alternative words that can be interchanged with the original word.

I say ‘roughly’ because we can’t just select any alternative listed in the Thesaurus to replace our original word. In fact, some words only become synonymous with other words when used in the right context.

For example, if I consult my Thesaurus for synonyms of ‘used’, part of our earlier search for a car, I am given the following list of possible alternatives:

cast-off
hand-me-down
nearly new
not new
reach-me-down
second-hand
shopsoiled
worn

If I were looking for second-hand clothes rather than used cars, I could use many of the above synonyms, as we customarily refer to second-hand clothes as ‘cast-offs’ or ‘hand-me-downs’. However, we don’t use such phrases as ‘hand-me-down cars’ or ‘shopsoiled automobiles’. The context of our original phrase, namely the word ‘cars’, determines that only one of the above alternatives - ‘second-hand’ - is an appropriate substitute for ‘used’.

In other words, we understand which words are synonyms according to the context in which they appear.

Of course, it would be of great advantage to us as searchers if the search engine were to automatically find commonly used alternative terms for the search phrases we entered. While we could simply construct a search engine with its own built-in Thesaurus, the above example shows the problems we would inevitably encounter if we did so. If the search engine attempted to substitute our search terms with all the alternatives found in its Thesaurus, it would produce some very strange search results. Without some understanding of ‘contextual-usage meaning’, that is, the context in which the term to be substituted appears, the search engine would be unable to pick the ‘right’ synonyms.
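To make the problem concrete, here is a minimal Python sketch of such a naive built-in Thesaurus. The `THESAURUS` table and the `expand_query` function are hypothetical names invented for this illustration (the synonyms themselves are the ones listed above); no real search engine is claimed to work this way.

```python
# Hypothetical Thesaurus: the entry for 'used' is the list given earlier.
THESAURUS = {
    "used": [
        "cast-off", "hand-me-down", "nearly new", "not new",
        "reach-me-down", "second-hand", "shopsoiled", "worn",
    ],
}


def expand_query(query, thesaurus):
    """Return the query plus every variant made by swapping one word
    for one of its listed synonyms, with no regard for context."""
    words = query.lower().split()
    variants = [" ".join(words)]
    for i, word in enumerate(words):
        for synonym in thesaurus.get(word, []):
            variants.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return variants


variants = expand_query("used cars", THESAURUS)
for variant in variants:
    print(variant)
```

The output includes the sensible ‘second-hand cars’, but it sits alongside ‘hand-me-down cars’ and ‘shopsoiled cars’, because nothing in the lookup knows that ‘cars’ rules those alternatives out.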

‘Polysemy’ can roughly be translated as ‘many-meaning’. It refers to the fact that most words in any given language have more than one meaning.

To see this you simply have to look in a dictionary, where you will find that most words have more than one definition. If, for example, we take a term from our earlier search, ‘vehicle’, we can see that it has more than one meaning. According to the Concise Oxford Dictionary, a ‘vehicle’ could be a thing for transporting people, a means of expressing something, or a film intended to display its leading performer to best advantage!

How, then, do we decide which of these possible meanings is intended at any given point? This is where ‘contextual usage’ comes into play. As language users, we know which meaning is being used according to the context in which the word appears. If, for example, I were to say that ‘Top Hat was a vehicle for Fred Astaire and Ginger Rogers’, you would know that the word ‘vehicle’ in this context refers to a type of film and not a car. If, on the other hand, I use the phrase ‘second-hand vehicle’, you are likely to know that I am referring to a car.

Unfortunately, a computer has no way of distinguishing between the two as it lacks the ability to understand the context of statements and has no knowledge of the linguistic customs that give rise to polysemy. This means that the search query ‘second-hand vehicles’ could potentially return any page that happens to mention the two words, including pages that mention films or even ‘vehicles’ for expression such as poems.
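The following Python sketch shows this kind of purely literal matching. The pages in `PAGES` are invented for illustration, and `keyword_search` is a hypothetical helper that returns every page containing all the query words, not a description of how any particular search engine is built.

```python
import re

# Hypothetical pages, invented for this illustration.
PAGES = {
    "car-dealer": "Quality second-hand vehicles for sale, all fully serviced.",
    "film-review": "Top Hat was one of many vehicles for Fred Astaire. "
                   "I found a second-hand copy on DVD.",
    "poetry-blog": "Sonnets are vehicles for expression. "
                   "I bought this second-hand anthology last week.",
    "gardening": "How to prune roses in early spring.",
}


def tokenize(text):
    """Lowercase the text and return its set of words (hyphenated words stay whole)."""
    return set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))


def keyword_search(query, pages):
    """Return the names of all pages that contain every word in the query."""
    terms = tokenize(query)
    return [name for name, text in pages.items() if terms <= tokenize(text)]


results = keyword_search("second-hand vehicles", PAGES)
print(results)
```

The car dealer is returned, but so are the film review and the poetry blog, since both happen to contain the two words. The matcher has no way of telling which sense of ‘vehicles’ each page uses.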

We clearly have a problem then, because computers can’t understand the meaning of words according to the context in which they appear. A search engine either has to stick with the terms given and ignore all possible alternatives - which means that we could miss documents that are relevant to our search but don’t contain our keyphrase - or include all possible alternatives - which means that numerous irrelevant results could be returned.

LSA provides us with a means of getting round the problem of computers not being able to understand contextual-usage meaning. It has been successfully applied to the process of information retrieval - that is, the process of retrieving information from large databases and collections of documents (like the Internet) - because it gets round the two problems of synonymy and polysemy.

It does this by looking at the collection of documents as a whole and finding words that are closely related. For example, by looking at enough documents it could find that ‘used cars’ and ‘second-hand automobiles’ are closely related terms simply because these terms customarily appear together on the same pages. Let’s have a closer look at how this works.
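As a rough first illustration of the idea, the Python sketch below counts how often each term appears in each document and then compares terms by the cosine similarity of those counts. Note the assumptions: the five documents in `DOCS` are invented, the function names are hypothetical, and this is plain co-occurrence similarity only. LSA proper goes a step further and applies a matrix factorisation (singular value decomposition) to this kind of term-by-document table, which is not shown here.

```python
import math
import re

# Hypothetical mini-collection of documents, invented for this illustration.
DOCS = [
    "used cars and second-hand automobiles for sale",
    "second-hand automobiles and used cars bought and sold",
    "used cars dealer with second-hand automobiles in stock",
    "classic film review top hat was a vehicle for fred astaire",
    "film poster for sale",
]


def tokenize(text):
    """Lowercase the text and return its words in order (hyphenated words stay whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def term_vectors(docs):
    """Map each term to a list of counts, one entry per document."""
    vectors = {}
    for i, doc in enumerate(docs):
        for term in tokenize(doc):
            vectors.setdefault(term, [0] * len(docs))[i] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vectors = term_vectors(DOCS)
for other in ("second-hand", "sale", "film"):
    print("used", other, round(cosine(vectors["used"], vectors[other]), 2))
```

In this toy collection ‘used’ and ‘second-hand’ occur in exactly the same documents, so they score as closely related, while ‘used’ and ‘film’ never share a document and score zero. No Thesaurus was consulted: the relationship comes entirely from where the words appear.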

For more information about our Search Engine Optimisation Training Courses contact Syllabus or call +34 693 475 142.

 

 

 
