Latent Semantic Indexing & Co-Occurrence
Recently, I made a few discoveries while digging around on Google. Shall we draw conclusions from the following?
So here we have some root tokens at work, as evidenced by the truncated string matching that has taken place. Google clearly groups discrete strings from the general English lexicon, though in exactly what capacity is difficult to determine. Where there are no exact string matches, Google has delved not only into the root pile but also into a matrix of co-occurring strings, cross-matching within the root pile to locate combinations of root-derived strings that co-exist on pages and satisfy my less-than-likely search.
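Google doesn't document how it does this, so the following is only a toy sketch, assuming plain prefix matching against each page's own words. It illustrates the "truncated string matching" idea: a query fragment like "comp" expands to any word on the page that starts with it, and a page qualifies only if every fragment finds a match on that same page. The names `tokenize`, `expand_token`, and `match_documents` are hypothetical.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase a document and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def expand_token(fragment: str, vocabulary: set[str]) -> set[str]:
    """Return every vocabulary word that starts with the truncated fragment."""
    fragment = fragment.lower()
    return {word for word in vocabulary if word.startswith(fragment)}

def match_documents(query: str, docs: dict[str, str]) -> dict[str, dict[str, set[str]]]:
    """Keep only documents in which every query fragment expands to at least
    one word on that same page, i.e. the expansions co-exist on the page."""
    fragments = tokenize(query)
    results = {}
    for doc_id, text in docs.items():
        words = set(tokenize(text))
        expansions = {frag: expand_token(frag, words) for frag in fragments}
        if all(expansions.values()):  # every fragment found a match here
            results[doc_id] = expansions
    return results

docs = {
    "a": "Compressed natural gas vehicles are common in fleets.",
    "b": "Computer competition results.",
    "c": "Natural gas prices rose; compression stations expanded.",
}
result = match_documents("comp nat gas", docs)
```

Page "b" drops out because nothing on it starts with "nat", while "a" and "c" match "comp" through different words ("compressed" and "compression").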
Yahoo! and Bing return vastly different results. While Bing associates “CNG” with “Compressed Natural Gas” and highlights both terms within its results, Yahoo! does not seem to make the same association. Bing's first result contains an exact match for “CNG veh”; Yahoo!'s second result does. Both search engines return matches that tend to occur in non-HTML documents. Neither returns token or root matches. Yahoo! suggests that I “Also try: cng veh in” as a search. Bing makes no suggestions.
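One simple way an engine *could* tie “CNG” to “Compressed Natural Gas” is to look for runs of words whose initials spell out the acronym. This is a hypothetical sketch of that heuristic (the function name `find_expansions` is made up), not Bing's documented method.

```python
import re

def find_expansions(acronym: str, text: str) -> list[str]:
    """Return word sequences in `text` whose initial letters spell `acronym`."""
    words = re.findall(r"[A-Za-z]+", text)
    n = len(acronym)
    target = acronym.lower()
    found = []
    # Slide a window of n consecutive words across the text.
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if "".join(w[0].lower() for w in window) == target:
            found.append(" ".join(window))
    return found

page = "Fleets that run on Compressed Natural Gas (CNG) cut emissions."
expansions = find_expansions("CNG", page)
```

Initial-letter matching alone will also accept coincidental runs such as “Can Not Go”, so a real system would presumably need more evidence (for example, the acronym appearing in parentheses right after the phrase) before trusting the association.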
Google associates computers and competition with the “comp” in the query “comp nat gas,” but fails to associate the more obvious “compression” or “compressed”. Four of the first 10 results for “comp nat gas” actually contain the word “compression” without highlighting it, which seems to indicate that, though the term is present, the association is not strong enough to merit confidence.
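To see how co-occurrence could decide which “comp” word wins, here is a toy sketch, assuming a small hypothetical corpus and simple document-level pair counts. Each candidate expansion of the fragment is scored by how many documents it shares with the context words (“natural”, “gas”). The function names and the scoring rule are illustrative assumptions, not Google's algorithm.

```python
import re
from collections import Counter
from itertools import combinations

def tokenize(text: str) -> set[str]:
    """Return the set of lowercase word tokens in a document."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def cooccurrence_counts(docs: list[str]) -> Counter:
    """Count, for each unordered word pair, how many documents contain both."""
    counts = Counter()
    for text in docs:
        for pair in combinations(sorted(tokenize(text)), 2):
            counts[pair] += 1
    return counts

def rank_expansions(fragment: str, context: list[str], docs: list[str]) -> list[tuple[str, int]]:
    """Rank corpus words starting with `fragment` by how often they share a
    document with each (lowercase) context word; ties break alphabetically."""
    counts = cooccurrence_counts(docs)
    vocabulary = set().union(*(tokenize(d) for d in docs))
    candidates = [w for w in vocabulary if w.startswith(fragment.lower())]

    def score(word: str) -> int:
        # Pairs are stored in sorted order, so look them up the same way.
        return sum(counts[tuple(sorted((word, c)))] for c in context)

    return sorted(((w, score(w)) for w in candidates), key=lambda p: (-p[1], p[0]))

corpus = [
    "compressed natural gas vehicles",
    "compressed natural gas stations",
    "computer competition results",
    "natural gas compression equipment",
]
ranking = rank_expansions("comp", ["natural", "gas"], corpus)
```

In this made-up corpus “compressed” comes out on top. In a corpus where pages about computers vastly outnumber pages pairing “compression” with “natural gas”, the same arithmetic would push “compression” down, which is one possible reading of why it appears in the results without being highlighted.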
There are instances where associative confidence is present only in certain sections of the results page. Check out the bottom-of-page results for “cng veh”, where Google highlights the Honda NGV in its similar searches, and twice fails to highlight “cng”, which was actually a string in my query!
Any qualified takes on these discoveries are mighty welcome.
My theory is that these are some of the common roots at play in the associative partial-word substitutions seen in the examples above. Is it possible that there is a more loosely constructed means of determining the substituted words, one that does not rely on a defined lexicon and a matrix of lexical roots? Is this the result of some substring or backtracking function working in tandem with co-occurrence probabilities?
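A minimal sketch of what such a "backtracking" substring function might look like, assuming the vocabulary is simply the set of words seen on indexed pages rather than a predefined lexicon. If the whole query token matches nothing as a prefix, it drops one trailing character at a time until something matches. The name `backoff_match` and the `min_len` cutoff are hypothetical choices, and whatever it returns would still need something like co-occurrence scoring to pick among the candidates.

```python
def backoff_match(token: str, vocabulary: set[str], min_len: int = 2) -> tuple[str, set[str]]:
    """Use the token as a prefix; if no vocabulary word matches, backtrack by
    dropping trailing characters until a match appears or the prefix would
    be shorter than `min_len`. Returns (prefix used, matching words)."""
    token = token.lower()
    for end in range(len(token), min_len - 1, -1):
        prefix = token[:end]
        matches = {w for w in vocabulary if w.startswith(prefix)}
        if matches:
            return prefix, matches
    return "", set()  # nothing matched even at the shortest allowed prefix

vocab = {"vehicle", "vehicles", "velocity", "natural", "gas"}
prefix, words = backoff_match("vehcle", vocab)  # misspelled "vehicle"
```

Here the misspelled “vehcle” backs off through “vehcl” and “vehc” to “veh”, which matches “vehicle” and “vehicles” but not “velocity”.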