Search Engine Results Quality Rater Job Application

admin

I wonder which search engine this would pertain to? Does anyone know how much this job pays?

Here’s the link to the registration page for quality raters.

Get a Job Making Search Results Align with Business Goals

I just had to throw this in. Below is an article I wrote in 2006:

A Historical Look at the Major Search Engines and SEO

1994

* WebCrawler launches
* Lycos launches
* Yahoo! is founded

1995

* AltaVista launches
* Excite launches
* Inktomi launches
* Infoseek launches
* Yahoo! goes commercial (incorporates)
* Little if any optimization had been attempted by those outside of search engines
* Some pioneering sorts had forayed into optimization tied to alphabetical placement in the Yahoo! directory, for example setting up a domain or a site/business name beginning with a number or the first few letters of the alphabet

1996

* Those who had begun to experiment discovered some value in lexical density and keyword placement
* Sites submitted to Yahoo! were, in great number, listed within a week, without much restriction, so growth had truly taken off
* The first papers begin to appear on the web about text matching and data mining – as well as interviews with search engine engineers
* Indexing as an understood methodology began attracting interest, and many had discovered the nature of the core database and scoring structure

1997

* A select few surface, having completely determined the algorithmic parameters of Excite’s ranking calculations
* This knowledge gave way to “page jacking” and “baiting” of websites
* SEO explodes as people begin to see simple and easy results in 24 hours with Infoseek’s new update cycle
* Spam becomes a very serious problem

1998

* Yahoo! Web Search powered by Inktomi
* After several papers were delivered at the WWW conferences, it became clear search engines would start moving toward off-page criteria in indexing algorithms
* Decoding algorithms became very sophisticated in 1998 and 1999
* Search engines lean toward increased complexity with multiple languages, term vectors, and other language expertise
* Google jumps into the fray with an ax
* Doorway page generators arrive and are implemented in large numbers
* SEO firms begin to decline (as their clients are banned from Google, in particular)
* The authority model has established Google as a dominant leader

1999

* The influence of Google becomes fully manifested in the industry
* Link popularity tactics grow due to Google’s influence and increasingly undermine the power of Google

2000

* Infoseek is gone
* Teoma founded
* Google gains more hold – other engines take their places in what will be the background
* Goto.com (to be Overture and then Yahoo! SM) establishes a strong presence on hubs – Yahoo, AOL and MSN
* Efforts in SEO shift almost completely to Google-centric methods and processes, and the core inner workings of Google’s technology become widespread knowledge

2001

* Ask Jeeves (Ask.com) acquires Teoma
* The world of SEO begins a steady transition into white hat methods due to the use of blacklists and whitelists and the associative impact they carry
* Algorithmic changes from this point forward are incremental, and editorial controls enter a steady period of growth
* SEs become increasingly like portals (despite the antithetical pronouncements of some)
* SEO and search technology advancement drift into the doldrums

2002

* A fairly uneventful year comes to pass, primarily due to the pessimism that has taken hold of the technology sector and the global economy
* Google has managed to uphold its appearance as being vastly capable, by patching its technology with human capability
* Buyout planning gets underway due to search technology reaching a developmental lull

2003

* Yahoo! acquires Inktomi
* FindWhat acquires Espotting
* Google acquires Applied Semantics
* Overture acquires AltaVista
* Overture acquires Fast (AlltheWeb)

2004

* AlltheWeb switches to Yahoo! Search
* Ask Jeeves acquires Excite, iWon, My Way
* Lycos Search discontinued
* Yahoo! Search launches (first original results)
* MSN Search Beta launches
* Google Index Size reaches a reported 8,058,044,651 documents

2005

* Overture renamed Yahoo! SM
* Ask Jeeves acquires Excite Europe
* InterActiveCorp agrees to a $1.85 billion buyout of Ask Jeeves
* MSN Search launches

2006

* SE traffic 5.1 billion searches per month
* Ask.com – Jeeves retires
* Zeal Directory discontinued

2002: Forward or Not?

SEO has changed little since 2002. Many black hat tactics that were effective in 2002 remain effective in 2006, illustrating the inability of search engines to combat various techniques without some form of human contribution. Editorial intervention has therefore become a part of the overarching scheme, and what is learned from the editorial element is inevitably logged. Useful information is gathered from banned sites as well as from sites that are considered to epitomize good citizens of the web. This leads to the maintenance of what we’ll call blacklists and whitelists. These human-created lists or databases are the means by which search engines have been able to mend holes in the proverbial dike, whilst innovation has come to a “crawl”. By enlisting users to report spamdexing, search engines have largely been able to keep up the appearance of having a technology that is as effective as it was in the not-too-distant past. With the mass proliferation of knowledge regarding off-site ranking techniques, it was only a matter of time before patches would be needed. And, with a few clever tweaks and a small human army, the engines managed to appear impervious to a multitude of would-be beneficiaries. Over time, such a tenuous arrangement is bound to become more visibly unfavorable for search engines unless editorial capacity increases in proportion to the growth of content and the increasing prevalence of search result manipulation.

Certainly search has become less like a pure technology than it was 6-7 years ago. It has also become more focused on pattern matching, in various respects. For instance, try typing a phone number into Google. The phone number pattern is one of many patterns stored in a database that sits alongside the standard index. When a user query is submitted to Google, this secondary database is scanned for a match; if one is found, the pattern is passed to yet another database, such as a phone directory. If a match is found there, it is blended with Google’s index results. Search engines are likely using the appearance of such patterns as a way to determine the credibility of a website or a domain. The presence of a page that contains a string of numbers matching a defined phone-number pattern is likely to boost a website’s credibility. The same is true of a string consistent with the pattern of a physical address. Spam or rogue sites are not likely to reveal such information, but may nevertheless seed their pages with junk patterns that seek to fool search engine spiders.
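To make the phone-number flow concrete, here is a minimal Python sketch of the lookup as I have described it. It is only an illustration under my own assumptions, not Google's actual implementation: the pattern table, the `PHONE_DIRECTORY` data, and the function names `match_pattern`, `search_index`, and `blended_search` are all hypothetical, and the regular expression only covers US-style numbers.

```python
import re

# Hypothetical "secondary database" of query patterns (US-style phone numbers only).
PATTERNS = {
    "phone": re.compile(r"^\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})$"),
}

# Hypothetical stand-in for a phone directory, keyed by digits only.
PHONE_DIRECTORY = {
    "5555550123": "Example Hardware, 12 Main St",
}


def match_pattern(query):
    """Return (pattern_name, normalized_value) if the query fits a stored pattern, else None."""
    for name, regex in PATTERNS.items():
        m = regex.match(query.strip())
        if m:
            return name, "".join(m.groups())
    return None


def search_index(query):
    """Placeholder for the standard index lookup (assumed, not a real API)."""
    return [f"web result for {query!r}"]


def blended_search(query):
    """Run the normal index search, then blend in a directory hit when a pattern matches."""
    results = search_index(query)
    hit = match_pattern(query)
    if hit and hit[0] == "phone":
        listing = PHONE_DIRECTORY.get(hit[1])
        if listing:
            # The directory match is placed ahead of the ordinary index results.
            results.insert(0, f"directory: {listing}")
    return results
```

Calling `blended_search("(555) 555-0123")` puts the directory listing ahead of the ordinary result, while a query such as `"cng veh"` matches no pattern and falls through to the index alone.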

The following model notes the importance of editorial controls which run parallel to search technology, though it does not tie this factor into the scoring mechanism, which is probably more than a black and white process – something probably akin to a grading system. This aspect of the overall search landscape is not publicized due to the fact that such methods run counter to the image of companies that tout technological purity. The reality is that machines have yet to evolve to a point where they can visually interpret meaning and reliably resolve ambiguity.

Simplified Search Engine Results Rank Calculation Diagram

Because editorial or associative controls are in many respects more readily carried out and do not require heavy-handed alterations to search algorithms, which in turn may require substantial increases in machine resources, it stands to reason that such controls are fully institutionalized. And, given the recent growth in the number of means available to manipulate algorithmic determinations, editorial intervention has become even more necessary. The widespread adoption and understanding of CSS, for instance, has brought to the picture a number of ways to hide or manipulate webpage text in a “spurious” capacity, though the code used to carry out any deceptive intent will not likely differ from the code used in a “fair” manner. Given this, algorithms are in no way comparable to humans when it comes to recognizing the nature of the usage. And, even though there are technologically sound ways of achieving recognition, they are not presently in use for large-scale (multi-billion document) indexing pursuits, due to storage needs and resource dependence.
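As a rough illustration of how editorial lists could sit alongside an algorithmic score without touching the algorithm itself, consider the Python sketch below. It is speculation on my part, in keeping with the grading-system idea above: the `EDITORIAL_GRADES` table, its multiplier values, and the function names are all hypothetical and are not drawn from any search engine's documentation.

```python
from urllib.parse import urlparse

# Hypothetical editorial lists. Grades are multipliers rather than on/off switches,
# which makes the control a grading system instead of a black and white process.
EDITORIAL_GRADES = {
    "trusted.example": 1.25,    # whitelisted "good citizen"
    "borderline.example": 0.5,  # flagged, but not banned
    "spammy.example": 0.0,      # blacklisted (banned)
}


def editorial_grade(url):
    """Look up the editorial multiplier for a URL's host; unknown hosts are neutral (1.0)."""
    host = urlparse(url).hostname or ""
    return EDITORIAL_GRADES.get(host, 1.0)


def final_score(url, algorithmic_score):
    """Apply the editorial grade on top of the untouched algorithmic score."""
    return algorithmic_score * editorial_grade(url)


def rank(candidates):
    """Rank (url, algorithmic_score) pairs by final score, dropping banned results."""
    scored = [(url, final_score(url, score)) for url, score in candidates]
    visible = [pair for pair in scored if pair[1] > 0]
    return sorted(visible, key=lambda pair: pair[1], reverse=True)
```

The point of the design is that an editor only ever edits the table; the ranking code and the algorithmic score stay as they are, which is why such controls are cheap to institutionalize.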

Having been in an editorial role in the early days of search engines, I have always found this need clear. The financial capacity to truly tackle the monumental task of human-driven quality control, though, has only been made a reality by Google. Other enterprises attempted it, but the construct was not scalable, due either to a lack of funding or to a platform unable to scale efficiently. And, to think, Google used to snub other engines, pumped up on an ego of algorithmic might. Alas mathematics, and good night.

Oct
10

Latent Semantic Indexing & Co-Occurrence

admin

Recently, I made a few discoveries while digging around on Google. Shall we draw conclusions from the following?

Google's Semantic Weave

Co-Occurrence Coupled with LSI

A Little Co-LSI?

LSI and Paid Search May Not Play Well Together

Adwords and LSI - A Bit of a Reach?

So here we have some root tokens at work, as evidenced by the truncated string matching that has taken place. Google clearly groups discrete strings from the general English lexicon, but in exactly what capacity would be difficult to determine. In situations where there are no exact string matches, Google has delved into not only the root pile but also a matrix of co-occurring strings, which cross-match within the root pile to locate combinations of root-derived strings that co-exist on pages and satisfy my less-than-likely search.

Yahoo! and Bing return vastly different results. While Bing associates “CNG” with “Compressed Natural Gas” and highlights both terms within its results, Yahoo! does not seem to make the same association. Bing returns an exact match for “CNG veh” as its first result, and Yahoo! does so with its second. Both search engines return matches that tend to occur in non-HTML documents. Neither returns token or root matches. Yahoo! suggests that I “Also try: cng veh in” as a search. Bing makes no suggestions.

Google associates computers and competition with the “comp” in the query “comp nat gas,” but fails to associate the more obvious “compression” or “compressed”. Four of the first 10 results for “comp nat gas” actually contain the word “compression” without highlighting, which seems to indicate that, though the term is present, there is not a strong enough association to merit confidence.

There are instances within results where associative confidence is present only in certain sections of the results content. Check out the bottom-of-page results for “cng veh”, where Google highlights the Honda NGV in its similar searches and twice fails to highlight “cng”, which was actually a string in my query!

Google's Associations with High Blood Alcohol

Loose Associations: Google

Any qualified takes on these discoveries are mighty welcome.

My theory is that these are perhaps some of the common roots at play in the associative partial-word substitutions seen in the examples above. Is it possible that there is some more loosely constructed means of determining the substituted words, one which does not rely on a defined lexicon and a matrix of lexical roots? Is this the result of some substring or backtracking function working in tandem with co-occurrence probabilities?
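To show what I mean by a substring function working in tandem with co-occurrence, here is a small Python sketch. It is purely my own conjecture and not a description of Google's method: the toy `DOCS` corpus and the function names are hypothetical, prefix matching stands in for whatever truncation logic is really used, and raw pair counts stand in for true co-occurrence probabilities. Note that the vocabulary is taken from the documents themselves rather than from a defined lexicon.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical toy corpus; a real index would hold billions of documents.
DOCS = [
    "compressed natural gas vehicle fueling",
    "natural gas vehicles use compressed fuel",
    "computer competition results",
    "national computer competition",
]


def build_cooccurrence(docs):
    """Count how many documents each unordered pair of words appears in together."""
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc.split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
    return counts


def expand(fragment, vocabulary):
    """Return every vocabulary word that starts with the truncated fragment."""
    return sorted(w for w in vocabulary if w.startswith(fragment))


def best_expansion(query, docs):
    """Pick the combination of expansions whose words co-occur most often."""
    vocabulary = {w for d in docs for w in d.split()}
    cooc = build_cooccurrence(docs)
    # A fragment with no expansion is kept as-is.
    options = [expand(f, vocabulary) or [f] for f in query.split()]

    def score(combo):
        return sum(cooc[tuple(sorted(pair))] for pair in combinations(combo, 2))

    return max(product(*options), key=score)
```

On this toy corpus, `best_expansion("comp nat gas", DOCS)` settles on “compressed natural gas” rather than “computer” or “competition”, because only the former words share pages with “gas”. That Google did not make this association in my test suggests its confidence threshold, or its mechanism, is something other than this simple.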
