Search Engine News
The Search Engine Spam Police
by Shari Thurow, Guest Writer
The major search engines and web directories consider spammers to be
those who take extreme measures to get web pages ranked well. What types
of pages are considered spam?
In a Search Engine Strategies session entitled "The Search Engine
Spam Police," representatives from search engines Inktomi, Google,
FAST Search, and web directories LookSmart and the Open Directory
Project explored the issue of spamming and presented the audience with
some general guidelines to follow.
In yesterday's issue of SearchDay we covered the advice and tips offered
by the human-compiled web directories. Today we'll focus on the
policies of the crawler-built search engines.
Bob Keating, Editor-in-Chief of the Open Directory Project (ODP), defined spam as the aggressive and continuous submission of identical sites to the same or multiple, inappropriate categories, and sites that violate submission policies for inclusion.
Types of sites that ODP considers spam are:
(1) Affiliate sites with the same or similar content but different site designs.
(2) Mirror sites. Submitting mirror URLs to different categories is also considered spam. Multi-lingual sites are acceptable as long as the URL resolves to the appropriate language.
(3) Sites that use redirects or any type of bait-and-switch practice. Using frames to hide a real URL, commonly referred to as "poor man's cloaking," is also considered spam.
(4) Sites whose sole purpose is to drive traffic to affiliate links or sites that contain these types of links.
If an editor or a submitter is caught spamming, the editor is immediately removed from ODP without notice, and future submissions are either deleted or blocked. If the spam is particularly relentless, ODP might remove "listable" listings as well. If you suspect that an editor or submitter is spamming, report the spam abuse to staff@dmoz.org.
Tim Mayer, former Director of Web Search Product Management at Inktomi,
stated that "Inktomi considers spam to be pages created
deliberately to trick the search engine into offering inappropriate,
redundant, or poor-quality search results." Spam is more about how and to
what extent a technique is used, Mayer explained, than about whether a
technique is used.
Some of the common practices that Inktomi considers spam are:
(1) Web pages that are built primarily for the search engines and not
your target audience, especially machine-generated pages.
(2) Pages that contain hidden text and hidden links.
(3) "Great quantity and little value" pages.
(4) Link farming and link spamming, particularly free-for-all (FFA)
links.
(5) Cloaking, a practice in which the search engine and the end user do
not view the same page.
(6) Sites with numerous, unnecessary host names (e.g. poker.abc.com,
blackjack.abc.com, etc.).
(7) Excessively cross-linking sites to artificially inflate a site's
apparent popularity.
(8) Affiliate spam.
If a webmaster is caught spamming, Inktomi will either demote the
offending web page/site from its index or completely ban it.
Jen McGrath, Software Engineer at Google, advised webmasters to create
sites with appropriate, relevant content and a straightforward design.
In other words, make a useful site that clearly benefits your end users.
McGrath also advised webmasters to submit their sites to web directories
and let other sites link to them. Your site does benefit from the
sites that link to it. However, your site can be penalized for the
sites that you link to. Spam penalties include demotion and removal from
Google's index.
Some items that Google considers spam are:
(1) Cloaking.
(2) Automated queries to Google to check positioning. The goal of
this is primarily to tweak a site for positioning purposes, not to
create content that benefits end users.
(3) Hidden text or hidden links.
(4) Stuffing pages with irrelevant keywords.
(5) Doorway pages, domains, and subdomains with the same or similar
content.
(6) "Sneaky" redirects.
Rolf Michelsen, Software Engineering Manager at FAST Search, defined
spam as using techniques to artificially influence a search engine's
precision or relevancy. Just as Mayer stated earlier, spam is based on
effect rather than technique.
Michelsen presented the following guidelines:
Do:
(1) Focus on content.
(2) Create a site that is easy to use in simple browsers.
(3) Link to other relevant sites.
(4) Submit the URL of your main site.
Don't:
(1) Cloak.
(2) Stuff irrelevant keywords into web pages using invisible text.
(3) Submit all URLs, every day, using the free submit.
(4) Participate in link farming or FFA links.
(5) Resort to "snake oil" search engine marketers. In
other words, don't fight spam with spam.
How Search Engines Look at Links
by Craig Fifield, Guest Writer
Link analysis is one of the most important techniques search engines use
to determine relevance, and understanding how it works is crucial for
successful search engine optimization. Representatives from Google
and Teoma explain how it's done.
If you have spent any time over the past few years studying search
engine marketing, you are probably familiar with the linking craze going
on in the industry. Everyone from experts to those new to the field tosses
about terms like "link popularity" and "page rank,"
and it seems that all related discussion forums and web sites have
entire sections devoted to linking. As the foundation of the web, links
have always been important, but links themselves haven't changed much
since the day they were created, so why all the renewed interest?
The reason is that the major search engines are utilizing links more and
more to improve the relevance of their search results. However, the
world of links and their use by search engines can get confusing
quickly. To help sort through the more important elements of linking, the
session "Looking at Links" was held at the Search Engine
Strategies conference in San Jose, California. The search engines that
utilize links the most, Google and Teoma, both sent representatives to
explain why links are important to their engines, and how to best utilize
them on a web site.
Daniel Dulitz, Director of Technology for Google, started things off by
stating one of the more important points of the session -- as search
engine indexes grow larger it becomes almost impossible to determine a
web page's relevancy based solely upon on-the-page factors (page text,
metas, titles, etc.). It's this fact, combined with the reality that
most on-the-page factors can't be trusted due to abuse, that prompted
Google to begin looking at the link structure of the web to help
determine a page's relevance to a query.
According to Dulitz, when determining the relevance of a web page to a
search, Google uses its PageRank system to attempt to "model the
behavior of web surfers" by analyzing the manner in which pages are
linked to one another. He explained that Google views the interlinking
of web pages as a way of "leveraging the democratic structure of
the web" with links equating to votes.
Google essentially treats each link from one site to another as a vote
for the site receiving the link (link popularity), but not all votes are
created equal. Dulitz used a simple diagram to show that each page of a
site has only one vote to give, so the more links to different sites on
the same page, the less of a vote each one receives. He also stated that
links from higher quality sites carry more weight than those from lesser
quality sites (e.g. sites with hidden links, sites involved in link farms,
sites with no incoming links, etc.). In addition, Google not only analyzes
who is linking to whom, but also analyzes the text in and around the links
to help determine the relevance of the pages receiving the links.
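The "one vote per page, split among its links" idea can be illustrated with a short sketch. This is not Google's implementation; it is a minimal version of the publicly described PageRank iteration. The function name pagerank, the damping value of 0.85, the iteration count, and the sample pages are all assumptions made for the example and were not presented in the session.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a tiny link graph (illustrative only).

    links maps each page to the list of pages it links to.
    damping and iterations are assumed values, not figures from the session.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page holding an equal share of the total rank.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page has one vote; each outgoing link gets an equal slice.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" gets a full vote from "a" and half of
# "b"'s vote, while "d" only gets the other half of "b"'s vote.
example_links = {"a": ["c"], "b": ["c", "d"], "c": ["a"], "d": []}
for page, score in sorted(pagerank(example_links).items()):
    print(page, round(score, 3))
```

In this toy graph, page "c" ends up ranked above page "d" because it receives an undivided vote from "a" in addition to its share of "b"'s vote. The sketch ignores link text and site quality signals, which Dulitz said Google also takes into account.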
Paul Gardi, Vice President of Search for Ask Jeeves/Teoma, began with
comments similar to those made by Dulitz. Gardi stated that "due to
statistical convergence" and the ease with which they can be abused,
neither page text analysis nor standard link popularity can be relied
upon when determining the relevancy of a web page. Specifically, he
mentioned that standard link popularity is ineffective because it does
not help determine the subject or the context of the site, and larger, more
popular sites tend to overwhelm smaller sites that may actually be more
relevant to a search.
To combat these issues, Teoma views the web as a global entity that
contains many subject-based web site communities. They study these
subject communities and the manner in which they are interlinked within
themselves and with each other to determine not only their link
popularity, but also the subject and context of the involved sites.
According to Gardi, Teoma is able to do this by using their unique
method of ranking sites. He explained that rather than relying on
general link popularity to determine results, their engine attempts to
employ a "subject specific popularity" to locate the most
popular sites within a specific subject community. This is done by first
analyzing the web as a whole to identify subject communities.
Teoma then employs link popularity within those communities to determine
which sites are the "authorities" on the subject of the query
and it's those sites that are returned as their results to a search. In
addition, he mentioned that by analyzing the links of the authority
sites their technology is also able to locate high-quality resource
pages (links pages) that are related to the original query. Each of
these components is then made available on their search results page as
follows: "Results" are the authorities, "Refine" is the
related subject communities, and "Resources" are the related
links pages.
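The difference between general link popularity and "subject specific popularity" can be seen in a small sketch. This is not Teoma's algorithm, which was not disclosed in the session. The sketch assumes the subject communities are already known and supplied as a simple lookup table, whereas Teoma identifies them by analyzing the web. The function names, site names, and community labels are hypothetical.

```python
from collections import Counter


def general_popularity(links):
    """Count every inbound link, regardless of who is linking."""
    return Counter(target for _source, target in links)


def subject_popularity(links, community, subject):
    """Count only inbound links that stay inside one subject community.

    community maps a site to its subject label. Here that mapping is an
    assumed input; discovering it is the hard part and is not shown.
    """
    counts = Counter()
    for source, target in links:
        if community.get(source) == subject and community.get(target) == subject:
            counts[target] += 1
    return counts


# Hypothetical data: a large portal with many links from unrelated sites,
# and a small chess club site linked to by other chess sites.
links = [
    ("news1", "portal"), ("news2", "portal"), ("news3", "portal"),
    ("chessblog", "portal"),
    ("chessblog", "chessclub"), ("chessshop", "chessclub"),
]
community = {
    "news1": "news", "news2": "news", "news3": "news", "portal": "news",
    "chessblog": "chess", "chessshop": "chess", "chessclub": "chess",
}

print(general_popularity(links).most_common(1))
print(subject_popularity(links, community, "chess").most_common(1))
```

With general popularity the portal wins on raw link count, which is the "larger sites overwhelm smaller sites" problem Gardi described. Restricting the count to links within the chess community makes the chess club site the top result for that subject.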
Overall, the session was well received and very informative, especially
for those new to the subject. Considering that most major search engines
now utilize some method of link analysis, anyone who has a vested
interest in being properly indexed by the search engines should consider
attending in the future.