Daily Blogger

Approaches to Combat Web Spam

Posted on: 07/08/2010

1. Content Spam

Language: Search engine engineers looked at the languages of pages to see what they could find. It turned out that French was the language that most often proved to be a festival of spam, followed by German and then English. I found this pattern quite interesting.

Domain: No surprise here: .biz domains turned out to have a much higher rate of spam than the others. .us and .com followed, but .biz is well ahead – so be careful, ok?

Words per page: Another frequently used approach. They found that pages containing lots of text were often the ones containing the most spam. Below 1,500 words, the spam curve declined; the 750–1,500 word range seemed to be the spammers' sweet spot.
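The word-count band above can be sketched as a trivial check. The 750–1,500 figure comes from the post itself; the function and its name are purely illustrative and would only ever be one weak signal among many:

```python
# Illustrative sketch only: the 750-1500 word band is from the post,
# the function itself is hypothetical.
def in_spam_word_band(page_text: str) -> bool:
    """Return True if the page's word count falls in the risky 750-1500 band."""
    n_words = len(page_text.split())
    return 750 <= n_words <= 1500

print(in_spam_word_band("lorem " * 1000))  # True: 1000 words
print(in_spam_word_band("lorem " * 200))   # False: 200 words
```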

Keywords in the TITLE tag: This is another area they observe. Experience has shown that spam pages tend to use more keywords in the TITLE tag than normal pages do.

Anchor text ratio: Another interesting approach is to look at the ratio of anchor text to the rest of the page's text, at either the page or the site level. Pages with a high percentage of anchor text (relative to plain text) are more likely to be spam.
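A minimal sketch of that ratio, using Python's standard-library `html.parser`; the class and function names are my own, and a real crawler would of course be far more robust:

```python
from html.parser import HTMLParser

class AnchorRatioParser(HTMLParser):
    """Accumulate total visible text length and the subset inside <a> tags."""
    def __init__(self):
        super().__init__()
        self.in_anchor = 0
        self.anchor_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            self.in_anchor -= 1

    def handle_data(self, data):
        text = data.strip()
        self.total_chars += len(text)
        if self.in_anchor:
            self.anchor_chars += len(text)

def anchor_text_ratio(markup: str) -> float:
    """Share of a page's visible characters that sit inside anchor tags."""
    p = AnchorRatioParser()
    p.feed(markup)
    return p.anchor_chars / p.total_chars if p.total_chars else 0.0

sample = '<p>Hello <a href="#">spammy link text here</a> world</p>'
print(round(anchor_text_ratio(sample), 2))  # 0.68: most of the text is anchor text
```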

Fraction of visible content: This one targets hidden text (not to be confused with code-to-text ratios). Engines look at the proportion of a page's text that is not actually visible when the page is rendered.

Compressibility: As a mechanism to fight keyword stuffing, search engines may look at compression. More specifically, this targets the repetitive or spun content spammers rely on. Search engines often compress a page to save on indexing and processing, and spam pages tend to show a noticeably higher compression ratio (uncompressed size divided by compressed size).
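The ratio definition above (uncompressed over compressed) is easy to demonstrate with the standard-library `zlib`; the sample strings here are invented, but the effect is exactly the one the post describes – repetitive, stuffed text compresses far better than natural prose:

```python
import zlib

def compression_ratio(text: str) -> float:
    """Uncompressed byte size divided by zlib-compressed size."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))

natural = "Search engines index billions of pages and rank them with many distinct signals."
stuffed = "cheap pills online cheap pills online " * 40

print(compression_ratio(stuffed) > compression_ratio(natural))  # True: repetition compresses well
```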

Overall popular terms: Another good way to catch keyword stuffing is to compare the words on a page with data from known queries and documents. Someone stuffing keywords will use them far less naturally than user queries and well-known pages do.
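One simple way to make that comparison concrete is cosine similarity between word-frequency vectors. This is a toy sketch: the tiny "reference" counter below stands in for the large query/document statistics the post refers to, and all names are hypothetical:

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Toy stand-in for known query/document word statistics.
reference = Counter("the quick brown fox jumps over the lazy dog".split())
natural = Counter("the lazy dog jumps over the quick brown fox".split())
stuffed = Counter(("casino bonus " * 20).split())

print(cosine_similarity(natural, reference) > cosine_similarity(stuffed, reference))  # True
```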

Query spam: Given the rise of query analysis, click data and personalization, spammers can identify frequently clicked query terms and click on their own results. By observing the patterns of queries, in combination with other signals, these tactics become statistically apparent.

Host-level spam: This looks at the other sites and domains on the same server and/or registrar. As with TrustRank, most of the time spammers will be found in the same neighborhood as other spammers.

Phrase-based: Under this approach, a learning model trained on labeled documents looks for anomalies in how textual phrases are linked together. It's a bit like keyword stuffing detection on steroids: searching for statistical anomalies can often highlight spammy documents.
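A real phrase-based model is a trained classifier, but the flavor of the statistical anomaly it hunts for can be shown with a crude invented metric – the fraction of bigram occurrences that are repeats of an earlier bigram, which spikes when phrases are recycled:

```python
from collections import Counter

def repeated_bigram_fraction(text: str) -> float:
    """Toy anomaly signal: share of bigram occurrences that repeat an earlier bigram."""
    words = text.lower().split()
    bigrams = list(zip(words, words[1:]))
    if not bigrams:
        return 0.0
    counts = Counter(bigrams)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(bigrams)

print(repeated_bigram_fraction("buy cheap pills now " * 10) > 0.5)   # True: heavy phrase reuse
print(repeated_bigram_fraction("the quick brown fox jumps over a lazy dog"))  # 0.0
```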

2. Link Spam

TrustRank: This method goes by several names; TrustRank is Yahoo's flavor. The concept revolves around having "good neighbors". Research shows that good sites link to good sites, and vice versa. You are known by the company you keep.

Link stuffing: One spammer approach is to create a ton of low-value pages whose links all point at a target page. Spam sites tend to have a greater share of these artificial pages than legitimate sites do.
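A site-level sketch of that "share of artificial pages" idea, under the assumption that a crawler has already counted outlinks per page; the 100-outlink threshold and the function name are invented for illustration:

```python
def link_stuffing_share(outlinks_per_page: list[int], threshold: int = 100) -> float:
    """Fraction of a site's pages that are unusually outlink-heavy (invented threshold)."""
    if not outlinks_per_page:
        return 0.0
    heavy = sum(1 for n in outlinks_per_page if n >= threshold)
    return heavy / len(outlinks_per_page)

print(link_stuffing_share([12, 8, 230, 190, 310, 15]))  # 0.5: half the pages are link dumps
```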

Nepotistic links: We have all run into paid links, as opposed to negotiated (reciprocal) ones. While for SEOs this can be a gray area, the search engines themselves certainly consider all existing forms of link manipulation, reciprocity included, to be obvious manipulation.

Topological spamming (link farms): We have our own thoughts about this, but the search engines focus on the percentage of a site's linkage in the link graph that comes from these spammy neighborhoods, compared with "good" sites. Generally, those seeking to manipulate the engines have a higher percentage of links from such spam locations.

Temporal anomalies: Another area where spam sites typically deviate from the normal set of pages is historical data. Across the index there is an average rate of link acquisition and decay for "normal" sites, so temporal data can be used to help detect spammy sites engaged in unnatural link building.
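The deviation-from-history idea can be illustrated by scoring the latest period's new links against a site's own past average. This is purely a sketch with invented numbers; real systems use far richer temporal models than a single z-score:

```python
import statistics

def link_growth_zscore(weekly_new_links: list[int]) -> float:
    """Z-score of the latest week's new links against the site's prior history."""
    history, latest = weekly_new_links[:-1], weekly_new_links[-1]
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return (latest - mean) / stdev if stdev else float("inf")

steady = [10, 12, 9, 11, 10, 13]   # organic growth
burst = [10, 12, 9, 11, 10, 500]   # suspicious link spike

print(link_growth_zscore(burst) > link_growth_zscore(steady))  # True
```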

