
Random musings in IT


I keep this blog mostly as a personal journal, so I won't forget what I worked on, and to keep my writing skills up. I never thought anyone would be seriously interested in what I wrote, so I've paid little attention to the comments on my posts. To keep the steady flow of spam to a minimum, I set up Spam Karma 2 and didn't worry about it any more.

Just now, when I entered the admin area for a routine upgrade, I noticed that besides the 2500 spam comments I also had about 30 not marked as spam, and I became curious. Are these real comments, or have they bypassed the spam filter?

Blogs are basically spammed for two reasons: either the network trolls pick your post for food and start munching away on each other's comments, or the post is used as part of some Search Engine Optimization (SEO) scheme, which relies on the fact that search engines consider a potential hit more relevant if more links point at it.

I've blogged about some "now" topics, like time-lapse and Android tablets, and these seem to be a sweet spot for those who just want to sell their junk, be it non-prescription medicine or fashionable cheap sunglasses.

The reason I think this deserves its own post is that reading through some of the comments showed that several different engines, of varying levels of relevancy, are in use for spamming. There is always a link provided as the comment author's own site, which is the page the SEO is for. There is also an e-mail address that is valid, but usually machine generated.

The simplest of all is just an elaborately worded congratulation on the article, or a promise to distribute it on other channels like reddit. It's just social engineering, all about flattering the author so he will show it among his other comments. These kinds of comments use a template and are so bland they can be used on basically any post. The fixed template is their main weakness: even the simplest spam filters can be trained to detect it. How did they get through? Probably my spam template database didn't yet contain the exact template when they arrived.
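To illustrate why a fixed template is easy to catch once it is known, and why a brand new one slips through, here is a minimal sketch. I don't know how Spam Karma 2 stores its templates, so the `fingerprint` function and the `TemplateFilter` class are hypothetical names of my own, not its API.

```python
import hashlib
import re


def fingerprint(comment):
    """Hash a comment so that case, punctuation and spacing don't matter."""
    words = re.findall(r"[a-z0-9]+", comment.lower())
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


class TemplateFilter:
    """Hypothetical store of fingerprints of comments reported as spam."""

    def __init__(self):
        self.known = set()

    def report(self, comment):
        # Called when somebody marks a comment as spam.
        self.known.add(fingerprint(comment))

    def is_spam(self, comment):
        # A template is only caught if it was reported before.
        return fingerprint(comment) in self.known


spam_filter = TemplateFilter()
spam_filter.report("Great post! I will share it on Reddit.")
print(spam_filter.is_spam("great post,  i will share it on reddit"))
print(spam_filter.is_spam("A template nobody has reported yet."))
```

The second lookup is the case described above: the template is not in the database yet, so the comment passes.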

The next type is the machine-generated, lorem ipsum type of spam. There are subtle differences among these as well. The simplest is the web-scraper-based commenter, which takes a (possibly completely random) part of another page and publishes it as a comment. Some of these are so crude they cut the first and last words in half. They are quite difficult to detect from the text alone, as the text is coherent. But since the same texts are posted on several sites, the spam filter can use the comments reported by others as templates and detect them.
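Scraped snippets rarely match a reported comment character for character, because each one is cut at a different place. One way to compare them anyway is to look at overlapping runs of words (shingles). This is only a sketch of that general technique under my own assumptions, not what any particular filter does; `shingles`, `overlap` and the sample texts are made up for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of all runs of `size` consecutive words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(comment, reported, size=3):
    """Share of the comment's shingles that also occur in a reported comment."""
    mine = shingles(comment, size)
    theirs = shingles(reported, size)
    if not mine:
        return 0.0
    return len(mine & theirs) / len(mine)


# A comment reported on some other blog (invented sample text).
reported = "the quick brown fox jumps over the lazy dog near the river bank"
# The same scrape, with the first and last words cut in half.
incoming = "ick brown fox jumps over the lazy dog near the riv"

print(overlap(incoming, reported) > 0.7)
```

The half-cut words at both ends only spoil the shingles that contain them, so the score stays high for the rest of the snippet.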

A more interesting approach is when several key phrases are used as seeds and put in random context by a lorem ipsum type generator. These generators vary in complexity: some produce alphabet soup, but some have punctuation and capitalization as if it were proper text. The idea behind this type of spam is that by posting the keywords, the page pointing at the target will look even more relevant and the target will score higher in search results. How are these handled by the spam filter? The text is usually generated using words from the very same page or blog, so the text itself doesn't stand out. The link that is the target of the SEO is what gives it away.
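Since the link is the giveaway, a filter can ignore the generated text and check only where the author's site points. A minimal sketch, assuming a shared set of reported hosts exists; `link_host`, `is_reported_target` and the `cheap-shades.example` domain are hypothetical.

```python
from urllib.parse import urlsplit


def link_host(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    # urlsplit only finds the host if the URL has a '//' part.
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_reported_target(author_url, reported_hosts):
    """True if the author's link points at a reported host or a subdomain of it."""
    host = link_host(author_url)
    return any(host == bad or host.endswith("." + bad) for bad in reported_hosts)


# Hypothetical list of hosts that others have reported as SEO targets.
reported_hosts = {"cheap-shades.example"}

print(is_reported_target("http://www.Cheap-Shades.example/ray-ban", reported_hosts))
print(is_reported_target("https://itworks.blog.hu/", reported_hosts))
```

However cleverly the comment text is generated, the spammer cannot vary the target link without losing the whole point of the exercise.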

The "best" ones I found were both readable and almost relevant to the article itself. They were surely written by humans, and some were so relevant I almost accepted them. If my blog had more traffic and commenters, they would almost certainly have passed as valid comments. I strongly suspect these were written by actual people, and that the most relevant boilerplate comment (they are always positive and supportive) is selected based on an analysis of the actual article. I think these templates are collected from various forums by people, categorized, and regularly changed to bypass the filters.

In the end I became so paranoid that I marked all my comments as spam. If there was anyone whose actual comment I inadvertently removed, I'm sorry!

NB: I intend to use this article as a kind of honeypot. All comments on it (that pass Spam Karma) will be preserved and allowed, so as to prove my point.
If an actual human is about to share his thoughts, he's welcome, but please state "I'm an actual human", just to avoid confusion. :)

Tags: spam blog wordpress

3 comments


Comments:


Outdoor Furniture Design Idea 2013.07.11. 07:09:57

My pal recommended I'll like this site. He / she was entirely perfect. This particular blog post definitely built my own day time. A person can not consider simply the fact that whole lot moment I had created invested because of this details! Thank you!

Anna 2013.07.11. 11:37:32

Interesting entry. Let's see where my comment ends up. I am a human, despite what you might think ;-)

Ray Ban UK 2013.07.16. 04:05:54

Your use from the web site indicates your agreement to be bound by the Terms of Use.