ExtortionLetterInfo Forums
ELI Forums => Getty Images Letter Forum => Topic started by: Robert Krausankas (BuddhaPi) on January 23, 2013, 10:58:03 AM
-
Since moving the ELI site to new servers and working through bugs, glitches and other items, I had the opportunity to look at the log files and made a couple of interesting discoveries.
Something somewhere was triggering mod_security to block IPs. The last issue I looked into led to the discovery of 46 alerts triggered! Yes, 46 different alert strings! I immediately suspected some sort of bot, crawler or scraper. As it turns out, a crawler named "magpie crawler" was hammering the server relentlessly.
Some quick research told me that magpie crawler is owned and operated by a company named "Brandwatch", and there are plenty of Google entries slamming them as "bad bots" that suck away bandwidth like crazy.
Who are Brandwatch?
"We are a social media monitoring company, helping our customers find useful and relevant comments and discussion on the web. We crawl blogs, forums, news sites, and all kinds of social media content. The content is indexed, much like a search engine, allowing our users to find the pages that mention the words they are interested in."
Brandwatch is apparently involved in "Online Reputation Management and Brand Tracking in Social Media". They have a crawler that will simply swamp a reasonably small site: no pacing, no maximum number of requests per second, just blasting away at the fastest possible rate it can.
They perform no useful function for ELI or our users / readers. They sell information to their clients on who is talking about them. That's right, they are trawling ELI in order to sell information to someone who is worried we might be saying something negative.
Don't be surprised if someone from Brandwatch appears in this thread, apologizing and justifying their bot's behavior; it is their typical MO. At this time they "claim" they adhere to robots.txt. I have my doubts. They will be monitored closely, and worst case they will get their IP range blocked from ELI.
-
Since moving the ELI site to new servers and working through bugs, glitches and other items, I had the opportunity to look at the log files and made a couple of interesting discoveries.
Something somewhere was triggering mod_security to block IPs. The last issue I looked into led to the discovery of 46 alerts triggered! Yes, 46 different alert strings! I immediately suspected some sort of bot, crawler or scraper. As it turns out, a crawler named "magpie crawler" was hammering the server relentlessly.
Yep. Been there. Seen that. :)
magpie is blocked by default with ZBblock. It's really worth adding ZBblock to the front end of SMF, which uses PHP. That will block a bunch of other SEO bots.
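(For anyone curious what "adding ZBblock to the front end" amounts to, here is a minimal sketch: one require at the very top of the forum's entry script, before SMF does anything else. The filename and path below are illustrative only; follow Zaphod's own install directions for the real ones.)

<?php
// index.php -- SMF's entry script (illustrative; the real path/filename
// for the ZBblock include comes from Zaphod's installation notes).
require_once dirname(__FILE__) . '/zbblock/zbblock.php'; // hypothetical path

// ... the rest of SMF's normal index.php continues below ...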
Some quick research told me that magpie crawler is owned and operated by a company named "Brandwatch", and there are plenty of Google entries slamming them as "bad bots" that suck away bandwidth like crazy.
Yep. These are the sorts of things that need to be blocked. After I saw the image crawlers racing around, I watched more and realized that there is just too much c*ap out there that does your site no good and needs to be blocked. As far as positive SEO goes: Google, and maybe Bing, are worth letting in. Most other SEO bots exist for "research" (i.e., party A reads your site "B" to sell information to site C; there is nothing positive in it for "site B").
Don't be surprised if someone from Brandwatch appears in this thread, apologizing and justifying their bot's behavior; it is their typical MO. At this time they "claim" they adhere to robots.txt. I have my doubts. They will be monitored closely, and worst case they will get their IP range blocked from ELI.
Don't even wait. Just block them. I advise getting ZBblock: http://www.spambotsecurity.com/zbblock_download.php It's pretty easy to use -- and you'll be glad you did. Zaphod has pretty good directions for installing the thing, and once it's installed, for most CMSs you add one line to the top of a PHP file. (For WordPress it's the config.php or something like that.)
If you use that, you'll block all sorts of other things: majestic12, 80legs, mobsters in Russia, a fair number of spambots. I'm up to speed enough that I can suggest custom things if you need quick help.
This can be in addition to any blocking you do at your router. (I'm on shared hosting... so ZBblock is a big thing for me. I now use Cloudflare as a CDN and automatically transfer blocks so they happen at Cloudflare. That's great too. But I think that since you have control of the server, you don't necessarily need the Cloudflare bit.)
FWIW: If you end up using ZBblock, I'm going to ask you to let me read your killed_log.txt files. I want more data!! :)
-
Make no mistake, they are blocked... albeit at this time through robots.txt... I'll be watching in case I need to be more proactive. I'll also be looking at ZBblock.
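(For reference, a robots.txt block for that crawler is just a couple of lines. "magpie-crawler" is the user-agent token the bot is reported to use; adjust it if Brandwatch documents something different.)

# robots.txt -- ask Brandwatch's crawler to stay out entirely
User-agent: magpie-crawler
Disallow: /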
-
Carnac predicts: They will violate robots.txt. :)
-
Carnac predicts: They will violate robots.txt. :)
I totally agree, but when they do, I can call them out on it and slam Brandwatch for being shifty in their practices, and for being liars!!
-
http://www.spambotsecurity.com/forum/search.php?keywords=Brandwatch&terms=all&author=&sc=1&sf=all&sk=t&sd=d&sr=posts&st=0&ch=300&t=0&submit=Search
-
I feel that if a bot provides no valuable service to a website, it does not need to be crawling it. As I am finding out the hard way, most malicious bots ignore the robots.txt file. I'm with lucia and would just go ahead and block them.
I wish I could use ZBblock, but our website is on an IIS 7.5 server, so all I have to work with is the URL Rewrite add-on tool, and I am still learning how to use it properly. Lucia, any recommendations on whether what ZBblock does can be done with a rewrite rule?
-
ZBblock can be made to work with *anything* that uses PHP. The ELI forum runs on SMF, which is a PHP script, so with respect to the forum, ZBblock would do a great job of keeping crawlers off. Reading the killed_log.txt files could also help Robert & Matt identify "things" they might want to block in .htaccess (which is for Apache).
ZBblock doesn't protect static files (unless you do really fancy stuff). So, for that, those on Apache can use .htaccess or some sort of firewall. (I end up using Cloudflare as my "firewall".)
I don't know anything about how to protect servers using software that doesn't let you use .htaccess. But *in principle* you might be able to do a lot of the stuff ZBblock does with .htaccess -- ZBblock is just easier because Zaphod already wrote it, updates it, etc. ZBblock may also be faster, and it lets you come up with quite a few "tailored" rules if you so desire.
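(As a rough illustration of the .htaccess route, a handful of SetEnvIf lines will turn away bots by user agent. The agent strings below are examples only -- build your own list from killed_log.txt; this uses Apache 2.2-style Order/Deny syntax.)

# .htaccess -- deny a few known scraper user agents (examples only)
SetEnvIfNoCase User-Agent "magpie-crawler" bad_bot
SetEnvIfNoCase User-Agent "MJ12bot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot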
Are you trying to protect anything dynamic? (A blog? Shopping cart? Etc.)
-
We don't use PHP, but IIS 7.5 works with web.config (and its URL Rewrite module can import .htaccess-style rules), so a lot of what is written for those types of files I can use. I was just hoping for some software that would help automate it a bit and make it easier. From some more research last night, it looks like URL Rewrite will do what I am wanting to do; it's just time consuming, and I have an entire company network to manage along with being the webmaster and the network security specialist.
You know, I was curious as to why our website, which I had not worked on much, all of a sudden almost doubled its traffic hits and bandwidth last year, only to realize later that it was scanning and trolling bots making my life a living hell. Our site is static, but when we started posting more on our Twitter and FB pages with links back to our website, it seems the "trolls" got interested in us. I suppose popularity comes at a price :(
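(For what it's worth, a sketch of that kind of rule in IIS 7.5's URL Rewrite looks like the web.config fragment below; the bot names in the pattern are placeholders, and the rule simply answers matching user agents with a 403.)

<!-- web.config (IIS 7.5 + URL Rewrite) -- illustrative bot-blocking rule -->
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Block bad bots" stopProcessing="true">
          <match url=".*" />
          <conditions>
            <add input="{HTTP_USER_AGENT}" pattern="magpie-crawler|MJ12bot" />
          </conditions>
          <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Bots not welcome" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>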
-
There are lots of twitterbots too; I think their goal is advertising/SEO for their customers (not the site visited). If you post a link to Twitter, a swarm comes, and it comes instantaneously. Because my site is only a blog, I ban most of those too. ELI probably should too (though I don't think anyone is tweeting ELI's address much; but if that changes, most twitter bots are useless. A few might be useful -- someone could let them in and block the others).
-
We don't use PHP, but IIS 7.5 works with web.config (and its URL Rewrite module can import .htaccess-style rules), so a lot of what is written for those types of files I can use. I was just hoping for some software that would help automate it a bit and make it easier. From some more research last night, it looks like URL Rewrite will do what I am wanting to do; it's just time consuming, and I have an entire company network to manage along with being the webmaster and the network security specialist.
You know, I was curious as to why our website, which I had not worked on much, all of a sudden almost doubled its traffic hits and bandwidth last year, only to realize later that it was scanning and trolling bots making my life a living hell. Our site is static, but when we started posting more on our Twitter and FB pages with links back to our website, it seems the "trolls" got interested in us. I suppose popularity comes at a price :(
Ahhhhh, there are a number of bots that come running when things are posted on Twitter; it's known as a twitter swarm, and most of them seem to come from amazonaws.com IPs...
-
There are lots of twitterbots too; I think their goal is advertising/SEO for their customers (not the site visited). If you post a link to Twitter, a swarm comes, and it comes instantaneously. Because my site is only a blog, I ban most of those too. ELI probably should too (though I don't think anyone is tweeting ELI's address much; but if that changes, most twitter bots are useless. A few might be useful -- someone could let them in and block the others).
No, ELI's address is not getting tweeted often. I would like to change that, however, and have been concentrating some effort on getting a bit more exposure via Twitter... I'm not going to go nuts blocking bots, as I don't have the time to invest, but I will make the time if server resources are affected negatively enough.. This could easily be a full-time job... I already have 2 or 3 of those..
-
Ahhhhh, there are a number of bots that come running when things are posted on Twitter; it's known as a twitter swarm, and most of them seem to come from amazonaws.com IPs...
ZBblock blocks most of amazonaws.com, with a few bypasses for the Wayback Machine and other services that are popular with hosts. One can then block the Wayback in customsigs.inc, but many people like the Wayback, so Zaphod has a bypass for it.
That twitterswarm can wreak havoc on a dynamic site with cheap hosting. I escalate amazonaws.com IP blocks to Cloudflare and never unblock the ones that got blocked. It's just too much CPU/memory for my hobby site.
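(The mechanics behind that kind of host blocking are just a reverse-DNS check on the visitor's IP. The snippet below is a generic PHP illustration of the idea -- it is not ZBblock's actual customsigs.inc format, and the Wayback user-agent token shown is an assumption.)

<?php
// Generic illustration of host-based blocking (NOT ZBblock's real
// customsigs.inc syntax -- consult Zaphod's documentation for that).
$ip    = $_SERVER['REMOTE_ADDR'];
$host  = gethostbyaddr($ip); // reverse DNS, e.g. ec2-...compute.amazonaws.com
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

$is_amazon  = (stripos($host, 'amazonaws.com') !== false);
$is_wayback = (stripos($agent, 'archive.org_bot') !== false); // assumed token

if ($is_amazon && !$is_wayback) {
    header('HTTP/1.1 403 Forbidden');
    exit('Automated access from this host is not permitted.');
}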
-
No, ELI's address is not getting tweeted often. I would like to change that, however, and have been concentrating some effort on getting a bit more exposure via Twitter... I'm not going to go nuts blocking bots, as I don't have the time to invest, but I will make the time if server resources are affected negatively enough.. This could easily be a full-time job... I already have 2 or 3 of those..
The "don't have time to do it full time" is where ZBblock can be useful to people running smaller sites especially hobby sites and forums. It ends up saving time resources (including human). But it's not necessarily the solution for everything. You have access to the logs and thus situated to know if excess bot traffic is a problem for ELI. If it is a big problem: ZBBlock is a good thing to add quickly. If it's not, then no.
-
There are lots of twitterbots too; I think their goal is advertising/SEO for their customers (not the site visited). If you post a link to Twitter, a swarm comes, and it comes instantaneously. Because my site is only a blog, I ban most of those too. ELI probably should too (though I don't think anyone is tweeting ELI's address much; but if that changes, most twitter bots are useless. A few might be useful -- someone could let them in and block the others).
No, ELI's address is not getting tweeted often. I would like to change that, however, and have been concentrating some effort on getting a bit more exposure via Twitter... I'm not going to go nuts blocking bots, as I don't have the time to invest, but I will make the time if server resources are affected negatively enough.. This could easily be a full-time job... I already have 2 or 3 of those..
LOL.. I hear you, and I know the feeling.
It seems that since the whole Getty thing, I have turned into more of a security specialist, working on hardening our network even further. We host our own web server on a DMZ, so I have the luxury of blocking large swaths of IP ranges too, but the sheer number going after our web server is ridiculous. I have over 4000 attempts on our network every month! The good news is that I have learned a lot about URL Rewrite, .htaccess and web.config configurations over the last few days, so maybe I can slow some of them down. I'm compiling a database and some instructions on how to deal with some of them for future postings. :)
-
I added some "webconfig" rules that Dreamhost suggested. They catch well known bad things like "timthumb" attempts etc. When that catches something, it triggers a 503 response. So.... I wrote a dynamic 503.php which calls ZBBlock, bans that IP-- and then my other script reads the killed_log.txt file in ZBBlock and bans it at Cloudflare!
-
Hi -- does ZBblock stop PicScout and similar? http://www.spambotsecurity.com/zbblock.php
It's just that it says on their site: "It won't stop access to non-exploitable resource files like .gif, .jpg, or .swf."
I have nothing to hide from Getty or others... I'm trying to get as good an understanding as I can in the shortest possible time so I can put my sites (and 10 years of my life) back online after my Letter...
TJ
-
ZBblock will not block image agents *on its own*. But once you have it up and running, you can do things to make it block image agents. The difficulties are that:
1) It's a little complicated, and it's not worth explaining how to do it unless someone first wants to use ZBblock and understands what it does.
2) Someone has to be willing to fiddle with .htaccess fairly frequently.
3) Someone has to be willing to burn the CPU/memory to turn image handling into a dynamic process instead of a static process.
4) You (likely) can't achieve perfection.
Because of this, I've never fully explained how to use ZBblock to protect images.
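(Purely for illustration -- and not necessarily the method I allude to above -- here is one possible shape of "making image handling dynamic": an .htaccess rewrite sends image requests through a small PHP gateway, which can run bot checks before streaming the file.)

<?php
// image.php -- sketch of a dynamic image gateway (one possible approach).
// An .htaccess rule such as:
//   RewriteRule ^images/(.+\.(?:gif|jpe?g|png))$ image.php?f=$1 [L]
// routes image requests here so checks can run before the file is served.

// 1. Run whatever blocker you use first (path is illustrative).
// require_once dirname(__FILE__) . '/zbblock/zbblock.php';

// 2. Resolve the requested file and refuse anything outside the images dir.
$base = realpath(dirname(__FILE__) . '/images');
$name = isset($_GET['f']) ? $_GET['f'] : '';
$file = realpath($base . '/' . $name);
if ($file === false || strpos($file, $base) !== 0 || !is_file($file)) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

// 3. Stream it with the right content type.
$types = array('gif' => 'image/gif', 'jpg' => 'image/jpeg',
               'jpeg' => 'image/jpeg', 'png' => 'image/png');
$ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
header('Content-Type: ' . (isset($types[$ext]) ? $types[$ext] : 'application/octet-stream'));
header('Content-Length: ' . filesize($file));
readfile($file);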