ExtortionLetterInfo Forums
ELI Forums => Getty Images Letter Forum => Topic started by: Robert Krausankas (BuddhaPi) on December 08, 2011, 05:39:09 PM
-
I just read this little snippet (no link to the site, as there wasn't much else there worthwhile):
"Copyright owners have the legal right under the DMCA to reserve the right to view content only to website visitors. Webmasters have the legal right under DMCA to block access to anyone who wants to store or copy website content. It is also a crime under US law to use any trick or false information to gain access to a computer system. Running a robot that pretends to be a user by faking its useragent is crime under US Law because it is using false information to gain access to a computer system."
Can anybody point me to this specific area of the DMCA statutes? I'm going to start looking through the entire thing. If this is indeed in the law, then I would venture to guess that Getty Images will keep operating picscout out of Israel to skirt US laws.
-
Interesting tidbit there.
I think that the answer's in the DMCA; it'll just be a little bit of work to read it all and understand it.
It would be good if "robots" could be legally precluded from accessing sites.
If this is, in fact, the case:
I don't think that it would help the copyright trolls if they collected data in countries other than the US.
Because, while they may be able to detect infringements in this manner, I can't imagine them bringing such data to court as evidence.
It would certainly seem odd to arrive in a US court with evidence collected "legally offshore", that would have been "illegal" to collect on US soil.
Again, that's assuming that the DMCA prevents the trolls from ignoring attempts to block "robots".
I'm quite interested to hear more...
S.G.
-
I've been scanning through the .PDF of the law:
http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=105_cong_public_laws&docid=f:publ304.105.pdf
I think that "CHAPTER 12—COPYRIGHT PROTECTION AND MANAGEMENT SYSTEMS", "§ 1201. Circumvention of copyright protection systems" (page 5), might be the section to look at first.
One could assume, for argument's sake, that the "copyrighted material" is one's web site, and the "copyright protection system" would be the "robots.txt" file, or other measure.
S.G.
-
Again, great minds think alike... except I'm kinda brain dead. If this is how the law is interpreted, and say HAN, Getty, or one of the other trolls brings it to court, it could be shown that they collected the "evidence" from another country. I was also thinking along the lines of my site being the copyrighted material. I'll keep plugging along and reading more, to see if I can dredge up anything else, starting with Chapter 12 as you suggest. My main problem is understanding all the legal jargon; it generally takes me two or three readings before the proverbial light bulb goes on.
-
I can't imagine them bringing such data to court as evidence.
It would certainly seem odd to arrive in a US court with evidence collected "legally offshore", that would have been "illegal" to collect on US soil.
Again, that's assuming that the DMCA prevents the trolls from ignoring attempts to block "robots".
I'm curious how, as a practical matter, you are ever going to be able to demonstrate that any evidence that might be presented was collected illegally.
-
Yes, buddhapi, I agree it's pretty involved reading.
I'll also have to read it many times, highlight parts and take notes.
It appears that the law wasn't specifically written with our intent in mind.
But, the question is, could it be used as a legal defense anyway if it was presented in a compelling manner?
Lucia, I don't think that the likes of Getty would reveal exactly where or how they collected evidence to an alleged infringer.
But, if a dispute actually became a case for the courts, the question could be brought up. Getty's counsel would have to respond.
If ignoring a "robots.txt" file is illegal in the US, and Getty stated that the data was collected in the US, then Getty would likely lose.
In a similar scenario, but wherein Getty stated that the data was collected legally from Israel for example, a US judge might still frown upon the evidence.
Just my thoughts...
S.G.
-
I can't begin to guess at the legalities, but my hosting service is in the US. So, any intrusion is into a machine physically located on US soil.
As for the difficulty of proving something, I guess I was thinking more along the lines of hand-offs. It could be that picscout finds things in Google's image base. Then, afterwards, someone sends an email to someone somewhere else: that is "person B". That person might not be excluded in robots.txt; anyway, they aren't a robot, so you don't expect them to read or obey robots.txt. If asked, the lawyer says the evidence was obtained when "person B" visited Google, clicked a link, and then got a screenshot. Maybe they saved the HTML for the page, and so on. It would be true enough. Even if something illegal was done, and even if it would matter, getting back to how "person B" knew to do anything, tracing it to anything illegal, and demonstrating it in court might be pretty hard.
That's not to say you shouldn't look into it. After all, I could be wrong and it might be that 100% of the evidence came from a picscout bot prowling around disobeying robots.txt.
-
While I agree this scenario is certainly plausible, I doubt Getty or any of the other trolls would go that far. I could be wrong on this, but I think it is more of an automated system, and they would not go through the trouble of actually sending someone somewhere or sending emails to "alert" someone that they have a possible hit. I have since made it standard procedure to not have my images indexed, nor my pages cached, by any of the major search engines. If nothing else it makes it a little harder on the trolls.
I've been seeing this little gem and some variations popping up more and more on sites as people get wind of the troll activity. I don't know if it would help in a court of law, but I don't think it could hurt:
"With the exception of the main search engine bots, the use of data-mining/extraction software or bots by any company that is not collecting data for a search engine is strictly forbidden. In particular the use of Picscout will be treated as 'hacking' and as such will be prosecuted."
The links below are a bit more in depth. Again, the problem would be proving picscout was sucking your bandwidth; the RIAA fiasco has made it difficult to use IPs as any sort of evidence, not to mention the issue with them being in Israel as well.
http://dcdirectactionnews.wordpress.com/legal-notice-to-getty-images-scanning-robot-picscout-is-not-authorized-to-access-this-site/
http://newsdata.info/terms/terms.htm
-
The links below are a bit more in depth. Again, the problem would be proving picscout was sucking your bandwidth; the RIAA fiasco has made it difficult to use IPs as any sort of evidence, not to mention the issue with them being in Israel as well.
Sorry, but I'm not up to speed. I did a little search on the RIAA, but I'm not sure which part you consider the fiasco, and I don't know what makes it difficult to use IPs as any sort of evidence. Could you elaborate? Thanks!
-
Basically, what it boils down to is that the movie and music industries have been on a mission, filing suits over millions of illegally downloaded movies, music, and more specifically porn movies, via torrent sites. They would base the accusations on IP addresses, and two problems arise here: 1) most users have dynamic IPs, which change frequently, and 2) an unsecured network allows anyone within reach of that network to download from that IP.
One case that comes to mind is that of the 85-year-old lady who supposedly downloaded a slew of full-length porn movies. Who's to say her network was secure and that I didn't sit at the curb in front of her house and download the movies? Another item to mention is the use of proxies; with a little know-how it is very easy to hide behind multiple layers.
My guess is Picscout is probably using static IPs, but sending out their dirty little bot behind a proxy, making it nearly impossible to trace back to them.
-
Regarding circumvention and robots.txt and archive.org this case addresses a lot of those issues:
http://www.paed.uscourts.gov/documents/opinions/07D0852P.pdf
I first heard about it here:
http://lawmeme.research.yale.edu/modules.php?name=News&file=print&sid=1543
-
Lettered--
I think that case is interesting, but I don't think it's what buddhapi is talking about. In that case, as far as I can see, Harding just didn't do anything to violate robots.txt or access Healthcare Advocates' server illegally.
What's discussed further upstream are things like this:
1) Server X includes a "disallow imagecrawler" in its robots.txt, but the image crawler crawls anyway. That crawling would be violating robots.txt. (Lots of bots violate this, because robots.txt is like a verbal 'imagecrawler, please go away'. This violation didn't happen in Healthcare Advocates v. Harding.)
2) Server X excludes the 'imagecrawler' useragent in .htaccess. This is a little harder for agents to get around, because .htaccess is more like a bouncer that picks up the agent and kicks it out. But a browser or bot can 'fake' its useragent; that is, it can present a type of fake ID. So maybe imagecrawler presents a fake ID, and the bouncer doesn't recognize it and lets it in. This didn't happen in Healthcare Advocates v. Harding.
3) Server X excludes everything from the server or ISP where 'imagecrawler' operates. (Example: if you wanted to keep out everyone who surfs using Comcast, you could exclude comcast.com.) This is a bit like the bouncer too; it just looks at a different thing. The image crawler simply goes and finds another ISP. Maybe they go to AT&T. Now they aren't excluded.
None of these three things happened in Healthcare Advocates v. Harding. But some suspect picscout might do them. (I don't know if there is any evidence picscout does do them.)
It seems to me Healthcare Advocates v. Harding can't tell us anything about the legality of 1-3, because none of those things happened there.
-
I found Lettered's posting to be quite interesting.
I'm going to read every single word of those documents when I have a little time.
It's the closest thing that I've seen so far to a defendant testing a defense based on Internet archives and robot spiders.
Just my opinions, and I wanted to take the opportunity to thank Lettered for the post.
S.G.
-
Starting on page 25 of the link I posted above, the court seems to me to say that:
1) ignoring robots.txt to gain access to an otherwise public web page does not violate the circumvention clauses of the DMCA
2) circumventing the wayback machine's protocol to gain access to user blocked history could constitute a violation of the circumvention clauses of the DMCA
That's the way I read it, anyway.
-
Lettered,
Other than some pedantic nitpicking, I don't disagree with your interpretation of what the court might be saying about robots.txt.
But the reason I was saying that I don't think this is what buddhapi started out discussing is that in his introductory comment, he bolded this from the law:
It is also a crime under US law to use any trick or false information to gain access to a computer system. Running a robot that pretends to be a user by faking its useragent is crime under US Law because it is using false information to gain access to a computer system."
Notice the bit he quotes says nothing about robots.txt. It says something about faking a user agent.
What I'm going to say next has nothing to do with legalities. It has to do with nuts and bolts of running a web site:
Nothing needs to fake a user agent to get around robots.txt. This is because robots.txt is not a block. (In fact, the reason the court seems to recognize disobeying robots.txt isn't necessarily violating DMCA is that robots.txt is not really a block.)
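To make the "not a block" point concrete, here's a minimal sketch (Python 3, with a placeholder site and bot name) of how a polite crawler treats robots.txt. Honoring the file is entirely the crawler's own choice; nothing on the server forces the check to happen.
# A polite crawler voluntarily consults robots.txt before fetching pages.
# "example.com" and "SomeImageBot" are placeholders, not real names.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://example.com/robots.txt")
rp.read()  # downloads and parses robots.txt; a purely voluntary step

if rp.can_fetch("SomeImageBot", "http://example.com/images/photo.jpg"):
    print("robots.txt permits this fetch")
else:
    print("robots.txt asks this bot to stay out, but cannot enforce it")
A rude bot simply skips the check and requests the page anyway, which is exactly why a real block has to live somewhere like .htaccess.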
Faking user agents is a way to get around a real, honest-to-goodness block like the kind in .htaccess on Apache. Also, in discussions above and on another thread, people have been talking about picscout faking useragents.
So while I think a case discussing robots.txt especially as it involves the Wayback machine is interesting, I think maybe people are getting distracted by an interesting discussion of robots.txt and forgetting about the issue of faking useragents.
-
Lucia is right on the money. robots.txt means nothing in the way of any laws that I'm aware of; it's just a set of directions/instructions. It is well known that picscout fakes/masks its user-agent along with ignoring robots.txt.
Faking/masking the user-agent makes it nearly impossible to block the bot via .htaccess, hence I have taken other measures, and as of now I have seen no sign of picscout in my logs.
I should also add that the items referenced by Lettered now reside on my list of reading items... just as soon as I finish dissecting the DMCA.
-
Lucia,
Understood. My post was mainly in response to the thread in general, where the question arose of whether robots.txt constitutes a copyright protection measure. That said, I think you could still find some clues regarding the question in the original post on this thread. The case seems to place importance on the fact that:
"Even if it the Harding firm knew that Healthcare Advocates did not give them permission to see its archived screenshots, lack of permission is not circumvention under the DMCA".
With the "lack of permission" issue off the table, by faking the user agent aren't they are basically just requesting the information without identifying themselves and receiving it? I can't see how that could be construed as circumvention under the DMCA.
I hope I am wrong, by the way. I'm not saying picscout isn't breaking any laws ... i just don't think they are violating the DMCA circumvention laws.
-
With the "lack of permission" issue off the table, by faking the user agent aren't they are basically just requesting the information without identifying themselves and receiving it? I can't see how that could be construed as circumvention under the DMCA.
I hope I am wrong, by the way. I'm not saying picscout isn't breaking any laws ... i just don't think they are violating the DMCA circumvention laws.
Not quite. The answer will be long and I'm going to follow it with further stuff.
First, I'm not a lawyer. My training is in mechanical engineering, but I self-host and organize my own web site, so I can describe a little of what I mean about user agents. My illustration will use blocking with .htaccess as an example of a method to block user agents. People who know more about .htaccess should feel free to correct my misuse of terms, etc. (I'm sure there will be some.)
This is going to be long because I assume lots of people don't know what certain things are. So, what are the different things that get recorded when something hits a page? Here's a slightly edited example of what I would see if something I blocked hit the address "mydomain.com/blog/name_of_page":
180.175.7.236 - - [12/Dec/2011:01:17:22 -0800] "GET /blog/name_of_page/ HTTP/1.1" 403 521 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.803.0 Safari/535.1"
The useragent string is the quoted part on the far right. I can tell I successfully blocked it because the "403" appears after "HTTP/1.1". In contrast, a "200" appearing where the 403 appears would mean my server sent them the page. I'll explain how I blocked this later and relate that to the user agent.
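(As an aside, if you'd rather pull those fields out of a log line with a script than by eye, here's a minimal Python sketch. It assumes the standard Apache "combined" log format; real logs vary, so treat it as illustrative only.)
# Extract the IP, status code, and useragent string from one combined-format log line.
import re

line = ('180.175.7.236 - - [12/Dec/2011:01:17:22 -0800] '
        '"GET /blog/name_of_page/ HTTP/1.1" 403 521 "-" '
        '"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) '
        'Chrome/14.0.803.0 Safari/535.1"')

m = re.match(r'(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"', line)
if m:
    ip, status, useragent = m.groups()
    print(ip, status)   # 180.175.7.236 403  (403 means the request was refused)
    print(useragent)    # whatever the visitor claimed as its identity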
But for now: what is a useragent? I found a long, good explanation here: http://whatsmyuseragent.com/WhatsAUserAgent.asp My short, approximate explanation is this:
When you surf the web, you will be using some sort of utility, typically a browser. I often use Firefox 8.0.1 on the Mac. Firefox 8.0.1 is a useragent. This user agent will identify itself to the web site you visit by leaving a "useragent string". The string I leave is:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
This string tells them what utility I used to download the page. Because lots of people use Firefox 8.0.1 on the Mac, the useragent string alone doesn't tell them who I am.
In contrast, when Google's crawler visits, it doesn't use Firefox 8.0.1; it uses a different useragent. In fact, it has more than one possible agent: one looks at pages, one looks at images. The different crawlers tell me who they are. One says:
Googlebot/2.1 ( http://www.googlebot.com/bot.html), and another says:
Googlebot-Image/1.0 ( http://www.googlebot.com/bot.html)
Needless to say, even the braindead can figure out these are representing themselves as Google and can guess they are "bots". But you can also look these up at Google's site. Note that they leave a web site where you can learn more! These are nicely behaved bots.
Meanwhile, a pesky Chinese spider sometimes uses useragent strings like this:
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
To block anything with this useragent, my .htaccess file contains a bit of code that looks like this:
Options +FollowSymlinks
RewriteEngine on
# agents
RewriteCond %{HTTP_USER_AGENT} Baidu [nc,or]
RewriteCond %{HTTP_USER_AGENT} ^$ [or]
RewriteCond %{HTTP_USER_AGENT} Ezooms [nc,or]
RewriteCond %{HTTP_USER_AGENT} picscout [nc,or]
RewriteCond %{HTTP_USER_AGENT} java [nc,or]
# methods
RewriteCond %{REQUEST_METHOD} ^PROPFIND$ [NC,OR]
RewriteCond %{REQUEST_METHOD} ^OPTIONS$ [NC,OR]
# referrers
RewriteCond %{HTTP_REFERER} (getty|picscout) [NC]
RewriteRule .* - [F]
I've edited my block down so that I don't fill the comment-- but I've left a few key things in there, and you can ask about why they are there if you like.
For now, the "RewriteCond %{HTTP_USER_AGENT} Baidu [nc,or]" blocks anything that contains "Baidu" in the user agent. So, if something visits and shows my server "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" it is blocked. Period. When I look at my server logs, if the useragent says
"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
The access code will say "403".
Now, in principle, when the Baidu program visits, it will either:
a) tell the truth and leave an agent that contains "Baidu" in it,
b) leave no user agent, in which case I will see a "-" where the useragent string belongs, or
c) leave a false user agent, which is presenting false information, i.e., lying.
So (b) is "requesting the information without identifying themselves and receiving it", but (c) is lying.
Now, if you look at the code I use to block user agents, you'll also see:
"RewriteCond %{HTTP_USER_AGENT} ^$ [or]"
This command blocks anything that refuses to provide a useragent. So that eliminates the possibility that they would gain access by merely not identifying themselves. Because I refuse to supply pages both to visitors with Baidu in their UA string and to visitors with no UA string, if someone wants to use the Baidu bot to crawl my site, they must lie. They can't just decline to tell me what useragent they are using.
The text buddhapi left suggests lying about the useragent to gain access that is otherwise refused violates the DMCA. If that is true, then anything surfing using the Baidu bot and avoiding being blocked would be violating the DMCA. (But this is a legal issue, and you lawyers can decide what the DMCA says. I can only tell you what the block does.)
With regard to picscout, I've tried to block them by blocking connections with "picscout" in the UA string. But I'm not sure that appears in their UA string. Picscout's page doesn't seem to reveal what their UA string is, which makes things a bit difficult technically, and likely legally. (Legal people could maybe look into whether we can send a letter to groups whose bot refuses to tell us what its UA string is.)
Next post will discuss a related topic, but I think I've now discussed the answer to the question you actually asked.
-
OK, but now you might wonder, as a technical nuts-and-bolts matter: can anything lie about the useragent?
Easily! I can very easily program Firefox to leave the useragent "Googlebot/2.1 ( http://www.googlebot.com/bot.html)", or I could program it to tell the server I am visiting using "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)". Doing so is called "spoofing".
Spoofing can be legitimate. For example, I have spoofed the user agent and hit my own site to see whether I have correctly programmed .htaccess to block the baidubot. After testing, I change my user agent back to the default. Should I ever mistakenly crawl as the baidubot, I'll probably find myself blocked all over the place!
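If anyone wants to run the same kind of self-test without fiddling with their browser, here's a minimal Python 3 sketch: it requests a page while claiming different useragents and prints the status code each claim gets back. "www.example.com" is a placeholder; only point this at a site you own.
# Hit your own site with different useragent strings and see which ones get a 403.
import urllib.request
import urllib.error

TESTS = {
    "normal browser": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:8.0.1) "
                      "Gecko/20100101 Firefox/8.0.1",
    "baidu spider":   "Mozilla/5.0 (compatible; Baiduspider/2.0; "
                      "+http://www.baidu.com/search/spider.html)",
}

for label, ua in TESTS.items():
    req = urllib.request.Request("http://www.example.com/", headers={"User-Agent": ua})
    try:
        status = urllib.request.urlopen(req).getcode()  # 200 means the page was served
    except urllib.error.HTTPError as err:
        status = err.code                               # 403 means the block worked
    print(label, status)
With rules like the ones above in place, the "baidu spider" request should come back 403 and the "normal browser" request should come back 200.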
Next question: do things spoof? Oh yes! I could show examples of obvious spoofing, but I'm going to show one of suspected spoofing instead. Let me return to the example of something I saw in my server logs. This time, pay attention to the IP address at the start of the line:
180.175.7.236 - - [12/Dec/2011:01:17:22 -0800] "GET /blog/name_of_page/ HTTP/1.1" 403 521 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.803.0 Safari/535.1"
The IP address is related to the machine through which the connection occurred. It tells me more about "who" might have connected than the useragent. In fact, I can look up 180.175.7.236 here:
http://whois.domaintools.com/180.175.7.236
What this tells me is that whatever that is connects through "China Chinanet Shanghai Province Network". In fact, everything with an IP starting with 180. comes through "China Chinanet Shanghai Province Network".
If you do a little checking, you will find that rumor has it that IPs near that value are the Baiduspider! My guess is this entry corresponds to an attempt by the baidubot to connect while spoofing the user agent. (Mind you, this is not necessarily so, but I suspect it.) Remember, if it leaves "Baidu" in the user agent, it is blocked. But it's possible for the person who programmed the bot to tell it to initially tell the truth, then change the user agent if it gets back a "403" or "forbidden" message. (You can see this happen in server logs.) Now, I could maybe call a lawyer and try to assemble a case about Baidu, but that would likely be expensive. And anyway, I might have trouble proving it was Baidu lying. After all, for all I know, "China Chinanet Shanghai Province Network" is the Chinese equivalent of Comcast and I'm blocking a real visitor. I doubt it, but I might be.
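By the way, if you want to do that kind of lookup yourself without a whois site, here's a minimal Python sketch using reverse DNS. It isn't the same as whois (plenty of IPs have no reverse record, or a misleading one), but it's the same sort of lookup Apache does when you "deny from" a hostname rather than an IP, as in the block below. The second IP is just an example, not a confirmed picscout address.
# Reverse-DNS lookup: see what hostname, if any, an IP address resolves back to.
import socket

for ip in ("180.175.7.236", "82.80.249.1"):
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        hostname = "(no reverse DNS record)"
    print(ip, "->", hostname)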
Since I don't have any particular need for traffic from China, I decided to deal with the huge number of spammy hits from this IP range by blocking by IP. It turns out my .htaccess also contains
order allow,deny
# baidu spider various ranges
deny from 119.
deny from 123.125.71
deny from 124.114
deny from 124.115
deny from 180
deny from 220.181
deny from 183
# china China Fujian Chinanet Fujian Province Network
deny from 120.37.209.57
# copyscape
deny from 212.100.254.105
# block picscout
deny from bezeqint.net
deny from 82.80.249
deny from 82.80.252
deny from 62.0.8.
deny from gettyimages.com
deny from gettywan.com
deny from picscout.com
deny from istockphoto.com
allow from all
Once again, this is edited to keep from filling the entire screen. The 'deny from 180' line means I deny all IPs starting with 180, which means no one can visit my blog if they connect through "China Chinanet Shanghai Province Network".
I'm sure you've also noticed '# block picscout', right? All the commands between that line and "allow from all" block everything I've found that is either known or rumored to be associated with picscout or getty surfing. I have other blocks in place too.
As I've said on another thread, I'm trying to put together a php script that will auto-install and implement a lot of these blocks for people. I thought I'd be done more quickly, but as I checked things out, I needed to make sure it really, really does what I want it to do, and fairly easily. Given that, I might charge a small amount for it ($10 or so). But it would put in blocks for things known or rumored to be getty/picscout, etc., and do a few more things to protect people with a range of sites to some extent. (Nothing will give perfect protection, but you can make your site much less vulnerable by making it harder for picscout to crawl!)
-
I have largely stayed away from this conversation because I didn't get it. I couldn't see how violating robots.txt broke any laws. Lucia, thank you so much for clarifying that spoofing the user agent may violate some laws, and for providing a clear, succinct breakdown of what a visiting spider looks like and what it does.
I'm still not fond of developers having to "hide" from picscout. But I understand how some people would want to slam the door in the face of this intrusion.
-
mcfilms--
No one likes the fact that doors sometimes have to be locked to keep people out. But by the same token, it's sometimes wiser to lock a door rather than follow the practice of leaving it unlocked and counting on the law providing a remedy after someone comes in and takes something.
Oddly, based on the snippet of law buddhapi posted, it reads as if locking the door may be required to make the intrusion a violation of DMCA.
If we are going to use metaphors, I'd also say my example may be "locking the doors and barring the windows". The useragent blocks would be locking the door. The IP blocks might be "barring the windows". Of course, the next step is figuring out how to install the security cameras to record who tried to get in the window. :)
-
Meanwhile on the web... It looks like the ELI site isn't the only group fed up with the antics of PicScout:
http://www.webhostingtalk.com/showthread.php?t=1105828&highlight=getty
It seems the rapid-fire manner in which PicScout is spidering some sites is tantamount to a DoS attack. One person even had his server go down.
I noticed that group is considering contacting members of the media.
-
Arghhhh... another item on the to-do list: register for an account. I had one there years ago, so I might as well re-up. It would seem the folks there have not realized that Getty now owns picscout.
-
Nice read Mcfilms!
It is unfortunate that we use a webhosting company to run our website and therefore have no access to the servers. We can only add robots.txt to our files. Last week our site crashed for over an hour, and it's not the first time it's happened within the last six months. I went back and checked my traffic logs for the last year and a half. On two occasions, with a six-month gap between them, my page views skyrocketed way beyond normal.
Can someone say Picscout?
How do you access robots.txt if you don't have any access to the servers? Also, can't you ask your webhosting company to block? Ask if you are on an Apache machine. If you are, remember the block I posted on another thread? I'd edited some stuff out, but the "deny from bezeqint.net" line below would have blocked the IP that group is complaining about.
order allow,deny
deny from 46.165.197.142
deny from 114.41.24.17
deny from 200.251.58.190
deny from 190.202.87.134
deny from 216.245.211.245
deny from 222.118.167.142
deny from 85.195.138.26
deny from 219.90.114.26
deny from 188.40.102.81
deny from 118.175.28.80
# baidu spider various ranges
deny from 119.
deny from 123.125.71
deny from 124.114
deny from 124.115
deny from 180
deny from 220.181
deny from 183
# china China Fujian Chinanet Fujian Province Network
deny from 120.37.209.57
# romanian spammer
deny from 94.60.1
# copyscape
deny from 212.100.254.105
# block picscout
deny from bezeqint.net
deny from 82.80.249
deny from 82.80.252
deny from 62.0.8.
deny from gettyimages.com
deny from gettywan.com
deny from picscout.com
deny from istockphoto.com
# this is spoofing referrers. looked like ia archiver then stuck around with different user agent I think it ok though
#deny from compute-1.amazonaws.com
# block garlik crawler No info on how to do so in robots
deny from 178.17.32.78
# december bot
deny from 208.43.135.148
# spoofs bingbot dec
deny from sus.nukes.procesosirc.org
# dec load bunch of images
deny from cpc2-live18-0-0-cust836.know.cable.virginmedia.com
deny from c-76-101-177-151.hsd1.fl.comcast.net
# cracker bot dec
deny from 212.77.176.179
# hunters try to find wpcontent where it does not exist
deny from advancednet.pl
deny from s5.miwiredhosting.com
deny from ns1.goodafternoon.ro
# liperhey spider until i learn robotstxt block
deny from 94.75.233.28
#deny from www.lipperhey.com
# poor stuck guy
deny from unknown.blyon.com
# getting in with two slashes
deny from 74.86.120.107
deny from 200.54.72.14
allow from all
So, even if you are going through a webhosting company, ask:
"Am I on an Apache machine?" If yes, then ask: "Can we request blocks in .htaccess?" I can't imagine many reasons why the answer would be no.
I'm on Dreamhost. I fiddle with my .htaccess block all the time.
-
I think that those toads are still operating out of Israel.
Why not just block all Israeli IP ranges?
S.G.
-
I think that those toads are still operating out of Israel.
Why not just block all Israeli IP ranges?
S.G.
I don't think they will ever move the operation to the States; that would really open a can of legal issues on them. That's exactly what I did: blocked all of Israel, both at the firewall and via .htaccess.
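For anyone curious what a block like that amounts to under the hood, here's a minimal Python sketch: it just checks whether a visitor's IP falls inside any blocked network. The ranges shown are the picscout-related ones from the .htaccess lists earlier in the thread, written as /24s; a true "all of Israel" block would need a full country-to-CIDR list, like the ones from the service linked a few posts down.
# Check whether an IP falls inside any of a set of blocked CIDR ranges.
import ipaddress

BLOCKED = [ipaddress.ip_network(n) for n in
           ("82.80.249.0/24", "82.80.252.0/24", "62.0.8.0/24")]

def is_blocked(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)

print(is_blocked("82.80.249.17"))  # True: falls inside a blocked range
print(is_blocked("8.8.8.8"))       # False: allowed through
A firewall or server-side script effectively runs this check on every connection; the .htaccess "deny from" lines are a declarative way of expressing the same ranges.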
-
I think that those toads are still operating out of Israel.
Why not just block all Israeli IP ranges?
S.G.
I have informative, insightful blog visitors who happen to be in Israel. I don't want to block them; I do want to block picscout. It seems to me that blocking the host picscout surfs from is sufficient.
If I were running a storefront that only sold merchandise to people in the US, and would never sell to someone in Israel, I might make a different decision.
-
buddhapi--
I'm not a real computer person, so what's involved in blocking at "the firewall"? My blog runs on a VPS account at Dreamhost. If I have access to blocking at a firewall, I'd do it for some of these things. (Baidu, actually... what a pest!)
-
buddhapi--
I'm not a real computer person, so what's involved in blocking at "the firewall"? My blog runs on a VPS account at Dreamhost. If I have access to blocking at a firewall, I'd do it for some of these things. (Baidu, actually... what a pest!)
I'm assuming that your account with Dreamhost is on a shared server with VPS added? If so, you won't have access to the firewall; you would need a dedicated server. Dreamhost might be willing to do it for you if you can show good enough cause, and also show that it would not affect other accounts on your box in a negative way.
Personally, I use a plugin that is set up in cPanel's WHM (WebHost Manager):
http://www.countryipblocks.net/malicious-internet-traffic/block-cidr-ranges-and-multiple-ips-using-webhost-manager-whm/
-
buddhapi--
Yes, I think that's the way VPS hosting works. It's a form of shared hosting; they aren't dedicating a physical machine to my account.
I think a firewall that kept the baidubot off everyone's account would not harm the other users in any way. But who knows? Maybe the other users like the baidubot. I suspect Dreamhost wouldn't be interested in letting me decide to block the baidubot on some third party's account.
I thought the answer would be that the firewall was at the server level. So, for all practical purposes, I would need to be using a dedicated server. I don't need that level of resources, so it's .htaccess for me!
-
Well, you thought correctly! It would be at the server level...