Click Official ELI Links
Get Help With Your Extortion Letter | ELI Phone Support | ELI Legal Representation Program
Show your support of the ELI website & ELI Forums through a PayPal Contribution. Thank you for supporting the ongoing fight and reporting of Extortion Settlement Demand Letters.

Author Topic: Picscout / DMCA question  (Read 18974 times)

Robert Krausankas (BuddhaPi)

  • ELI Defense Team Member
  • Administrator
  • Hero Member
  • *****
  • Posts: 3354
    • View Profile
    • ExtortionLetterInfo
Re: Picscout / DMCA question
« Reply #15 on: December 12, 2011, 07:31:42 PM »
Lucia is right on the money. As far as any law I'm aware of goes, robots.txt means nothing; it's just a set of directions/instructions. It is well known that picscout fakes/masks its user-agent along with ignoring robots.txt.

Because picscout fakes/masks its user-agent, it is nearly impossible to block the bot by user-agent via .htaccess. Hence I have taken other measures, and as of now I have seen no sign of picscout in my logs.

I should also add that the items referenced by Lettered now reside on my reading list...just as soon as I finish dissecting the DMCA.
« Last Edit: December 12, 2011, 07:58:08 PM by buddhapi »
Most questions have already been addressed in the forums, get yourself educated before making decisions.

Any advice is strictly that, and anything I may state is based on my opinions, and observations.
Robert Krausankas

I have a few friends around here..

Lettered

  • Sr. Member
  • ****
  • Posts: 256
    • View Profile
Re: Picscout / DMCA question
« Reply #16 on: December 13, 2011, 01:23:00 PM »
Lucia,

Understood. My post was mainly in response to the thread in general, where the question arose of whether robots.txt constitutes a copyright protection. That said, I think you could still find some clues regarding the question in the original post on this thread. The case seems to place importance on the fact that:

"Even if the Harding firm knew that Healthcare Advocates did not give them permission to see its archived screenshots, lack of permission is not circumvention under the DMCA".

With the "lack of permission" issue off the table, by faking the user agent aren't they basically just requesting the information without identifying themselves and receiving it? I can't see how that could be construed as circumvention under the DMCA.

I hope I am wrong, by the way. I'm not saying picscout isn't breaking any laws ... I just don't think they are violating the DMCA circumvention laws.



Lettered,
Other than some pedantic nitpicking, I don't disagree with your interpretation of what the court might be saying about robots.txt.

But the reason I was saying that I don't think this is what buddhapi started out discussing is that in his introductory comment, he bolded this from the law:

Quote
It is also a crime under US law to use any trick or false information to gain access to a computer system. Running a robot that pretends to be a user by faking its useragent is crime under US Law because it is using false information to gain access to a computer system."

Notice the bit he quotes says nothing about robots.txt. It says something about faking a user agent.

What I'm going to say next has nothing to do with legalities. It has to do with nuts and bolts of running a web site:

Nothing needs to fake a user agent to get around robots.txt.  This is because robots.txt is not a block. (In fact, the reason the court seems to recognize disobeying robots.txt isn't necessarily violating DMCA is that robots.txt is not really a block.)

Faking user agents is a way to get around a real, honest to goodness block like the kind in .htaccess on Apache. Also: in the discussions above and in other threads, people have been talking about picscout faking useragents.

So while I think a case discussing robots.txt especially as it involves the Wayback machine is interesting, I think maybe people are getting distracted by an interesting discussion of robots.txt and forgetting about the issue of faking useragents.  






lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: Picscout / DMCA question
« Reply #17 on: December 13, 2011, 03:08:23 PM »
Quote from: Lettered
With the "lack of permission" issue off the table, by faking the user agent aren't they basically just requesting the information without identifying themselves and receiving it? I can't see how that could be construed as circumvention under the DMCA.

I hope I am wrong, by the way. I'm not saying picscout isn't breaking any laws ... I just don't think they are violating the DMCA circumvention laws.

Not quite. The answer will be long and I'm going to follow it with further stuff.

First, I'm not a lawyer. My training is mechanical engineering, but I self host and organize my own web site.  So, I can describe a little what I mean about user agents. My illustration will use blocking with .htaccess as an example of a method to block user agents. People who know more about .htaccess should feel free to correct my mis-usage of terms etc. (I'm sure to do so.)

This is going to be long because I assume lots of people don't know what certain things are. So what are the different things that get recorded when something hits a page? Here's a slightly edited example of something that I would see if something I blocked hit the address "mydomain.com/blog/name_of_page".

180.175.7.236 - - [12/Dec/2011:01:17:22 -0800] "GET /blog/name_of_page/ HTTP/1.1" 403 521 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.803.0 Safari/535.1"

The useragent string is the last quoted item on the far right (it was shown in bold in the original post). I can tell I successfully blocked it because "403" appears after "HTTP/1.1". In contrast, "200" appearing where 403 appears would mean my server sent them the page. I'll explain how I blocked this later and relate that to the user agent.
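
A minimal sketch of how the pieces of such a log line can be pulled apart, assuming the log uses Apache's "combined" layout as in the line above. The pattern and the function name `parse_log_line` are illustrative only, and the pattern does not handle escaped quotes inside a field.

```python
import re

# Pattern for the "combined" layout: IP, two identity fields, timestamp,
# request line, status code, size, referrer, useragent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = ('180.175.7.236 - - [12/Dec/2011:01:17:22 -0800] '
          '"GET /blog/name_of_page/ HTTP/1.1" 403 521 "-" '
          '"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) '
          'Chrome/14.0.803.0 Safari/535.1"')

fields = parse_log_line(sample)
print(fields["status"])      # the response code; 403 means the request was refused
print(fields["user_agent"])  # the useragent string, the last quoted field
```

The "-" in the referrer position simply means the visitor sent no referrer.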


But for now: What is a useragent? A long, good explanation is here: http://whatsmyuseragent.com/WhatsAUserAgent.asp  My short approximate explanation is this:

When you surf the web, you will be using some sort of utility. This is typically a browser. I often use Firefox 8.0.1 on the mac. Firefox 8.0.1 is a useragent.  This user agent will identify itself to the web site you visit by leaving a "useragent string". The string I leave is

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:8.0.1) Gecko/20100101 Firefox/8.0.1

This string tells them what utility I used to download the page. Because lots of people use Firefox 8.0.1 on the Mac, the useragent string alone doesn't tell them who I am.

In contrast, when Google's crawler visits, it doesn't use Firefox 8.0.1. It uses a different useragent. In fact it has more than one possible agent: one looks at pages, one looks at images. The different crawlers tell me who they are. One says

Googlebot/2.1 (+http://www.googlebot.com/bot.html)

another says

Googlebot-Image/1.0 (+http://www.googlebot.com/bot.html)

Needless to say, even the braindead can figure out that these are representing themselves as Google, and guess that they are "bots". But you can also look these up at Google's site. Note: they leave a web address where you can learn more! These are nicely behaved bots.

Meanwhile a pesky Chinese spider sometimes uses useragent strings like this:
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

To block anything with this useragent, my .htaccess file contains a bit of code that looks like this:

Options +FollowSymlinks
RewriteEngine on
# agents: refuse if the useragent contains any of these (nc = ignore case)
RewriteCond %{HTTP_USER_AGENT} Baidu [nc,or]
# ^$ matches an empty useragent string
RewriteCond %{HTTP_USER_AGENT} ^$ [or]
RewriteCond %{HTTP_USER_AGENT} Ezooms [nc,or]
RewriteCond %{HTTP_USER_AGENT} picscout [nc,or]
RewriteCond %{HTTP_USER_AGENT} java [nc,or]
# methods
RewriteCond %{REQUEST_METHOD} ^PROPFIND$ [NC,OR]
RewriteCond %{REQUEST_METHOD} ^OPTIONS$ [NC,OR]
# referrers (last condition in the chain, so no "or" flag)
RewriteCond %{HTTP_REFERER} (getty|picscout) [NC]
# if any condition above matched, answer 403 Forbidden
RewriteRule .* - [F]

I've edited my block down so that I don't fill the comment-- but I've left a few key things in there, and you can ask about why they are there if you like.

For now, the  "RewriteCond %{HTTP_USER_AGENT} Baidu [nc,or]" blocks anything that contains "Baidu" in the user agent. So, if something visits and shows my server "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" it is blocked.  Period.  When I look at my server logs, if the useragent says

"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
the status code will say "403".

Now, in principle, if anything visits using the Baidu crawler (a program), it will either
a) tell the truth and leave an agent that contains "Baidu" in it,
b) leave no user agent, in which case I will see a "-" where the useragent string belongs, or
c) leave a false user agent, which is presenting false information, or lying.

So (b) is "requesting the information without identifying themselves and receiving it", but (c) is lying.

Now, if you look at the code I use to block user agents, you'll also see:

 "RewriteCond %{HTTP_USER_AGENT} ^$ [or]"

This condition will block anything that refuses to provide a useragent. So that eliminates the possibility that they would gain access by merely not identifying themselves. Because I refuse to supply pages both to visitors with Baidu in their UA string and to visitors with no UA string, if someone wants to use the Baidu bot to crawl my site, they must lie. They can't just not tell me what useragent they used.
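
The decision that the rewrite conditions make can be sketched in Python. This is only an illustration of the logic, assuming the conditions behave as described in this post; the names `is_blocked`, `BLOCKED_AGENT_SUBSTRINGS`, `BLOCKED_METHODS` and `BLOCKED_REFERER` are made up here and are not part of Apache.

```python
import re

# Substrings refused anywhere in the useragent, compared case-insensitively
# (the [nc] flag). The list mirrors the excerpt quoted in this post.
BLOCKED_AGENT_SUBSTRINGS = ("baidu", "ezooms", "picscout", "java")
BLOCKED_METHODS = ("PROPFIND", "OPTIONS")
BLOCKED_REFERER = re.compile(r"(getty|picscout)", re.IGNORECASE)

def is_blocked(user_agent, method="GET", referer=""):
    """Return True if the request would be answered with 403 under these rules."""
    if user_agent == "":                       # the ^$ condition: no useragent at all
        return True
    lowered = user_agent.lower()
    if any(text in lowered for text in BLOCKED_AGENT_SUBSTRINGS):
        return True
    if method.upper() in BLOCKED_METHODS:      # PROPFIND and OPTIONS are refused
        return True
    return bool(BLOCKED_REFERER.search(referer))

honest = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
spoofed = ("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) "
           "Chrome/14.0.803.0 Safari/535.1")

print(is_blocked(honest))   # case (a): tells the truth, refused
print(is_blocked(""))       # case (b): no useragent, refused
print(is_blocked(spoofed))  # case (c): a false useragent gets past these rules
```

Only case (c) gets through, which is the point of the paragraph above: with both the honest string and the empty string refused, the only way in is to present a false useragent.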

The text buddhapi left suggests that lying about the useragent to gain access that is otherwise refused violates the DMCA. If that is true, then anything surfing using the Baidu bot and avoiding being blocked would be violating the DMCA. (But this is a legal issue, and you lawyers can decide what the DMCA says. I can only tell you what the block does.)

With regard to picscout, I've tried to block them by blocking connections with "picscout" in the UA string. But I'm not sure that appears in their UA string. Picscout's page doesn't seem to reveal what their UA string is, which makes things a bit difficult technically, and likely legally. (Legal people could maybe look into whether we can send a letter to groups whose bot refuses to tell us what its UA string is.)

Next post will discuss a related topic, but I think I've now discussed the answer to the question you actually asked.
« Last Edit: December 13, 2011, 10:28:14 PM by lucia »

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: Picscout / DMCA question
« Reply #18 on: December 13, 2011, 03:15:00 PM »
OK, but now you might wonder, as a technical nuts-and-bolts matter: can anything lie about the useragent?

Easily! I can very easily program Firefox to leave the useragent "Googlebot/2.1 (+http://www.googlebot.com/bot.html)", or I could program it to tell the server I am visiting using "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)". Doing so would be called "spoofing".

Spoofing can be legitimate. For example: I have spoofed the user agent and hit my own site to see whether I have correctly programmed .htaccess to block the baidubot. After testing, I change my user agent back to the default. Should I ever mistakenly crawl as the baidubot, I'll probably find myself blocked all over the place!
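
The same kind of self-test can be done without a browser. A minimal sketch using Python's standard library, assuming you are testing a site you control: the address is the placeholder used earlier in this thread, and `build_test_request` is a made-up helper name. The code only builds the request; it does not send anything.

```python
from urllib.request import Request

BAIDU_UA = ("Mozilla/5.0 (compatible; Baiduspider/2.0; "
            "+http://www.baidu.com/search/spider.html)")

def build_test_request(url, user_agent):
    """Build (but do not send) a GET request that claims the given useragent."""
    return Request(url, headers={"User-Agent": user_agent})

# Placeholder address; replace it with a page on your own site before use.
request = build_test_request("http://mydomain.com/blog/name_of_page/", BAIDU_UA)
print(request.get_header("User-agent"))  # the useragent the server would be shown
```

Passing `request` to `urllib.request.urlopen` would actually send it. If the block is working, that call raises `urllib.error.HTTPError` with code 403 instead of returning the page.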

Next question: Do things spoof? Oh yes! I could show examples of obvious spoofing, but I'm going to show one of suspected spoofing instead. Let me return to the example of something I saw in my server logs. This time, look at the IP, the first item on the line (it was in bold in the original post):

180.175.7.236 - - [12/Dec/2011:01:17:22 -0800] "GET /blog/name_of_page/ HTTP/1.1" 403 521 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.803.0 Safari/535.1"

The IP address is related to the machine through which the connection occurred.   It tells me more about "who" might have connected than the useragent.   In fact, I can look up 180.175.7.236 here: 
http://whois.domaintools.com/180.175.7.236
What this tells me is that whatever this is connects through "China Chinanet Shanghai Province Network". In my logs, the IPs starting with 180. appear to come through that same network. (Strictly, the 180. block is large, and not all of it belongs to this one network.)

If you do a little checking you will find that rumor has it that IPs near that value are the Baiduspider! My guess is this entry corresponds to an attempt by the baidubot to connect while spoofing the user agent. (Mind you, this is not necessarily so. But I suspect it.) Remember, if it leaves "Baidu" in the user agent, it is blocked. But it's possible for the person who programmed the bot to tell it to initially tell the truth, but change user agent if it gets back a "403" or "forbidden" message. (You can see this happen in server logs.) Now, I could maybe call a lawyer and try to assemble a case about Baidu, but likely that would be expensive. And anyway, I might have trouble proving it was Baidu lying. After all, for all I know "China Chinanet Shanghai Province Network" is the Chinese equivalent of Comcast and I'm blocking a real visitor. I doubt it, but I might be.
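
The retry pattern described above (a 403 under one useragent, then the same IP returning with a different one) can be looked for mechanically. A minimal sketch, assuming the log has already been reduced to (ip, status, useragent) tuples in time order; the function name and the sample addresses (taken from ranges reserved for documentation) are made up for illustration.

```python
def find_agent_switchers(entries):
    """Return the IPs that were refused (403) under one useragent and
    later appeared with a different useragent.

    `entries` is an iterable of (ip, status, user_agent) tuples in log order.
    """
    refused_agents = {}   # ip -> set of useragents that received a 403
    suspects = set()
    for ip, status, user_agent in entries:
        seen = refused_agents.setdefault(ip, set())
        # A new useragent from an IP that was already refused is suspicious.
        if seen and user_agent not in seen:
            suspects.add(ip)
        if status == 403:
            seen.add(user_agent)
    return suspects

# Hypothetical entries, not taken from any real log.
sample_entries = [
    ("203.0.113.5", 403, "Mozilla/5.0 (compatible; Baiduspider/2.0; "
                         "+http://www.baidu.com/search/spider.html)"),
    ("203.0.113.5", 403, "Mozilla/5.0 (Windows NT 5.1) Chrome/14.0.803.0"),
    ("198.51.100.7", 200, "Mozilla/5.0 (Macintosh) Firefox/8.0.1"),
    ("198.51.100.7", 200, "Mozilla/5.0 (Macintosh) Firefox/8.0.1"),
]

print(find_agent_switchers(sample_entries))
```

A hit is a hint, not proof: several real visitors can share one IP, so the result only narrows down which log entries deserve a closer look.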

Since I don't have any particular need for traffic from China, I decided to deal with the huge number of spammy hits from this IP range by blocking by  IP.  It turns out my .htaccess also contains

order allow,deny
# baidu spider various ranges
deny from 119.
deny from 123.125.71
deny from 124.114
deny from 124.115
deny from 180
deny from 220.181
deny from 183
# china China Fujian Chinanet Fujian Province Network
deny from 120.37.209.57
# copyscape
deny from 212.100.254.105
# block picscout
deny from bezeqint.net
deny from 82.80.249
deny from 82.80.252
deny from 62.0.8.
deny from gettyimages.com
deny from gettywan.com
deny from picscout.com
deny from istockphoto.com
allow from all

Once again, this is edited to keep from filling the entire screen. The 'deny from 180' line (bold in the original post) means I deny all IPs starting with 180, which means no one can visit my blog if they connect through "China Chinanet Shanghai Province Network".
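
How a partial address such as "180" or "123.125.71" selects visitors can be sketched as follows. This is an illustration under the assumption that partial addresses are compared one whole number group at a time; the function names are made up, and the hostname entries in the list (which depend on reverse DNS lookups) are deliberately left out.

```python
def matches_partial_ip(client_ip, deny_prefix):
    """True if client_ip falls under a partial address like '180' or '123.125.71'.

    Comparison is by whole groups, so '180' covers 180.x.x.x but not 18.0.x.x.
    """
    prefix_groups = [part for part in deny_prefix.split(".") if part]
    client_groups = client_ip.split(".")
    return client_groups[:len(prefix_groups)] == prefix_groups

# The numeric Baidu-related entries from the excerpt above.
DENY_PREFIXES = ["119.", "123.125.71", "124.114", "124.115", "180", "220.181", "183"]

def is_denied(client_ip):
    """True if any partial address in DENY_PREFIXES covers client_ip."""
    return any(matches_partial_ip(client_ip, prefix) for prefix in DENY_PREFIXES)

print(is_denied("180.175.7.236"))  # the address from the log line above
print(is_denied("18.0.1.2"))       # shares leading digits only, not a whole group
```

The shorter the partial address, the more visitors it covers: "180" takes in every address whose first group is 180, while "123.125.71" covers only 256 addresses.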

I'm sure you've also noticed '# block picscout', right? All the commands between that line and "allow from all" block everything I've found that is either known or rumored to be associated with picscout or getty surfing. I have other blocks in place too.

As I've said on another thread, I'm trying to put together a PHP script that will auto-install and implement a lot of these blocks for people. I thought I'd be done more quickly, but as I checked things out, I needed to make sure it really, really does what I want it to do, and fairly easily. Given that, I might be charging a small amount for it. ($10 or so.) But it would put in blocks for things known or rumored to be getty/picscout, etc. And do a few more things to protect people with a range of sites to some extent. (Nothing will give perfect protection. But you can make your site much less vulnerable by making it harder for picscout to crawl!)

Jerry Witt (mcfilms)

  • Hero Member
  • *****
  • Posts: 682
    • View Profile
    • Motion City
Re: Picscout / DMCA question
« Reply #19 on: December 13, 2011, 07:06:01 PM »
I have largely stayed away from this conversation because I didn't get it. I couldn't see how violating robots.txt broke any laws. Lucia, thank you so much for clarifying that spoofing the user agent may violate some laws. Also for providing a clear, succinct breakdown of what a visiting spider looks like and what it does.

I'm still not fond of developers having to "hide" from picscout. But I understand how some people would want to slam the door in the face of this intrusion. 
Although I may be a super-genius, I am not a lawyer. So take my scribblings for what they are worth and get a real lawyer for real legal advice. But if you want media and design advice, please visit Motion City at http://motioncity.com.

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: Picscout / DMCA question
« Reply #20 on: December 13, 2011, 09:52:14 PM »
mcfilms--
No one likes the fact that doors sometimes have to be locked to keep people out. But by the same token, it's sometimes wiser to lock a door rather than follow the practice of leaving it unlocked and counting on the law providing a remedy after someone comes in and takes something. 

Oddly, based on the snippet of law buddhapi posted, it reads as if locking the door may be required to make the intrusion a violation of DMCA.   

If we are going to use metaphors, I'd also say my example may be "locking the doors and barring the windows".  The useragent blocks would be locking the door. The IP blocks might be "barring the windows".  Of course, the next step is figuring out how to install the security cameras to record who tried to get in the window. :)

Jerry Witt (mcfilms)

  • Hero Member
  • *****
  • Posts: 682
    • View Profile
    • Motion City
Re: Picscout / DMCA question
« Reply #21 on: December 14, 2011, 08:58:28 PM »
Meanwhile on the web... It looks like the ELI site isn't the only group fed up with the antics of PicScout:

http://www.webhostingtalk.com/showthread.php?t=1105828&highlight=getty

It seems the rapid-fire manner in which PicScout is spidering some sites is tantamount to a DoS attack. One person even had his server go down.

I noticed that group is considering contacting members of the media.

Robert Krausankas (BuddhaPi)

  • ELI Defense Team Member
  • Administrator
  • Hero Member
  • *****
  • Posts: 3354
    • View Profile
    • ExtortionLetterInfo
Re: Picscout / DMCA question
« Reply #22 on: December 14, 2011, 09:27:33 PM »
arghhhh.....another item on the to do list....register for an account. Had one there years ago, so might as well re-up. It would seem that the folks there have not realized that Getty now owns picscout.

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: Picscout / DMCA question
« Reply #23 on: December 14, 2011, 10:28:12 PM »
Nice read Mcfilms!

Quote
It is unfortunate that we use a webhosting company to run our website and therefore have no access to the servers. We can only add robots.txt to our files. Last week our site crashed for over an hour and it's not the first time it's happened within the last six months. I went back and checked my traffic logs for the last year and a half. On two occasions in a six month gap, my page views skyrocketed way beyond normal.

Can someone say Picscout?

How do you access robots.txt if you don't have any access to the servers? Also, can't you request that your webhosting company block it? Ask if you are on an Apache machine. If you are, remember the string I posted on another thread? I'd edited some stuff out. But the 'deny from bezeqint.net' line (bold in the original post) would have blocked the IP that group is complaining about.

Code:
order allow,deny
deny from 46.165.197.142
deny from 114.41.24.17
deny from 200.251.58.190
deny from 190.202.87.134
deny from 216.245.211.245
deny from 222.118.167.142
deny from 85.195.138.26
deny from 219.90.114.26
deny from 188.40.102.81
deny from 118.175.28.80
# baidu spider various ranges
deny from 119.
deny from 123.125.71
deny from 124.114
deny from 124.115
deny from 180
deny from 220.181
deny from 183
# china China Fujian Chinanet Fujian Province Network
deny from 120.37.209.57
# romanian spammer
deny from 94.60.1
# copyscape
deny from 212.100.254.105
# block picscout
deny from bezeqint.net
deny from 82.80.249
deny from 82.80.252
deny from 62.0.8.
deny from gettyimages.com
deny from gettywan.com
deny from picscout.com
deny from istockphoto.com
# this is spoofing referrers. looked like ia archiver then stuck around with different user agent I think it ok though
#deny from compute-1.amazonaws.com
# block garlik crawler No info on how to do so in robots
deny from 178.17.32.78
# december bot
deny from 208.43.135.148
# spoofs bingbot dec
deny from sus.nukes.procesosirc.org
# dec load bunch of images
deny from cpc2-live18-0-0-cust836.know.cable.virginmedia.com
deny from c-76-101-177-151.hsd1.fl.comcast.net
# cracker bot dec
deny from 212.77.176.179
# hunters try to find wpcontent where it does not exist
deny from advancednet.pl
deny from s5.miwiredhosting.com
deny from ns1.goodafternoon.ro
# lipperhey spider until i learn robotstxt block
deny from 94.75.233.28
#deny from www.lipperhey.com
# poor stuck guy
deny from unknown.blyon.com
# getting in with two slashes
deny from 74.86.120.107
deny from 200.54.72.14
allow from all

So: even if you are going through a webhosting company, ask:
Am I on an Apache machine? If yes, then ask: Can we request blocks in .htaccess? I can't imagine many reasons why the answer would be no.

I'm on Dreamhost. I fiddle with my .htaccess block all the time.   

SoylentGreen

  • Hero Member
  • *****
  • Posts: 1503
    • View Profile
Re: Picscout / DMCA question
« Reply #24 on: December 14, 2011, 11:50:33 PM »
I think that those toads are still operating out of Israel.
Why not just block all Israeli IP ranges?

S.G.


Robert Krausankas (BuddhaPi)

  • ELI Defense Team Member
  • Administrator
  • Hero Member
  • *****
  • Posts: 3354
    • View Profile
    • ExtortionLetterInfo
Re: Picscout / DMCA question
« Reply #25 on: December 15, 2011, 07:47:13 AM »
Quote from: SoylentGreen
I think that those toads are still operating out of Israel.
Why not just block all Israeli IP ranges?

S.G.

I don't think they will ever move the operation to the States; that would really open a can of legal issues on them. That's exactly what I did: blocked all of Israel both at the firewall and via htaccess.

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: Picscout / DMCA question
« Reply #26 on: December 15, 2011, 03:11:53 PM »
Quote from: SoylentGreen
I think that those toads are still operating out of Israel.
Why not just block all Israeli IP ranges?

S.G.

I have informative, insightful blog visitors who happen to be in Israel. I don't want to block them. I do want to block picscout. It seems to me that blocking the host picscout surfs from is sufficient.

If I were running a storefront that only sold merchandise to people in the US, and would never sell to someone in Israel, I might make a different decision.


lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: Picscout / DMCA question
« Reply #27 on: December 15, 2011, 03:14:57 PM »
buddhapi--
I'm not a real computer person, so what's involved in blocking at "the firewall"? My blog runs on a VPS account at Dreamhost. If I have access to blocking at 'a firewall' I'd do it for some of these things. (Baidu actually... What a pest!)

Robert Krausankas (BuddhaPi)

  • ELI Defense Team Member
  • Administrator
  • Hero Member
  • *****
  • Posts: 3354
    • View Profile
    • ExtortionLetterInfo
Re: Picscout / DMCA question
« Reply #28 on: December 15, 2011, 03:41:13 PM »
Quote from: lucia
buddhapi--
I'm not a real computer person, so what's involved in blocking at "the firewall"? My blog runs on a VPS account at Dreamhost. If I have access to blocking at 'a firewall' I'd do it for some of these things. (Baidu actually... What a pest!)

I'm assuming that your account with Dreamhost is on a shared server with VPS added? If so, you won't have access to the firewall; you would need to have a dedicated server. Dreamhost might be willing to do it for you if you can show good enough cause, and also show that it would not affect other accounts on your box in a negative way.


Personally I use a plugin that is set up in cPanel's WHM (Web Host Manager):

http://www.countryipblocks.net/malicious-internet-traffic/block-cidr-ranges-and-multiple-ips-using-webhost-manager-whm/

lucia

  • Hero Member
  • *****
  • Posts: 767
    • View Profile
Re: Picscout / DMCA question
« Reply #29 on: December 15, 2011, 04:17:45 PM »
buddhapi--
Yes. I think that's the way a VPS works. It's a form of shared hosting; they aren't dedicating a physical machine to my account.

I think a firewall that kept the baidubot off everyone's account would not harm the other users in any way. But who knows? Maybe the other users like the baidubot. I suspect Dreamhost wouldn't be interested in letting me decide to block baidubot on some third party's account.

I thought the answer would be that the firewall is at the server level. So, for all practical purposes, I would need to be using a dedicated server. I don't need that level of resources, so it's .htaccess for me!

 
