ExtortionLetterInfo Forums
ELI Forums => Getty Images Letter Forum => Topic started by: GoGetter on January 28, 2013, 05:28:16 PM
-
I am really enjoying getting to know all about GETTY IMAGES and their nasty practices (I have full respect for copyright - but none for this business model).
I do wish I knew earlier so I could have spent more of my life educating other people and trying to undermine them. The more people they piss off and the more ammunition those people get to fight them back the more we all win .. right? Do we have an up to date IP Range to block? If so I would appreciate it.
The sooner everyone starts defending themselves from this parasite the better. I know there is some info in other threads but I didn't find a 2013 list. Thanks.
-
Here is what I have currently on Pic-Scout and others. I have not updated my file in a while so others may have additional info. Lucia is who I would ask.
----------------------------------------------------------------------------------
Robots.txt and IP blocks
80legs
80legs is a fast web crawler that crawls approximately 40,000,000 URLs a day. Unlike many web crawlers, 80legs cannot be stopped by blocking IP addresses, because it constantly rotates between thousands of them. The only effective way to stop 80legs is the robots.txt file. Add the following to robots.txt to stop 80legs:
User-agent: 008
Disallow: /
Archive.org
Archive.org, also known as the Wayback Machine, crawls the web and stores snapshots of everything it finds to keep as an Internet archive. The problem is that web crawlers like Pic-Scout and others will crawl archive.org and find images on webpages that may no longer be active or that have since been changed, and Pic-Scout will pass this information along to Getty. Getty will then send a demand letter saying you were infringing on one of their images back on that date. For this reason I believe archive.org should be blocked from crawling your site. Archive.org is blocked with the robots.txt file; add the following to it:
User-agent: ia_archiver
Disallow: /
Pic-Scout
Pic-Scout was developed to crawl the web searching for copyrighted images. It can identify copyrighted images even if they have been modified, as long as 5% of the original still remains. Getty started using Pic-Scout and liked it so much that they bought the company so they could control it. Unlike the majority of web crawlers, Pic-Scout will ignore robots.txt requests not to crawl a site, so Pic-Scout must be blocked by IP range. My current information shows that Pic-Scout operates under:
IP range 72.26.192.0 - 72.26.223.255
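Firewalls and Apache's Deny directive generally take CIDR blocks rather than start-end ranges. Here is a minimal Python sketch, standard library only, that converts the range above. The function name range_to_cidrs is just an illustrative choice, not something from the original post.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover first..last (inclusive)."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

# The Pic-Scout range quoted above collapses to a single block.
print(range_to_cidrs("72.26.192.0", "72.26.223.255"))  # ['72.26.192.0/19']
```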
Also, on May 13th, 2012, in what appears to be an attempt to hide their activity, Pic-Scout purchased a new domain name, 411images.org; this site's activity was traced back to Pic-Scout's IP addresses. It is interesting that one of the locations for these IP addresses traces back to Israel, where Getty has just been hit with a $12 million class-action lawsuit for sending out demand letters and attempting to collect on images that they have no legal right to collect on. Below is the IP address information for 411images.org:
DNS01.411IMAGES.ORG
IP Address 72.26.211.146
Location UNITED STATES
Managed By VOXEL DOT NET INC.
Domains 1
DNS02.411IMAGES.ORG
IP Address 82.80.249.150
Location ISRAEL
Managed By BEZEQ INTERNATIONAL-LTD
Domains 1
Pic-Scout also has a nasty little brother called Image Exchange. Image Exchange is an add-on that works with Firefox, Chrome and Internet Explorer and is apparently designed to be run when you find an image that you like, to determine whether it is a copyrighted image. If Image Exchange recognizes the image as copyrighted, it will then take you to where you can license it. There are two drawbacks with Image Exchange. The first is that it does not have a complete database of copyrighted images, so it should not be trusted as the definitive answer on whether an image is copyrighted or not. When this add-on was taken to Getty's website and run on a page of Getty's images, only a few images came back as copyrighted. The second, and the most important reason why Image Exchange should be blocked, is that when it does find a copyrighted image it immediately tattles back to Pic-Scout so they can notify the owner of the image to check the registration and possibly send out a demand letter. I do not recommend the use of the Image Exchange add-on or app, as it will not guarantee your images are copyright free and may end up putting somebody else on the infringement roller coaster.
TinEye.com
You may also wish to exclude TinEye.com. TinEye is a program like Pic-Scout that crawls the web taking samples of images off of webpages. It stores these images, and you can go to the website, upload an image, and it will show you all other instances where it has found this image on the Internet. Getty has also been known to use TinEye as a quick and easy method of locating webpages to send demand letters to. TinEye.com may currently be blocked by use of the robots.txt file. To block TinEye, add the following to your robots.txt file:
User-agent: TinEye
Disallow: /
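A quick way to sanity-check the three robots.txt entries above before uploading them is Python's standard urllib.robotparser. This is only a sketch: it shows how a crawler that obeys robots.txt would read the file, and says nothing about bots that ignore it. The helper name is_allowed and the sample path are made up for illustration, and the check uses the short agent tokens from the entries rather than full user-agent strings.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: 008
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: TinEye
Disallow: /
"""

def is_allowed(robots_txt, user_agent, url="/"):
    """Return True if a compliant crawler with this agent token may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

for agent in ("008", "ia_archiver", "TinEye", "Googlebot"):
    print(agent, is_allowed(ROBOTS_TXT, agent, "/images/photo.jpg"))
```

The three listed agents should come back False and an unlisted agent such as Googlebot should come back True, since no rule in the file applies to it.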
Note: since writing this, it has come to my attention that TinEye may sometimes ignore the robots.txt file. Useragentstring.com has identified the following IP addresses as tracing back to TinEye; they should be blocked as an added layer of protection.
It lists the IPs as:
204.15.199.142 - 142-199-15-204-static.prioritycolo.com
41.68.22.0 - 41.68.22.0
66.230.232.19 - mail.macrobright.com
67.202.44.125 - ec2-67-202-44-125.compute-1.amazonaws.com
67.202.48.109 - 0
75.101.176.194 - ec2-75-101-176-194.compute-1.amazonaws.com
75.101.238.112 - ec2-75-101-238-112.compute-1.amazonaws.com
-
awesome...
-
This is all information I found here and most came from Lucia, she would know about any updates or changes since the original posts.
-
Oddly-- I don't know much on updates because I block so many things at Cloudflare. So few things scrape my images in obvious ways, and now very few agents visit with "no referrer/no user agent" pairs (which was a symptom of Image Search). I think you have to ban lots of stuff because I think image groups are now likely using accounts on many popular servers (Go Daddy, BlueHost and so forth.)
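For anyone who wants to look for those "no referrer/no user agent" pairs in their own logs, here is a rough Python sketch. It assumes the log is in Apache's combined format, where a missing referrer or user agent is written as "-". The sample lines and their documentation-range IPs are made up, and blank_pairs is just an illustrative name.

```python
import re

# Apache combined log format: ip ident user [time] "request" status bytes "referrer" "agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def blank_pairs(lines):
    """Return the client IPs of requests that sent neither referrer nor user agent."""
    hits = []
    for line in lines:
        m = COMBINED.match(line.strip())
        if m and m["referrer"] in ("", "-") and m["agent"] in ("", "-"):
            hits.append(m["ip"])
    return hits

sample = [
    '203.0.113.7 - - [03/Jul/2013:02:38:11 +0000] "GET /img/a.jpg HTTP/1.1" 200 5120 "-" "-"',
    '198.51.100.4 - - [03/Jul/2013:02:39:00 +0000] "GET / HTTP/1.1" 200 812 "http://example.com/" "Mozilla/5.0"',
]
print(blank_pairs(sample))  # ['203.0.113.7']
```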
-
Wow, sounds like you really have your set up dialed in!
-
all of these are going into .htaccess
-
Some of them will go in .htaccess and some in robots.txt.
Where you see a structure like the following, add the lines to robots.txt:
User-agent: 008
Disallow: /
The IP addresses you add to .htaccess like so:
order allow,deny
deny from 82.80.249.150
deny from 72.26.211.146
allow from all
etc
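If the list gets long, the block can be generated instead of typed by hand. Below is a small Python sketch that emits the same Apache 2.2-style directives used above. Apache 2.4 replaced Order/Allow/Deny with Require, so check which version your host runs. The function name deny_block is an illustrative assumption.

```python
import ipaddress

def deny_block(addresses):
    """Build Apache 2.2-style rules that deny each address or CIDR block."""
    lines = ["Order allow,deny", "Allow from all"]
    for addr in dict.fromkeys(addresses):  # drop repeats, keep first-seen order
        ipaddress.ip_network(addr)         # raises ValueError on a malformed entry
        lines.append(f"Deny from {addr}")
    return "\n".join(lines)

print(deny_block(["82.80.249.150", "72.26.211.146", "72.26.192.0/19"]))
```

The validation step is there because a single mistyped address is easy to miss in a long list.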
Of course the intention of blocking image scrapers would not be to allow you to commit copyright infringement, but it would allow you to be better defended from bandwidth-eating image scrapers and devious companies like those sending extortion letters to people who should be getting takedown notices. I have never heard of a more disturbing business model than this and I encourage everyone vaguely concerned to take up arms to defend themselves.
-
There is no method of blocking scrapers that would be sufficient to protect you if you were violating copyright. If a resource is on the web, you can't be certain no one can get to it.
The main reasons to block are (a) to raise their costs of scraping, (b) to lower your costs of hosting, (c) just to keep them off because they irritate the heck out of you or similar.
-
This isn't about copyright. These scumbags don’t care about copyright infringement. They're just interested in extorting cash under the guise of copyright infringement.
We stopped buying from them (as I'm sure most others like us have) due to their recent ridiculous cost increases. With this behaviour of theirs, they are not even worth the risk of buying and using their stock photos for any purpose. I don't even feel very safe buying and using pictures from bigstockphoto because of them. I'm encouraging customers to avoid bulk outlets like this as traps, and use custom art from local freelancers.
Artists trying to make a living should take notice.
I digress... would blocking these IPs in .htaccess prevent them from impacting server resources? I assume they would completely ignore robots.txt.
-
You assume correctly; robots.txt is a waste of time in terms of Picscout and other scrapers/bots. It will be a never-ending process. Blocking the IP range Picscout uses via .htaccess will work until they switch up the IPs they use, and it also won't do any good if they use proxies.
-
I have noticed that Getty Images over the last few days has been using proxies. Though I have a thorough listing of their IP addresses, the other day I noticed I could still access their site even though it is blocked by my firewall, and using nslookup I saw that they were using different IP addresses. 5 minutes later, it was back on the usual IP addresses. Now using domain name to block, but they will figure a way around that too. :(
-
Did you mean picscout or Getty Images?? Getty doesn't crawl sites as far as I know, picscout is the culprit there, but Getty does own picscout..
-
I digress... would blocking these IPs in .htaccess prevent them from impacting server resources? I assume they would completely ignore robots.txt.
If blocked in .htaccess, blocking in robots.txt becomes superfluous. However, if you are blocking by IP and you miss an IP block or a robot changes IP ranges, it won't work. robots.txt might-- if the bot obeys it (which it may not.)
-
I have noticed that Getty Images over the last few days has been using proxies. Though I have a thorough listing of their IP addresses, the other day I noticed I could still access their site even though it is blocked by my firewall, and using nslookup I saw that they were using different IP addresses. 5 minutes later, it was back on the usual IP addresses. Now using domain name to block, but they will figure a way around that too. :(
Yes. That's why it is very difficult to block Getty. To an extent, if you really want to block Getty, you have to decide to block lots and lots and lots of stuff. You'll end up wanting to block nearly all the server-based SEO/reputation management groups, hosting companies that welcome spammers, the Amazon range (used by lots of script kiddies), and loads of other stuff. For many, many, many sysadmins, blocking these is a win/win situation because very little of that stuff has any great benefit to *most* web sites. (Oh, you'll find people who tell you they do. But those people either a) don't know what they are talking about, b) are seriously over-rating the benefit of things like, for example, "shopping bots" to the vast majority of sites, which list nothing for sale, or c) are lying.)
But a few web sites do benefit from some of those visits and those web sites need to know which of the server-supported sites visit them.
-
For your typical small non-retail business, would you really want anything cataloguing your site besides google?
-
It was Getty Images. I am already blocking a good range of IP addresses and two domain names with Picscout.
-
Depends on what's "typical". You might want Yahoo search, Bing and a few others. If you are a blogger you might want incoming pings from blogs. You might want feed readers visiting. If you sell advertising, there might be a few bots that are worth letting visit. (I don't know what they are, but they may exist.) Many people like the Wayback Machine. (Some don't.)
But really, there are a stupendous number of things visiting. I have no idea why *anyone* outside China would want baidu spider to visit. Similar for yandex but with Russia. I don't know why anyone who is not retail wants a "shopping bot" (the kind that find good prices on retail items for people to compare) to visit.
So, I can't say "only google" categorically. But I'd say if you pick a 'mystery' bot that visits a lot at random, chances it does you no good exceed 90%.
-
Here is what I have currently on Pic-Scout and others. I have not updated my file in a while so others may have additional info.
One of our sites was pounded (mini DDoS) through multiple IPs. The IPs led to Bezeq International-Ltd, and information on them led me/us here.
- Hello - Wave -
Thought the following might help those who want to block Getty Image's rogue BI bots.
Bezeq International-Ltd
http://www.nirsoft.net/countryip/il.html <-- no connection to them 8)
31.168.0.0-31.168.255.255
62.219.0.0-62.219.255.255
79.176.0.0-79.183.255.255
81.218.0.0-81.218.255.255
82.80.0.0-82.81.255.255
84.108.0.0-84.111.255.255
85.130.128.0-85.130.255.255
109.64.0.0-109.67.255.255
212.5.64.0-212.25.127.255
212.179.0.0-212.179.255.255
217.22.112.0-217.22.127.255
We were hit by bots from three of the above blocks, so it appears ok. But you should double-check. Especially since we're new here.
Best of luck.
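To double-check whether an address from your logs falls inside ranges like these, a few lines of Python will do. This is only a sketch: SAMPLE_RANGES copies a subset of the ranges listed in this post, and in_ranges is an illustrative name. The first test address is the DNS02.411IMAGES.ORG IP quoted earlier in the thread.

```python
import ipaddress

# A subset of the start-end ranges listed in the post above.
SAMPLE_RANGES = [
    ("31.168.0.0", "31.168.255.255"),
    ("62.219.0.0", "62.219.255.255"),
    ("79.176.0.0", "79.183.255.255"),
    ("82.80.0.0", "82.81.255.255"),
]

def in_ranges(ip, ranges):
    """Return True if ip lies inside any (first, last) inclusive range."""
    addr = ipaddress.ip_address(ip)
    return any(
        ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last)
        for first, last in ranges
    )

print(in_ranges("82.80.249.150", SAMPLE_RANGES))  # True
print(in_ranges("72.26.211.146", SAMPLE_RANGES))  # False
```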
-
There is a good bit on this forum about using your htaccess and basically saying to the bots "don't go to these areas please". If they ignore your warning you ban them!
http://forums.eukhost.com/newreply.php?do=newreply&p=87709
Just applied it to all my sites plus the little addition of emailing me when one is banned. Within 20 seconds of putting it in place, blinkin Googlebot came along, completely disregarded the robots.txt file and got itself banned. Well there ya go :(
-
Googlebot DOES adhere to robots.txt. Chances are good this bot was simply masking itself as Googlebot. If it were me, I would be doing some further digging into this: IP address, etc.
-
Well, it did resolve back to Google, but maybe they were going by what the robots.txt said a few hours ago. I can understand they might want to cache it etc.
"The IP 66.249.75.237 (crawl-66-249-75-237.googlebot.com) has been blocked for an invalid access attempt to a file, directory, or a scanning attempt."
I think I can let them off maybe once or twice and take them back out of my block list ;)
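The usual way to tell a real Googlebot from an impostor is a reverse DNS lookup followed by a forward lookup that must return the same IP. Here is a hedged Python sketch of that check. The reverse and forward parameters are hypothetical hooks added so the logic can be tried without network access, and the function name is an assumption, not anything Google ships.

```python
import socket

def is_real_googlebot(ip, reverse=None, forward=None):
    """Reverse-resolve ip, check the host's domain, then forward-confirm it."""
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # A spoofed PTR record fails here: the name will not map back to ip.
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False

# Offline demonstration using the values from the ban message above.
fake_reverse = lambda addr: "crawl-66-249-75-237.googlebot.com"
fake_forward = lambda host: ["66.249.75.237"]
print(is_real_googlebot("66.249.75.237", fake_reverse, fake_forward))  # True
```

Called with only the IP, it performs live DNS lookups instead of using the stand-in resolvers.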
-
Over time there seems to be a gradual trickle of bots poking about. Some tell the truth and say who they are, while others quite clearly try to cover their tracks and make out they are Mozilla browsers. I have looked up the IPs for some of these and they resolve to things that quite clearly are not browsers, as there is no way to reach the "booby traps" on my sites with a normal browser.
Remains an interesting subject!
## Banned IPs
Deny from 220.181.108.158
# baiduspider-220-181-108-158.crawl.baidu.com
Deny from 94.242.198.110
# Agent: Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0)
# on 2013-07-03 (Wed) 02:38:11 IP: 94.242.198.110 (static-198-110.softronics.ch)
Deny from 27.45.240.84
# Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729))
# on 2013-07-03 (Wed) 06:33:30 IP: 27.45.240.84 (27.45.240.84) <-- China Unicom Guangdong Province Network
Deny from 27.45.240.82
# Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729))
# on 2013-07-07 (Sun) 12:45:13 IP: 27.45.240.82 (27.45.240.82) <-- China Unicom Guangdong Province Network
Deny from 175.42.90.137
# Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729))
# on 2013-07-09 (Tue) 06:29:28 IP: 175.42.90.137 (175.42.90.137) <-- China Unicom Fujian Province Network
Deny from 76.94.95.83
# Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0)
# on 2013-07-11 (Thu) 18:52:09 IP: 76.94.95.83 (cpe-76-94-95-83.socal.res.rr.com) <-- Road Runner / Time Warner Cable
Deny from 183.234.49.109
# Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729))
# on 2013-07-13 (Sat) 09:59:59 IP: 183.234.49.109 (183.234.49.109) <-- China Mobile communications corporation
Deny from 14.211.88.3
# Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729))
# on 2013-07-13 (Sat) 14:32:40 IP: 14.211.88.3 (14.211.88.3) <-- CHINANET Guangdong province network
Deny from 50.19.165.99
# Agent: Test Spider 0.2)
# on 2013-07-14 (Sun) 01:25:24 IP: 50.19.165.99 (ec2-50-19-165-99.compute-1.amazonaws.com)
Deny from 188.143.234.127
# Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1))
# on 2013-07-08 (Mon) 13:29:20 IP: 188.143.234.127 (188.143.234.127) <-- ToussaintDesaulniers-net
Deny from 192.114.71.13
# Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0)
# on 2013-07-10 (Wed) 17:40:51 IP: 192.114.71.13 (bzq-114-71-13.static.bezeqint.net) <-- Bastards
Deny from 89.75.96.207
# Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36)
# on 2013-07-11 (Thu) 08:39:09 IP: 89.75.96.207 (89-75-96-207.dynamic.chello.pl) <-- PL-UPC-20060222 in Warsaw
Deny from 5.10.83.73
# Agent: Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/))
# on 2013-07-14 (Sun) 05:41:12 IP: 5.10.83.73 (5.10.83.73-static.reverse.softlayer.com) <-- Ahrefs Pte Ltd Singapore
Deny from 37.59.202.77
# Agent: Mozilla/5.0 (Windows NT 5.1; rv:5.0.1) Gecko/20100101 Firefox/5.0.1)
# on 2013-07-07 (Sun) 14:08:04 IP: 37.59.202.77 (37.59.202.77) <-- Str Miron Costin, Brasov, France
Deny from 207.189.121.44
# Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100721 Firefox/3.6.8)
# on 2013-07-08 (Mon) 01:31:17 IP: 207.189.121.44 (207.189.121.44) <-- VIAWEST-NETBLOCK-207.189.96.0/19
Deny from 5.135.47.74
# Agent: Mozilla/5.0 (Windows NT 6.1; rv:2.0b7pre) Gecko/20100921 Firefox/4.0b7pre)
# on 2013-07-08 (Mon) 11:00:01 IP: 5.135.47.74 (5.135.47.74) <-- 2 rue Kellermann, France
--- end ---
-
The Bezeq International IPs now appear to start "192."
That's a nice touch to anyone thinking that's ur local internal network (192.168 etc etc.)
-
For Bezeq International add 192.114.64.0-192.114.79.255
Looks like some cough lawyer is reading the posts and passing the news of exposed IPs.
-
I knew it was only a matter of time before new IPs started appearing, hence I simply block Israel as a whole, at the server firewall and on the server.
-
TinEye.com
You may also wish to exclude TinEye.com. TinEye is a program like Pic-Scout that crawls the web taking samples of images off of webpages. It stores these images, and you can go to the website, upload an image, and it will show you all other instances where it has found this image on the Internet. Getty has also been known to use TinEye as a quick and easy method of locating webpages to send demand letters to.
I've been using TinEye, on the recommendation of ELI, to check and see if images I have or others are using are public domain or copyrighted. What are the liabilities for me?
-
There are no "liabilities" for you in using TinEye to research images. However, can you explain how TinEye is telling you whether images are public domain or copyrighted?
FYI: Copyright exists at the moment of creation, so a good majority of images are copyrighted..
-
can you explain how tineye is telling you whether images are public domain or copyrighted?
I didn't explain myself fully on this one.
The images in question have been on my web site for years, as a tribute to my father. They are pictures of the Distinguished Flying Cross and the Air Medal. I didn't remember when or where I got them, so I used TinEye to see if these pictures appeared anywhere else on the web. They did; and the links that TinEye came back with were federal government web sites (if I recall), and these web sites clearly state that there is no copyright restriction for the use of these images.
-
Thanks for the clarification! Getty has been known to accuse folks of infringement over public domain images.. If the images on those government sites you mention are indeed the same, you could sternly point this out to getty, and advise them if they continue their harassment, you will file complaints with the Washington Attorney General, and you will also invoice them for the time you have been wasting on this matter.
-
You're welcome.
Getty hasn't accused me of using the images of the ribbons/medals against their license. This was a precautionary step I took, of 'sterilizing' all my web sites to make sure that not a single questionable image remained that they could possibly come back and send me another round of extortion letters.
-
With your sterilization make sure you check the wayback machine and make sure nothing is on that site and if so remove it.
-
Good idea. I just sent an e-mail to archive.org asking them to remove all of my pages.
-
Another day, another Getty Images "data mining, robots, or similar data gathering or extraction" IP range. Consolidated prior numbers, and added Greg's Picscout range.
Bezeq International
31.168.0.0-31.168.255.255
62.219.0.0-62.219.255.255
79.176.0.0-79.183.255.255
81.218.0.0-81.218.255.255
82.80.0.0-82.81.255.255
84.108.0.0-84.111.255.255
85.130.128.0-85.130.255.255
109.64.0.0-109.67.255.255
192.114.64.0-192.114.79.255
212.5.64.0-212.25.127.255
212.179.0.0-212.179.255.255
217.22.112.0-217.22.127.255
Picscout
62.0.8.0-62.0.8.255
72.26.192.0-72.26.223.255
Laugh for the day.
http://imagery.gettyimages.com/gettylive/dm/gbr/TermsConditions_pc.html
"You are specifically prohibited from: (a) downloading, copying, or re-transmitting any or all of the Site or the Getty Images Content without, or in violation of, a written licence or agreement with Getty Images; (b) using any data mining, robots or similar data gathering or extraction methods;"
"Such unauthorized use may also violate applicable laws including without limitation copyright and trademark laws, the laws of privacy and publicity, and applicable communications regulations and statutes."
-
That is a good quote and I have added it to my information. Thanks! Also we should thank Lucia and Robert for this information.
-
Seems like a bit of a double standard from them - yet AGAIN.
-
Yes, maybe this qualifies for a #gettyflubs tag as it might help someone in the future.
-
Methinks the pot just called the kettle black.
-
I know this topic is old, but it seems very appropriate for the task. Also I should say I haven't tested this, I just want to point out this analysis contains details about the bot:
https://www.hackerfactor.com/blog/index.php?/archives/627-A-Victory-for-Fair-Use.html
See section Automated Filing, where the author shows logs from picscout. The referrer string can be used to serve it another page. It was (is?) a particular referrer:
http://ops.picscout.com/QcApp/Classification/Index/["case number"]
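As a rough illustration of serving another page based on that referrer, here is a Python sketch. Like the idea itself, it is untested against real traffic, and a referrer header is trivial for the sender to change or omit. The function name, the page names and the case number in the example are all made up.

```python
# Prefix of the referrer shown in the linked analysis.
PICSCOUT_REFERRER_PREFIX = "http://ops.picscout.com/"

def page_for(referrer, normal_page, alternate_page):
    """Pick the alternate page when the referrer starts with the picscout prefix."""
    if referrer and referrer.lower().startswith(PICSCOUT_REFERRER_PREFIX):
        return alternate_page
    return normal_page

# "12345" stands in for a case number; it is not a real value.
print(page_for("http://ops.picscout.com/QcApp/Classification/Index/12345",
               "index.html", "other.html"))  # other.html
print(page_for(None, "index.html", "other.html"))  # index.html
```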