Getty Images Letter Forum / Re: Image scraping: The Newsblur angle.
« on: August 18, 2012, 09:26:41 AM »
Jerry,
First: My main intention in this thread is to alert readers that it is now easier for image scrapers to find images. This is separate from the copyright issue and merely has to do with the way this group is displaying content.
I'm not sure how everyone else is going to detect this, but I know how I am. Among other things, in .htaccess, if the referrer is newsblur, requests for uploaded blog images are answered with pictures of a cat. You can see this if you load http://www.newsblur.com/site/1100897/ which currently seems to default to "feed". If not, click "feed" and scroll down. You'll see cats. (I can also tell that my redirection is imperfect: the first graph, which is a '.png', ought to be a cat too, but it still shows as a graph.)
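The referrer check described above can be sketched in Python as follows. This only illustrates the decision logic; it is not the actual .htaccess rule set, which is not shown in this post. The extension list, the replacement path `/images/cat.jpg` and the function name are all assumptions made for the example.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed values for illustration; the real rules live in .htaccess and are not shown here.
BLOCKED_REFERRER_HOSTS = {"newsblur.com"}
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")
CAT_IMAGE_PATH = "/images/cat.jpg"  # hypothetical replacement image


def path_to_serve(request_path: str, referrer: Optional[str]) -> str:
    """Return the path to serve, swapping images requested via a blocked referrer."""
    if not referrer:
        return request_path
    host = (urlparse(referrer).hostname or "").lower()
    # Match the bare domain and any subdomain such as www.newsblur.com.
    blocked = any(host == h or host.endswith("." + h) for h in BLOCKED_REFERRER_HOSTS)
    is_image = request_path.lower().endswith(IMAGE_EXTENSIONS)
    # Never rewrite the cat itself, or the rule would loop.
    if blocked and is_image and request_path != CAT_IMAGE_PATH:
        return CAT_IMAGE_PATH
    return request_path
```

If an extension such as `.png` were missing from the list, images of that type would slip through unchanged, which is one possible explanation for a graph that stays a graph.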
Second: On the copyright issue: I am going to send a more formal cease and desist. But here's the background:
The FAQ on newsblur's site says that to 'opt out' we should email Clay Samuels, the owner, founder and coder. I emailed and told him to take me off. He sent me a sales pitch telling me how great his service was, and actually told me he was my best friend because he was copying my material in part. I repeated my request that he stop. He did not respond to this email in any way. Silence. This pissed me off.
On Monday, I blogged, and I made changes to my page so that people viewing the fresh copies read my opinion about the practice and are autoforwarded to my real site. (This is called the "ass-hat message", and it is written in javascript.) To see it, visit http://newsblur.com/reader/page/1100897. That's what people view if they click through to the "story view" from the previous url I sent you. It is the view I have asked him to take down, and also the one that involves copying. You can see it by scrolling down here:
http://www.newsblur.com/site/1100897/
On Tuesday I tweeted about the page display, and the owner of newsblur made some quick changes which were supposedly intended to eliminate the problem. He tweeted back that he had made changes to prevent loading. I tweeted back that copies were still displaying. Fresh copies continue to be made. (Initially I thought they had stopped, but I was mistaken. What had happened was merely that Newsblur had copied a page showing that their IP had been banned at Cloudflare. It looked pretty funny, actually.)
Later, I noticed that newsblur continued to show fresh copies. I did some tweaking to verify that
a) their bot does not request robots.txt (which would have told them their visits are disallowed)
and
b) their bot does not honor the "noarchive" meta tag.
This means that, as far as I can tell, there is no way for my server to communicate my wishes, "don't visit here" and "don't copy", to their bot. The bot just copies.
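One way to check point (a) from the server side is to scan the access log for the bot's requests. The sketch below assumes the log is in Apache Combined Log Format and that the bot can be recognized by a substring of its user-agent string; both are assumptions, as are the function names. It uses the standard library's `urllib.robotparser` to decide which of the fetched paths robots.txt would have disallowed. It says nothing about the "noarchive" meta tag, which cannot be checked from request logs alone.

```python
import re
from urllib.robotparser import RobotFileParser

# Combined Log Format: ip ident user [time] "METHOD path proto" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:\S+) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def paths_requested_by(log_lines, agent_substring):
    """Collect request paths made by any user agent containing agent_substring."""
    paths = []
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match and agent_substring.lower() in match.group("agent").lower():
            paths.append(match.group("path"))
    return paths


def summarize_bot(log_lines, agent_substring, robots_txt):
    """Report whether the bot fetched robots.txt and which fetched paths were disallowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    paths = paths_requested_by(log_lines, agent_substring)
    return {
        "fetched_robots_txt": any(p.split("?")[0] == "/robots.txt" for p in paths),
        "disallowed_paths_fetched": [
            p for p in paths
            if p.split("?")[0] != "/robots.txt" and not parser.can_fetch(agent_substring, p)
        ],
    }
```

A result with `fetched_robots_txt` false and a non-empty `disallowed_paths_fetched` list would match what is described above: the bot never asked for the rules and fetched pages the rules forbid.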
I have also been taking fresh snapshots of his copies and of my pages, because at this point I anticipate that he might keep copying. In preparation for that I want to have a packet of "stuff". But my plan forward is:
1) Send him another email cease and desist.
2) If he does not cease and desist, send a DMCA notice to either his hosting company, his name server or both.
But before I do (2), I want to be certain I have evidence in place so that, should he dispute my takedown, I have the proper evidentiary materials to win and have my court costs covered. That's why I am asking people what I should log, etc.
I also wouldn't mind if people suggested any reason in the world why I might fail to win a case in copyright court, because I certainly don't want to go to court and lose.
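As one possible starting point for the logging question, here is a minimal Python sketch that records, for each snapshot, the URL, the capture time in UTC, and a SHA-256 hash of the exact bytes saved. The field names, function names and JSON Lines file layout are assumptions, and whether records like these would be sufficient as evidence is a question for someone with legal expertise.

```python
import hashlib
import json
from datetime import datetime, timezone


def snapshot_record(url, body, captured_at=None):
    """Build a log entry tying a URL to the exact bytes seen and the capture time."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        "url": url,
        "captured_at_utc": captured_at.astimezone(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(body).hexdigest(),
        "size_bytes": len(body),
    }


def append_record(log_path, record):
    """Append one record per line (JSON Lines) so earlier entries are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
```

Recording a pair of entries each time, one for the original page and one for the copy, makes it possible to show later which version existed when.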