"Even though the web server is public facing, it still has measures of protection enabled on it to protect it from unauthorized access. PicScout.com bypassed those settings plain and simple to get to folders that were not available to them otherwise. The only way to access the supposedly copyrighted image was to view it on the web page. "
In layman's terms: since PicScout is a crawler and not a human being, it cannot see the pictures, so it has to "intrude" to see them by going into your files and taking a copy of them.
Is my understanding correct?
Khan
Correct. If they were only looking at the code of the page like most bots do, all they would see would be a file name with a path to a folder. If the bot tried to access the folder, it would be blocked, so a bot or spider cannot access the files directly. They use software that "tricks" the security measures I had in place into thinking it is actually a user with a web browser, then they further trick the server into giving access to those hidden folders, then they download the image so they can compare the meta information. This is the only way they can match up images, since people usually rename files, so matching the filename would do no good. The problem is that these bypassing bots and spiders cause excessive bandwidth usage on a server and can affect network performance.
About a year and a half ago, I started noticing the bandwidth usage for our server going up. It roughly tripled to almost 2 GB downloaded every month (it used to be around 600 MB). The hit count also jumped by 20,000 to 30,000, and I thought it was because of our social media tie-ins, but I now know it was, and still is, trolls scanning my server and bypassing security settings to download files they have no business looking at. If they want to actually use a human to view every page on my site looking for copyrighted images and then take screenshot captures of supposed violations, that would be completely OK.
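For anyone wanting to check a bandwidth jump like this on their own server, here is a rough sketch of totaling bytes served per client IP from a web access log. It assumes the log is in Apache/NCSA Common Log Format; the function name `bytes_per_client` and the sample lines are made up for illustration, and the addresses are reserved documentation ranges, not real bot addresses.

```python
import re
from collections import Counter

# Assumed layout: Common Log Format, e.g.
#   host ident user [timestamp] "request" status bytes
# Only the leading fields are matched, so Combined Log Format lines also work.
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) (\d+|-)')


def bytes_per_client(lines):
    """Sum the response sizes for each client IP; unparseable lines are skipped."""
    totals = Counter()
    for line in lines:
        match = CLF.match(line)
        if not match:
            continue
        client, _status, size = match.groups()
        # A size of "-" means no body was sent, so it counts as zero bytes.
        totals[client] += 0 if size == "-" else int(size)
    return totals


# Made-up sample lines using documentation-only addresses.
SAMPLE_ACCESS_LOG = [
    '192.0.2.44 - - [10/Jan/2012:03:14:07 -0500] "GET /images/photo1.jpg HTTP/1.1" 200 250000',
    '192.0.2.44 - - [10/Jan/2012:03:14:08 -0500] "GET /images/photo2.jpg HTTP/1.1" 200 310000',
    '203.0.113.9 - - [10/Jan/2012:09:01:12 -0500] "GET /index.html HTTP/1.1" 200 12000',
    '203.0.113.9 - - [10/Jan/2012:09:01:13 -0500] "GET /missing.gif HTTP/1.1" 404 -',
]

if __name__ == "__main__":
    # Heaviest downloaders first.
    for client, total in bytes_per_client(SAMPLE_ACCESS_LOG).most_common():
        print(client, total)
```

A client that pulls far more image bytes than any ordinary visitor stands out quickly in a listing like this, though a high total alone does not prove it is a bot.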
I have already tracked down more of the IP addresses PicScout uses in the US, including ones set aside to roll over to when their primary one gets found out. It is truly amazing what searches on the Internet will uncover.
I am hoping to compile a spreadsheet of the domain info and IP addresses for all of the more devious bots and spiders. I need this so I can run filters against our firewall logs and see when and how they were coming into my network and my web server.
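The filtering step could look something like the sketch below: load the spreadsheet (exported as CSV) and pull out every firewall log line whose source address falls in a listed range. Everything specific here is an assumption for illustration: the `name,network` column layout, the `SRC=<ip>` log style (as seen in Linux iptables logs), and the function names. The addresses are reserved documentation ranges, not PicScout's or any real bot's.

```python
import csv
import io
import ipaddress
import re

# Hypothetical spreadsheet export: one row per bot, with a single IP or a CIDR range.
# These are documentation-only addresses (RFC 5737), not real bot addresses.
BOT_SHEET = """name,network
example-bot-a,192.0.2.0/24
example-bot-b,198.51.100.17
"""

# Assumed log layout: a text line containing "SRC=<ipv4>".
SRC_PATTERN = re.compile(r"\bSRC=(\d{1,3}(?:\.\d{1,3}){3})\b")


def load_networks(sheet_text):
    """Read the name/network rows into (name, ip_network) pairs."""
    rows = csv.DictReader(io.StringIO(sheet_text))
    return [(row["name"], ipaddress.ip_network(row["network"], strict=False)) for row in rows]


def match_log_lines(lines, networks):
    """Return (bot_name, line) for every log line whose source IP is on the list."""
    hits = []
    for line in lines:
        found = SRC_PATTERN.search(line)
        if not found:
            continue
        try:
            source = ipaddress.ip_address(found.group(1))
        except ValueError:
            continue  # looked like an address but is not valid, e.g. 999.1.1.1
        for name, network in networks:
            if source in network:
                hits.append((name, line.rstrip("\n")))
                break
    return hits


# Made-up firewall log lines for demonstration.
SAMPLE_LOG = [
    "Jan 10 03:14:07 fw kernel: IN=eth0 SRC=192.0.2.44 DST=203.0.113.5 PROTO=TCP DPT=80",
    "Jan 10 03:14:09 fw kernel: IN=eth0 SRC=203.0.113.77 DST=203.0.113.5 PROTO=TCP DPT=80",
    "Jan 10 03:15:30 fw kernel: IN=eth0 SRC=198.51.100.17 DST=203.0.113.5 PROTO=TCP DPT=443",
]

if __name__ == "__main__":
    for name, line in match_log_lines(SAMPLE_LOG, load_networks(BOT_SHEET)):
        print(name, "|", line)
```

Keeping ranges as CIDR blocks rather than single addresses means a rollover address in the same block still gets caught without editing the spreadsheet.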