
Topic: SOPA: Alt view - You need to be Shakespeare or Picasso to Avoid Content Scraping

mahesh2k

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 1,426
My friend has a gardening blog where she regularly posts text, her own images, and videos. There are some personal images of her family and friends on that blog. She found her blog scraped by a black hat author who refuses to take down the content. I understand that SOPA/PIPA/ACTA are bad, but scraping content that includes your personal property and replying arrogantly to the content owner increases the power of the SOPA/ACTA/PIPA lobby. I won't be surprised if many bloggers and content owners support SOPA/PIPA because of this type of abuse.

Here's what she received in reply to the DMCA takedown notice sent to HostGator. I don't know what HostGator did to that site (it looks up for now).

"We've got your copyright complaint by the hosting service. We are not based on U.S.A. and not obligated by United State's laws. But we've got respect for other website and intellectual properties. The content which was related to your complaint was added to our website by our volunteer editors. For us a public content that's opened to world wide web is not copyright protected unless it has a literatural value or including a specific sciencetific reasearch. Any text content is not copyrighted material if it's redistribution of public knowledge without highly literature value. But we are willing to add your name to the document if you permit, or if you will not give permition we will edit the content page properly if you can prove us that you are the copyright owner more than DMCA records. We want to know this is legally your intellectual property in a non-digital legal registered document."

Check the bolded part. You need to be Picasso or Shakespeare, or Apple or MS, in order to have your content taken down. Sad, isn't it? I also came across a similar case where blackhat folks refused to take down personal images and challenged the original author to prove ownership in court. So you have to prove that you own your personal images, and the relationships shown in your blog, before these scrapers will take the content down? It's definitely not cool, and it's something that is giving a boost to SOPA/PIPA/ACTA.

The name of the site is newhealthtips.com (if you're aware of blackhat stuff, it doesn't take long to realize that this is yet another scraper site).


Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,288
  • Tell me something you don't know...
    • Renegade Minds
I've had content stolen, and am fully aware of scraper sites. I can't get behind SOPA or other legislation like it.

Yes, it sucks when you get scraped. But, that's not a reason to suppress free speech.

There are other blackhat ways to deal with blackhats. ;)

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker

40hz

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 11,857

There are other blackhat ways to deal with blackhats. ;)


Don't you mean 'asshats' rather than 'blackhats' when referring to scrapers? ;) :)
« Last Edit: January 27, 2012, 09:26 AM by 40hz »

40hz

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 11,857
She could make a complaint to the domain registrar, which is a US company. They won't do anything. But it gives them a heads-up about it. If enough people complain they sometimes stop accepting domain requests from the party named. Also the webhost if it is a US or Euro operation. Most have policies about acceptable use and practices. Content scraping is usually forbidden. As is openly defying DMCA notices. Unfortunately, most heavy-duty scrapers host their own servers for exactly that reason.

But to the point, there's nothing which makes you have to accept a DMCA at face value since there are so many bogus ones filed. Asking for proof of copyright is something I'm surprised more sites and hosts don't request. I guess they just feel it's easier to do a CYA and act immediately on receiving notice.

I've had content I've created scraped.  

It's annoying.

You get over it eventually.

In your friend's case it might be beneficial to try to get a link plus her name added to the article. At least that way she'll get some exposure value out of it. Maybe people who see the scraped article will then discover her website.

Hardly ideal, or even fair, but it's still better than nothing. Especially since the scrapers realize (correctly) that there's little most people can do about it.
 :)

mahesh2k

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 1,426
@Ren, I'm not supporting SOPA in the way it is mixed with political motives. But the original intention of SOPA was to protect people's intellectual property. That has nothing to do with freedom of speech. Freedom of expression doesn't mean stealing, and if it does mean there should be open stealing, then I guess we're inviting flawed socialism with a broken communist and capitalist model all in one by crushing the revenue model of authors and artists.

@40hz, the problem is that the images and content clearly show that it is her property. I mean, it's obvious that your pic is YOUR pic when it's hosted on your blog, right? If a scraper copies that content and then asks you to prove that it's your property (even after getting an email from an address at the scraped domain), that is offensive. HostGator removed that content from the above site and is taking further action against the webmaster. The problem with giving proof to such a scraper is that, if you look at the content of their scraped sites, there is no need to give any proof to thieves. It's like giving proof to a murderer who is standing next to the corpse with a knife and blood on his clothes.





vlastimil

  • Honorary Member
  • Joined in 2006
  • **
  • Posts: 308
Having content scraped can make one angry, but SOPA would not help in this case. Your friend would need to get a court order and then the machinery would start moving. US search engines would have to block the domain from their search results, US ad networks would have to stop dealing with the domain, US ISPs would have to block access to that domain. And the scraper would not really care much. Spammers know their domains will get caught sooner or later and are prepared for that. They have dozens of web sites and when some get banned, they just move on. In short, with SOPA, a lot of effort would be wasted and nothing would be accomplished.

BTW you can send a DMCA notice to Google ( http://www.labnol.or...t/google-dmca/19256/ ) to have the offending URLs removed from search results. Content is scraped mainly to feed the Googlebot, and denying the scraper their prize is the best way to get what you want. This would probably accomplish the most, but I am still not sure it is really worth the effort.

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 6,646
Search the server logs to find out what IP the scraper is coming from. Then set up a redirector that will send that IP/them something they will never forget (like 90GB of garbage files) the next time they come by for a scraping.
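
For anyone who wants to try the log-searching step, here is a minimal sketch. It assumes the web server writes access logs in Common Log Format, where the client IP is the first field on each line. The function name and the sample lines are made up for illustration (the IPs are from the documentation ranges), and a high request count only suggests a candidate; it doesn't prove that address is the scraper.

```python
from collections import Counter


def top_client_ips(log_lines, limit=5):
    """Return the most frequent client IPs as (ip, request_count) pairs.

    Assumes Common Log Format, where the first whitespace-separated
    field on each line is the client address.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts.most_common(limit)


# Hypothetical sample lines, not real log data.
SAMPLE_LOG = [
    '203.0.113.7 - - [27/Jan/2012:09:00:01 +0000] "GET /post-1 HTTP/1.1" 200 5120',
    '198.51.100.4 - - [27/Jan/2012:09:00:02 +0000] "GET /post-1 HTTP/1.1" 200 5120',
    '203.0.113.7 - - [27/Jan/2012:09:00:03 +0000] "GET /post-2 HTTP/1.1" 200 4096',
    '203.0.113.7 - - [27/Jan/2012:09:00:04 +0000] "GET /post-3 HTTP/1.1" 200 6144',
]

for ip, hits in top_client_ips(SAMPLE_LOG):
    print(ip, hits)
```

On a real server you would pass the open log file instead of the sample list, since iterating a file yields one line at a time.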

TaoPhoenix

  • Supporting Member
  • Joined in 2011
  • **
  • Posts: 4,642

What about somehow dynamically assembling content like tiny little pictures of letters so that they just happen to form words but in fact there is nothing there to scrape? Anyone know how to optimize that so it loads just as fast as a regular page?  And would it work?

40hz

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 11,857
@40hz, the problem is that the images and content clearly show that it is her property. I mean, it's obvious that your pic is YOUR pic when it's hosted on your blog, right? If a scraper copies that content and then asks you to prove that it's your property (even after getting an email from an address at the scraped domain), that is offensive. HostGator removed that content from the above site and is taking further action against the webmaster. The problem with giving proof to such a scraper is that, if you look at the content of their scraped sites, there is no need to give any proof to thieves. It's like giving proof to a murderer who is standing next to the corpse with a knife and blood on his clothes.

I feel her pain and understand where you're coming from.

But as Oliver Wendell Holmes observed, US courts are courts of law - not courts of justice.

And like it or not, the accused has the benefit of the doubt - and the accuser the burden of proof.

In this particular case, I'll agree it leaves a lot to be desired. But doing it the opposite way (like DMCA takedowns do) is even worse in the long run.

As for ownership clearly being proven by the fact it's on your own site...I'll have to disagree.

I once had something (an infographic) taken down from a friend's site by someone who filed a DMCA on my buddy because he had supposedly 'swiped' it. As proof, they claimed it was clearly hosted on their site for over a year - and further, they claimed they held a copyright for it.

Fortunately, I had a fully registered copyright on it. Signed, sealed, and delivered by the Library of Congress no less! Yowza. :mrgreen:

I got in touch with the people who were hassling my friend, explained that I was the legal owner of the graphic in question, and asked them what was up with that. I explained I didn't want to file a DMCA on them - or suggest my friend take legal action against them for knowingly filing a false DMCA takedown (which you can btw) - but I would appreciate knowing why they felt the need to harass my friend over something that wasn't theirs to begin with.

After a few emails with some rambling talk about how infographics weren't copyrightable (wrong) - and a vague threat about suing me for "tens of thousands" (not millions? they apparently think small) because I was engaged in "a clear case of defamation" against them (wrong again) - and a complaint to my friend's ISP about being harassed (I think they thought my friend and I were the same person) - it all stopped just as quickly as it started.

My friend went through the necessary actions needed to get the takedown removed. And my infographic disappeared from the other site in the meantime.

I understand that other site did eventually get shut down by their own ISP following numerous DMCA complaints filed against it. Guess my piece wasn't the only thing they borrowed. But I just can't help but wonder why they were so stupid as to draw attention to themselves by filing bogus takedowns if that was the case. Everybody knows somebody on the web. And it doesn't usually take too long for word to get around.

 8)


40hz

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 11,857
Search the server logs to find out what IP the scraper is coming from. Then set up a redirector that will send that IP/them something they will never forget (like 90GB of garbage files) the next time they come by for a scraping.

Not worth it. All it does is start a pissing match and chew up bandwidth you're paying for.

It can also provoke something really nasty (like a DoS attack) if whoever is doing the scraping just so happens to speak Russian or Chinese, and is having a particularly bad day. (Not that I'm naming names or pointing fingers. ;) )

That's a major headache that can cost an ISP or web host hassles, downtime, and money.

If your host has to deal with one of those, and their admins (who are all closet BOFHs) spotted you were playing amateur cyber-vigilante games with some jackass before it happened, you're very likely to get your account closed.

And I wouldn't blame them.  8)

celcom network down.jpg

 ;)

« Last Edit: January 27, 2012, 02:49 PM by 40hz »

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 6,646
Oh don't be so practical, I'm trying to have fun (there's always lower profile alternatives). :)

40hz

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 11,857
Oh don't be so practical, I'm trying to have fun (there's always lower profile alternatives). :)

BOFH-Mainframe.jpg

Sorry man. Can't help it. I'm a BOFH.

P.S. - where's the "fun" in lower profile? Cows may come, and cows may go - but when the cobras begin to strike you want to be firing magnum double-loads. Remember: you can only be punished if there are survivors left to do the punishing.
 :P

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 6,646
Sorry man. Can't help it. I'm a BOFH.

Me too (e.g. I can't get yelled at about bw usage), hence the initial response.

P.S. - where's the "fun" in lower profile? Cows may come, and cows may go - but when the cobras begin to strike you want to be firing magnum double-loads. Remember: you can only be punished if there are survivors left to do the punishing.

First it's "Not worth it"/risky...Then LP is no "fun"/too weak...  :-\ Okay, now you're just screwing with me.  :D

Tuxman

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 2,466
The European Union meanwhile said ACTA (the EU "SOPA" thingy) should be declared too. Sigh.
Wonder what is wrong with the world.

40hz

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 11,857
Sorry man. Can't help it. I'm a BOFH.
8)
Me too (e.g. I can't get yelled at about bw usage), hence the initial response.

P.S. - where's the "fun" in lower profile? Cows may come, and cows may go - but when the cobras begin to strike you want to be firing magnum double-loads. Remember: you can only be punished if there are survivors left to do the punishing.

First it's "Not worth it"/risky...Then LP is no "fun"/too weak...  :-\ Okay, now you're just screwing with me.  :D

Maybe just a little? ;)  ;D

But I was quite serious about the "not worth it" part.  8)

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,288
  • Tell me something you don't know...
@Ren, I'm not supporting SOPA in the way it is mixed with political motives. But the original intention of SOPA was to protect people's intellectual property. That has nothing to do with freedom of speech. Freedom of expression doesn't mean stealing, and if it does mean there should be open stealing, then I guess we're inviting flawed socialism with a broken communist and capitalist model all in one by crushing the revenue model of authors and artists.


We already have lots of laws for copyright, but they don't seem to be enforced much. Things like SOPA aren't the right way to go about it though. We need some kind of due process.


mahesh2k

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 1,426
I helped with the HostGator application, and it seems HG staff removed the articles. I don't know if that site is going to take down the rest of the copied content. That's not my problem. But if anyone wants to prove content ownership, use the Web Archive for the cached text; the Google index also has an earlier date. This makes it easy to prove the origin of text-based content. There's no way to help in the case of graphics. They'll be reused unless there is a watermark or some other sort of protection.

cybermen007

  • Participant
  • Joined in 2012
  • *
  • Posts: 4
I know the owner of that site personally; I used to work with him, and I spoke with him about this copied-content problem. I found this forum while searching about it. The copying happened because of one of his editors. He's got many websites, maybe hundreds, and each website has one or more editors. One of the editors, from India, copied all the content and just changed the titles. And he removes the articles one by one when he sees copyrighted content and a complaint. But he doesn't want to close the website over something like this, even though he doesn't care much about that website. It doesn't have many visitors, and there are no ads and no income. It's just a website with a nice domain. But I see some people threatening him. I can tell you he's got strong programming skills, and he's well experienced in all kinds of internet stuff. So anyone who wants to start a war with him had better be well prepared for it. He's a nice guy anyway. If I were him and someone reacted that aggressively toward me, I would move the website out of the USA. Then I would send all those articles to other websites with mass-submitting bots. I can submit your copyrighted article to 200.000 forums, 200.000 blogs, 10.000 article websites, and thousands of guestbooks etc. in only 48 hours. Then you will have to send DMCA complaints every day for the rest of your life :)

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,288
  • Tell me something you don't know...
I can submit your copyrighted article to 200.000 forums, 200.000 blogs, 10.000 article websites, and thousands of guestbooks etc. in only 48 hours. Then you will have to send DMCA complaints every day for the rest of your life :)

Sounds like a perfect strategy for the XXAA Mafia to pursue.


40hz

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 11,857

But I see some people threatening him. I can tell you he's got strong programming skills, and he's well experienced in all kinds of internet stuff. So anyone who wants to start a war with him had better be well prepared for it. He's a nice guy anyway. If I were him and someone reacted that aggressively toward me, I would move the website out of the USA. Then I would send all those articles to other websites with mass-submitting bots. I can submit your copyrighted article to 200.000 forums, 200.000 blogs, 10.000 article websites, and thousands of guestbooks etc. in only 48 hours. Then you will have to send DMCA complaints every day for the rest of your life :)

Sounds a bit like a threat...

I'm not sure how you could reconcile being a "nice guy" with doing something along the lines of what you're suggesting however.

It would also be a vastly out of proportion response to what happened. He was asked to remove content that had been copied word for word from someone else's site. He sent back a letter which basically said: Go to hell. I'm not in the US so there's nothing you can do to stop me.  It was only after a person-to-person and unofficial attempt to resolve the situation failed that a more official set of actions took place. So if anybody is responsible for what happened, it was your acquaintance's editor - first by borrowing the content without permission - and then by insulting and mocking the content creator when she asked for the article to be removed from his website.

But either way he was not "attacked" by the content creator. He simply suffered the consequences of his own personal decision to refuse to remove content he had no legal or moral right to post on his website. It only went the way it did in response to his actions. And it could have gone a totally different way if he had handled it in a less arrogant and defiant manner. But no matter how you wish to characterize it, it was not an "attack" against him.

Behaving like this editor has the unfortunate effect of also adding further fuel to the arguments for bills such as ACTA, SOPA, and PIPA - all of which have justified their necessity by pointing out how someone in a country outside the US could do as you've suggested with impunity.

By behaving the way he did, he played into the hands of those screaming for further constraints and censorship on the web. And by doing so, he increases the risk of harm - not only to himself - but to the rest of us as well. Because those draconian laws being proposed will hurt all of us.

So thanks for your input. (And welcome to DonationCoder BTW!  :)) But it doesn't help your 'acquaintance' (or those of us who are attempting to stop, or rein in, things like SOPA and ACTA) by suggesting hypothetical(?) threats like using mass submitting bots to further compound the wrongful act of scraping someone's web content without permission. And that's something which remains morally wrong regardless of what the law may say wherever the scraper happens to live.

It's a global community. People need to start acting like it is before the hands of authority descend upon us and turn what was once 'our web' into a global and governmentally operated panopticon.

Take a look around you. The transition has already started. :(

Just my 2¢

40hz

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 11,857
I can submit your copyrighted article to 200.000 forums, 200.000 blogs, 10.000 article websites, and thousands of guestbooks etc. in only 48 hours. Then you will have to send DMCA complaints every day for the rest of your life :)

Sounds like a perfect strategy for the XXAA Mafia to pursue.



Let's save things like that for when the battle finally goes nuclear shall we? ;D

Of course the only thing that will do (short term) is give governments the excuse to pull their backbone kill switches (Hello FIDOnet? We have a situation!); and (long term) see the Internet be reopened as a completely licensed, regulated, and controlled environment - just like radio and telecommunications presently are.

And care to guess who will be put in charge and running it once it is allowed to reopen? (Hint: Cablevision, BBC, Comcast, AT&T, Verizon,...)

cybermen007

  • Participant
  • Joined in 2012
  • *
  • Posts: 4
It was not his threat. It was mine. :) If someone tries to deal with me in an aggressive way, I can speak the same language quite well, even on the internet. Also, my friend is too nice a guy to do that, and he's more capable than me. He can do more things, and he's not bound by SOPA.

Also, thanks for your welcome :)
And I support your ideas about SOPA. That law is not for protecting the rights of innocent bloggers. It is total government control of the internet. We've already got it in Turkey. The internet became very dangerous after the revolutions in North Africa and the Arab countries. Rebels organised on social media. Facebook and Twitter were the source of the modern revolution. The United Kingdom and France also experienced huge insurrections by ghetto teenagers. And the Occupy Wall Street thing worries the US government, I guess, because unhappy people can organise easily on the internet. That is a big threat just before a new economic crisis. So the US government wants to control the internet more strictly (as if they don't already follow every email we send), and the media and music industries are sponsoring your senators for such laws. In the end, like everything else in our lives, government will show itself on the internet more. So one day your DNS server will be managed by your government, and they will give you the list of websites they allow you to visit.

Also, I've studied media communications and had lessons on the philosophy of law. And I can tell you that your definition of copyright is quite different from ours. You have a totally capitalist definition of copyright. If you want me to pay money for your single article, I would like to ask you: what did you create in that article with your own mind? An order of words? A new theory that was never told before? A new scientific study? A new way of expressing an emotion? Or did you just write an article about breast cancer treatment? Did you pay a copyright fee to the people who did the research on breast cancer? Did you pay a copyright fee to the doctor who gave the Latin names to the types of breast cancer? Did you pay a copyright fee for the words in your vocabulary? Did you pay a copyright fee to the person who invented the HTML markup language? Did you pay any copyright fee to the person who made the PHP programming language that your forum runs on? So? You are living in a free world. All those genius programming languages, scientific research, medical treatment techniques, and the web publishing markup language are free. All that genius technology with high creativity is free. And your 500-word article, which contains information made public by genius doctors and scientists without any copyright payment, costs money?

I saw that you are a good person and on a good intellectual level. And I believe you can understand me. Some bloggers think that they are Einstein and that their poor 500-word articles are a treasure of world culture :) Typical American point of view :)

40hz

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 11,857
I saw that you are a good person and on a good intellectual level. And I believe you can understand me.

I like to think I am. And I like to think I can.

It's also interesting to see a non-US, non-Euro take on copyright from someone coming in from outside the usual circle of Western understandings.

Look forward to your comments going forward. :Thmbsup:


cybermen007

  • Participant
  • Joined in 2012
  • *
  • Posts: 4
Yes, I am too western to be called eastern and too eastern to be called western. Somewhere in the middle, like my country. I've got respect for people's blogs. But it kills me sometimes when I see such arrogant comments about people's copyrighted property and stories about the theft of their intellectual property. Except from you :)

cybermen007

  • Participant
  • Joined in 2012
  • *
  • Posts: 4
I must also give one more example. The mighty Steve Jobs is a legend, because capitalist America loves heroes of money. The best trader, the one who makes the most money, is a legend. What about Dennis Ritchie? He made an operating system with a simple GUI. He wrote a genius programming language, C. In the last 30 years many programs have been written in C, but no one paid him any copyright fee. If he had charged a 50-dollar patent fee for each program written in C and run the press release show himself, he could be richer than Bill Gates and more famous than Steve Jobs. It's a modern Nikola Tesla and Thomas Edison story. Everyone appreciates the businessman Edison for selling them lamps, but no one gives a single cent to Nikola Tesla for the invention of A.C.