I wanted to thank readers for their generous and helpful input on the problems I’ve been having with sites ripping off Naked Capitalism content by putting up entire posts in full without permission, and often without attribution.
Even with reader help, however, the process of dealing with this takes effort, and any time I devote to site admin is at the expense of generating new content.
For instance, Ed Harrison e-mailed a very good list of action steps, as well as the form for one of the missives he suggested:
I got one site to take my stuff down. Here are the measures I took:
Terms of Service page
E-mail to offender about violation referencing ToS and DMCA
E-mail to host about violation referencing ToS and DMCA
E-mail to advertiser (Google) about violation referencing ToS and DMCA
E-mail to Google News about violation referencing ToS and DMCA
E-mail to Twitter (using their Twitter account) about violation referencing ToS and DMCA
Summary RSS feed only (just temporary)
Dan Duncan advocated the “don’t get mad, get even” strategy:
While you sort it out, always include several internal links to other posts. As long as you have internal links to your other work, then at least the scraped content will get you deep links to your back pages.
Other considerations: Instead of a simple HTAccess denial (i.e., simply denying access from the offending IP address), do an HTAccess “rewrite”. By doing this, you don’t block access; rather, you send the asshole “false” content of your choice. It could be a HUGE file of gibberish like “hy^&GBHBDFNLG#$&H%” …or even better, send them “The Best of DownSouth”! ["Please Yves of Naked Cap, we won't ever scrape your site again. Please, just-make-it-stop! We're begging you!"] [Of course, you are more than welcome to send them my commentary as well.]
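A minimal sketch of that rewrite idea, for anyone who wants to try it. It assumes Apache with mod_rewrite enabled, a scraper address block of 123.123.123.x (a placeholder), a feed served at /feed/, and a hypothetical decoy file named decoy-feed.xml in the site root. None of those specifics come from the suggestion itself, so treat them as stand-ins:

```apache
# Sketch only: the IP prefix, feed path, and decoy file name are placeholders.
RewriteEngine On

# Match requests coming from the scraper's address block
# (dots are escaped so they match literal dots).
RewriteCond %{REMOTE_ADDR} ^123\.123\.123\.

# Quietly serve the decoy file instead of the real feed;
# requests from every other address are unaffected.
RewriteRule ^feed/?$ /decoy-feed.xml [L]
```

On a WordPress site these lines would probably need to sit above the WordPress rewrite block, since rules run in order and the WordPress catch-all would otherwise claim the /feed/ request first.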
Or, you could send the scraper into an infinite loop with something like this in HTAccess:
RewriteCond %{REMOTE_ADDR} ^123.123.123
RewriteRule ^(.*)$ http://domain.tld/feed
Replace the IP address with that of the scraper and replace the feed URL with the feed from the scraper’s site. That would actually be amusing. If you do this, please let us know what happens.
Here are some other good blacklist options from a helpful site:
http://perishablepress.com/press/2009/02/03/eight-ways-to-blacklist-with-apaches-mod_rewrite/
Also, beyond the Cease and Desist, you need to file DMCA Reports with the Search Engines.
http://www.mcanerin.com/EN/articles/copyright-03.asp
And finally, since they are scraping to game Google, go to Google:
http://googlewebmastercentral.blogspot.com/2008/06/duplicate-content-due-to-scrapers.html
Again, the tech torture sounds great but is above my pay grade, and the tech people I know weren’t certain how to go about implementing it.
So I am instead opting for the cut-the-Gordian-knot approach. I’m going to implement limited RSS syndication. It is the simplest solution (since scrapers generally pull content off RSS feeds, this should cut way back on abuse), and to the extent anyone republishes my limited RSS, I will probably get more out of it than they do (they won’t get much content, and it will also drive some traffic back to NC).
Another reason this change may be a plus is that, perversely, it will remove a disincentive against putting up more than five posts in a day. Being in the Recent Items listing has a huge impact on comment levels. Once an item drops off Recent Items, comments by readers pretty much cease too. Since I like getting comments, on evenings when I might have material for, say, six posts, I don’t see the point in putting up that many, since some will be ignored. Using excerpts (which means the main www.nakedcapitalism.com page will also show excerpts) will give site visitors another way to navigate through posts.
It also offers some advantage in terms of my typo proneness. I often catch typos after posting, as do readers, and believe it or not, I do correct them. But RSS grabs the first published version, a too often buggy 1.0 release. To my knowledge, the only way to get a new version up is to repost, and you still have the earlier version in the feed also. So having a much shorter version in RSS makes the odds much higher that it will be clean (since I do look at the opening sentences more closely than the guts of the post).
I’m not keen on this change, but some sites go this route. Jesse and Krugman only have headlines in their RSS; Felix Salmon in his Portfolio days only had a limited RSS feed (Update: Felix wrote to tell me that was not accurate; he did have excerpts on his Portfolio site, but had required them to provide a full RSS feed); FT Alphaville only gives the first line of its articles. And at least in my case, it did not make me less inclined to read them.
Where I may have a problem is with e-mail subscribers. My tech guy did not know whether going to a truncated feed would lead to e-mail subscribers getting that version (anyone who knows what has to be done in WordPress to have e-mail subscribers get full text versions, please e-mail me at yves@nakedcapitalism.com). I hope there is a solution, but we may not have it in hand.
I plan to start early next week and hope you will like the change.
Separately, I finally have my new MacBook Air with all my old data on it and it really is cute and tiny. The cursor worryingly skitters when I’m not using it. So far, it seems like a harmless problem, but I’m keeping an eye on it. I’m still getting over the fact that the sharper 13.3″ screen is at least as usable as the 15.4″ screen on my old laptop. I’ve also plugged in my antique monitor (a 20″ Apple Studio from 2002; we Yankees don’t like getting rid of things that still function) and it’s instructive to see the two of them side by side.
But as nice as this is, I still think the apex of desktop computing was the NeXT, with Improv (what a phenomenal spreadsheet) and the old WordPerfect.

Wish I still could use Word Perfect 5.x
For someone who learned how to type by learning how to type, I could format with keyboard commands faster than I can remember.