XenCraft Making e-business work around the world!
SOKSOK.JP is a site that relays and reformats all other web sites. Here is why their approach is web site theft or hijacking, and how you can prevent it.
Preventing Web Site Hijacking
Hijacking Web Sites
One of my pages recently turned up in a Google listing, but at a new location. When I investigated the page, I found that my entire site was duplicated, and was very up-to-date. Poking around further, I found that many, many sites are being "hijacked" or mirrored under a different domain.
Besides stealing away the readership you worked hard to develop, the hijacking site also corrupts the pages (by filtering parts of them out), so they look terrible and often don't even work correctly. Some of the pages were not completely copied and appear truncated, probably due to bugs in the filtering software. To add insult to injury, the links that are added associate dating services, porn sites, or who knows what with your material. Your pages will still have your name on them; the site doesn't attempt to replace your mail address, copyright statements, and the like. If your reputation is important to you, this site can damage it with the implication of shoddy workmanship, by changing the meaning of your web pages, and by embedding links that associate your pages with inappropriate sites. It can also hurt your business and frustrate your audience if you have subscription or payment pages, as more advanced functions don't work via this site. Customers will attempt to make purchases and then leave, believing your web site and services are incompetent or of poor quality.
Web thief and slime ball: Masafumi Hashimoto
The domain SOKSOK.JP is listed as owned by Masafumi Hashimoto. I e-mailed him, as have others who discovered their sites were hijacked, requesting that my site be removed, and have not received a reply. He is based in Ikebukuro. Hey, if you are nearby, please call or drop by and tell him "Tex said hi". (OK, I have learned the phone number does not exist. The fax number connects, but since I can't be sure whether the number is legitimately the site owner's fax or someone else's, we probably shouldn't harass them.)
[Slime Ball] Masafumi Hashimoto
[Slime e-mail] firstname.lastname@example.org
[Postal code] 171-0014
[Postal Address] Toshima-ku 3-26-25-622 Ikebukuro
[Phone] 03-3760-0464 (non-existent)
[Fax] 03-3981-2792
[Name Server] ns.5th.jp (18.104.22.168)
[Name Server] ns2.5th.jp (22.214.171.124)
I guess I could follow the current trend and publish a deck of web thief and slime ball playing cards. Where should we put Hashimoto in the deck?
Who is being ripped off?
As far as I can tell, the better question is who isn't. Besides my sites, I have found major and minor companies on this site. I suspect the sites that I could not access via http://pack.SOKSOK.JP had already discovered that their site was hijacked and countered it. The URL mappings look like those in the table below. Of course by the time you read this, I will have notified the web owners and they will likely have blocked access.
To find other places where this particular web criminal or his site has been discussed, search with any search engine for the domain SOKSOK.JP or his name, Masafumi Hashimoto. Many site owners have discovered their sites under this domain.
Is your site being hijacked?
There are at least 3 ways to tell:
Why is SOKSOK.JP doing this?
Various people claim that the purpose of SOKSOK.JP is to reformat pages and make them available for Japanese cell phones (DoCoMo). Some say, therefore, it isn't really stealing and they don't mind SOKSOK.JP's setup.
However, there are several other ways formatting for cell phones can be achieved. For example, SOKSOK.JP could make guidelines available for formatting pages for cell phones, and perhaps even make XSLT style sheets or their filtering software available for web designers to use to create or migrate their pages to appropriate formats. That is, let people work cooperatively and constructively to create good cell phone pages. This approach works for WAI (the W3C Web Accessibility Initiative), for example, which has taught web designers how to develop accessible pages.
Other sites redisplay or modify pages. Why is this site a problem?
Search engines cache web pages. They also perform conversions, for example from Acrobat PDF, Microsoft PowerPoint, and other file formats to HTML. Some sites provide translations of pages, with variable accuracy. Some sites archive the Internet, making copies of other people's sites. Where is the harm? What is different about SOKSOK.JP?
Well ok, so their implementation is sloppy. Is that a problem? Let's look at the impact of SOKSOK.JP's implementation:
Preventing Web Theft, Aggressive Access, or Harassment
There are several ways to prevent web site hijacking and, more generally, to keep an IP address or block of addresses from accessing your site if you don't want it to. I found the proxy server that was hitting my web site by looking at my web server log. Its host name is w59st.5th.jp, which resolves to 126.96.36.199. Blocking access from this address stops the web site theft. To do so, I placed the following Apache directives in an .htaccess file in the root directory of my www.I18nGuy.com site:
order allow,deny
deny from 188.8.131.52
allow from all
On my www.XenCraft.com site, I used:
order allow,deny
deny from w59st.5th.jp
allow from all
These directives tell the server to evaluate the allow rules first and the deny rules second, with denial as the default for any request that matches neither. The allow rule admits everyone, and the deny rule then rejects the SOKSOK.JP proxy server, because a request that matches both is denied. If they start moving IP addresses around, you can block more of their address range, for example with:
deny from 210.224.177.
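Finding the offending address in a web server log, as described above, can be partly automated. The following is a minimal Python sketch, not the method used for this article. It assumes a log in Common Log Format, where each line starts with the client host, and it tallies requests per client. The function name count_clients and the sample addresses (taken from ranges reserved for documentation) are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Common Log Format, where the first field is the client host or IP.
CLIENT_RE = re.compile(r"^(\S+)\s")


def count_clients(log_lines):
    """Return a Counter mapping each client host/IP to its request count."""
    counts = Counter()
    for line in log_lines:
        match = CLIENT_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines; in practice, pass an open log file instead.
sample_log = [
    '192.0.2.10 - - [01/Jan/2004:00:00:01 +0000] "GET / HTTP/1.0" 200 1043',
    '192.0.2.10 - - [01/Jan/2004:00:00:02 +0000] "GET /a.html HTTP/1.0" 200 2211',
    '198.51.100.7 - - [01/Jan/2004:00:00:03 +0000] "GET / HTTP/1.0" 200 1043',
    '192.0.2.10 - - [01/Jan/2004:00:00:04 +0000] "GET /b.html HTTP/1.0" 200 512',
]

# Print the busiest clients first; a proxy copying a whole site tends to stand out.
for host, hits in count_clients(sample_log).most_common(5):
    print(f"{hits:6d}  {host}")
```

A client that requests nearly every page in a short time is a candidate for blocking, but a high count alone does not prove hijacking; search engine crawlers show the same pattern.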
Another approach is to add the following to the Unix file for denying access to certain protocols, /etc/hosts.deny:
You can also add the IP address to the Apache configuration file httpd.conf. If you don't have Apache or access to the .htaccess or other files, the Ink-Stained Banana Blog has this PHP solution to deny access to SOKSOK.JP.
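For sites where the server configuration is out of reach, the same idea can be applied in application code. The sketch below is not the PHP solution from the blog mentioned above; it is a hypothetical Python equivalent written as WSGI middleware. It assumes the application runs under a WSGI server that sets REMOTE_ADDR, and that the block to refuse is the /24 network matching the "210.224.177." prefix shown earlier. The names BLOCKED_NETWORKS, is_blocked, and deny_blocked are made up for illustration.

```python
import ipaddress

# Assumption: the /24 network corresponding to the "210.224.177." prefix.
BLOCKED_NETWORKS = [ipaddress.ip_network("210.224.177.0/24")]


def is_blocked(remote_addr, networks=BLOCKED_NETWORKS):
    """Return True if remote_addr falls inside any blocked network."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a parsable IP address; let the request through
    return any(address in network for network in networks)


def deny_blocked(app):
    """Wrap a WSGI app so that blocked clients receive 403 Forbidden."""
    def wrapper(environ, start_response):
        if is_blocked(environ.get("REMOTE_ADDR", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper
```

Wrapping an existing application is then a single call, for example application = deny_blocked(application). Blocking in the web server itself is still preferable where possible, since the request is refused before any application code runs.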
As I mentioned, I discovered this crime while using Google to find something else. I have since e-mailed Google, suggesting or rather requesting that they should not index sites such as SOKSOK.JP. Google's web site says they only act on requests to not index a site, from the site's owner. This is a perfectly reasonable policy in general. However, where a site is committing clearly provable theft and corrupting people's good works, well they shouldn't be aiding them. Although I have to admit, I also would not have learned of the problem, if it weren't for Google. Anyway, I haven't heard back from them yet. We will see what they decide.
I'd be interested to know the right place to report these kinds of problems to have them stopped. In the meantime, I hope this is helpful to those of you who find your site on the SOKSOK.JP domain.
ISPs
Wouldn't it be nice if ISPs proactively blocked this IP address on behalf of the sites they host? I am not a fan of messages or web pages that say "forward this to everyone you know", so I won't say that. But if you are hosted, you might consider, in addition to blocking access to your own site, asking the host to block the IP address on behalf of all their other sites.
Related Links
There are many; here are just a few.