SOKSOK.JP is a site that relays and reformats all other web sites. Here is why their approach is web site theft or hijacking, and how you can prevent it.

Newsflash 2003-05-19: SOKSOK.JP DENIES ALL ACCESS!

As of Monday, May 19, SOKSOK.JP is returning 403 errors (Access denied) on all attempts to access its site.
Have they realized the error of their ways, or are they simply morphing into something else? Maybe enough sites were denying SOKSOK.JP access that it no longer provided value to DoCoMo users. It's not clear whether DoCoMo users can still access their material. However, the access denial is welcome news.

Preventing Web Site Hijacking

Hijacking Web Sites

One of my pages recently turned up in a Google listing, but at a new location. When I investigated the page, I found that my entire site was duplicated, and was very up-to-date. Poking around further, I found that many, many sites are being "hijacked" or mirrored under a different domain.

The site that hijacked mine and other web sites is http://pack.SOKSOK.JP. Apparently, they are running a proxy server and some software that maps addresses under their domain to the domains of other sites. The software filters out some JavaScript and HTML formatting information from the web pages and adds in links to other Japanese sites offering various services. They also add a <BASE href="http://pack.SOKSOK.JP/..."> statement to the document to ensure that all links are redirected through the SOKSOK.JP domain.

Besides stealing the readership you worked hard to develop, the proxy corrupts your pages by filtering parts of them out, so they look terrible and often don't even work correctly. Some of the pages were not completely copied and appear truncated, probably due to bugs in the filtering software. To add insult to injury, the added links associate dating services, porn sites, or who knows what with your material. Your pages will still have your name on them, since they don't attempt to replace your mail address, copyright statements, and the like. If your reputation is important to you, this site can damage it with the implication of shoddy workmanship, changes to the meaning of your web pages, and embedded links that associate your pages with inappropriate sites. It can also hurt your business and frustrate your audience if you have subscription or payment pages, as more advanced functions don't work via this site. Customers will attempt to make purchases and then leave, believing your web site and services are incompetent or of poor quality.


Web thief and slime ball: Masafumi Hashimoto

The domain SOKSOK.JP is listed as owned by Masafumi Hashimoto. I e-mailed him, as have others who discovered their sites were hijacked, requesting that my site be removed, and have not received a reply. He is based in Ikebukuro. Hey, if you are nearby, please call or drop by and tell him "Tex said hi". (OK, I have learned the phone number does not exist. The fax number connects, but since I can't be sure if the number is legitimately the site owner's fax or someone else's number, we probably shouldn't harass them.)

[Slime Ball]         Masafumi Hashimoto
[Slime e-mail]       fs.corp111@lycos.ne.jp
[Postal code]        171-0014
[Postal Address]     Toshima-ku
                     3-26-25-622 Ikebukuro
[Phone]              03-3760-0464 (non-existent)
[Fax]                03-3981-2792

[Name Server]        ns.5th.jp (210.224.177.57)
[Name Server]        ns2.5th.jp (210.224.170.224)

I guess I could follow the current trend and publish a deck of web thief and slime ball playing cards. Where should we put Hashimoto in the deck?

Who is being ripped off?

As far as I can tell, the better question is who isn't. Besides my sites, I have found major and minor companies on this site. I suspect the sites that I could not access via http://pack.SOKSOK.JP had already discovered that their site was hijacked and countered it. The URL mappings look like those in the table below. Of course by the time you read this, I will have notified the web owners and they will likely have blocked access.

Hijacked Site URL Remapping Examples
Site Name           Original URL                   URL under http://pack.SOKSOK.JP
Amazon              http://www.amazon.com          http://pack.SOKSOK.JP/x/.bc5/
eBay                http://www.ebay.com            http://pack.SOKSOK.JP/x/.pnb1/
HP                  http://www.hp.com              http://pack.SOKSOK.JP/x/.5ud/
I18nGuy             http://www.i18nguy.com         http://pack.SOKSOK.JP/y/.hba8/
IESG                http://www.ietf.org/iesg.html  http://pack.SOKSOK.JP/y/.gi4/iesg.html
Oracle              http://www.oracle.com          http://pack.SOKSOK.JP/x/.db5/
Progress Software   http://www.progress.com        http://pack.SOKSOK.JP/y/.nxd1/
Sun Microsystems    http://www.sun.com             http://pack.SOKSOK.JP/x/.mkl/
Unicode Consortium  http://www.unicode.org         http://pack.SOKSOK.JP/y/.g44/
W3C                 http://www.w3.org              http://pack.SOKSOK.JP/y/.073/
XenCraft            http://www.xencraft.com        http://pack.SOKSOK.JP/x/.h857/
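
The pattern in the table can be sketched in a few lines of Python. This only illustrates the mapping visible in the rows above (a per-site prefix followed by the original path); it is not SOKSOK.JP's software. The function name remapped_url and the PREFIXES dictionary are hypothetical names chosen for the example, and the sketch ignores query strings.

```python
from urllib.parse import urlsplit

# Prefixes copied from three rows of the table; the dictionary is illustrative.
PREFIXES = {
    "www.xencraft.com": "http://pack.SOKSOK.JP/x/.h857/",
    "www.ietf.org": "http://pack.SOKSOK.JP/y/.gi4/",
    "www.w3.org": "http://pack.SOKSOK.JP/y/.073/",
}


def remapped_url(original):
    """Return the address a page would have under the proxy, or None."""
    parts = urlsplit(original)
    prefix = PREFIXES.get(parts.netloc.lower())
    if prefix is None:
        return None  # host is not in our small sample table
    # The original path is appended unchanged after the per-site prefix.
    return prefix + parts.path.lstrip("/")


print(remapped_url("http://www.ietf.org/iesg.html"))
```

Running it on the IESG row reproduces the remapped address shown in the table.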

To find other places where this particular web criminal or his site have been discussed, search with any search engine for the domain SOKSOK.JP or his name Masafumi Hashimoto. Many site owners have discovered their sites under this domain.

Is your site being hijacked?

There are at least 3 ways to tell:

  • You can check if your web site is being ripped off, by using a web search engine and searching for both SOKSOK.JP and your domain name. This way is quick and practical if your site is popular and reasonably well-indexed by search engines. If on the resulting list of web pages you see a page that has a title exactly like one of yours, and a description that sounds like yours, and the domain is "http://pack.soksok.jp/", well BINGO!, that is your page mapped by SOKSOK.JP. Click on the link to confirm. The remapped URL for your domain will look like "http://pack.soksok.jp/" followed by "x/." or "y/." and a short string of letters and numbers, such as "hba8" or "073".
  • The second approach should work for all sites. First enter into your browser the string "http://pack.soksok.jp/x/". Then append to the string, your domain name, "http://<your domain name>". The resulting string will look like: "http://pack.soksok.jp/x/http://<your domain name>". For example, I used the following with my domain www.xencraft.com: "http://pack.soksok.jp/x/http://www.xencraft.com". Now, have your browser access the page. You will see a variation of your domain's home page. How similar it looks depends on your content and how the filters impact it. It may look identical, with a few links added at the top, or it may be partial or incomplete because the filters have removed formatting, javascript, flash or other multimedia, etc.

    If you can't see much of your home page, try again using a deeper link into one of your pages, instead of just the domain name. Use a page that is simple in architecture, with little advanced technology (e.g. without javascript, advanced formatting or multimedia, etc.), so it is more likely to display completely. It can also be helpful if the page has at least one link to another page on your site or the home page.

    If you view the source of the page, you should find a <BASE> statement in the <HEAD> portion of your page. For XenCraft, I saw: "<BASE href=http://pack.soksok.jp/x/.h857/>". So the XenCraft domain is mapped to: "http://pack.soksok.jp/x/.h857/". Any links on this page will be relative to this URL. If the page has a link to others on the site, click on one and you will see this URL appear in the browser address followed by the path to the page that you referenced. For example, the page "http://www.xencraft.com/resources/web-theft.html" would be shown as: "http://pack.soksok.jp/x/.h857/resources/web-theft.html".

  • There is a third way: Go to http://pack.soksok.jp/. The screen has a place to enter a "URL (www...)". Enter your web address and click "GO". If your site is shown, then it is being accessed through SOKSOK.JP. If you view the source of the page, you should find a <BASE> statement in the <HEAD> portion of your page. As described in the previous bullet, the <BASE> statement will contain the URL that maps to your domain.
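
The second and third checks can be sketched as a small script. This is a minimal illustration in Python, assuming you have already saved the HTML of the relayed page; the names probe_url, foreign_base and BaseHrefFinder are hypothetical, and the sample markup is made up around the XenCraft prefix quoted above.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class BaseHrefFinder(HTMLParser):
    """Remember the href of the first <BASE> tag in a page."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def probe_url(domain):
    """Build the address used in the second check."""
    return "http://pack.soksok.jp/x/http://" + domain


def foreign_base(html, own_host):
    """Return the <BASE> href if it points at a host other than own_host."""
    finder = BaseHrefFinder()
    finder.feed(html)
    if finder.base_href is None:
        return None
    host = urlsplit(finder.base_href).netloc.lower()
    return finder.base_href if host != own_host.lower() else None


sample = ('<html><head><BASE href="http://pack.soksok.jp/x/.h857/">'
          '</head><body>...</body></html>')
print(probe_url("www.xencraft.com"))
print(foreign_base(sample, "www.xencraft.com"))
```

A result other than None from foreign_base means the page carries a <BASE> statement for someone else's host, which is the sign described in the bullets above.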

Why is SOKSOK.JP doing this?

Various people claim that the purpose of SOKSOK.JP is to reformat pages and make them available for Japanese cell phones (DoCoMo). Some say, therefore, it isn't really stealing and they don't mind SOKSOK.JP's setup.

However, there are several other ways formatting for cell phones can be achieved. For example, SOKSOK.JP could make guidelines available for formatting pages for cell phones, and perhaps even make XSLT style sheets or their filtering software available for web designers to use to create or migrate their pages to appropriate formats. That is, let people work cooperatively and constructively to create good cell phone pages. This approach works for WAI (the W3C Web Accessibility Initiative), for example, which has educated web designers on how to develop accessible pages.

Other sites redisplay or modify pages. Why is this site a problem?

Search engines cache web pages. They also perform conversions, for example from Acrobat PDF, Microsoft PowerPoint, and other file formats to HTML. Some sites provide translations of pages, with variable accuracy. Some sites perform internet archival, making copies of other people's sites. Where is the harm? What is different about SOKSOK.JP?

  • When legitimate sites display pages, they introduce themselves. It is clear you are on their site, and you need to take some action to view a page that is not one of theirs. SOKSOK.JP does not introduce itself.
  • When legitimate sites change pages (translate, change format) they say so. If the changes may not accurately reflect the original content they say so. SOKSOK.JP does not do this.
  • When legitimate sites display ads, they clearly indicate and announce they are responsible for the ad. There is no confusion that ads or links might be part of other people's pages. SOKSOK.JP simply embeds links in their representation of other people's pages.
  • Search engines provide ways for sites to indicate they do not want to be indexed. (See Robots Exclusion.) SOKSOK.JP does not.
  • Legitimate sites have a home page describing themselves and what they do. SOKSOK.JP does not.
  • Legitimate sites respond to questions and complaints. SOKSOK.JP does not.
  • Legitimate sites try to support the latest standards. SOKSOK.JP does not. Their software often stops processing a page before it is completed (and then displays just a part of the page).
  • Legitimate sites try to display pages faithfully. SOKSOK.JP does not. Pages they display lack formatting, multimedia, and dynamic behavior. This can be partially explained by suggesting they intentionally removed these, due to lack of support by DoCoMo. However, removing this information can change the meaning of the page significantly, making it inappropriate for DoCoMo as well. They should allow users more control over this or provide better substitution facilities.
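
The Robots Exclusion mechanism mentioned in the list is simple enough to sketch. Below, Python's standard urllib.robotparser module reads a hypothetical robots.txt of the kind a relay site could publish to keep its remapped copies out of search indexes. The file contents and the helper name crawler_may_fetch are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a relay site: ask every crawler to stay out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt, user_agent, url):
    """Apply the rules in robots_txt to one crawler and one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(crawler_may_fetch(ROBOTS_TXT, "Googlebot",
                        "http://pack.soksok.jp/x/.h857/"))
```

With these two lines in place, a well-behaved crawler would skip every remapped page, so the copies would not compete with the originals in search results.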

Well ok, so their implementation is sloppy. Is that a problem? Let's look at the impact of SOKSOK.JP's implementation:

  • Some of the pages are missing important information and are corrupted in other ways. Some don't work. Readers believe the work is the original author's (names and copyrights are left in place), so the author's reputation is damaged.
  • Links to porn and other sites are added, creating an association between these sites and the author.
  • Pages with security, multimedia, payment, subscription, search, and other advanced facilities do not work. This impairs the business of the site's original owner. Readers of their web pages cannot subscribe etc., and the readers believe the web site is incompetently designed.
  • SOKSOK.JP allows the pages to be indexed, even though they could easily direct search-bots to not index the pages. If the intent was for the pages to be used for DoCoMo, they could easily ensure they aren't indexed. However, they allow indexing, and instead of being sent to the legitimate site, potential audiences and business are directed to SOKSOK.JP's debilitated versions.
  • Along the same lines, some sites specify their pages should not be indexed by search engines. However, the pages on SOKSOK.JP are indexed. So a web author who does not want their pages searched or cached now finds them both searched and cached through SOKSOK.JP, and as a result they are getting referrals they tried to prevent. (I read this complaint from Mariann in a comment to daniasdailies.com.)
  • Perhaps less significant, but the owner's web server logs no longer contain the IP addresses of visitors. Instead the log reflects the IP address and other information of SOKSOK.JP's proxy server. The statistical information that is being lost is useful to owners.
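
The last point also gives you a way to spot a relay: one client address shows up far more often than any real visitor would. Here is a minimal Python sketch that counts requests per client address, assuming the log is in Common Log Format with the client address as the first field. The log lines are made up for illustration; only the address 210.224.177.59 is taken from this article (it is reported in the next section), and hits_per_client is a hypothetical helper name.

```python
from collections import Counter


def hits_per_client(log_lines):
    """Count requests per client address in Common Log Format lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1  # the client address is the first field
    return counts


# Made-up sample entries; 192.0.2.10 is a documentation-only address.
sample_log = [
    '210.224.177.59 - - [18/May/2003:10:00:01 +0000] "GET / HTTP/1.0" 200 5120',
    '210.224.177.59 - - [18/May/2003:10:00:02 +0000] "GET /resources/web-theft.html HTTP/1.0" 200 20480',
    '192.0.2.10 - - [18/May/2003:10:00:05 +0000] "GET / HTTP/1.1" 200 5120',
]

for address, count in hits_per_client(sample_log).most_common():
    print(address, count)
```

On a real log you would read the lines from the file instead of a list; the address at the top of the output is the first candidate to investigate.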

Preventing Web Theft, Aggressive Access, or Harassment

There are several ways to prevent web site hijacking and, more generally, to keep an IP address or block of addresses from accessing your site if you don't want them to. I found the address of the proxy server that was hitting my web site by looking at my web server log. The host is w59st.5th.jp, whose IP address is 210.224.177.59. Blocking access from this IP address will therefore stop the web site theft. To do so, I used the following Apache directives in an .htaccess file, which I placed in the root directory of my www.I18nGuy.com site:

order allow,deny
deny from 210.224.177.59
allow from all

On my www.XenCraft.com site, I used:

order allow,deny
deny from w59st.5th.jp
allow from all

These directives tell the server to evaluate the allow rules before the deny rules, with denial as the default for any request that matches neither. Everyone is admitted by "allow from all", and the SOKSOK.JP proxy server is then refused because a matching deny rule overrides the allow. If they start moving IP addresses around, you can block more of their address range, for example with:

deny from 210.224.177.
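
To see which addresses each form of the rule would cover, here is a small Python sketch using the standard ipaddress module. It assumes the partial address "210.224.177." covers the same range as the network 210.224.177.0/24, and it only models the address matching, not Apache itself. The three test addresses are the proxy and the two name servers listed earlier on this page; is_denied is a hypothetical helper name.

```python
import ipaddress

SINGLE_HOST = ipaddress.ip_network("210.224.177.59/32")
# Assumed equivalent of the partial address "210.224.177."
WHOLE_BLOCK = ipaddress.ip_network("210.224.177.0/24")


def is_denied(client, network):
    """True if the client address falls inside the denied network."""
    return ipaddress.ip_address(client) in network


for client in ("210.224.177.59", "210.224.177.57", "210.224.170.224"):
    print(client, is_denied(client, SINGLE_HOST), is_denied(client, WHOLE_BLOCK))
```

The wider rule also catches neighbors such as 210.224.177.57, while an address in a different block, such as 210.224.170.224, would need its own deny line.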

You can learn more from the Apache Directives documentation and from JavaScript Kit's "Denying access with .htaccess".

Another approach is to add the following to /etc/hosts.deny, the Unix TCP wrappers file for denying access to the services that consult it:

ALL: 210.224.177.59

You can also add the IP address to the Apache configuration file httpd.conf. If you don't have Apache or access to the .htaccess or other files, the Ink-Stained Banana Blog has this PHP solution to deny access to SOKSOK.JP.
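
The same idea can be applied inside the application when you cannot touch the server configuration. The following is a minimal Python sketch, not the PHP solution mentioned above, and it assumes the site is served through a WSGI application. The names deny_blocked, hello_app and BLOCKED_ADDRESSES are hypothetical.

```python
BLOCKED_ADDRESSES = {"210.224.177.59"}  # the proxy address reported above


def deny_blocked(app, blocked=BLOCKED_ADDRESSES):
    """Wrap a WSGI application so blocked clients get 403 Forbidden."""
    def wrapper(environ, start_response):
        if environ.get("REMOTE_ADDR") in blocked:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Access denied\n"]
        # Everyone else reaches the wrapped application unchanged.
        return app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    # Stand-in for a real application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = deny_blocked(hello_app)
```

A check like this runs on every request, so blocking at the server or firewall level is usually preferable when you have that access.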

As I mentioned, I discovered this crime while using Google to find something else. I have since e-mailed Google, suggesting or rather requesting that they should not index sites such as SOKSOK.JP. Google's web site says they only act on requests to not index a site, from the site's owner. This is a perfectly reasonable policy in general. However, where a site is committing clearly provable theft and corrupting people's good works, well they shouldn't be aiding them. Although I have to admit, I also would not have learned of the problem, if it weren't for Google. Anyway, I haven't heard back from them yet. We will see what they decide.

I'd be interested to know the right place to report these kinds of problems to have them stopped. In the meantime, I hope this is helpful to those of you who find your site on the SOKSOK.JP domain.

ISPs

Wouldn't it be nice if ISPs proactively blocked this IP address on behalf of the sites they host? I am not a fan of messages or web pages that say "forward this to everyone you know", so I won't say that. But if you are hosted, you might consider, in addition to blocking access to your own site, asking the host to block the IP address on behalf of all their other sites. There are many; here are just a few.