=head1 NAME

  mod_encoding - Apache module for non-ASCII filename interoperability

=head1 SYNOPSIS

 # in httpd.conf
 LoadModule headers_module  libexec/mod_headers.so
 LoadModule encoding_module libexec/mod_encoding.so

 AddModule mod_headers.c
 AddModule mod_encoding.c

 <IfModule mod_headers.c>
  Header add MS-Author-Via "DAV"
 </IfModule>

 <IfModule mod_encoding.c>
  EncodingEngine    on
  NormalizeUsername on
  SetServerEncoding     UTF-8
  DefaultClientEncoding JA-AUTO-SJIS-MS SJIS

  AddClientEncoding "cadaver/" EUC-JP
 </IfModule>

=head1 DESCRIPTION 

This module improves non-ASCII filename interoperability of
Apache (and mod_dav).

Many WebDAV clients seem to send filenames in their platform-local
encoding. But since mod_dav expects everything, even the HTTP request
line, to be in UTF-8, this causes an interoperability problem.

I believe that standardizing the encoding used in the HTTP request
line and HTTP headers is an issue for a future specification (RFC?),
but life would be much easier if mod_dav (and others) could handle
the various encodings sent by clients TODAY. This module does just that.

=head1 REQUIREMENTS

This module requires iconv(3) support.

If your system doesn't have it (e.g. *BSD platforms), try the
iconv(3) implementation available from

  http://clisp.cons.org/~haible/packages-libiconv.html

This worked for me on BSD/OS 4.1.

=head1 INSTALLATION

The standard procedure of

  $ ./configure --with-apxs=<path-to-apxs>
  $ make
  $ make install

should work. If "make install" fails, just copy the created
mod_encoding.so to the directory where standard Apache DSO modules reside.

Alternatively, you can use the Makefile.simple that comes with the
package. If you have problems with configure, it is recommended to
edit Makefile.simple manually and use apxs(1) directly, as that makes
things much simpler.

If configure doesn't work and you don't have a working apxs,
you have a problem. In that case, you should probably consult
the Apache documentation to find out how to integrate a module
into the server.

=head1 CONFIGURATION

This module adds the following directives: EncodingEngine,
SetServerEncoding, AddClientEncoding, DefaultClientEncoding, and
NormalizeUsername.

=over 4

=item EncodingEngine (on|off)

This directive either enables or disables this module.

=item SetServerEncoding <encoding>

This directive specifies the encoding used by the local filesystem.
Whenever mod_dav is requested to create a file or folder, its
name is converted into this encoding.

However, since mod_dav does not (yet) support encodings other
than UTF-8 for the local filesystem, you should set this
to "UTF-8" unless you have applied the separately available patch
to mod_dav.

=item AddClientEncoding <agent-name> <encoding> [<encoding> ...]

This directive specifies the encoding(s) expected from each
client implementation.

Though WebDAV clients are expected (or at least recommended,
I believe) to send all data in UTF-8 or some other properly
detectable form, some (many?) clients seem to send data in a
non-auto-detectable, platform-local encoding, thus breaking
interoperability.

You can use an extended regexp to name the agent.

Note that you should never use ".*" as the agent name.
To cover every client, use DefaultClientEncoding instead.

=item DefaultClientEncoding <encoding> [<encoding> ...]

This directive sets the default set of encoding(s) to expect
from clients in general. Note that you do not need
to specify "UTF-8", as that is the implicit default.

=item NormalizeUsername (on|off)

This directive was introduced to support the behavior of Windows XP
when accessing a password-protected resource. For some reason,
it prepends "hostname\" to the real username, and no server can
handle such an extension. Enabling this option strips off the
"hostname\" part, so only the "real" username is passed to the
authentication module.

=back
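
To make the interplay of these directives concrete, here is a minimal
Python sketch. It is not the module's actual C code. It assumes that
UTF-8 is tried first, then the encodings of any AddClientEncoding entry
whose pattern matches the User-Agent, then the DefaultClientEncoding
list. That ordering, the helper names, and the Python codec names
(cp932 standing in for SJIS) are illustrative assumptions.

```python
import re

# Hypothetical in-memory form of the SYNOPSIS configuration.
SERVER_ENCODING = "utf-8"              # SetServerEncoding UTF-8
DEFAULT_CLIENT_ENCODINGS = ["cp932"]   # DefaultClientEncoding (cp932 stands in for SJIS)
CLIENT_ENCODINGS = [                   # AddClientEncoding "cadaver/" EUC-JP
    (re.compile(r"cadaver/"), ["euc_jp"]),
]


def candidate_encodings(user_agent):
    """List the encodings to try for this client, in an assumed order."""
    candidates = ["utf-8"]  # implicit default, no need to configure it
    for pattern, encodings in CLIENT_ENCODINGS:
        if pattern.search(user_agent):
            candidates.extend(encodings)
    candidates.extend(DEFAULT_CLIENT_ENCODINGS)
    return candidates


def to_server_encoding(raw_name, user_agent):
    """Convert an unescaped filename (bytes) into the server encoding."""
    for encoding in candidate_encodings(user_agent):
        try:
            return raw_name.decode(encoding).encode(SERVER_ENCODING)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next candidate
    return raw_name  # nothing matched: pass the bytes through unchanged


def normalize_username(user):
    """Strip a leading "hostname\\" part, as NormalizeUsername is described to do."""
    _host, separator, rest = user.partition("\\")
    return rest if separator else user
```

With this sketch, to_server_encoding("日本語.txt".encode("euc_jp"),
"cadaver/0.23.3") returns the UTF-8 bytes of the same name, and
normalize_username("MYHOST\\alice") returns "alice". Trial decoding is
only a heuristic: some byte strings are valid in more than one encoding,
which is why the order of the candidate list matters.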

=head1 SUPPORTED ENCODINGS

This module supports all encodings supported by the underlying
iconv(3) implementation. You might want to try "iconv -l",
as it may give you the list of encoding names.

Also, if you have installed and linked the iconv_hook extension,
you should be able to use the following additional encoding names:

  MSSJIS
  - This is almost the same as SJIS, but is a Microsoft variant of it.

  JA-AUTO-SJIS-MS
  - This is a special converter which autodetects among
    UTF-8/JIS/MSSJIS/SJIS/EUC-JP. It does not do conversion itself.

=head1 INFORMATIONAL NOTES

This is an informational note for developers.

Today, as people around the world exchange information in many
languages, many protocols are required to be, and are beginning to
consider being, i18n (internationalization) compliant. WebDAV is no
exception.

WebDAV selected XML as its data exchange format, and XML is
one of the best-defined formats with respect to i18n.

However, because past standards and implementations (e.g. DNS, HTTP)
did not care much about this issue, many HTTP/WebDAV clients seem
to break when used in an i18n (= non-ASCII) environment.

For WebDAV, there is usually no problem with the XML content part.
But the HTTP header part seems to be broken in many implementations.
Here, I describe several frequently observed problems with possible
solution(s).

  [Problem in PUT operation]

  Consider the situation where a WebDAV client tries to save a
  file which has a non-ASCII filename. First, it sends out the
  filename in the HTTP request line:

    PUT /<non-ascii-filename> HTTP/1.1

  The current standard only asks clients not to use non-ASCII
  characters in the HTTP header. So many clients simply url-escape
  the name in %xx style and get away with it (some don't even bother
  to escape, which is really broken).

  Now, when the server receives the PUT request, it unescapes (if
  escaped) the given filename and then saves the file under that name.
  This is the first point where interoperability breaks.

  As the server has no idea what encoding or charset this filename
  belongs to, it cannot apply a proper conversion to make sure all
  filenames are aligned to the encoding or charset supported by the
  server-side filesystem.
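
The effect can be reproduced with a few lines of Python. The sketch
below url-escapes one example filename the way three hypothetical
clients might (UTF-8, cp932 as a stand-in for Microsoft SJIS, and
EUC-JP) and shows what the server is left with after unescaping. The
filename and the helper name server_sees are chosen for illustration.

```python
from urllib.parse import quote, unquote_to_bytes

name = "日本語.txt"  # example filename, chosen for illustration

# What three different clients might put on the request line.
escaped = {enc: quote(name, encoding=enc) for enc in ("utf-8", "cp932", "euc_jp")}


def server_sees(escaped_name):
    """Unescape %xx sequences; the result is bytes with no charset attached."""
    return unquote_to_bytes(escaped_name)


for enc, value in escaped.items():
    print(enc, "PUT /" + value, server_sees(value))
```

All three request lines are pure ASCII and name the same file, yet the
unescaped byte strings differ, and nothing in the request says which
encoding produced them.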

  Without information on the filename's charset and encoding, even a
  simple "ls" or "PROPFIND" is most likely to generate unreadable
  garbage.

  [Problem in PROPFIND operation]

  The next interoperability problem arises when the response to a
  PROPFIND request is sent. As the server has no idea of the charset
  and encoding used, all names are included in the XML-formatted
  response without (or with improper) encoding information.

  As defined by the XML spec, this causes the XML parser on the
  client side to abort. Even if it doesn't (which means a
  non-compliant parser), the chance that the client shows or handles
  the filename correctly is slim.
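
As a rough illustration, the Python sketch below feeds a conforming XML
parser a tiny response fragment whose declared encoding is UTF-8. The
fragment is a simplified stand-in for a real PROPFIND response, and the
helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(name_bytes):
    """Return True if an XML parser accepts a response containing name_bytes."""
    body = (b'<?xml version="1.0" encoding="utf-8"?>'
            b'<D:href xmlns:D="DAV:">/' + name_bytes + b'</D:href>')
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False


name = "日本語.txt"  # example filename, chosen for illustration
print(parses(name.encode("utf-8")))   # name bytes match the declared encoding
print(parses(name.encode("cp932")))   # name bytes stored as received from a SJIS client
```

The first call succeeds and the second fails, because the cp932 bytes
are not valid UTF-8 and the parser must reject the document.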

  [Problem in MOVE operation]

  Another problem arises when a client tries to rename a file.
  To do so, the client sends a request in the following format:

    MOVE /<old-non-ascii-filename> HTTP/1.1
    Destination: /<new-non-ascii-filename>

  This is the same problem as with the PUT operation. The client
  simply passes filenames in its platform-local encoding (or a
  url-escaped string of that), and the server just cannot handle it.

  [Possible Solution]

  The solution to this interoperability problem is rather
  straightforward. Practically, there are two ways to do it:

    1. Always pass charset information. You can also use a charset
       encoding scheme which carries such information by default.

       With proper charset information, both client and server
       can interoperate safely by converting the encoding whenever
       needed.

    2. Always use a single charset encoding which can describe any
       character in any language.

       Though the current Unicode standard has many pitfalls and has
       been criticized by many people working on i18n issues, this is
       obviously the faster way because you don't have to know
       anything about the charset encoding.

  While XML chose to support both schemes, many protocols (DNS,
  HTTP, etc.) seem to be going with method #2.

  So if you're implementing a WebDAV client or server and want a
  quick hack to support i18n filenames, try

    a. Always convert outgoing strings to UTF-8 (and url-escape
       them whenever needed).
    b. Always convert incoming strings to the platform-local encoding.

  If both client and server do the same, they will be interoperable.
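
Steps a and b can be sketched in Python as follows. This is only an
illustration of the quick hack described above; the function names and
the choice of cp932 and EUC-JP as the two platform-local encodings are
assumptions.

```python
from urllib.parse import quote, unquote_to_bytes


def outgoing(local_name, local_encoding):
    """Step a: platform-local bytes -> UTF-8 -> %xx-escaped ASCII string."""
    return quote(local_name.decode(local_encoding).encode("utf-8"))


def incoming(escaped_name, local_encoding):
    """Step b: %xx-escaped UTF-8 -> platform-local bytes."""
    return unquote_to_bytes(escaped_name).decode("utf-8").encode(local_encoding)


# A client whose platform uses cp932 talks to a server whose filesystem uses EUC-JP.
on_the_wire = outgoing("日本語.txt".encode("cp932"), "cp932")
stored_as = incoming(on_the_wire, "euc_jp")
print(on_the_wire, stored_as)
```

Neither side needs to know the other's local encoding, since only UTF-8
ever appears on the wire.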

=cut
