Recently, an encoding issue with HTTP form POST requests on Tomcat was found in a project I work on. The form submit was triggered with JavaScript and the normal submit behavior was disabled. The form page was rendered with a Content-Type: text/html; charset=UTF-8 header, the HTML head section had a <meta charset="utf-8"> hint, and the form was defined as a plain <form> without any additional attributes. This should have resulted in the form being posted with the application/x-www-form-urlencoded content type using UTF-8 encoding. But no, this did not work with Tomcat, for some unknown reason, though everything worked fine with the Jetty application server.

The encoding issue originates from the HTTP/1.1 and Java servlet specifications. The servlet specification, at least in versions 2.4 and 2.5, defines ISO-8859-1 as the default character encoding for POST requests. I’m not sure what really causes the issue with Tomcat, but I think that it follows the spec to the letter: when a modern browser sends a POST request encoded with the current page’s encoding but leaves the charset out of the request’s Content-Type header, Tomcat assumes that the body is encoded as ISO-8859-1. The Tomcat FAQ covers the issue and suggests a servlet filter that sets the default encoding to UTF-8. As of Tomcat 5.5.36, 6.0.36 and 7.x the filter is part of Tomcat core, so it’s simple to adopt.
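The effect of that mismatch can be reproduced without any server. The following is a minimal sketch, assuming Java 10 or newer for the Charset overloads of URLEncoder and URLDecoder; the class and method names are made up for illustration. It encodes a form value the way a browser on a UTF-8 page would and decodes it with the charset the server assumes.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the project described in this post.
final class FormEncodingDemo {

    // Encodes a form value with the client's charset and decodes it
    // with the charset the server assumes for the request body.
    static String roundTrip(String value, Charset clientCharset, Charset serverCharset) {
        String wire = URLEncoder.encode(value, clientCharset);
        return URLDecoder.decode(wire, serverCharset);
    }

    public static void main(String[] args) {
        String original = "\u00e4"; // the letter a with diaeresis

        // A browser on a UTF-8 page sends two percent-encoded bytes.
        System.out.println(URLEncoder.encode(original, StandardCharsets.UTF_8)); // prints %C3%A4

        // A server assuming ISO-8859-1 turns those two bytes into two characters.
        String garbled = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("\u00c3\u00a4")); // prints true

        // A server assuming UTF-8 recovers the original value.
        String intact = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals(original)); // prints true
    }
}
```

Setting the request encoding to UTF-8 before the parameters are read, which is what the filter does, corresponds to the last case.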

The environments in this project are set up so that all non-developer environments deploy to Tomcat application servers, while developers mainly use embedded Jetty instances bootstrapped from their IDEs. Guice, with its servlet module, is used as the dependency injection framework. To make everything work without burdening the development process, the filter must be applied only on Tomcat, and without requiring any extra configuration. The Guice servlet module uses Java-based configuration for servlet filters, so everything must be applied at runtime.

Lines 11-13 are the most important part of applying the filter. Because the application may run on servers other than Tomcat, the filter class is loaded dynamically. If loading the class fails, a warning is logged. This is fine for development environments, but if such a warning shows up in an environment where Tomcat is used, action should be taken. Guice allows binding a filter by its type alone, but there’s a catch: a servlet filter must always be a singleton, so binding a filter by type requires the class to carry the @Singleton annotation, which of course is not possible for a third-party class. That’s why a new instance is created for the binding. Guice takes the init parameters for filters as a Map<String, String>. Guava’s ImmutableMap is used here to create the required init configuration for the filter.
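A minimal sketch of the approach described above could look like the following. It is not the project's actual module, and its line numbers do not match the ones mentioned in the text. The module name, the /* URL pattern and the log message are assumptions made for illustration. It also assumes Guice 3.0 or newer with guice-servlet, Guava, and the javax.servlet API on the classpath, and that Tomcat's SetCharacterEncodingFilter reads its target charset from the `encoding` init parameter.

```java
import com.google.common.collect.ImmutableMap;
import com.google.inject.servlet.ServletModule;

import java.util.logging.Level;
import java.util.logging.Logger;
import javax.servlet.Filter;

// Hypothetical module name; only the binding technique follows the text.
public class EncodingFilterModule extends ServletModule {

    private static final Logger LOG =
            Logger.getLogger(EncodingFilterModule.class.getName());

    // Filter class bundled with Tomcat; it is absent on Jetty.
    private static final String TOMCAT_FILTER =
            "org.apache.catalina.filters.SetCharacterEncodingFilter";

    @Override
    protected void configureServlets() {
        try {
            // Load by name so the module has no compile-time dependency on Tomcat.
            Filter encodingFilter = (Filter) Class.forName(TOMCAT_FILTER)
                    .getDeclaredConstructor()
                    .newInstance();

            // Bind the instance, not the type: the third-party class
            // cannot be annotated with @Singleton.
            filter("/*").through(encodingFilter, ImmutableMap.of("encoding", "UTF-8"));
        } catch (ReflectiveOperationException | LinkageError e) {
            // Expected on non-Tomcat servers; worth investigating on Tomcat.
            LOG.log(Level.WARNING,
                    "Tomcat encoding filter not available, request encoding left as is", e);
        }

        // Other filter and servlet bindings would follow here.
    }
}
```

Because the binding happens inside the servlet module, no web.xml change or per-environment switch is needed: on Jetty the class lookup fails and only the warning is logged.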


4 Comments

  1. bugsan
    Posted 2013/01/30 at 14:54 | Permalink

    ISO-8859-1 is (according to the standards at least) the default encoding of documents delivered via HTTP with a MIME type beginning with “text/”

    http://en.wikipedia.org/wiki/ISO/IEC_8859-1

    Your HTTP POST message should contain a “Content-Encoding: UTF-8” header. If it’s not specified, webservers and browsers assume ISO-8859-1.

  2. bugsan
    Posted 2013/01/30 at 15:00 | Permalink

    Sorry, I meant:
    Content-Type: application/x-www-form-urlencoded; charset=UTF-8

  3. Tapio Rautonen
    Posted 2013/01/30 at 15:03 | Permalink

    @Bugsan
    The document was delivered with a UTF-8 content type, so it should then be the client’s responsibility (in this case the browser’s) to set the correct encoding for the form submit request. But this is not always the case, as this quote from Tomcat’s FAQ shows.

    Most web browsers today do not specify the character set of a request, even when it is something other than ISO-8859-1. This seems to be in violation of the HTTP specification. Most web browsers appear to send a request body using the encoding of the page used to generate the POST (for instance, the <form> element came from a page with a specific encoding… it is that encoding which is used to submit the POST data for that form).

  4. Posted 2013/02/10 at 11:34 | Permalink

    As one option, you can use the Tomcat config files to set the encoding for your application, but this sometimes doesn’t work.
    The second option is to register an appropriate encoding filter (CharacterEncodingFilter) in your @Configuration class, provided you rely on a Java-based configuration approach.
