Saturday, December 1, 2007

WhitespaceFilter

Whitespace

Whitespace is used everywhere. It covers spaces, tabs and newlines. It is used to distinguish lexical tokens from each other and also to keep the source code readable for the developer. But in case of HTML over network, whitespace costs bandwidth and therefore in some circumstances also money and/or performance. If you care about the bandwidth usage and/or the money and/or performance, then you can consider to trim off all whitespace of the HTML response. The only con is that it makes the HTML source code at the client side almost unreadable.

You can trim whitespace right in the HTML files (or JSP or JSF or whatever view you're using, as long as it writes plain HTML response), but that would make the source code unreadable for yourself. Better way is to use a Filter which trims the whitespace from the response.

Back to top

Replace response writer

Here is how such a WhitespaceFilter can look like. It is relatively easy, it actually replaces the writer of the HttpServletResponse with a customized implementation of PrintWriter. This implemetation will trim whitespace off from any strings and character arrays before writing it to the response stream. It also take care of any <pre> and <textarea> tags and keep the whitespace of its contents unchanged. However it doesn't care about the CSS white-space: pre; property, because it would involve too much work to check on that (parse HTML, lookup CSS classes, sniff the appropriate style and parse it again, etc). It isn't worth that effort. Just use <pre> tags if you want to use preformatted text ;)

Note that this filter only works on requests which are passed through a servlet which writes the response to the PrintWriter, e.g. JSP and JSF files (parsed by JspServlet and FacesServlet respectively) or custom servlets which uses HttpServletResponse#getWriter() to write output. This filter does not work on requests for plain vanilla CSS, Javascript, HTML files and images and another binary files which aren't written through the PrintWriter, but through the OutputStream. If you want to implement the same thing for the OutputStream, then you'll have to check the content type first if it starts with "text" or not, otherwise binary files would be screwed up. Unfortunately in real (at least, in Tomcat 6.0) the content type is set after the output stream is acquired, thus we cannot determine the content type during acquiring the output stream.

The stuff is tested in a Java EE 5.0 environment with Tomcat 6.0 with Servlet 2.5, JSP 2.1, JSTL 1.2 and JSF 1.2_06.

/*
 * net/balusc/webapp/WhitespaceFilter.java
 * 
 * Copyright (C) 2007 BalusC
 * 
 * This program is free software; you can redistribute it and/or modify it under the terms of the
 * GNU General Public License as published by the Free Software Foundation; either version 2 of the
 * License, or (at your option) any later version.
 * 
 * This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
 * even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License for more details.
 * 
 * You should have received a copy of the GNU General Public License along with this program; if
 * not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
 * 02110-1301, USA.
 */

package net.balusc.webapp;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.StringReader;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

/**
 * This filter class removes any whitespace from the response. It actually trims all leading and 
 * trailing spaces or tabs and newlines before writing to the response stream. This will greatly
 * save the network bandwith, but this will make the source of the response more hard to read.
 * <p>
 * This filter should be configured in the web.xml as follows:
 * <pre>
 * &lt;filter&gt;
 *     &lt;description&gt;
 *         This filter class removes any whitespace from the response. It actually trims all
 *         leading and trailing spaces or tabs and newlines before writing to the response stream.
 *         This will greatly save the network bandwith, but this will make the source of the
 *         response more hard to read.
 *     &lt;/description&gt;
 *     &lt;filter-name&gt;whitespaceFilter&lt;/filter-name&gt;
 *     &lt;filter-class&gt;net.balusc.webapp.WhitespaceFilter&lt;/filter-class&gt;
 * &lt;/filter&gt;
 * &lt;filter-mapping&gt;
 *     &lt;filter-name&gt;whitespaceFilter&lt;/filter-name&gt;
 *     &lt;url-pattern&gt;/*&lt;/url-pattern&gt;
 * &lt;/filter-mapping&gt;
 * </pre>
 *
 * @author BalusC
 * @link http://balusc.blogspot.com/2007/12/whitespacefilter.html
 */
public class WhitespaceFilter implements Filter {

    // Constants ----------------------------------------------------------------------------------

    // Specify here where you'd like to start/stop the trimming.
    // You may want to replace this by init-param and initialize in init() instead.
    static final String[] START_TRIM_AFTER = {"<html", "</textarea", "</pre"};
    static final String[] STOP_TRIM_AFTER = {"</html", "<textarea", "<pre"};

    // Actions ------------------------------------------------------------------------------------

    /**
     * @see Filter#init(FilterConfig)
     */
    public void init(FilterConfig config) throws ServletException {
        //
    }

    /**
     * @see Filter#doFilter(ServletRequest, ServletResponse, FilterChain)
     */
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
        throws IOException, ServletException
    {
        if (response instanceof HttpServletResponse) {
            HttpServletResponse httpres = (HttpServletResponse) response;
            chain.doFilter(request, wrapResponse(httpres, createTrimWriter(httpres)));
        } else {
            chain.doFilter(request, response);
        }
    }

    /**
     * @see Filter#destroy()
     */
    public void destroy() {
        //
    }

    // Utility (may be refactored to public utility class) ----------------------------------------

    /**
     * Create a new PrintWriter for the given HttpServletResponse which trims all whitespace.
     * @param response The involved HttpServletResponse.
     * @return A PrintWriter which trims all whitespace.
     * @throws IOException If something fails at I/O level.
     */
    private static PrintWriter createTrimWriter(final HttpServletResponse response)
        throws IOException
    {
        return new PrintWriter(new OutputStreamWriter(response.getOutputStream(), "UTF-8"), true) {
            private StringBuilder builder = new StringBuilder();
            private boolean trim = false;

            public void write(int c) {
                builder.append((char) c); // It is actually a char, not an int.
            }

            public void write(char[] chars, int offset, int length) {
                builder.append(chars, offset, length);
                this.flush(); // Preflush it.
            }

            public void write(String string, int offset, int length) {
                builder.append(string, offset, length);
                this.flush(); // Preflush it.
            }

            // Finally override the flush method so that it trims whitespace.
            public void flush() {
                synchronized (builder) {
                    BufferedReader reader = new BufferedReader(new StringReader(builder.toString()));
                    String line = null;

                    try {
                        while ((line = reader.readLine()) != null) {
                            if (startTrim(line)) {
                                trim = true;
                                out.write(line);
                            } else if (trim) {
                                out.write(line.trim());
                                if (stopTrim(line)) {
                                    trim = false;
                                    println();
                                }
                            } else {
                                out.write(line);
                                println();
                            }
                        }
                    } catch (IOException e) {
                        setError();
                        // Log e or do e.printStackTrace() if necessary.
                    }

                    // Reset the local StringBuilder and issue real flush.
                    builder = new StringBuilder();
                    super.flush();
                }
            }

            private boolean startTrim(String line) {
                for (String match : START_TRIM_AFTER) {
                    if (line.contains(match)) {
                        return true;
                    }
                }
                return false;
            }

            private boolean stopTrim(String line) {
                for (String match : STOP_TRIM_AFTER) {
                    if (line.contains(match)) {
                        return true;
                    }
                }
                return false;
            }
        };
    }

    /**
     * Wrap the given HttpServletResponse with the given PrintWriter.
     * @param response The HttpServletResponse of which the given PrintWriter have to be wrapped in.
     * @param writer The PrintWriter to be wrapped in the given HttpServletResponse.
     * @return The HttpServletResponse with the PrintWriter wrapped in.
     */
    private static HttpServletResponse wrapResponse(
        final HttpServletResponse response, final PrintWriter writer)
    {
        return new HttpServletResponseWrapper(response) {
            public PrintWriter getWriter() throws IOException {
                return writer;
            }
        };
    }

}

WhitespaceFilter configuration in web.xml:


    <filter>
        <description>
            This filter class removes any whitespace from the response. It actually trims all
            leading and trailing spaces or tabs and newlines before writing to the response stream.
            This will greatly save the network bandwith, but this will make the source of the
            response more hard to read.
        </description>
        <filter-name>whitespaceFilter</filter-name>
        <filter-class>net.balusc.webapp.WhitespaceFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>whitespaceFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

That's all, folks!

Back to top

Copyright - GNU General Public License

(C) December 2007, BalusC

8 comments:

Unknown said...

Great post. I was looking at the PrintWriter.java source and noticed that all of the write() methods are synchronized around a lock object. Is there any reason that you didn't do this in your example ??

Unknown said...

Hi, this is really useful! Thanks :)
However, there's one tiny bug, that can screw up a html output a bit. Namely, usually page is divided into a bunch of separate responses and in some cases it's not good to trim all whitespaces.


So, let's say reponse nr 1 ends with

input typ

and response nr 2 starts with

e="hidden"

So it's ok, because when we trim all whitespaces it will become

input type="hidden" ...

BUT, if response 1 ends with

input

and response 2 starts with

type="hidden"

then it's not ok. Because after trimming we receive in html output something like this:

inputtype="hidden"

and browser cannot parse that.

It's not that common, but it can happen, and it happened in my environment.
The solution for this is simple. I've just added one additional if clause:

(...)
} else if (trim) {
if (line.endsWith(" ")) { out.write(line.trim() + " ");
} else { out.write(line.trim());
}

if (stopTrim(line)) {
(...)

Matt said...
This comment has been removed by the author.
Matt said...

@pwoszczek: thank you very much for your comment. Turns out I needed to add that line as well - but that wasn't sufficient. I also had to handle the same case at the start of the line, like this:


(...)
} else if (trim) {
if (line.endsWith(" ")) {
out.write(line.trim() + " ");
} else if (line.startsWith(" ")) {
out.write(" " + line.trim());

} else {
out.write(line.trim());
}
if (stopTrim(line)) {
(...)

Unknown said...

I *JUST* found an article about trimming whitespace in Tomcat by adding this to web.xml:

note: not sure how to post code in the comments -- i'm replacing greater than and less than brackets with square brackets

[init-param]
[param-name]trimSpaces[/param-name]
[param-value]true[/param-value]
[/init-param]


Does this filter do anything beyond what Tomcat provides?

BalusC said...

@Joshua: the trimSpaces setting only remove the whitespace which is left behind after parsing JSP scritplets and taglibs. It won't trim ALL whitespace. There this filter is for. You can determine it yourself by judging the generated HTML source in webbrowser (rightclick, View Source).

J said...

Awesome. One issue I ran into is if the page contains JavaScript that uses // comments, the trimmer breaks the scripts by collapsing everything after the // comment onto the same line. The workaround is to either move all JavaScript to external files, or change all comments to /* */ style.

Ken Cafe said...

This is really helpful. And I thank @Matt and @pwoszczek for their solutions. However i have already add the changes below before seeing the solutions by @Matt and @pwoszczek .
.
.
.
} else if (trim) {
/*Start custom code*/
if(line.length() > 1){
//pad conditionally
boolean padLeft = false;
boolean padRight = false;
Character leftChar = line.charAt(0);
Character rightChar = line.charAt(line.length()-1);
line = line.trim();

if(Character.isWhitespace(leftChar)){line = " "+line;}
if(Character.isWhitespace(rightChar)){line = line+" ";}
}
/*stop custom code*/
out.write(line);
if (stopTrim(line)) {
trim = false;
}
} else {
.
.
.