Tuesday, June 13, 2006

Parse CSV upload

How easy can it be done?

CSV files are commaseparated files and can be opened and saved using Microsoft Excel. However, in most locales (at least US International and Europe) Excel uses a semicolon instead of a comma to separate fields. With the correct field separator the CSV files will be opened automatically in the right rows and columns in Excel. And saving an Excel table as CSV without using any wizard will deliver CSV files with the locale default field separation.

According to the RFC 4180 spec there are 3 important rules to keep in mind (you may also read "comma" as "semicolon"):

  1. Fields are separated by a comma.
  2. If a comma occurs within a field, then the field have to be surrounded by double quotes.
  3. If a double quote occurs within a field, then the field have to be surrounded by double quotes and the double quote within the field have to be escaped by another double quote.

Valid CSV records are:

field1,field2,field3
"field1,",field2,"fie""ld3"
"""field1""",",field2,",","","","""

Which should be parsed as:

field1    field2    field3
field1,   field2    fie"ld3
"field1"  ,field2,  ,",","

Back to top

Uploading a CSV file using IBM faces

Here is a basic example of the JSF code:

<hx:scriptCollector id="scriptCollector1">
    ...
    <h:form id="csvupload">
        <hx:fileupload 
            id="csvfile"
            binding="#{myBean.csvFile}"
            accept="application/vnd.ms-excel"
        />
        <h:message for="csvfile" />
        <h:commandButton
            value="upload"
            action="#{myBean.uploadCsvFile}"
        />
    </h:form>
    ...
</hx:scriptCollector>

The h:message is a placeholder for any errormessage from the hx:fileupload if the application type of the uploaded file don't match the MIME type application/vnd.ms-excel. If you don't need the accept attribute of the hx:fileupload, then just leave this attribute and the h:message away.

The relevant java code for the backing bean MyBean.java:

package mypackage;

import java.io.ByteArrayInputStream;
import java.util.List;

import com.ibm.faces.component.html.HtmlFileupload;
import com.ibm.faces.fileupload.util.ContentElement;

import net.balusc.util.CsvUtil;

public class MyBean {

    // Init --------------------------------------------------------------------------------------

    private HtmlFileUpload csvFile;

    // Actions -----------------------------------------------------------------------------------

    public void uploadCsvFile() {

        // Get uploaded csv file.
        ContentElement csvContent = (ContentElement) csvFile.getValue();

        // Process uploaded csv file.
        if (csvContent != null) {
            ByteArrayInputStream csvInput = new ByteArrayInputStream(csvContent.getContentValue());
            List<List<String>> csvList = CsvUtil.parseCsv(csvInput, ';');
            // Now you can do your thing with the CSV List.
        } else {
            // Empty file error, do your thing.
        }
    }

    // Getters -----------------------------------------------------------------------------------

    public HtmlFileupload getCsvFile() {
        return csvFile;
    }
    
    // Setters -----------------------------------------------------------------------------------

    public void setCsvFile(HtmlFileupload csvFile) {
        this.csvFile = csvFile;
    }

}

The useful CsvUtil class is described in the next chapter:

Back to top

Parsing and formatting the CSV file

This kind of a CSV parser accepts an InputStream which can be feeded by a file upload, reading a local file or any other kind of a binary stream. And it returns a two-dimensional List. The first List contains the CSV records in the appropriate order and the second List contains the fields of each CSV record in the appropriate order.

Formatting a CSV is also relatively easy: just turn the parser around. The CSV formatter should accept a two-dimensional List and return an InputStream with which you can do anything. Saving to a file or streaming it to the response as a file download.

/*
 * net/balusc/util/CsvUtil.java
 * 
 * Copyright (C) 2006 BalusC
 * 
 * This program is free software: you can redistribute it and/or modify it under the terms of the
 * GNU Lesser General Public License as published by the Free Software Foundation, either version 3
 * of the License, or (at your option) any later version.
 * 
 * This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
 * even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 * 
 * You should have received a copy of the GNU Lesser General Public License along with this library.
 * If not, see <http://www.gnu.org/licenses/>.
 */

package net.balusc.util;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Useful CSV utilities.
 * 
 * @author BalusC
 * @link http://balusc.blogspot.com/2006/06/parse-csv-upload.html
 */
public class CsvUtil {

    // Init ---------------------------------------------------------------------------------------

    // Defaults.
    private static final char DEFAULT_CSV_SEPARATOR = ',';
    private static final String DEFAULT_LINE_SEPARATOR = "\r\n"; // CRLF.

    private CsvUtil() {
        // Utility class, hide the constructor.
    }

    // Parsers ------------------------------------------------------------------------------------

    /**
     * CSV content parser. Convert an InputStream with the CSV contents to a two-dimensional List
     * of Strings representing the rows and columns of the CSV. Each CSV record is expected to be
     * separated by the default CSV field separator, a comma.
     * @param csvInput The InputStream with the CSV contents.
     * @return A two-dimensional List of Strings representing the rows and columns of the CSV.
     */
    public static List<List<String>> parseCsv(InputStream csvInput) {
        return parseCsv(csvInput, DEFAULT_CSV_SEPARATOR);
    }

    /**
     * CSV content parser. Convert an InputStream with the CSV contents to a two-dimensional List
     * of Strings representing the rows and columns of the CSV. Each CSV record is expected to be
     * separated by the specified CSV field separator.
     * @param csvInput The InputStream with the CSV contents.
     * @param csvSeparator The CSV field separator to be used.
     * @return A two-dimensional List of Strings representing the rows and columns of the CSV.
     */
    public static List<List<String>> parseCsv(InputStream csvInput, char csvSeparator) {

        // Prepare.
        BufferedReader csvReader = null;
        List<List<String>> csvList = new ArrayList<List<String>>();
        String csvRecord = null;

        // Process records.
        try {
            csvReader = new BufferedReader(new InputStreamReader(csvInput, "UTF-8"));
            while ((csvRecord = csvReader.readLine()) != null) {
                csvList.add(parseCsvRecord(csvRecord, csvSeparator));
            }
        } catch (IOException e) {
            throw new RuntimeException("Reading CSV failed.", e);
        } finally {
            if (csvReader != null) try {
                csvReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        return csvList;
    }

    /**
     * CSV record parser. Convert a CSV record to a List of Strings representing the fields of the
     * CSV record. The CSV record is expected to be separated by the specified CSV field separator.
     * @param record The CSV record.
     * @param csvSeparator The CSV field separator to be used.
     * @return A List of Strings representing the fields of each CSV record.
     */
    private static List<String> parseCsvRecord(String record, char csvSeparator) {

        // Prepare.
        boolean quoted = false;
        StringBuilder fieldBuilder = new StringBuilder();
        List<String> fields = new ArrayList<String>();

        // Process fields.
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            fieldBuilder.append(c);

            if (c == '"') {
                quoted = !quoted; // Detect nested quotes.
            }

            if ((!quoted && c == csvSeparator) // The separator ..
                || i + 1 == record.length()) // .. or, the end of record.
            {
                String field = fieldBuilder.toString() // Obtain the field, ..
                    .replaceAll(csvSeparator + "$", "") // .. trim ending separator, ..
                    .replaceAll("^\"|\"$", "") // .. trim surrounding quotes, ..
                    .replace("\"\"", "\""); // .. and un-escape quotes.
                fields.add(field.trim()); // Add field to List.
                fieldBuilder = new StringBuilder(); // Reset.
            }
        }

        return fields;
    }

    // Formatters --------------------------------------------------------------------------------

    /**
     * CSV content formatter. Convert a two-dimensional List of Objects to a CSV in an InputStream.
     * The value of each Object will be obtained by its toString() method. The fields of each CSV 
     * record will be separated by the default CSV field separator, a comma.
     * @param csvList A two-dimensional List of Objects representing the rows and columns of the
     * CSV.
     * @return The InputStream containing the CSV contents (actually a ByteArrayInputStream).
     */
    public static <T extends Object> InputStream formatCsv(List<List<T>> csvList) {
        return formatCsv(csvList, DEFAULT_CSV_SEPARATOR);
    }

    /**
     * CSV content formatter. Convert a two-dimensional List of Objects to a CSV in an InputStream.
     * The value of each Object will be obtained by its toString() method. The fields of each CSV
     * record will be separated by the specified CSV field separator.
     * @param csvList A two-dimensional List of Objects representing the rows and columns of the
     * CSV.
     * @param csvSeparator The CSV field separator to be used.
     * @return The InputStream containing the CSV contents (actually a ByteArrayInputStream).
     */
    public static <T extends Object> InputStream formatCsv(List<List<T>> csvList, char csvSeparator) {

        // Prepare.
        StringBuilder csvContent = new StringBuilder();

        // Process records.
        for (List<T> csvRecord : csvList) {
            if (csvRecord != null) {
                csvContent.append(formatCsvRecord(csvRecord, csvSeparator));
            }

            // Add default line separator.
            csvContent.append(DEFAULT_LINE_SEPARATOR);
        }

        return new ByteArrayInputStream(csvContent.toString().getBytes());
    }

    /**
     * CSV record formatter. Convert a List of Objects representing the fields of a CSV record to a
     * String representing the CSV record. The value of each Object will be obtained by its
     * toString() method. The fields of the CSV record will be separated by the specified CSV field
     * separator.
     * @param csvRecord A List of Objects representing the fields of a CSV reecord.
     * @param csvSeparator The CSV field separator to be used.
     * @return A String representing a CSV record.
     */
    private static <T extends Object> String formatCsvRecord(List<T> csvRecord, char csvSeparator) {

        // Prepare.
        StringBuilder fields = new StringBuilder();
        String separator = String.valueOf(csvSeparator);

        // Process fields.
        for (Iterator<T> iter = csvRecord.iterator(); iter.hasNext();) {
            T object = iter.next();

            if (object != null) {
                String field = object.toString().replace("\"", "\"\""); // Escape quotes.

                if (field.contains(separator) || field.contains("\"")) {
                    field = "\"" + field + "\""; // Surround with quotes.
                }

                fields.append(field);
            }

            if (iter.hasNext()) {
                fields.append(separator); // Add field separator.
            }
        }

        return fields.toString();
    }

}
Back to top

CSV formatting example

Here is an example how to create, format and save a CSV file quickly:

package mypackage;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import net.balusc.util.CsvUtil;
import net.balusc.util.FileUtil;

public class Test {

    public static void main(String[] args) throws IOException {

        // Create CSV.
        List<List<String>> csvList = new ArrayList<List<String>>();
        csvList.add(Arrays.asList("field1", "field2", "field3"));
        csvList.add(Arrays.asList("field1,", "field2", "fie\"ld3"));
        csvList.add(Arrays.asList("\"field1\"", ",field2,", ",\",\",\""));

        // Format CSV.
        InputStream csvInput = CsvUtil.formatCsv(csvList, ';');

        // Save CSV.
        FileUtil.write(new File("c:/test.csv"), csvInput);
    }

}

You can find the FileUtil here by the way.

Back to top

Copyright - GNU Lesser General Public License

(C) June 2006, BalusC

13 comments:

Sudeep Agrawal said...

Thanks for the link BalusC.

Ashok Iyer said...

Hello BalusC, I have the foll requirement...

I have a .jsp page with the hx:fileUpload feature. Using this, I am able to browse to the location of my file on my PC.

I need to get the String Value of the file selected which is useful for me to read the contents. I see that hx:fileProp "fileName" looks somewhat close but I am not sure how this could be used.

Could you please guide me to get the String path Value. As an example,

c:\Ashok\input.xls

Thanks,
Ashok.

Bauke Scholtz said...

@Ashok: you do not need to get the full path. Only the file name is sufficient. You can get the file contents by getValue().

Ashok Iyer said...

BalusC,

I played around the code and found a solution to my problem. They wasy to get the path is by using a String value against fileProp name = "fileName"

hx:fileupload styleClass="fileupload" id="fileupload1" size="100"
value="#{bkghandler.file}"
hx:fileProp name="fileName" value="#{bkghandler.fileName}"
hx:fileProp name="contentType" value="#{bkgHandler.fileType}"
hx:fileupload>"

I was never using the fileProp value clause. Now things look fine till I get to the next hurdle.

Thanks,
Ashok.

Bauke Scholtz said...

Keep in mind that some clients doesn't send the full file path along it, but only the file name. If I recall correctly, only IE and Opera sends the full file path while FF and Safari only sends the file name. Your code logic should not rely on the file path.

Unknown said...

Hurdle: What if i am having multiple hs:fileupload element depend on n no of files.

if i put

<hx:fileProp name="fileName" value="#{datatableitem.fileName}">

<hx:fileProp name="contentType" value="#{datatableitem.fileType}">

I am getting last fileUplaod element's value in all the filename and filetype attribute in list of beans.. i dont know why ??

Unknown said...

I have a question about the hx:fileUpload tag.

I've tried to use id and it works ok as long as I use a http request. If I try co do this with ajax (hx:ajaxRefreshSubmit), the value of the field is always null. Is there some way I can do this?

Unknown said...

Hello,

Please help.

I downloaded UsingDatatables.war and tried placing it in the tomcat (5.5.28)webapps folder.

I try to run http://localhost:8080/UsingDatatables/
But the 404 error is displayed.

This is my first JSF try.

Let me know how to successfully view the tables demo.

Thanks
Mamatha

Sachin said...

Hi Friends,
I have an issue with the file upload component. i have a file upload on my page, now if i dont change on the page and upload the file it works fine.
If any of the component on the page is change, like i select a value in dropdown and then i go and upload the file the filename comes as NULL. If i try it again immediately then the file name comes correct. Please let me know the suggestions to fix this .

My Useful Blogger said...

How to keep file selected if any other validation errors?

Shilpa said...

hi ,
my requirement is to upload the .xls file directly as an input by htmlparser using the java source code.

help me out in this.

anna said...

is multiple file upload supported??

Lo Alex said...

I found there have a bug in this function, field cannot be added when the last char is 'comma', it cause me a java.lang.IndexOutOfBoundsException: Index: 6, Size: 6 when I need to get the index of 6 of values(the last index 6 of value is an "" value). Please find my fix...

private static List parseCsvRecord(String record, char csvSeparator) {

// Prepare.
boolean quoted = false;
StringBuilder fieldBuilder = new StringBuilder();
List fields = new ArrayList();

// Process fields.
for (int i = 0; i < record.length(); i++) {
char c = record.charAt(i);
fieldBuilder.append(c);

if (c == '"') {
quoted = !quoted; // Detect nested quotes.
}

if ((!quoted && c == csvSeparator) // The separator ..
|| i + 1 == record.length()) // .. or, the end of record.
{
String field = fieldBuilder.toString() // Obtain the field, ..
.replaceAll(csvSeparator + "$", "") // .. trim ending separator, ..
.replaceAll("^\"|\"$", "") // .. trim surrounding quotes, ..
.replace("\"\"", "\""); // .. and un-escape quotes.
fields.add(field.trim()); // Add field to List.

>>>fix start>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
if(i + 1 == record.length() && c == csvSeparator){
fields.add("");
}

>>fix end>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

fieldBuilder = new StringBuilder(); // Reset.
}
}

return fields;
}