Excel2CSV: Convert MS Excel Worksheets to CSV

Excel2CSV: Convert MS Excel Worksheets to CSV

Posted by: Informatica Professional Services

Java utility that generates csv files by continuous scanning an input directory for MS Excel files. It is supported on Windows, Linux, AIX, Solaris Operating systems.

Overview

ExcelSheet2Csv polls a directory and reads all excel files as and when available. It converts each sheet in excel file to delimited csv files. It is supported on Windows, Linux, AIX, Solaris Operating system. It supports all excel file versions. Processing steps:
  1. Polls a given location for any excel files. Polling can be configured based on user requirements and this utility will run as a service until terminated explicitly. The default polling duration is 120 seconds (2 minutes).
  2. Whenever a new excel is dropped in the folder, the utility will create a folder and name it according to excel file name and extension. Then it will generate a csv file for each sheet in the excel file.
  3. Utility makes sure that it doesn?t extract same file twice.
E.g. If there is a file named ?Informatica.xls? and it has 2 sheets [Sheet1 and Sheet2]. Then, this utility will create a folder named ?xls_ Informatica? in same directory as that of excel file and generate 2 csv files named Sheet1.csv and Sheet2.csv. End-user can dictate what should be the output format in case of default date-format. The steps to do this are:
  1. Use latest version of the utility (attached). Please remove .txt from its extension.
  2. Copy attached date-format properties file (dataFormat.properties) to /some/location/on/client/machine/dataFormat.properties of client machine
  3. Pass -dtf /some/location/on/client/machine/dataFormat.properties as additional parameter to regular command.
  4. For e.g.java -jar excel2csv_2.1.jar -delim "|" -efd C:\Users\abcd\work\MyWork\Excel2CSV -force -tq D -wt 0 -dtf C:\Users\abcd\workspaces\ExcelToCsvConverter\ExcelToCsvConverter\utils\dateFormat.properties
Options and their usages:java ?jar excel2csv.jar [-delim <delimiter>] -efd <excel_dir_full_path> [-force] [-help] [-infinite] [-version] [-wt <time_in_seconds>]
  1. -delim <delimiter>: Delimiter for the output file. User is free to specify any delimiter enclosed by double quotes. E.g. ?,?, ?|?
  2. -efd <excel_dir_full_path>: Full path of excel file directory. Utility will poll for new excel files in this location. Also, files are extracted in this location.
  3. ?force: Utility makes sure that same excel file is not extracted again. However, in some scenarios it becomes necessary to re-extract all files. To override default behavior, user can use this option.
  4. ?infinite: Utility is capable of polling for an infinite duration. User can implement infinite polling using this option.
  5. -wt <time_in_seconds>: It is an alternate to infinite polling. User can provide time, in seconds. Utility will poll for new excel files based on this number. By default, utility polls for 120 seconds.
Support for new requirements:
  • Extraction of selected sheets.  
    • For this, users might want to use -idx option. 
    • End users can pass sheet indices like this -> -idx 0 2 3 4 OR -idx 0-4 5 6. 
    • In first case, 1st, 3rd, 4th, and 5th sheet will be extracted. In second case, first 5 sheet (0 to 4), 6th and 7th sheets will be extracted
  • Extracted files to be placed in output directory of user's choice instead of default directory.  
    • For this, users might want to use -ofd option.
    • E.g. -ofd /some/location/on/disk. 
    • Note: workbooks will still be extracted under their folders (as earlier) i.e /some/location/on/disk/xlsx_workbook/sheet1.csv
  • File names for generated files. 
    • -sfn can be used to achieve this.
    • Syntax : -sfn key1=value1 key2=value2 ... keyN=valueN. Where key = sheet-index (0-based index) and value = file name for that sheet.

Features

Support for xls (1997-2003) and xlsx (newer version) excel files as input.Support for user defined output delimiter. E.g. ?,?, ?|? etc.Support for infinite polling. With infinite polling in place, user need not intervene in extraction process and hence making it more automatable. Support for timed waiting. This is an alternative to infinite polling.Utility make sure that same file is not extracted twice and hence saving valuable time in case the to-be-extracted excel file is huge. However, this behavior can be overridden by user. If you want to re-extract same file again, use ?force option to do the same.

  • OS Support: All OS
  • Excel format support: xls (97-2003) and xlsx formats.
  • Minimum Java Version: 1.6.0.45 onwards.

Resources

Support

Sumeet Sharma is a software engineer working with Informatica R&D.