
DonationCoder.com Software > Post New Requests Here

IDEA Data extraction and convert to CSV


Ath:
I've come to this, so far:
Input:

fieldname1@data1
fieldname2@data2
fieldname3@data3
fieldname4@data4
fieldname1@data1
fieldname2@data2
fieldname3@data3
fieldname4@data4
fieldname1@data1
fieldname3@data3
fieldname4@data4
Output:

fieldname1,fieldname2,fieldname3,fieldname4
data1,data2,data3,data4
data1,data2,data3,data4
data1,,data3,data4
And after adding a few commas in the field3 data:

fieldname1,fieldname2,fieldname3,fieldname4
data1,data2,"data,,3",data4
data1,data2,"data,3",data4
data1,,"data,,,3",data4
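The quoting shown above follows the usual CSV escaping convention: a field containing the delimiter (or a quote character) is wrapped in double quotes, and any embedded quotes are doubled. A minimal sketch of such a helper in Java (the `quoteField` name is mine, not taken from the tool):

```java
public class CsvQuote {
    // Wrap a field in double quotes when it contains the delimiter,
    // a quote, or a line break; double any embedded quotes (RFC 4180 style).
    static String quoteField(String field, char delimiter) {
        if (field.indexOf(delimiter) < 0 && field.indexOf('"') < 0
                && field.indexOf('\n') < 0) {
            return field; // no quoting needed
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quoteField("data,,3", ','));  // "data,,3"
        System.out.println(quoteField("plain", ','));    // plain
    }
}
```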

Missing column-data sample:

fieldname1@data1
fieldname2@data2
fieldname4@data4
fieldname1@data1
fieldname3@data3
fieldname4@data4
fieldname1@data1
fieldname3@data3
With this output: (attention: columns are in the order found in the input!)

fieldname1,fieldname2,fieldname4,fieldname3
data1,data2,data4,
data1,,data4,data3
data1,,,data3
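The behaviour shown (columns in the order of first appearance, a repeat of the first field name starting a new record, blanks for missing fields) could be sketched roughly like this; the names are mine and this is not the actual Doc2CSV source:

```java
import java.util.*;

public class ColumnDataSketch {
    // Convert "name@value" lines to CSV rows. Column order is the order
    // in which field names are first seen; a repeat of the very first
    // field name starts a new record; missing fields stay empty.
    static List<String> convert(List<String> lines, char sep, char delim) {
        List<String> columns = new ArrayList<>();
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = null;
        String firstColumn = null;
        for (String line : lines) {
            int at = line.indexOf(sep);
            if (at < 0) continue; // ignore empty / separator-less lines
            String name = line.substring(0, at);
            String value = line.substring(at + 1);
            if (firstColumn == null) firstColumn = name;
            if (name.equals(firstColumn)) { // first field name => new record
                current = new LinkedHashMap<>();
                records.add(current);
            }
            if (!columns.contains(name)) columns.add(name);
            if (current != null) current.put(name, value);
        }
        String d = String.valueOf(delim);
        List<String> out = new ArrayList<>();
        out.add(String.join(d, columns));
        for (Map<String, String> rec : records) {
            List<String> row = new ArrayList<>();
            for (String col : columns) row.add(rec.getOrDefault(col, ""));
            out.add(String.join(d, row));
        }
        return out;
    }
}
```

Running this over the missing-column sample above reproduces the four output lines shown, including the trailing empty field on the first data row.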

magician62:
That looks like it will work. The order of result columns is not important, as the columns can be re-ordered or handled in a query later.

Ath:
So, I've written this in Java, and running it requires a Java runtime environment to be installed on your computer (Windows or Linux).

Doc2CSV
Version: 1.1.0.0
Released: 2017-11-28

Purpose
Convert a 'column@data' formatted file to a .csv file.

A new occurrence of the first column name found triggers a 'next record'.
Columns appear in the order they are found in the file, so adding a complete first record gives you control over column ordering.
Empty lines, and lines without the @ separator, are ignored.
What's new:
The separator and delimiter can be changed using command-line parameters.
The output file can be specified on the command line.

Installation

* Download the zip file attached to this post
* Download and install an Oracle Java JRE (I've been testing with Java 8, but Java 9 should work AFAICS)
* Unzip the contents of the zipfile to its own directory (keeping the directory structure intact; on upgrade, overwrite all files)
* On Linux: rename the start-run.sh script to start-run and make it executable
Running the conversion

* Open a command-prompt in the application directory
* Export the .doc file as text
* Execute: start-run {exported-text-file} (a full path/filename can be used; see below for optional parameters)
* Wait for processing to complete (should be a few seconds)
* Check the output in the created file (named after the input file with .csv appended; any previous output file will be overwritten)
Command-line parameters

* -s {separator character} : specifies the separator character used in the input file
* -d {delimiter character} : specifies the delimiter character used in the output file
* {input filename} : the first non-option (i.e. not -s or -d) argument; required!
* {output filename} : the second non-option argument
NB: Both the separator and delimiter characters support specifying \t for the Tab character.
NB: If {output filename} is not specified, or is equal to {input filename} (name comparison only!), then {input filename}.csv is used.
NB: The order of parameters is not important, except that {input filename} must come before {output filename}.
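Supporting \t on the command line usually means translating the two-character literal "\t" into a real tab before use, since the shell passes it through as backslash-t. A small sketch of how that substitution might look; this is my guess at the approach, not the tool's actual code:

```java
public class SeparatorArg {
    // Map the literal two-character string "\t" to a real tab character;
    // otherwise take the first character of the argument as the separator.
    static char parseSeparator(String arg) {
        if ("\\t".equals(arg)) return '\t';
        if (arg.isEmpty()) throw new IllegalArgumentException("empty separator");
        return arg.charAt(0);
    }
}
```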

Possible enhancements

* Make the column separator (,) configurable and/or a parameter
* Make the input-data separator (@) configurable and/or a parameter
* Do you need a GUI? It can be added, but will take some time
* Process .doc files directly (rather easy, but not done yet because of the response-time impact)

magician62:
Many thanks, have downloaded. As it is late here at the moment, and I need to be up early, I will likely test later tomorrow.

magician62:
So I decided to have a little play; a few issues, but that was my source data! :)

No need for process from Doc.

A GUI would, I assume, make frequent use easier, and I can see the use growing over time.
Input and column separator options would also be good.

For the moment I have placed the text source in the same folder as the util, which only takes a few seconds.

My test used 9 records, 3 of which didn't process correctly due to issues with the data. One was found to be a duplicate. A bonus!
