Topic: Text Parsing and Output ( Result in Excel, load multiple DOC,RTF,Word Files )

Ath

  • Supporting Member
  • Joined in 2006
  • Posts: 3,612
Updated ScriptLineCounter to version 1.4.0.0

What's new/changed:
  • Added: PDF read capability
  • Added: Parsing of FileFormat 3, as supplied in pdf; nearly correct, still to be validated
  • Added: -oc (OutputContent) option, no ini setting; prints all file content read to the console when -v is also set
  • Added: -im (IgnoreMinimalScore) option, no ini setting
  • Added: -xe option, adds an Extra Info sheet to the Excel file, listing all files, the recognition percentage and the file format detected
  • Improved: Overhauled GUI options and layout, added url-label with link to DC-forum thread

TODO: (in priority-order)
  • Improve and expand the GUI interface (partially done)
  • Possibly add headers and footers to the output
  • Add some unexpected features :)
  • Fix any bugs or issues reported (2 until now :o)
  • Handle pdf files for input
  • Handle extra file-format extracted from samples received
  • Write a Readme.txt file
  • Create a GUI interface

Ath

Updated ScriptLineCounter to version 1.4.1.0

What's new/changed:
  • Fixed: Exception when generating output but no episodes were found (no files?)
  • Improved: If a .doc file is actually a disguised .rtf file, it is read as rtf; the same applies to .docx
  • Improved: Remove non-breaking spaces from character names, as found in some .doc files
  • Added: ScriptLineCounter.exe built using Launch4j, which avoids having a Command Prompt window open during the run but also disables Console output
  • Added: Messagebox feature when running from the .exe: messages to the console are shown as a messagebox when needed
  • Added: ScriptLineCounter-CharacterMapping.properties settings file to merge multiple characters into one, for resolving some typos, supports Unicode
  • Added: ScriptLineCounter-CharacterNames.properties settings file to replace [CharacterNames] section in ini file, supports Unicode
  • Added: ScriptLineCounter-IgnoreCharacters.properties settings file to replace [IgnoreCharacters] section in ini file, supports Unicode
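
To illustrate the "disguised .rtf" item above: one common way to spot such files is to look at the first bytes instead of trusting the extension. The sketch below is Python, not SLC's own Java code, and the function names are made up for this example. The signatures themselves are the standard ones: RTF starts with `{\rtf`, a binary .doc is an OLE2 compound file, and a .docx is a ZIP package.

```python
# Well-known leading bytes ("magic numbers") of the formats involved.
RTF_MAGIC = b"{\\rtf"                            # RTF documents start with {\rtf
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")   # legacy binary .doc (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .docx is a ZIP package


def sniff_format(head: bytes) -> str:
    """Guess the real format from the first bytes of a file, ignoring its extension."""
    if head.startswith(RTF_MAGIC):
        return "rtf"
    if head.startswith(OLE2_MAGIC):
        return "doc"
    if head.startswith(ZIP_MAGIC):
        return "docx"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```

A reader could then pick the rtf parser whenever `sniff_file("episode01.doc")` returns `"rtf"`, whatever the file is called (the filename here is hypothetical).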

TODO: (in priority-order)
  • Improve and expand the GUI interface (partially done)
  • Possibly add headers and footers to the output
  • Add some unexpected features :)
  • Fix any bugs or issues reported (3 until now :o)
  • Handle pdf files for input
  • Handle extra file-format extracted from samples received
  • Write a Readme.txt file
  • Create a GUI interface

Screenshot:
A new screenshot is appropriate, as the GUI has changed quite a bit since the previous one was taken

[Screenshot: Screenshot - 07-04-2012 , 15_51_53.png]

The Console section is marked to emphasize that it's disabled when running the .exe, as that has no Console capabilities.

Ath

Updated ScriptLineCounter to version 1.4.2.0

What's new/changed:
  • Added: OpenOffice/LibreOffice .odt and .ods read capability (minimal support)
  • Added: More (optional) sheets if -xe specified: Name mappings and Ignored names
  • Improved: Extra info sheet now shows percentages with 1 decimal place, and has extra columns for text-lines found and lines recognized
  • Improved: Refactorings in code
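
For readers wondering how name mappings, ignored names and the recognition percentage fit together, here is a rough Python sketch. It is not SLC's implementation. It assumes that mapping is applied before the ignore check, and that the percentage is simply lines recognized divided by text-lines found; the thread confirms neither. All function names are invented for the example.

```python
from collections import Counter


def canonical_name(raw, mapping, ignored):
    """Return the name to count a line under, or None if it is ignored."""
    # Treat non-breaking spaces like ordinary ones, then collapse whitespace.
    name = " ".join(raw.replace("\u00a0", " ").split())
    # Merge known variants or typos into one character (assumed order: map, then ignore).
    name = mapping.get(name, name)
    return None if name in ignored else name


def count_lines(names, mapping, ignored):
    """Count one line per speaker name, after mapping and ignoring."""
    counts = Counter()
    for raw in names:
        name = canonical_name(raw, mapping, ignored)
        if name is not None:
            counts[name] += 1
    return counts


def recognition_percentage(lines_recognized, text_lines_found):
    """Percentage with 1 decimal place; 0.0 for an empty file."""
    if text_lines_found == 0:
        return 0.0
    return round(100.0 * lines_recognized / text_lines_found, 1)
```

With a mapping of `{"JHON": "JOHN"}` and `{"NARRATOR"}` ignored, lines spoken by "JHON" are counted under "JOHN" and the narrator's lines are dropped.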

TODO: (in priority-order)
  • Replace own logging system by log4j (already required/used by some libs)
  • Improve and expand the GUI interface (partially done)
  • Possibly add headers and footers to the output
  • Add some unexpected features :)
  • Fix any bugs or issues reported (3 until now :o)
  • Handle pdf files for input
  • Handle extra file-format extracted from samples received
  • Write a Readme.txt file
  • Create a GUI interface

Ath

Updated ScriptLineCounter to version 1.5.0.0

What's new/changed:
  • Changed: Replaced own simple logging system with calls to Log4J (infrastructure already required for other libs)

TODO: (in priority-order)
  • Improve and expand the GUI interface (partially done)
  • Possibly add headers and footers to the output
  • Add some unexpected features :)
  • Fix any bugs or issues reported (3 until now :o)
  • Replace own logging system by log4j (already required/used by some libs)
  • Handle pdf files for input
  • Handle extra file-format extracted from samples received
  • Write a Readme.txt file
  • Create a GUI interface

Ath

Updated ScriptLineCounter to version 1.5.1.0

What's new/changed:
  • Changed: Updated the POI library from 3.8-beta6 to the 3.8 release (you'll need the Initial/Full install)

TODO: (in priority-order)
  • Improve and expand the GUI interface (partially done)
  • Possibly add headers and footers to the output
  • Add some unexpected features :)
  • Fix any bugs or issues reported (3 until now :o)
  • Replace own logging system by log4j (already required/used by some libs)
  • Handle pdf files for input
  • Handle extra file-format extracted from samples received
  • Write a Readme.txt file
  • Create a GUI interface

Ath

Updated ScriptLineCounter to version 1.6.0.0

What's new/changed:
  • Added: New FileFormat nr. 4, based on doc-files supplied by Saira, the original initiator of this tool.
      note: This format is quite similar to FileFormat 2, so it may need to be forcibly used on some files/filesets.
  • Added: Setting the FileFormat from the commandline using -ff parameter, or set from the ini file, as documented in the readme.
  • Changed: GUI now has the Fileformat combo box enabled, to force a specific file format to be used.
  • Improved: Some more robustness while handling the file contents.
  • Added: A warning in the readme file to NOT use Windows Notepad for editing the properties files, as it inserts a BOM in UTF-8 files, not supported by SLC.
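
Why the Notepad warning matters: a UTF-8 BOM is an invisible U+FEFF character at the very start of the file, so it ends up glued to the first key. The Python sketch below shows the effect with a deliberately simplified `key=value` parser (real .properties files have more rules, and this is not how SLC reads them). The sample keys are invented.

```python
import codecs


def parse_properties(data: bytes, strip_bom: bool) -> dict:
    """Parse very simple `key=value` lines from UTF-8 bytes (no escapes, no continuations)."""
    # "utf-8-sig" drops a leading BOM if present; plain "utf-8" keeps it as U+FEFF.
    text = data.decode("utf-8-sig" if strip_bom else "utf-8")
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


# Bytes as an editor that writes a BOM would save them.
notepad_style = codecs.BOM_UTF8 + "JHON=JOHN\nMARRY=MARY\n".encode("utf-8")

broken = parse_properties(notepad_style, strip_bom=False)
fixed = parse_properties(notepad_style, strip_bom=True)
```

In `broken` the first key is `"\ufeffJHON"`, so a lookup for `"JHON"` silently fails; in `fixed` both keys are as typed. Saving the file without a BOM from another editor avoids the problem altogether.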

TODO: (in priority-order)
  • Improve and expand the GUI interface (partially done)
  • Add some unexpected features :)
  • Fix any bugs or issues reported (3 until now :o)
  • Possibly add headers and footers to the output
  • Replace own logging system by log4j (already required/used by some libs)
  • Handle pdf files for input
  • Handle extra file-format extracted from samples received (4 file-formats supported now)
  • Write a Readme.txt file
  • Create a GUI interface

Ath

Updated ScriptLineCounter to version 1.7.0.0

What's new/changed:
  • Added: New FileFormat nr. 5, for reading data from CSV files (comma-separated or tab-separated), having a first row with column names, and a "Name" and optional "Text" column. Column names can be configured from the command-line and .ini, details in the readme file.
  • Changed: The -pe encoding parameter is now (also) applied for reading .txt and .csv files, if it has been set/changed from the command-line or GUI.
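
A minimal Python sketch of what reading such a file involves, using only the standard `csv` module. This is an illustration, not SLC's code: the function name is made up, and guessing the separator from a tab in the header row is an assumption of the example. The `encoding` argument plays the role that -pe has in SLC.

```python
import csv
import io


def read_name_text_rows(data: bytes, encoding="utf-8", name_col="Name", text_col="Text"):
    """Return (name, text) pairs from CSV bytes whose first row holds the column names."""
    text = data.decode(encoding)
    first_line = text.split("\n", 1)[0]
    # Assumed heuristic: a tab in the header row means the file is tab-separated.
    delimiter = "\t" if "\t" in first_line else ","
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    if reader.fieldnames is None or name_col not in reader.fieldnames:
        raise ValueError(f"missing required column {name_col!r}")
    rows = []
    for row in reader:
        # The text column is optional, so a missing or empty value becomes "".
        rows.append((row[name_col], row.get(text_col) or ""))
    return rows
```

Passing other values for `name_col` and `text_col` corresponds to configuring the column names; a file with only a "Name" column still yields one pair per row, with empty text.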

TODO: (in priority-order)
  • Fix any bugs or issues reported (3 until now :o)
  • Improve and expand the GUI interface (partially done)
  • Add some unexpected features :)
  • Handle extra file-format extracted from samples received (5 file-formats supported now)
  • Possibly add headers and footers to the output
  • Replace own logging system by log4j (already required/used by some libs)
  • Handle pdf files for input
  • Write a Readme.txt file
  • Create a GUI interface

Ath

Updated ScriptLineCounter to version 1.7.1.0

What's new/changed:
  • Added: The possibility to have .doc/.docx files processed like csv (file-format 5) if the content is a table with proper headings (use -cn and -ct command-line options to set the used headers)
  • Added: Command-line (-ci <lines>) and ini (skiplines) and gui option to ignore the first n lines of a file
  • Added: Command-line (-cm <max>) and ini (maxcharacternameparts) and gui option to specify the number of words a character-name can consist of. Default is 4
  • Changed: When using debug-level 3 or 4 (-3 / -4 command-line options), each file read is saved as text under the same name with a .txt.tmp extension appended, useful for inspecting how SLC 'sees' the file
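
The skip-lines and maximum-name-parts options can be pictured with a small Python sketch. The thread does not describe the actual script layouts, so the `NAME: text` line shape used here is purely an assumption, and the function is hypothetical rather than SLC's parser. Only the two parameters (first n lines ignored, default of 4 name words) come from the list above.

```python
def character_lines(lines, skip_lines=0, max_name_parts=4):
    """Yield (name, text) for lines shaped like 'NAME: text' (an assumed layout)."""
    for line in lines[skip_lines:]:
        name, sep, text = line.partition(":")
        parts = name.split()
        # Reject lines without a colon, without a name, or with too many name words.
        if not sep or not parts or len(parts) > max_name_parts:
            continue
        yield " ".join(parts), text.strip()
```

With the default of 4, a line starting with "OLD MAN AT THE DOOR:" (5 words) is not treated as a character line; raising the limit to 5 would accept it.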

TODO: (in priority-order)
  • Fix any bugs or issues reported (3 until now :o)
  • Improve and expand the GUI interface (partially done)
  • Add some unexpected features :)
  • Handle extra file-format extracted from samples received (5 file-formats supported now)
  • Possibly add headers and footers to the output
  • Replace own logging system by log4j (already required/used by some libs)
  • Handle pdf files for input
  • Write a Readme.txt file
  • Create a GUI interface

Updated GUI screenshot:
[Screenshot: Screenshot - 15-06-2013 , 12_41_18.png]

Ath

Updated ScriptLineCounter to version 1.7.2.0

What's new/changed:
  • Added: Fileformat 5 parameters -cn and -ct now also accept column-numbers instead of a name. First column is 1.
  • Improved: GUI now has fields for Name and Text columns for Fileformat 5. Column numbers can be entered there as well.
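
Accepting either a column name or a 1-based column number boils down to a small lookup. The Python sketch below is one hypothetical way to do it, not SLC's code. It assumes that a value consisting only of digits is always read as a number, which means a column literally named "2" could not be selected by name.

```python
def resolve_column(header, spec):
    """Return the 0-based index for a column given by name or by 1-based number."""
    if spec.isdecimal():
        number = int(spec)
        # First column is 1, so valid numbers run from 1 to len(header).
        if not 1 <= number <= len(header):
            raise ValueError(f"column number {number} is out of range 1..{len(header)}")
        return number - 1
    try:
        return header.index(spec)
    except ValueError:
        raise ValueError(f"no column named {spec!r}") from None
```

For a header of `["Scene", "Name", "Text"]`, both `"Name"` and `"2"` resolve to the same column.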

TODO: (in priority-order)
  • Fix any bugs or issues reported (4 until now :o)
  • Improve and expand the GUI interface (partially done)
  • Add some unexpected features :)
  • Handle extra file-format extracted from samples received (5 file-formats supported now)
  • Possibly add headers and footers to the output
  • Replace own logging system by log4j (already required/used by some libs)
  • Handle pdf files for input
  • Write a Readme.txt file
  • Create a GUI interface

Updated GUI screenshot:
[Screenshot: Screenshot - 20-06-2013 , 15_32_08.png]

« Last Edit: June 20, 2013, 08:39 AM by Ath »

Ath

Updated ScriptLineCounter to version 1.7.2.1

What's new/changed:
  • Fixed: Input of the Name and Text column names was not transferred correctly from the GUI to the running instance.
  • Improved: GUI layout was a bit stretched.

TODO: (in priority-order)
  • Fix any bugs or issues reported (5 until now :o)
  • Improve and expand the GUI interface (partially done)
  • Add some unexpected features :)
  • Handle extra file-format extracted from samples received (5 file-formats supported now)
  • Possibly add headers and footers to the output
  • Replace own logging system by log4j (already required/used by some libs)
  • Handle pdf files for input
  • Write a Readme.txt file
  • Create a GUI interface

Contro

  • Supporting Member
  • Joined in 2007
  • Posts: 3,940
Nice work Ath.

I will comment.  :-*