Simplifying the Tomos method
Splitting a Word file by manipulating its text is almost impossible, and XML files are difficult to work with.
It seems easier to try with a PDF file.
So I converted the RTF file to PDF.
Then I can split it into pages.
But it seems better to split by bookmarks.
However, the original RTF has no bookmarks, and with Adobe I get a lot of useless bookmarks.
If I do it manually, I have to create fewer than 10 bookmarks.
So I did.
SourceForge lists several free tools that split a PDF by bookmarks, but all my attempts with them failed.
So I tried PDFsam Basic, which is open source.
I have 9 bookmarks but only get 7 split PDFs. What happened?
What PDFsam really does (and, I think, the others too) is split the PDF at the page that contains the bookmark. If a page has two bookmarks, we only get one PDF.
So splitting a PDF file by bookmark is really a split by page ranges. No text of any kind is actually extracted.
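To make this concrete, here is a minimal Python sketch of what a split by bookmark seems to reduce to. It is not PDFsam's actual code: the function name `bookmark_page_ranges` and the sample bookmarks are hypothetical, and reading the bookmarks out of a real PDF (which needs a PDF library) is left out.

```python
def bookmark_page_ranges(bookmarks, total_pages):
    """Turn (title, page) bookmarks into (title, first_page, last_page) ranges.

    Pages are 1-based. When several bookmarks sit on the same page, only the
    first one is kept, because a page can start only one output file.
    Pages before the first bookmark are not covered by any range.
    """
    seen_pages = set()
    starts = []
    # sorted() is stable, so bookmarks on the same page keep their input order.
    for title, page in sorted(bookmarks, key=lambda b: b[1]):
        if page in seen_pages:
            continue  # a second bookmark on this page produces no extra file
        seen_pages.add(page)
        starts.append((title, page))

    ranges = []
    for i, (title, first) in enumerate(starts):
        # Each range ends just before the next bookmarked page, or at the last page.
        last = starts[i + 1][1] - 1 if i + 1 < len(starts) else total_pages
        ranges.append((title, first, last))
    return ranges


# Hypothetical data: four bookmarks, two of them on page 3.
sample = [("Intro", 1), ("Scope", 3), ("Terms", 3), ("Annex", 6)]
ranges = bookmark_page_ranges(sample, total_pages=8)
```

With this sample, four bookmarks give only three page ranges, the same effect as getting 7 files from 9 bookmarks.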
What's next?
Edit the RTF file to make sure each bookmark is on a different page.
Create the bookmarks and try again.
This time it goes well.
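A small check like the following could tell in advance which bookmarks share a page. It is only a sketch under my own assumptions: the function name and the bookmark lists are hypothetical, and the (title, page) pairs would have to come from whatever tool lists the bookmarks of the PDF.

```python
from collections import defaultdict


def pages_with_several_bookmarks(bookmarks):
    """Return {page: [titles]} for every page that holds more than one bookmark."""
    by_page = defaultdict(list)
    for title, page in bookmarks:
        by_page[page].append(title)
    return {page: titles for page, titles in by_page.items() if len(titles) > 1}


# Hypothetical bookmark lists, before and after moving "Terms" to its own page.
before = [("Intro", 1), ("Scope", 3), ("Terms", 3)]
after = [("Intro", 1), ("Scope", 3), ("Terms", 4)]
clashes_before = pages_with_several_bookmarks(before)
clashes_after = pages_with_several_bookmarks(after)
```

An empty result means every bookmark starts its own page, so each one should give its own split file.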
I use the bookmark name to identify the split files.
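If the naming had to be done in a script, a bookmark title first has to be turned into a safe file name. This is a sketch of one possible way, not what PDFsam does internally; the function name and the `NN_title.pdf` pattern are my own assumptions.

```python
import re


def bookmark_to_filename(title, index):
    """Build a file name such as '02_Scope_of_work.pdf' from a bookmark title."""
    # Keep letters, digits, underscores and hyphens; replace each run of
    # any other characters (spaces, colons, slashes, ...) with one underscore.
    safe = re.sub(r"[^\w\-]+", "_", title.strip()).strip("_")
    if not safe:
        safe = "untitled"  # fallback for titles made only of punctuation
    return f"{index:02d}_{safe}.pdf"


name = bookmark_to_filename("Scope of work: part 1/2", 2)
```

The numeric prefix keeps the files in document order when they are sorted by name.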
Extracting paragraphs from a Word file (doc, rtf, docx)
What's next
PDFsam Basic works very well.
In general, I think this:
Splitting a Word file by bookmarks or strings will always be a difficult operation, and the results are unpredictable when the new files are inserted into the master documents, especially with regard to formatting.
Splitting a PDF file by content (text, bookmarks, strings, ...) is interpreted as splitting at the page that contains the string.
Commercial software mainly offers to split a big file by recognizing the account number, in order to separate invoices.
The very expensive document-combining engines usually use C#, Java, VB, etc. and Visual Studio. But I think they are meant for generating native documents, not for modifying existing ones, which is my purpose.
So the simplest way for Ath Tomos is to prepare the Word document a little.
I haven't found an interactive bookmark generator for PDF files, only automatic ones, and depending on the target files the results may not be the expected ones.
I will try the new Action Wizard in Adobe Acrobat DC.
I have observed that if we want to merge documents, it is better to do it in PDF format. But usually, if we have a TOC or bookmarks from the Word file, we end up losing everything except the last merge or combination.
So I need to renumber the final document and generate bookmarks for it.
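The new bookmark positions can be worked out from the page counts of the pieces. The sketch below only does the arithmetic; the function name, the document names and the page counts are hypothetical, and writing the bookmarks into the merged PDF would still need a PDF tool.

```python
def merged_outline(documents):
    """Given (name, page_count) pairs in merge order, return (outline, total_pages).

    outline is a list of (name, start_page), where start_page is the 1-based
    page of the merged file on which that document begins, i.e. the page a
    bookmark for that document should point to.
    """
    outline = []
    next_page = 1
    for name, page_count in documents:
        outline.append((name, next_page))
        next_page += page_count
    return outline, next_page - 1


# Hypothetical merge: one document inserted between two parts of the master.
docs = [("Master part 1", 12), ("Inserted document", 5), ("Master part 2", 20)]
outline, total = merged_outline(docs)
```

Here the inserted document would start on page 13 and the second part of the master on page 18, in a merged file of 37 pages.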
This has been an exercise with one of the documents to be inserted.
I have to do this with about 10 documents...