hi, gang, great forum! i produce an alphabetical list manually using several different programs but thought i'd see if i could automate the task.
a folder of my tv listing program has 2 kinds of files in it, but i only want to extract the names of tv shows from the files whose names match ScheduleData*.xml. here's a small chunk of the contents of 1 of the files:
<ch ChNo="15"><show Aff="" CId="28456692" PId="28410472" Title="Public Access" CLetter="PUAC015" STime="05/09/2011 14:00:00" Dur="240" Rep="N" New="" Logo="" Prem="" Fin=""><Categories><Category Id="1" /><Category Id="113" /></Categories></show></ch><ch ChNo="16"><show Aff="PBS" CId="28455507" PId="188545708" Title="WordWorld" CLetter="WPTD" STime="05/09/2011 15:30:00" Dur="30" Rep="Y" New="N" Logo="" Prem="" Fin=""><Categories><Category Id="1" /><Category Id="3" /><Category Id="7" /><Category Id="105" /><Category Id="106" /><Category Id="304" /><Category Id="702" /><Category Id="706" /><Category Id="1911" /></Categories></show></ch><ch ChNo="17"><show Aff="CBS" CId="28457103" PId="258095235" Title="The Price Is Right" CLetter="WHIO" STime="05/09/2011 15:00:00" Dur="60" Rep="N" New="Y" Logo="" Prem="" Fin=""><Categories><Category Id="7" /><Category Id="707" /><Category Id="1911" /></Categories></show></ch><ch ChNo="18"><show Aff="" CId="28455318" PId="28436365" Title="Information Channel" CLetter="INFO018" STime="05/09/2011 14:00:00" Dur="240" Rep="Y" New="N" Logo="" Prem="" Fin=""><Categories><Category Id="1" /><Category Id="113" /></Categories></show></ch>
the only data i want to extract are the values between the Title= and CLetter= markers, for example Public Access in Title="Public Access".
what i want is a text file with all of those Titles from about 750 .xml files, in alphabetical order (there will be 10s of thousands of them). that means stripping the Title= label and the quotes around the names, eliminating all duplicates (about 35,000), replacing &amp; with a real ampersand, and replacing open and close quotes with ordinary keyboard quotes ". from the above sample i'd end up with a file that just has this in it:
Information Channel
Public Access
The Price Is Right
WordWorld
at present i use:
ReplaceText to isolate the Title= lines with carriage returns,
Catview2000 to extract those lines to a new file, and
TextPad to sort and eliminate duplicate titles.
2 of those programs are no longer being developed, and the 3rd is unregistered shareware. since the whole process is some trouble, i only make a new list occasionally.
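to make the goal concrete, here is a minimal Python sketch of the same isolate / extract / sort-and-dedupe steps in one script. it is only a sketch under some assumptions: the files are UTF-8 text, every title appears as Title="..." with double quotes, and the "open and close quotes" are the curly Unicode ones. the function names (extract_titles, collect_titles, write_titles) are made up for illustration, and it uses a regular expression rather than a real XML parser, which is a shortcut that may not suit every file.

```python
import glob
import html
import os
import re

# Matches Title="..." inside a <show ...> tag. The leading \s keeps it from
# matching some other attribute whose name merely ends in "Title".
TITLE_RE = re.compile(r'\sTitle="([^"]*)"')

# Assumption: "open and close quotes" means the curly Unicode quotes.
QUOTE_MAP = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
})


def extract_titles(text):
    """Return the set of cleaned show titles found in one file's text."""
    titles = set()
    for raw in TITLE_RE.findall(text):
        # html.unescape turns &amp; into & (and &quot; into ").
        title = html.unescape(raw).translate(QUOTE_MAP).strip()
        if title:
            titles.add(title)
    return titles


def collect_titles(folder, pattern="ScheduleData*.xml"):
    """Gather unique titles from every matching file, sorted alphabetically."""
    titles = set()
    for path in glob.glob(os.path.join(folder, pattern)):
        with open(path, encoding="utf-8", errors="replace") as f:
            titles |= extract_titles(f.read())
    # Case-insensitive sort, with the raw string as a tie-breaker.
    return sorted(titles, key=lambda t: (t.casefold(), t))


def write_titles(folder, out_path):
    """Write one title per line to out_path and return how many were written."""
    titles = collect_titles(folder)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(titles) + "\n")
    return len(titles)
```

calling something like write_titles("path/to/listing/folder", "titles.txt") (both names are placeholders) would produce the one-title-per-line file described above. using a set takes care of the duplicates, so no separate dedupe pass is needed.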