DONE: HTML Garbage Tag Removal

DonationCoder.com Software > Finished Programs

pulphero:
In constructing ebooks, I often run into these unnecessary HTML tag pairs in files exported from InDesign:

<span class="no-break">I don’t</span>

or:

<span class="no-break">my decision.</span>

It always follows the pattern of <span class="no-break"> followed by some random amount of text and then a closing </span>. Deleting these tags (and there are lots of them!) manually in Sublime Text is a huge time sink.

I'd like to have an AHK script or Regex code that will delete these specific tags but not the text between them, leaving, for example:

I don't
my decision.

Although it seems a simple problem, I've not been able to come up with anything that works. Thanks!

MilesAhead:
How big are the files? Perhaps this web applet is good enough?

http://www.striphtml.com

Edit: Hmm, seems to do nothing. Perhaps it only works if there are html header or body tags. I don't know.

Also, I got this from the "sed one-liners" site:

# remove most HTML tags (accommodates multiple-line tags)
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'

sed is a very powerful free stream editor. It can do many things way faster than an interactive edit session. There are free versions for Windows:
http://gnuwin32.sourceforge.net/packages/sed.htm

The page of "one-liner" sed scripts:
http://sed.sourceforge.net/sed1line.txt

The idea is that the file to be modified is fed into sed, usually via command-line redirection, and the output is redirected to a new file. It modifies the whole file in one shot.
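
For anyone who would rather not install sed, here is a rough Python sketch of the same idea. It is not a port of the one-liner, only an approximation of what it does: every <...> run is treated as a tag and removed, including tags that wrap across lines. The names strip_all_tags and filter_stream are made up for this example.

```python
import io
import re

# Any "<...>" run counts as a tag. [^>] also matches newlines, so tags that
# wrap across lines are removed too (the case the sed loop handles with N).
TAG = re.compile(r"<[^>]*>")


def strip_all_tags(html: str) -> str:
    """Return html with every tag removed; the text between tags is kept."""
    return TAG.sub("", html)


def filter_stream(src, dst) -> None:
    """Read everything from src, strip the tags, and write the result to dst."""
    dst.write(strip_all_tags(src.read()))


# Small in-memory demo; pass sys.stdin and sys.stdout instead to get the
# "redirect a file in, redirect a file out" workflow.
demo_in = io.StringIO('<p>I <span\nclass="no-break">don\'t</span></p>\n')
demo_out = io.StringIO()
filter_stream(demo_in, demo_out)
print(demo_out.getvalue(), end="")  # prints: I don't
```

Like the sed one-liner, this removes all tags, so it is only useful when plain text is the goal. It will not leave the rest of the markup intact.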

pulphero:
StripHTML is a great app, but I believe it takes out ALL the HTML tags. I need only a specific opening/closing tag removed.

The files aren't that big. Most are a few hundred lines of code. For each book, there's usually two or three dozen files.

I'm really just looking for a dirt-simple AHK script or regex code. Doubt I'm savvy enough to handle something like SED.

MilesAhead:
--- Quote ---
StripHTML is a great app, but I believe it takes out ALL the HTML tags. I need only a specific opening/closing tag removed.

The files aren't that big. Most are a few hundred lines of code. For each book, there's usually two or three dozen files.

I'm really just looking for a dirt-simple AHK script or regex code. Doubt I'm savvy enough to handle something like SED.

-pulphero (June 03, 2017, 10:11 AM)
--- End quote ---

Why not grab one of those freeware "regex tester" programs? You put in sample text, and a regex. Hit the Go Button and it shows the results. Most regex I get by trial and error myself. I don't use it often enough to predict what will happen.

Here's one from sourceforge but there are a bunch of them out there:
https://sourceforge.net/projects/regextester/

Ath:
Well,

I tried this with a trial version of Sublime Text (v3 build 3126 x64), and I cooked up this regex to remove the extra span tags:

Find What: (?m)<span\sclass="no-break">(.*?)<\/span>(.*?)
Replace With: \1\2
Be sure to enable the Regular expression option (Alt-R from the Replace screen). Switching on Wrap is required too.
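
Since each book has two or three dozen files, the same find-and-replace could also be run over a whole folder at once. Below is a minimal Python sketch of that, with several assumptions: the files are UTF-8, they match a glob such as *.xhtml, and no other span is nested inside a no-break span (a nested closing </span> would end the match too early). The function names unwrap_no_break and clean_folder are hypothetical, and the pattern drops the trailing (.*?) group because a lazy group at the end of a pattern always matches nothing.

```python
import re
from pathlib import Path

# Only the <span class="no-break"> wrapper is targeted; every other tag,
# including spans with a different class, is left alone.
# DOTALL lets the wrapped text run across a line break.
NO_BREAK = re.compile(r'<span\s+class="no-break">(.*?)</span>', re.DOTALL)


def unwrap_no_break(html: str) -> str:
    """Remove the no-break span tags but keep the text between them."""
    return NO_BREAK.sub(r"\1", html)


def clean_folder(folder: str, pattern: str = "*.xhtml") -> int:
    """Rewrite matching files in place; return how many files were changed."""
    changed = 0
    for path in sorted(Path(folder).glob(pattern)):
        original = path.read_text(encoding="utf-8")  # assumed encoding
        cleaned = unwrap_no_break(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

Calling clean_folder on a copy of the export folder first is the safer route, because the files are overwritten in place.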
