Topic: How do I get rid of hidden characters like â€ in text file? (Read 28731 times)

patteo

  • Charter Member
  • Joined in 2005
  • Posts: 437
I have clipped some text of a webpage and it appears to contain just text.

However, when I examine it with different utilities like AptEdit or Metapad, I notice that there are hidden characters like â€.

When I load the file into Windows Notepad, I cannot see them.

They are probably some hidden control codes.

Does anyone know of a utility that I can run the file through to clean it of such hidden characters?

It would be nice if the utility also accepted command-line arguments, so that I could run it from a batch file without having to open the file.

40hz

  • Supporting Member
  • Joined in 2007
  • Posts: 11,857
« Reply #1 on: January 03, 2009, 10:52 AM »
Grab a copy of PureText and have at it:

Link: http://www.download....2384_4-10069166.html

Publisher's description of PureText
From Steve Miller:


Have you ever copied some text from a web page or a document and then wanted to paste it as simple text into another application without getting all the formatting from the original source? PureText makes this simple by adding a new Windows hot-key (default is WINDOWS+V) that allows you to paste unformatted text to any application.

Version 2.0 adds Vista support, a new default hot key combo, optional sound, and various other visual enhancements and bug fixes.


PhilB66

  • Supporting Member
  • Joined in 2007
  • Posts: 1,522
« Reply #2 on: January 03, 2009, 11:25 AM »
PureText is a wonderful little tool. The home page is at http://www.stevemiller.net/puretext/. Skrommel's PlainPaste and Copy Plain Text (a Firefox add-on) are good alternatives.

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • Posts: 9,153
« Reply #3 on: January 03, 2009, 11:51 AM »
Hidden characters?

I'm guessing at UTF-8 encoding. Processing the documents in an editor/tool that doesn't know about non-ASCII character encodings risks ruining them.
- carpe noctem

40hz

« Reply #4 on: January 03, 2009, 12:49 PM »
Quote from f0dder:
    Hidden characters?

    I'm guessing at UTF-8 encoding. Processing the documents in an editor/tool that doesn't know about non-ascii character encoding will risk ruining the documents.

I believe the latest iteration of PureText supports Unicode under Windows.

Quote from PhilB66:
    PureText is a wonderful little tool. The home page is @ http://www.stevemiller.net/puretext/. Skrommel's PlainPaste and Copy Plain Text (a firefox addon) are good alternatives.

Thanks for posting the direct link to the Steve Miller homepage.

I opted to post the download.com link because I kept timing out every time I tried to go to stevemiller.net. Seems to be working OK now...


tomos

  • Charter Member
  • Joined in 2006
  • Posts: 11,959
« Reply #5 on: January 03, 2009, 03:17 PM »
Staying slightly on-topic:

I've sometimes seen this on webpages:
they can't display an apostrophe, as in ' (I don't mean the accent `),
or they have problems with umlauts, and instead they show stuff like â€.
I'm wondering whether that has to do with the site in question or with my browser?
(Just idly wondering, to be honest. And what did they use before the euro symbol came along?)
Tom

cyberdiva

  • Supporting Member
  • Joined in 2006
  • Posts: 1,041
« Reply #6 on: January 03, 2009, 06:31 PM »
Quote from patteo:
    I have clipped some text of a webpage and it appears to contain just text.
    However, when I examine it with different utilities like AptEdit or Metapad, I notice that there are hidden characters like â€.

I don't think these are "hidden characters"; it's more likely that the document was encoded in UTF-8, and that AptEdit and Metapad can't handle UTF-8.  I had a similar problem with text that looked normal in UltraEdit, but when I tried to import it into a different program, all kinds of strange characters appeared.  I went back into UltraEdit and saved the document as ANSI/ASCII rather than UTF-8.  Then, when I imported it again into the program that couldn't handle UTF-8, it looked normal.
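
As a rough illustration of the explanation above, the following Python sketch shows how UTF-8 text ends up looking like "â€" when a tool reads it as Windows-1252 ("ANSI"). The sample string is an assumption for demonstration only; it is not the text from patteo's file.

```python
# Illustration only: the sample string is made up, not taken from the original file.
original = "It\u2019s a \u201cquoted\u201d word"   # contains the curly characters ’ “ ”

# In UTF-8 each of these curly quotes is stored as three bytes, e.g. ’ is E2 80 99.
utf8_bytes = original.encode("utf-8")

# A tool that assumes Windows-1252 turns every byte into its own character:
# E2 -> â, 80 -> €, 99 -> ™, 9C -> œ. Byte 9D (the last byte of ”) has no
# Windows-1252 character, so errors="replace" substitutes U+FFFD for it.
misread = utf8_bytes.decode("cp1252", errors="replace")

print(utf8_bytes)
print(misread)
```

The closing quote is the interesting case: its third byte has no printable Windows-1252 character, so many programs show only "â€" followed by nothing visible, which may be what the editors in this thread displayed.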

patteo

« Reply #7 on: January 04, 2009, 06:48 AM »
Thanks all you most helpful donationcoders for your suggestions.

I tried PureText, PlainPaste, and Thornsoft ClipMate (the cleaning part of the utility).

But no luck - still the same problem.

Eventually, I tracked the problem down to the " and ' characters in this particular instance; when I deleted all of them, the problem went away.

I suspect it has something to do with UTF-8 encoding as well.

Well, I imagine that something like that would not be too hard to fix but it does take some trial and error.
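
One possible way to cut down on the trial and error, sketched in Python. It assumes the damage is exactly one round of "UTF-8 bytes read as Windows-1252" and that the wrong characters were then saved into the text; the function name `repair_mojibake` is hypothetical, not part of any tool mentioned in this thread.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes misread as Windows-1252'.

    Assumption: the text was damaged in exactly that way. If re-encoding or
    re-decoding fails, the text does not match that pattern, so it is
    returned unchanged rather than guessed at.
    """
    try:
        # Turn the wrong characters back into the original bytes,
        # then decode those bytes the way they were meant to be read.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# "donâ€™t" written with escapes so the example survives any editor encoding.
damaged = "don\u00e2\u20ac\u2122t"
print(repair_mojibake(damaged))
```

This only works while all three characters of each damaged sequence are still present. If a program has already dropped the invisible third character, the original quote cannot be recovered this way.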

cyberdiva

« Reply #8 on: January 04, 2009, 08:34 AM »
Quote from patteo:
    Eventually, I tracked down the problem to either " and ' characters in this particular instance and when I deleted all of them, the problem went away.
    I suspect it has something to do with UTF-8 encoding as well.

I've seen this when someone cuts and pastes from MS Word or other programs that use slanted apostrophes and quotation marks rather than the straight up-and-down ' and " .   If this occurs in text over which I have some control, I'll change all the slanted stuff to straight up and down characters -- a simple search-and-replace will usually do the trick.  And yes, I think UTF-8 can represent the slanted stuff, but programs that can't handle UTF-8 will show the slanted stuff as strange characters.
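
The search-and-replace described above could be scripted roughly as follows, which would also cover the earlier request for something that runs from a batch file. This is a sketch, not a finished tool: it assumes the input file really is UTF-8, and the names `straighten_quotes` and `clean_file` are made up for the example.

```python
from pathlib import Path

# Typographic ("slanted") quotes mapped to their straight ASCII equivalents.
STRAIGHTEN = str.maketrans({
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ones; everything else is untouched."""
    return text.translate(STRAIGHTEN)


def clean_file(src: str, dst: str) -> None:
    """Read src as UTF-8 (an assumption), straighten quotes, write dst as UTF-8."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(straighten_quotes(text), encoding="utf-8")
```

Once only straight quotes remain, plain English text is pure ASCII, so programs without UTF-8 support should display it the same way. Other non-ASCII characters (umlauts, dashes, the euro sign) would need their own entries in the table.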

Curt

  • Supporting Member
  • Joined in 2006
  • Posts: 7,566
« Reply #9 on: January 04, 2009, 02:34 PM »
@ patteo: If you are on Firefox, it is as simple as installing an add-on like Copy Plain Text, Extended Copy Menu (there is also a version for Internet Explorer), or even Auto Context, and then using the new entry in the right-click context menu, "Copy As Text (without formats)".

Edited:
Another way is to change View > Charset > ... before copying.
« Last Edit: January 04, 2009, 02:38 PM by Curt »

f0dder

« Reply #10 on: January 04, 2009, 03:26 PM »
tomos: when you see that "broken utf-8" on the web, it's probably the HTML document encoding type that hasn't been set properly - all current browsers should be able to render utf-8 just fine.

cyberdiva: some editors do support unicode documents, but fail to auto-detect document encoding if the document doesn't start with a BOM... kinda similar to the broken webpages not specifying document encoding.
- carpe noctem
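
To make the BOM point concrete, here is a minimal Python sketch of the kind of detection an editor might do. The fallback order (BOM, then strict UTF-8, then Windows-1252) is an assumption chosen for illustration, not how any particular editor in this thread works, and the function names are hypothetical.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_text(data: bytes, fallback: str = "cp1252") -> str:
    """Decode bytes: honour a UTF-8 BOM, else try strict UTF-8, else fall back."""
    if has_utf8_bom(data):
        # Strip the BOM so it does not show up as a stray character.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat it as a legacy single-byte encoding.
        return data.decode(fallback, errors="replace")
```

An editor that only performs the first check misreads every UTF-8 file saved without a BOM. In Python, writing a file with `encoding="utf-8-sig"` adds the BOM for the benefit of such programs.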

cyberdiva

« Reply #11 on: January 25, 2009, 11:29 AM »
Quote from f0dder:
    cyberdiva: some editors do support unicode documents, but fail to auto-detect document encoding if the document doesn't start with a BOM... kinda similar to the broken webpages not specifying document encoding.

Thanks for the info, f0dder. AFAIK, the program I had trouble with most recently unfortunately does not support Unicode: askSam 7, a flexible database program that I like a lot (when I don't hate it). There have been a number of messages on their forum complaining about the total lack of Unicode support.

f0dder

« Reply #12 on: January 26, 2009, 02:39 AM »
The thing with Unicode is that you really ought to design your programs with it in mind from the start; it can quickly become a major bother trying to retrofit support. And if you use 3rd-party components without Unicode support, you might have to include both ANSI and UNICODE stuff, massage data around into the right formats, etc. Slightly++ messy :)
- carpe noctem