Author Topic: On the lack of standardisation in "tagging" .  (Read 1832 times)


« on: May 11, 2013, 01:21 AM »
I use the Firefox add-on Scrapbook to capture/save web-page snippets or whole webpages.
Scrapbook is very handy as I can sort/search through the saved data using the Scrapbook index/search. I could also use the Win7 index/search function, though it is usually easiest/fastest to search using Scrapbook, as I can then open the page material, and add text notes (which are also searchable) or stick-it type notes (which do not seem to be searchable), all within the browser.

Though the Scrapbook saved file formats are easily accessed using a browser, I have been exploring the use of .mht and .maff files as alternative ways of capturing/archiving the web page data, using Mozilla Archive Format or UnMHT.
The .mht format (short for MHTML, i.e. MIME HTML, I gather) seems to have been around for a while, but is relatively little-used. There is an MHT viewer in Win7, and Win7 also has the Problem Steps Recorder, which generates an .mht file and will even email it if you want. If you inspect the contents of the file, you will see that it is structured like an email message. You can read .mht files in the MHT viewer or in a browser (but not necessarily always with identical results!).
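For anyone who wants to see the email-like layout for themselves, here is a minimal Python sketch that reads an .mht archive with the standard `email` module and lists the parts inside it. The function name `list_mht_parts` and the `SAMPLE_MHT` data are my own illustrative inventions (the sample is hand-made, not output from the Problem Steps Recorder or any real tool), and real archives may differ in their headers and encodings.

```python
from email import message_from_bytes, policy


def list_mht_parts(raw: bytes):
    """Return (content_type, content_location) for each leaf part of an MHT archive."""
    msg = message_from_bytes(raw, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the multipart container, keep only the leaf parts
        location = part.get("Content-Location")
        parts.append(
            (part.get_content_type(), str(location) if location is not None else None)
        )
    return parts


# Hand-made sample for illustration only (an assumption, not real tool output).
SAMPLE_MHT = (
    b"From: sample@example.com\r\n"
    b"Subject: Example page\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="BOUNDARY"; type="text/html"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Location: http://example.com/index.html\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Location: http://example.com/logo.png\r\n"
    b"\r\n"
    b"iVBORw0KGgo=\r\n"
    b"--BOUNDARY--\r\n"
)

if __name__ == "__main__":
    for content_type, location in list_mht_parts(SAMPLE_MHT):
        print(content_type, location)
```

To try it on a real file, read the file in binary mode and pass the bytes to `list_mht_parts`.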

When you go exploring, you invariably risk discovering something that you did not know about before. I stumbled upon something described as a potential "alternative" to Evernote, though as an alternative it was not in itself of much interest to me. It took the shape of the FF add-on TagSpaces, and what was interesting was the way it let you explore and tag your files (including .mht files) in a "view" of the directories on the hard disk.
For tagging, it simply appended the tag to the filename, enclosed in square brackets.
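To make the filename approach concrete, here is a rough Python sketch of tagging by filename. It is not TagSpaces' own code: the helper names `split_tags` and `add_tag` are hypothetical, and I am assuming that the bracketed group sits just before the file extension and that multiple tags inside it are separated by spaces.

```python
import os
import re

# Assumption: one trailing "[tag1 tag2]" group at the end of the name, before the extension.
_TAG_GROUP = re.compile(r"\[([^\[\]]*)\]$")


def split_tags(filename):
    """Return (stem_without_tags, list_of_tags, extension) for a filename."""
    stem, ext = os.path.splitext(filename)
    match = _TAG_GROUP.search(stem)
    if not match:
        return stem, [], ext
    return stem[:match.start()], match.group(1).split(), ext


def add_tag(filename, tag):
    """Return a new filename with `tag` added to the bracketed group (no duplicates)."""
    stem, tags, ext = split_tags(filename)
    if tag not in tags:
        tags.append(tag)
    return f"{stem}[{' '.join(tags)}]{ext}"


if __name__ == "__main__":
    print(add_tag("report.mht", "archive"))
    print(split_tags("holiday[beach family].jpg"))
```

The appeal of this design is that the tags travel with the file through any copy, sync or backup tool, since they are just part of the name; the cost is that every tag change is a rename.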

There are probably over a hundred metadata fields (columns) that you could use for files in Win7, and you can view these as columns in Windows Explorer. Why did the developer not use one of those? Well, I haven't asked the question of the author, but I presume the answer would be "It's just too hard due to the multiplicity and inconsistency of standards, so I developed my own standard using the filename", or something along those lines.
For example, TagSpaces applies this tagging method to the filenames of image files too, thus avoiding the "Keywords" IPTC metadata field in the image file for tagging.
So, thinking of the image files, I looked it up and established that Google Picasa uses the Keywords field for tagging and uses other image file metadata fields for other things - e.g., the face recognition tags are apparently now put (or can be put) into an "XMP" tag. (I don't know much about XMP.)
It seems that both Picasa and Windows Live Photo Gallery use those face-recognition and other tags, but not always in the same way, because each is apparently based on a different "interpretation" of the relevant standards.
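As a rough illustration of how tags can end up in more than one place, XMP keeps its keyword list in a `dc:subject` property inside an XML packet embedded in the file, which is separate from the IPTC "Keywords" field. The Python sketch below simply scans a file's bytes for that packet and pulls out the keywords. The function name `xmp_keywords` and the `SAMPLE` bytes are my own assumptions for illustration; the sketch does not read the IPTC field at all, and it makes no claim about which fields Picasa or Windows Live Photo Gallery actually read or write.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP for RDF and Dublin Core.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
_OPEN = b"<x:xmpmeta"
_CLOSE = b"</x:xmpmeta>"


def xmp_keywords(data: bytes):
    """Return the dc:subject keywords from the first XMP packet found in `data`."""
    start = data.find(_OPEN)
    if start == -1:
        return []
    end = data.find(_CLOSE, start)
    if end == -1:
        return []
    packet = data[start:end + len(_CLOSE)]
    root = ET.fromstring(packet)
    items = root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    return [li.text for li in items if li.text]


# Hand-made sample: fake "image" bytes wrapped around a small XMP packet.
SAMPLE = (
    b"\xff\xd8 fake image bytes "
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b"<dc:subject><rdf:Bag>"
    b"<rdf:li>beach</rdf:li><rdf:li>family</rdf:li>"
    b"</rdf:Bag></dc:subject>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>"
    b" more fake bytes"
)

if __name__ == "__main__":
    print(xmp_keywords(SAMPLE))
```

A program that only looks at one of these locations will not see tags written to the other, which is one plausible way for two tools to disagree about the same file.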

In fact, there may be no consistent standard for image tagging that any significant group of photo-tagging software conforms to. I am unsure whether this is deliberate.
If you want to read more on this, refer to these links (the last one is probably the most informative):

It's rather a confused picture, and some of it seems to explain why I had to invest many hours in restoring tags to my image files a couple of years back, following a Picasa update. Yet it also indicates that Picasa would seem to be one of the more "stable" in this regard.    :o

"The nice thing about standards is that you have so many to choose from."
 - Andrew S. Tanenbaum, Computer Networks, 2nd ed., p. 254.
« Last Edit: May 11, 2013, 01:47 AM by IainB, Reason: Minor corrections. »