@tomos: Well, that's called scraping. As for the web-development browser plugins Shades mentioned, those are new to me, but his point that they "show you lots of content and how it is linked" is spot-on: they make it easy to identify the elements, which is the very first step.
Then, of course, you will want to retrieve those elements, i.e. their respective, individual content, and it is common understanding by now that trying to do it all with regex is not just extremely difficult but effectively impossible: even the same page's (technical) layout often differs too much from one occurrence to the next, never mind sites that vary the code itself. Take some database output, say classifieds (i.e. classified ads). If the person who wrote the ad didn't fill in all the fields of the site's web form, it's quite unpredictable how the site's template will render that in the output, and the handling can even differ from field to field, and from field combination to field combination: tags may simply be left out, turn up inside other "<...>" constructs, or be grouped differently, so even the overall construction can change in parts. That's why they say you must do some scripting, or have some scripting done for you by some tool.
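To make that concrete, here is a minimal Python sketch with an invented classifieds snippet (the markup and class names are made up for illustration): the "same" ad rendered with and without the price field, and a single rigid regex that works on the first rendering only.

```python
import re

# Two renderings of the "same" classified ad: the seller filled in the
# price field in the first, left it empty in the second -- and the
# (hypothetical) site template drops the whole <span> instead of
# emitting an empty one.
ad_full    = '<div class="ad"><b>Bike</b> <span class="price">120</span> <i>Berlin</i></div>'
ad_partial = '<div class="ad"><b>Bike</b> <i>Berlin</i></div>'

# A single "grab everything in order" regex works on the full layout only:
rigid = re.compile(r'<b>(.*?)</b> <span class="price">(.*?)</span> <i>(.*?)</i>')

print(rigid.search(ad_full) is not None)     # True
print(rigid.search(ad_partial) is not None)  # False -- one missing field
                                             # derails the whole match
```

As soon as one optional field is absent, the all-in-one regex fails for the entire record, which is exactly why the extraction has to be broken into scripted steps.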
As I said in the regex thread, Goyvaerts' expensive tool seems to rely exclusively on regex, while TextPipe (even more expensive) combines pre-configured scripting steps with regex. But you will not buy that latter tool, not only for its price but above all because it will very probably not be as flexible/moldable as you will need it to be, and at that point at the latest you would feel remorse about the price (it's on bits here and there, so for somebody really interested but not needing it immediately...).
I script those things myself, introducing the necessary regexes wherever needed, and with lots of logical branching wherever needed; i.e. I don't try to anticipate too much "regular" construction or foreseeable combinations, but try to identify every element's content one by one, so as to make the whole thing as "context-proof" as it gets.
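A minimal sketch of that one-by-one approach, again with invented markup: each field gets its own small regex, and the script branches on presence/absence instead of expecting one fixed overall construction.

```python
import re

def extract_ad(html):
    """Identify each field on its own, so a missing one doesn't derail the rest."""
    ad = {}
    for field, pattern in [
        ('title', r'<b>(.*?)</b>'),
        ('price', r'<span class="price">(.*?)</span>'),
        ('city',  r'<i>(.*?)</i>'),
    ]:
        m = re.search(pattern, html)
        if m:                   # logical branching: react to what is there
            ad[field] = m.group(1)
        else:
            ad[field] = None    # record the gap instead of failing outright
    return ad

print(extract_ad('<div class="ad"><b>Bike</b> <i>Berlin</i></div>'))
# {'title': 'Bike', 'price': None, 'city': 'Berlin'}
```

The record with the missing price still comes out usable; only the absent field is marked as such.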
Since you obviously program: does the language you use to "write stuff" lend itself to scripting, too? You see, I strongly advocate doing it yourself (I have no experience with doing it all in the FF-and-other-browser extensions Shades mentioned, so peeking into their respective possibilities may indeed be a good idea), instead of using some tool. That being said, there are some dedicated scraping tools* whose names I don't remember, but which I do remember being quite expensive; they are more or less like TextPipe but more specialized, i.e. they have been conceived for analyzing/retrieving the content of web pages in particular, and they all follow the paradigm "scripting over trying to do it with regex; use the latter only for the tasks within the script where regex displays its strengths". It's all about making your scraping more or less "robust", i.e. insensitive to minor variants in the web pages' source code.
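One small illustration of what "insensitive to minor variants" can mean in practice (a sketch, not anyone's actual tool): where you do use a regex inside the script, you write it so that incidental whitespace and quoting differences don't break it.

```python
import re

# Tolerate extra whitespace and single vs. double quotes around the
# attribute value; the markup itself is invented for the example.
pattern = re.compile(r'<span\s+class\s*=\s*["\']price["\']\s*>\s*(.*?)\s*</span>')

for variant in ('<span class="price">120</span>',
                "<span  class = 'price'> 120 </span>"):
    print(pattern.search(variant).group(1))  # 120 both times
```

Reordered attributes or structural changes would still need the branching logic of the surrounding script; the point is only that the regex shouldn't fail over cosmetic noise.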
*: I do not mean web site crawlers here, but programs which allow for running scripts on text files (and which very probably also help with writing those scripts in the first place); some of them will also help with getting the web page's source code into the file that is then analyzed and processed, but that is not their main purpose. Crawlers, in contrast, are specialized in following links ("scraping the site"), and the better ones let you program/script how they do that, i.e. which links to follow; they are not meant for analyzing the resulting text bodies afterwards.
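The crawler half of the job (deciding which links to follow) really is a separate, simpler task than the analysis; a minimal sketch with Python's standard-library HTML parser, on an invented page fragment:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """The 'crawler' half: collect which links to follow.
    Analyzing the pages behind those links is a separate, scripted step."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

page = '<a href="/ads/1">first</a> some text <a href="/ads/2">second</a>'
c = LinkCollector()
c.feed(page)
print(c.links)  # ['/ads/1', '/ads/2']
```

A real crawler would then fetch each of those URLs and hand the retrieved source over to the analysis script.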
Since I didn't want to spend the money for TextPipe, had some scripting knowledge, and was willing to learn the necessary amount of regex (as said, you had better do the bulk of your work by scripting anyway, so the regex you must learn isn't as high-brow as it is for people who see regex as an art form and strive to do everything possible in that special query language), I always do it all "manually" and don't see any need to buy/learn/use some special tool anymore. As soon as you have a script, you have logic, which means you can check for things or for their absence (or even for variants, by regex then), and that means you can have your script react accordingly and smoothly. Trying to do similar things in e.g. TextPipe is probably possible, but I see lots of fuss arising in trying to get the same results by "clicking them together"; or else you end up applying some (other) scripting language within such a tool in order to get to your ends, which again raises the question why you would use any such tool to begin with.
More advice is welcome if, which is possible, I have overlooked some useful aspects. As said, Shades' advice to use browser add-ins for faster element grouping/identifying is good advice; there are special editors for that, too (though they aren't necessary), for example Code Lobster (available as a freebie here and there), and some regular text editors have such functionality as well. I don't rely exclusively on these, though: I regularly copy the relevant element groups out into multiple new, smaller files, in order to have them isolated. Everything that battles complexity is a good thing, as long as it doesn't withhold relevant info at the same time.
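That isolation step can itself be scripted; a minimal sketch (invented markup again) that cuts each element group out of the full source so every chunk can be inspected and processed on its own:

```python
import re

# Hypothetical source page: three ad blocks with page furniture in between.
source = ('<div class="ad">one</div> navigation junk '
          '<div class="ad">two</div> footer junk '
          '<div class="ad">three</div>')

# Copy each relevant element group out into its own small string
# (in practice you might write each one to its own file on disk,
# e.g. open(f'ad_{i}.html', 'w').write(chunk)).
chunks = re.findall(r'<div class="ad">.*?</div>', source)
for i, chunk in enumerate(chunks):
    print(i, chunk)
```

Each chunk then contains one record and nothing else, which is exactly the complexity reduction meant above.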