Author Topic: Search for any string in a web (even in the code : hyperlinks, etc) TOOLS for  (Read 3776 times)

Contro

  • Supporting Member
  • Joined in 2007
  • Posts: 3,940
Where can I find good tools to search inside a website for any code or content?

Best Regards
 :-*

Shades

  • Member
  • Joined in 2006
  • Posts: 2,922
Web development plugin(s) for your favorite browser. That should show you lots of content and how it is linked. Of course, it helps a lot if you are versed in HTML, CSS, JavaScript, etc.

ital2

  • Member
  • Joined in 2017
  • Posts: 115
@tomos: Well, that's called scraping; as for the web-development browser plugins mentioned by Shades, that's a new idea for me, but he says they "show you lots of content and how it is linked", and that's spot-on, since they make it easy to identify the elements, which is the very first step.

Then, of course, you will want to retrieve those elements and their respective, individual content, and it's common understanding by now that trying to do it all with regex isn't just extremely difficult but, in many, many cases, impossible: even the same web page's (technical) "layout" differs too much from one occurrence to another, not to speak of sites varying the code itself. An example would be some database output, say classifieds (i.e. classified ads). If the person who authored the ad didn't fill out all the fields in the site owner's web form, it's quite unpredictable how the page template will handle that case in the output, and the handling can even differ from field to field and from field combination to field combination: the tags may simply be left out, may turn up in other "<...>" combinations, or may be grouped differently, i.e. even the construction can change in parts. That's why they say you must do some scripting, or must have some tool do that scripting for you.
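A minimal sketch of what I mean, in Python (the URL, CSS class names and field names below are all invented for illustration; requests and BeautifulSoup stand in for whatever parsing toolset you prefer): every field is fetched independently, and a missing field simply yields None instead of breaking the whole extraction.

Code: [Select]
import requests
from bs4 import BeautifulSoup

# Hypothetical ad page; URL and class names are invented for illustration.
html = requests.get("https://example.com/classifieds/123").text
soup = BeautifulSoup(html, "html.parser")

def field(css_class):
    """Text of the first element with this class, or None when the
    author left the field empty and the template dropped the tag."""
    node = soup.find(class_=css_class)
    return node.get_text(strip=True) if node else None

ad = {
    "title": field("ad-title"),
    "price": field("ad-price"),
    "phone": field("ad-phone"),  # often missing - never assume it's there
}
print(ad)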

As I said in the regex thread, Goyvaerts' expensive tool seems to rely on regex exclusively, while TextPipe (even more expensive) does a combination of pre-configured scripting steps and then regex. But you will not buy that latter tool, not only for its price but particularly because it will very probably not be as flexible/moldable as you will need it to be, and then, at the latest, you will feel remorse about the price (it's on Bits here and there, so for somebody really interested but not needing it immediately...).

I script those things, introducing the necessary regexes wherever needed, and with lots of logical branching wherever needed; i.e. I don't try to anticipate too much "regular" construction or too many foreseeable combinations, but I try to identify every element's content one by one, so as to make the extraction as "context-proof" as it gets.
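A sketch of that element-by-element style (pattern strings and field names are invented for illustration): one small regex per field, applied independently, with explicit branching as a fallback instead of one giant all-or-nothing pattern.

Code: [Select]
import re

def extract(source: str) -> dict:
    record = {}

    m = re.search(r"<h1[^>]*>(.*?)</h1>", source, re.S)
    record["title"] = m.group(1).strip() if m else None

    # The price appears in either of two layouts on this hypothetical
    # site; branch between them instead of forcing both into one regex.
    m = re.search(r'class="price"[^>]*>([^<]+)<', source)
    if not m:
        m = re.search(r"Price:\s*([\d.,]+\s*\S+)", source)
    record["price"] = m.group(1).strip() if m else None

    return record

print(extract('<h1>Old bike</h1><span class="price">40 EUR</span>'))
# {'title': 'Old bike', 'price': '40 EUR'}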

Since you obviously program: does the language you use to "write stuff" lend itself to scripting, too? You see, I strongly advocate doing it yourself instead of using some tool (I don't have any experience with doing it all in the Firefox-and-other-browser extensions Shades mentioned, so peeking into their respective possibilities is possibly a good idea indeed). That being said, there are some dedicated scraping tools* whose names I don't remember, but which I remember to be quite expensive; they are more or less like TextPipe but more specialized, i.e. they have been conceived for analyzing and retrieving the content of web pages in particular, and they all follow the paradigm "scripting over trying to do it all with regex; use regex only for the tasks within the script where it displays its strengths". It's all about making your scraping more or less "robust", i.e. insensitive to minor variants in the web pages' source code.

*: I don't mean web-site crawlers here, but programs which allow for running scripts on text files (and which very probably also help with writing those scripts in the first place); some of them will also help with getting the web page's source code into the file which is then analyzed and processed, but that's not their main purpose. Crawlers, in contrast, are specialized in following links ("scraping the site"), and the better ones let you program/script how they do that and which links to follow; they aren't meant for then analyzing the resulting text bodies.
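To make that distinction concrete, a minimal crawler sketch (the start URL is hypothetical; requests and BeautifulSoup again): it does nothing but follow same-site links and collect page sources, and analyzing those sources afterwards is a separate job of the kind described above.

Code: [Select]
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup

start = "https://example.com/"  # hypothetical site
host = urlparse(start).netloc
seen, queue, pages = set(), [start], {}

while queue and len(pages) < 50:  # small safety limit
    url = queue.pop()
    if url in seen:
        continue
    seen.add(url)
    html = requests.get(url, timeout=10).text
    pages[url] = html  # keep the source for later analysis
    for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        link = urljoin(url, a["href"])
        if urlparse(link).netloc == host:  # stay on the same site
            queue.append(link)

print(f"Fetched {len(pages)} pages")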

Since I didn't want to spend the money for TextPipe, had some scripting knowledge and was willing to learn the necessary amount of regex (as said, you had better do the bulk of your work by scripting anyway, so the regex you must learn isn't as high-brow as it is for people who see regex as an art form and strive to do everything possible in that special query language), I always do it all "manually" and don't see any need to buy/learn/use some special tool anymore. As soon as you have a script, you have logic, which means you can check for things or for their absence (or even for variants, by regex then), and that means you can have your script react accordingly and smoothly. Trying to do similar things in e.g. TextPipe is probably possible, but I see lots of fuss arising in trying to get the same results by "clicking them together", or else you end up applying some (other) scripting language within such a tool in order to get to your ends, which again raises the question of why you should use such a tool to begin with.

More advice is welcome if, which is possible, I have overlooked some useful aspects. As said, Shades' advice to use browser add-ins for faster element grouping/identification is good advice; there are also special editors for that (though they aren't necessary), for example Code Lobster (here and there available as a freebie), and some regular text editors have such functionality, too. I don't rely exclusively on these, though: I regularly copy the relevant element groups out into multiple new, smaller files, in order to have them isolated, as in the sketch below. Everything that battles complexity is a good thing, as long as it doesn't withhold relevant info at the same time.
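For that last habit, a tiny sketch (the input file name and the "listing" class are invented): it dumps every matching block from the saved page into its own numbered file, so each element group can be inspected in isolation.

Code: [Select]
from pathlib import Path
from bs4 import BeautifulSoup

soup = BeautifulSoup(Path("page.html").read_text(encoding="utf-8"),
                     "html.parser")
for i, block in enumerate(soup.find_all("div", class_="listing"), start=1):
    Path(f"listing_{i:03}.html").write_text(block.prettify(),
                                            encoding="utf-8")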

Contro

  • Supporting Member
  • Joined in 2007
  • Posts: 3,940
Quote from: Shades
Web development plugin(s) for your favorite browser. That should show you lots of content and how it is linked. [...]
I am trying, Shades. Even when an in-person forum takes place in my city, like the last TLP2k17 Innova.
I suppose my problem is the same one everybody has when using these tools only once or twice a year....

The memory! ;D

Contro

  • Supporting Member
  • Joined in 2007
  • Posts: 3,940
Quote from: ital2
@tomos: Well, that's called scraping; as for the web-development browser plugins mentioned by Shades [...] they make it easy to identify the elements, which is the very first step.
I have seen some tools for this, but the ones I have seen only search or read the currently opened web page, not the whole site.
For instance, my blog has several pages; I can search within each page, but can I search all the pages at the same time?


Quote from: ital2
Then, of course, you will want to retrieve those elements and their respective, individual content [...] That's why they say you must do some scripting, or must have some tool do that scripting for you.

I only want to explore my blogs, so I think that is no problem.

Quote from: ital2
As I said in the regex thread, Goyvaerts' expensive tool seems to rely on regex exclusively, while TextPipe (even more expensive) does a combination of pre-configured scripting steps and then regex. [...]

I didn't know you could search inside a web page with regex tools like TextPipe or others.


Quote from: ital2
I script those things, introducing the necessary regexes wherever needed [...] so as to make the extraction as "context-proof" as it gets.

Interesting

Quote from: ital2
Since you obviously program: does the language you use to "write stuff" lend itself to scripting, too? [...] It's all about making your scraping more or less "robust", i.e. insensitive to minor variants in the web pages' source code.

I would like to know the names of the free ones for searching the web. Do you know any?


Quote from: ital2
*: I don't mean web-site crawlers here, but programs which allow for running scripts on text files [...]

I want the names.

Quote from: ital2
Since I didn't want to spend the money for TextPipe [...] I always do it all "manually" and don't see any need to buy/learn/use some special tool anymore. [...]

Good free alternatives to TextPipe?

Quote from: ital2
More advice is welcome [...] there are also special editors for that (though they aren't necessary), for example Code Lobster [...]

Running to try Code Lobster, of course.

Best Regards


Contro

  • Supporting Member
  • Joined in 2007
  • Posts: 3,940
 ;D

Installed Code Lobster.

How can I search or see all the code of one of my blogs? For example, http://ingenierosten...rife.blogspot.com.es
 :-*

tomos

  • Charter Member
  • Joined in 2006
  • Posts: 11,959
Quote from: ital2
@tomos: [...]
:tellme: I'm not in this thread :huh:
well,
I wasn't before now anyways ;-)
Tom

Contro

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 3,940
    • View Profile
    • Donate to Member
Quote from: tomos
:tellme: I'm not in this thread :huh:
well, I wasn't before now anyways ;-)
I am so sad. Some day I want to be tomos  :-*