

RegEx Vicinity Search / Regular Expressions: Search string a Near string b


IainB:
...Not really, it's been proven that people very rarely bother to use a forum search engine prior to just firing off a new thread. ...
-4wd (October 17, 2018, 09:01 PM)
--- End quote ---
I was unaware that this was "proven", but it certainly looks that way, judging from what I've seen, anyway. However, even if you used (say) a site: search (as opposed to the cruddy internal search tool), finding and consolidating relevant/related material in the discussion threads is still likely to be an uphill battle and somewhat hit-or-miss.

DCF is often a veritable mine of useful information, with stuff to be found on various subject categories in DCF discussion threads, but a lot of it seems to be buried in or scattered across threads broken into multiple micro-sub-categories. Occasionally, I try to pull these bits and pieces together into specific higher-level category threads to provide a sort of indexed experiential knowledge-point on a specific subject category that I am interested in. The trouble there is that I am the sole author/editor of the index I created, and - as things stand - it can't be edited in a shared or collaborative fashion by other DCF members. This is a spotty, unreliable and inefficient way of accumulating/curating a knowledge base category.
Ideally, we would use a Wiki for those...    :o

ital2:
EDIT OF THE ABOVE:

matchtext
wanted(text) (which is not included in the match)
notwanted(text) (which is not included in the match)


BETTER:

You have your:

- matchstring
i.e. the string (IF it matches, of course, which the regex engine checks first) that is to be declared a match IF the lookbehind/lookahead condition(s) is/are met (positive lookaround) or NOT met (negative lookaround)

- wantedstring(s)
i.e. the string(s), not included in the match, which MUST be there (before or after the matchstring, respectively) for the (matching) matchstring to be declared a match

- unwantedstring(s)
i.e. as above, but if these strings are found there, your (otherwise matching) matchstring is declared UNsuccessful by the regex engine (i.e. the unwantedstring "must not" be there, in the sense of "is not allowed to be there"; I insist on this because in some other languages "must not" is a synonym for "may or may not be there", i.e. "is optional", and that's NOT the case here)
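A minimal Python sketch of the three roles, using the standard re module. The sample text and the strings "Mr. ", "Smith" and " Jr" are made up for illustration: "Smith" is the matchstring, "Mr. " the wantedstring (positive lookbehind) and " Jr" the unwantedstring (negative lookahead).

```python
import re

# Hypothetical example:
#   matchstring    : Smith
#   wantedstring   : "Mr. " before it  -> positive lookbehind (?<=...)
#   unwantedstring : " Jr" after it    -> negative lookahead  (?!...)
pattern = re.compile(r"(?<=Mr\. )Smith(?! Jr)")

text = "Mr. Smith, Mrs. Smith, Mr. Smith Jr and Smith"

for m in pattern.finditer(text):
    # Only the matchstring itself ends up in the match.
    print(m.start(), repr(m.group()))
```

This prints a single line, 4 'Smith': "Mrs. Smith" lacks the wanted string, "Mr. Smith Jr" has the unwanted one, and the last "Smith" has neither.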

Also, remember that a pattern can contain a negative or positive lookbehind AND a negative or positive lookahead at the same time, provided the regex engine in question supports such combinations; and of course the 3 different strings are independent of each other. Conceptually, the engine
- tries to match the matchstring
- if that's successful, it checks the lookbehind
- if that succeeds (for a negative lookbehind: the unwanted string is NOT found), it checks the lookahead
- if that succeeds too (for a negative lookahead: again, the unwanted string is NOT found), the match is deemed successful, and the match will be ONLY the matchstring. That's also why the lookbehind/lookahead parentheses () are NOT counted as groups for replacements: after serving to validate or invalidate the matchstring match, they go back into oblivion, as far as regex is concerned.
(In practice the engine works through the pattern from left to right, so a lookbehind written before the matchstring is checked first at each position; since lookarounds consume no characters, the outcome is the same.)
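A second sketch, with the same made-up sample, shows the replacement side in Python's re: the lookaround's own parentheses get no group number, so \1 refers to (Smith). (A capturing group written inside a lookaround would still be numbered in Python; it is only the lookaround parentheses themselves that are not.)

```python
import re

text = "Mr. Smith, Mrs. Smith, Mr. Smith Jr and Smith"

# (?<=...) and (?!...) are not capturing groups, so (Smith) is group 1.
pattern = re.compile(r"(?<=Mr\. )(Smith)(?! Jr)")
print(pattern.groups)  # 1

# In the replacement, \1 is the matchstring; the lookaround text is not
# part of the match, so it is neither replaced nor available as a group.
print(pattern.sub(r"[\1]", text))  # Mr. [Smith], Mrs. Smith, Mr. Smith Jr and Smith
```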

I'm insisting on these facts because lookbehinds/lookaheads are actually extremely simple, so the obvious difficulty people have with them must come from misconceptions about them.

/END OF EDIT



EDIT: "!" is the logical "not", so it's logical that this character is used in negative lookarounds. The "?", in regex in general, stands for "0 or 1 (!) occurrences of the preceding (!) element" (directly after an opening parenthesis, as in "(?=", "(?!", "(?<=" and "(?<!", it instead marks the group as a special construct such as a lookaround). To distinguish single line breaks from double or multiple ones, you'll use a complete lookaround, e.g. replace (?<!\n)\n(?!\n) by \n\n, which replaces only the single line breaks with two and leaves double or multiple ones alone. This is useful e.g. for normalizing web downloads, where the title of the next paragraph often clings to the previous one. Of course, you could add a maximum line length condition, so that a single \n is only matched (and changed) when the line concerned stays within a given character limit; that way the code won't affect (most) regular sub-paragraph breaks within longer, regular paragraphs of the source material.
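Both ideas can be sketched in Python, assuming Unix "\n" line endings (text with "\r\n" would need adjusting first). The function names and the max_len default of 40 are hypothetical. Because Python's re only accepts fixed-width lookbehinds, the second function captures the short line and writes it back, instead of testing its length in a lookbehind.

```python
import re


def double_single_breaks(text: str) -> str:
    """Replace every single newline with two; leave runs of 2+ newlines alone."""
    # (?<!\n)  the newline is not preceded by another newline
    # \n       the matchstring: one line break
    # (?!\n)   the newline is not followed by another newline
    return re.sub(r"(?<!\n)\n(?!\n)", "\n\n", text)


def double_breaks_after_short_lines(text: str, max_len: int = 40) -> str:
    """Double a single newline only if the line it ends has at most max_len chars."""
    # With re.MULTILINE, ^ matches at the start of every line. The captured
    # line has at least one character, so the newline cannot be preceded by
    # another newline; (?!\n) excludes double breaks.
    pattern = re.compile(r"^([^\n]{1,%d})\n(?!\n)" % max_len, re.MULTILINE)
    return pattern.sub(r"\1\n\n", text)


sample = "Title\nFirst paragraph.\n\nSecond title\nMore text."
print(repr(double_single_breaks(sample)))
```

The print shows 'Title\n\nFirst paragraph.\n\nSecond title\n\nMore text.': the two single breaks are doubled and the existing double break is left as it was. The second function would additionally leave alone the break after any line longer than max_len characters.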

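(EDIT: Don't forget the "<" of the lookbehind: without it, "(?!\n)" is a lookahead and checks what follows the current position rather than what precedes it, and in real life there would often be other string parts, even before the lookbehind, needing disambiguation.)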
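A quick Python check (sample text made up) of what happens when the "<" is forgotten: the lookahead (?!\n) then sits at the same position as the \n that follows it, always sees that newline, and so the pattern can never match.

```python
import re

text = "Title\nParagraph\n\nNext"

# With the "<": a real lookbehind, looking at the character BEFORE the newline.
with_lt = re.findall(r"(?<!\n)\n(?!\n)", text)

# Without the "<": (?!\n) is a lookahead at the position of the newline
# itself, so it always fails.
without_lt = re.findall(r"(?!\n)\n(?!\n)", text)

print(len(with_lt), len(without_lt))  # 1 0
```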

ital2:
"Ideally, we would use a Wiki for those...". Not.
