Topic: Nested Matches (Read 5204 times)

allen

  • Charter Member
  • Joined in 2006
  • Posts: 1,206
Nested Matches
« on: February 04, 2006, 03:37 PM »
For my web-based PHP regex find/replace doohickey, I need to match individual back references (capturing groups) and wrap a tag around each one so it stands apart from the rest of the match and can be colored individually. At first this seems easy enough; however, not every part of a regex match is going to be inside a back reference. So it's necessary to replace the back reference, and only the back reference, while preserving the context of the match. For example, if I were to search the text

fish this fish fish

looking for

.*?(?<=this )(fish).*

I'd match everything, capturing the second instance of "fish" into the back reference. I can't simply take the match and run a replace for "fish" to apply the highlighting, because then I'd end up with three highlighted "fish", two of which weren't supposed to be. Nor can I simply return the back reference with the markup, because that would drop the parts of the match that aren't in the back reference.
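A minimal PHP sketch of the example above shows where the back reference lands and why a plain replace over-highlights. It adds the `/` delimiters that `preg_match` requires, and the bare `<span>` markup is only a stand-in for the real highlighting tags.

```php
<?php
// Subject and pattern from the example; PHP patterns need delimiters.
$text = 'fish this fish fish';
$pattern = '/.*?(?<=this )(fish).*/';

// With PREG_OFFSET_CAPTURE each entry is [captured text, byte offset].
preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE);
$whole = $m[0][0]; // the whole match: 'fish this fish fish'
$group = $m[1];    // the back reference: ['fish', 10]

// Naive approach: replacing the captured text inside the match also wraps
// the first and third "fish", which were never captured.
$naive = str_replace($group[0], '<span>' . $group[0] . '</span>', $whole);
echo substr_count($naive, '<span>'), "\n"; // 3
```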

My initial solution was to run the original search pattern over the match again to get the back references, using an extra flag (PREG_OFFSET_CAPTURE) to have it return the offset of each back reference. That gives me the position of each back reference within the string, and its length can be taken from the captured text. Working backwards, so that inserting markup doesn't disturb the positions still to be processed, it wraps the back references without losing context or data. Perfect.
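The offset-based approach can be sketched roughly as follows. This is an illustration, not the original code: `wrap_groups_backwards` and the `hlt1`, `hlt2`, ... class names are hypothetical, and it assumes the groups do not nest and appear left to right in the string, so processing them from the last group to the first leaves every earlier offset untouched.

```php
<?php
// Hypothetical helper: wrap each capturing group of the first match in a
// <span class="hltN">. Assumes non-nested groups that appear left to right,
// so going from the last group to the first keeps earlier offsets valid.
function wrap_groups_backwards(string $text, string $pattern): string {
    if (!preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        return $text;
    }
    for ($i = count($m) - 1; $i >= 1; $i--) {
        [$str, $offset] = $m[$i];
        if ($offset < 0) {
            continue; // this group did not take part in the match
        }
        // Insert the closing tag first: adding text at the end offset
        // does not move the start offset.
        $text = substr_replace($text, '</span>', $offset + strlen($str), 0);
        $text = substr_replace($text, "<span class=\"hlt$i\">", $offset, 0);
    }
    return $text;
}

echo wrap_groups_backwards('fish this fish fish', '/.*?(?<=this )(fish).*/'), "\n";
// fish this <span class="hlt1">fish</span> fish
```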

. . . until back references are nested. In this example:

(.*?(?<=this )(fish).*)

back reference 1 would be "fish this fish fish" and back reference 2 would be "fish". This is where the problem surfaces.

If I wrap back reference 2 in the markup first, then when I apply back reference 1's markup the end tag lands in the wrong place: the string has grown, so the originally calculated offset and length no longer apply. If I wrap back reference 1 first, the same thing happens to back reference 2. I'm sure there's some obvious, simple solution I'm overlooking, having exhausted a bunch of complex attempts to compensate for it. Any fresh perspectives on the best way to mark up nested groups while preserving the integrity of the returned text?
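To make the failure concrete, here is a small PHP sketch that wraps back reference 2 first and then closes back reference 1 at the end offset recorded against the original string. The short `<b>` and `<i>` tags are hypothetical stand-ins for the real markup, chosen to keep the byte arithmetic readable.

```php
<?php
// Nested groups: group 1 is the whole match, group 2 the second "fish".
$text = 'fish this fish fish';
preg_match('/(.*?(?<=this )(fish).*)/', $text, $m, PREG_OFFSET_CAPTURE);

// End offsets computed against the ORIGINAL string.
$end1 = $m[1][1] + strlen($m[1][0]); // 19
$end2 = $m[2][1] + strlen($m[2][0]); // 14

// Wrap group 2 first (closing tag, then opening tag) ...
$out = substr_replace($text, '</b>', $end2, 0);
$out = substr_replace($out, '<b>', $m[2][1], 0);

// ... then close group 1 at its original end. The string has grown by
// strlen('<b></b>') = 7 bytes, so offset 19 now falls inside "</b>".
$out = substr_replace($out, '</i>', $end1, 0);
echo $out, "\n"; // fish this <b>fish</</i>b> fish
```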

allen

Re: Nested Matches
« Reply #1 on: February 07, 2006, 03:42 PM »
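For anyone interested, the problem is solved. The trick is to record the position of every opening and closing tag against the original string, sort those positions, and then insert the tags in order, adding the length of the tags already inserted to each later position. Here's the function that handles highlighting: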

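function hltr($text,$find) {
  // PREG_OFFSET_CAPTURE returns each group as array(text, byte offset in the
  // original $text); PREG_SET_ORDER groups the results per match, so
  // $hlight[0] holds every group of the first match only.
  preg_match_all($find,$text,$hlight,PREG_OFFSET_CAPTURE|PREG_SET_ORDER);
  if ( is_array($hlight) && count($hlight) > 0 ) {
    $hlight = $hlight[0];
    $points = array();
    foreach ( $hlight as $num => $match ) {
      // Skip group 0 (the whole match), groups that captured nothing, and
      // groups that did not take part in the match (offset -1).
      if ( $num > 0 && $match[1] >= 0 && $match[0] !== '' ) {
        $end = $match[1]+strlen($match[0]);
        // array(offset, kind, group, tag); kind 0 = closing, 1 = opening
        $points[] = array($match[1], 1, $num, "<span class=\"hlt$num\">");
        $points[] = array($end, 0, $num, '</span>');
      }
    }
    // Sort by offset in the original string. At equal offsets, closing tags
    // come before opening tags, outer groups open first and inner groups
    // close first, so nested spans stay properly nested.
    usort($points, function ($a, $b) {
      if ( $a[0] != $b[0] ) return $a[0] - $b[0];
      if ( $a[1] != $b[1] ) return $a[1] - $b[1];
      return $a[1] ? $a[2] - $b[2] : $b[2] - $a[2];
    });
    // Insert left to right; $count is the number of bytes already inserted,
    // which shifts every later offset to the right.
    $count = 0;
    foreach ( $points as $point ) {
      $text = substr_replace($text,$point[3],$count+$point[0],0);
      $count = $count + strlen($point[3]);
    }
  }
  return('<strong class="result">'.$text.'</strong>');
}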
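A variation on the same idea, sketched here as an alternative rather than the function above: instead of keeping a running `$count`, insert the tags from right to left. An insertion never moves anything to its left, so every offset recorded against the original string stays valid. `highlight_groups` is a hypothetical name; like `hltr`, it only marks up the first match, and it assumes the pattern has no named groups (PHP lists those twice in the result array).

```php
<?php
// Hypothetical helper: wrap every capturing group of the first match in a
// <span class="hltN">, inserting tags from right to left.
function highlight_groups(string $text, string $pattern): string {
    if (!preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        return $text;
    }
    $events = []; // each entry: [offset, kind (1 = open, 0 = close), group, tag]
    foreach ($m as $i => $g) {
        [$str, $offset] = $g;
        // Skip the whole match, unmatched groups (offset -1) and empty captures.
        if ($i === 0 || $offset < 0 || $str === '') {
            continue;
        }
        $events[] = [$offset, 1, $i, "<span class=\"hlt$i\">"];
        $events[] = [$offset + strlen($str), 0, $i, '</span>'];
    }
    // Largest offset first. Tags inserted at the same offset end up in
    // reverse insertion order, so insert opening tags (inner group first)
    // before closing tags (outer group first). At a shared offset the output
    // then has closing tags (innermost first) followed by opening tags
    // (outermost first).
    usort($events, function (array $a, array $b): int {
        if ($a[0] !== $b[0]) {
            return $b[0] <=> $a[0];
        }
        if ($a[1] !== $b[1]) {
            return $b[1] <=> $a[1];
        }
        return $a[1] === 1 ? $b[2] <=> $a[2] : $a[2] <=> $b[2];
    });
    foreach ($events as $e) {
        $text = substr_replace($text, $e[3], $e[0], 0);
    }
    return $text;
}

echo highlight_groups('fish this fish fish', '/(.*?(?<=this )(fish).*)/'), "\n";
// <span class="hlt1">fish this <span class="hlt2">fish</span> fish</span>
```

The trade-off is small: the running-offset version inserts left to right and must track how far each later position has moved, while the right-to-left version needs no bookkeeping but has to reverse the tie-breaking order at shared offsets.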