Given a block of text like the following:
BEGIN:VEVENT
GEO:48.1667\;-123.1167
TRANSP:TRANSPARENT
LOCATION:Dungeness\, Washington
DTSTART:20080608T131700Z
UID:D56BE4D5-F9A3-4026-B99A-C2D979639220
SUMMARY:High Tide 1.83 meters
DTSTAMP:20080617T002129Z
END:VEVENT
BEGIN:VEVENT
GEO:48.1667\;-123.1167
TRANSP:TRANSPARENT
LOCATION:Dungeness\, Washington
DTSTART:20081012T013000Z
UID:373CB6BD-F894-4826-8D04-6683AADFB4C4
SUMMARY:Sunset
DTSTAMP:20080617T002129Z
END:VEVENT
BEGIN:VEVENT
GEO:48.1667\;-123.1167
TRANSP:TRANSPARENT
LOCATION:Dungeness\, Washington
DTSTART:20080125T035500Z
UID:1EAC4C71-23B1-456D-9302-1436E407B84E
SUMMARY:Moonrise
DTSTAMP:20080617T002129Z
END:VEVENT
BEGIN:VEVENT
GEO:48.1667\;-123.1167
TRANSP:TRANSPARENT
LOCATION:Dungeness\, Washington
DTSTART:20080920T081500Z
UID:CF306FB5-480D-433D-9E2C-34569BE0A654
SUMMARY:Low Tide -0.33 meters
DTSTAMP:20080617T002129Z
END:VEVENT
remove all the SUMMARY:Moonrise|Sunrise|High Tide VEVENT mentions, leaving just the low tides.
This (?s)BEGIN:VEVENT??.*?END:VEVENT will find just the VEVENT items.
This (?s)BEGIN:VEVENT.*?SUMMARY:Low Tide.*?END:VEVENT gets too much: it grabs everything from the first instance of BEGIN:VEVENT to the END:VEVENT after the Low Tide, no matter how many other events get collected. Looks like I need a look-behind: find the END:VEVENT and the Low Tide that came just before it, and then everything back to the BEGIN:VEVENT.
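To make the over-matching concrete, here is a minimal Python sketch (my choice of language for illustration, not the tool used in the post). It runs the pattern from the text against a cut-down two-event sample, then tries one possible fix: a negative lookahead that stops the match from crossing an END:VEVENT before it reaches the Low Tide summary. That is a swapped-in technique, not the look-behind mused about above, and the variable names are made up for the example.

```python
import re

# Cut-down sample in the same shape as the calendar data above.
sample = (
    "BEGIN:VEVENT\n"
    "SUMMARY:Sunset\n"
    "END:VEVENT\n"
    "BEGIN:VEVENT\n"
    "SUMMARY:Low Tide -0.33 meters\n"
    "END:VEVENT\n"
)

# The pattern from the text. The lazy .*? is anchored at the FIRST
# BEGIN:VEVENT and simply stretches until "SUMMARY:Low Tide" shows up.
original = re.compile(r"(?s)BEGIN:VEVENT.*?SUMMARY:Low Tide.*?END:VEVENT")

# One possible fix (an assumption, not the author's solution): each
# character consumed before the summary must not start an END:VEVENT,
# so the match can never span more than one event.
tempered = re.compile(
    r"(?s)BEGIN:VEVENT(?:(?!END:VEVENT).)*?SUMMARY:Low Tide.*?END:VEVENT"
)

too_much = original.search(sample).group(0)
just_right = tempered.search(sample).group(0)

print(too_much.count("BEGIN:VEVENT"))   # how many events the original swallowed
print(just_right)
```

The lazy quantifier only controls where a match ends, not where it starts, which is why the original pattern drags in the Sunset event as well.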
And ideally, I'd pull out just the minus tides, especially if I have to go that far (anything that includes a ferry ride needs to be carefully considered).
After a lot of back and forth with a Perl guru (I really get tripped up by this stuff), it was clear that I was trying to do too much in one pass: better to pull out the events, then extract the ones we want, all with the default delimiter/linebreak turned off. So what I ended up with appears below the fold. I had the regex right (those have always been my bête noire) but I had no idea what to do with what I was getting.
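For readers who want the shape of the two-pass idea without the Perl, here is a rough Python sketch. It is not the script from this post: the function name low_tide_events and the minus_only flag are hypothetical, and the third event in the sample (a positive low tide) is invented purely to show the filter working. It also assumes a minus tide always shows up as "Low Tide -" in the SUMMARY line, as in the data above.

```python
import re

def low_tide_events(ical_text, minus_only=False):
    """Return the VEVENT blocks whose SUMMARY line is a low tide."""
    # Pass 1: split the text into individual events. (?s) lets . match
    # newlines, so each lazy match covers exactly one BEGIN...END block.
    events = re.findall(r"(?s)BEGIN:VEVENT.*?END:VEVENT", ical_text)
    # Pass 2: keep only the events we want, testing one event at a time.
    wanted = r"^SUMMARY:Low Tide -" if minus_only else r"^SUMMARY:Low Tide"
    return [e for e in events if re.search(wanted, e, re.MULTILINE)]

# Two summaries from the data above, plus one made-up positive low tide.
sample = (
    "BEGIN:VEVENT\nSUMMARY:High Tide 1.83 meters\nEND:VEVENT\n"
    "BEGIN:VEVENT\nSUMMARY:Low Tide -0.33 meters\nEND:VEVENT\n"
    "BEGIN:VEVENT\nSUMMARY:Low Tide 0.52 meters\nEND:VEVENT\n"  # hypothetical
)

all_lows = low_tide_events(sample)
minus_lows = low_tide_events(sample, minus_only=True)
print("\n".join(minus_lows))
```

Because each event is tested on its own, the second pattern can stay simple and there is no way for it to reach across event boundaries.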