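…
#!/usr/bin/perl
use strict;
use warnings;

# Slurp each input file whole instead of reading it line by line.
local $/ = undef;

my @ical = ();
while (<>) {
    # Calendar header: BEGIN:VCALENDAR through METHOD:PUBLISH.
    my @header = /(BEGIN:VCALENDAR.*?METHOD:PUBLISH\r?\n)/gs;
    # One string per event, BEGIN:VEVENT through END:VEVENT.
    my @events = /(BEGIN:VEVENT.*?END:VEVENT\r?\n)/gs;
    my $footer = "END:VCALENDAR\n";
    push(@ical, @header);
    # Keep only the minus tides: low tides with a negative height.
    push(@ical, grep { /SUMMARY:Low Tide.*-\d/ } @events);
    push(@ical, $footer);
}
# Now @ical holds the calendar header, then one string per minus-tide
# event (just its BEGIN:VEVENT through END:VEVENT lines), then the footer.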
Given a block of text like the following:
BEGIN:VEVENT
GEO:48.1667\;-123.1167
TRANSP:TRANSPARENT
LOCATION:Dungeness\, Washington
DTSTART:20080608T131700Z
UID:D56BE4D5-F9A3-4026-B99A-C2D979639220
SUMMARY:High Tide 1.83 meters
DTSTAMP:20080617T002129Z
END:VEVENT
BEGIN:VEVENT
GEO:48.1667\;-123.1167
TRANSP:TRANSPARENT
LOCATION:Dungeness\, Washington
DTSTART:20081012T013000Z
UID:373CB6BD-F894-4826-8D04-6683AADFB4C4
SUMMARY:Sunset
DTSTAMP:20080617T002129Z
END:VEVENT
BEGIN:VEVENT
GEO:48.1667\;-123.1167
TRANSP:TRANSPARENT
LOCATION:Dungeness\, Washington
DTSTART:20080125T035500Z
UID:1EAC4C71-23B1-456D-9302-1436E407B84E
SUMMARY:Moonrise
DTSTAMP:20080617T002129Z
END:VEVENT
BEGIN:VEVENT
GEO:48.1667\;-123.1167
TRANSP:TRANSPARENT
LOCATION:Dungeness\, Washington
DTSTART:20080920T081500Z
UID:CF306FB5-480D-433D-9E2C-34569BE0A654
SUMMARY:Low Tide -0.33 meters
DTSTAMP:20080617T002129Z
END:VEVENT
remove every VEVENT whose SUMMARY is Moonrise, Sunrise, Sunset, or High Tide,
leaving just the low tides.
This (?s)BEGIN:VEVENT.*?END:VEVENT will find just the VEVENT items.
This (?s)BEGIN:VEVENT.*?SUMMARY:Low Tide.*?END:VEVENT gets too much: it grabs everything from the first instance of BEGIN:VEVENT to the END:VEVENT after the Low Tide, no matter how many other events get collected. Looks like I need a look-behind: find the END:VEVENT and the Low Tide that came just before it, and then everything back to the BEGIN:VEVENT.
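For comparison, here is a rough sketch of how the single-pattern version could be kept inside one event. It uses a negative lookahead rather than a look-behind, which is a swapped-in technique and not what I ended up using. The sample text is a cut-down, hypothetical version of the events above, and the variable names are mine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: two stripped-down events in the style of the excerpt.
my $ical = join "\n",
    'BEGIN:VEVENT',
    'SUMMARY:High Tide 1.83 meters',
    'END:VEVENT',
    'BEGIN:VEVENT',
    'SUMMARY:Low Tide -0.33 meters',
    'END:VEVENT',
    '';

# Naive pattern: the lazy .*? is still allowed to run across an END:VEVENT,
# so the match starts at the first BEGIN:VEVENT in the text.
my @naive = $ical =~ /(BEGIN:VEVENT.*?SUMMARY:Low Tide.*?END:VEVENT)/gs;

# Tempered pattern: (?:(?!END:VEVENT).)*? advances one character at a time
# but refuses to step onto the start of END:VEVENT, so a match cannot
# spill out of the event it started in.
my @tempered = $ical =~
    /(BEGIN:VEVENT(?:(?!END:VEVENT).)*?SUMMARY:Low Tide.*?END:VEVENT)/gs;

# Count how many events each first match swallowed.
my $naive_events    = () = $naive[0]    =~ /BEGIN:VEVENT/g;
my $tempered_events = () = $tempered[0] =~ /BEGIN:VEVENT/g;

print "naive match spans $naive_events event(s)\n";
print "tempered match spans $tempered_events event(s)\n";
```

On this sample the naive match contains both events, while the tempered match contains only the low tide. It works, but it is a lot harder to read than splitting the job into two steps.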
And ideally, I just pull out the minus tides, especially if I have to go that far (anything that includes a ferry ride needs to be carefully considered).
After a lot of back and forth with a Perl guru (I really get tripped up by this stuff), it was clear that I was trying to do too much in one pass: better to pull out the events first, then extract the ones we want, all with the default input record separator (the line break) turned off. So what I ended up with appears below the fold. I had the regex right (those have always been my bête noire) but I had no idea what to do with what I was getting.
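The two-pass idea can be tried without a calendar file at hand. The following is a minimal sketch under my own assumptions: minus_tides is a hypothetical helper name, the events are trimmed to their SUMMARY lines, and the 0.52 meter low tide is made-up data added only to show a low tide that should be skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the minus-tide events in a chunk of iCal text.
sub minus_tides {
    my ($text) = @_;
    # Pass 1: split the text into whole events, one string each.
    my @events = $text =~ /(BEGIN:VEVENT.*?END:VEVENT\r?\n?)/gs;
    # Pass 2: keep only low tides whose height is negative.
    return grep { /^SUMMARY:Low Tide\s+-\d/m } @events;
}

# Hypothetical sample data; only the -0.33 tide comes from the excerpt.
my $sample = join "\n",
    'BEGIN:VEVENT', 'SUMMARY:High Tide 1.83 meters', 'END:VEVENT',
    'BEGIN:VEVENT', 'SUMMARY:Sunset',                'END:VEVENT',
    'BEGIN:VEVENT', 'SUMMARY:Low Tide 0.52 meters',  'END:VEVENT',
    'BEGIN:VEVENT', 'SUMMARY:Low Tide -0.33 meters', 'END:VEVENT',
    '';

my @kept = minus_tides($sample);
print @kept;
```

Each pattern stays simple because neither one has to know about the other: the first only finds event boundaries, and the second only looks at one event at a time.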