adventures in regular expressions

Given a block of text like the following:

BEGIN:VEVENT
GEO:48.1667\;-123.1167
TRANSP:TRANSPARENT
LOCATION:Dungeness\, Washington
DTSTART:20080608T131700Z
UID:D56BE4D5-F9A3-4026-B99A-C2D979639220
SUMMARY:High Tide 1.83 meters
DTSTAMP:20080617T002129Z
END:VEVENT
BEGIN:VEVENT
GEO:48.1667\;-123.1167
TRANSP:TRANSPARENT
LOCATION:Dungeness\, Washington
DTSTART:20081012T013000Z
UID:373CB6BD-F894-4826-8D04-6683AADFB4C4
SUMMARY:Sunset
DTSTAMP:20080617T002129Z
END:VEVENT
BEGIN:VEVENT
GEO:48.1667\;-123.1167
TRANSP:TRANSPARENT
LOCATION:Dungeness\, Washington
DTSTART:20080125T035500Z
UID:1EAC4C71-23B1-456D-9302-1436E407B84E
SUMMARY:Moonrise
DTSTAMP:20080617T002129Z
END:VEVENT
BEGIN:VEVENT
GEO:48.1667\;-123.1167
TRANSP:TRANSPARENT
LOCATION:Dungeness\, Washington
DTSTART:20080920T081500Z
UID:CF306FB5-480D-433D-9E2C-34569BE0A654
SUMMARY:Low Tide -0.33 meters
DTSTAMP:20080617T002129Z
END:VEVENT
I want to remove every VEVENT whose SUMMARY is Moonrise, Sunrise, Sunset, High Tide and so on, leaving just the low tides.

The pattern (?s)BEGIN:VEVENT.*?END:VEVENT finds just the individual VEVENT items: (?s) lets . match newlines, and the lazy .*? stops at the first END:VEVENT it reaches.
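
A minimal Perl sketch of that first pattern, run against a trimmed-down copy of two of the events above (only the SUMMARY lines are kept, an assumption to keep it short). The variable names are illustrative, not taken from the final script.

```perl
use strict;
use warnings;

# Two events from the sample, cut down to their SUMMARY lines.
my $text = <<'ICS';
BEGIN:VEVENT
SUMMARY:High Tide 1.83 meters
END:VEVENT
BEGIN:VEVENT
SUMMARY:Low Tide -0.33 meters
END:VEVENT
ICS

# (?s) lets . match newlines; the lazy .*? stops at the first END:VEVENT,
# so each match is exactly one event. With no capture group, //g in list
# context returns every complete match.
my @events = $text =~ /(?s)BEGIN:VEVENT.*?END:VEVENT/g;

printf "%d events found\n", scalar @events;
print "---\n$_\n" for @events;
```

Each element of @events is one BEGIN:VEVENT ... END:VEVENT block, which is what makes the later filtering step possible.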

The pattern (?s)BEGIN:VEVENT.*?SUMMARY:Low Tide.*?END:VEVENT grabs too much: it matches everything from the first BEGIN:VEVENT to the END:VEVENT after the Low Tide, no matter how many other events lie in between. The lazy .*? doesn't help, because the regex engine starts the match at the leftmost BEGIN:VEVENT it can and only then looks for the shortest way to finish. It looks like I need a look-behind: find the END:VEVENT and the Low Tide that came just before it, and then everything back to the BEGIN:VEVENT.
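
One way around this, sketched below as an alternative to a look-behind (a swapped-in technique, not what I ended up using), is a "tempered" token: (?:(?!END:VEVENT).)*? consumes one character at a time but refuses to step over an END:VEVENT, so a match can't leak into a neighbouring event. The sample is a shortened version of the events above, and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = <<'ICS';
BEGIN:VEVENT
SUMMARY:Sunset
END:VEVENT
BEGIN:VEVENT
SUMMARY:Moonrise
END:VEVENT
BEGIN:VEVENT
SUMMARY:Low Tide -0.33 meters
END:VEVENT
ICS

# Naive pattern: the engine tries the leftmost start position first, so the
# match begins at the FIRST BEGIN:VEVENT; laziness only shortens the end.
my ($naive) = $text =~ /(?s)(BEGIN:VEVENT.*?SUMMARY:Low Tide.*?END:VEVENT)/;

# Tempered token: each . is only consumed if END:VEVENT does not start there,
# so a match must find SUMMARY:Low Tide before its own event ends.
my ($single) = $text =~ /(?s)(BEGIN:VEVENT(?:(?!END:VEVENT).)*?SUMMARY:Low Tide.*?END:VEVENT)/;

# Count how many events the naive match swallowed.
my $count = () = $naive =~ /BEGIN:VEVENT/g;

print "naive match spans $count events\n";
print "tempered match:\n$single\n";
```

The tempered version works, but it is harder to read than splitting the text into events first and testing each one separately.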

And ideally, I'd pull out just the minus tides (low tides below zero), especially if I have to go that far: anything that includes a ferry ride needs to be carefully considered.
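
If matching a minus sign with a regex ever feels too blunt, a sketch like this one parses the height as a number instead, so the cutoff is easy to change (for example, $h < -0.5 for a trip that involves the ferry). The helper low_tide_height and the 0.41-meter event are made up for illustration, and the pattern assumes the SUMMARY format shown above ("Low Tide -0.33 meters").

```perl
use strict;
use warnings;

# Hypothetical helper: returns the height in meters from a low-tide event's
# SUMMARY line, or undef if the event is not a low tide.
sub low_tide_height {
    my ($event) = @_;
    return $event =~ /^SUMMARY:Low Tide (-?\d+(?:\.\d+)?) meters/m ? $1 : undef;
}

my @events = (
    "BEGIN:VEVENT\nSUMMARY:Low Tide 0.41 meters\nEND:VEVENT\n",   # invented
    "BEGIN:VEVENT\nSUMMARY:Low Tide -0.33 meters\nEND:VEVENT\n",
    "BEGIN:VEVENT\nSUMMARY:High Tide 1.83 meters\nEND:VEVENT\n",
);

# Keep only low tides whose height is below zero (the "minus tides").
my @minus_tides = grep {
    my $h = low_tide_height($_);
    defined $h && $h < 0;
} @events;

print scalar(@minus_tides), " minus tide(s)\n";
```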

After a lot of back and forth with a Perl guru (I really get tripped up by this stuff), it was clear that I was trying to do too much in one pass. It's better to pull out the events first and then extract the ones I want, all with Perl's input record separator ($/) undefined so the whole file is read as one string instead of line by line. What I ended up with appears below. I had the regex right (regexes have always been my bête noire), but I had no idea what to do with the matches I was getting.

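#!/usr/bin/perl
use strict;
use warnings;

# Slurp mode: each read from <> returns a whole file instead of one line.
local $/ = undef;

my @ical = ();
while (<>) {
        # Calendar header: BEGIN:VCALENDAR through the METHOD:PUBLISH line.
        my @header = /(BEGIN:VCALENDAR.*?METHOD:PUBLISH\r?\n)/gs;
        # Every complete event; the lazy .*? stops at the nearest END:VEVENT.
        my @events = /(BEGIN:VEVENT.*?END:VEVENT\r?\n)/gs;
        my $footer = "END:VCALENDAR\n";
        push(@ical, @header);
        # Keep only low tides with a negative height (the minus tides).
        push(@ical, grep { /SUMMARY:Low Tide.*-\d/ } @events);
        push(@ical, $footer);
}
# Now @ical holds the calendar header, the BEGIN:VEVENT through END:VEVENT
# text of each minus-tide event, and the closing END:VCALENDAR line.

print for (@ical);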

And now I have a reliable calendar of tide events that I can share.