art-talk weather forecast rides again!
Posted by rick Mon, 27 Feb 2006 00:17:00 GMT
To generate automated “7 day Art weather forecast” emails like these from the new AirSet art-talk calendar, I had to update my Ruby script. Here’s the new version (you’ll note that the bulk of the code deals with cleaning up bastardized input):
#!/usr/bin/ruby
require 'open-uri'
require 'rexml/document'
require 'rexml/xpath'
require 'time'
URL = "http://www.airset.com/syndicate/public/1391/week.xml"
# Decode HTML entities, strip tags and replace UTF-8 punctuation with ASCII.
# The entity names &apos;, &#39; and &nbsp; are assumed: the listing showed
# these patterns already decoded.
def cleanup(text, keep_stars = false)
  result = text.
    gsub(/&amp;/, '&').
    gsub(/&[lr]?quot;/, '"').
    gsub(/&apos;/, "'").
    gsub(/&#39;/, "'").
    gsub(/&gt;/, '>').
    gsub(/&lt;/, '<').
    gsub(/&nbsp;/, ' ').
    gsub(%r{</?[^>]+>}, '').
    gsub(/\*\s*\*/, '**').
    gsub(/\342\200\235/, '"').
    gsub(/\342\200\234/, '"').
    gsub(/\342\200\231/, "'").
    gsub(/\342\200\223/, " -- ").
    gsub(/\342\200\224/, " -- ").
    gsub(/\303\242/, "a").
    gsub(/\303\251/, "e").
    gsub(%r{/+\s*$}, '')
  result.gsub!(/\s*\*\s*/, '') unless keep_stars
  result
end
# Break text into lines of at most line_width characters.
def wordwrap(text, line_width = 70)
  text.gsub( /\n/, "\n\n" ).gsub( /(.{1,#{line_width}})(\s+|$)/, "\\1\n")
end
# Print one event as a plain-text entry for the email.
def output(title, time, location, link, description)
  title = cleanup(title)
  time = cleanup(time)
  location = cleanup(location)
  link = cleanup(link)
  description = wordwrap(cleanup(description, true))
  puts "#{time.chomp} - #{title}"
  puts " online: <#{link.chomp}>"
  puts " location: #{location}\n\n"
  description.split("\n").each {|l| puts " #{l}"}
  puts ""
end
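# Fetch a page and join its lines with ' * ' so that the single-line
# regexes below can match across the original line breaks.
# URI.open is used because Kernel#open no longer fetches URLs in Ruby 3.0+.
def fetch_document(link)
  URI.open(link) { |f| return f.read.split("\n").join(' * ') }
end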
# Pull the venue name and street address out of the event page.
def extract_location(doc)
  doc =~ %r{<span\s+class="evDescAndLoc">([^<]+)</span>}
  place = $1 || ''
  place = place.sub(/^.* at /, '').gsub(/\s+\*\s+/, '')
  doc =~ %r{<span\s+class="evAddress">(.*?)</span>}
  address = $1 || ''
  address = address.gsub(/Get map/, '').gsub(/\s+\*\s+/, '')
  place += " / #{address}" unless address =~ /^\s*$/
  place
end
# Pull the free-text note out of the event page.
def extract_description(doc)
  doc =~ %r{<span\s+class="evNote">(.*?)</span>}
  return ($1 || '')
end
def retrieve_data(item)
  # Extract title and time
  if item.elements['title'].text =~ /\(([^)]+)\)\s*$/
    time = $1
  else
    # No "(time)" suffix in the title: fall back to the item's pubDate
    t = Time.parse item.elements['pubDate'].text
    hour = t.hour % 12
    hour = 12 if 0 == hour
    time = "%02d/%02d/%4d (%d:%02d%sM)" %
      [t.month, t.day, t.year, hour, t.min, t.hour > 11 ? 'P':'A']
  end
  title = cleanup(item.elements['title'].text.sub(/\([^)]+\)\s*$/, ''))
  link = item.elements['link'].text
  doc = fetch_document(link)
  location = extract_location(doc)
  description = extract_description(doc)
  [title, time, location, link, description]
end
puts "7 Day Art Weather Forecast"
puts
puts " ... see the Art-Talk Calendar for more events:"
puts
puts " online at: <http://www.airset.com/Public/Calendars.jsp?id=1391>"
puts
puts " -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --"
puts
# URI.open is used because Kernel#open no longer fetches URLs in Ruby 3.0+.
URI.open(URL) do |f|
  xml = REXML::Document.new(f.read)
  REXML::XPath.each(xml, '//item') do |item|
    title, time, location, link, description = retrieve_data(item)
    output(title, time, location, link, description)
  end
end
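
Since most of the script is entity cleanup, here is a rough sketch of how the basic entities could be decoded with Ruby's standard library instead of a chain of gsub calls. The helper name `decode_entities` is made up for illustration and is not part of the script above; the sketch assumes that `CGI.unescapeHTML` covers `&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;` and numeric entities, and that `&nbsp;` has to be handled by hand.

```ruby
require 'cgi'

# Hypothetical helper, not part of the original script.
# CGI.unescapeHTML decodes in a single pass, so "&amp;lt;" becomes "&lt;"
# rather than being decoded twice.
def decode_entities(text)
  # &nbsp; is not handled by CGI.unescapeHTML, so replace it first.
  CGI.unescapeHTML(text.gsub(/&nbsp;/, ' '))
end

puts decode_entities('Tom &amp; Jerry&#39;s &quot;show&quot;&nbsp;&lt;b&gt;')
```

This only replaces the entity handling; the tag stripping and the UTF-8 punctuation substitutions in cleanup would still be needed.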

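
To see what the wordwrap regex does, here is a small standalone run. The function body is copied from the script; the sample sentence and the width of 20 are arbitrary choices for the demonstration.

```ruby
# Same logic as wordwrap in the script: grab up to line_width characters
# that are followed by whitespace (or end of line) and end the line there.
def wordwrap(text, line_width = 70)
  text.gsub(/\n/, "\n\n").gsub(/(.{1,#{line_width}})(\s+|$)/, "\\1\n")
end

wrapped = wordwrap('The quick brown fox jumps over the lazy dog', 20)
puts wrapped
# The quick brown fox
# jumps over the lazy
# dog
```

The regex is greedy, so each line is filled as far as possible and then backs off to the last space before the limit.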

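
The pubDate fallback in retrieve_data turns a 24-hour time into a 12-hour string. Pulled out on its own it looks roughly like this; the name `format_event_time` is invented for the example, and the sample times are arbitrary.

```ruby
# Hypothetical helper; the arithmetic mirrors the fallback in retrieve_data.
def format_event_time(t)
  hour = t.hour % 12
  hour = 12 if hour == 0            # 0:xx and 12:xx both display as 12
  format('%02d/%02d/%4d (%d:%02d%sM)',
         t.month, t.day, t.year, hour, t.min, t.hour > 11 ? 'P' : 'A')
end

puts format_event_time(Time.utc(2006, 2, 27, 0, 17))   # 02/27/2006 (12:17AM)
puts format_event_time(Time.utc(2006, 2, 27, 18, 30))  # 02/27/2006 (6:30PM)
```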