Web Link List Example In Ruby

WebLinkListExample in RubyLanguage using RegularExpression matching (because I'm not aware of anything in the standard library for parsing HTML):

 #!/usr/bin/env ruby

 require 'net/http'
 require 'uri'

 # usage: linklister.rb url
 # example: linklister.rb http://www.google.com/

 url = ARGV[0] || 'http://www.google.com/'
 parsed_url = URI.parse(url)
 # Passing the URI itself keeps the query string and copes with an empty path.
 page = Net::HTTP.get(parsed_url)

 # Each match is a three-element array (double-quoted, single-quoted, unquoted value);
 # only the alternative that matched is non-nil.
 nested_hrefs = page.scan(/href\s*=\s*"([^"]*)|href\s*=\s*'([^']*)|href\s*=\s*([^\s"'>]+)/i)

 puts "*** hrefs in #{url} ***"
 nested_hrefs.flatten.compact.each { |href| puts href }

Output on 2005-12-30:

 *** hrefs in http://www.google.com/ ***
 /url?sa=p&pref=ig&pval=2&q=http://www.google.com/ig%3Fhl%3Den
 https://www.google.com/accounts/Login?continue=http://www.google.com/&hl=en
 /imghp?hl=en&tab=wi&ie=UTF-8
 http://groups.google.com/grphp?hl=en&tab=wg&ie=UTF-8
 http://news.google.com/nwshp?hl=en&tab=wn&ie=UTF-8
 http://froogle.google.com/frghp?hl=en&tab=wf&ie=UTF-8
 /lochp?hl=en&tab=wl&ie=UTF-8
 /intl/en/options/
 /advanced_search?hl=en
 /preferences?hl=en
 /language_tools?hl=en
 /ads/
 /services/
 /intl/en/about.html
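
The listing mixes absolute URLs with hrefs relative to the site root. If absolute URLs are wanted throughout, one possible approach (a minimal sketch, not part of the script above) is to resolve each href against the page URL with URI.join from the standard library. The variable names are illustrative, and the sample hrefs are copied from the output above:

```ruby
require 'uri'

# Base URL taken from the sample run above.
base = 'http://www.google.com/'

# A few hrefs copied from the output listing.
hrefs = [
  '/imghp?hl=en&tab=wi&ie=UTF-8',
  'http://news.google.com/nwshp?hl=en&tab=wn&ie=UTF-8',
  '/intl/en/about.html'
]

# URI.join resolves a reference against the base; absolute URLs pass through unchanged.
absolute = hrefs.map { |href| URI.join(base, href).to_s }
puts absolute
```

URI.join raises URI::InvalidURIError for hrefs it cannot parse, so a real page may need a rescue around the call.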

-- ElizabethWiethoff
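
To see how the three alternatives in the pattern behave without fetching a page, here is a small sketch that runs the same RegularExpression over a made-up HTML string. The markup and variable names are hypothetical and exist only for illustration:

```ruby
# Hypothetical sample markup; not taken from any real page.
html = %q{<a href="/a.html">A</a> <a HREF='b.html'>B</a> <a href=c.html>C</a>}

# Same three alternatives as the script: double-quoted, single-quoted, unquoted.
href_pattern = /href\s*=\s*"([^"]*)|href\s*=\s*'([^']*)|href\s*=\s*([^\s"'>]+)/i

# scan returns one three-element array per match, with nil for the groups that did not match.
matches = html.scan(href_pattern)

# Flatten the nested arrays and drop the nils to get a plain list of hrefs.
hrefs = matches.flatten.compact
p hrefs  # => ["/a.html", "b.html", "c.html"]
```

Because the pattern looks only for the text href=, it will also pick up matches inside comments or scripts, which is one reason a real HTML parser is usually preferred when one is available.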


CategoryRuby


Last edited December 30, 2005