Web Scraping with Nokogiri (3)

require 'open-uri'
require 'nokogiri'

url = "http://nuttycellist-unknown.blogspot.jp/"

loop do
  charset = nil
  html = open(url) do |f|
    charset = f.charset
    f.read
  end

  doc = Nokogiri::HTML.parse(html, nil, charset)

  doc.css('.date-outer').each do |node|
    print '<span style="margin-right: 20px;">'
    print node.css('.date-header').text
    print '</span>'
    print node.css('.entry-title').inner_html
    puts '<br>'
  end

  unless doc.css('.blog-pager-older-link').empty?
    url = doc.css('.blog-pager-older-link').attribute('href').value
  else
    break
  end
end

Somewhat improved. The link is here.

If there are multiple posts in a day, the titles are not separated properly with the short code shown here.

S	M	T	W	T	F	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Leave a Reply Cancel reply