Web Scraping with Nokogiri (3)

require 'open-uri'
require 'nokogiri'

url = "http://nuttycellist-unknown.blogspot.jp/"

loop do
  charset = nil
  html = open(url) do |f|
    charset = f.charset
    f.read
  end

  doc = Nokogiri::HTML.parse(html, nil, charset)

  doc.css('.date-outer').each do |node|
    print '<span style="margin-right: 20px;">'
    print node.css('.date-header').text
    print '</span>'
    print node.css('.entry-title').inner_html
    puts '<br>'
  end

  unless doc.css('.blog-pager-older-link').empty?
    url = doc.css('.blog-pager-older-link').attribute('href').value
  else
    break
  end
end

Somewhat improved. The link is here.

If there are multiple posts in a day, the titles are not separated properly with the short code shown here.

Web Scraping with Nokogiri (2)

Suppose you wish to make a list of all the titles posted at a particular blog, then you write a program something like this:

require 'open-uri'
require 'nokogiri'

url = "http://nuttycellist-unknown.blogspot.jp/"

loop do
  charset = nil
  html = open(url) do |f|
    charset = f.charset
    f.read
  end

  doc = Nokogiri::HTML.parse(html, nil, charset)

  doc.css('.post-title a').each do |node|
    puts node.to_html
    puts "<br>"
  end

  unless doc.css('.blog-pager-older-link').empty?
    url = doc.css('.blog-pager-older-link').attribute('href').value
  else
    break
  end
end

The CSS selectors should be determined depending on the html code employed on the site.

What you will get is an html file showing the titles.

<a href="http://nuttycellist-unknown.blogspot.jp/2017/09/arrival-of-fall-season.html">Arrival of fall season</a>
<br>
<a href="http://nuttycellist-unknown.blogspot.jp/2017/09/chicken-cooked-with-welsh-onion.html">Chicken cooked with welsh onion</a>
<br>
<a href="http://nuttycellist-unknown.blogspot.jp/2017/08/seventy-two-years-have-passed.html">Seventy two years have passed</a>
<br>
...