September 1, 2017 – Spinor Lab

require 'open-uri'
require 'nokogiri'

url = "http://nuttycellist-unknown.blogspot.jp/"

loop do
  charset = nil
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
  html = URI.open(url) do |f|
    charset = f.charset
    f.read
  end

  doc = Nokogiri::HTML.parse(html, nil, charset)

  doc.css('.date-outer').each do |node|
    print '<span style="margin-right: 20px;">'
    print node.css('.date-header').text
    print '</span>'
    print node.css('.entry-title').inner_html
    puts '<br>'
  end

  unless doc.css('.blog-pager-older-link').empty?
    url = doc.css('.blog-pager-older-link').attribute('href').value
  else
    break
  end
end

Suppose you want to list the titles of every post on a particular blog. A program along these lines will do it:

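require 'open-uri'
require 'nokogiri'

# Start from the blog's front page.
url = "http://nuttycellist-unknown.blogspot.jp/"

loop do
  # Fetch the page and remember the charset reported by the server.
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
  charset = nil
  html = URI.open(url) do |f|
    charset = f.charset
    f.read
  end

  doc = Nokogiri::HTML.parse(html, nil, charset)

  # Print each post-title link followed by a line break.
  doc.css('.post-title a').each do |node|
    puts node.to_html
    puts "<br>"
  end

  # Follow the "older posts" link; stop when there is none.
  older = doc.at_css('.blog-pager-older-link')
  break if older.nil?
  url = older['href']
end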

The CSS selectors depend on the HTML markup the site uses, so inspect the page source and adjust them to match.

The output is HTML that lists the post titles as links:

<a href="http://nuttycellist-unknown.blogspot.jp/2017/09/arrival-of-fall-season.html">Arrival of fall season</a>
<br>
<a href="http://nuttycellist-unknown.blogspot.jp/2017/09/chicken-cooked-with-welsh-onion.html">Chicken cooked with welsh onion</a>
<br>
<a href="http://nuttycellist-unknown.blogspot.jp/2017/08/seventy-two-years-have-passed.html">Seventy two years have passed</a>
<br>
...


Day: September 1, 2017

Web Scraping with Nokogiri (3)

Web Scraping with Nokogiri (2)