Mechanize and The Radio Use Web Site (2)

Suppose you have a file listing the call signs of the stations whose QTH you want to look up.

JH2CMH
JI4JGD
JA5IVG
JR2AWS
JA4VPS
JI3CJP
7K1CPT
JH3HGI
JA4MRL
JH2FOR
JE1TRV

You write a Ruby program like this:

require 'mechanize'
agent = Mechanize.new
agent.user_agent = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'
url = 'http://www.tele.soumu.go.jp/musen/SearchServlet?SK=2&DC=100&SC=1&pageID=3&CONFIRM=0&SelectID=1&SelectOW=01'
page = agent.get(url)

File.foreach('a1cc.txt') do | t |
  print t.chomp << " "
  next_page = page.form_with(:name => 'select_condition') do |form|
    form.MA = t.chomp
  end.submit
  puts next_page.css('form[name="result"] td')[8].text.gsub(/[\n\t]/,"")
end

And you will get:

% ruby mechanize.rb

JH2CMH 愛知県日進市
JI4JGD 岡山県井原市
JA5IVG 香川県高松市
JR2AWS 岐阜県高山市
JA4VPS 広島県廿日市市
JI3CJP 滋賀県近江八幡市
7K1CPT 茨城県かすみがうら市
JH3HGI 兵庫県赤穂郡上郡町
JA4MRL 岡山県岡山市南区
JH2FOR 愛知県あま市
JE1TRV 東京都町田市
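
If the list file is edited by hand, it may contain blank lines, lowercase letters, or duplicates. Here is a minimal sketch of cleaning the list before querying the site, assuming one call sign per line. The helper name normalize_call_signs is hypothetical and not part of the program above.

```ruby
# Hypothetical helper: tidy up the raw lines read from the call sign file.
def normalize_call_signs(lines)
  lines.map { |line| line.strip.upcase } # drop newlines and spaces, force upper case
       .reject(&:empty?)                 # skip blank lines
       .uniq                             # query each station only once
end

# With the file from the example, this could be called as:
#   normalize_call_signs(File.readlines('a1cc.txt'))
sample = ["JH2CMH\n", " ji4jgd \n", "\n", "JH2CMH\n"]
p normalize_call_signs(sample)
```

The cleaned array can then be iterated in place of File.foreach, so each station is submitted to the form only once.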

Mechanize and The Radio Use Web Site

require 'mechanize'
agent = Mechanize.new
agent.user_agent = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'
url = 'http://www.tele.soumu.go.jp/musen/SearchServlet?SK=2&DC=100&SC=1&pageID=3&CONFIRM=0&SelectID=1&SelectOW=01'
page = agent.get(url)
next_page = page.form_with(:name => 'select_condition') do |form|
  form.MA = 'JA1YRL'
end.submit
print next_page.css('form[name="result"] td')[8].text

% ruby mechanize.rb
東京都豊島区

You enter a call sign and get the QTH of the station. Unfortunately, with a shorter call sign such as JA1RL, this code does not work, because the site returns the QTHs of every station matching JA1RL*.
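
One possible workaround is to collect every call sign and QTH pair from the result page and keep only the exact match. The sketch below assumes the pairs have already been extracted into an array. How to pull them out of the result table depends on the page's HTML and is not shown. The helper name exact_qth and the sample rows are hypothetical placeholders, not real search results.

```ruby
# Hypothetical helper: rows is an array of [call_sign, qth] pairs taken
# from a result page that lists every station matching a prefix.
def exact_qth(rows, call_sign)
  match = rows.find { |sign, _qth| sign.strip.casecmp?(call_sign) }
  match && match[1] # nil when no row matches exactly
end

# Placeholder data for illustration only.
rows = [['JA1RLA', 'QTH-1'], ['JA1RL', 'QTH-2'], ['JA1RLZ', 'QTH-3']]
puts exact_qth(rows, 'JA1RL')
```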

Web Scraping with Nokogiri (6)

Yet another example.

require 'open-uri'
require 'nokogiri'

#url = "http://psk31.cocolog-nifty.com/digitalmode/"
url = "http://psk31.cocolog-nifty.com/digitalmode/2017/09/2017-6e90.html"

loop do
  charset = nil
  # URI.open is required on Ruby 3.0 and later, where open-uri no longer extends Kernel#open.
  html = URI.open(url) do |f|
    charset = f.charset
    f.read
  end

  doc = Nokogiri::HTML.parse(html, nil, charset)
  url = doc.css('.entry-nav').css('a').attribute('href').value
  if url == 'http://psk31.cocolog-nifty.com/digitalmode/' then
    break
  end

  print '<span style="margin-right: 20px">' << doc.at_css('.entry-nav + h2').text << '</span>'
  print doc.at_css('.entry-nav').at_css('a').to_html.gsub(/« /, "")
  puts '<br>'

end

The initial URL is set manually to keep the code simple.

The link is https://spinorlab.matrix.jp/en/psk31/
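
The loop above follows the previous-entry link until it reaches the blog's top page. Here is a sketch of the same walk with two safety guards, a page limit and a set of visited URLs, so that a changed page layout cannot cause an endless loop. The function name walk_pages, the max_pages parameter, and the block that returns the next URL are all assumptions made for illustration. In real use the block would fetch and parse the page.

```ruby
require 'set'

# Hypothetical helper: follow "next" links starting at start_url.
# The block receives a URL and returns the next URL, or nil to stop.
def walk_pages(start_url, max_pages: 100)
  visited = Set.new
  url = start_url
  # Set#add? returns nil for a URL seen before, which ends the loop.
  while url && visited.size < max_pages && visited.add?(url)
    url = yield(url)
  end
  visited.to_a
end

# Stand-in for fetching pages: a hash mapping each page to the next one.
links = { 'page3' => 'page2', 'page2' => 'page1', 'page1' => 'page3' }
p walk_pages('page3') { |u| links[u] }
```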

Web Scraping with Nokogiri (4)

This version puts some space between titles.

require 'open-uri'
require 'nokogiri'

url = "http://nuttycellist-unknown.blogspot.jp/"

loop do
  charset = nil
  # URI.open is required on Ruby 3.0 and later, where open-uri no longer extends Kernel#open.
  html = URI.open(url) do |f|
    charset = f.charset
    f.read
  end

  doc = Nokogiri::HTML.parse(html, nil, charset)

  doc.css('.date-outer').each do |node|
    print node.css('.date-header').inner_html
    print '<span style="margin-left: -60px;"></span>'
    node.css('.entry-title').each do |title|
      print '<span style="margin-left: 80px;"></span>'
      print title.inner_html
    end
    puts '<br>'
  end

  unless doc.css('.blog-pager-older-link').empty?
    url = doc.css('.blog-pager-older-link').attribute('href').value
  else
    break
  end
end

The output file is here.

Web Scraping with Nokogiri (3)

require 'open-uri'
require 'nokogiri'

url = "http://nuttycellist-unknown.blogspot.jp/"

loop do
  charset = nil
  # URI.open is required on Ruby 3.0 and later, where open-uri no longer extends Kernel#open.
  html = URI.open(url) do |f|
    charset = f.charset
    f.read
  end

  doc = Nokogiri::HTML.parse(html, nil, charset)

  doc.css('.date-outer').each do |node|
    print '<span style="margin-right: 20px;">'
    print node.css('.date-header').text
    print '</span>'
    print node.css('.entry-title').inner_html
    puts '<br>'
  end

  unless doc.css('.blog-pager-older-link').empty?
    url = doc.css('.blog-pager-older-link').attribute('href').value
  else
    break
  end
end

Somewhat improved. The link is here.

If there are multiple posts in a day, this short program runs their titles together instead of separating them.
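
One way to keep several titles from the same day apart is to collect them into an array and join them with an explicit separator. Here is a minimal sketch, assuming the date text and the title HTML fragments have already been extracted. The helper name format_day, the separator, and the sample values are hypothetical.

```ruby
# Hypothetical helper: build one output line for a single day.
def format_day(date, titles, separator: ' | ')
  '<span style="margin-right: 20px;">' + date + '</span>' +
    titles.join(separator) + '<br>'
end

# Sample values for illustration only.
puts format_day('2017-09-01', ['<a href="#a">First</a>', '<a href="#b">Second</a>'])
```

Inside the loop, the titles array could probably be built with node.css('.entry-title').map(&:inner_html), since a Nokogiri node set is enumerable.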

Web Scraping with Nokogiri (2)

Suppose you wish to make a list of all the titles posted on a particular blog. You can write a program something like this:

require 'open-uri'
require 'nokogiri'

url = "http://nuttycellist-unknown.blogspot.jp/"

loop do
  charset = nil
  # URI.open is required on Ruby 3.0 and later, where open-uri no longer extends Kernel#open.
  html = URI.open(url) do |f|
    charset = f.charset
    f.read
  end

  doc = Nokogiri::HTML.parse(html, nil, charset)

  doc.css('.post-title a').each do |node|
    puts node.to_html
    puts "<br>"
  end

  unless doc.css('.blog-pager-older-link').empty?
    url = doc.css('.blog-pager-older-link').attribute('href').value
  else
    break
  end
end

The CSS selectors depend on the HTML markup that the site uses, so they must be chosen for each site.

What you will get is an HTML file showing the titles.

<a href="http://nuttycellist-unknown.blogspot.jp/2017/09/arrival-of-fall-season.html">Arrival of fall season</a>
<br>
<a href="http://nuttycellist-unknown.blogspot.jp/2017/09/chicken-cooked-with-welsh-onion.html">Chicken cooked with welsh onion</a>
<br>
<a href="http://nuttycellist-unknown.blogspot.jp/2017/08/seventy-two-years-have-passed.html">Seventy two years have passed</a>
<br>
...
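
Because the selectors differ from site to site, one option is to keep them in a small table keyed by host name instead of hard-coding them in the loop. The sketch below reuses the selectors that appear in the programs in these posts ('.entry-nav a' stands in for the chained .css('.entry-nav').css('a') call). The table layout, the key names, and the function name selectors_for are assumptions made for illustration.

```ruby
require 'uri'

# Selectors taken from the programs in these posts, keyed by host suffix.
SELECTORS = {
  'blogspot.jp'       => { title: '.post-title a',   next_link: '.blog-pager-older-link' },
  'cocolog-nifty.com' => { title: '.entry-nav + h2', next_link: '.entry-nav a' }
}.freeze

# Hypothetical helper: pick the selector set that matches the URL's host.
def selectors_for(url)
  host = URI.parse(url).host.to_s
  _suffix, selectors = SELECTORS.find { |suffix, _| host.end_with?(suffix) }
  raise ArgumentError, "no selectors known for #{host}" unless selectors
  selectors
end

p selectors_for('http://nuttycellist-unknown.blogspot.jp/')[:title]
```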