Blog to Ebook Conversion (4)

I first intended to strip every tag from the text body of each article, but that turns out to be wrong: all the paragraphs run together into one, and the output becomes unreadable.

So it is necessary to detect where each new paragraph begins and insert the appropriate tags there. The problem is that paragraphs are marked up in various ways in the original HTML files. Some examples are:

aaa
bbb
<br />
ccc
ddd

or

<div>
aaa
bbb
</div>
<div>
 
</div>
<div>
ccc
ddd
</div>
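
One way to cope with both patterns, sketched here in Python rather than awk: treat every <br /> and every <div> boundary as a paragraph break, then wrap the text that remains between breaks in <p> tags. The function name to_paragraphs is made up for this illustration, and the sketch assumes that any other tags inside the body (links, images and so on) can simply be dropped.

```python
import re

def to_paragraphs(html_body):
    """Return a list of <p>...</p> strings, one per detected paragraph."""
    # Assumption: <br>, <div> and </div> are the only paragraph boundaries.
    marked = re.sub(r"(?i)<br\s*/?>|</?div[^>]*>", "\n\n", html_body)
    # Drop every remaining tag, and turn non-breaking spaces into plain ones.
    marked = re.sub(r"<[^>]+>", "", marked)
    marked = marked.replace("&nbsp;", " ").replace("\u00a0", " ")
    paragraphs = []
    for chunk in re.split(r"\n\s*\n", marked):
        text = " ".join(chunk.split())  # join the lines of one paragraph
        if text:                        # skip empty paragraphs
            paragraphs.append("<p>" + text + "</p>")
    return paragraphs

if __name__ == "__main__":
    sample = "aaa\nbbb\n<br />\nccc\nddd\n"
    for paragraph in to_paragraphs(sample):
        print(paragraph)
```

Both example inputs above should come out as the same two paragraphs, because the empty <div> in the second one contains nothing but white space.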

Blog to Ebook Conversion (3)

You could keep all the text of your ebook in one large file, but usually we create a separate file for each chapter so that ebook readers can handle and navigate the book more easily.

If you define a chapter as all the articles posted in one month, here is one way to split the single large file into smaller files, one per month.

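# Note: chr() and ord() are not awk built-ins. With gawk they can be taken
# from the bundled ord.awk library (assuming it is installed), e.g.
#   gawk -f ord.awk -f this_program.awk   (this_program.awk is a placeholder)
BEGIN {
 outfile="x"
 s1="a"
 s2="a"
 out=(outfile s1 s2 ".html")   # first output file: xaa.html
 monthb4=""
}
/<h1>/ {
 split($0,array,"/")           # array[1] is "<h1>" followed by the month
 if(monthb4 != array[1]) {     # the month has changed
  monthb4=array[1]
  if(NR!=1) {                  # the very first heading stays in xaa.html
   close(out)
   if(s2=="z") {               # advance the suffix: aa, ab, ..., az, ba, ...
    if(s1=="z") exit 1         # no names left after xzz.html
    s1=chr(ord(s1)+1)
    s2="a"
   }
   else
    s2=chr(ord(s2)+1)
   out=(outfile s1 s2 ".html")
  }
 }
}
{print > out}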

The program assumes that the original file is something like this:

<h1>8/22/2011</h1>
<h2>My first article.</h2>
...
<h1>9/01/2011</h1>
<h2>My second article.</h2>
...

The first number in each line with an h1 tag is the month in which the article was posted, so each time the month changes, the output file name advances: “xaa.html”, “xab.html”, and so on.
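
For comparison, the same splitting logic can be sketched in Python. This is only an illustration under the same assumption about the input (each h1 line starts with the month); the function names split_by_month and chapter_name are invented here. Like the awk program, it compares only the month number with the previous one.

```python
import re

def split_by_month(lines):
    """Group lines into chapters; a chapter ends when the <h1> month changes."""
    chapters = []            # list of lists of lines
    previous_month = None
    for line in lines:
        match = re.match(r"\s*<h1>(\d+)/", line)
        if match and match.group(1) != previous_month:
            previous_month = match.group(1)
            chapters.append([])      # start a new chapter
        if not chapters:             # text before the first <h1>
            chapters.append([])
        chapters[-1].append(line)
    return chapters

def chapter_name(index, prefix="x"):
    """Return xaa.html, xab.html, ... for index 0, 1, ... (at most 26*26)."""
    if not 0 <= index < 26 * 26:
        raise ValueError("too many chapters")
    first = chr(ord("a") + index // 26)
    second = chr(ord("a") + index % 26)
    return prefix + first + second + ".html"
```

Each chapter returned by split_by_month could then be written to the file named by chapter_name(i), where i is its position in the list.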

Blog to Ebook Conversion (2)

In many cases, your blog contains images that are stored on the site where the blog resides. You need to download all of them so that they can be included in your ebook.

Typically, the html code is something like this:

<img border="0" src="http://yourFBblog.com/-BQdcDYPg-yI/UvHQo26PFuI/AAAAAAAABGY/cUzndRZG1lw/s1600/001.jpg" height="213" width="320" /></a>

From this code, you generate a wget command such as:

wget http://yourFBblog.com/-BQdcDYPg-yI/UvHQo26PFuI/AAAAAAAABGY/cUzndRZG1lw/s1600/001.jpg

This wget command creates the file “001.jpg” in your current directory, which is OK, but keep in mind that the URLs are unique while the file names are not. If several different URLs end in the same file name, such as “001.jpg”, wget will automatically generate the files “001.jpg”, “001.jpg.1”, “001.jpg.2” and so on. This is sometimes inconvenient. The original file names may also be very long, or encoded in a character set that you would rather not handle.

Therefore, my short awk program is:

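/<img/ {                                 # only lines that contain an img tag
 gsub(/<img.*src="/, "")                 # drop everything up to the URL
 gsub(/\.JPG.*$/     , ".JPG")           # drop everything after the extension
 gsub(/\.jpg.*$/     , ".jpg")
 printf("wget -O myfile%03d.jpg %s\n",count, $0)
 count++
}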

This program generates a wget command for each img tag it finds, and the images are stored in serially numbered files, such as “myfile000.jpg”, “myfile001.jpg”, and so on.
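
The same idea can be sketched in Python. This is an illustration, not a replacement for the awk program: the function name wget_commands is invented, the URLs in the example are placeholders, and every image is assumed to be saved with a .jpg extension as above. shlex.quote is used so that unusual characters in a URL do not confuse the shell.

```python
import re
import shlex

def wget_commands(html_lines, prefix="myfile"):
    """Build one 'wget -O <numbered file> <url>' command per img tag."""
    commands = []
    for line in html_lines:
        # Capture the src attribute of every img tag on the line.
        for url in re.findall(r'<img[^>]*\bsrc="([^"]+)"', line):
            name = "%s%03d.jpg" % (prefix, len(commands))
            commands.append("wget -O %s %s" % (name, shlex.quote(url)))
    return commands

if __name__ == "__main__":
    page = [
        '<img border="0" src="http://example.com/a/001.jpg" height="213" />',
        '<p>some text</p>',
        '<img border="0" src="http://example.com/b/001.jpg" height="213" />',
    ]
    for command in wget_commands(page):
        print(command)
```

The two placeholder URLs share the file name “001.jpg”, but the generated commands store them in differently numbered files.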

BEGIN{
 nrsave_date =-999
 nrsave_title=-999
 img_count=0
}

 /<h2 class='date-header'>/ {nrsave_date=NR}
 /<h3 class='post-title entry-title' itemprop='name'>/ {nrsave_title=NR}
 NR==nrsave_date +2 {print "<h1>" $0 "</h1>"}
 NR==nrsave_title+2 {print "<h2>" $0 "</h2>"}

 /post-body entry-content/, /post-footer/ {
  gsub("&nbsp;", " ")   # turn non-breaking-space entities into plain spaces
  if(/^[[:blank:]]+$/) next
  if(/^<img/) {
   printf("<img src=\"../Images/myfile%03d.jpg\" />\n", img_count)
   img_count++
  }
  if(!/^</)             # drop every line that starts with a tag
   print $0
 }

This is the new awk program, which generates an HTML file in which each img src is rewritten as a local reference. The HTML file, together with the image files, can be converted into an ebook with an authoring tool such as SIGIL.

Blog to Ebook Conversion

This is one way to make an ebook out of your blog. It depends largely on the style sheet (CSS) you are using, so you will need a lot of adjustments before you get practical output.

Here we assume that you want to extract the date, the title and the text body from each of your articles, and that they appear in the HTML in the following form.

<h2 class='date-header'>
<span>
31/01/2014
</span>
</h2>
...
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://yourFBblog.com/2014/01/QRP_and_QRS.html'>
QRP and QRS
</a>
</h3>
...
<div class='post-body entry-content'>
Sometimes you may wish to get all the titles of your previous articles in your or someone else’s blog.
</div>
<div class='post-footer'>
...

Then you will write a short awk program such as:

BEGIN{
 nrsave_date=-999
 nrsave_title=-999
}

 /post-body entry-content/, /post-footer/ {print}
{
 if(/<h2 class='date-header'>/) nrsave_date=NR
 if(/<h3 class='post-title entry-title' itemprop='name'>/) nrsave_title=NR
 if(NR==nrsave_date+2) print "<h1>" $0 "</h1>"
 if(NR==nrsave_title+2) print "<h2>" $0 "</h2>"
}

The output will be something like the following, if you remove unnecessary tags:

<h1>31/01/2014</h1>
<h2>QRP and QRS</h2>
I love QRP operations, but some people ...
<h1>20/02/2014</h1>
<h2>Snow on my ANT</h2>
Last night, we had a heavy snow...

This can be fed into an ebook editor such as SIGIL to produce your own ebook right away.

Resample at 4 times the tone frequency

Unfasor
http://en.wikipedia.org/wiki/Phasor

Why do you resample the side-tone at 4 times the tone frequency to obtain its envelope? This relates to how you think of the side-tone, or of sinusoids (sine waves) in general. At first glance, the amplitude of a sine wave goes up and down, so you get different values depending on when you observe the signal. One idea is to detect the peak value over an interval longer than the period, which is roughly what an envelope detector built from a diode and a CR low-pass filter does.

But the truth is that the amplitude of a sinusoid does not vary; it is constant, as you see in the bottom half of the figure. The radius of the circle is constant! The observed value varies only because you are looking at the shadow that the rotating bar casts on the y-axis.

The problem is how you can estimate the length of the rotating bar by just observing its shadow.

The trick is to observe the shadow not once per rotation period, at an arbitrary phase, but twice, with the two observations separated by a quarter of the period. In this case the sampling phase does not matter, for the following reason.

phaser2

If the sample rate is 4 times the tone frequency, you have 4 sampling points (in red) on the circle, separated by 90 degrees. With an arbitrary sampling phase th, two adjacent samples are at the phases th and th+90deg, so their values are x(i)=A*cos(th) and x(i+1)=A*cos(th+90deg), where A is the length of the rotating bar.

Therefore, because cos(th+90deg)=-sin(th), you will have
sqrt(x(i)^2+x(i+1)^2)
=sqrt(A^2*cos^2(th)+A^2*cos^2(th+90deg))
=A*sqrt(cos^2(th)+sin^2(th))
=A.
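
The result is easy to check numerically. The following Python sketch (the function names are invented for this illustration) builds samples of A*cos(th + i*90deg), i.e. a tone sampled at 4 times its frequency, and computes sqrt(x(i)^2+x(i-1)^2) for each adjacent pair. Every value should equal A, up to rounding, whatever the starting phase th is.

```python
import math

def sampled_tone(amplitude, phase_deg, count):
    """Samples of A*cos(th + i*90deg): a tone sampled at 4x its frequency."""
    th = math.radians(phase_deg)
    return [amplitude * math.cos(th + i * math.pi / 2) for i in range(count)]

def envelope(samples):
    """sqrt(x(i)^2 + x(i-1)^2) for every pair of adjacent samples."""
    return [math.hypot(a, b) for a, b in zip(samples, samples[1:])]

if __name__ == "__main__":
    for phase in (0, 17, 45, 123):   # arbitrary sampling phases in degrees
        values = envelope(sampled_tone(2.5, phase, 8))
        print(phase, [round(v, 6) for v in values])
```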

Signals with a 90-degree phase difference play an important role in signal processing. One example is the SSB generator with a PSN circuit, which is somewhat more complex because the phase difference must be applied not to a sine wave of fixed frequency but to a signal with some bandwidth, say from 0.3 kHz to 2.7 kHz.

Code Practice Files

ChristmasCarol

You can download code practice files from, for example, http://www.arrl.org/code-practice-files. The following is how to make your own practice files.

First, you need to find some text files. I always visit Project Gutenberg, which offers over 42,000 free ebooks in various formats including HTML, EPUB, Kindle, and plain text. Download a book in plain text format.

% tr -s '[:punct:]' ' ' <original.txt >withoutpunct.txt

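You may wish to delete all the punctuation from the text. The tr command above replaces each punctuation character with a space and squeezes repeated spaces into one.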

% ebook2cw -w 40 -f 600 -s 8000 -b 64 -q 1 -o ChristmasCarol-40wpm ChristmasCarol.txt
% ls
ChristmasCarol-25wpm0000.mp3
ChristmasCarol-30wpm0000.mp3
ChristmasCarol-35wpm0000.mp3
ChristmasCarol-40wpm0000.mp3

ebook2cw is a program that converts ebooks to Morse code MP3/OGG files. It may be convenient to add the options -a and -t so that ID3 tags are written to the files. Alternatively, you can import the files into, say, iTunes and add the tags there.

My Keying (5)

Bug_quad

When you have a side-tone sound file instead of the key’s contact signal, you need to estimate the latter from the former. There are various ways, such as using a Hilbert filter or simply using an envelope detector. Here the audio signal is resampled at 4 times the tone frequency, i.e. at 2400 Hz (=4*600 Hz), and sqrt(x(i)^2+x(i-1)^2) is computed as the signal amplitude (the envelope), where x(i) is the i-th sample value.

% gawk '{print sqrt($1*$1+b4*b4); b4=$1}' <test2.csv >test3.csv

My Keying (4)

Bug_Dot2Dot

This is a histogram of the duration of spaces between two dots. This occurs within a single letter, as in “h”, or between two letters, as between “n” and “a”.

Bug_Dot2Dash

Spaces between a dot and a dash, as in the letter “a”, or between letters, as between “n” and “n”.

Bug_Dash2Dot

Spaces between a dash and a dot, as in the letter “n”, or between the letters “a” and “a”.

Bug_Dash2Dash

Spaces between two dashes, as in the letter “m”, or between the letters “a” and “n”.

BEGIN {
type="null"
val1b4="null"
val2b4=0
val3b4="null"
val1b44=val1b4
val2b44=val2b4
val3b44=val3b4
}

{
 if(val1b44=="dw" && $1=="dw" ) {
  if(val3b44=="long" && $3=="long")
   type="ll"
  if(val3b44=="long" && $3=="short")
   type="ls"
  if(val3b44=="short" && $3=="long")
   type="sl"
  if(val3b44=="short" && $3=="short")
   type="ss"
  print val2b4, type
 }

 val1b44=val1b4
 val2b44=val2b4
 val3b44=val3b4
 val1b4=$1
 val2b4=$2
 val3b4=$3
}
% gawk -f myprog.awk <w3 >w4
% head w3
dw 20 short
up 34 short
dw 105 long
up 98 long
dw 96 long
up 44 long
dw 22 short
up 26 short
dw 24 short
up 28 short
% head w4
34 sl
98 ll
44 ls
26 ss
28 ss
31 ss
156 sl
42 ls
42 sl
37 ls
% R
> th=seq(0,250,10)
> hist(datass$V1,xlim=c(0,250),ylim=c(0,10),breaks=th,main="Dot-to-Dot",col="green")
> hist(datasl$V1,xlim=c(0,250),ylim=c(0,8),breaks=th,main="Dot-to-Dash",col="blue")
> hist(datals$V1,xlim=c(0,250),ylim=c(0,8),breaks=th,main="Dash-to-Dot",col="yellow")
> hist(datall$V1,xlim=c(0,250),ylim=c(0,8),breaks=th,main="Dash-to-Dash",col="red")
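
The classification done by the awk program above can also be sketched in Python, which may make the three-line window easier to see. The function name classify_spaces is invented for this illustration, and the input is assumed to be the rows of w3 as (state, duration, kind) tuples.

```python
def classify_spaces(records):
    """Label each space by the elements around it: ss, sl, ls or ll.

    records: list of (state, duration, kind) tuples, where state is
    "dw" or "up" and kind is "short" or "long".
    """
    code = {"short": "s", "long": "l"}
    result = []
    # Slide a window over three consecutive rows: down, up, down.
    for before, space, after in zip(records, records[1:], records[2:]):
        if before[0] == "dw" and space[0] == "up" and after[0] == "dw":
            result.append((space[1], code[before[2]] + code[after[2]]))
    return result

if __name__ == "__main__":
    w3 = [("dw", 20, "short"), ("up", 34, "short"), ("dw", 105, "long"),
          ("up", 98, "long"), ("dw", 96, "long"), ("up", 44, "long"),
          ("dw", 22, "short")]
    for duration, kind in classify_spaces(w3):
        print(duration, kind)
```

A space is reported only when a key-down row follows it, so the last space of the input is left out, as in the awk version.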

My Keying (3)

Bug_dw_up

This is a scatter plot in which each dot (or dash) and the space directly following it are treated as one entity. Group 1 consists of, say, “h”s, in which both the dot and space periods are very consistent. Group 2 consists of letters such as “o”, in which both periods are controlled manually. Groups 3 and 4 are the spaces between letters.

% head w2
dw 20
up 34
dw 105
up 98
dw 96
up 44
dw 22
up 26
dw 24
up 28
% cat myprog.awk
{
 if(NR%2 == 0) print b4, $2
 b4=$2
}
% gawk -f myprog.awk <w2 >dw_up
% head dw_up
20 34
105 98
96 44
22 26
24 28
% R
> mydata<-read.table("dw_up")
> mydata
    V1  V2
1   20  34
2  105  98
3   96  44
4   22  26
5   24  28
> plot(mydata$V1,mydata$V2,col="blue",xlim=c(0,250),ylim=c(0,250),main="Dot/Dash vs. Space",xlab="Dot/Dash",ylab="Space")

My Keying (2)

Bug_down

This is a histogram of the key-down periods (i.e., dots and dashes). The dot-to-dash ratio appears to be around 25:100, or 1:4.

Bug_up

This is a histogram of the key-up periods, i.e., the spaces.

Save Time: 2014-02-02 15:12:44
Units:(mV)
                     CH1
Frequency:      4.564 Hz
Period:       219.091 mS
PK-PK:           3.520 V

1                 400.00
2                   0.00
3                   0.00
4                   0.00
5                  80.00
6                   0.00
7                   0.00
8                 -80.00
9                   0.00
10                  0.00
11                -80.00
12                  0.00
13                  0.00
14                  0.00
15                  0.00
16                160.00
17                  0.00
18                  0.00
19                 80.00
20               1760.00
21               2880.00
22               3280.00
23               3280.00
24               3280.00

The text file obtained from the oscilloscope is something like this.

BEGIN { th=2000; n=10; b4=1 }

/^[0-9]/ {

 if($2>th) out=1
 else
 out=0

 if(b4==1 && out==0)
  i=n

 if(i>0) {
  out=0
  i--
 }

 b4=out

 print out
}

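This awk program produces a sequence of “0”s and “1”s by slicing the voltage at the threshold th. After each 1-to-0 transition the output is held at “0” for n samples, so that brief bounces back above the threshold are ignored.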
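
The same slicing logic, written as a Python sketch for readers who find the awk version terse. The function name slice_voltage and its parameter names are invented here; the default threshold and hold-off values mirror th=2000 and n=10 above.

```python
def slice_voltage(values, threshold=2000, holdoff=10):
    """Turn voltage samples into 0/1, holding 0 after each falling edge."""
    out_sequence = []
    previous = 1       # as in the awk program, start as if the level was high
    remaining = 0      # samples left in the current hold-off window
    for value in values:
        out = 1 if value > threshold else 0
        if previous == 1 and out == 0:   # falling edge: start the hold-off
            remaining = holdoff
        if remaining > 0:                # inside the hold-off: force 0
            out = 0
            remaining -= 1
        previous = out
        out_sequence.append(out)
    return out_sequence

if __name__ == "__main__":
    # A bounce back to 3000 mV right after the falling edge is suppressed.
    print(slice_voltage([3280, 3280, 0, 3000, 0, 0, 3280], holdoff=3))
```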

BEGIN { b4=1; ndown=0; nup=0 }

{
 if(b4==1 && $1==0) {
  ndown=0
  if(nup>0) print "up", nup
 }

 if(b4==0 && $1==1) {
  nup=0
  if(ndown>0) print "dw", ndown
 }

 if($1==0)
  ndown++
 else
  nup++

 b4=$1
}

And this program counts the length of each run of “0”s or “1”s, and gives:

dw 20
up 34
dw 105
up 98
dw 96
up 44
dw 22
up 26
dw 24
up 28
dw 21
up 31
dw 11
up 156
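
Counting runs like this is a form of run-length encoding, and in Python it can be sketched with itertools.groupby. The function name run_lengths is invented for this illustration. As in the awk program, a “0” is taken as key down and a “1” as key up, and the last run is left out because no transition follows it.

```python
from itertools import groupby

def run_lengths(bits):
    """Return [("dw", n), ("up", m), ...] for each completed run of 0s or 1s."""
    runs = [("dw" if bit == 0 else "up", len(list(group)))
            for bit, group in groupby(bits)]
    # The awk program prints a run only when the next one starts,
    # so the final, unterminated run is dropped here as well.
    return runs[:-1]

if __name__ == "__main__":
    bits = [0] * 20 + [1] * 34 + [0] * 105 + [1] * 3
    for state, length in run_lengths(bits):
        print(state, length)
```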