Data Frames

Data sets in R are most often stored in data frames. A data frame is a two-dimensional data structure in which each row represents a case and each column represents a variable. You can create a data frame from vectors, for example as follows, where each vector becomes a column.

> name    <- c("Alpha", "Bravo", "Charlie", "Delta")
> weight  <- c(31.0, 47.2, 69.5, 99.8)
> price   <- c(9.2, 13.7, 21.4, 38.5)
> example <- data.frame(name, weight, price)
> example
     name weight price
1   Alpha   31.0   9.2
2   Bravo   47.2  13.7
3 Charlie   69.5  21.4
4   Delta   99.8  38.5

Once you have a data frame, it is a relatively simple task to analyze the data and to draw graphs of various types.

> library(ggplot2)
> graph <- ggplot(example, aes(x=weight, y=price))
> graph + geom_point() + stat_smooth(method=lm)

Figure 1. Scatter Plot and a Linear Regression Line
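The straight line in Figure 1 is an ordinary least-squares fit of price on weight, so the same line can also be obtained numerically with lm(). A minimal sketch that rebuilds the example data frame from the vectors above:

```r
# Rebuild the example data frame from the vectors above
name    <- c("Alpha", "Bravo", "Charlie", "Delta")
weight  <- c(31.0, 47.2, 69.5, 99.8)
price   <- c(9.2, 13.7, 21.4, 38.5)
example <- data.frame(name, weight, price)

# Ordinary least-squares fit of price on weight: the same model that
# stat_smooth(method = lm) uses for the line in the plot
fit <- lm(price ~ weight, data = example)

coef(fit)                                        # intercept and slope
predict(fit, newdata = data.frame(weight = 60))  # fitted price at weight 60
```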

        letter_space      11688          0
                 dot        510          0
       element_space        265          0
                 dot        533          1
       element_space        341          0
                 dot        511          2
       element_space        333          0
                 dot        499          3
        letter_space       1451          0
                 dot        541          0
       element_space        530          0
                dash       1647          0
       element_space        281          0
                 dot        505          1
        letter_space       2539          0
           (853 more lines omitted)

The listing above is the first part of the output from “myprog”, which reads a file of Morse code by JO1FYC [1] that starts with “HR HR”.
[1] http://homepage2.nifty.com/jo1fyc/sound/20051010_nikki-32.mp3

This is a plain text file, so it can be loaded into R directly with read.table().

> mydata <- read.table("20051010_nikki-32_8kHz.aaa", header=FALSE)
> # V1 holds element types (dot, dash, element_space, ...), so count it with geom_bar()
> ggplot(mydata, aes(x=V1, fill=V1)) + geom_bar()

Figure 2. Histogram
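The bars only show how many elements of each type were found. A small sketch that also reports the mean duration per type, assuming the column layout of the listing above (V1 = element type, V2 = duration in whatever unit myprog uses; V3 is not used here). summarize_elements() is a hypothetical helper name:

```r
# Hypothetical helper: count elements and average their durations per type
summarize_elements <- function(events) {
  counts <- table(events$V1)                     # number of elements per type
  means  <- tapply(events$V2, events$V1, mean)   # mean duration per type
  data.frame(
    type          = names(counts),
    n             = as.vector(counts),
    mean_duration = as.vector(means[names(counts)])
  )
}

# Usage (file name as used above):
# mydata <- read.table("20051010_nikki-32_8kHz.aaa", header = FALSE)
# summarize_elements(mydata)
```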

Linear Regression

The duration of the dot cycle seems almost constant except for the last few seconds, so I extracted a “nice part” of the dot and space duration data by discarding the first ten and the last sixty-two points, and looked for the line of best fit with a linear model.

Figure 1. Dot and Space Pair in a Single Dot Cycle
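The trimming step can be sketched as follows. trim_series() is a hypothetical helper, and BK2.dot and BK2.space are assumed to be written in the same one-number-per-line format as BK.dot and BK.space from the Bug Key post:

```r
# Hypothetical helper: drop the first `drop_first` and the last `drop_last`
# points of a series, keeping the "nice part" in between
trim_series <- function(x, drop_first = 10, drop_last = 62) {
  n <- length(x)
  stopifnot(n > drop_first + drop_last)  # something must remain
  x[(drop_first + 1):(n - drop_last)]
}

# Usage (assumes BK.dot and BK.space from the Bug Key post):
# dot   <- read.table("BK.dot",   header = FALSE)
# space <- read.table("BK.space", header = FALSE)
# write.table(trim_series(dot$V1),   "BK2.dot",   row.names = FALSE, col.names = FALSE)
# write.table(trim_series(space$V1), "BK2.space", row.names = FALSE, col.names = FALSE)
```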

> dot2   <- read.table("BK2.dot"  ,header=FALSE)
> space2 <- read.table("BK2.space",header=FALSE)
> res=lm(space2$V1~dot2$V1)
> res

Call:
lm(formula = space2$V1 ~ dot2$V1)

Coefficients:
(Intercept)      dot2$V1
     873.38        -1.13

> p <- qplot(dot2$V1, space2$V1, color=dot2$V1) + geom_point(shape=23, size=1)
> p + geom_abline(intercept=coef(res)[1], slope=coef(res)[2], colour="red", size=1)
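Adding the dot length to both sides of the fitted line shows why the dot cycle stays almost constant. This is only a rearrangement of the coefficients printed above, not an additional fit:

```latex
\begin{aligned}
\widehat{\text{space}} &= 873.38 - 1.13\,\text{dot} \\
\text{dot} + \widehat{\text{space}} &= 873.38 - 0.13\,\text{dot}
\end{aligned}
```

With a slope close to −1, a longer dot is almost fully compensated by a shorter space: a change of 100 in the dot length moves the fitted cycle length by only about 13, in whatever unit myprog reports.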

ggplot2

Here are the same graphs, drawn with the ggplot2 library.

Figure 1. Dot Length

Figure 2. Space Length

Figure 3. Duration of Dot Cycle
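A sketch of the data preparation such plots need. cycle_frame() is a hypothetical helper that puts the dot, space and dot-cycle series into one data frame, trimming both series to a common length so that R does not recycle the shorter one. The ggplot2 calls are commented usage, assuming dot and space were read from BK.dot and BK.space as in the Bug Key post; they are not necessarily the commands used for the figures above.

```r
# Hypothetical helper: combine dot and space durations into one data frame
# (index, dot, space, cycle = dot + space) for plotting with ggplot2
cycle_frame <- function(dot, space) {
  n <- min(length(dot), length(space))  # use the common length only
  i <- seq_len(n)
  data.frame(index = i, dot = dot[i], space = space[i], cycle = dot[i] + space[i])
}

# Usage (assumes ggplot2 is installed and dot/space were read from
# BK.dot and BK.space with read.table(), as in the Bug Key post):
# library(ggplot2)
# df <- cycle_frame(dot$V1, space$V1)
# ggplot(df, aes(x = index, y = dot))   + geom_point()  # dot length
# ggplot(df, aes(x = index, y = space)) + geom_point()  # space length
# ggplot(df, aes(x = index, y = cycle)) + geom_point()  # duration of dot cycle
```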

Bug Key

JO1FYC [1] offers some mp3 files of his Morse music on his blog [2], and I was tempted to analyze the latest one [3], dated May 6, 2013. Since the file contains a series of consecutive dots sent with a bug key, it might provide good parameters for a physical model of such keys.
[1] http://homepage2.nifty.com/jo1fyc/index.htm
[2] http://homepage2.nifty.com/jo1fyc/cw_nikki.htm
[3] http://homepage2.nifty.com/jo1fyc/sound/BK-100_DOT_20130506.mp3

Figure 1. Dot and Space Length

The dots continue for more than 30 seconds, and you can count 362 dots. The length of each dot increases slowly for most of the time, and rather rapidly during the last few seconds.

Figure 2. Dot Length

Figure 3. Space Length

Figure 4. Duration of Dot Cycle

The figures were obtained with the following commands.

% mplayer -quiet -vo null -vc dummy -af volume=0,resample=8000 \
          -ao pcm:waveheader:file="BK-100_DOT_20130506_8kHz.wav" \
          "BK-100_DOT_20130506.mp3"
% myprog BK-100_DOT_20130506_8kHz.wav > BK.txt
% awk '{if($1=="dot" || $1=="dash") print $2}' BK.txt > BK.dot
% awk '{if($1=="element_space")     print $2}' BK.txt > BK.space

% gnuplot
gnuplot> plot "BK.dot" with line, "BK.space" with line

% R
> dot   <-read.table("BK.dot",   header=FALSE)
> space <-read.table("BK.space", header=FALSE)
> added=dot$V1 + space$V1
> plot(dot$V1)
> plot(space$V1)
> plot(added)
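The two awk commands can also be replaced by a few lines of R that read myprog's output directly. A minimal sketch, assuming BK.txt has the three-column layout shown in the Data Frames post (element type, duration, and a third column not used here); split_elements() is a hypothetical helper:

```r
# Hypothetical helper: split myprog output into mark (dot/dash) and
# element-space durations, mirroring the two awk commands above
split_elements <- function(events) {
  list(
    mark  = events$V2[events$V1 %in% c("dot", "dash")],
    space = events$V2[events$V1 == "element_space"]
  )
}

# Usage (assumes BK.txt was produced by myprog as above):
# events <- read.table("BK.txt", header = FALSE)
# parts  <- split_elements(events)
# c(length(parts$mark), length(parts$space))  # check before adding them
```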

Binomial Test

Suppose you have a (fair) coin and toss it fifty times. How many heads will you get?

Figure 1. Coin Toss (50 times, p=0.5)

The answer is shown in Fig. 1. You are most likely to get twenty-five heads, with a probability of 0.1122752. You are very unlikely to get, say, fourteen heads, because the probability of that happening is only 0.0008329743, less than one in a thousand.

Figure 2. Coin Toss (50 times, p=0.2)

If you somehow have a coin that gives you a head with a probability of 0.2, you get the results shown in Fig. 2. This time you are most likely to get ten heads, with a probability of 0.139819.

Figure 3. Coin Toss (50 times, p=0.1)

Figure 4. Coin Toss (50 times, p=0.02)

Figures 3 and 4 show the results for head probabilities of 0.1 and 0.02, respectively. In the latter case, you are most likely to get only one head, and the probabilities of getting 0, 1, 2, 3 or 4 heads are 0.364169680, 0.371601714, 0.185800857, 0.060669668 and 0.014548339, respectively.
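The probabilities in these figures come from the binomial distribution and can be computed with R's dbinom(). A minimal sketch; coin_toss_probs() and most_likely_heads() are hypothetical helper names, and barplot() is only one way to draw bars like those in the figures:

```r
# Probability of each number of heads in n tosses of a coin with
# head probability p
coin_toss_probs <- function(n, p) {
  heads <- 0:n
  data.frame(heads = heads, prob = dbinom(heads, size = n, prob = p))
}

# Most likely number of heads (the tallest bar)
most_likely_heads <- function(n, p) {
  d <- coin_toss_probs(n, p)
  d$heads[which.max(d$prob)]
}

# Usage:
# d <- coin_toss_probs(50, 0.02)
# barplot(d$prob, names.arg = d$heads)
# sapply(c(0.5, 0.2, 0.1, 0.02), most_likely_heads, n = 50)
```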

Question 1:

You have a fake coin with the words “p=0.02” engraved on the head, and you toss it fifty times. If you get 3 heads instead of only one, can you say that your “fake” coin is itself a fake?

Answer 1:

No, you can’t. The binomial test for this case gives:

>  binom.test(3,50,0.02)

	Exact binomial test

data:  3 and 50
number of successes = 3, number of trials = 50, p-value = 0.07843
alternative hypothesis: true probability of success is not equal to 0.02
95 percent confidence interval:
 0.01254859 0.16548195
sample estimates:
probability of success
                  0.06

The p-value is 0.07843, which is not less than the significance level of 0.05, so we cannot reject the null hypothesis that the probability of getting a head is 0.02.

> dbinom(0:2,50,0.02)
[1] 0.3641697 0.3716017 0.1858009
> sum(dbinom(0:2,50,0.02))
[1] 0.9215723
> 1-sum(dbinom(0:2,50,0.02))
[1] 0.07842775

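In other words, the probability of getting 0, 1 or 2 heads is 92.15723%, and the probability of getting 3 or more heads is 7.842775%, which is more than 5% and cannot be considered “very unlikely”. This upper-tail probability equals the two-sided p-value reported by binom.test() because each outcome below 3 heads (0, 1 or 2) is more likely than 3 heads, so none of them adds to the p-value.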
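The agreement between the two numbers can be checked directly. A short sketch:

```r
# Two-sided exact binomial test, as in Answer 1
p_test <- binom.test(3, 50, 0.02)$p.value

# Upper tail P(X >= 3) for X ~ Binomial(50, 0.02)
p_upper <- pbinom(2, size = 50, prob = 0.02, lower.tail = FALSE)

all.equal(p_test, p_upper)

# 0, 1 and 2 heads are each more likely than 3 heads
dbinom(0:2, 50, 0.02) > dbinom(3, 50, 0.02)
```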

Question 2:

In 2012, you observed 47 XYZ patients in Alpha_district (population 490,000) and 3 XYZ patients in Bravo_district (population 10,000). The incidence rates for the two districts are 0.96e-04 and 3.0e-04, respectively. Can you say that the people in Bravo_district are more prone to XYZ?

Answer 2:

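The same as Answer 1. There are 47 + 3 = 50 patients in total, and Bravo_district holds 10,000 / 500,000 = 0.02 of the combined population. If the incidence rate were the same in both districts, each of the 50 patients would be in Bravo_district with probability 0.02, so finding 3 of them there is exactly the case of 3 heads in 50 tosses of the “p=0.02” coin. The p-value is again 0.07843, so the higher rate in Bravo_district is not significant at the 5% level.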
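The conditioning argument can be checked in R. In this sketch, binom.test() with Bravo_district's population share reproduces Answer 1, and poisson.test() from the stats package, which compares two Poisson rates by conditioning on the total count in the same way, should give the same p-value:

```r
# Counts and populations from Question 2 (Bravo first, then Alpha)
cases      <- c(3, 47)
population <- c(10000, 490000)

# Under equal incidence rates, Bravo's share of the 50 cases is
# Binomial(50, 0.02)
share <- population[1] / sum(population)   # 0.02
binom.test(cases[1], sum(cases), share)

# Comparison of two Poisson rates, conditioned on the total count
poisson.test(cases, population)
```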

Flotr2

Flotr2 is a library for drawing HTML5 charts and graphs.

(function () {
  var
    // Assumes the page loads flotr2.js and has an element with id="container"
    container = document.getElementById('container'),
    start = (new Date).getTime(),
    data, graph, offset, i;

  // Draw a sine curve at time t
  function animate (t) {
    data = [];
    // Phase offset: one full period every 10 seconds
    offset = 2 * Math.PI * (t - start) / 10000;

    // Sample the sine function, damped by a Gaussian envelope, on [-2, 2]
    for (i = -2.0; i <= 2.05; i += 0.1) {
      data.push([i, Math.sin(i - offset) * Math.exp(-i * i)]);
    }

    // Draw Graph
    graph = Flotr.draw(container, [ data ], {
      xaxis : { max : 2.0, min : -2.0 },
      yaxis : { max : 1.0, min : -1.0 }
    });

    // Animate: redraw about every 50 ms
    setTimeout(function () {
      animate((new Date).getTime());
    }, 50);
  }

  animate(start);
})();

Roméo et Juliette

JULIETTE.–O Roméo! Roméo!–Why are you Roméo?–Deny your father and reject your name; or, if you will not, only swear to love me, and I will no longer be a Capulet.

ROMÉO, _aside_.–Shall I listen to her longer, or shall I answer this?

[allowphp]
require_once("/usr/local/apache2/htdocs/wordpress_3.5.1/myphptex.inc");
tex('\begin{equation}
\Re\{z\} = \frac{6n\pi \dfrac{\theta +\psi}{2}}{
\left(\dfrac{\theta +\psi}{2}\right)^2 + \left( \dfrac{1}{2}
\log \left\lvert\dfrac{B}{A}\right\rvert\right)^2}.
\end{equation}');
[/allowphp]

Lorem ipsum

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

[allowphp]
require_once("/usr/local/apache2/htdocs/wordpress_3.5.1/myphptex.inc");
tex('\begin{eqnarray}
y&=&x^2 \\\\
z&=&\cos(2\pi x)
\end{eqnarray}');
[/allowphp]