Petra's Outreach Program for Women Blog: January 2013

Thursday, January 31, 2013

The past week has been frustrating. Since the last release of Errai, updates and changes have left Errai unable to run. We can't figure out where the bugs are that prevent it from running, and we have been trying all week. With something as complex as Errai, it is difficult to find the bugs right away, so we have had to put our plans for performance testing Errai on hold for now. The complexity of a lot of projects is what has kept me out of FOSS up until now. Women from Systers and devchix kept telling me I needed to build my portfolio in FOSS to help me get a job, and a couple of them sent me links to GitHub projects they thought I would be interested in based on my background. I set up a GitHub account back then but had no idea what I was doing. I learned the basics of GitHub, but it wasn't until this internship, with the help of my mentor Jonathan, that I learned it in any more detail. It's difficult to dive into complex projects when you are a beginner and haven't been there from the start. My mentor has been on the Errai project since the beginning, and even he can't yet find all the bugs preventing Errai from running. Hopefully, the details of our internships on our blogs will encourage other beginners to give it a try.

Friday, January 25, 2013

Most software developers at some point find themselves Googling for help. One of my pet peeves is when developers use the same tired example to teach a concept. If other people are already using an example, you probably don't need to repeat it. Come up with your own! We are searching precisely because we didn't find the tired example very helpful. My other pet peeve is when developers give only a few lines of code instead of a complete example demonstrating the concept in question. I ran into both problems when trying to look up ExecutorService, which is why I finally had to get help from my mentor, and why I made sure our example made it onto my blog. I imagine that if I had kept searching through all the links I could have found something helpful, but that takes time. Part of the reason for the FOSS movement is to help other developers. When you solve a particularly hard problem, be sure to post it so others can learn.

Wednesday, January 23, 2013

If we make the data big enough, we would expect a big increase in the amount of time taken as we add more threads, unless you have an SSD (solid-state disk). The more files you try to write simultaneously on a regular mechanical disk, the slower the total throughput will be. This is because the disk is forced to spend more time seeking to a new track, and less time writing to the track it's already on.
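One way to look at the seeking effect on its own, separate from threads, is to keep a single thread and change only how many files are open at once. The sketch below is a hypothetical plain-Java helper (the class name `InterleavedWrites` and the 64 KB chunk size are my own choices, not part of PerfRunner or Errai) that writes a fixed total round-robin across N temporary files. Keep in mind that `close()` only hands the data to the operating system's cache, so at small sizes the disk may never be touched and no seek penalty will show.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical helper: writes a fixed total amount of data spread over
// `fileCount` temporary files from ONE thread, round-robin in fixed chunks,
// so the only thing that changes between runs is how many files are open.
class InterleavedWrites {

    static File[] writeInterleaved(long totalBytes, int fileCount, int chunkSize) throws IOException {
        File[] files = new File[fileCount];
        OutputStream[] outs = new OutputStream[fileCount];
        byte[] chunk = new byte[chunkSize];
        Arrays.fill(chunk, (byte) 'a');
        try {
            for (int f = 0; f < fileCount; f++) {
                files[f] = File.createTempFile("InterleavedTest", ".javatemp");
                outs[f] = new BufferedOutputStream(new FileOutputStream(files[f]));
            }
            long written = 0;
            int next = 0;
            while (written < totalBytes) {
                int n = (int) Math.min(chunkSize, totalBytes - written);
                outs[next].write(chunk, 0, n);
                written += n;
                next = (next + 1) % fileCount; // move on to the next file
            }
        } finally {
            for (OutputStream out : outs) {
                if (out != null) {
                    out.close(); // flushes into the OS cache; does not force a disk sync
                }
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        long total = 10L * 1000 * 1000; // 10 MB, the amount used in the post
        for (int fileCount : new int[] {1, 2, 5, 10}) {
            long start = System.nanoTime();
            File[] files = writeInterleaved(total, fileCount, 64 * 1024);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(fileCount + " file(s): " + ms + " ms");
            for (File file : files) {
                file.delete();
            }
        }
    }
}
```

Because there is only one thread, any slowdown as `fileCount` grows comes from the file access pattern rather than from thread scheduling.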

So here's what I did next: I added a new variable into the mix, a second @Varying parameter (axis=Axis.SERIES) for the amount of data each task should write. I ranged it from 10MB to 100MB in increments of 10MB to see what would happen, hoping a clear trend would emerge.

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.jboss.perfrunner.Axis;
import org.jboss.perfrunner.PerfRunner;
import org.jboss.perfrunner.Varying;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(PerfRunner.class)
public class SampleTest {

    @Test
    public void numTest(
            @Varying(name = "MB", axis = Axis.X, from = 1, to = 10000000, step = 250000) int numMax)
            throws Exception {
        String content = "0";
        // try-with-resources closes (and flushes) the writer even if a write fails
        try (BufferedWriter out = new BufferedWriter(new FileWriter("out.txt"))) {
            for (int num = 0; num < numMax; num++) {
                out.write(content);
            }
        }
    }

    @Test
    public void perThread(
            @Varying(axis = Axis.SERIES, name = "Megabyte", from = 10000000, to = 100000000, step = 10000000) final int megabyte,
            @Varying(name = "thread", axis = Axis.X, from = 1, to = 10, step = 1) final int threadCount)
            throws InterruptedException {
        // First define the task that needs to be done:
        // each run writes `megabyte` 'a' characters to its own temporary file.
        Runnable task = new Runnable() {
            @Override
            public void run() {
                try {
                    // create a uniquely named temporary file with the given suffix
                    File file1 = File.createTempFile("PerThreadTest", ".javatemp");
                    try (BufferedWriter out = new BufferedWriter(new FileWriter(file1))) {
                        for (int i = 0; i < megabyte; i++) {
                            out.write('a');
                        }
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        };
        // Now define the executor service that will execute the above task
        ExecutorService exec = Executors.newFixedThreadPool(threadCount);
        // Submit the task 10 times
        for (int i = 0; i < 10; i++) {
            exec.submit(task);
        }
        // shutdown() stops new submissions; already-submitted tasks still run
        exec.shutdown();
        // Finally, wait for all the submitted tasks to complete (1 hour should be way more than enough!)
        exec.awaitTermination(1, TimeUnit.HOURS);
    }
}
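Because every trial submits the task 10 times, this sweep writes far more data than the series label suggests. Here is a quick sketch of the arithmetic. It assumes that PerfRunner's `@Varying` range includes both `from` and `to` (an assumption, not checked against PerfRunner's source), and that each task writes about `megabyte` characters, each of which is one byte in common default encodings. The class name `SweepVolume` is hypothetical.

```java
// Hypothetical helper: estimates how much data one full run of the
// series x thread sweep writes, ASSUMING from..to is inclusive.
class SweepVolume {

    /** Values from..to in increments of step (inclusive range assumed). */
    static long[] range(long from, long to, long step) {
        int count = (int) ((to - from) / step) + 1;
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = from + i * step;
        }
        return values;
    }

    /** Total characters written across every (megabyte, threadCount) trial. */
    static long totalChars(long[] megabyteValues, long[] threadValues, int tasksPerTrial) {
        long total = 0;
        for (long megabyte : megabyteValues) {
            // each trial submits tasksPerTrial tasks, each writing `megabyte` chars,
            // and the thread count changes only how they are scheduled
            total += megabyte * tasksPerTrial * threadValues.length;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] megabytes = range(10_000_000, 100_000_000, 10_000_000);
        long[] threads = range(1, 10, 1);
        long total = totalChars(megabytes, threads, 10);
        // prints: 100 trials, 55.0 GB written
        System.out.println(megabytes.length * threads.length + " trials, "
                + total / 1_000_000_000.0 + " GB written");
    }
}
```

Under those assumptions, a single run of the test writes roughly 55 GB of temporary files, which is worth knowing before running it on a nearly full disk.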

I ran the test 4 more times with this data size and with other data sizes. The outliers showed up at random, at different numbers of threads for different data sizes. We think what we're seeing in the data is the expected trend: the more files we write simultaneously, the longer it takes to write the same amount of data. There's an occasional dominating effect where one of the files takes a really long time to write (as much as 10x longer). We think this is the operating system deciding (based on whatever rules it uses) to flush out the disk cache, so one writer gets held back while the OS processes data from all sorts of places, including my other writers as well as other unrelated processes. Still, the underlying signal is there, for sure.
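When one write occasionally takes 10x longer, an average over repeated runs gets pulled toward that outlier, while the median barely moves. The sketch below is a hypothetical helper (the name `TimingSummary` and the 3x cutoff are my own choices) that summarizes repeated timings for one data point and flags samples far above the median. The numbers in `main` are made up for illustration and are not measurements from these tests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: robust summary of repeated timings for one data point.
class TimingSummary {

    static double median(double[] samples) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    static double mean(double[] samples) {
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Samples more than `factor` times the median, e.g. the occasional very slow write. */
    static List<Double> outliers(double[] samples, double factor) {
        double m = median(samples);
        List<Double> result = new ArrayList<>();
        for (double s : samples) {
            if (s > factor * m) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up timings in ms, for illustration only.
        double[] ms = {210, 195, 205, 2050, 200};
        // prints: mean=572.0 median=205.0 outliers=[2050.0]
        System.out.println("mean=" + mean(ms) + " median=" + median(ms)
                + " outliers=" + outliers(ms, 3.0));
    }
}
```

Plotting the median per data point, and reporting the flagged outliers separately, keeps a single cache flush from hiding the trend.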

As promised, here is the code for 10 MB varied over the number of threads. I had trouble coming up with it and needed some help. The concurrency examples in the Oracle tutorial were for sockets, which didn't have much to do with what I was doing, and many other tutorials used the same examples. I varied mine from 1 to 10 threads in steps of 1, because it is difficult to vary over 1, 2, 5, and 10 without the steps in between. The first test in the code is just the one I did previously.

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.jboss.perfrunner.Axis;
import org.jboss.perfrunner.PerfRunner;
import org.jboss.perfrunner.Varying;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(PerfRunner.class)
public class SampleTest {

    @Test
    public void numTest(
            @Varying(name = "MB", axis = Axis.X, from = 1, to = 10000000, step = 250000) int numMax)
            throws Exception {
        String content = "0";
        // try-with-resources closes (and flushes) the writer even if a write fails
        try (BufferedWriter out = new BufferedWriter(new FileWriter("out.txt"))) {
            for (int num = 0; num < numMax; num++) {
                out.write(content);
            }
        }
    }

    @Test
    public void perThread(
            @Varying(name = "thread", axis = Axis.X, from = 1, to = 10, step = 1) final int threadCount)
            throws InterruptedException {
        // First define the task that needs to be done.
        // This task writes a million 'a' characters to its own temporary file:
        Runnable task = new Runnable() {
            @Override
            public void run() {
                try {
                    // create a uniquely named temporary file with the given suffix
                    File file1 = File.createTempFile("PerThreadTest", ".javatemp");
                    int megabyte = 1000000;
                    try (BufferedWriter out = new BufferedWriter(new FileWriter(file1))) {
                        for (int i = 0; i < megabyte; i++) {
                            out.write('a');
                        }
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        };
        // Now define the executor service that will execute the above task
        ExecutorService exec = Executors.newFixedThreadPool(threadCount);
        // Submit the task 10 times (total data written will be 10MB)
        for (int i = 0; i < 10; i++) {
            exec.submit(task);
        }
        // shutdown() stops new submissions; already-submitted tasks still run
        exec.shutdown();
        // Finally, wait for all the submitted tasks to complete (1 hour should be way more than enough!)
        exec.awaitTermination(1, TimeUnit.HOURS);
    }
}
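Outside PerfRunner, nothing forces the thread counts to be evenly spaced: a plain loop over an explicit array can time exactly 1, 2, 5, and 10 threads. This is a minimal sketch, not PerfRunner's API. The class and method names are hypothetical, it uses Java 8 lambda syntax, and it deletes each temporary file inside the timed region, which adds a little to the measured time.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical plain-Java harness: same workload as perThread, explicit thread counts.
class ExplicitThreadCounts {

    /** Runs `tasks` writers of `charsPerTask` chars on `threads` threads; returns elapsed ms. */
    static long timeWrites(int threads, int tasks, int charsPerTask) throws Exception {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        List<Callable<Void>> work = new ArrayList<>();
        for (int t = 0; t < tasks; t++) {
            work.add(() -> {
                File file = File.createTempFile("PerThreadTest", ".javatemp");
                try (Writer out = new BufferedWriter(new FileWriter(file))) {
                    for (int i = 0; i < charsPerTask; i++) {
                        out.write('a');
                    }
                } finally {
                    file.delete(); // keep disk usage bounded between trials
                }
                return null;
            });
        }
        long start = System.nanoTime();
        try {
            // invokeAll blocks until every task has finished; get() rethrows task failures
            for (Future<Void> f : exec.invokeAll(work)) {
                f.get();
            }
        } finally {
            exec.shutdown();
            exec.awaitTermination(1, TimeUnit.MINUTES);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        for (int threads : new int[] {1, 2, 5, 10}) {
            System.out.println(threads + " threads: " + timeWrites(threads, 10, 1_000_000) + " ms");
        }
    }
}
```

Using `invokeAll` instead of `submit` plus `awaitTermination` means the timing stops when the last task finishes, and an `IOException` in any task surfaces through `Future.get()` instead of only being printed.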

Friday, January 18, 2013

It turns out there wasn't really a threshold where the signal suddenly broke out above the noise. My first test used fine-grained, low numbers. When I used 1 MB, 10 MB, 100 MB, and 1000 MB, the results were more or less linear, with a little noise.

My next task is to make a second test method that writes a fixed amount of data (say, 10MB) with a varying number of concurrent threads. In the first trial, 1 thread writes 10MB; in the second trial, 2 threads write 5MB each; … and in the last, 10 threads write 1MB each (to separate files). My second method can call the first one, so there is no need to duplicate that logic! I will need a strategy for giving each file a unique name every time I call the method; see File.createTempFile() http://www.java-examples.com/create-temporary-file. Last summer, in my embedded systems research class, I was introduced to multithreaded programming, but it was in C++ and C#.
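Here is a minimal sketch of that plan, with the concurrency left out so the two ideas stand out: the second method reuses the first, and `File.createTempFile()` supplies a name that doesn't already exist for each writer. The names `writeZeros`, `shareFor`, and `writeSplit` are hypothetical. In the real test, each `writeZeros` call would be submitted to an `ExecutorService` instead of being run in a loop.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a fixed total across N writers, one unique file each.
class SplitWritePlan {

    /** "First method": write `count` '0' characters to the given file. */
    static void writeZeros(File file, long count) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(file))) {
            for (long i = 0; i < count; i++) {
                out.write('0');
            }
        }
    }

    /** Share of totalChars for writer `index` of `writers`; the remainder goes to the first few. */
    static long shareFor(long totalChars, int writers, int index) {
        long base = totalChars / writers;
        return base + (index < totalChars % writers ? 1 : 0);
    }

    /** "Second method": reuses writeZeros, one uniquely named temp file per writer. */
    static List<File> writeSplit(long totalChars, int writers) throws IOException {
        List<File> files = new ArrayList<>();
        for (int w = 0; w < writers; w++) {
            // createTempFile picks a name that does not already exist in the temp directory
            File file = File.createTempFile("SplitTest", ".javatemp");
            file.deleteOnExit();
            writeZeros(file, shareFor(totalChars, writers, w));
            files.add(file);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // three distinct file names, with sizes 4, 3 and 3 bytes
        for (File f : writeSplit(10, 3)) {
            System.out.println(f.getName() + ": " + f.length() + " bytes");
        }
    }
}
```

With 10MB and 2 writers, `shareFor` gives each writer 5MB, matching the second trial described above.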
Jonathan recommended The Java Tutorial http://docs.oracle.com/javase/tutorial/essential/concurrency/index.html. Since I had the C++ and C# multithreading background, Jonathan thought I could skip ahead to the "Immutable Objects" section and read that part and the following one (High Level Concurrency Objects) carefully. He thinks I will probably want to choose ExecutorService for my test, but he wants me to take the time to work through the whole guide. I found it more complicated than the Win32 and OpenMP threading I used in C++ and C#. Jonathan recommended not dwelling too much on synchronized, Thread.*, or Object.wait()/Object.notify(). He said the newer APIs (ExecutorService, Callable<T>, etc.) are more straightforward.
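To see why the newer APIs can be simpler than `Thread` plus `wait()`/`notify()`: a `Callable<T>` returns a value, `submit` hands back a `Future<T>`, and `Future.get()` waits for that one result, with no hand-written signalling. The sketch below uses a toy workload (summing 1..n in chunks) chosen only to show the pattern; the class name `CallableSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical example of the Callable<T> / Future<T> pattern on a thread pool.
class CallableSketch {

    /** Sums 1..n by splitting it into `parts` chunks, each computed by a Callable<Long>. */
    static long parallelSum(long n, int parts, int threads) throws Exception {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            long chunk = n / parts;
            for (int p = 0; p < parts; p++) {
                final long lo = p * chunk + 1;
                final long hi = (p == parts - 1) ? n : (p + 1) * chunk; // last chunk takes the rest
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (long i = lo; i <= hi; i++) {
                        sum += i;
                    }
                    return sum;
                };
                results.add(exec.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // blocks until that particular task is done
            }
            return total;
        } finally {
            exec.shutdown(); // stop accepting tasks; running ones finish
            exec.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        // prints 500000500000
        System.out.println(parallelSum(1_000_000, 4, 2));
    }
}
```

The same shape works for the file-writing test: each task can return something useful (bytes written, or its own elapsed time) instead of only printing it.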

I read the link on File.createTempFile() and it seems understandable. It was the concurrency tutorial I had more trouble with. Oracle tutorials can be hard to follow because they assume too much and you have to jump between links to understand the material. Besides that, multithreaded programming is considered one of the more difficult things to do in computer science. I haven't used multithreading in over half a year, so I need a refresher. I remember the basic concepts; it's the details I don't remember. Since I was having trouble, I started reading other people's tutorials and blogs on multithreading and ExecutorService. Some of it is beginning to make sense, but I'm not yet clear enough on it to ask Jonathan intelligent questions. I have to wait until the fog in my head clears before I can ask questions. That sometimes takes a while, but I do get there.

At 3 PM I was having a bad case of impostor syndrome. I called my mother, the retired university professor, to talk about it. She can relate, but says she was too stubborn to give up. Back in the 50s and 60s my mom was the only woman in her Russian and German classes, especially in graduate school. I have lost a lot of confidence since a PhD adviser called me stupid; it hurt my test taking and studying long after I left his lab. I have a friend, Dr. John Reed, a retired UTA counselor, who has been trying to help me with study techniques. He recommends studying for 50 minutes and then taking a 10-minute break. Do that 3 times, then go off and do something else for two hours. Then come back and do the 50 minutes/10 minutes routine again. He told me not to study more than 8 hours a day; he claims the brain needs time to process the information. He knows I stare at the book and computer screen too long in desperation and gives me hell about it. I told him I feel guilty if I am not studying.
John's methods don't jibe very well with most employers, who expect people to work a straight 8-hour day or more. Jonathan hasn't complained to me and has been very supportive.

When I figure this all out I will be sure to post it so I can educate other poor souls.

Wednesday, January 16, 2013

By Tuesday afternoon I was still doing performance testing wrong. I had been looking at queueSize in PooledExecutorService in org.jboss.errai.bus.server.async.scheduling, thinking I would vary it to determine how many users can use the ErraiBus. This is the code I was trying.

package org.company.firestorm.client.local;

import org.jboss.perfrunner.Axis;
import org.jboss.perfrunner.PerfRunner;
import org.jboss.perfrunner.Varying;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(PerfRunner.class)

public class PooledExecutorServiceTest {
    @Test
    public void testPooledExecutorService (
          @Varying(axis=Axis.X, name="Queue Size", from=0, to=100, step=10) int queueSize
          ) throws InterruptedException {
           Thread.sleep(queueSize);
       }
}
This was totally wrong. For one thing, I was trying to white-box test the internals of Errai instead of black-box testing it. Jonathan suggested I try a performance test that writes 0s into a file for 1 MB, 10 MB, 100 MB, and 1000 MB. It was past 5 PM my time, so I stopped for the day and got something to eat. I fell asleep relatively early and found myself up at 2 AM, so I decided to do some work. I have gotten into the habit of occasionally working in the middle of the night this way. When I was in school it was common for the professors to be up at night and come in late to teach the next day, and more than a few times I found myself exchanging emails with the grad adviser at 2 AM. Now that I am out of school I can work in the middle of the night without being interrupted by nurses, home health aides, and phone calls. The problem is that Jonathan is on IRC from 8 AM to 5:30 PM my time, which is when I can ask him questions, and sometimes I have a hard time staying awake then because I was up so early. I email Jonathan about what I have been doing, and he answers when he gets in the next morning. I also chat with Jonathan on IRC during the day.

This morning I seemed to be doing something right. This is the code I was first testing early this morning:

package org.company.firestorm.client.local;
import java.io.*;
import org.jboss.perfrunner.*;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(PerfRunner.class)
public class sampleTest {
    @Test
    public void numTest (
    @Varying(name = "MB", axis=Axis.X, from = 1, to = 1000, step = 10) int numMax) throws Exception{
        String content = "0";
        FileWriter fstream = new FileWriter("out.txt");
        BufferedWriter out = new BufferedWriter(fstream);
        for(int num=0; num < numMax; num++ ){

        out.write(content);
        }
        out.close();

   }
}

[Attachments: perfrunner-org.company.firestorm.client.local.sampleTest.html (PerfRunner results graph), out.txt]

This was the result I got. I didn't get the straight line going up to the right that I expected, although the text file "out.txt" did have all the expected 0s.

Let's draw some parallels to what we plan to eventually do with Errai performance testing:

1. I am black-box testing BufferedWriter. I didn't have to examine the inner workings of BufferedWriter in order to create this test. I just used its public API.

2. The results surprised me! This is the hallmark of a useful performance test, because it means you're about to learn something new. :-)

3. The independent variable (numMax) in this test was not used to tune the BufferedWriter I was testing. Instead, it was used to decide how much exercise to give to the BufferedWriter.
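The same black-box idea can be taken one step further by swapping the file for an in-memory sink, which isolates BufferedWriter from the disk noise discussed later. This is a hypothetical sketch: `CountingWriter` is my own helper, not part of java.io, and the sizes in `main` are arbitrary. Like the test above, it only uses BufferedWriter's public API, and `numMax` decides how much exercise it gets, not how it is tuned.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;

// Hypothetical black-box sketch: exercise BufferedWriter only via its public API.
class BlackBoxBufferedWriter {

    /** A Writer that discards data but counts characters, so no disk is involved. */
    static final class CountingWriter extends Writer {
        long count;

        @Override
        public void write(char[] cbuf, int off, int len) {
            count += len;
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    /** Writes numMax "0" strings through a BufferedWriter; returns elapsed nanoseconds. */
    static long time(int numMax, CountingWriter sink) throws IOException {
        long start = System.nanoTime();
        try (BufferedWriter out = new BufferedWriter(sink)) {
            for (int num = 0; num < numMax; num++) {
                out.write("0");
            }
        } // close() flushes the buffer into the sink
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        for (int numMax : new int[] {250_000, 2_500_000, 25_000_000}) {
            CountingWriter sink = new CountingWriter();
            long ns = time(numMax, sink);
            System.out.println(numMax + " chars: " + ns / 1_000_000 + " ms (" + sink.count + " counted)");
        }
    }
}
```

At the smallest sizes, JIT warm-up in the JVM can dominate the timing, which is one more reason small, fine-grained steps tend to measure noise.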

I then used 1 MB, 10 MB, 100 MB, and 1000 MB for the "to =" part of the code and changed the step so each range was covered in 4 trials (e.g. steps of 0.25 MB for 1 MB, 2.5 MB for 10 MB, ...). This time I got a more linear graph that went up to the right, and "out.txt" had millions more 0s. It appears that when I was measuring smaller numbers with smaller steps, I was just measuring noise: the signal had not risen above the noise threshold. Now I must devise a performance test that will help me find the breaking point where the cost of writing to the file becomes detectable. I think I am going to eat and take a nap first.
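One way to put a number on "more linear" is to fit a straight line, time = a + b × size, by ordinary least squares and look at R², which is 1.0 for a perfect line and near 0 when the points are mostly noise. The sketch below is a hypothetical helper (`LinearFit` is my own name), and the points in `main` are a made-up exact line used only to show the output, not data from these tests.

```java
import java.util.Arrays;

// Hypothetical helper: ordinary least-squares fit of y = intercept + slope * x.
class LinearFit {

    /** Returns {intercept, slope, rSquared} for the points (x[i], y[i]). */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        double slope = sxy / sxx;
        double intercept = my - slope * mx;
        // for a one-variable fit, R^2 is the squared correlation coefficient
        double rSquared = (syy == 0) ? 1.0 : (sxy * sxy) / (sxx * syy);
        return new double[] {intercept, slope, rSquared};
    }

    public static void main(String[] args) {
        double[] sizeMb = {1, 2, 3, 4};  // illustrative x values
        double[] timeMs = {3, 5, 7, 9};  // illustrative y values on an exact line
        // prints [1.0, 2.0, 1.0]
        System.out.println(Arrays.toString(fit(sizeMb, timeMs)));
    }
}
```

The slope (ms per MB) is the per-megabyte cost of writing; when the step between sizes multiplied by that slope is small compared to run-to-run variation, the points are dominated by noise.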

Monday, January 14, 2013

It has been a frustrating first week, and I wonder if I am in over my head. I chose a coding project because I want to be a software developer. I picked JBoss because its projects are written in Java, which I know better than Python or Ruby, the languages other organizations were using. I knew the projects wouldn't be easy, so I wanted to use the language I could get up to speed in the fastest. I chose Errai because it was second on the list and I knew what a plugin was; I had no idea what the first project on the list was about. Other projects worked on testing, which I knew nothing about. It turns out my first task with Errai is performance and scalability testing. It is good that I am learning about testing. In the few job interviews I have been on, they asked me how I would debug my code without a debugger. Being naive, all I could say was that I would put in print statements. UT Arlington had a testing class in the software engineering program, but I was in the computer science program, learning algorithms, math, and statistics for software. In my master's exit interview, I suggested that perhaps computer science majors should take a testing class.

My mentor, Jonathan Fuerth, was on vacation until January 7, so at first I was on my own. Jonathan had written some software, junit-4-perfrunner, that graphs the results of JUnit performance tests for easier interpretation. At first I thought I had a vague understanding of how to use it, but I was totally wrong. I thought I was supposed to set up test cases method by method. The problem with that is it makes the test cases and code hard to maintain, since there are so many methods. Jonathan has tried to explain JUnit testing to me. I think I understand how to do it for correctness, but performance testing eludes me: I understand it at a high level, but I don't know how to write the test cases. Before attempting to write the test harness, Jonathan assigned me to understand JUnit testing, junit-4-perfrunner, and Errai Bus, which communicates between the client and the server. He suggested I play around with the code. I looked at the source code for Errai Bus, and I would guess there are close to 100 files. With each file playing a different role, I thought we were supposed to unit test file by file. That turned out to be wrong for the same reason method by method is wrong. We want to simulate real-world scenarios, so we should focus on choosing one of those, then figure out which parts of the framework we need to build a performance testing harness for. We want to be able to answer questions like:
    "how many concurrent users can Errai support on a single server?"
    "what if my network goes out for 2 minutes, so everybody [10, 100, or 1000 users] gets
     reconnected at the same time?"
    "how does the performance of Errai 3.0 differ from Errai 2.2?"
    and so on.
We want to anticipate performance questions people are likely to ask us, and come up with the answers now. Jonathan says writing the test cases is the fun/hard part. I am waiting to catch a clue so the fun can begin.
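The "everybody reconnects at the same time" question has a recognizable shape: N clients wait at a starting line, are released together, and we time how long it takes until all of them are done. Below is a minimal sketch of that shape using `CountDownLatch`. The `connect()` method is a hypothetical stand-in that just sleeps; it is NOT an Errai API, and a real harness would open a bus connection there instead. Using one thread per simulated client keeps the sketch simple but limits how far it scales.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical "reconnect storm" sketch: N simulated clients start at the same instant.
class ReconnectStorm {

    // HYPOTHETICAL stand-in for one client reconnecting; not an Errai API.
    static void connect(int clientId) throws InterruptedException {
        Thread.sleep(5);
    }

    /** Releases `clients` simulated clients together; returns how many finished in time. */
    static int storm(int clients, long timeoutSeconds) throws InterruptedException {
        ExecutorService exec = Executors.newFixedThreadPool(clients);
        CountDownLatch ready = new CountDownLatch(clients);  // every client thread has started
        CountDownLatch startGate = new CountDownLatch(1);    // the "network is back" moment
        CountDownLatch done = new CountDownLatch(clients);   // every client has finished
        AtomicInteger succeeded = new AtomicInteger();
        for (int c = 0; c < clients; c++) {
            final int id = c;
            exec.execute(() -> {
                ready.countDown();
                try {
                    startGate.await();
                    connect(id);
                    succeeded.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        ready.await();                 // wait until every client is parked at the gate
        long start = System.nanoTime();
        startGate.countDown();         // release them all at once
        done.await(timeoutSeconds, TimeUnit.SECONDS);
        long ms = (System.nanoTime() - start) / 1_000_000;
        exec.shutdownNow();
        System.out.println(clients + " clients finished in " + ms + " ms");
        return succeeded.get();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int clients : new int[] {10, 100, 1000}) {
            storm(clients, 60);
        }
    }
}
```

The `ready` latch keeps thread start-up time out of the measurement, so the timer only covers the simultaneous reconnect itself.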

Thursday, January 3, 2013

January 2 was supposed to be the first day of the internship. After a long, arduous journey across my bedroom to my computer, I discovered, after waiting on IRC and sending an email, that my mentor was still on vacation. He'll be on vacation until January 7, so I am on my own until Monday. It was a long walk, but I didn't break a sweat. I'm being facetious. I can walk well enough that walking around the apartment isn't any strain; I just have problems going long distances. I asked my mentor which version of the JUnit test runner to use for baseline scalability tests on Errai, which is supposed to be my first task. He replied to my email with an answer and to let me know he was on vacation. He had referred to a "new test runner" in the Errai bug tracker, which turned out to be something he wrote: junit-4-perfrunner. The strain of the day drained me so much (once again being facetious) that I decided to take a nap before getting started. I can be a real slave driver when it comes to taking breaks.

Tuesday, January 1, 2013

I may have given the impression in the previous post that I am only in computer science for the jobs and money. That is not the case. If I just wanted a job, I could have stayed in teaching without a lot of additional training; I would only have had to go through summer alternative teacher training. Getting an MS in computer science took a bit of an investment in time and money, and if I didn't enjoy computer science I never would have gone into it. What also led me into computer science was that I enjoyed the problem solving when teaching math.