Software Engineering Experimentation
SWE 763 Project
Fall 2008
Sixth GMU Workshop on Experimental Software Engineering
Program Chair:
Jeff Offutt
Technical Program Committee:
SWE 763 students
The GMU Workshop on Experimental Software Engineering
provides a forum for discussing
current experimental studies in the field of software engineering.
Papers are solicited for the studies listed
in this CFP,
as well as for other studies.
Accepted papers will not be published in any conference proceedings.
Submitted papers must not have been published previously,
but they may be submitted elsewhere in the future.
All submitted papers will be accepted.
Full-Length Papers:
Papers should be submitted 1.5 or double-spaced
in a font size no smaller than 11 points,
fully justified.
Papers must not exceed 25 double-spaced pages
including references and figures,
and will not be refereed by external reviewers.
All papers should indicate what is interesting about the presented work.
The first page should include an abstract of at most 150 words,
a list of keywords,
and the complete address (including phone and e-mail address)
of the author.
The citations and references should be formatted in
standard software engineering format,
that is,
with bracketed citations ("[1]")
and citation keys that are either numeric or
strings based on the authors' names ("[Basi91]").
Presentations:
You will be allowed 25 minutes
for your presentation,
including 5 minutes for questions.
Submission Procedure:
Three hard copies
of a first draft of each paper
must be submitted before
4 November
to
Program Chair J. Offutt.
Each paper will receive at least three reviews,
one from the program chair and two from technical program committee members.
Reviews will be returned on
11 November,
and the final paper must be submitted electronically by
25 November.
Final papers must be submitted in PDF format (not MS Word or LaTeX!).
The final paper must be single spaced and in 10 point font.
| Milestones | Date |
| Topic selection: | 9 September |
| Experimental design review: | 23 September |
| Draft paper submitted: | 4 November |
| Reviews due: | 11 November |
| Final paper submitted: | 25 November |
| Presentations: | 25 November – 9 December |
Don't mind criticism --
If it is untrue, disregard it,
If it is unfair, don't let it irritate you,
If it is ignorant, smile,
If it is justified, learn from it.
- Anonymous
SUGGESTED TOPICS LIST
Following is a list of suggested topics for your empirical study.
You may choose any topic you wish,
either from this list or something of your own creation.
I specifically encourage you to consider carrying out an experiment
related to your current research.
You will notice that most of these studies do not involve much, if any, programming,
but some will involve a lot of program execution.
Also,
these studies can be done more easily with clever use of shell scripts.
There can be a fair amount of overlap between these studies,
and you may want to share programs, test data sets,
or other artifacts.
Trading these kinds of experimental artifacts is greatly encouraged.
Some of these studies could use a partner to carry out some of the work,
so as to avoid bias from having one person conduct the entire experiment.
I encourage you to help each other;
please communicate among yourselves if you need help ...
ask and offer.
These descriptions are concise overviews ... I will be available
to discuss each project individually during office hours and
through email.
Empirical Studies Suggestions
- Java mutation experiments:
One resource we have available is a mutation testing system for Java,
muJava.
Instructions for downloading, installing, and running muJava are
available on the website.
There are several small experiments you could use muJava to run.
- Test criterion comparison.
For a collection of programs, develop tests that kill all mutants,
and develop tests that satisfy another criterion
(data flow, CACC, edge-pair, input parameter modeling, etc.).
Compare them on the basis of number of tests
and on their fault finding abilities.
- Mutation operator evaluation.
One key to mutation testing is how good the operators are.
Most of the class-level mutation operators are fairly new,
and it is possible that some are redundant
and others have very little ability to detect faults.
It would be helpful to have an experiment to evaluate the operators,
based on their abilities to find faults,
redundancy, or frequency of equivalence.
- Mutation as a fault seeding tool.
One use of mutation is to create faults for other purposes,
for example,
to compare other testing techniques.
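As a minimal sketch of how results from these experiments might be tabulated, the following computes a mutation score for each test set. The mutant IDs, the test-set names, and the kill data are hypothetical and are not muJava output; how you extract kill results from muJava's reports is left to you.

```python
from typing import Dict, Set


def mutation_score(killed: Set[str], all_mutants: Set[str], equivalent: Set[str]) -> float:
    """Killed non-equivalent mutants divided by all non-equivalent mutants."""
    killable = all_mutants - equivalent
    if not killable:
        return 0.0
    return len(killed & killable) / len(killable)


# Hypothetical data: five mutants, one judged equivalent by hand.
ALL_MUTANTS = {"AOR_1", "AOR_2", "ROR_1", "ROR_2", "COR_1"}
EQUIVALENT = {"ROR_2"}

# Hypothetical kill results for two test sets built from different criteria.
KILLS: Dict[str, Set[str]] = {
    "edge_pair_tests": {"AOR_1", "ROR_1"},
    "cacc_tests": {"AOR_1", "AOR_2", "ROR_1", "COR_1"},
}


def compare(kills: Dict[str, Set[str]]) -> Dict[str, float]:
    """Mutation score for every test set in `kills`."""
    return {name: mutation_score(k, ALL_MUTANTS, EQUIVALENT) for name, k in kills.items()}


for name, score in compare(KILLS).items():
    print(f"{name}: score {score:.2f}")
```

The same table, grouped by mutation operator instead of by test set, would be one starting point for the operator evaluation study.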
- Web Modeling and Testing Evaluation:
I have a paper under review that proposes a method for modeling the presentation layer
of web applications.
This model can be used to generate tests, among other things.
If you have access to a reasonably sized web application,
it would be very interesting to apply this test method
to evaluate its effectiveness.
A draft paper is available.
- Software Engineering Factoids:
We have a lot of truisms about software engineering.
These are small facts, or "factoids," that "everybody knows" are true,
yet the sources for these factoids are lost in the mists of time.
Some are based on data from the 1970s,
some are based on 30-year-old casual observations,
and some were probably made up by speakers who wished for a fact
to support some point.
By now, "everybody" accepts these factoids as truth,
yet they may no longer be true
or may have never been true!
A few example factoids are:
- 80% of bugs are in 20% of the code.
- 60% of maintenance is perfective,
20% is adaptive,
and 20% is corrective.
- 10% of programmers are 10 times more productive than the other 90%.
- Software is 2/3 maintenance, and 1/3 development.
- 90% of software is never used.
- The number of parameters to subroutines is always small.
- Object-oriented software is less efficient.
I am sure that you can think of more.
The goal of this project would be to verify one or more of the factoids.
This would require three steps:
(1) find the old sources for the factoid,
who originated it, what the fact was based on, and where it was used;
(2) verify whether the factoid is true for current systems; and
(3) quantify the correct version of the factoid as best as you can
from current data.
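As a minimal sketch of step (2) for the "80% of bugs are in 20% of the code" factoid, the following computes the share of defects found in the most defect-prone fraction of modules. The defect counts are hypothetical, and treating "code" as modules rather than lines is an assumption you would need to justify in your design.

```python
import math
from typing import Sequence


def defect_share_of_top_modules(defects_per_module: Sequence[int],
                                top_fraction: float = 0.2) -> float:
    """Fraction of all defects located in the worst `top_fraction` of modules."""
    total = sum(defects_per_module)
    if total == 0:
        return 0.0
    ranked = sorted(defects_per_module, reverse=True)
    top_n = max(1, math.ceil(top_fraction * len(ranked)))
    return sum(ranked[:top_n]) / total


# Hypothetical defect counts for ten modules (100 defects in total).
counts = [40, 25, 10, 8, 6, 4, 3, 2, 1, 1]
share = defect_share_of_top_modules(counts)
print(f"Top 20% of modules hold {share:.0%} of the defects")
```

For this made-up data the top two modules hold 65% of the defects, so the factoid would not hold as stated.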
- Metrics Comparison:
Researchers have suggested a large number of ways to measure the
complexity and/or quality of software.
These software metrics are difficult to evaluate,
particularly on an analytical basis.
An interesting project would be to take two or more metrics,
measure a number of software systems,
and compare the measurements in an objective way.
The difficult part of this study would be the evaluation method:
How can we compare different software metrics?
To come up with a sensible answer to this question,
start with a deeper question:
What do we want from our metrics?
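One possible objective comparison, offered only as a sketch, is to ask whether two metrics rank the same systems in the same order. The following computes Spearman's rank correlation in plain Python. The metric values are hypothetical, and rank agreement is only one of many evaluation methods you might defend.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical measurements of five systems under two metrics.
cyclomatic = [3, 10, 7, 22, 15]
lines_of_code = [120, 400, 350, 900, 380]
print(f"rho = {spearman(cyclomatic, lines_of_code):.2f}")
```

A high correlation with lines of code might suggest that a metric adds little information beyond size, which is itself a finding worth reporting.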
- Frequency of Multi-Clause Predicates in Open-Source Software:
Logic-based test criteria such as active clause coverage (ACC) are superior to
simpler criteria such as predicate coverage
only when predicates have more than one clause.
And ACC is only superior to all combination coverage
when predicates have several clauses, possibly four or five.
So an important question about the practicality and usefulness
of ACC
is how often predicates in real programs
have several clauses.
A study that counted clauses in real programs,
for example, open-source programs,
would help us determine how useful some of these techniques are.
The number of clauses per predicate should also be compared with overall
measures of size, such as lines of code, number of classes, etc.
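The counting itself can be automated with a parser. As a minimal sketch, the following uses Python's standard `ast` module to build a histogram of clauses per predicate. Analyzing Python source is an assumption made to keep the example self-contained; a study of Java or C programs would need a parser for that language. The sample program is made up.

```python
import ast
from collections import Counter


def count_clauses(expr: ast.expr) -> int:
    """Count clauses: the leaves that remain after stripping and/or/not."""
    if isinstance(expr, ast.BoolOp):
        return sum(count_clauses(v) for v in expr.values)
    if isinstance(expr, ast.UnaryOp) and isinstance(expr.op, ast.Not):
        return count_clauses(expr.operand)
    return 1


def clause_histogram(source: str) -> Counter:
    """Map clause count -> number of if/while/conditional-expression predicates."""
    hist: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While, ast.IfExp)):
            hist[count_clauses(node.test)] += 1
    return hist


# Made-up sample program with three predicates.
SAMPLE = '''
def f(a, b, c):
    if a > 0 and (b or not c):
        return 1
    while a:
        a -= 1
    if b:
        return 2
    return 3
'''

for clauses, predicates in sorted(clause_histogram(SAMPLE).items()):
    print(f"{clauses} clause(s): {predicates} predicate(s)")
```

This sketch counts clause occurrences, so a clause that appears twice in one predicate is counted twice, and a chained comparison such as `a < b < c` counts as one clause. Both choices should be stated in the experimental design.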
- Frequency of Infeasible Subpaths in Testing:
Many structural testing criteria exhibit what is called the
feasible path problem,
which says that some of the test requirements are
infeasible in the sense that the semantics of the program imply
that no test case satisfies the test requirements.
Equivalent mutants,
unreachable statements in path testing techniques,
and infeasible DU-pairs in data flow testing
are all instances of the feasible path problem.
For example, in branch testing,
one branch might be executed if (X == 0)
and another if (X != 0);
if the test requirements need both branches to be taken during
the same execution,
the requirement is infeasible.
This study would determine,
for a sample of programs,
how many subpaths that are required to be executed by some test criterion
are infeasible.
A reference on the subject of the feasible path problem can be
found on my web site:
Automatically Detecting Equivalent Mutants and Infeasible Paths.
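As a minimal sketch of the branch example above, the following toy program makes two decisions on the same unchanged variable and records which pairs of branch outcomes are never observed over a sample of inputs. The program and the input range are hypothetical. Sampling can only suggest that a subpath is infeasible, not prove it, so each candidate still needs analysis by hand.

```python
from itertools import product
from typing import Iterable, Set, Tuple


def trace(x: int) -> Tuple[bool, bool]:
    """Return the outcomes of the two decisions a toy program makes on x."""
    first = (x == 0)   # first decision
    second = (x != 0)  # second decision, on the same unchanged variable
    return (first, second)


def unobserved_subpaths(inputs: Iterable[int]) -> Set[Tuple[bool, bool]]:
    """Branch-outcome pairs required by the criterion but never executed."""
    required = set(product([True, False], repeat=2))
    observed = {trace(x) for x in inputs}
    return required - observed


candidates = unobserved_subpaths(range(-100, 101))
print(sorted(candidates))
```

Here the pairs (True, True) and (False, False) are never observed, and a short argument shows they are truly infeasible because the two conditions are complements of each other.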
- Experiments in Coupling:
Last year my student, Aynur Abdurazik, completed her dissertation on
Coupling-Based Analysis of Object-Oriented Software.
In Chapter 10 she suggested several interesting areas for future research,
some of them experimental.
In particular, sections
10.2.1, Application of Coupling Model to Web Applications,
10.2.2, Coupling-based Fault Analysis,
10.2.3, Comprehensive Empirical Validation of Three Specific Problems,
10.2.6, Coupling-based Reverse Engineering,
and
10.2.7, Coupling-based Component Ranking
suggest some potentially useful experimental directions.
- Inversion of Control/Dependency Injection:
The Inversion of Control and Dependency Injection design patterns are
becoming a common part of frameworks. These patterns provide a means
to develop loosely coupled applications. Both approaches encourage
interface-based development and the use of POJOs (plain old Java
objects). It would be interesting to investigate the effects of
these patterns on unit and integration testing.
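As a minimal sketch of why these patterns matter for unit testing, the following class receives its collaborator through the constructor, so a test can substitute a stub for the real dependency. All class and method names are hypothetical, and Python is used only for brevity; the frameworks in question are typically Java frameworks.

```python
from typing import Protocol


class RateSource(Protocol):
    """Interface the converter depends on (hypothetical)."""

    def rate(self, currency: str) -> float: ...


class PriceConverter:
    """Depends only on the interface; the caller injects the implementation."""

    def __init__(self, rates: RateSource) -> None:
        self._rates = rates

    def to_usd(self, amount: float, currency: str) -> float:
        return amount * self._rates.rate(currency)


class StubRates:
    """Test double with fixed rates: no network or database is needed."""

    def rate(self, currency: str) -> float:
        return {"EUR": 2.0}[currency]


converter = PriceConverter(StubRates())
print(converter.to_usd(10, "EUR"))
```

An experiment might compare how many such stubs unit tests need, or how many faults surface only at integration time, for code written with and without injection.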
- Declarative Programming:
Many Web frameworks attempt to alleviate the burden of tedious
programming tasks by allowing developers to specify navigation
and page composition logic declaratively in configuration files.
It would be interesting to investigate the effects of this type
of declarative programming on software testing.
- Use of #include Sentinels in C:
In John Lakos' book Large-Scale C++ Software Design, he suggested
using double #include sentinels to reduce preprocessor overhead.
That is, there would be the expected #ifndef clause at the beginning of
header files to prevent them from being included more than once;
however, using his "double sentinel" technique, there would be a similar
#ifndef clause around each #include directive in files that, in turn, include that header file.
Thus the pre-processor wouldn't expend the overhead of opening and
scanning potentially hundreds of header files. This overhead could
theoretically be quite large with a large number of interdependent
source files.
However, I have done some crude performance tests and have found no
significant speed-up in compile times between the original code and code
that uses Lakos' technique. There could be a number of reasons,
including source that wasn't complex enough, improvements in caching,
compiler improvements, etc. Still, as far as I know, no one has done
an empirical study to support or refute his claims.
Due to Mark Coletti
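Before timing real compilers, it may help to model what the technique is supposed to save. As a minimal sketch, the following counts how many times a naive preprocessor would open header files for one translation unit, with and without the extra sentinels around each #include. The include graph and file names are hypothetical, and the model assumes the preprocessor reopens a guarded header every time it is included.

```python
from typing import Dict, List, Set

# Hypothetical include graph: file -> headers it includes, in order.
INCLUDES: Dict[str, List[str]] = {
    "main.cpp": ["a.h", "b.h", "c.h"],
    "a.h": ["common.h"],
    "b.h": ["common.h", "a.h"],
    "c.h": ["common.h", "b.h"],
    "common.h": [],
}


def count_opens(root: str, redundant_guards: bool) -> int:
    """Count header opens for one translation unit in a naive preprocessor."""
    defined: Set[str] = set()  # headers whose guard macro is already defined
    opens = 0

    def process(name: str, is_header: bool) -> None:
        nonlocal opens
        if is_header:
            opens += 1           # the file is opened and scanned
            if name in defined:  # internal guard: the body is skipped
                return
            defined.add(name)
        for header in INCLUDES[name]:
            # An external sentinel skips the #include directive itself.
            if redundant_guards and header in defined:
                continue
            process(header, True)

    process(root, False)
    return opens


print("internal guards only:", count_opens("main.cpp", False))
print("with double sentinels:", count_opens("main.cpp", True))
```

For this graph the model predicts 8 opens with internal guards only and 4 with double sentinels. Some preprocessors, GCC's among them, document an optimization that remembers guarded headers and does not reopen them, which the model ignores and which could be one explanation for finding no speed-up in practice.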
© Jeff Offutt, 2008, all rights reserved.
This document is made available for use by GMU graduate students of SWE 763.
Copying, distribution or other use of this document
without express permission of the author is forbidden.
You may create links to pages in this web site,
but may not copy all or part of the text without permission
of the author.