ec.app.sequence
Class ThreadedSequenceFeatureInterpreter

java.lang.Object
  extended by ec.app.sequence.ThreadedSequenceFeatureInterpreter

public class ThreadedSequenceFeatureInterpreter
extends java.lang.Object

Reads the features stored in a file (the hall-of-fame output) and generates files in LibSVM format. It also writes the features separately, to SSCleanFeatures.txt, for those who want to study them. Feature matching is computed in parallel, with the number of threads set by the threads input argument. The sequences are split into chunks: the first threads-1 threads each receive an equal share, and the last thread receives all remaining sequences when the total does not divide evenly. For best throughput, set the number of threads equal to the number of cores/processors on the machine. The class also performs some simple simplification of feature expressions: for example, (AND true true) or (OR (NOT false)) is reduced. In future, the features could be reduced further to remove redundancy, such as matchesAtPosition motif3 AGT @ 45 AND matchesAtPosition motif1 T @ 47, but whether such redundancy (bloat) is good or bad in some cases still needs to be thought through.
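
The kind of constant simplification described above can be sketched as follows. This is an illustration only, not this class's actual code: SimplifySketch, Node and simplify are hypothetical names, and the tree representation of a feature expression is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from ThreadedSequenceFeatureInterpreter.
final class SimplifySketch {

    // A node is an operator ("AND", "OR", "NOT") with children, or a leaf
    // ("true", "false", or a feature such as "matchesAtPosition motif1 T 47").
    static final class Node {
        final String op;
        final List<Node> kids;

        Node(String op, Node... kids) {
            this.op = op;
            this.kids = Arrays.asList(kids);
        }

        boolean isConst() {
            return kids.isEmpty() && (op.equals("true") || op.equals("false"));
        }

        @Override
        public String toString() {
            if (kids.isEmpty()) return op;
            StringBuilder sb = new StringBuilder("(").append(op);
            for (Node k : kids) sb.append(' ').append(k);
            return sb.append(')').toString();
        }
    }

    // Folds boolean constants bottom-up; non-constant features are left untouched.
    static Node simplify(Node n) {
        if (n.kids.isEmpty()) return n;
        List<Node> kids = new ArrayList<>();
        for (Node k : n.kids) kids.add(simplify(k));

        if (n.op.equals("NOT") && kids.size() == 1 && kids.get(0).isConst()) {
            return new Node(kids.get(0).op.equals("true") ? "false" : "true");
        }
        if (n.op.equals("AND") || n.op.equals("OR")) {
            boolean isAnd = n.op.equals("AND");
            String absorbing = isAnd ? "false" : "true"; // decides the whole expression
            String identity = isAnd ? "true" : "false";  // can simply be dropped
            List<Node> kept = new ArrayList<>();
            for (Node k : kids) {
                if (k.isConst() && k.op.equals(absorbing)) return new Node(absorbing);
                if (k.isConst() && k.op.equals(identity)) continue;
                kept.add(k);
            }
            if (kept.isEmpty()) return new Node(identity);
            if (kept.size() == 1) return kept.get(0);
            return new Node(n.op, kept.toArray(new Node[0]));
        }
        return new Node(n.op, kids.toArray(new Node[0]));
    }
}
```

In this sketch both (AND true true) and (OR (NOT false)) reduce to true, while an expression such as (AND true X) reduces to X for any feature X.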

Author:
udaykamath

Field Summary
 boolean cleanOnly
           
static org.biojava.utils.regex.PatternFactory factory
           
 
Constructor Summary
ThreadedSequenceFeatureInterpreter()
           
 
Method Summary
 void close()
           
 void generateLibSVMFile(java.io.File gpFile, int threads)
           
static void main(java.lang.String[] args)
           
 void quickWrite(java.lang.String data)
          Called by all threads, so it should be synchronized.
 void setup(java.lang.String fileName)
          Reads the sequences from the file, labeled +1 or -1, and places each in the matching bucket.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

cleanOnly

public boolean cleanOnly

factory

public static org.biojava.utils.regex.PatternFactory factory
Constructor Detail

ThreadedSequenceFeatureInterpreter

public ThreadedSequenceFeatureInterpreter()
Method Detail

quickWrite

public void quickWrite(java.lang.String data)
                throws java.lang.Exception
Called by all threads, so it should be synchronized.

Parameters:
data - the text to write
Throws:
java.lang.Exception
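
A minimal sketch of a writer shared by several threads, assuming the output goes through a single java.io.Writer. SharedWriterSketch is a hypothetical class, and whether the real quickWrite appends a line separator is not stated here, so this version writes data exactly as given.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical sketch; the real quickWrite/close implementations are not shown in this page.
final class SharedWriterSketch {
    private final Writer out;

    SharedWriterSketch(Writer out) {
        this.out = out;
    }

    // synchronized: only one thread at a time may write, so the output
    // of two concurrent calls is never interleaved.
    synchronized void quickWrite(String data) throws IOException {
        out.write(data);
    }

    // Also synchronized, so a close cannot run in the middle of a write.
    synchronized void close() throws IOException {
        out.close();
    }
}
```

Each worker thread would call quickWrite on the same instance, and the coordinating thread would call close once all workers have finished.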

close

public void close()
           throws java.lang.Exception
Throws:
java.lang.Exception

generateLibSVMFile

public void generateLibSVMFile(java.io.File gpFile,
                               int threads)
                        throws java.lang.Exception
Throws:
java.lang.Exception
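
The class description says the sequences are divided into chunks across the requested threads. The sketch below shows one reading of that scheme, in which the first threads-1 workers get an equal share and the last worker also takes the remainder. ChunkSketch and chunkRanges are hypothetical names, and the exact split used by generateLibSVMFile may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ThreadedSequenceFeatureInterpreter.
final class ChunkSketch {

    // Returns one {start, end} pair per thread; end is exclusive.
    static List<int[]> chunkRanges(int totalSequences, int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        if (totalSequences < 0) {
            throw new IllegalArgumentException("totalSequences must be >= 0");
        }
        List<int[]> ranges = new ArrayList<>();
        int share = totalSequences / threads; // equal share for the first threads-1 workers
        int start = 0;
        for (int t = 0; t < threads; t++) {
            // The last worker runs to the end, absorbing any remainder.
            int end = (t == threads - 1) ? totalSequences : start + share;
            ranges.add(new int[] {start, end});
            start = end;
        }
        return ranges;
    }
}
```

Under this assumption, 10 sequences on 3 threads would be split into the index ranges 0-3, 3-6 and 6-10, so the last thread processes four sequences and the others three each.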

setup

public void setup(java.lang.String fileName)
Reads the sequences from the file, labeled +1 or -1, and places each in the matching bucket. It also initializes the factory used for IUPAC parsing.

Parameters:
fileName - name of the file containing the labeled sequences
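
As an illustration of sorting labeled sequences into buckets, the sketch below assumes a line format of a +1 or -1 label, whitespace, then the sequence. That format is an assumption made for the example; the input format actually expected by setup is not documented here, and LabelBucketSketch is a hypothetical class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the line format below is ASSUMED, not taken from setup().
final class LabelBucketSketch {
    final List<String> positives = new ArrayList<>();
    final List<String> negatives = new ArrayList<>();

    // Assumed line format: "<label><whitespace><sequence>", label is +1 or -1.
    void read(Reader source) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue; // skip blank lines
                String[] parts = line.split("\\s+", 2);
                if (parts.length < 2) {
                    throw new IOException("Malformed line: " + line);
                }
                if (parts[0].equals("+1")) {
                    positives.add(parts[1]);
                } else if (parts[0].equals("-1")) {
                    negatives.add(parts[1]);
                } else {
                    throw new IOException("Unknown label: " + parts[0]);
                }
            }
        }
    }
}
```

Taking a Reader rather than a file name keeps the sketch easy to try with a java.io.StringReader; a java.io.FileReader would be passed for a real file.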

main

public static void main(java.lang.String[] args)