FreClu: Efficient Frequency-based De novo Short Read Clustering -- de novo clustering
(last-modified: 20th August 2009)

This source code is bundled with test input data, "testInputData_smallRNA.txt", in a tabular format:

  "Sequence \ Expected number of errors at each base \ Frequency"

Note: The Java files must be compiled before running the program, as follows.

  javac MyTag_reverse.java
  javac ClusteringOfShortTags_searchReverse_hash_open.java

Usage:
  java ClusteringOfShortTags_searchReverse_hash_open -i filename

REQUIRED
  -i

OPTIONAL
  -o
  -p
  -hashSize
  -stat

OUTPUT FORMAT
Because writing out all the trees would require too much I/O space, the program currently provides only two output files, "clusterOutput.txt" and "parent-childRelation.txt".

"clusterOutput.txt":
  "Representative sequence \ Frequency of representative sequence \ Frequency of cluster \ Number of sequences in cluster \ Length of longest path"

"parent-childRelation.txt":
  "Read sequence \ Representative sequence"

Internally, however, the program keeps all the sequences and their topologies within each cluster in memory.

EXAMPLE (Please try the Java VM options -Xms and -Xmx to tune the heap size)
  java -Xms20G -Xmx20G ClusteringOfShortTags_searchReverse_hash_open -i testInputData_smallRNA.txt
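
POST-PROCESSING SKETCH
Since only the read-to-representative pairs are written out, it may be useful to regroup "parent-childRelation.txt" into clusters after a run. The following is a minimal sketch, not part of FreClu: the class ParentChildReader and its method groupByRepresentative are hypothetical names, and the code assumes that the two columns are separated by a tab character, which this README does not state explicitly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; it is not shipped with FreClu.
class ParentChildReader {

    // Groups read sequences by their representative sequence.
    // ASSUMPTION: each line is "read<TAB>representative".
    static Map<String, List<String>> groupByRepresentative(List<String> lines) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                throw new IllegalArgumentException(
                        "Expected 2 tab-separated fields: " + line);
            }
            String read = fields[0].trim();
            String representative = fields[1].trim();
            clusters.computeIfAbsent(representative, k -> new ArrayList<>())
                    .add(read);
        }
        return clusters;
    }

    public static void main(String[] args) throws IOException {
        // Default to the output file name given in this README.
        String path = args.length > 0 ? args[0] : "parent-childRelation.txt";
        List<String> lines = Files.readAllLines(Paths.get(path));
        Map<String, List<String>> clusters = groupByRepresentative(lines);
        // Print each representative with the number of reads assigned to it.
        for (Map.Entry<String, List<String>> e : clusters.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue().size());
        }
    }
}
```

If the files turn out to use a different delimiter, only the split("\t") call needs to change. The grouping is held entirely in memory, so for large runs the same -Xms and -Xmx tuning as above may be needed.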