
Department of Plant Sciences


Currently, PhaseR is a set of Python scripts and can be obtained from

PhaseR identifies phased small RNA (sRNA) loci in high-throughput genomic sequencing datasets. Its input is an alignment of a small RNA dataset to a reference sequence. The alignment can be produced with any of several freely available alignment programs, e.g. PatMaN or bowtie. The resulting alignment file is then passed to the PhaseR algorithm to identify potentially phased small RNA loci.
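As an illustration of the kind of information an alignment provides, the sketch below collapses alignment records into counts of sRNA 5’ ends per position. The record layout (chromosome, start, end, strand, abundance) and the function name `five_prime_counts` are hypothetical and are not PhaseR's actual input format; the only biological assumption is that a read on the minus strand has its 5’ end at the higher coordinate.

```python
from collections import Counter

def five_prime_counts(alignments):
    """Count sRNA 5' ends per (chromosome, strand, position).

    `alignments` is an iterable of hypothetical records:
    (chromosome, start, end, strand, abundance), with start <= end.
    """
    counts = Counter()
    for chrom, start, end, strand, abundance in alignments:
        # On the plus strand the 5' end is the start coordinate;
        # on the minus strand it is the end coordinate.
        five_prime = start if strand == "+" else end
        counts[(chrom, strand, five_prime)] += abundance
    return counts

example = [
    ("chr1", 100, 120, "+", 3),
    ("chr1", 100, 120, "+", 2),
    ("chr1", 200, 220, "-", 4),
]
print(five_prime_counts(example))
```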

The PhaseR algorithm uses an extension of the probability calculation proposed by Chen et al. (2007) to distinguish likely occurrences of phasing from random events. In Chen’s original algorithm each position along an sRNA locus was treated as a binary variable: it was either occupied by the 5’ end of an sRNA or it was not. The new algorithm instead uses the number of sRNAs with a 5’ end at each position when calculating the probability for the locus. The algorithm also treats every sRNA in the dataset as the possible start of an sRNA locus whose end is unknown. Every possible length for that locus is therefore tested, provided the locus does not contain a stretch without matching sRNAs that is longer than a set number of nucleotides.
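The enumeration of candidate loci described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PhaseR's code: `counts` is a hypothetical mapping from position to the number of sRNA 5’ ends on a single strand, `max_gap` is an illustrative name and value for the allowed stretch without matching sRNAs, and candidate loci are assumed to end at an occupied position.

```python
def candidate_segments(counts, max_gap=100):
    """Yield (start, end) pairs for candidate loci.

    Every occupied position is tried as a start. The end is extended
    over the following occupied positions until the distance between
    two consecutive occupied positions exceeds `max_gap`.
    """
    occupied = sorted(p for p, c in counts.items() if c > 0)
    for i, start in enumerate(occupied):
        previous = start
        for end in occupied[i + 1:]:
            if end - previous > max_gap:
                break  # too long a stretch without matching sRNAs
            yield (start, end)
            previous = end

# Counts of 5' ends per position (illustrative values).
counts = {0: 3, 21: 2, 42: 5, 500: 1}
print(list(candidate_segments(counts)))
```

Keeping the counts rather than a binary occupied/unoccupied flag is what allows different abundance thresholds to be applied to the same candidate locus later on.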

A matrix of probabilities is then built, with one dimension corresponding to all possible phased segments ("loci") and the other to all possible registers. Each cell of the matrix is filled with the minimum probability from the hypergeometric tests performed for the different abundance thresholds. These probabilities are calculated using the hypergeometric distribution, as proposed by Chen et al. (2007). The last step consists of determining the segment, register, abundance and probability for the cell of the matrix with the lowest probability.
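The matrix construction and the final selection step can be sketched as below. This is an illustration under stated assumptions rather than PhaseR's implementation: the phase period of 21 nt, the threshold values, the definition of a register as position modulo the period, and the function names are all hypothetical, and a position is assumed to count as occupied when its 5’ end count reaches the abundance threshold. The p-value is the upper tail of a hypergeometric distribution in which the segment's positions are the population, the in-register positions are the successes and the occupied positions are the draws.

```python
from math import comb

def hypergeom_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    total = comb(N, n)
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return tail / total

def best_phasing(counts, segments, period=21, thresholds=(1, 2, 5)):
    """Return (segment, register, threshold, p) for the lowest-probability cell."""
    matrix = {}  # (segment, register) -> (min p over thresholds, threshold)
    for start, end in segments:
        length = end - start + 1
        for register in range(period):
            # Number of positions in the segment that fall in this register.
            slots = sum(1 for p in range(start, end + 1) if p % period == register)
            cell = None
            for t in thresholds:
                occupied = [p for p, c in counts.items()
                            if start <= p <= end and c >= t]
                in_phase = sum(1 for p in occupied if p % period == register)
                pval = hypergeom_tail(in_phase, length, slots, len(occupied))
                if cell is None or pval < cell[0]:
                    cell = (pval, t)
            matrix[((start, end), register)] = cell
    (segment, register), (pval, t) = min(matrix.items(), key=lambda kv: kv[1][0])
    return segment, register, t, pval

counts = {0: 10, 21: 8, 42: 12, 63: 9, 5: 1}
print(best_phasing(counts, [(0, 63), (0, 42)], thresholds=(1, 2)))
```

In this toy example the low-abundance read at position 5 is filtered out at the higher threshold, so the higher threshold gives the smaller probability for the segment spanning positions 0 to 63.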

PhaseR is implemented in the Python programming language and is released under the GPL v3. See for more details. Please contact Bruno Santos on for further support.