Species and Genome assembly:
SMARTIV supports input sequences extracted from binding experiments performed on the following species and genome assemblies:
Input file format:
SMARTIV accepts a list of genomic coordinates in BED format or a list of sequences in FASTA format. If both formats are available, we recommend providing the BED file.
The BED file should have the following column values:
- chromosome name in the 1st column,
- starting and ending position in the 2nd and 3rd columns,
- and strand in the 6th column.
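The column layout above can be checked before upload with a short script. The following is a minimal sketch, assuming a tab-separated file; the `BedRecord` type, the `parse_bed_line` helper, and the peak name `peak_1` are hypothetical names used for illustration and are not part of SMARTIV.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


def parse_bed_line(line: str) -> BedRecord:
    """Extract chromosome, start, end, and strand from one tab-separated BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    # Columns 1-3 hold the interval; column 6 holds the strand.
    chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval: {start}-{end}")
    if strand not in ("+", "-"):
        raise ValueError(f"invalid strand: {strand!r}")
    return BedRecord(chrom, start, end, strand)


# Hypothetical example line (name and score columns are illustrative).
record = parse_bed_line("chr12\t30096589\t30096634\tpeak_1\t12.56\t+")
print(record)
```

Lines that raise `ValueError` here would need to be fixed or removed before submission; how SMARTIV itself treats malformed lines is not specified in this text.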
The list of sequences should be sorted by sequence binding score in descending order
(i.e. sequences at the top must have a higher binding signal/noise ratio than those at the bottom).
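For users who sort their input themselves, the descending ordering can be sketched as follows. This is an illustration only, assuming tab-separated lines: BED 6-column records are ranked by the score in the 5th column, and ENCODE narrowPeak records by the 8th column (pValue) and then the 7th column (signalValue). The function names `sort_bed6` and `sort_narrowpeak` and the sample records are hypothetical.

```python
def sort_bed6(lines):
    """Sort BED 6-column lines by score (5th column), highest first."""
    return sorted(lines, key=lambda line: float(line.split("\t")[4]), reverse=True)


def sort_narrowpeak(lines):
    """Sort narrowPeak lines by pValue (8th column), then signalValue (7th), highest first."""
    def key(line):
        fields = line.split("\t")
        return (float(fields[7]), float(fields[6]))
    return sorted(lines, key=key, reverse=True)


# Hypothetical records for illustration.
bed_lines = [
    "chr1\t100\t150\ta\t3.5\t+",
    "chr2\t200\t250\tb\t9.0\t-",
    "chr3\t300\t350\tc\t6.2\t+",
]
peak_lines = [
    "chr1\t100\t150\tp1\t0\t+\t5.0\t12.0\t-1\t-1",
    "chr2\t200\t250\tp2\t0\t-\t8.0\t12.0\t-1\t-1",
    "chr3\t300\t350\tp3\t0\t+\t2.0\t20.0\t-1\t-1",
]

sorted_bed = sort_bed6(bed_lines)
sorted_peaks = sort_narrowpeak(peak_lines)
print([line.split("\t")[3] for line in sorted_bed])
print([line.split("\t")[3] for line in sorted_peaks])
```

In the narrowPeak example, `p2` and `p1` share the same pValue, so the secondary key (signalValue) decides their order.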
By default, input sequences should be sorted by the user, but an additional option to enable SMARTIV sorting is provided. Using this option is recommended for the commonly used BED 6-column and ENCODE narrowPeak formats.
If the automatic sorting option is used, the input file should contain sequence binding
score values ("score" in the 5th column for BED 6-column format, or "signalValue" and "pValue" for
ENCODE narrowPeak format):
- A file in BED 6-column format will be sorted by the 5th column in descending order.
- A file in ENCODE narrowPeak format will be sorted by the 8th (primary) and 7th (secondary) columns
in descending order.
The FASTA format includes:
- A header line that starts with '>'
- Followed by a line containing a sequence.
Unless the header contains the coordinates of the sequence, SMARTIV will use BLAT to
find the coordinates of the sequence in the assembly.
The coordinates are extracted only from FASTA headers that contain:
- Chromosome (e.g. chr12 or chrM)
- Start and end positions of the sequence, separated by a dash (e.g. 30096589-30096634)
- Strand (+ or -)
- and optionally: the binding score (e.g. 12.56)
These fields are separated by tabs, colons (':'), or spaces (' ').
Any other FASTA header is ignored.
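The header rules above can be sketched with a regular expression. This is a minimal illustration, not SMARTIV's actual parser: the pattern `HEADER_RE` and the function `parse_fasta_header` are hypothetical, and the sketch assumes the fields appear in the order listed (chromosome, start-end, strand, optional score).

```python
import re
from typing import NamedTuple, Optional


class HeaderCoords(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str
    score: Optional[float]


# Fields may be separated by tabs, colons, or spaces; the score is optional.
HEADER_RE = re.compile(
    r"^>([^\s:]+)[\t: ]+(\d+)-(\d+)[\t: ]+([+-])(?:[\t: ]+(-?\d+(?:\.\d+)?))?\s*$"
)


def parse_fasta_header(header: str) -> Optional[HeaderCoords]:
    """Return the coordinates encoded in a FASTA header, or None if it does not match."""
    match = HEADER_RE.match(header)
    if match is None:
        return None  # such a header carries no usable coordinates
    chrom, start, end, strand, score = match.groups()
    return HeaderCoords(chrom, int(start), int(end), strand,
                        float(score) if score is not None else None)


with_score = parse_fasta_header(">chr12:30096589-30096634:+:12.56")
without_score = parse_fasta_header(">chrM 10-50 -")
unmatched = parse_fasta_header(">seq_1 some description")
print(with_score, without_score, unmatched)
```

A header that returns `None` here corresponds to the case in which the coordinates cannot be taken from the header.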
BLAT may fail to find the coordinates of a sequence, even though the sequence is derived from the assembly.
When this happens, the sequence is ignored. It is possible that due to this, the number of valid sequences
will drop below 2000 and SMARTIV will report an error.
Clicking on the 'Load example' button loads an example of an input list in BED format.
The calculation parameters are set to their defaults but can be changed by the user. Clicking on the 'Submit' button submits the job, and the results are presented automatically on the server.
The provided sample data is PAR-CLIP binding data obtained for the human PUM2 protein [1]. The dataset was extracted from the doRiNA database [2].
1. M. Hafner, M. Landthaler, L. Burger, M. Khorshid, J. Hausser, P. Berninger, A. Rothballer, M. Ascano, Jr., A.C. Jungkamp, M. Munschauer, A. Ulrich, G.S. Wardle, S. Dewell, M. Zavolan, T. Tuschl, Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP, Cell 141(1) (2010) 129-41.
2. K. Blin, C. Dieterich, R. Wurmus, N. Rajewsky, M. Landthaler, A. Akalin, DoRiNA 2.0--upgrading the doRiNA database of RNA interactions in post-transcriptional regulation, Nucleic Acids Res 43(Database issue) (2015) D160-7.
K-mer length range:
SMARTIV uses a k-mer-based algorithm to search for enriched motifs (Note: the length of the k-mers does not define the final motif length).
By default, SMARTIV provides a pre-defined range of 5-7. The maximal length range allowed is 4 to 10 nucleotides.
To select a specific length, enter the same value in both the 'min.' and 'max.' boxes.
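To make the meaning of the length range concrete, the sketch below enumerates and counts all k-mers for every k in a chosen range, enforcing the 4 to 10 limits stated above. It only illustrates what a k-mer range covers; it is not SMARTIV's enrichment algorithm, and the function `count_kmers` and the sample sequences are hypothetical.

```python
from collections import Counter

MIN_K, MAX_K = 4, 10  # maximal range allowed, as stated above


def count_kmers(sequences, k_min=5, k_max=7):
    """Count every k-mer occurrence for each k in [k_min, k_max]."""
    if not (MIN_K <= k_min <= k_max <= MAX_K):
        raise ValueError(f"k range must satisfy {MIN_K} <= min <= max <= {MAX_K}")
    counts = {}
    for k in range(k_min, k_max + 1):
        counter = Counter()
        for seq in sequences:
            # Slide a window of width k along the sequence.
            for i in range(len(seq) - k + 1):
                counter[seq[i:i + k]] += 1
        counts[k] = counter
    return counts


# Setting min and max to the same value selects a single length.
counts = count_kmers(["TGTAAATA", "ATGTAAAT"], k_min=5, k_max=5)
print(counts[5].most_common(3))
```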
SMARTIV is able to extract two types of motifs: a combined sequence and structure motif (in 8-letter alphabet) and a sequence-based motif (in 4-letter alphabet).
SMARTIV provides an option to display only one of the motif types or both. By default, the combined sequence and structure motif (in the 8-letter alphabet) is extracted.
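One way to picture an 8-letter alphabet is to merge each nucleotide with its pairing state. The sketch below assumes, purely for illustration, that unpaired nucleotides are written in upper case and paired nucleotides in lower case, with the structure given in dot-bracket notation; this encoding and the function `encode_sequence_structure` are assumptions and may differ from the representation SMARTIV actually uses.

```python
def encode_sequence_structure(sequence: str, dot_bracket: str) -> str:
    """Merge a sequence and its dot-bracket structure into one 8-letter string.

    Assumed convention (illustrative only): upper case = unpaired, lower case = paired.
    """
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    encoded = []
    for nucleotide, state in zip(sequence.upper(), dot_bracket):
        if state == ".":
            encoded.append(nucleotide)          # unpaired position
        elif state in "()":
            encoded.append(nucleotide.lower())  # paired position
        else:
            raise ValueError(f"unexpected structure symbol: {state!r}")
    return "".join(encoded)


encoded = encode_sequence_structure("GGGAAACCC", "(((...)))")
print(encoded)
```

With four nucleotides and two pairing states, the combined string uses up to eight distinct symbols, whereas the sequence-based motif uses only the four nucleotide letters.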
SMARTIV supports several alternative folding methods from which to choose. By default, the method implemented by the RNAfold tool is selected.
An optional parameter that lets you give your job an informative name; otherwise, the job is assigned a unique numeric identifier.
The "E-mail address" field is optional, but it is required in order to receive a link to the results page. If you do not receive an e-mail from SMARTIV within a reasonable time, check your spam folder, as the message may have been filtered there.
SMARTIV presents the best motif for each requested k-mer length.
For each motif (PWM), SMARTIV provides both a graphical presentation using the WebLogo software and the matrix itself as a text file.
In addition, SMARTIV presents the k-mers that were used to build the PWM.
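To illustrate the relationship between a set of k-mers and a matrix, the sketch below computes simple per-position symbol frequencies from equal-length strings. This is a generic position frequency matrix under the assumption of pre-aligned, unweighted k-mers; SMARTIV's own PWM construction is not described here and may align or weight k-mers differently. The function name and the sample k-mers are hypothetical.

```python
def position_frequency_matrix(kmers, alphabet="ACGU"):
    """Return one {symbol: frequency} dict per position for equal-length k-mers."""
    if not kmers:
        raise ValueError("at least one k-mer is required")
    length = len(kmers[0])
    if any(len(kmer) != length for kmer in kmers):
        raise ValueError("all k-mers must have the same length")
    matrix = []
    for position in range(length):
        column = [kmer[position] for kmer in kmers]
        # Fraction of k-mers carrying each symbol at this position.
        matrix.append({symbol: column.count(symbol) / len(kmers) for symbol in alphabet})
    return matrix


# Hypothetical k-mers for illustration.
pfm = position_frequency_matrix(["UGUAA", "UGUAU", "UGUAC", "UGUAA"])
for position, row in enumerate(pfm, start=1):
    print(position, row)
```

Passing an 8-symbol `alphabet` argument would extend the same idea to a combined sequence and structure representation.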
WebLogo graphical representation:
The PWM motif is represented as a logo, using an adjusted version of the WebLogo software.
The logo can be downloaded in JPG or PDF format.
The p-value presented above the logo reflects the correspondence between the derived PWM and the original scores of the sequences (derived from the CLIP experiment).
It is estimated using the mmHG statistics, which evaluates the association between two ranked lists, assigning an FDR-corrected p-value to each PWM (Steinfeld et al., 2013).
Clicking on 'View the list of k-mers composing the motif' displays a table
that includes the significant exact strings of length k (k-mers) that were used to build the PWM and the related statistical information.
The table is also provided for download as a text file.
The exact motif string, color-coded by the logo color scheme.
The value presented is the mHG score, corrected for multiple testing, which is a tight bound for the P-value (P-value ≤ corrected mHG score).
The total number of input sequences (N).
The total number of sequences containing the motif (B).
The index (n) at which dividing the input list into target and background by the mHG statistics gives the optimal enrichment of the motif at the top of the list.
The number of sequences containing the motif among the top n sequences (b).
Measures the extent to which the motif is concentrated at the top of the list compared with the whole list (enrichment). Defined as: (b/n) / (B/N).
For more information about the mHG statistics, please refer to: Eden et al. (2007)
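The quantities N, B, n, and b used in the formula (b/n) / (B/N) can be illustrated with a small sketch of the uncorrected minimum hypergeometric (mHG) score: for every cutoff n in the ranked list, the hypergeometric tail probability of seeing at least b motif-containing sequences is computed, and the smallest value is kept. This sketch omits the multiple-testing correction mentioned above and is not SMARTIV's implementation; all function names and the sample list are hypothetical.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when drawing n items from N, of which B carry the motif."""
    total = comb(N, n)
    favorable = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, min(n, B) + 1))
    return favorable / total


def mhg(labels):
    """Return (score, n, b) minimizing the tail probability over all cutoffs.

    labels: ranked list of 0/1 flags, 1 meaning the sequence contains the motif.
    """
    N, B = len(labels), sum(labels)
    best = (1.0, 0, 0)
    b = 0
    for n in range(1, N + 1):
        b += labels[n - 1]
        p = hypergeom_tail(N, B, n, b)
        if p < best[0]:
            best = (p, n, b)
    return best


def enrichment(N, B, n, b):
    """Enrichment of the motif at the top of the list: (b/n) / (B/N)."""
    return (b / n) / (B / N)


# Hypothetical ranked list: motif occurrences concentrated near the top.
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
score, n, b = mhg(labels)
print(score, n, b, enrichment(len(labels), sum(labels), n, b))
```

In this example the best cutoff places four of the four motif-containing sequences among the top five, so the motif is twice as frequent at the top as in the list overall.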