R2-Tutorials:

Tutorial 1.3 Single dataset in step by step wizard

This document is intended to get you started with the microarray analysis platform R2.

The R2 Portal

To reach the R2 portal, open your favorite browser and enter the URL below (in the near future R2 will also be reachable through the humangenetics website www.humangenetics-amc.nl):

Type http://hgserver1.amc.nl/cgi-bin/r2/main.cgi

The main screen of R2 shows up (Figure 1).

Fig1: R2 in Firefox browser

It is divided into three areas. On the left side a menu bar provides direct access to different parts of the software. To the right, a news bar shows the available updates for software (in green), datasets (blue) and annotations (red). In the middle, a step by step guide provides access to most analyses through a simple walkthrough wizard. Whenever a light blue “i” icon appears in R2, extra information can be obtained with one click; in this case, information on the selected dataset. Hovering over a grey “i” opens a pop-up box with additional information or help. In this tutorial the publicly available dataset "Mixed Colon - Marra - 64 - MAS5.0 - u133p2" will be used to explore the possibilities of R2.

Choose this set

R2 can analyse gene expression within a single dataset or across multiple datasets. This choice can be made in the step by step dialog box (see Fig 2):

Fig2: The step by step interface

In the dialog box step 3 (Fig 2), use the pulldown menu to select type of analysis.

In this case we choose the single gene analysis.

Type your gene of interest and hit next.

For our illustrations we took AXIN2 (see Figure 3). In many cases more than one probeset is reported for a gene; in this case there are 4 probesets. The reporters are listed in descending order of their average signal. The signal of a reporter is flagged present when, according to the MAS5 algorithm, the reporter probeset is expressed. The probeset with the highest average signal is selected automatically; occasionally other probesets assigned to the same gene could be of interest, depending on the structure of the gene.

Fig3: Selecting appropriate probesets for a gene
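The default probeset choice described above can be pictured with a small sketch. It is only an illustration: the probeset names and signal values are hypothetical, and the sketch assumes that "average signal" means the plain mean over all samples, which the tutorial does not spell out.

```python
from statistics import mean

# Hypothetical signal values per probeset across three samples (not real R2 data).
probesets = {
    "probe_A": [120.0, 340.0, 95.0],
    "probe_B": [900.0, 1500.0, 700.0],
    "probe_C": [15.0, 22.0, 9.0],
}

def rank_by_average_signal(signals):
    """Return probeset ids sorted by mean signal, highest first."""
    return sorted(signals, key=lambda pid: mean(signals[pid]), reverse=True)

ranking = rank_by_average_signal(probesets)
default_choice = ranking[0]  # the probeset that would be preselected
print(ranking, default_choice)
```

The first entry of the ranking plays the role of the preselected probeset; the other entries remain available for manual selection.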

Adjust the settings as in the figure

Hit next

A graph will be generated from the AXIN2 gene expression (Figure 4). Each dot represents the expression of the probeset in a specific tumour; the tumours are ordered by their expression from left to right. Hovering over the dots will give more information on the corresponding tumours. Under the X-axis, colored boxes are depicted, representing clinical information of the samples in so-called "tracks". Again, hovering over them will reveal the underlying data. For AXIN2 there is a clear relation between the expression levels and the “tissue” track. The tracks underneath the image thus give a quick glance at some of the clinical parameters defined for the dataset. It is also possible to define your own custom-made tracks (tutorial to be developed).

Fig4: AXIN2 in Y-plot

At this point you might want to try some other settings to investigate their workings and play around with some other genes. Based on the annotations you can select subsets of genes (see tutorial to be developed). For data transformation we recommend the 2log for statistical reasons (see manual). Hovering over different parts of the screen will again reveal additional information.

Probeset verification

Scrolling down on the same page where the AXIN2 expression graph was generated, there is a probeset verification table. In order to investigate the relevance of this verification, choose another gene in the main window.

Hit the shortcut link at the top left of your window and type MBNL2 in the search box

Hit next.

Note the warning icon in the probeset selection box. Keep the selection as it is and hit next

Scroll down to the probeset verification table.

This table lists, for each reporter of MBNL2, whether it is in agreement with the genome position of the MBNL2 reference sequence. If all entries state “YES”, everything appears to be all right. For the MBNL2 reporters in the middle there are multiple “NO” entries, indicating that something is wrong with these reporters.

To analyse a reporter further, click on “TVIEW”; this opens the TranscriptView application.

This program depicts the alignment of EST and mRNA sequences to the genome sequence (Fig5). It has also aligned the accession numbers used to generate the reporters on the array; this view can be used to inspect the quality of a reporter. Note that the reporter “1553536_at” is aligned to the genomic region of the MBNL2 reference sequence, but that its color is red. This indicates an alignment to the reverse complement of the genome. In some cases the reporter is positioned in an intronic region (light green color), which can also be a reason not to pick a certain probeset. NB: Currently probeset verification is only provided for various human Affymetrix array types.

Fig.5 TranscriptView

Two gene analysis

Close TranscriptView and go back to the main screen

Choose Correlate two genes

When two genes are provided (Fig 6), the correlation between the expression values of the two genes in this dataset is shown.

Fig6: Correlate two genes

Choose AXIN2 and SORD

Hit next

A so-called Y-Y plot (Figure 7) is generated. It reports the correlation between the expression values of the two genes and draws them as two overlaid graphs: the left axis gives the expression values for the first gene, the right axis those for the second gene (in blue). Colors etc. can be adapted.

Fig7: Y-Y plot of 2 genes
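What a correlation between two genes amounts to can be sketched in a few lines. The tutorial does not state which correlation statistic R2 reports, so the sketch assumes a Pearson correlation computed on 2log-transformed values; the expression values are hypothetical.

```python
import math

def log2_transform(values):
    """2log transformation of positive expression values."""
    return [math.log2(v) for v in values]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    if len(x) != len(y):
        raise ValueError("both genes need one value per sample")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical expression values of two genes in the same four samples.
gene_a = [2.0, 4.0, 8.0, 16.0]
gene_b = [8.0, 16.0, 32.0, 64.0]

r = pearson_r(log2_transform(gene_a), log2_transform(gene_b))
print(round(r, 3))
```

In this toy example both genes double from sample to sample, so after the 2log transformation they lie on a straight line and the correlation is maximal.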

At this point you might want to run this analysis with genes of your own choice.

View a gene in groups:

In R2 ‘groups’ can be created from the annotation tracks. These tracks are specific for each dataset. Personalised tracks can be generated in the menu on the left of the main screen with “my settings”. In the Marra dataset the tumours are annotated with two clinical data tracks: their position in the colon (location) and the tissue type, adenoma or normal (tissue).

In the main screen choose “View gene in groups”

Type AXIN2 (has a clear result, but you can take your gene of choice of course)

Click on the "Select a track" dropdown (see Figure 8)

Fig8: Selecting tracks for groupview

For this dataset several combinations of the annotations are also provided.

Choose the tissue track

Figure 9 shows the resulting graph; it is clear that AXIN2 expression is related to the tissue type.

Fig9: AXIN2 in groups

Another commonly used analysis option within R2 is to search for genes that show a correlated expression (a similar expression pattern to AXIN2 across the samples). On the gene selector page the preferred reporter is selected, and optional adjustments can be made to the search parameters. Normally the default settings are sufficient to perform an analysis. In this example one option is adapted: the gene category is set to “oncogenesis” to speed up the search. As a result only known oncogenesis genes are scanned (some 449 genes out of the 20,000 in total).

Execute this analysis

The result of the analysis is a page containing 2 tables with hyperlinked genes and, to the right, some statistics of the search. The tables report the genes that show a significant correlation with the expression of AXIN2: the left table lists positively correlating genes, the right table negatively correlating genes. The significance of each correlation is given in the pval column, while the number of times the gene has been flagged with a present call is given in the “pres” column. In red, some gene ontology statements are added to the genes.

Fig10: Correlating genes with gene of choice
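The search for correlating genes can be sketched as follows. This is an illustration only: the gene names, values and category are hypothetical, the Pearson correlation is an assumed statistic, and a simple cut-off on the correlation coefficient stands in for the p-value based significance filter that R2 reports in the pval column.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_tables(target, candidates, min_abs_r=0.5):
    """Split candidate genes into positively and negatively correlating lists."""
    positive, negative = [], []
    for gene, values in candidates.items():
        r = pearson_r(target, values)
        if r >= min_abs_r:
            positive.append((gene, r))
        elif r <= -min_abs_r:
            negative.append((gene, r))
    positive.sort(key=lambda item: item[1], reverse=True)  # strongest first
    negative.sort(key=lambda item: item[1])                # most negative first
    return positive, negative

# Hypothetical 2log expression values of the gene of interest in five samples.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
all_genes = {
    "GENE_UP": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_DOWN": [10.0, 8.0, 6.0, 4.0, 2.0],
    "GENE_FLAT": [5.0, 1.0, 5.0, 1.0, 5.0],
    "GENE_OTHER": [1.0, 2.0, 3.0, 4.0, 5.5],
}
# Restricting the scan to a gene category (hypothetical) shortens the search.
category = {"GENE_UP", "GENE_DOWN", "GENE_FLAT"}
candidates = {g: v for g, v in all_genes.items() if g in category}

positive, negative = correlation_tables(target, candidates)
print(positive, negative)
```

Genes outside the chosen category are never scanned, which is why the category setting speeds up the search at the cost of missing correlating genes elsewhere.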

CDK4 is listed as one of the genes showing a high correlation combined with a high significance. After clicking the CDK4 gene in the left table, a so-called 2GeneView page is generated.

Perform this analysis

In the generated picture the AXIN2 expression is represented with red dots. The expression of the second gene (in this case CDK4) has been added to the picture as blue squares (Figure 11).

Fig11: Correlation of AXIN2 and CDK4

This kind of data representation is called a YY-plot (since both genes are represented on the Y-axis). If 2 genes correlate with each other, their patterns should follow each other, as is the case here. This may be a very interesting finding. We may want to know if a relation between these 2 genes has already been reported in the literature. To this end we can click on the Pubsniffer link. This will query the PubMed database for mentions of such a relation within the titles or abstracts of published work.

Click the Pubsniffer link

We can see that such a relation has not been published before (Figure 12).

Fig12: Pubsniffer results

It is also possible to take a look at the gene expression in manipulated cell-line experiments that have already been performed. At this moment there are not many publicly available time series datasets. However, R2 offers the option to access your own time series experiments after contacting R2 support.

Tutorial 4.15 Find genes correlating with a single gene

A starting point for a lot of analyses is a single dataset and a gene of interest. In this tutorial the expression of the gene MYCN, a known oncogene in neuroblastoma, is explored for correlating genes in a neuroblastoma tumor series. This same procedure is described as Search 1 in a paper by Jan Koster et al. (submitted 2009).

Go to the R2 main page
Choose Single dataset in step 1 of the analysis wizard
Make sure the Tumor Neuroblastoma-Versteeg-88-MAS5.0-u133p2 series is selected in step 2
In step 3 (select type of analysis), choose the "Find correlated genes with a single gene" option
Type MYCN in the gene field (4)
Hit next (Figure 1)

Fig1: The R2 step by step wizard with filled out fields

Next on screen is a selection field enabling adjustment of the thresholds for the statistical analysis (Fig2).

Fig2: Settings for correlation analysis

R2 has automatically chosen the best probeset for this gene to perform statistics with. For this analysis the default values can be kept.

Hit next (Fig3)

Fig3: Correlating genes with MYCN in Neuroblastoma series

The results in Figure 3 are ordered sets of positively correlating genes in the left table and negatively correlating genes in the right table. Hovering over a gene name gives additional information about the gene, as shown for MYCN in the figure. To the right a graph shows the distribution of the probesets; in this case it has a proper bell shape. Below that graph a short table provides a quick summary of overrepresented Gene Ontology categories. Clicking on a hyperlinked gene name will open a new browser window showing the expression values of both genes in this data series in a Y-Y plot.

Look up the drug target PTN in the list
Click the hyperlinked gene name (Fig 4)

Fig4: MYCN and PTN correlation in 2GeneView

The Y-Y plot shows the expression of both genes as two overlaid graphs: the left axis gives the expression values for the first gene, the right axis those for the second gene (in blue). The samples are ordered by increasing expression of the gene of interest. Colors etc. can be adapted (see ).
At this point you might want to return to the threshold selection screen (Fig2) and adapt some settings: see for example Fig 5, where a search is initiated for the genes on chromosome that are known drug targets and are involved in the MAPK pathway.

Fig5: Example of adapting selection settings.

Try different settings and use your gene of interest.
This is the end of this tutorial.

Tutorial 4.16 Identify genes differentially expressed between groups

In this tutorial genes that are differentially expressed between groups in a data series are identified. To this end a series of neuroblastoma tumors was analysed with respect to MYCN amplification. This same procedure is described as Search 2 in a paper by Jan Koster et al. (submitted 2009).

Go to the R2 main page
Choose Single dataset in step 1 of the analysis wizard
Make sure the Tumor Neuroblastoma-Versteeg-88-MAS5.0-u133p2 series is selected in step 2
In step 3 (select type of analysis), choose the "Find differentially expressed genes between groups" option (Fig 1)
Hit next

Fig1: The R2 step by step wizard with filled out fields

In the next screen you can choose which annotation "track" will be used to partition the set into groups. The Neuroblastoma series has extensive annotation; in this case the presence of a MYCN amplification is chosen (Fig 2).

Fig2: Choosing an annotation track to partition dataset

As usual the rest of the adjustable fields can be left as they are: R2 provides you with a configuration appropriate for most analyses.

Choose the mycn_amp track for partitioning in groups
Hit next (Fig 3)

Fig3: Dialog showing the groups that can partition this dataset

Obviously there are only two groups to partition this dataset; other tracks, like agegroups, permit more choices.

Again keep this choice
Hit next (Fig 4)

Fig4: Differentially expressed genes for groups mycn amplified vs mycn single copy
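The idea behind ranking genes by their difference between two groups can be sketched as below. The tutorial does not say which statistical test R2 applies with its default settings, so the sketch assumes Welch's t statistic purely for illustration; gene names, group labels and values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

def split_by_label(values, labels, wanted):
    """Select the values of the samples that carry the wanted track label."""
    return [v for v, label in zip(values, labels) if label == wanted]

# Hypothetical track: one label per sample, as in a two-group annotation track.
labels = ["amp", "amp", "amp", "single", "single", "single"]
# Hypothetical 2log expression values per gene, one value per sample.
expression = {
    "GENE_X": [9.0, 9.5, 10.0, 5.0, 5.5, 6.0],
    "GENE_Y": [7.0, 7.2, 6.8, 7.1, 6.9, 7.0],
}

scores = {
    gene: welch_t(
        split_by_label(values, labels, "amp"),
        split_by_label(values, labels, "single"),
    )
    for gene, values in expression.items()
}
# Genes with the largest absolute statistic differ most between the groups.
ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
print(ranked)
```

A track with more than two groups, such as the age groups mentioned above, would need a test that handles several groups at once (for example a one-way analysis of variance) instead of a two-group statistic.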

Tutorial 5.5.4 2DGeneView across Datasets

This tutorial will show you how to find datasets where two genes of interest have a high correlation.

Start with the default wizard portal (Fig 1) but choose "Across datasets"

Figure1: The default wizard; choose across datasets

Select 2DGeneView in step 3 (Fig 2)

Fig2: Choose 2DGeneView

Type AXIN2 and SORD as genes
Hit next
Leave the probeset choices as given; make sure 2log is chosen as Transformation
Hit next

A figure will be drawn (Figure 3). Next to the main graph a list of datasets is given with the correlations between the two probesets, ordered from negative to positive.

Fig3: Result of the 2DGeneView analysis: correlation of AXIN2 and SORD across datasets

At the low end of the list you'll find the highest positively correlating datasets (Fig 4).

Fig4: Datasets with a positive correlation between the two genes of interest.
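The ordering of the dataset list can be mimicked with a simple sort. The dataset names and correlation values below are hypothetical placeholders, not R2 output.

```python
# Hypothetical (dataset name, correlation) pairs for one pair of probesets.
results = [
    ("dataset_A", 0.12),
    ("dataset_B", -0.41),
    ("dataset_C", 0.83),
    ("dataset_D", 0.47),
]

# Order from the most negative to the most positive correlation.
ordered = sorted(results, key=lambda item: item[1])

most_negative = ordered[0]   # top of the list
most_positive = ordered[-1]  # bottom of the list
print(ordered)
```

With this ordering the datasets of interest for a positive relation sit at the bottom of the list, and those with an inverse relation at the top.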

Views of the correlations can be obtained by clicking on the links; either in the table or in the graph.

Click the Exp-Lung link (top of the list) and the Marra link (bottom of the list)

This produces Figures 5 and 6; you find a high positive correlation between AXIN2 and SORD in the Marra set.

Fig5: Correlation in Exp-Lung dataset

Fig6: AXIN2 and SORD in Colon-Marra set

To check the statistical merits of these data you can produce some additional graphs at the bottom of the 2DGeneView panel (Fig7).

Fig7: Additional graphs for statistical checks

These produce a sample graph showing the distribution of all samples (Fig8)

Set both to yes and hit Rebuild

Fig8: Distribution of samples for U133p2 chip

The histograms for the distribution of the probeset values are produced as well, e.g. for u133p2 (Fig 9).

Fig9: Distribution of probeset signal for chip U133P2

Of course you can check your own genes of interest now!

Play around with your favorite genes...

Tutorial 6.2 Identifying downstream genes using time-series data

This tutorial aims to find genes regulated downstream of a manipulated gene in an in-vitro cell-line system. This same procedure is described as combined Searches 3-5 in a paper by Jan Koster et al. (submitted 2009). The time series data analysed here are from an inducible MYCN transgene in the neuroblastoma cell-line SK-N-AS.

First a selection will be made from the series of consistently regulated genes in this series (Search 3 in the article).

Go to the main screen of R2
From the menu to the left select "Timeseries" (Fig1)

Fig1: Clicking

This provides access to the Timeseries Wizard (Fig2).

In step1 select the neuroblastoma(hg) collection.
A selection of genes is needed so select "Create a list of genes"
Hit next

Fig2: Selecting from the Timeseries wizard

The resulting treeview (Fig3) provides you with an overview of all cell-lines that were manipulated in this dataseries. The genes manipulated show when you click on a cell-line.

Fig3: Select an experiment from the Timeseries tree

Click on SKNAS in order to expand this branch in the tree
Select mycn-wt as manipulated gene
Select the bottom three experiments; a triplicate time series
Hit next (Fig4)

Fig4: Select settings for Timeseries data

The next panel allows for setting parameter thresholds to constrain your selection. Again the defaults are suitable for most analyses; however, three parameters require consideration. Firstly, we want the genes to be expressed in all three experiments. Secondly, we take a logfold change of 0.6, which ensures that the selected genes respond to the manipulated gene. Thirdly, we want the single best probeset to represent each gene.

In the panel, select 3 for the minimal number of experiments
Type 0.6 in the best logfold field
Set the "Force single reporter" dropdown to yes
Hit next (Fig 5)

Fig5: The upper part of the result list panel
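The three settings chosen above act as a filter over reporters, which can be sketched as follows. This is a sketch under assumptions, not R2's implementation: the reporter and gene names are hypothetical, a reporter is assumed to pass when its absolute logfold change reaches the threshold in the minimal number of experiments, and "single best probeset" is assumed to mean the reporter with the largest mean absolute logfold change per gene.

```python
from statistics import mean

# Hypothetical best logfold change per reporter in each of three experiments.
reporters = [
    {"reporter": "rep_1", "gene": "GENE_A", "logfold": [0.9, 0.7, 1.1]},
    {"reporter": "rep_2", "gene": "GENE_A", "logfold": [0.65, 0.61, 0.7]},
    {"reporter": "rep_3", "gene": "GENE_B", "logfold": [-0.8, -1.2, -0.9]},
    {"reporter": "rep_4", "gene": "GENE_C", "logfold": [0.9, 0.2, 1.0]},
]

def passes(logfolds, min_logfold=0.6, min_experiments=3):
    """True when enough experiments reach the absolute logfold threshold."""
    hits = sum(1 for value in logfolds if abs(value) >= min_logfold)
    return hits >= min_experiments

def select_reporters(rows, min_logfold=0.6, min_experiments=3):
    """Keep one passing reporter per gene: the strongest mean absolute change."""
    best = {}
    for row in rows:
        if not passes(row["logfold"], min_logfold, min_experiments):
            continue
        strength = mean(abs(value) for value in row["logfold"])
        current = best.get(row["gene"])
        if current is None or strength > current[1]:
            best[row["gene"]] = (row["reporter"], strength)
    return {gene: reporter for gene, (reporter, _) in best.items()}

selected = select_reporters(reporters)
print(selected)
```

Lowering the minimal number of experiments to 2 would let GENE_C back in, which shows how strongly the result list depends on these thresholds.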

The result is a list of 1280 genes that are regulated in this set of experiments. R2 allows you to sort these lists to find the most strongly regulated genes. Scrolling down to the lower end of the list shows additional analysis options (see Fig6).

Fig6: The lower end of the result table

The little table at the end of the list shows the number of genes that are regulated in the same way; in this case it shows that 439 genes are consistently upregulated and 703 consistently downregulated. The hyperlinks in blue allow for more analyses: the Gene Ontology analysis shows the GO categories that are overrepresented in this dataset (Fig7).

Fig7: Gene Ontology Analysis results panel

The Pathwayfinder will find KEGG-pathways showing an overrepresentation of genes (Fig8).

Fig8: KEGG Pathways having an overrepresentation of genes from this selection
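Overrepresentation of a category or pathway in a gene selection is commonly assessed with a hypergeometric test. The tutorial does not state which test R2 uses, so the sketch below is an assumption that only illustrates the principle; the counts in the example are made up.

```python
from math import comb

def enrichment_p(total_genes, category_genes, selected_genes, overlap):
    """Probability of seeing at least `overlap` category genes in a random
    selection of `selected_genes` out of `total_genes` (hypergeometric tail)."""
    upper = min(category_genes, selected_genes)
    favourable = sum(
        comb(category_genes, k)
        * comb(total_genes - category_genes, selected_genes - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, selected_genes)

# Toy example: 10 genes in total, 4 belong to a pathway, 3 genes were
# selected and 2 of those belong to the pathway.
p_value = enrichment_p(total_genes=10, category_genes=4, selected_genes=3, overlap=2)
print(p_value)
```

A small probability means that the selection contains more genes of the category than expected by chance, which is what "overrepresented" refers to in the result panels.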

R2 documentation: Tutorials

Table of contents

Tutorial 1.3 Single dataset in step by step wizard

Tutorial 4.15 Find genes correlating with a single gene

Tutorial 4.16 Identify genes differentially expressed between groups

Tutorial 5.5.4 2DGeneView across Datasets

Tutorial 6.2 Identifying downstream genes using time-series data