R2-Tutorials:

R2 documentation: Tutorials


Table of contents


  • Tutorial 6.2 Identifying downstream genes using time-series data
  • Tutorial 1.3 Single dataset in step by step wizard

    This document is intended to get you started with the microarray analysis platform R2.

    The R2 Portal

    To reach the portal of R2, open your favorite browser and type this url (in the near future through the humangenetics website www.humangenetics-amc.nl):


    The main screen of R2 shows up: Figure 1
    r2_browser_view

    Fig1: R2 in firefox browser


    It is divided into three areas. On the left side there is a menu bar providing direct access to different parts of the software. To the right, a news bar shows the available updates for software (in green), datasets (blue) and annotations (red). In the middle, a step-by-step guide provides access to most analyses through a simple walkthrough wizard. Whenever a light blue “i” icon appears in R2, extra information can be obtained with one click; in this case it concerns the selected dataset. Hovering over the grey “i” provides a pop-up box with additional information or help. In this tutorial the publicly available dataset "Mixed Colon - Marra - 64 - MAS5.0 - u133p2" will be used to explore the possibilities of R2.

    R2 can analyse gene expression for a single dataset or across more datasets. This choice can be made in step 3 of the dialog box (see fig 2):
    r2_stepbystep_single_gene

    Fig2: The step by step interface


    In the dialog box step 3 (Fig 2), use the pulldown menu to select type of analysis. 

    For our illustrations we took AXIN2 (see Figure 3). In many cases more than one probeset is reported for a gene; in this case there are 4 probesets. The different reporters are listed in descending order of their average signal. A reporter is flagged present in a sample when, according to the MAS5 algorithm, the probeset is expressed. The probeset with the highest average signal is selected automatically; occasionally other probesets assigned to the same gene could be of interest, depending on the structure of the gene.
    geneselector_axin

    Fig3: Selecting appropriate probesets for a gene
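
    The default probeset choice described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not R2's own code: the probeset names and signal values are invented, and "best" is taken to mean the highest average signal across samples.

```python
from statistics import mean

# Hypothetical expression signals per probeset (one value per sample).
# Names and numbers are invented for illustration only.
signals = {
    "probeset_A": [812.0, 640.5, 1020.3, 455.1],
    "probeset_B": [210.4, 180.0, 305.7, 150.2],
    "probeset_C": [55.3, 61.0, 48.9, 70.4],
    "probeset_D": [12.1, 9.8, 15.0, 11.3],
}


def rank_probesets(signals_by_probeset):
    """Return probeset names sorted by descending average signal."""
    return sorted(
        signals_by_probeset,
        key=lambda name: mean(signals_by_probeset[name]),
        reverse=True,
    )


ranking = rank_probesets(signals)
default_probeset = ranking[0]  # the probeset that would be preselected
print(ranking)
print(default_probeset)
```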



    A graph will be generated from the AXIN2 gene expression (Figure 4). Each dot represents the expression of the probeset in a specific tumour, ordered by expression from left to right. Hovering over the dots gives more information on the corresponding tumours. Under the X-axis, colored boxes represent clinical information of the samples in so-called "tracks". Again, hovering over them reveals the underlying data. For AXIN2 there is a clear relation between the expression levels and the “tissue” track. The tracks underneath the image thus give a quick glance at some of the clinical parameters defined for the dataset. It is also possible to define your own custom-made tracks (tutorial to be developed).
    axin2_in_one_geneview

    Fig4: AXIN2 in Y-plot

    At this point you might want to try some other settings to investigate their workings and play around with some other genes. Based on the annotations you can select subsets of genes (see tutorial to be developed). For data transformation we recommend the 2log for statistical reasons (see manual). Hovering over different parts of the screen will again reveal additional information.

    Probeset verification

    Scrolling down on the same page where the AXIN2 expression graph was generated, there is a probeset verification table. In order to investigate the relevance of this verification, choose another gene in the main window.

    The table lists, for the various reporters of MBNL2, whether they agree with the genome position of the MBNL2 reference sequence. If all state “YES”, everything appears to be in order. For the MBNL2 reporters in the middle there are multiple “NO” indications, which suggests that something is wrong with them.

    This program depicts the alignment of EST and mRNA sequences to the genome sequence (Fig5). The accession numbers used to generate the reporters on the array are aligned as well, so this view can be used to inspect the quality of a reporter. Note that the reporter “1553536_at” is aligned to the genomic region of the MBNL2 reference sequence, but that its color is red. This indicates an alignment to the reverse complement of the genome. In some cases the reporter is positioned in an intronic region (light green color), which can also be a reason not to pick a certain probeset. NB: Currently probeset verification is only provided for various human Affymetrix array types.

    Fig.5 TranscriptView

     
    Two gene analysis

    In case two genes are provided (Fig 6), the correlation of the expression values of the two genes in this dataset is shown.
    correlate_choose_axin_sord

    Fig6: Correlate two genes



    A so-called Y-Y plot (Figure 7) is generated; it visualizes the correlation between the expression of the two genes as two overlaid graphs: the left axis gives the expression values for the first gene, the right axis those for the second gene (in blue). Colors etc. can be adapted.
    2geneview_axin_sord

    Fig7: Y-Y plot of 2 genes
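
    To make the idea behind the two-gene analysis concrete, the sketch below applies the recommended 2log transformation to two expression series and computes a correlation coefficient between them. It assumes a Pearson correlation, which this tutorial does not state explicitly, and all expression values are invented.

```python
from math import log2, sqrt


def to_log2(values):
    """Apply the 2log transformation to raw (positive) expression values."""
    return [log2(v) for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical raw signals for two genes in the same five samples.
gene_1_raw = [120.0, 340.0, 560.0, 980.0, 1500.0]
gene_2_raw = [60.0, 150.0, 310.0, 450.0, 800.0]

r = pearson_r(to_log2(gene_1_raw), to_log2(gene_2_raw))
print(round(r, 3))
```

    A value of r close to 1 corresponds to two curves that follow each other in the Y-Y plot; a value close to -1 corresponds to mirrored curves.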



    View a gene in groups:
     
    In R2 ‘groups’ can be created from the annotation tracks. These tracks are specific for each dataset. Personalised tracks can be generated in the menu on the left of the main screen with “my settings”. In the Marra dataset the tumours are annotated with two clinical data tracks: their position in the colon (location) and the tissue type, adenoma or normal (tissue).

    track_selection_mara_set

    Fig8: Selecting tracks for groupview

     
    For this dataset, several combinations of the annotations are also provided.

    In Figure 9 the resulting graph is shown; it is clear that AXIN2 expression is related to the selected groups.
    view_axin_in_groups

    Fig9: AXIN2 in groups
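
    The group view boils down to partitioning the samples by an annotation track and summarizing the expression per group. The following sketch shows that idea with invented 2log expression values; the group labels follow the "tissue" track mentioned above, and the use of the mean as summary is an assumption made for illustration.

```python
from collections import defaultdict
from statistics import mean


def group_means(values, labels):
    """Average expression per group, where labels[i] is the group of values[i]."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {label: mean(members) for label, members in groups.items()}


# Hypothetical 2log expression values and their "tissue" annotation.
expression = [6.1, 5.9, 6.3, 9.8, 10.2, 9.5]
tissue = ["normal", "normal", "normal", "adenoma", "adenoma", "adenoma"]

print(group_means(expression, tissue))
```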

    Another commonly used analysis option within R2 is to search for genes that show a correlated expression (a similar expression pattern to AXIN2 across the samples). At the gene selector page the preferred reporter is selected, and optional adjustments can be made to the search parameters. Normally the default settings are sufficient to perform an analysis. In this example one option is adapted: a gene category, “oncogenesis”, is selected to speed up the search. This restricts the scan to known oncogenesis genes (some 449 genes out of the 20,000 in total).

    The result of the analysis is a page containing 2 columns with hyperlinked genes and, to the right, some statistics of the search. These tables report the genes that show a significant correlation with the expression of AXIN2. The left table lists positively correlating genes, while the right table lists negatively correlating genes. The significance of each correlation is given in the “pval” column, while the number of times the gene has been flagged with a present call is given in the “pres” column. In red, some gene ontology statements are added to the genes.
    correlation_axin_sord_result

    Fig10: Correlating genes with gene of choice
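
    The structure of this result page, one table of positively and one of negatively correlating genes, can be mimicked with a small sketch. It is not R2's algorithm: the gene names and values are invented, the Pearson coefficient is an assumption, and a simple cutoff on the coefficient (the hypothetical parameter min_abs_r) stands in for the p-value that R2 reports.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlating_genes(target, candidates, min_abs_r=0.5):
    """Split candidate genes into positively and negatively correlating lists.

    Each list holds (gene, r) pairs, strongest correlation first.
    """
    positive, negative = [], []
    for gene, values in candidates.items():
        r = pearson_r(target, values)
        if r >= min_abs_r:
            positive.append((gene, r))
        elif r <= -min_abs_r:
            negative.append((gene, r))
    positive.sort(key=lambda item: item[1], reverse=True)
    negative.sort(key=lambda item: item[1])
    return positive, negative


# Hypothetical 2log expression of the gene of interest and three candidates.
gene_of_interest = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "GENE_X": [2.0, 4.0, 6.0, 8.0, 10.0],  # follows the gene of interest
    "GENE_Y": [5.0, 4.0, 3.0, 2.0, 1.0],   # mirrors the gene of interest
    "GENE_Z": [3.0, 1.0, 4.0, 1.0, 5.0],   # only weakly related
}

positive, negative = correlating_genes(gene_of_interest, candidates)
print(positive)
print(negative)
```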

    CDK4 is listed as one of the genes showing a high correlation combined with a high significance. After clicking the CDK4 gene in the left table, a so-called 2GeneView page is generated.

    In the generated picture the AXIN2 expression is represented with red dots. In addition, the expression of the second gene (in this case CDK4) has been added to the picture as blue squares (Figure 11).
    2geneview_axin_cdk4

    Fig11: Correlation of AXIN2 and CDK4

    This kind of data representation is called a YY-plot (since both genes are represented on the Y-axis). If 2 genes correlate with each other, then the patterns should follow each other, as is the case here. This may be a very interesting finding. We may want to know if a relation between these 2 genes has already been reported in the literature. To this end we can click on the pubsniffer link. This will query the PubMed database for mentions of such a relation within the titles or abstracts of published work.

    We can see that such a relation has not been published before (Figure 12).
    pubsniffer_axin_cdk4

    Fig12: Pubsniffer results


    It is also possible to take a look at gene expression in manipulated cell-line experiments that have already been performed. At this moment there are not many publicly available time-series datasets. However, R2 offers the option to access your own time-series experiments after contacting R2 support.

    Tutorial 4.15 Find genes correlating with a single gene

    A starting point for a lot of analyses is a single dataset and a gene of interest. In this tutorial the expression of the gene MYCN, a known oncogene in neuroblastoma, is explored for correlating genes in a neuroblastoma tumor series. This same procedure is described as Search 1 in a paper by Jan Koster et al. (submitted 2009).

    Fig1-MainWizardFilloutFields

    Fig1: The R2 step by step wizard with filled out fields

     

    Next on screen is a selection field that enables adjustment of the thresholds for the statistical analysis (Fig2).

    Fig2-SettingsForCorrelation

    Fig2: Settings for correlation analysis

    R2 has automatically chosen the best probeset for this gene to perform statistics with. For this analysis the default values can be kept.

    Fig3-ResultsForAnalysisMYCNInclHover

    Fig3: Correlating genes with MYCN in Neuroblastoma series

     

    The results in Figure 3 are ordered sets of positively correlating genes in the left table and negatively correlating genes in the right table. Hovering over a gene name gives additional information about the gene, as shown for MYCN in the figure. To the right, a graph shows the distribution of the probesets; in this case it has a proper bell shape. Below that graph a short table provides a quick summary of overrepresented Gene Ontology categories. Clicking on a hyperlinked gene name opens a new browser window showing the expression values of both genes in this data series in a Y-Y plot.


    Fig4-MYCN-PTN-2GeneView

    Fig4: MYCN and PTN correlation in 2GeneView


    The Y-Y plot visualizes the correlation between the expression of the two genes as two overlaid graphs: the left axis gives the expression values for the first gene, the right axis those for the second gene (in blue). The samples are ordered by increasing expression of the gene of interest. Colors etc. can be adapted.
    At this point you might want to return to the threshold selection screen (Fig2) and adapt some settings: see for example Fig 5, where a search for the genes on chromosome that are known drug targets and are involved in the MAPK pathway is initiated.
    Fig5-Adapting-Settings

    Fig5: Example of adapting selection settings.



    Tutorial 4.16 Identify genes differentially expressed between groups

    In this tutorial, genes that are differentially expressed between groups in a data series are discovered. To this end a series of neuroblastoma tumors was analysed with respect to MYCN amplification. This same procedure is described as Search 2 in a paper by Jan Koster et al. (submitted 2009).

    Search2Fig1-MainWizardFilloutForm

    Fig1: The R2 step by step wizard with filled out fields



    In the next screen you can choose which annotation "track" will be used to partition the set into groups. The neuroblastoma series has extensive annotation; in this case the presence of a MYCN amplification is chosen (Fig 2).
    Search2Fig2-Choosing-Track-For-Groups

    Fig2: Choosing an annotation track to partition dataset


    As usual the rest of the adjustable fields can be left as they are: R2 provides you with a configuration appropriate for most analyses.
    Search2Fig3-Possible-Groups-Dialog

    Fig3: Dialog showing the groups that might partition this dataset


    As is obvious, there are only two groups to partition this dataset into; other tracks, such as age groups, permit more choices.
    Search2Fig4-Diff-Expr-Genes-For-Groups

    Fig4: Differentially expressed genes for the groups MYCN amplified vs MYCN single copy
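
    As a rough illustration of what "differentially expressed between groups" means numerically, the sketch below ranks a few genes by the size of a two-group test statistic. The statistical test that R2 applies is not named in this tutorial; Welch's t statistic is used here purely as a stand-in, and the gene names and 2log values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (unequal variances)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical 2log expression per gene, split by the MYCN amplification track.
expression = {
    "GENE_A": {"amplified": [9.1, 9.4, 8.9, 9.6], "single_copy": [6.2, 6.8, 6.5, 6.1]},
    "GENE_B": {"amplified": [7.0, 7.3, 6.8, 7.1], "single_copy": [7.1, 6.9, 7.2, 7.0]},
    "GENE_C": {"amplified": [4.2, 4.0, 4.5, 4.1], "single_copy": [6.0, 6.3, 5.8, 6.1]},
}

# Rank genes by the absolute statistic: the largest group difference comes first.
ranked = sorted(
    expression,
    key=lambda gene: abs(
        welch_t(expression[gene]["amplified"], expression[gene]["single_copy"])
    ),
    reverse=True,
)
print(ranked)
```

    A positive statistic means higher expression in the amplified group, a negative one higher expression in the single-copy group.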

    Tutorial 5.5.4 2DGeneView across Datasets


    ChooseAccrossDatasets

    Figure1: The default wizard; choose across datasets


    Choose2DGeneView

    Fig2: Choose 2DGeneView

    A figure will be drawn (Figure 3). The main graph shows
    Next to it a list of datasets is given with the correlations between the two probesets in ascending order, from negative to positive.
    2DGeneViewAxin2Sord

    Fig3: Result of the 2DGeneView analysis: correlation of AXIN2 and SORD across datasets



    At the low end of the list you'll find the most positively correlating datasets (Fig4).
    PositiveCorrelating2DGeneView

    Fig4: Datasets with a positive correlation between the two genes of interest.


    Views of the correlations can be obtained by clicking on the links; either in the table or in the graph.
    This produces Figures 5 and 6; you will find a high positive correlation between AXIN2 and SORD in the Marra set.
    Axin2SordInExpLung

    Fig5: Correlation in Exp-Lung dataset


    Axin2SordInColonMara

    Fig6: AXIN2 and SORD in Colon-Marra set


    To check the statistical merits of these data you can produce some additional graphs at the bottom of the 2DGeneView panel (Fig7).
    AdditionalGraphsChoice

    Fig7: Additional graphs for statistical checks




    These produce a sample graph showing the distribution of all samples (Fig8).

    SampleGraphAxin2Sord

    Fig8: Distribution of samples for U133p2 chip



    They also produce histograms of the distribution of the probeset values, e.g. for u133p2 (Fig 9).
    HistoGram133p2

    Fig9: Distribution of probeset signal for chip U133P2



    Of course you can check your own genes of interest now!


    Tutorial 6.2 Identifying downstream genes using time-series data

    This tutorial aims to find genes regulated downstream of a manipulated gene in an in vitro cell-line system. This same procedure is described as combined Searches 3-5 in a paper by Jan Koster et al. (submitted 2009). The time-series data analysed here are from an inducible MYCN transgene in the neuroblastoma cell-line SK-N-AS.

    First a selection will be made from the series of consistently regulated genes in this series (Search 3 in the article).

    ChooseFromMenu

    Fig1: Clicking



    This provides access to the Timeseries Wizard: Fig2.

    SelectFromWizard

    Fig2: Selecting from the Timeseries wizard



    The resulting tree view (Fig3) provides you with an overview of all cell-lines that were manipulated in this data series. The manipulated genes are shown when you click on a cell-line.
    SelectFromExpTree

    Fig3: Select an experiment from the Timeseries tree



    AdjustSettingsForSelection

    Fig4: Select settings for Timeseries data


    The next panel allows you to set parameter thresholds to constrain your selection. Again the defaults are suitable for most analyses; however, three parameters require consideration. Firstly, we want the genes to be expressed in all three experiments; secondly, we take a log fold change of 0.6, which ensures that the genes respond to the manipulated gene; and thirdly, we want the single best probeset to represent a gene.
    Fig5-ResultsListUpperPanel

    Fig5: The upper part of the result list panel
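
    The three selection criteria described above can be written down as a small filter. This is a sketch under explicit assumptions, not R2's implementation: the record layout, gene names and numbers are invented, the log fold change cutoff of 0.6 is assumed to apply to the absolute value in every experiment, and the "best" probeset is assumed to be the one with the highest average signal.

```python
LOGFC_THRESHOLD = 0.6  # log fold change cutoff quoted in the text

# Hypothetical probeset records covering three experiments.
probesets = [
    {"id": "ps1", "gene": "GENE_A", "present": [True, True, True],
     "logfc": [1.2, 0.9, 0.7], "avg_signal": 500.0},
    {"id": "ps2", "gene": "GENE_A", "present": [True, True, True],
     "logfc": [0.8, 0.7, 0.9], "avg_signal": 900.0},
    {"id": "ps3", "gene": "GENE_B", "present": [True, False, True],
     "logfc": [1.5, 1.1, 1.3], "avg_signal": 300.0},
    {"id": "ps4", "gene": "GENE_C", "present": [True, True, True],
     "logfc": [-0.9, -0.2, -1.1], "avg_signal": 700.0},
    {"id": "ps5", "gene": "GENE_D", "present": [True, True, True],
     "logfc": [-0.8, -1.0, -0.7], "avg_signal": 250.0},
]


def passes(record, threshold=LOGFC_THRESHOLD):
    """Expressed in every experiment and |log fold change| >= threshold in each."""
    return all(record["present"]) and all(
        abs(value) >= threshold for value in record["logfc"]
    )


def best_probeset_per_gene(records):
    """Keep, per gene, the record with the highest average signal (assumption)."""
    best = {}
    for record in records:
        current = best.get(record["gene"])
        if current is None or record["avg_signal"] > current["avg_signal"]:
            best[record["gene"]] = record
    return best


selected = best_probeset_per_gene([p for p in probesets if passes(p)])
selected_ids = {gene: record["id"] for gene, record in selected.items()}
print(selected_ids)
```

    Whether the best probeset is chosen before or after the other filters is also an assumption of this sketch; here the filters are applied first.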


    The result is a list of 1280 genes that are regulated in this set of experiments. R2 allows you to sort these lists to find the most strongly regulated genes. Scrolling down to the lower end of the list shows additional analysis options (see Fig6).
    Fig6-ResultsListLowerPanel

    Fig6: The lower end of the result table


    The little table at the end of the list shows the number of genes that are regulated in the same way; in this case it shows that 439 genes are consistently upregulated and 703 consistently downregulated. The hyperlinks in blue allow for more analyses: the Gene Ontology analysis shows the GO categories that are overrepresented in this dataset (Fig7).
    Fig7-GO-Analysis

    Fig7: Gene Ontology Analysis results panel
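
    The counts of consistently up- and downregulated genes mentioned above can be illustrated with a short sketch. It assumes that "consistent" means the log fold change has the same sign in every experiment; the gene names and values are invented.

```python
from collections import Counter


def direction(logfcs):
    """Classify a gene by the sign of its log fold change across experiments."""
    if all(value > 0 for value in logfcs):
        return "up"
    if all(value < 0 for value in logfcs):
        return "down"
    return "mixed"


# Hypothetical log fold changes per gene in three experiments.
logfc_per_gene = {
    "GENE_A": [0.9, 1.4, 0.7],
    "GENE_B": [-0.8, -1.1, -0.6],
    "GENE_C": [0.7, -0.9, 1.0],
    "GENE_D": [1.1, 0.8, 0.9],
}

counts = Counter(direction(values) for values in logfc_per_gene.values())
print(counts["up"], counts["down"], counts["mixed"])
```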




    The Pathwayfinder will find KEGG-pathways showing an overrepresentation of genes (Fig8).
    Fig8-PathwayFinder

    Fig8: KEGG Pathways having an overrepresentation of genes from this selection
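
    Overrepresentation of a GO category or KEGG pathway in a gene selection is commonly assessed with a hypergeometric test. The tutorial does not say which test R2 uses, so the sketch below should be read as a generic illustration; the function name is hypothetical and the numbers in the example call are invented.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(X >= k) for a hypergeometric draw without replacement.

    population: all genes considered
    annotated:  genes in the population that belong to the category
    drawn:      size of the gene selection
    k:          genes in the selection that belong to the category
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical example: 20000 genes, 400 of them in a category,
# a selection of 1000 genes of which 40 fall in that category.
p_value = hypergeom_upper_tail(40, 20000, 400, 1000)
print(p_value)
```

    With these invented numbers about 20 category genes would be expected by chance (1000 x 400 / 20000), so observing 40 yields a small p-value.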