GEO serves as a public repository for a wide range of high-throughput experimental data. These data include single- and dual-channel microarray-based experiments measuring mRNA, miRNA, genomic DNA (including arrayCGH, ChIP-chip, and SNP), and protein abundance, as well as non-array techniques such as serial analysis of gene expression (SAGE) and various types of next-generation sequence data.
Platform
A Platform record defines the list of elements that may be detected and quantified in an experiment (e.g., cDNAs, oligonucleotide probesets). Each Platform record is assigned a unique and stable GEO accession number (GPLxxx). A Platform may reference many Samples that have been submitted by multiple submitters.
Example Platform record
2. Query and Analysis
GEO data can be retrieved and analyzed in several ways:
* To look at a particular GEO record for which you have the accession number, use the GEO accession box on the GEO homepage. Also, the Accession Display bar found at the top of each GEO record has several options for selecting the format and amount of data to view (see the Data Download section below).
* The simplest first step to find data relevant to your interests is to search Entrez GEO DataSets or Entrez GEO Profiles with keywords:
Entrez GEO DataSets queries all experiment descriptions, allowing identification of studies of interest
Entrez GEO Profiles queries gene expression profiles, allowing identification of genes of interest.
As with any other Entrez database, keywords or a simple Boolean phrase may be entered and restricted to any number of supported attribute fields, enabling effective query and mining of GEO data. Tools available under the ‘Preview/Index’ tab can help you construct complex, fielded queries.
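As a minimal sketch of how such a fielded Boolean query might be assembled, the Python helpers below build a query string in the usual Entrez `term[Field]` form. The helper names `fielded` and `combine` are hypothetical, and the field names used in the example (Title, Organism, Entry Type) are assumptions that should be checked against the field list for the database you are searching.

```python
def fielded(term: str, field: str) -> str:
    """Return one Entrez clause such as '"breast cancer"[Title]'."""
    term = term.strip()
    # Quote multi-word terms so that they are searched as a phrase.
    if " " in term and not (term.startswith('"') and term.endswith('"')):
        term = f'"{term}"'
    return f"{term}[{field}]"


def combine(operator: str, *clauses: str) -> str:
    """Join clauses with AND, OR or NOT and wrap the result in parentheses."""
    operator = operator.upper()
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported Boolean operator: {operator}")
    return "(" + f" {operator} ".join(clauses) + ")"


# Field names here are assumed for illustration only.
query = combine(
    "AND",
    fielded("breast cancer", "Title"),
    fielded("Homo sapiens", "Organism"),
    fielded("gds", "Entry Type"),
)
print(query)
```

The resulting string can be pasted into the Entrez GEO DataSets search box as-is.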
Once you have identified a DataSet of interest, there are several features on the DataSet record that help visualize or identify interesting gene expression profiles within that experiment:
* Query subset A vs B tool - finds genes differentially expressed between experimental subgroups
* Clusters - visualize cluster heat map images and select regions of interest for further study
* Value distribution - a box and whiskers plot displaying the distribution of expression values of each Sample within a DataSet
* ‘Find gene in this DataSet’ box
Once you have identified gene expression profiles of interest, there are several tools on the Profile records that help identify additional genes of interest:
* Profile neighbors - retrieves other genes with similar expression patterns in that DataSet
* Chromosome neighbors - retrieves the 20 chromosomally closest genes
* Links - to related NCBI databases including Gene, UniGene, OMIM and PubMed
3. Data Download
GEO data can be viewed and downloaded in several ways:
GEO records
* FTP download
All GEO records and raw data files are freely available for bulk download from our FTP site. Data are structured and formatted in a variety of ways; see our README for details.
* Links at the foot of Series records
Links to experiment family downloads in various formats and supplementary files are provided at the foot of each GEO Series record.
* Accession Display Bar
The Accession Display bar is found at the top of each GEO record and can be used to download or view complete or partial records, or related Platform, Sample and Series records. The Scope feature allows display of a single accession number (Self) or any (Platform, Sample, or Series) or all (Family) records related to that accession. Amount dictates the quantity of data displayed, with choices including metadata only (Brief), metadata and the first 20 rows of the data table (Quick), data table only (Data), or full metadata/data table records (Full). Format controls whether records are displayed in HTML, SOFT (plain text) or MINiML (XML) format.
* Construct a URL
An alternative to using the Accession Display Bar described above is to construct a URL to retrieve data. URLs are formatted as follows:
Example: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gpl96&targ=self&view=brief&form=text
- this URL will retrieve a text file containing the 'brief' view of accession GPL96.
The possible values for each component are:
acc = a valid GEO accession i.e., gplxxx, gsmxxx or gsexxx
targ = self, gsm, gpl, gse or all
view = brief, quick, data or full
form = text, html or xml
Note that your browser may time-out when html format is selected for particularly large retrievals.
Alternatively, if you have Perl installed, you can use the same mechanism to retrieve data from the command line:
perl -MLWP::Simple -e "getprint 'http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSM313800&targ=self&view=full&form=text'"
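The same URL scheme can also be sketched in Python. The function below only assembles and validates the URL from the four components listed above; the name `geo_record_url` and the validation rules are illustrative assumptions rather than part of any GEO tooling.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

# Allowed values, copied from the component list above.
ALLOWED = {
    "targ": {"self", "gsm", "gpl", "gse", "all"},
    "view": {"brief", "quick", "data", "full"},
    "form": {"text", "html", "xml"},
}
ACCESSION_PREFIXES = ("gpl", "gsm", "gse")


def geo_record_url(acc, targ="self", view="brief", form="text"):
    """Build an acc.cgi URL for a Platform, Sample or Series accession."""
    lowered = acc.lower()
    # Assumed check: a known prefix followed by digits only.
    if not (lowered.startswith(ACCESSION_PREFIXES) and lowered[3:].isdigit()):
        raise ValueError(f"not a GPL/GSM/GSE accession: {acc!r}")
    chosen = {"targ": targ, "view": view, "form": form}
    for name, value in chosen.items():
        if value not in ALLOWED[name]:
            raise ValueError(f"invalid {name} value: {value!r}")
    return BASE_URL + "?" + urlencode({"acc": acc, **chosen})


url = geo_record_url("gpl96")
print(url)
```

For `gpl96` with the default arguments this reproduces the example URL shown above. The returned string could then be passed to `urllib.request.urlopen` to fetch the record.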
* Programmatic access
GEO record metadata can be accessed and retrieved programmatically using a suite of programs called the Entrez Programming Utilities (E-Utils).
* Entrez GEO DataSets and Entrez GEO Profiles query downloads
It is possible to export Entrez GEO DataSets and Entrez GEO Profiles document summaries by setting the tool bar at the head of the page to 'Send to: File'.
DataSet records and Profiles
* FTP download
All GEO DataSet records are freely available for bulk download from our FTP site.
* Links on DataSet records
Links to DataSet SOFT files are available under the 'download' button on each DataSet record.
* Programmatic access
GEO DataSets metadata can be accessed and retrieved programmatically using a suite of programs called the Entrez Programming Utilities (E-Utils).
* Profile values downloads
Use the 'Download profile data' button at the head of Entrez GEO Profiles retrievals to download the expression values of genes found in your query.
* Entrez GEO DataSets and Entrez GEO Profiles query downloads
It is possible to export Entrez GEO DataSets and Entrez GEO Profiles document summaries by setting the tool bar at the head of the page to 'Send to: File'.
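For the programmatic access route mentioned above, an E-Utils search request might be assembled as in the following sketch. It assumes the standard E-Utils base URL and the database names `gds` (GEO DataSets) and `geoprofiles` (GEO Profiles); the helper name `esearch_url` and the example query are illustrative only, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed E-Utils base URL.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
GEO_DATABASES = {"gds", "geoprofiles"}


def esearch_url(term, db="gds", retmax=20):
    """Build an ESearch URL for one of the two GEO Entrez databases."""
    if db not in GEO_DATABASES:
        raise ValueError(f"not a GEO Entrez database: {db!r}")
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-escapes brackets and turns spaces into '+'.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query; the field names are assumptions.
search = esearch_url("gse[Entry Type] AND human[Organism]")
print(search)
```

Fetching this URL would return a list of matching record identifiers, which can then be passed to other E-Utils calls to retrieve document summaries.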
4. Deposit and Update
There are several ways in which data may be submitted to GEO. Please refer to the Submitting data guidelines for a complete overview of the options available.
After we receive your final Series submission, we will begin processing your records. Once your records pass review, you will receive an e-mail confirming your GEO accession numbers and their release dates. Processing normally takes approximately 5 business days after completion of Series submission.
Each record you submit will receive a unique and stable GEO accession number that you may quote in manuscripts. Do not quote GEO accession numbers in manuscripts until you have received an approval notice e-mail from GEO staff. Records may remain private for several months until the data are published.
After your records have been approved, you can create a reviewer access link to your private submissions using the 'Click here to create a reviewer access link' option near the top of your Series (GSExxx) record. The generated link can be sent to the journal editor, who will circulate it to reviewers requiring access to your private data.
Submitters may perform updates and edits at any time to any of their submissions. Please refer to the Updating your GEO records page for full instructions.
Sample
A Sample record describes the conditions under which an individual Sample was handled, the manipulations it underwent, and the abundance measurement of each element derived from it. Each Sample record is assigned a unique and stable GEO accession number (GSMxxx). A Sample entity must reference only one Platform and may be included in multiple Series.
Example Sample record
Series
A Series record links together a group of related Samples and provides a focal point and description of the whole study. Series records may also contain tables describing extracted data, summary conclusions, or analyses. Each Series record is assigned a unique and stable GEO accession number (GSExxx).
Example Series record
DataSet
DataSet records are assembled by GEO curators.
As explained above, a GEO Series record is an original submitter-supplied record that summarizes an experiment. These data are reassembled by GEO staff into GEO DataSet records (GDSxxx). A DataSet represents a curated collection of biologically and statistically comparable GEO Samples and forms the basis of GEO's suite of data display and analysis tools. Samples within a DataSet refer to the same Platform, that is, they share a common set of array elements. Value measurements for each Sample within a DataSet are assumed to be calculated in an equivalent manner, that is, considerations such as background processing and normalization are consistent across the DataSet. Information reflecting experimental factors is provided through DataSet subsets. Both Series and DataSets are searchable using the Entrez GEO DataSets interface, but only DataSets form the basis of GEO's advanced data display and analysis tools, including gene expression profile charts and DataSet clusters. Not all submitted data are suitable for DataSet assembly, and we are experiencing a backlog in DataSet creation, so not all Series have corresponding DataSet records.
Example DataSet record.
Profile
Profiles are derived from DataSets.
A Profile consists of the expression measurements for an individual gene across all Samples in a DataSet. Profiles can be searched using Entrez GEO Profiles.
Example Profile records.
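The relationships among Samples, DataSets and Profiles described above can be sketched as a small data model. This is an illustration only, not GEO's internal representation: the class and field names are hypothetical, the accessions other than GPL96 and all values are made up, and a Platform element ID stands in for a gene.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    accession: str                 # GSMxxx
    platform: str                  # exactly one Platform (GPLxxx)
    values: dict[str, float]       # element ID -> measured abundance
    series: list[str] = field(default_factory=list)  # zero or more GSExxx


@dataclass
class DataSet:
    accession: str                 # GDSxxx
    samples: list[Sample]

    def __post_init__(self):
        # Samples within a DataSet must refer to the same Platform.
        platforms = {s.platform for s in self.samples}
        if len(platforms) > 1:
            raise ValueError(f"mixed Platforms in {self.accession}: {sorted(platforms)}")

    def profile(self, element_id: str) -> dict[str, float]:
        """Measurements for one element across all Samples in the DataSet."""
        return {s.accession: s.values[element_id] for s in self.samples}


# Made-up records for illustration.
s1 = Sample("GSM1", "GPL96", {"probe_a": 7.5, "probe_b": 2.0}, ["GSE1"])
s2 = Sample("GSM2", "GPL96", {"probe_a": 8.25, "probe_b": 1.5}, ["GSE1", "GSE2"])
dataset = DataSet("GDS1", [s1, s2])
profile = dataset.profile("probe_a")
print(profile)
```

Here `profile` maps each Sample accession to its value for `probe_a`, mirroring the definition of a Profile as one gene's measurements across all Samples in a DataSet.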