Hello everybody and welcome to today's NCBI webinar.
We will talk about the Genome Data Viewer today.
This slide has preliminary information you may need.
Wayne Matten is here with me and he will answer questions during the webinar when they come
up.
All the questions and answers will be available after the webinar in a document that will
be linked to our webinars and courses page.
It will also be available in our FTP directory for this particular webinar. That's the materials
directory, and there is a compressed URL that will take you to it.
Let's get started.
I will show you a few slides and then we will do a live demo for the rest of the time.
The Genome Data Viewer is the topic today and mostly today I will say GDV when I mean
the Genome Data Viewer because it's faster. My name is Peter Cooper and if you have
questions you can write to me at the address at the bottom of the slide,
peter.cooper@nih.gov. We will talk about what the Genome Data Viewer
is and we will talk about accessing it.
We will do a brief overview of the interface and functions and mostly we will demonstrate
it live for you today.
The Genome Data Viewer is now NCBI's main genome browser and the assembly browser for
eukaryotic organisms, and it replaces the Map Viewer, which was the original genome
browser.
You will recognize the interface which is similar to other specialized browsers we've
had available like the 1000 genomes browser and the variation viewer.
Embedded in the center of all of those browsers and familiar I hope to you is the sequence
viewer panel which is present in other places like Gene and Nucleotide graphics view.
It has functions for navigating across the assembly, uploading mapped data and accessing
analysis tools.
Right now in the Genome Data Viewer we have over 500 eukaryotic RefSeq genomes available.
How do you access it?
There is a homepage that you can easily get to, and the quickest way to find it is to search
in Google. By default it is set to search the human genome, and you can use this tree function
to browse various organisms that you can load into the Genome Data Viewer.
For example I can expand the node there to see zebrafish and other bony fishes and it's
a convenient way to navigate and browse by organism.
Today we will focus on a particular assembly, actually more than one for a particular organism
and that is the human genome.
You can search with the gene names in the box, SNP IDs, RefSeq accessions, or chromosome
positions.
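As a rough illustration of the kinds of search terms just listed, the sketch below sorts a term into one of those categories. The patterns are simplified assumptions made for this example, not the rules the Genome Data Viewer search box actually applies, and the sample terms in the usage lines are made up.

```python
import re

# Assumed, simplified patterns for illustration only; the real search box
# accepts more forms than these.
PATTERNS = [
    ("snp_id", re.compile(r"^rs\d+$", re.IGNORECASE)),
    ("refseq_accession", re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")),
    ("chromosome_position",
     re.compile(r"^(chr)?(\d{1,2}|X|Y|MT?):\d+(-\d+)?$", re.IGNORECASE)),
]


def classify_query(term: str) -> str:
    """Return a label for the kind of search term, falling back to free text."""
    term = term.strip()
    for label, pattern in PATTERNS:
        if pattern.match(term):
            return label
    # Anything else is treated as a gene name or general text search.
    return "gene_name_or_text"


# Usage with made-up terms.
for example in ["rs12345", "NM_000000.1", "chr20:1000-2000", "ADA"]:
    print(example, "->", classify_query(example))
```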
You could also pick an assembly listed here: the current reference assembly, GRCh38, or
the previous assembly, GRCh37.
You can also search alternate assemblies, the HuRef assembly and also the hydatidiform mole
cell line assembly.
There is a browser function where you can enter the browser itself with chromosome 1
and you can BLAST a specific database that contains a particular assembly you are interested
in and this ideogram view allows you to click and browse a particular chromosome that you
would like.
There are other ways to get to GDV, and these have been around for a while in various places at NCBI.
So from Gene I can link to the Genome Data Viewer, and in Assembly I can link to
and browse in the Genome Data Viewer.
And today as a matter of fact we now have links from BLAST to the Genome Data Viewer
which went live this morning.
Here is the main browser interface and it should look familiar to you.
Sort of in the middle of it is the graphical sequence viewer and on the left-hand side
at the top are various search and navigation features.
If you run a search you can get results in the ideogram view and it shows you hits on
various chromosomes.
You might find hits on more than one place as it is a text search.
You can select the one you want from the panel underneath it.
There are widgets for uploading the data, adding tracks, loading BLAST results and we
will use those in a few moments when we do the demonstration.
At the top there are navigation functions and there is a gene and Exon navigator which
you can take advantage of.
At the very top there is an ideogram with the chromosome you are on and viewing at this
moment is chromosome 1 and you can use that as a way to navigate and display various parts
of the chromosome.
Just a little detail on additional functions and we will use these today and hopefully
we get a chance to use them all but might not get to all of them.
I can upload my own data, and these are the various kinds of file types listed here on
the slide: BED, GFF, GTF, VCF, HGVS.
I can also use a URL or simply paste the data in the text box.
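To make the upload formats a little more concrete, here is a minimal sketch of producing BED text that could be saved to a file or pasted into the text box. The coordinates, feature names, and the `chr20` naming are invented for illustration; the talk does not say which chromosome naming the viewer expects. The one format fact relied on is that BED uses a 0-based start and an end-exclusive stop.

```python
def bed_line(chrom: str, start_1based: int, end_1based: int, name: str) -> str:
    """Convert a 1-based inclusive interval into one tab-separated BED line.

    BED stores the start as 0-based and the end as exclusive, so only the
    start needs to be shifted down by one.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("interval must satisfy 1 <= start <= end")
    return f"{chrom}\t{start_1based - 1}\t{end_1based}\t{name}"


# Hypothetical features; these are not real annotations.
features = [
    ("chr20", 1000, 1200, "region_A"),
    ("chr20", 5000, 5050, "region_B"),
]
bed_text = "\n".join(bed_line(*f) for f in features) + "\n"
print(bed_text, end="")
```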
I can run BLAST from within the browser and view the alignments there, and there is a
little tool, the alignment inspector, that lets me navigate across my BLAST matches using
either the query coordinate system or the genome coordinate system.
I can look at assembly region details.
There are places where we have alternate assemblies for the human genome because we can't represent
the diversity of the human genome in one particular assembly so we have ways of browsing those.
Finally there are some NCBI data sets that are mapped to the assembly and you can load
those through the add tracks function.
You can add all the tracks that are available in the sequence viewer.
Let's do some live examples.
I will focus on two chromosomes and a couple of different things on one and an area on
another.
First, we will look at the area surrounding the Adenosine Deaminase gene on chromosome
20, and we will add recommended tracksets that NCBI has and configure the tracks that are
there.
We will upload custom data and show you how to print and share views from that and from
that particular gene we will run Primer-BLAST to get primers to amplify an Exon, then we
will BLAST some sequences from different species and load those into the browser to look at
conservation.
And on chromosome 2 if we get a chance we will take a look at region 116 which has alternate
assemblies and we will look at the region surrounding the Lactase gene. In the Lactase
gene we will look at gene expression tracks as well as ones I made myself from the tracks
available in the sequence viewer that have to do with Exon coverage.
And then we change assemblies and move from 38 back to 37 and we can upload an SRA data
set.
By the way these are all based loosely on a set of demos that Wayne did at ASHG a couple
of weeks ago.
If you want to get that particular handout, which is a nice PDF, there is a link at the
bottom of the slide that will take you there.
https://go.usa.gov/xn2R4
I will escape out of the web browser.
Here is the Genome Data Viewer interface.
You can pick a particular organism and we selected human for you.
If I want to select a different one I can do that here.
I can click to expand the nodes as I showed on the slide a few minutes ago to look for
a particular organism in here.
If I want to search the human genome here is the way to do it.
I can browse the genome or BLAST against this assembly and we can pick the assembly here
as I discussed.
I can search for lots of different markers on the genome.
I will search for Adenosine Deaminase so let's run that search.
Here we have the region surrounding Adenosine Deaminase with the default tracks available
here.
Adenosine Deaminase is a gene associated with severe human phenotypes if there are
mutations; one is severe combined immunodeficiency.
Let's say we are interested in some kind of clinical features of this particular gene.
I can load the clinical features track by going up here to the tracks menu and I can
go to the NCBI recommended tracksets here.
I will load the clinical tracks.
We now have the gene map on top for Adenosine Deaminase and I loaded a number of other
tracks.
This is an interesting one we will focus on which is the RefSeqGene alignment to the assembly
and it's interesting because these are particular reagents or records designed specifically
to give you a stable coordinate system with canonical sequences aligned to them and even
has the Exons spelled out on this transcript.
So we will use that as a way of mapping out where we are on the Exons of this gene.
Down here are all the variants from dbSNP.
Under that are variations from the ClinVar, our clinical database that talks about the
relationships between variations and human phenotypes.
And then there are other types of variation tracks that are large-scale variants, insertions
and duplications and things like that.
There are other tracks that have nothing in this particular area so I will delete these
tracks.
That's one thing we can do to modify the display settings.
Now I have a slightly simplified display.
Notice the track for the RefSeqGene record has an expanded view.
I have a gene bar and the transcript and protein sequence shown here.
Transcript in blue, the protein is in red.
It might be nice to have this gene view up here so the Adenosine Deaminase is rendered
in the same way.
You can do that by going to the tracks menu and configuring the tracks or also configure
the tracks by mousing over the top of the track and using the gear icon or you can use
a shortcut which is this button that looks like the tracks and if you click that you
will get various transcripts and protein sequences shown.
One thing to notice is the transcripts are shown with a gray background and if you go
to the help document for this track you realize that's because there are mismatches compared
to the genome.
If I mouse over one of these it will tell me that.
This transcript has one substitution compared to this genomic sequence.
That's because the transcript sequences here are based on mRNA sequences in GenBank, the
genome may have a different variant in it.
Notice there is a red line on the gene record that shows that it has the mismatch, so it
matches that mRNA sequence.
We can do something interesting and zoom in.
We will zoom in to Exon 2 and I will zoom to sequence.
So we are zoomed in to the region showing that particular change between the RefSeqGene
record and the genome.
This mismatch is at the position of a C/T variant and you see the variant is listed
here.
It is a common variant and you can add a track to show that.
One thing I noticed about this gene that makes it a little bit confusing to look at is, notice
this basically is written backwards because it's on the negative strand of the genome.
I could flip it around if I want to.
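The reason a minus-strand gene looks backwards is that its sequence is the reverse complement of what the plus strand shows. The sketch below illustrates that general relationship on a toy sequence. It is an illustration of the idea, not of how the viewer implements its flip-strand option.

```python
# Translation table mapping each base to its complement, in both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read 5' to 3' on the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


# A toy plus-strand window; a gene on the negative strand would read as the
# reverse complement of this.
plus_strand = "ATGCCGTA"
print(reverse_complement(plus_strand))
```

Applying the function twice returns the original sequence, which is a quick sanity check when flipping coordinates or sequences by hand.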
Let's go to the tracks menu and add another track.
We go to configure tracks.
Now I have the opportunity to choose from a range of tracks.
Let's go to variation tracks.
Some of these are already shown and are checked.
One that is not shown on the clinical track are the common variants and you see the ones
with global minor allele frequencies of 1% or more so those would be what we would call
polymorphisms.
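The 1% cutoff mentioned here can be written down as a small filter. This is a sketch under stated assumptions: the allele counts are invented, and the minor allele frequency is taken to be the frequency of the second most frequent allele, which is one common convention and may not match how every track computes it.

```python
COMMON_MAF_THRESHOLD = 0.01  # the 1% cutoff described in the talk


def minor_allele_frequency(allele_counts: dict) -> float:
    """Frequency of the second most frequent allele (assumed MAF convention)."""
    total = sum(allele_counts.values())
    if total == 0 or len(allele_counts) < 2:
        raise ValueError("need counts for at least two alleles")
    freqs = sorted((count / total for count in allele_counts.values()),
                   reverse=True)
    return freqs[1]


def is_common_variant(allele_counts: dict) -> bool:
    """True when the minor allele frequency is 1% or more."""
    return minor_allele_frequency(allele_counts) >= COMMON_MAF_THRESHOLD


# Invented counts for two sites.
print(is_common_variant({"C": 900, "T": 100}))  # minor allele at 10%
print(is_common_variant({"C": 999, "T": 1}))    # minor allele at 0.1%
```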
You can see indeed that the genome of the person that was used
to assemble this part of the genome has a variant there.
It is different, and if you look at the minor allele frequency this is actually the minor
allele at this position.
It is a synonymous substitution that doesn't change the valine codon to a different amino
acid.
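A synonymous substitution can be checked mechanically by translating the codon before and after the change. The sketch below builds the standard genetic code and compares two codons. The codons used are hypothetical valine examples chosen for illustration, since the actual codon at this position is not given in the talk.

```python
from itertools import product

# Standard genetic code, listed with bases in T, C, A, G order.
_BASES = "TCAG"
_AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """True when both codons translate to the same amino acid."""
    return CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]


# Hypothetical codons: a third-position change within valine (GTN),
# and a first-position change that turns valine into methionine.
print(is_synonymous("GTG", "GTA"))
print(is_synonymous("GTG", "ATG"))
```

For a gene on the negative strand, the codons should be read from the transcript, so a C/T change on the genome appears as G/A in the coding sequence.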
There are other variants here and these color codes here and you can look at that by getting
the help document for this particular track.
Purple means they are pathogenic variants and if you go to ClinVar you realize this
particular one is associated with severe combined immunodeficiency.
Let's go ahead and now we will do something a little bit different and let's add some
data ourselves.
We will go to the Your Data widget on the left-hand side.
I am going to click this button to add files.
This is my file.
This is from the HapMap individual NA12878 and it is a list of HGVS renderings of SNPs
in this particular region for this gene, and I got this from the Genome in a Bottle data
available on the NCBI website.
Those variants are showing up for me now in this track at the bottom and I could have
named the track something more informative than the name of the file.
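For readers who have not seen HGVS expressions, the sketch below parses the simplest genomic form, a single-base substitution such as `NC_000020.11:g.1000C>T`. The position in that example is made up, and the pattern covers only this one form. Real HGVS files can contain many other variant types that this sketch deliberately ignores.

```python
import re
from typing import NamedTuple, Optional


class Substitution(NamedTuple):
    accession: str
    position: int
    ref: str
    alt: str


# Only simple genomic substitutions (accession:g.POSREF>ALT) are handled.
_HGVS_SUBSTITUTION = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+\.\d+):g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_hgvs_substitution(expression: str) -> Optional[Substitution]:
    """Parse one HGVS genomic substitution, or return None if it is another form."""
    match = _HGVS_SUBSTITUTION.match(expression.strip())
    if match is None:
        return None
    return Substitution(match["acc"], int(match["pos"]), match["ref"], match["alt"])


# Hypothetical expression; the position is invented for illustration.
print(parse_hgvs_substitution("NC_000020.11:g.1000C>T"))
```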
You see that person which is from the Utah population has that SNP as well.
You can navigate along these if you want to to see where they go, where they are.
Notice some of them are present in dbSNP and this one in this particular position of the
genome is not present in dbSNP, so if it's not an error, then it's a SNP that's not present
in the database.
I can use my exon navigator to go back to the exon 2 we were looking at. Suppose I
like this and am interested in saving it and showing it to someone else. How can I do that?
There are two things you can do.
One is go to the tools menu and click on the link that says print friendly PDF and you
can create a PDF file right here.
And I can view that.
That is a nice graphic I can use to make a figure if I want.
The other option you have is to go up here and click on the link that says share this
page and that gives me a URL I can share with people and that's good for about 90 days.
Let's go on to use some analysis tools within the browser and let's move to a different
Exon.
I will close that.
I will go over here to exon 4 and I will center this a little better.
It is an exon that has some disease gene SNPs in it and you can see the pathogenic
SNPs listed here.
If I want to amplify this, to get primers that will amplify this exon, I can easily
do that.
I go to tools and go to BLAST and primer search and I will primer BLAST the visible range.
It opens up Primer-BLAST, and a reasonable thing to do if I want is to adjust these ranges
so my primers bind outside the exon.
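The idea of placing primers outside the exon amounts to choosing a forward-primer window upstream of the exon and a reverse-primer window downstream of it. The sketch below computes such windows from exon coordinates. The function name, the default flank and gap sizes, and the coordinates are all assumptions for illustration, not values taken from the Primer-BLAST form.

```python
def flanking_primer_ranges(exon_start: int, exon_end: int,
                           flank: int = 200, gap: int = 20):
    """Return ((fwd_from, fwd_to), (rev_from, rev_to)) in 1-based coordinates.

    The forward window ends `gap` bases before the exon and the reverse window
    starts `gap` bases after it, so the amplified product spans the whole exon.
    """
    if exon_start < 1 or exon_end < exon_start:
        raise ValueError("exon must satisfy 1 <= start <= end")
    forward_from = max(1, exon_start - gap - flank)
    forward_to = exon_start - gap - 1
    if forward_to < forward_from:
        raise ValueError("not enough sequence upstream of the exon")
    reverse_from = exon_end + gap + 1
    reverse_to = exon_end + gap + flank
    return (forward_from, forward_to), (reverse_from, reverse_to)


# Hypothetical exon at positions 1000 to 1200 of the visible range.
forward, reverse = flanking_primer_ranges(1000, 1200)
print("forward primer window:", forward)
print("reverse primer window:", reverse)
```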
And I can pick the database, in this case I want to amplify this from genomic DNA and
it is automatically set for me to Homo sapiens.
If I do that, and just to save time I will show you what the results look like,
we have a set of primers here.
I can load those.
After I run this they will be available to me in the graphical sequence viewer.
If I go to the tracks menu and go to configure tracks, you notice there is a section called
Primer-BLAST.
I can load those and I have several of them here because I've run this a bunch of times.
Let me zoom out a little bit so you can see those.
There they are on the genome so I have them available and I can see where they would bind
and what sequence they would amplify.
Let me go ahead and get rid of those.
Suppose I want to do some BLAST searching against this genome.
I can go to a different function or widget and that is the BLAST widget.
I could go to tools and run a new BLAST search.
I have the ability here to do that and I have already set this up with parameters that I
want.
I could run the mouse and the cow Adenosine Deaminase against the genome and I won't
do that but I will go ahead and open this so you can see what the result looks like.
This file is available on the FTP site if you want that.
So here are my results and this is what the mouse alignment looks like to the genome and
this is what the cow alignment looks like to the genome.
The cow is a little better aligned than the mouse is.
That BLAST RID is available over here and I can go ahead and add it or if it was a new
one, or one I had not run in this way, I could show it by adding the RID.
This device is a way of navigating across the different hits.
There is also a separate larger alignment inspector here.
Notice I have two tracks here.
One is the BLAST results themselves and the other is something called the cleaned alignment
and those are a little bit more useful because you see how the hits from the same sequence
are connected to each other.
I can close that one if I don't want to use it.
You can use this to navigate across matches based on the messenger RNA sequence or on
the genomic sequence here.
You see the mouse sequence is the one with a longer accession number and the cow has
a shorter accession number and typically the cow is a little bit better match but not in
all the exons.
Let's go ahead and do one more quick thing on chromosome 20.
I don't want to do that yet.
I want to leave chromosome 20 and go to chromosome 2 and we will do a couple of quick things
to show expression and take a look at the last widget down here on the bottom which
is the assembly region details.
I can jump to chromosome two by going over here and clicking it from the ideogram view
and now I am on chromosome 2 and I can get rid of the alignment inspector.
I am zoomed way out and I don't want to search for adenosine deaminase.
I want to look at a region where there is a mark underneath the ideogram which tells
you there are alternate assemblies available for that particular region.
There also may be some assembly difficulties in the region and you can look quickly at
this one which is region 116, near the centromere.
It is a region that could be difficult to assemble.
I am there and I need to add a map that's going to give me some information about this
particular area so what I want to do is go to tracks and that track is one of the recommended
tracks which will be Assemblies.
You can see there are a number of curation issues in this region.
There are some clones that don't match the way we would like.
These are called discordant clones, these are clone-end sequences.
Here is the map I wanted to focus on and these are alternate assemblies for this region that
represent alternative versions of this part of the genome.
If you scroll down a little bit on this genome you see there is a big gap close by.
I can take one of these and in assembly region details we see marks which represent places
where this particular sequence, this alternative locus for example, this alternative assembly,
has a relative insertion compared to the genome, and I can switch this around and change it
to be the one that is sort of the master in this view.
You see when it reloads that the genome has to be gapped to support that.
Let's do the last thing we came to do today which is to look at a different place on chromosome
2 which is the region surrounding the lactase gene.
Here is Lactase and I still have the assembly support tracks up.
What I will do here is change that and let's say we are interested in expression so there
is an expression trackset available here.
These are a bunch of next-generation RNA-seq runs aligned to the genome and are used in
the process of annotating things so we can make splice variants and things like that.
And also give you a way of looking at expression because these are different tissues or organs.
You can count the number that aligned in various ways.
Let me return this to the expression track.
Here is sort of the aggregate of this.
The more interesting thing is to look down at these which are the individual libraries
and you see this gene is expressed in duodenum, and that library is labeled small intestine.
We would expect Lactase as a digestive enzyme predominantly to be expressed in infants and
in some cases expressed in adults and we will come back to that idea in a moment.
Let me zoom out a little bit and this is a nice example because you see that Lactase
has specific expression in that particular organ whereas the genes on either side are
more ubiquitously expressed.
These are the intron-spanning reads and there are other ways of thinking about this.
You can basically look at Exon coverage and where do these RNA-seq reads align to the
genome and one way to display that is through the tracks here and I made a set of tracks
by going to the configure menu and loading them into the browser and basically saving
them into my NCBI account.
You can definitely save them if you want to.
We have already done that for you.
Here is one that I saved.
Certainly you get similar results and these are no longer spanning the introns but highlighting
for you the Exons.
And we also get expression again in the duodenum library.
The last trick I want to do for you today is let's change the assembly and load a track
that we have already that is aligned to this.
I will cheat.
There are lots of ways to change the assembly like go back to the genome data viewer home
page and switch it there but I think we will add the ability to do that to the Genome Data
Viewer page directly in a little bit but not there yet.
I will change the assembly version just by cheating and typing the .25 in there.
You see now that I'm on GRCh37 patch 13, zoomed out on chromosome 1.
To put us in a place we are familiar with, let's find the Lactase gene.
Now I want to upload a track and this is a track available here as one of the supported
accession types.
This is an SRA run for the library RP11, which comprises 70% of the genome sequence.
Remember only one allele is included in the reference assembly so it will be interesting
to see what other allele this person has.
I will load that.
It is the same kind of thing we saw before, where the red places are the mismatches, so
there are places where this person is heterozygous for positions in the lactase gene.
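What it means to read heterozygosity off aligned reads can be sketched as a simple count-based rule: if two alleles are each supported by a reasonable share of the reads at a position, the site looks heterozygous. The thresholds and read counts below are invented for illustration, and this naive rule is not how the viewer or any production variant caller decides genotypes.

```python
def call_genotype(allele_counts: dict, min_depth: int = 8,
                  min_fraction: float = 0.2):
    """Return a (label, alleles) pair from read counts at one position.

    An allele counts as supported when it is seen in at least `min_fraction`
    of the reads. Both thresholds are arbitrary choices for this sketch.
    """
    depth = sum(allele_counts.values())
    if depth < min_depth:
        return ("no_call", ())
    supported = tuple(sorted(
        allele for allele, count in allele_counts.items()
        if count / depth >= min_fraction
    ))
    if len(supported) == 1:
        return ("homozygous", supported)
    if len(supported) == 2:
        return ("heterozygous", supported)
    return ("ambiguous", supported)


# Invented read counts at three positions.
print(call_genotype({"G": 12, "A": 10}))
print(call_genotype({"G": 21, "A": 1}))
print(call_genotype({"G": 3}))
```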
We will do one last search and look for the SNP that's responsible for, in some populations,
the persistence of lactase activity in adulthood, what provides people the ability to drink
milk without gastrointestinal distress.
I will go ahead and load that SNP to look at the genotype at that position.
Here is the SNP I am talking about.
I will search instead of the gene which we have been doing all day, I will search with
this particular SNP.
It lands me right here and I am in a different gene, MCM6.
All these SNPs are landing in the promoter of the lactase gene, so it's upstream relatively
speaking.
You see this person is a heterozygote for this change and has the A allele at this position
that allows lactase to be expressed in adulthood.
The G, which is the most common allele worldwide, is the one in the genome assembly, the one
that is incorporated into this part of chromosome 2.
And of course you can zoom out and see that this is near lactase. If I change the region and
go to Gene with pad to zoom out a little bit faster, and I slide it over a little bit,
there is lactase over here and upstream of it is MCM6.
That's a lot I did today and maybe a little bit too much and we ran a little over the
time but let's go ahead and pause and answer any questions that we can and if we don't
have any questions we will take a couple of minutes to let people think if they have any.
I will share the slide with you at the end and there are a number of useful links on
the slide for the Genome Data Viewer and also a help document as well as a YouTube video
that Wayne made that is an overview of the Genome Data Viewer.
There is a fact sheet available on the FTP site.
In general please check out our YouTube channel and we make videos of all of our webinars
and put instructional videos up there.
There is a Learn page at NCBI which you can get to from the homepage. That URL will take
you to the fact sheets on just about any NCBI resource you can think of.
And if you have questions check out our support center. There are a lot of answers to questions
there, and you can also write to the help desk from that page.
Wayne doesn't see any questions so if you have questions about the webinars in general
you can write to us at the address given, webinars@ncbi.nlm.nih.gov.
If you have general NCBI questions you can use the support center link on the previous
slide or write to info@ncbi.nlm.nih.gov and of course you can always write to me,
peter.cooper@nih.gov.
Thank you for coming and I will go ahead and end the webinar.