Hello everybody, this is Peter Cooper from the NCBI.
With me today is Beverly Underwood from the GenBank submission staff.
Today we're going to talk about 16S GenBank submissions.
The slides for today should be available to you in the handout section on the right-hand
side of the screen; you can download them there if you want to.
They are also available at the FTP site in the Materials Q&A directory; that compressed
URL will take you there.
Now I will turn it over to Beverly.
Thank you, Peter, and welcome.
Today we will show you how to submit your 16S ribosomal RNA data to NCBI.
You will learn the fastest route to submitting and receiving Accession numbers for 16S ribosomal
RNA sequences.
We will show you where to access this faster submission tool.
We will demonstrate a submission, help you understand the submission process,
and provide you with a pre-submission checklist
to help your submission go more smoothly.
We will start right away with the demonstration.
This is the link to the submission tool.
(https://submit.ncbi.nlm.nih.gov/subs/genbank) You need to use your NCBI login or register
for a free NCBI account if you do not yet have one.
OK, we are now going to the Submission Portal using the link in the slides.
First, I need to check that I am logged in.
I am not logged in, so I will click the login button to enter my NCBI username and password
and click sign in.
If you do not yet have an NCBI login, you can register for one by clicking on "register
for a free NCBI account" and following the steps on the page.
So we are now logged in.
This is the GenBank Submission Portal.
You can click on "Short Description and brief instructions" to read brief descriptions
of what you can submit in this tool and the requirements for these submission types.
Below that are submissions you may have already made.
To start a new submission, click the new submission button.
We have now started a new submission.
At the top of the page is your SUB number.
This SUB number is not an accession number.
You should only use this number for tracking and corresponding with NCBI until accessions
are assigned.
This is the submitter page; it has your contact information.
The address on this page will be displayed in your final GenBank record.
Throughout this talk, I may refer to a GenBank record as a flatfile.
For this webinar, these mean the same thing.
If you are a new user, you will need to fill out the form, making sure you have filled
in all the fields with an asterisk.
If you are a returning user, this form should be filled out already with the information
you previously provided.
Note that if you are not using a United States address, the state/province field is optional
and you can leave this field blank.
Review the information and make changes, as needed.
Again, this address will be displayed in the final GenBank record.
I have looked over everything on this page and since my information is correct, I will
click continue.
For this demonstration, we are submitting prokaryotic 16S ribosomal RNA sequences, so
I will select small subunit rRNA only (16S rRNA).
But you can see the other submission types that you can submit through this tool listed
here.
We may have webinars on those in the future.
Below the submission types is a free text field for a submission title.
This field is optional.
You may use the submission title to help you track your submission.
The submission title is not displayed in the final GenBank record and is for your use only.
For this demonstration, I will leave this field blank.
I am done with this page, so I will click continue.
Now we are on the Sequencing technology page.
Select the method you used to obtain the sequences from the options listed on this page, or if
your method is not listed, select Other and type the method you used in the provided text
box.
For this demonstration, I will select Sanger.
Note that if you used one of the other methods, you will need to indicate if you assembled
the sequences and you will be prompted for the name of the assembly program and version.
Now we are on the Sequences page.
First you need to select when you want the sequences released to the public nucleotide
database.
If you want the sequences to be publicly available as soon as possible, click release immediately.
If you are still working on a publication and do not want the sequences available for
some time, select release on a specified date.
For this demonstration I will select to release on a specified date and select April 29, 2017.
Next you are asked if a chimera check program was used to check the sequences.
Chimeric sequences are artifact sequences formed by the incorrect joining of two or
more biological sequences.
Chimeric sequences are commonly formed during laboratory procedures to isolate 16S ribosomal
RNA.
There are publicly available chimera check programs designed to identify chimeric sequences.
We highly encourage you to perform a chimera check of your sequences and remove chimeric
sequences from your file prior to submitting.
If you did this analysis, you should check Yes and provide the name and version of the
program you used.
For this demonstration, I will select No.
Please note that all submitted 16S sequences will be screened for chimeras, so your submission
will be delayed if chimeric sequences are found in the file.
There is a link in the submission checklist provided to you which discusses chimera detection.
Next you are asked whether your sequences are from a pure-cultured or uncultured source.
You may be wondering what these options mean so I will describe some scenarios to help
explain these.
An example of a pure-cultured strain would be if you isolated strains on an agar plate and then isolated
the 16S rRNA from the genome of an individual strain on the plate.
An example of an uncultured sample would be if you extracted DNA directly from a mixed
environmental sample, like a soil or gut sample, and then amplified and sequenced 16S from
the mixed DNA sample.
If you extracted 16S rRNA from a mixed sample, then introduced the 16S rRNA into a laboratory
E. coli strain, uncultured is still the appropriate selection.
I will select pure culture for my demonstration.
Now it is time to provide the sequences file.
But first, let us quickly look at a FASTA file of multiple sequences.
If you have multiple 16S sequences to submit, put all of the sequences into one FASTA file,
like this example.
It is very important to correctly format the FASTA file.
Each sequence in the FASTA file has a definition line, which is marked by the greater-than
sign, >, followed by the sequence ID.
In this file, I used the strain identifiers as the sequence IDs.
You may use your laboratory identifiers, such as the strain or clone ID, for the sequence
IDs, but you may also use Seq1, Seq2, etc.
The sequence IDs must be unique and must be less than 25 characters.
There are limits on what characters you can use for the sequence IDs.
The checklist made available to you with this webinar contains a link to FASTA Format help.
You may also access it by clicking on the help link on the sequences page.
Here, "Help on FASTA file."
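As a rough check before uploading, the two ID rules stated above (unique, and less than 25 characters) can be tested locally. The sketch below is only an illustration and not part of the submission tool. It assumes the sequence ID is the first token after the ">" sign, and it does not check the character restrictions, because the talk refers to the FASTA Format help for those rather than listing them.

```python
from typing import List

MAX_ID_LENGTH = 24  # the talk says IDs must be less than 25 characters


def find_id_problems(fasta_text: str) -> List[str]:
    """Return a list of readable problems found in the sequence IDs."""
    problems: List[str] = []
    seen = set()
    for line_number, line in enumerate(fasta_text.splitlines(), start=1):
        if not line.startswith(">"):
            continue  # sequence lines are not inspected in this sketch
        fields = line[1:].split()
        if not fields:
            problems.append(f"line {line_number}: definition line has no sequence ID")
            continue
        seq_id = fields[0]  # assumption: the ID is the first token after ">"
        if len(seq_id) > MAX_ID_LENGTH:
            problems.append(
                f"line {line_number}: ID '{seq_id}' is {len(seq_id)} characters long"
            )
        if seq_id in seen:
            problems.append(f"line {line_number}: ID '{seq_id}' is not unique")
        seen.add(seq_id)
    return problems


# Made-up example with one repeated ID and one ID that is too long.
example = (
    ">pme07-01\nACGT\n"
    ">pme07-01\nACGT\n"
    ">an_identifier_that_is_far_too_long\nACGT\n"
)
for problem in find_id_problems(example):
    print(problem)
```

An empty list means neither problem was found; the file can still fail validation for other reasons after upload.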
Now I will upload my FASTA file using the choose file button.
Depending on your web browser, the button may read "Browse" instead of choose file.
Notice that after I select the file, a progress bar appears to indicate the progress of
the file upload.
You will see something similar when files are validated.
Larger files take longer to both upload and validate.
The file is uploaded and I have filled out all the fields on the page, so I will click
continue.
If you miss a required field, you will get a red error with directions on what you missed.
After clicking continue, the file is validated and the sequences are checked for some common
problems like vector contamination and chimeric sequences.
If your sequences have these issues, you will get a notification on the sequences page.
After your sequences pass validation, you will be prompted for source information on
the Source Modifiers page.
This page is where you will provide your source metadata.
Let's look over this page briefly.
There are short instructions and an example source modifiers table which shows the type
of information collected here.
The required information for your type of submission is listed, but you may provide
more source information to make your submission more informative to database users.
A complete list of valid source modifiers and descriptions may be found by clicking
on More help on source modifiers on the Source Modifiers page.
You may obtain a table template to provide the source information by clicking on Download
Source Modifier Template.
You can save this file to your computer and open it for editing.
The table can be filled out in a text editor or a spreadsheet program.
If you use a spreadsheet program, make sure you save the file type as Text (tab-delimited).
Instructions for using a spreadsheet program can be found at the source modifiers link
that I pointed to earlier.
This is the source modifiers file I have already prepared.
The top row is the header row with the source modifier labels.
Below the header row are the corresponding data rows.
The table contains the required information for a cultured prokaryotic submission, which
includes sequence IDs, organism names, and unique strains.
I have also provided additional isolation-source and country information.
The organism is the scientific name of the organism which provided the sequenced genetic
material.
It can be the genus species name, but does not have to be that specific if you do not
know it.
The strain is the alpha-numeric sample code you use in your laboratory for this strain.
In this file, pme07-01, and these other codes, are strain codes.
The isolation-source describes the local environment where the sample was obtained.
The country is the location where this organism or sequenced sample was obtained.
Additional information about the location within the country may be included after the
country name and a colon.
A link to these source modifier descriptions is in the checklist provided to you with this
webinar.
All of the values in this table are separated by a tab, so this is a tab-delimited table.
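For anyone who prefers to generate the table with a script instead of a spreadsheet, a tab-delimited file like the one just described can be written with Python's csv module. This is a minimal sketch, not NCBI's tooling. The column labels are assumptions for illustration, so use the labels from the template you download from the Source Modifiers page. The organism names, the second strain code, and the locations are made-up example values.

```python
import csv
import io
from typing import Dict, List

# Assumed column labels; the downloaded template is the authority.
REQUIRED_COLUMNS = ["Sequence_ID", "Organism", "Strain"]
OPTIONAL_COLUMNS = ["Isolation_source", "Country"]


def build_source_table(rows: List[Dict[str, str]]) -> str:
    """Return the rows as tab-delimited text after two basic checks."""
    for row in rows:
        for column in REQUIRED_COLUMNS:
            if not row.get(column):
                raise ValueError(f"missing required value for {column}: {row}")
    strains = [row["Strain"] for row in rows]
    if len(set(strains)) != len(strains):
        raise ValueError("strain values must be unique")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=REQUIRED_COLUMNS + OPTIONAL_COLUMNS,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()  # header row with the source modifier labels
    writer.writerows(rows)  # one data row per sequence
    return buffer.getvalue()


rows = [
    {"Sequence_ID": "pme07-01", "Organism": "Bacillus subtilis", "Strain": "pme07-01",
     "Isolation_source": "soil", "Country": "USA: Maryland"},
    {"Sequence_ID": "pme07-02", "Organism": "Bacillus sp.", "Strain": "pme07-02",
     "Isolation_source": "soil", "Country": "USA: Maryland"},
]
print(build_source_table(rows))
```

The Country values follow the form described above: the country name, a colon, then the more specific location. Writing the returned text to a .txt file gives a table that can be uploaded in the next step.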
I will now upload this table using the choose file button and click continue to validate.
My source modifiers file contained a row with a new organism name.
Because it is not yet a recognized name, I am receiving a warning that NCBI does not
recognize the name.
If you receive this warning, you should check the spelling for typos.
If there are no typos in the names and you are submitting a sequence from a new organism or an organism
not yet in the NCBI taxonomy, click continue.
If there is a typo, follow the directions in the warning, correct the names in your
file and upload a new file for validation.
Uploading a new file will overwrite the previous file.
The name in my demonstration is correctly spelled so I will click continue.
We are now on the references page.
Everything you include on this page will appear in the final GenBank record or flatfile.
At the top are the sequence authors.
The sequence authors are the names of people who helped with generating the sequences.
Names need to be entered with the first name or given name first followed by the last name
or family name.
So I will type my name, Beverly Underwood and my colleague's name Peter Cooper.
Notice as I was typing a flatfile preview appeared on the right.
This preview is for you to review; it shows how the names will appear in the
flatfile or final GenBank record.
If a name in the preview does not look correct or how you want it to appear, you need to
correct it on the left.
You should pay attention to the preview and correct the names as needed.
Now provide the publication where you have discussed or will discuss the data you are submitting.
If you do not intend to publish a paper for these sequences, but you want to make the
sequence data public, you may select unpublished and leave the remaining fields blank.
If you will publish or have published, select as appropriate, fill in the information and
click continue.
Again, all of the information you provide on this page will appear in the flatfile.
Ok we are almost done!
This is the overview page, the final page, which displays all of the information you
provided to us in the previous pages.
The release date you selected is indicated near the top of the summary.
The contact address for use in your records is next, followed by the sequence authors,
publication information, and sequencing technology.
The unprocessed files you uploaded are provided here along with a feature table file that
was made from the files you provided.
A summary of what we have received, as well as a list of any processing that was done
to your FASTA file, such as trimming vector or removing sequences, is provided in reports
here.
If nothing was done, the report will state the number of sequences received and the sequence
length range.
Processed FASTA and source files that will be used to make your GenBank records are listed
last.
If any sequences were removed or trimmed because of vector or chimera, these files will contain
those edits.
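To compare the report against your own file, the same two figures it states (the number of sequences and the sequence length range) can be computed locally. This is a hypothetical helper written for illustration, not the code NCBI uses to produce the report; if sequences were trimmed or removed during processing, the numbers will differ from those for your original file.

```python
from typing import List, Tuple


def summarize_fasta(fasta_text: str) -> Tuple[int, int, int]:
    """Return (number of sequences, shortest length, longest length)."""
    lengths: List[int] = []
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if line.startswith(">"):
            lengths.append(0)  # a definition line starts a new sequence
        elif line and lengths:
            lengths[-1] += len(line)  # sequences may span several lines
    if not lengths:
        raise ValueError("no sequences found")
    return len(lengths), min(lengths), max(lengths)


# Made-up example: two short sequences, the first split over two lines.
example = ">pme07-01\nACGTACGT\nACGT\n>pme07-02\nACGTAC\n"
count, shortest, longest = summarize_fasta(example)
print(f"{count} sequences, lengths {shortest} to {longest}")
```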
Review this page and, if you have corrections, go back using the tabs at the top to correct
the data.
After you have reviewed the information, click Submit.
Congratulations we have now submitted data to NCBI!
I'm going to go back to my slides now.
So what happens after we click Submit?
After you click submit on the overview page, your submission will go into automated processing
where it is further evaluated for errors such as those listed on this slide.
During this time, the status in Submission Portal will either be "Queued" or "Processing".
I'll pause for a few seconds to give you time to read this slide.
If there are no errors, the submission will quickly receive Accession numbers and you
will receive an email with a link to portal where you can download the final processed
flatfiles.
The submission status is "Processed" and there is a green check mark.
If there are errors you need to correct, you will receive an email shortly after submission,
like this one, with a link to the submission.
Click on the link in the email to be directed to your submission.
In portal the status is "Error" and a report describing the problems is posted.
Click on the error report to read what the problems are.
You may click on the headers in the error report for more information about the error.
So these headers are clickable.
Once you have the corrected FASTA file or know how to correct the problem, click the
Fix button to correct the problems in the submission, and you can submit to try again.
We have prepared for you a pre-submission checklist of information you need to collect
ahead of time to help your submission go smoothly.
This checklist pdf is available as a handout in the webinar materials area.
The checklist also contains an example flatfile with the associated submission dialog pages
to show how the information you provide relates to the forms in the submission dialogs and
to the flatfile.
Ribosomal RNA submissions received through this tool will be processed faster than other
routes.
Larger submissions will take longer, but currently the median response time to Accession numbers
with final processed records or Error Reporting is less than a day, and may be as soon as 10
minutes after clicking submit.
So we encourage you to use this tool to get your data out faster.
Thank you, I'll turn this over to Peter now.
Thank you Beverly.
We have a couple of questions I think are worth addressing for everybody.
The first one has to do with the source organism name.
What if the organism name is unknown, what do you include in the source modifier table?
I think you can address it in two ways.
One, to include the most precise taxonomic group you know, that is probably the right
answer.
But if it is completely unknown, what would you recommend they do in that case?
If it is completely unknown.
Use unknown organism.
We would get that and we may ask them more questions.
If it is from an uncultured sample, and you know it is bacterial or archaeal, then you
can call it uncultured bacterium. If it is cultured, and you really don't know, you can
just call it bacterium. I hope that helps.
Okay, another question that is similar has to do with this concept of operational taxonomic
units.
Can they submit an OTU file along with mapping files, telling you what sequence goes with
what OTU?
I think this is one we want to spend a little time and give this person an answer from our
taxonomists, so we will consult with them and get back to you.
Okay, another question is, which software do you recommend for looking for chimeras?
You mentioned one, I think.
There are a number of them available.
In Submission Portal now, we are using UCHIME to check for chimeras.
There is also DECIPHER.
There are some publicly available programs if you search the literature.
One last question, this has to do with updating your sequences.
One question was about... the specific example was, you submitted unpublished, then you wanted
to add the publication later to the submission, what do you need to do?
Okay, so we have an update page.
Okay, great.
So there is a link which describes how to format your updates, so follow the instructions
on the page, it includes information on how to change your publication status from unpublished
to published.
As well as updating your sequences, source modifiers, feature information, all sorts
of information.
And you would email that to gb-admin@ncbi.nlm.nih.gov.
Just make sure it is formatted as described on the update page, and include your accession
numbers with your request.
Okay, that's all the questions that we have time for today.
If there are any others in the questions pod that we didn't get to, we'll write out the
answers and make them available in a document in the Materials directory that I mentioned.
We will also send you a link to a SurveyMonkey survey.
We'd like to get some feedback both about the webinar and about the submission tool.
Thanks very much and thank you for coming today.
>> [Event Concluded]