Hi, my name's Jennifer Mclean and I'm a part of the Research Data team located in
the Fisher Library at the University of Sydney.
This is a presentation about research data management for University of Sydney staff
and students.
In this session, I will cover how you can find our website, what research data management
is, why research data management is important and how you can implement good research data
management practices.
At the end, I will talk about some of the tools available to University of Sydney staff
and students that are helpful for managing research data.
To begin, I'll show you how to access our website.
Start on the University of Sydney website.
Click on Library at the top of the page, then hover over 'research', then select 'research
data management'.
Here you'll find the University's research data management guidelines.
They contain a lot of the information I will talk about today, including links to the research
data management planning tool, various storage options and all of the training sessions that
we hold throughout the year for University of Sydney staff and HDR students.
You can also Google 'research data management Sydney Uni' and it will lead you to this
page.
Before we get into the practices of research data management, it's worth just quickly
clarifying a few things.
Firstly, what is research data management?
Research data management is the practice of making decisions about your digital and physical
research data, and then putting those decisions into action.
The types of processes that are involved in research data management include data collection,
the organisation of your data, storing your data, preserving your data and publishing
your data.
We will go through a lot of these today to give you more of an idea of what is considered
'best practice' for research data management.
Understanding what your research data is is a really important first step in research
data management.
Research data is essentially anything that has helped you to form the basis of your research
output.
You can see lots of different examples of research data on the screen, which shows that
'research data' is not just confined to numbers in a spreadsheet.
Research data can be observations, survey answers, scripts, photographs, notebooks,
artefacts and diaries, just to name a few.
If you aren't sure what your research data is, come and have a chat to the Research Data
team in the Library.
So why is research data management so important?
There are policy reasons for this and there are also benefits for researchers who follow
good research data management practices.
For policy, there is the Australian Code for the Responsible Conduct of Research.
The Code details what the expectations are for researchers in regard to their research
data, such as the expectation that data will be retained for the appropriate amount of
time.
The University has a Research Data Management Policy that details what researchers and students
at the University should be doing to manage their research data.
For example it's compulsory for all research projects at the University to have a research
data management plan, which I will discuss further soon.
Funder policy is another reason why research data management is becoming so important.
In Australia, the Australian Research Council requires that anyone applying for funding
outlines how they are going to manage their data.
They also strongly encourage that data collected or created under the grant is made available
for others to view and re-use.
The National Health and Medical Research Council also strongly encourages the sharing of data
created or collected under any of their grants.
Funders in both the US and UK are a lot more strict in their research data policies, so
we should be prepared in case that happens in Australia soon too.
Some publishers now also have policies in regard to research data management, mainly
that research data must be made available when an article is accepted for publication
in the journal.
Some journals that have data policies include Nature, Science and PLOSone.
As I mentioned before, research data management also has benefits for a researcher.
When you implement good research data management practices, your data should be well organised
so that you can actually locate the data on your computer.
Your data should be stored safely and securely and backed up, so you won't unexpectedly
lose your data.
Your data will remain accessible, so if you need to access it later on down the track,
you can.
Your data should be in a publishable state, so if you want to, or if you have to, you
can publish your data successfully.
Finally, because research data management facilitates the sharing and re-use of research
data, it can mean that there is less duplication of work that has already been done.
Re-using data that has already been created can mean better use of your project's resources.
Now we have an understanding of research data management and why it's important, we can
start to look at how you can approach research data management.
This is our research data lifecycle which shows some of the main stages for a project's
research data, and the aspects of research data management that are important to focus
on in each stage.
I will talk about each of these stages today and what should be addressed and actioned
in each stage.
The first stage is 'Plan and Fund'. These are generally things that should be addressed
at the very beginning of your research project.
First up are research data management plans, or RDMPs.
An RDMP is compulsory for all research projects at the University, including HDR projects.
RDMPs must be submitted via the University's research data management planning tool, which
you can find on our website.
If you haven't done an RDMP before, I recommend going through the research data management
planning checklist before you submit a formal RDMP via the tool.
The checklist can also be found on our website.
It will ask you questions about your research data and point you in the direction of useful
resources to help you along.
Once you have completed the checklist, you should be ready to complete and submit your
formal RDMP.
A good RDMP is one that grows and evolves with your project so you should update your
RDMP as your project changes.
At the end of your project, your RDMP should be an accurate representation of your research
data and the research data management practices you've followed throughout the project.
There are several things that should be addressed in a research data management plan, a lot
of which we will cover today.
These are things like where will you store your data?
How do you plan on naming the files and folders for your digital data?
How long will you have to keep your data for?
What metadata will you keep for your data?
Just a tip, metadata means data about your data.
For example, it might be the title of a dataset, a description of the dataset and the name
of the dataset creator.
At the start of your project you should also clarify and have an understanding of ownership
in regard to your research data.
If you're uncertain about ownership at the University, read the Intellectual Property
Policy from 2016.
Clarifying ownership of research data does several things.
It helps you understand who has permission to publish using the data, who is allowed
to take the data away from the University when they leave and who owns the intellectual
property rights of your research data because even though you collected the data, it doesn't
necessarily mean you own the data.
Now let's have a look at the research data management practices to consider while you're
collecting, creating and analysing your data.
If you're dealing with human data, you'll probably have to create consent forms.
Assistance with this is provided by the Ethics Office, however there are a few things to
think about in regard to research data management.
For example, on a consent form you should include how the research data will be stored,
how the data will be used and if the data will be shared with anyone, including data
publication.
In relation to this, you might also highlight how the identity of the participants will
be kept anonymous during and after the study.
For example, if you plan on publishing the data you might be de-identifying the data
to protect study participants.
Data can be messy.
To remove inaccurate or corrupt records from your dataset, data cleaning is essential.
For example in one single dataset you might have several different entries for the same
thing.
You could have the United States of America for one entry, USA for another and then just
US for another.
This can make analysing the dataset harder and also means that you might end up with
incorrect results at the end of the analysis process.
You need to make sure your data is in good order after you've collected it.
One way of doing this is using a free tool called OpenRefine to clean your data.
You can get in contact with our team if you want further information about data cleaning.
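To make the country example concrete, here is a minimal sketch in Python of that kind of clean-up. It is only an illustration, not how OpenRefine works internally: the mapping table, the function name and the sample records are all made up for this example, and a real dataset would need its own list of variants.

```python
# Hypothetical lookup table: every known variant maps to one agreed label.
COUNTRY_VARIANTS = {
    "united states of america": "United States of America",
    "usa": "United States of America",
    "us": "United States of America",
}


def normalise_country(value: str) -> str:
    """Return the agreed label for a known variant, otherwise the trimmed input."""
    # Ignore surrounding spaces, letter case and full stops when matching.
    key = value.strip().lower().replace(".", "")
    return COUNTRY_VARIANTS.get(key, value.strip())


# Made-up records showing the problem described above.
records = ["United States of America", "USA", " US ", "Australia"]
cleaned = [normalise_country(record) for record in records]
```

After this step the three spellings collapse into a single value, so counts and groupings in the analysis treat them as the same country. Values that are not in the table are left as they are, which makes it easy to spot anything you still need to decide on.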
File naming is so important in ensuring that you can easily locate your files while you
are using them, or if you need to refer to them in the future.
Yet it's so easy to do a bad job of it and end up in a mess, like what you can see on
the screen right now.
So a few basic tips for file naming.
Firstly, be consistent.
This is the best thing you can do to ensure that you'll always have a general idea of
what the file is called.
If you're working in a group, decide what keywords you will use to avoid confusion.
For example, if doing a survey you want to make sure everyone is calling it a survey
rather than some calling it a survey and others calling it a questionnaire.
Use a hyphen or an underscore rather than spaces between words as some analysis software
won't accept the file unless it has no spaces in the file name.
Even if you don't think you'll use analysis software, hyphens and underscores are good
practice, and you never know what software you might use later on.
It's also a good idea to work out how to best sort your files so that you can find
them quickly.
For example, if it's easiest to sort by date then put the date first in the file name.
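If you like to script things, the tips above could be sketched in Python roughly as follows. The function name, the year-month-day date format and the example values are assumptions made for illustration; the point is simply a consistent pattern, no spaces, and the date first so that files sort in order.

```python
import re
from datetime import date


def make_file_name(collected: date, keyword: str, description: str, ext: str) -> str:
    """Build a file name with the date first, hyphens within parts and underscores between them."""
    # A year-month-day date sorts chronologically when files are listed by name.
    parts = [collected.strftime("%Y-%m-%d"), keyword, description]
    # Replace spaces and any other non-alphanumeric runs with a single hyphen.
    tidy = [re.sub(r"[^A-Za-z0-9]+", "-", part.strip()).strip("-").lower() for part in parts]
    return "_".join(tidy) + "." + ext.lstrip(".")


# Made-up example: a survey file collected on 5 March 2018.
name = make_file_name(date(2018, 3, 5), "survey", "first year students", "csv")
```

With these example values the result is `2018-03-05_survey_first-year-students.csv`. Whatever pattern you choose, the main thing is that everyone in the group uses the same one.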
Versioning is important to ensure that you can revert to a previous iteration if you
need to and, in some cases, it can help you understand the evolution of the dataset.
A few tips for versioning: first, decide how many versions to keep, especially if you have really
large files.
Keep previous versions in one place to avoid duplication of work and so you can find other
versions of your data if you need to.
Use a version control table or versioning software like Git or Mercurial to make this
process easier and more transparent.
You can find the version control template you can see on the screen on our website.
I mentioned metadata very briefly at the start of the presentation.
Metadata is important because it helps you and others to understand your dataset.
Think about what information you might need in a few years or what information someone
else would need to make sense of your data, and that's the information you should be
keeping.
It can be things like location, date, creator, description and software used to collect or
analyse the data.
It's essential that your metadata is correct, so make sure you write it down as you go along
and don't try and commit it all to memory.
There are a few easy ways to keep track of your metadata.
You can put it in a plain text file, or a README file and store it alongside the data,
or if you're using an eNotebook to store your data, you can easily add the metadata
next to the relevant dataset.
I'll talk more about eNotebooks at the end of this presentation.
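As a rough sketch of the plain text option, the Python below turns a handful of metadata fields into README text. The field names follow the examples mentioned above, but the values, the function name and the layout are invented for illustration and are not a University template.

```python
from typing import Mapping


def render_readme(metadata: Mapping[str, str]) -> str:
    """Format metadata as one 'Field: value' line per entry."""
    lines = [f"{field}: {value}" for field, value in metadata.items()]
    return "\n".join(lines) + "\n"


# Placeholder values only; replace them with the details of your own dataset.
example_metadata = {
    "Title": "Example survey responses",
    "Description": "Answers to a hypothetical student survey",
    "Creator": "A. Researcher",
    "Date": "2018-03-05",
    "Location": "Sydney, Australia",
    "Software": "Name and version of the tool used to collect the data",
}

readme_text = render_readme(example_metadata)
# To keep it next to the data, you could save it with something like:
# Path("README.txt").write_text(readme_text, encoding="utf-8")
```

Keeping this file in the same folder as the dataset means the description travels with the data whenever the folder is copied or shared.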
During and after data has been collected, it's important to store the data securely
and to preserve the data so that it remains accessible.
When selecting a place to store your data, you need to ask yourself a couple of things.
Firstly, who can access my storage?
If you're using your computer at home, can your kids or a housemate log on to your computer
and accidentally delete something or access something they really aren't allowed to?
Can data be easily shared with your supervisor or collaborators?
Ideally, you want to choose a storage option that allows you to easily share data with
people that you need to share it with, rather than emailing datasets or sharing a USB stick.
Where are my data and documents actually being stored?
This refers to the location of the server. For example, if you're using Dropbox then
your data and documents are being stored in the United States and come under US law.
The University really wants your data to be stored in Australia.
Are my data and documents being backed up?
If you're doing this manually, it means having your data in 3 different locations.
Better still, find a system that will automatically back up your data for you.
Will my data and documents remain accessible?
If you're using a 3rd party provider for your data, do you know what happens if the
company goes out of business?
Will you get a chance to recover your items?
When it comes to choosing a storage device, make sure it is sustainable.
Remember that we used to use DVDs and CDs to store data; now most computers being sold
don't even have a disc drive.
I will talk about some storage options available to you at the University at the end of this
presentation.
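On the point about backing up manually, one way to check that your copies really are identical is to compare checksums. The sketch below is only an illustration, assuming Python is available. The file names are invented, and the temporary folder simply stands in for three separate storage locations.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to handle large files."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths: Iterable[Path]) -> bool:
    """True only if every listed file has exactly the same contents."""
    return len({sha256_of(Path(p)) for p in paths}) == 1


# Demonstration with throwaway files standing in for three locations.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.csv"
    original.write_text("id,answer\n1,yes\n", encoding="utf-8")
    backups = [Path(tmp) / "backup_a.csv", Path(tmp) / "backup_b.csv"]
    for backup in backups:
        shutil.copyfile(original, backup)
    all_match = copies_match([original, *backups])

    # Simulate one copy drifting out of date.
    backups[1].write_text("id,answer\n1,no\n", encoding="utf-8")
    mismatch_detected = not copies_match([original, *backups])
```

A check like this only tells you whether the copies agree; it doesn't replace a storage system that backs up automatically.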
It's important to know what the retention period is for your data and then to keep the
data for that minimum period so that you can support your research findings.
Retention periods vary. For example, the standard retention period for research data is 5 years
minimum, but if your data is of national or international significance or impossible to
repeat then it should be archived and kept forever.
You can find more details about retention periods on our website.
Accessibility has already been mentioned a few times today.
One of the main points in ensuring that your data remains accessible is choosing the correct
file format to preserve the data in.
Keep in mind that this might be different to the file format that you collect and analyse
the data in.
The aim of this is to save the data in a format that will still be able to be opened in the
future.
These file formats will generally be widely used within your discipline, or open and non-proprietary.
For example, a dataset in a spreadsheet is best saved as a .csv file as this is able
to be opened by a wide variety of programs, but it's OK to save it as an Excel file
as it's widely used.
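If your data is already in a script or notebook rather than a spreadsheet program, a small sketch of writing it out as .csv might look like this. It uses Python's built-in csv module. The column names and rows are made up, and the in-memory buffer stands in for a real file.

```python
import csv
import io


def write_csv(handle, header, rows):
    """Write a header row followed by data rows in standard CSV format."""
    writer = csv.writer(handle)
    writer.writerow(header)
    writer.writerows(rows)


# Made-up data; a real script would open a file instead, for example:
# with open("survey.csv", "w", newline="", encoding="utf-8") as handle: ...
buffer = io.StringIO(newline="")
write_csv(buffer, ["participant_id", "answer"], [[1, "yes"], [2, "no, not yet"]])
csv_text = buffer.getvalue()
```

The csv module takes care of details like quoting values that contain commas, which is one reason to use it rather than joining values together by hand.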
That brings us to the publish and share stage of the research data lifecycle.
An increasingly important part of research is publishing and sharing research data.
When it comes to publishing data, you might be doing it because you want to or you might
be doing it because you have to.
It's a good idea to check the data sharing policy of the journal you're submitting
your research to, so that you're prepared and ready to share your data if you need to.
Let's take a quick look at the data sharing policy of the journal PLOSone.
PLOSone requires authors to make all data that underpins the findings of a research
article available without exception.
PLOSone is quite a strict journal when it comes to data sharing. It also requires that
the data is made freely available in a repository, that it has a Digital Object Identifier, or
DOI, upon manuscript submission and that the license for the data is not more restrictive
than a Creative Commons Attribution license, CC-BY.
And just in case you're working with human data, no one will ever make you share data
when it could endanger the people involved in the study.
Now there is publishing data and then there is publishing data well.
We want everyone to be publishing their data well.
The dataset you can see on the screen is an example of poorly published data.
It's openly available in the repository Figshare, however there is no descriptive
information about the dataset.
This means that this dataset can't be understood by anyone except the author.
To publish a dataset well you need to do a few things.
Firstly, publish your metadata alongside the dataset.
This will give the dataset context and help others understand what the data is.
Choose a file format that's accessible to everyone so that everyone can open it now
and in the future.
If appropriate, select a license for the dataset that lets others view and re-use
your dataset, like the Creative Commons Attribution license, CC-BY.
This license means that others can view and re-use your data, but they have to cite you.
CC-BY isn't the best option for everyone so get in contact with us if you want assistance
with choosing a license.
Get a persistent identifier for yourself, and for your dataset.
For your dataset get a DOI.
This gives your data a stable home and also makes it easier to cite your dataset.
Researchers can get a persistent identifier for themselves called an ORCiD.
ORCiDs are free and are unique to you.
When you publish anything, you can associate your ORCiD with it as well as your name.
This is really useful for people who have common names, or publish using a name different
to the name they use every day.
It also means that you should be able to select an ORCiD ID and find all of the datasets and
publications from that person.
When publishing data, you should also choose a good place to publish.
The University has a repository for researchers from the University of Sydney.
You can also search re3data.org to see if there is a repository that suits your research
area; this is known as a discipline-specific repository.
You can also publish in more general repositories like Figshare.
The University supplies a few different tools that are useful for research data management
including REDCap, eNotebooks, the Research Data Store and CloudStor.
These are provided free of charge to University students and staff and generally need a UniKey
for access.
REDCap allows you to create your own secure online surveys.
It has basic and longitudinal capabilities so you can create quite complex surveys and
databases if you need to.
Data in REDCap is encrypted while it's stored, so it's very secure and good
to use for clinical data.
REDCap allows you to export in a range of formats, so it can be easily exported to other
software.
It also allows you to customise what you export, for example you can remove all identifiable
information from the dataset before you export it if working with human data.
To request access to REDCap, visit the IT self-service portal.
There is a variety of online information that can assist you in using REDCap if you need
help.
You can see the links on the screen now.
An eNotebook allows you to conveniently store and share data, notes and documents in one
secure place online.
The University uses the LabArchives platform as our eNotebook.
The eNotebooks are stored in Australia using Amazon Web Services.
eNotebooks can be shared with anyone, at any level.
This means you can share your whole eNotebook with a collaborator at another University,
or just a small section of it.
You can even add people with a non-university email address, like Gmail and Hotmail.
An eNotebook is very accessible: you can open and use it on your device as long as you have
an internet connection.
eNotebooks have lots of advantages when it comes to research data management. For example,
you can upload and securely store data, it has an automatic back-up, you can easily
view previous versions of your data, you can see who made revisions to documents and data
via the revision history and you can easily keep a metadata record alongside the dataset.
You can create an eNotebook by visiting the website on the screen.
For support, you can email enotebook.support@sydney.edu.au.
The Research Data Store is a data storage
option available for University staff and HDR students.
It offers unlimited storage located in Australia, with automatic back-up and disaster
recovery to prevent you from losing data.
It's also a good way to collaborate with other Sydney Uni researchers, as you can set
up a research data store folder so that anyone on a research project can access it, provided
they have a UniKey.
The research data store is a good option for you if you need to easily access data on campus,
if you'll be using analysis tools or HPC, if you want to collaborate internally with
University of Sydney staff and students, or if you need to store big datasets.
To request access to the Research Data Store you'll need to fill in a research data management
plan via the RDMP tool.
Visit our website for more details.
CloudStor is another storage option you can use, although we would generally recommend
using the Research Data Store or an eNotebook in the first instance.
With CloudStor you get 100GB of storage for free.
It's stored in Australia and also has an app available for easy access to documents
and data on your device.
You can also sync a desktop folder to CloudStor so you don't always have to open up your
browser to access your files.
It also has a feature that allows you to share uploaded data and documents via a link, so
it can be handy if you need to share a dataset externally.
CloudStor is best suited to individual use, rather than group projects.
You can log in to CloudStor by typing in the link on the screen, or by visiting our webpage.
OK, so that's all I have to cover for this research data management overview.
We have looked at the research data lifecycle and what research data management practices
need to be followed at each stage of a research project.
We have also talked about the tools on offer for research data management at the University
of Sydney.
If you have any questions or want to make
an appointment with us, email researchdatasupport@sydney.edu.au.
Access to the Research Data tools and a lot of what I talked about today is available
on our website.
Thanks for watching!