What is Data?
According to the U.S. Office of Management and Budget, research data is defined as follows:
(i) Research data is defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This "recorded" material excludes physical objects (e.g., laboratory samples). Research data also do not include:
(A) Trade secrets, commercial information, materials necessary to be held confidential by a researcher until they are published, or similar information which is protected under law; and
(B) Personnel and medical information and similar information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study.
There are several different types of data:
- Observational: Data captured in real time, usually irreplaceable
- Examples: Sensor data, telemetry, survey data, sample data, neuroimages
- Experimental: Data from lab equipment; often reproducible, but can be expensive to reproduce
- Examples: Gene sequences, chromatograms, toroid magnetic field data
- Simulation: Data generated from test models, where the model and metadata (inputs) are more important than the output data
- Examples: Climate models, economic models
- Derived or Compiled: Data that is reproducible (but very expensive to reproduce)
- Examples: Text and data mining, compiled databases, 3D models, data gathered from public documents
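For simulation data, where the inputs matter more than the outputs, one way to keep the two together is to write a small metadata record next to each output file. The sketch below is illustrative only: the function name `write_run_metadata`, the field names, and the `.meta.json` suffix are assumptions made for this example, not part of any standard or agency requirement.

```python
import hashlib
import json
from pathlib import Path


def write_run_metadata(output_file, model_name, model_version, inputs):
    """Write a JSON sidecar describing how a simulation output was produced.

    All field names are hypothetical; adapt them to the metadata
    conventions used in your discipline.
    """
    output_path = Path(output_file)
    record = {
        "model_name": model_name,
        "model_version": model_version,
        "inputs": inputs,  # parameters needed to rerun the model
        "output_file": output_path.name,
        # A checksum lets others confirm the output matches this record.
        "output_sha256": hashlib.sha256(output_path.read_bytes()).hexdigest(),
    }
    # Example: run01.csv -> run01.csv.meta.json
    sidecar = output_path.with_name(output_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

With a record like this stored beside each output, the output file itself can be regenerated if it is lost, provided the model version named in the record is still available.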
Data and Storage File Formats
Storage formats include, but are not limited to:
- Text: e.g. ASCII, Word, PDF
- Numerical: e.g. ASCII, SPSS, Stata, Excel, Access, MySQL
- Multimedia: e.g. JPEG, TIFF, MPEG, QuickTime
- Models: e.g. 3D, statistical
- Software: e.g. Java, C
- Discipline-specific: e.g. FITS in astronomy, CIF in chemistry
- Instrument-specific: e.g. Olympus Confocal Microscope Data Format
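When drafting a Data Management Plan, it can help to take a quick inventory of which format categories a project actually uses. The following is a minimal sketch, assuming files can be recognized by their extension; the mapping `FORMAT_CATEGORIES` and the function names are hypothetical and cover only a handful of the formats named above.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file extension to the categories listed above.
# Extend it with the discipline- or instrument-specific formats you use.
FORMAT_CATEGORIES = {
    ".txt": "Text",
    ".docx": "Text",
    ".pdf": "Text",
    ".csv": "Numerical",
    ".sav": "Numerical",  # SPSS
    ".dta": "Numerical",  # Stata
    ".xlsx": "Numerical",
    ".jpg": "Multimedia",
    ".jpeg": "Multimedia",
    ".tif": "Multimedia",
    ".tiff": "Multimedia",
    ".mov": "Multimedia",  # QuickTime
    ".java": "Software",
    ".c": "Software",
    ".fits": "Discipline-specific",  # astronomy
    ".cif": "Discipline-specific",  # chemistry
}


def categorize(filename):
    """Return the format category for a file name, or "Unknown"."""
    return FORMAT_CATEGORIES.get(Path(filename).suffix.lower(), "Unknown")


def summarize_formats(filenames):
    """Count how many files fall into each format category."""
    return Counter(categorize(name) for name in filenames)
```

Files reported as "Unknown" are worth a second look, since unrecognized or proprietary formats are often the hardest to preserve over the long term.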
Funding Agency Requirements
The sharing of research findings has always been critical to the development of science. Whether through society publications, corporate publishing, or open access forums, scientific progress is dependent upon the sharing of data collected through:
- observation (e.g. sensor or survey data),
- experimentation (e.g. gene sequences, chromatograms),
- simulation (e.g. climate or economic models), and
- derivation/compilation (e.g. text and data mining, 3D models).
In order to foster scientific progress and promote the expansion and diversity of user communications, federal agencies are issuing data sharing mandates for funded proposals. This means that researchers receiving funds from these agencies are subject to the requirements of these mandates.
Example Data Management Plans
The following Data Management Plan examples are available for a variety of research domains. Use them as a guide to writing your own:
- DataONE Examples
- University of California San Diego Examples
- Rice University Examples
- University of New Mexico Examples
- ICPSR Examples
- University of Minnesota Examples
Data Management Tools
Preparing your research data for effective management throughout the data lifecycle and adhering to agency mandates does not have to be a time-consuming task. The following tools will help you create a well-organized Data Management Plan with ease:
- DMPTool provides guidance and resources for your Data Management Plan. The goal is to provide a "flexible, online tool to help researchers create data management plans." With the DMPTool, researchers can:
- Create ready-to-use agency and directorate-specific data management plans
- Comply with data management plan requirements
- Acquire easy-to-follow guidance on creating plans
- The Digital Curation Centre Data Management and Sharing Plan Website is the "leading hub of expertise in curating digital research" in the United Kingdom. Much like the DMPTool, the DCC's DMP tool is a flexible web-based tool that allows researchers to create personalized data management plans, and it includes many of the same features as DMPTool.
- The ICPSR Guidelines for Effective Data Management Plans provides researchers with the following:
- A defined list of the elements comprising a Data Management Plan
- An explanation as to why each element is important for a DMP
- A recommended list of elements to be included in a Data Management Plan
- The DataONE Data Management Plan Outline is a quick reference guide for outlining a Data Management Plan, and provides a generic example of how each section of the plan may look once completed.
Your research data is a source of potential value to researchers and society at large, and sharing it helps to put that value to use. To maximize the value of your data, it is imperative to inform others how they can use it, while protecting your rights as the creator.
The most effective way of informing others of how they can use your data is by applying a license to its use.
If you want to find a license that is right for your data or learn more about licensing data in general, please refer to the following resources:
- Creative Commons for Data
- Used to protect a broad spectrum of creative works
- Can be used for public domain and more restrictive licensing
- The Digital Curation Centre 'How to License Research Data' Guide
- Provides a comprehensive guide to licensing data
- Open Data Commons
- Provides a simple introduction to using the ODC license for data
- In some cases is a more appropriate license than Creative Commons
- GNU Licenses
- GNU General Public License (v3.0) is used to license many free software packages
- GNU Affero GPL (v3.0) is used to license free software being run over a network
- GNU Free Documentation License is a license for manuals, textbooks, and similar documents
- Design Science License (listed by the GNU Project, though not itself a GNU license) is a license for text, images, music, and data, but not for documentation or source code
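Once a license is chosen, it should travel with the data in a form that people can read at a glance. The sketch below shows one possible way to generate a short, consistent notice for a README or metadata file. The function `license_notice`, the notice wording, and the small `KNOWN_LICENSES` table are assumptions for illustration; the keys are SPDX-style identifiers for a few Creative Commons and Open Data Commons licenses, and the table is not a complete list.

```python
# A few data-oriented licenses, keyed by SPDX-style identifier.
# Illustrative subset only; add the license you actually select.
KNOWN_LICENSES = {
    "CC-BY-4.0": "Creative Commons Attribution 4.0 International",
    "ODbL-1.0": "Open Data Commons Open Database License v1.0",
    "ODC-By-1.0": "Open Data Commons Attribution License v1.0",
}


def license_notice(dataset_title, creator, year, license_id):
    """Return a one-line notice naming the creator and the license.

    The wording is a hypothetical template, not legal advice.
    """
    if license_id not in KNOWN_LICENSES:
        raise ValueError(f"Unrecognized license identifier: {license_id}")
    return (
        f"{dataset_title} (c) {year} {creator}. "
        f"Licensed under {KNOWN_LICENSES[license_id]} ({license_id})."
    )


notice = license_notice(
    "Stream Gauge Readings", "A. Researcher", 2024, "CC-BY-4.0"
)
print(notice)
```

Rejecting unknown identifiers is a deliberate choice here: a misspelled license name in a notice can leave users unsure what they are permitted to do with the data.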