Data Governance and Storage Plan
Purpose:
In applied science, managing data efficiently over the long term is critical to keeping solutions effective and sustainable. If the necessary data is difficult to find, inaccessible, or disorganized, users are less likely to keep using a solution, so thoughtful data governance and storage should be prioritized early in the co-development process.
The Data Governance and Storage Plan provides guidance on data formats, use, sharing, version control, storage, and publication. It draws on the Findable, Accessible, Interoperable, and Reusable (FAIR) principles for data use, storage, and sharing, as well as NASA's guidance on data management plans. It covers input data, output data (internal and external), and intermediary datasets.
How and When to Use This Tool:
The Data Governance and Storage Plan is used at the beginning of Phase 2, as soon as stakeholders begin making decisions about co-development and the approaches to be used in solution design. It is used alongside the Technical Requirements Plan and the Solution Implementation and Impact Monitoring Plan.
NASA Earth Action Solutions Co-Development Toolkit, v0.1 | 77
Section 1: Guidance on Data Governance and Storage
1. Data Inputs
The solution is likely to use and generate Earth science datasets derived from NASA's existing satellite, airborne, and modeling products. Primary data categories include:
1.1 NASA Satellite Remote Sensing Data (Input)
1.1.1 Examples include (but are not limited to):
- MODIS, VIIRS, Landsat, MISR, CERES, OCO-2/3, GEDI, ECOSTRESS
- Level-1 (radiances), Level-2 (geophysical retrievals), and Level-3 (gridded products)
Source: NASA Earthdata/DAACs (LAADS DAAC, LP DAAC, NSIDC DAAC, OB.DAAC, GHRC, PO.DAAC, etc.)
Access: Open and publicly available without restriction.
1.1.2 NASA Modeling and Analysis Data (Input)
- NASA GEOS (GMAO) atmospheric reanalyses
- NASA land surface modeling products (e.g., NLDAS, GLDAS)
- NASA carbon cycle and flux modeling outputs
Source: GMAO, GES DISC, and related DAACs
Access: Publicly available.
1.1.3 NASA Airborne Science Data (Input)
- Data from past airborne campaigns (e.g., AVIRIS, AirMOSS, CARVE, ACT-America)
- Level-0 to Level-3 instrument data, including radiance, reflectance, atmospheric observations, lidar, or hyperspectral imaging.
Source: DAACs (e.g., ORNL DAAC, GHRC)
Access: Publicly accessible.
1.1.4 Personally Identifiable Information (PII) & Sensitive Data
Refer to the PII guidance, including data security and PII redaction, provided in the Solution Co-Developments in Practice document. Everyone is responsible for protecting PII in any form (physical or electronic) from unauthorized disclosure, modification, or destruction, to ensure its confidentiality, integrity, and availability. PII that is no longer needed should be destroyed to reduce risk to NASA, but only in accordance with NASA Records Retention Schedules. NASA CUI destruction requirements can be found in NPR 2810.7. Additional guidance is provided through NASA Records Management.
Sensitive data, such as the locations of sensitive assets (for example, wildlife, military assets, or refuges), should be treated at the data security levels set out in the International Traffic in Arms Regulations (ITAR) and the Export Administration Regulations (EAR).
1.1.5 Commercial data such as high-resolution images provided by CSDA
Any commercial data, including data from CSDA, comes with guidance on its use and sharing. Because guidance differs between providers according to their license agreements, follow the guidance from CSDA or the commercial data provider for use and distribution.
1.2 New Data Products Created (Output)
Generated outputs may include:
- Derived geophysical parameters
- Calibrated/quality-checked subsets of NASA data
- Gridded model–observation fusion products
- Machine learning outputs, uncertainty quantification, or statistical summaries
- Analysis scripts, workflows, visualization tools, or Jupyter notebooks
All derived data and software will be made public in accordance with NASA's Open-Source Science policy.
2. Standards to Be Used for Data and Metadata
2.1 File Formats
Project data will use community-standard, interoperable formats, including:
- NetCDF4 (CF-compliant) for gridded data
- HDF5 for hierarchical scientific data
- GeoTIFF for spatial raster products
- CSV for tabular summaries
- JSON/YAML for configuration files
- PNG/JPEG for static images
2.2 Metadata Standards
Metadata will follow NASA and community norms:
- CF Conventions (Version ≥ 1.7) for NetCDF
- NASA's Directory Interchange Format (DIF) or ISO 19115/19139 as required by DAACs
- DOI assignment metadata for all publicly released datasets
- OGC standards for spatial reference and geolocation metadata
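A pre-archive metadata check along these lines can be sketched in a few lines of Python. This is a minimal illustration, not a DAAC-endorsed validator: the list of required global attributes (`REQUIRED_ATTRS`) is a hypothetical project choice based on the description attributes the CF Conventions recommend, and the attributes are assumed to have already been read from the file into a plain dictionary.

```python
import re

# Hypothetical project-level list of required global attributes (an assumption,
# not a NASA requirement); the names follow CF's recommended description attributes.
REQUIRED_ATTRS = ("title", "institution", "source", "history", "Conventions")
MIN_CF_VERSION = (1, 7)  # matches the "Version >= 1.7" item in the list above


def cf_version(conventions: str):
    """Extract the CF version from a Conventions string such as 'CF-1.8, ACDD-1.3'."""
    match = re.search(r"CF-(\d+)\.(\d+)", conventions)
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_global_attrs(attrs: dict) -> list:
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = [
        f"missing attribute: {name}"
        for name in REQUIRED_ATTRS
        if not str(attrs.get(name, "")).strip()
    ]
    version = cf_version(str(attrs.get("Conventions", "")))
    if version is None:
        problems.append("Conventions does not declare a CF version")
    elif version < MIN_CF_VERSION:
        problems.append(f"CF version {version[0]}.{version[1]} is below 1.7")
    return problems
```

Versions are compared as integer tuples rather than strings, so that a value such as CF-1.10 is correctly treated as newer than CF-1.7.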
2.3 File Naming Conventions
File naming conventions can be set by partner consensus, but each name must reflect the file's contents for ease of data management. Including a version number in the name helps identify the latest version of a file. Files should be stored in a location that is accessible to all partners with "Need to Know" privileges. Any files with a high security level should be accessible only to those with a "Need to Know."
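As one illustration of a versioned naming convention, the sketch below builds and parses names of the form `<project>_<product>_<YYYYMMDD>_v<NN>.<ext>`. The pattern, the field names, and the example project and product names are all hypothetical; partners would substitute whatever convention they agree on.

```python
import re
from datetime import date

# Hypothetical pattern (an assumption): <project>_<product>_<YYYYMMDD>_v<NN>.<ext>
NAME_RE = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<product>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project: str, product: str, day: date, version: int, ext: str) -> str:
    """Assemble a file name and reject it if it does not match the convention."""
    name = f"{project}_{product}_{day:%Y%m%d}_v{version:02d}.{ext}"
    if not NAME_RE.match(name):
        raise ValueError(f"name does not follow the convention: {name}")
    return name


def latest_version(names):
    """Return the conforming name with the highest version number, or None."""
    parsed = [(int(m.group("version")), n) for n in names if (m := NAME_RE.match(n))]
    return max(parsed)[1] if parsed else None
```

For example, `build_name("floodmap", "ndvi-subset", date(2024, 3, 1), 2, "nc")` yields `floodmap_ndvi-subset_20240301_v02.nc`. Zero-padding the version keeps alphabetical and numerical ordering consistent in file listings.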
All data and metadata will be validated prior to archiving.
3. Data Sharing and Access
3.1 Compliance With NASA Open Science
All project outcomes—including data, software, workflows, documentation, and publications—will be openly shared without restriction.
No proprietary periods are requested.
3.2 Release Timeline
- Input data: Already public.
- Derived datasets: Public release within XX days of validation and no later than the end of the project period.
- Software, code, and workflows: Released publicly at or before the first publication or public presentation.
- Documentation, READMEs, metadata: Provided at time of data release.
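The derived-dataset rule above (a fixed number of days after validation, capped at the end of the project period) can be expressed as a small date calculation. In this sketch the number of days is a required parameter because the plan leaves it as a placeholder; the function name and the dates in the usage note are illustrative only.

```python
from datetime import date, timedelta


def release_deadline(validated_on: date, project_end: date, days_after_validation: int) -> date:
    """Latest allowed public-release date for a derived dataset.

    The deadline is the validation date plus the agreed number of days,
    but never later than the end of the project period.
    """
    return min(validated_on + timedelta(days=days_after_validation), project_end)
```

With a hypothetical 90-day window and a project ending on 31 December 2025, a dataset validated on 10 January 2025 would be due by 10 April 2025, while one validated on 1 December 2025 would be due by the project end date.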
3.3 Data Repositories
All data will be deposited in NASA-approved, discipline-appropriate archives, such as:
- GES DISC (Atmospheric & modeling data)
- LP DAAC or LAADS DAAC (Land & atmosphere satellite data)
- PO.DAAC (Ocean & geophysical data)
- ORNL DAAC (Terrestrial ecology, airborne science)
If DAAC acceptance is outside the proposal scope, data will be hosted on:
- NASA-sponsored data portals (per program guidance)
- University/organization servers
No paywalls, login requirements, or embargoes will be applied.
4. Data Preservation and Archiving
4.1 Long-Term Preservation
NASA DAACs will serve as the long-term repository for final products. DAACs ensure:
- Redundant storage
- Integrity checking
- Long-term maintenance
- DOI assignment
- Continued open access
4.2 Backup and Versioning
During the project:
- Primary storage: institutional secure servers
- Version control: GitHub or NASA-approved version-control platform
- Redundancy: institutional cloud and local replicated storage
- Preservation: datasets prepared in immutable tagged releases
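One common way to support integrity checking of a tagged release during the project is a checksum manifest. The sketch below is a minimal example using SHA-256 from the Python standard library; the function names and the idea of storing the manifest alongside the release are assumptions, not a prescribed NASA or DAAC procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 digest, reading in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(release_dir: Path) -> dict:
    """Map each file's path (relative to release_dir) to its SHA-256 digest."""
    return {
        p.relative_to(release_dir).as_posix(): sha256_of(p)
        for p in sorted(release_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(release_dir: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose digest no longer matches."""
    current = build_manifest(release_dir)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

A manifest built when a release is tagged can be re-verified after each copy to cloud or local replicated storage; an empty list from `verify_manifest` indicates that every recorded file is present and unchanged.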
5. Software, Tools, and Workflows
5.1 Open-Source Software
All custom code will be released under an open-source license such as MIT, BSD-3, or Apache 2.0. This includes:
- Data processing pipelines
- Modeling workflows
- Analysis scripts
- Jupyter notebooks
- QA/QC utilities
- Visualization scripts
5.2 Repositories
Project software will be shared through:
- GitHub or GitLab public repositories
Each repository will be accompanied by:
- README
- Installation instructions
- Example datasets
- Containerization (Docker/Singularity) when appropriate
- CI testing setup when appropriate
5.3 Reproducibility
Workflows will include:
- Fully documented provenance
- Environment specifications (Conda, Dockerfile, requirements.txt)
- Reproducible notebook workflows
- Data access scripts using Earthdata APIs
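As an example of a scripted, repeatable data request, the sketch below builds a granule-search URL for NASA's Common Metadata Repository (CMR) search API. It only constructs the query and does not send it or download anything. The helper name is hypothetical, and the product short name, dates, and bounding box in the usage note are illustrative values rather than project choices.

```python
from urllib.parse import urlencode

# CMR granule-search endpoint (JSON response format).
CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def granule_search_url(short_name, start, end, bbox, page_size=100):
    """Build a CMR granule-search URL.

    start and end are ISO 8601 timestamps; bbox is (west, south, east, north)
    in decimal degrees.
    """
    params = {
        "short_name": short_name,
        "temporal": f"{start},{end}",
        "bounding_box": ",".join(str(v) for v in bbox),
        "page_size": page_size,
    }
    return f"{CMR_GRANULES_URL}?{urlencode(params)}"
```

A call such as `granule_search_url("MOD13Q1", "2024-01-01T00:00:00Z", "2024-01-31T23:59:59Z", (-10, 30, 5, 45))` records the exact product, time window, and region requested, so keeping such calls in version control lets others repeat the same retrieval.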
6. Data Management Responsibilities
The PI will oversee:
- Data quality control
- Compliance with NASA Open Science policies
- Metadata creation and validation
- Repository submission
- DOI assignment
- Maintaining communication with NASA DAAC representatives
A designated Data Manager or Co-I will handle:
- Version control
- Code review
- Data packaging
- Archival submission preparation
- Documentation creation
7. Policies for Re-Use, Redistribution, and Citation
All datasets, code, and documents will be released under open, unrestricted licenses consistent with NASA terms, typically:
- Creative Commons Attribution (CC-BY-4.0) for data
- MIT, BSD, Apache, etc. for software
Users will be encouraged to cite:
- The dataset DOI
- Relevant software repositories
- Relevant data repositories: GitHub, Zenodo
- Associated publications
- NASA DAAC source datasets
8. Expected Data Volume
Estimated ranges (adjustable to project specifics):
- Satellite subsets & model data: 1–5 TB
- Airborne data usage: 0.1–3 TB
- Derived products: 50–500 GB
- Software + documentation: < 5 GB
Storage and transfer strategies will be scaled accordingly.
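To size storage, the per-category ranges above can be summed into an overall range. The sketch below assumes 1 TB = 1000 GB and treats "< 5 GB" as a range of 0 to 5 GB; the dictionary keys are illustrative labels.

```python
# Ranges copied from the list above, in GB (assuming 1 TB = 1000 GB).
# "< 5 GB" is treated as the range 0-5 GB.
RANGES_GB = {
    "satellite_and_model": (1_000, 5_000),
    "airborne": (100, 3_000),
    "derived_products": (50, 500),
    "software_and_docs": (0, 5),
}


def total_range_gb(ranges):
    """Sum the lower and upper bounds of every category."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high


low_gb, high_gb = total_range_gb(RANGES_GB)
print(f"Estimated total: {low_gb / 1000:.2f} to {high_gb / 1000:.2f} TB")
```

Under these assumptions the total is 1,150 to 8,505 GB for a single copy; the figure should be multiplied by the number of redundant copies the project keeps.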
9. Restrictions, Ethics, and Security
- No restrictions on public release.
- No personally identifiable information or controlled data involved.
- All datasets comply with NASA and federal open-data requirements.
- Sensitive flight information (if applicable to airborne data) will be handled per NASA DAAC rules, but will not limit derived data release.
10. Branding/acknowledgements
NASA has an open data policy for all data created using its resources, but requires acknowledgement of its contribution, whether enabling, technical support, or provision, in all data created and software developed. OCOMM provides guidance on the specific acknowledgement statement and should be consulted as necessary.
Section 2: Data Governance and Plan Checklist
| Category | Requirement / Best Practice | Checklist Items |
|---|---|---|
| Solution Name | | |
| Data Types & Sources | Identify NASA input and derived data | List satellite datasets; list model data; list airborne data; describe derived outputs; confirm DAAC sources are public |
| Data & Metadata Standards | Use standard formats and metadata | NetCDF4/CF; HDF5; GeoTIFF; CSV; ISO 19115/19139; DIF; OGC spatial metadata; W3C PROV |
| Open Science Compliance | Meet SPD-41a requirements | No embargo; public release; open access; no login required |
| Data Sharing | Provide access pathways and documentation | Identify DAAC; provide DOI; include README; release notebooks and workflows |
| Repositories | Use NASA DAACs and open repositories | Submit to GES DISC / LP DAAC / ORNL DAAC / PO.DAAC; use Zenodo/GitHub for prereleases |
| Preservation | Ensure long-term data integrity | Integrity checks; backed-up servers; immutable version tagging |
| Documentation | Provide complete documentation | READMEs; processing steps; uncertainties; QA/QC; workflow diagrams |
| Software & Tools | Release all code as open-source | Public GitHub; MIT/BSD/Apache license; examples; notebooks; containers |
| Reproducibility | Ensure workflows are replicable | Document workflows; data retrieval scripts; versioned code; automated tests |
| Data Volume | Estimate and manage storage | Estimate raw/intermediate/final sizes; confirm storage resources |
| Roles | Assign team responsibilities | PI oversight; Data Manager for metadata and QA/QC |
| Licensing & Reuse | Enable broad reuse | CC-BY 4.0 for data; MIT/BSD/Apache for code; citation instructions |
| Ethics & Security | Ensure compliance | No PII; no restricted data; policy-compliant |
| QA/QC & Validation | Provide validation processes | QA/QC steps; validation; uncertainty estimates |