From NASA to Dataverse: Preserving Access to Crucial Remote Sensing, Environment, and Population Data
For more than 25 years, researchers across disciplines and policymakers worldwide have relied on the NASA Socioeconomic Data and Applications Center (SEDAC) for essential open-access population, environmental, and socioeconomic data. The spatial data produced or curated by SEDAC were designed to integrate with gridded Earth science data, in particular data from NASA's constellation of remote sensing instruments. SEDAC has been managed by the Center for International Earth Science Information Network (CIESIN) at Columbia University since 1998.
Today, we are excited to announce a new chapter for this invaluable resource: the transition and expansion of the CIESIN data collection, which will now be hosted as part of CAFE RCC’s Data Collection on the Harvard Dataverse Repository. This transition ensures more sustainable, long-term preservation of data while broadening access for researchers, policymakers, and practitioners across disciplines.
Harvard Dataverse is a free, generalist repository built on the open-source Dataverse software and designed to follow the FAIR Principles (Findable, Accessible, Interoperable, Reusable). Anyone, from any discipline, can deposit, share, archive, cite, access, and explore data there.
What’s Changing and What’s Not?
The complete collection of 300 existing SEDAC datasets is migrating to the Harvard Dataverse repository for wider and more sustainable access. All SEDAC data will be available as a clearly labeled subcollection within the CIESIN collection on Harvard Dataverse. CIESIN is also releasing new data into the CIESIN collection. For example, the Global Gridded Relative Deprivation Index (GRDI v1.10) and the EnvClim grids of urban exposure to climate risks are already online, and more datasets will be added over the coming weeks and months.
According to CIESIN Director Dr. Alex de Sherbinin, "The CIESIN collection is of great value to those who are seeking to understand current extreme weather risks, such as extreme heat and sea level rise, among highly exposed populations. It also includes projections of population and urban areas to better characterize future exposure. This makes it a perfect fit for the CAFE collection."
Here's a look at some of the datasets available in the CIESIN collection on Dataverse:
- Gridded Population of the World (GPW): Mapping the global distribution of population, including by age and sex, GPW has been the gold standard for environmental exposure analysis, disaster preparedness, and policy modeling since the 1990s. Versions 3 and 4 of this flagship data product are available for download in the SEDAC subcollection of the CIESIN collection on Dataverse, and version 5 is under development for future release on Dataverse.
- Air Pollution Metrics: Detailed local air quality data, originally developed by academic partners at Washington University in St. Louis and the Harvard T.H. Chan School of Public Health, support ongoing public health research and monitoring.
- Environmental Performance Index (EPI): Data and reports for every release of the Yale/Columbia EPI from 2006 to 2024, as well as its predecessor, the Environmental Sustainability Index (ESI).
- Global Urban Polygons and Points Dataset (GUPPD): Covering more than 125,000 settlements worldwide, with names and population trends since the 1970s, this dataset supports urban planning and historical research.
These datasets, among many others, will be complemented by new datasets currently in the development pipeline (including, in some cases, updates to the datasets highlighted above). As datasets are completed, they will be deposited into the CIESIN subcollection on Dataverse.
What to Expect in the New Repository
You'll notice that the new repository looks and works a bit differently from what you may be used to. On NASA Earthdata Search, users can still access SEDAC data through a faceted search on variables, years, and regions. On Harvard Dataverse, in CIESIN's subcollection within the CAFE collection, users will instead browse file lists and apply basic filters. The data are packaged in formats that are easy to download for analysis.
To help you adjust, we’re developing support materials, including tutorials, FAQs, and guides, to make it easier for you to find and download exactly what you need from the new CIESIN subcollection on Harvard Dataverse.
Looking Ahead
The transition of CIESIN's data to CAFE's collection on Harvard Dataverse marks a significant step forward in the long-term preservation of SEDAC's socioeconomic and environmental data. These datasets are vital for research on population dynamics, environmental exposures, and health outcomes, and they are high-value resources for both scientific and policy applications.
As we continue this transition, you can expect regular updates and new releases of both legacy and next-generation datasets. We’ll be sharing updates and showcasing data from CIESIN in CAFE’s newsletter, on CAFE’s social channels, and on the CAFE GitHub page.
Explore CIESIN's data collection now, and let us know which features or datasets are most useful for your work.
Photo: © ChrisTYCat - stock.adobe.com
