Celebrating 1,000 Datasets: The CAFE Dataverse Community Hits a Milestone!
Researchers from around the world have now contributed over 1,000 datasets to CAFE’s collection on Harvard Dataverse, an open-source repository where you can find or share data key to health and extreme weather research.
This milestone is a community-wide achievement. When CAFE launched its Harvard Dataverse collection three years ago, the goal was simple but ambitious: Create a trusted home where researchers could share, preserve, and discover data in keeping with FAIR (Findable, Accessible, Interoperable, Reusable) principles and NIH data sharing guidelines. The response has far exceeded what CAFE’s Data Management team imagined, with more than 1,000 authors from more than 100 institutions contributing data from more than 100 countries to the collection. To date, more than 500 datasets have been downloaded, and harvested datasets have been accessed nearly 6,700 times.
“The level of acceptance the research community has made in making CAFE a home for sharing data goes beyond us,” reflects Kevin Lane, Co-Lead of the Data Management Function. “Researchers are talking about it with each other and reaching out to share their data. It’s extraordinary to see. I don’t think when we started this project we thought it would be this successful this quickly. We are grateful for the community’s great contributions.”
Reaching 1,000 datasets means that researchers now have access to data that spans exposure assessment, extreme weather, air quality, health outcomes, environmental health indicators, and more. Access to high-quality, well-documented data is essential for researchers, policymakers, and communities to understand risks and implement solutions.
Partnerships Power the Platform
Reaching 1,000 datasets was made possible by a wide range of contributions: data from individual investigators, new subcollections submitted by the community, and collaborative efforts to bring large existing collections into CAFE’s Dataverse.
For example, CAFE partnered with the Center for International Earth Science Information Network (CIESIN) at Columbia University to migrate NASA’s Socioeconomic Data and Applications Center (SEDAC) datasets into a CIESIN subcollection within CAFE. Together, the teams worked to preserve long-term access to more than 300 SEDAC datasets spanning gridded population, environmental, and socioeconomic data.
CAFE’s collaboration with DesignSafe followed a similar path, but through harvesting rather than migration. The DesignSafe subcollection in CAFE Dataverse lists and describes datasets that are published in the DesignSafe Data Depot Repository, the open-access repository of the National Science Foundation-supported Natural Hazards Engineering Research Infrastructure. DesignSafe datasets were gathered during experiments, simulations, field reconnaissance missions, surveys, and other activities that study the impact of natural hazards on people and structures. Because the two teams coordinated metadata and descriptions, researchers can now find natural hazards datasets in the same place they already look for extreme weather, exposure, and health data.
Behind the Scenes
This work would not have been possible without the Harvard Dataverse team, whose partnership and technical expertise have been central to CAFE’s success from the beginning.
“The inherent knowledge of building something like this is eye opening,” Lane notes. “You see how much work goes into mechanisms like metadata, and how to harmonize and link with other highly curated datasets and our own.”
Together, the CAFE and Harvard Dataverse teams have focused on a single priority: making it straightforward for researchers to share their data. Over nearly three years, they’ve defined what belongs in the CAFE collection, specified metadata that should always accompany a dataset, and refined templates and workflows so contributing data is smooth and easy. As the collection and community have grown, the same approach has been extended to support subcollections and harvested datasets, allowing individual investigators, large collaborations, and partner repositories to participate in ways that are consistent and manageable.
The result is that researchers with valuable data now have a trusted home for sharing it, and others know they can find the data resources they need to advance their own research.
Get Involved
We invite you to be part of what comes after 1,000.
- Explore the CAFE Collection on the Harvard Dataverse, which follows FAIR principles and adheres to NIH guidelines for data sharing.
- Help expand the collection by contributing data for sharing, and help us increase discoverability by following our updated guidelines on the metadata we collect.
- Browse our GitHub repository for code, tutorials, and tips on handling common data formats and sources, and check out our Getting Started with GitHub guide.
- Spread the word: Share the collection with collaborators, students, and partners who may be looking for data, or a home for their own research data.
Together, we can keep building a shared resource that makes it easier to do rigorous, reusable, and impactful research.







