Recognizing a need for centralized health data to answer key questions about COVID-19, the National Institutes of Health extracted siloed data at the beginning of the pandemic to establish a national COVID-19 database. Now, it’s one of the largest collections of COVID-19 patient data in the world, MIT Technology Review reported June 21.

Six key details about the database:

  1. At the database’s conception, researchers sought to tackle two key problems with U.S. healthcare data. The first is that federal, state and local privacy laws overlap and sometimes contradict one another, making it difficult to access patient data. The second is that health records are often siloed by the institutions that own them, because of both privacy laws and the profitability of selling de-identified health data.

  2. The National COVID Cohort Collaborative, or N3C, combines patient records from different institutions across the country and makes the data accessible to researchers. It houses 6.3 million de-identified patient records from 56 institutions and counting.

  3. Most of N3C’s records go back to 2018, and contributing institutions have promised to keep updating the data for five years.

  4. Contributing organizations participate by offering data for two groups: people who have COVID-19 and a control group. Before submitting to N3C, organizations strip the data of any personally identifiable details, except ZIP codes and the date of service. Once the records reach N3C, technicians clean them and enter them into the database.

  5. Anyone, whether or not they’re affiliated with an institution, can submit a research proposal to N3C. A committee from Baltimore-based Johns Hopkins University reviews each proposal and determines which version of the data the researchers will be able to access.

  6. Researchers think N3C is one of the most promising tools for studying long COVID, which they expect to be a prominent research topic in the coming years.