Secure Data Enclave — SDE

A secure virtual desktop research environment.

The Secure Data Enclave (SDE) provides Columbia researchers with a highly secure, remotely accessible, virtual Windows 10 or Red Hat Linux desktop environment to collaboratively analyze and store sensitive data (PII, RHI, and PHI). The SDE functions as a virtual, remote-friendly alternative to traditional "cold-room" computing environments, in which a physical computer disconnected from the internet is used for sensitive data analysis.

Using Citrix remote desktop, researchers can work on sensitive data and collaborate with other members of their project simultaneously. SDE projects are logically isolated; researchers can access only the data explicitly uploaded to their own SDE project's virtual environment. They have no internet access and can reach only the applications installed within the SDE (e.g. Stata, R, Python). Data can be transferred to and from the system only by the designated "Data Security Officer" (DSO), a role required for each project. All data is securely wiped after project retirement in compliance with the DOD 5220.22-M standard.

SDE Details

The SDE is HITRUST certified by CUIMC Security as HIPAA-compliant, and is certified for the storage and analysis of PII and PHI data. Users can reference the CUIMC Security RSAM registration ID number (3868), which confirms the SDE's certification by CUIMC Security for HIPAA and sensitive-data compliance.

Additionally, the SDE has been approved for use with popular restricted datasets, including the Bureau of Labor Statistics National Longitudinal Surveys (NLSY) datasets, University of North Carolina Longitudinal Study of Adolescent Health (Add Health) datasets, and European Commission Eurostat restricted economic datasets.

Researchers on the SDE have the option to choose either a virtual Windows 10 or a Red Hat Linux desktop system, each powered by 4 cores of an Intel I8462Y CPU and 16GB of RAM.

Storage allocation is partitioned between shared and individual user storage. Standard projects receive:

  • 100 GB of raw data storage in a shared Data Directory
  • 100 GB of collaborative workspace in a shared Group Work Directory
  • Individual users are each provided with
    • 50 GB in each individual Working Directory
    • 2 GB in each individual Home Directory designated for code files
    • 5 GB in each individual Output Directory, for staging files that need to be relocated from the SDE
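As a rough planning aid, the quotas above can be added up for a project of a given size. The following is a minimal sketch only: the function and dictionary names are hypothetical, and it assumes every account on the project (including the DSO) receives the three individual directories, which the list above does not confirm.

```python
# Standard per-project quotas in GB, copied from the list above.
SHARED_GB = {"Data Directory": 100, "Group Work Directory": 100}
PER_USER_GB = {"Working Directory": 50, "Home Directory": 2, "Output Directory": 5}


def standard_project_storage_gb(num_users: int) -> int:
    """Return the total standard allocation (GB) for a project.

    Assumption: each of the num_users accounts gets all three
    individual directories; this is not stated in the source.
    """
    if num_users < 1:
        raise ValueError("a project needs at least one user")
    shared = sum(SHARED_GB.values())      # 100 + 100 = 200 GB
    per_user = sum(PER_USER_GB.values())  # 50 + 2 + 5 = 57 GB
    return shared + num_users * per_user


if __name__ == "__main__":
    # Example: a project with five accounts.
    print(standard_project_storage_gb(5))
```

Under these assumptions, shared storage stays fixed at 200 GB and each additional account adds 57 GB.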

If increases in CPU, RAM, or storage resources are necessary, reach out to [email protected] to request a review by RCS. Such changes may incur additional fees to acquire and provide said resources.

The Research Computing Services (RCS) team handles software installations and updates. Currently, the SDE supports many research analysis software packages, including Stata, R, Python, STAN, and QGIS. Other programs, depending on licensing availability, have included SPSS, SAS, and more (note that the cost of user licenses must be paid by the project owner).

The standard offering is five accounts: 1-2 Principal Investigators (PIs), 2-3 Research Assistants (RAs), and 1 Data Security Officer (DSO). For both security and system access volume, we ask project applicants to limit project members to as few users as necessary. If more researchers require access, they can be accommodated, but there may be additional costs.

Users must have a UNI and VPN access to use the SDE. Outside collaborators can get a UNI and VPN access through appropriate department-level HR status. CUIT’s RCS team manages accounts for the SDE for Columbia-affiliated users.

Researchers using the SDE system must identify a Data Security Officer (DSO). Often this is someone from the researchers' local IT group. The DSO will need to be added to the project's IRB protocol (if applicable). The DSO is responsible for:

  • Loading and removing the restricted-use data
  • Retrieving output on behalf of their project members
  • Ensuring that all materials exported from the SDE do not violate the data use agreement or their project’s data handling requirements
  • Conducting a training session for the researchers on how to securely access and manage their project data stored on the SDE

The SDE is priced at $1,000 per project, per year, which includes up to four user accounts and one Data Security Officer account. Discounts are available for bulk purchases. Contact [email protected] to discuss.

Project Onboarding Process

Send an email to [email protected] to get started. Please provide:

  • Your UNI and department/school at Columbia
  • The name of your data-provider and the type of sensitive data you expect to receive (e.g. PHI, RHI)
  • The name(s) of your PI (if not yourself)
  • Any questions you may have

A member of CUIT's Research Services department will get back to you to go over your information and review the SDE requirements and restrictions.

After confirming the SDE is a good fit for your project, you will need to gather updated paperwork:

  1. Proof of data provider approval for using the SDE (if data provider is a non-Columbia entity). Typically this is in the form of a data agreement (DUA or DAA) that has been modified to stipulate that the SDE will be used, and that is signed by both the data provider and SPA; if you are based in Teachers College or Barnard, the SPA signature can be replaced by that of a representative from your school. If no such formal agreement exists, written approval from the data provider must be obtained.

    Generally, you should discuss with the data provider what data security information they need to include in their DUA/DAA. Please reach out to Research Services at [email protected] for the SDE Data Security Plan and assistance with language to provide Data Providers and Columbia IRB.

    If your data provider is within Columbia, please provide documentation of this.
     
  2. Proof of IRB approval* for using the SDE. If you have an existing IRB protocol, it must be modified to stipulate the SDE is being used. For the "System ID numbers" sub-question, you should reference the CUIMC Security RSAM registration ID number, 3868, which confirms the SDE was certified by CUIMC Security for HIPAA compliance. You should also add all users that will be accessing the SDE to the IRB protocol, including any Research Staff and the DSO.

    After approval, you can provide proof via a PDF copy of the approval email or the downloaded protocol "data sheet".
    *Alternatively, you can provide proof of IRB protocol exemption.
     

To formally apply for the SDE, complete this Qualtrics SDE application form and submit it to CUIT Research Services. The form requires:

  • Baseline project information
  • Names and contact information for all PI(s), Research Staff and DSO
  • Document upload: Proof of data provider approval with SPA signature for using the SDE (if data provider is a Columbia entity, attach proof of this instead)
  • Document upload: Proof of IRB approval/exemption for using the SDE

After receiving a complete application, Research Services will generate a custom SDE User Agreement. All PI(s), Research Staff, and your DSO must sign this document (Adobe Signature is accepted).

This agreement, along with your ARC chartstring information for payment ($1,000/year/project), should be uploaded to this Qualtrics user agreement and payment form.

Research Services will set up a time to give you and your DSO training on how to use the SDE. Training covers the basics of accessing the SDE and conducting analysis on it, as well as an overview of the data security measures in place and how they are enforced.

After training, your DSO is permitted to upload your project's data in a manner compliant with the agreement of the data provider. At this point, PIs and Research Staff may begin their analysis on the SDE. 

If you haven't requested to have your project retired and deleted within a year's time, CUIT will reach out to confirm if you'd like to extend your SDE contract for another year, at the annual fee.

Representative Projects

Restricted Datasets

  • Add Health

The SDE hosts several projects using the University of North Carolina’s Adolescent Health (Add Health) restricted datasets. Projects have focused on a variety of areas, including the relationship between genetic factors and social outcomes. Examples include projects examining phenotype distinctions and social mobility among second-generation Latinos.

  • NLSY

The SDE has hosted several projects using the Bureau of Labor Statistics National Longitudinal Survey of Youth (NLSY), covering a range of research topics, such as the degree of transmission of economic advantage.

Government Datasets

  • VAT Tax Project

In agreement with a city government in South America, Columbia researchers are analyzing the economic and fiscal impacts of special tax treatments in Value Added Tax (VAT) systems using government-provided tax data.

  • State Health Data

In a multi-part study leveraging state-level Department of Health data, researchers at Columbia are using the SDE to compare and analyze the effects of prenatal care on subsequent fertility.

Proprietary Data

Researchers who collect sensitive PII and PHI data may use the SDE to securely analyze that data with their project team, provided the data meets the research design criteria approved by the IRB and is used within the parameters of the consent granted by the subjects. Several projects have moved their survey and even lab findings to the SDE for analysis by their group.

Data Stewardship

Some researchers have used the SDE as a way to securely provide curated sensitive data to students and other junior researchers.

FAQ

Yes, multiple users can log in at once. Each user receives their own SDE account and connects to their Windows or Linux desktop via Citrix.

Only the users on your project will have access to the same data. Each project's data storage is split among several drives, some of which are individual and some of which are shared among the project.

Yes, Access is available along with the entire MS Office Suite. You can see a list of typically available programs here and we can discuss installing special programs (if you have a license) if needed.

No. The SDE is completely isolated; there are no network capabilities on SDE machines.

No. Because there is no network connection (the virtual machines are air-gapped), it is not possible to connect to applications over the network.

Yes. Nearly all projects that use restricted datasets need an IRB protocol. If you believe your project is an exception, you can confirm that by asking to have your project approved by the IRB as exempt and providing that documentation.

At some point after leaving Columbia (graduation, retirement, moving jobs), a user's VPN access is rescinded and they will no longer be able to access the SDE. To maintain access, the user should speak to their local HR person about how they can maintain UNI access with VPN privileges after they leave Columbia.

Yes, as long as the departing user's data is properly removed, and the new user is properly onboarded.

No. Your DUA needs to contain the signatures of SPA (or Barnard/TC representative, if that is the case) and a representative from the data provider.