Secure Data Enclave — SDE
A secure virtual desktop research environment.
The Secure Data Enclave (SDE) provides Columbia researchers with a secure, remotely accessible, virtual Windows 10 desktop environment to store and collaboratively analyze PII and PHI data as an alternative to traditional "cold room" computing environments.
Using a web browser, researchers can work on sensitive data and collaborate with other members of their project simultaneously. Researchers can access only the data explicitly placed in the virtual environment, which is destroyed after use and restricted so that it can reach only other systems within the SDE. Data can be transferred to and from the system only by the designated Data Security Officer (DSO), a role required for each project.
SDE Details
The SDE is certified by CUIMC Security as HIPAA-compliant, and is certified for the storage and analysis of PII and PHI data. Users can reference the CUIMC Security RSAM registration ID number (3868), which confirms the SDE's certification by CUIMC Security for HIPAA compliance.
Additionally, the SDE has been approved for use with popular restricted datasets, including the Bureau of Labor Statistics National Longitudinal Survey of Youth (NLSY) datasets, University of North Carolina Longitudinal Study of Adolescent Health (Add Health) datasets, and European Commission Eurostat restricted economic datasets.
Researchers on the SDE have access to a virtual Windows 10 desktop system with 4 cores of an E5-2680 CPU and 16 GB of RAM. Storage is divided between shared project storage and individual user storage. A standard project is given 60 GB of data storage and 25 GB of collaborative group workspace. Each user gets 20 GB of working directory space, a 2 GB Home Directory for code files, and an additional 2 GB to stage files that need to be moved off the SDE.
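As a rough illustration of how these allocations add up, the sketch below totals the storage for a standard project. It assumes the per-user allocations are counted once for each user and are added on top of the shared project allocations; the function name and the user count in the example are hypothetical, and actual provisioning is determined by RCS.

```python
# Storage figures quoted for a standard SDE project, in GB.
PROJECT_DATA_GB = 60      # shared project data storage
GROUP_SPACE_GB = 25       # shared collaborative group workspace
USER_WORKING_GB = 20      # per-user working directory
USER_HOME_GB = 2          # per-user Home Directory for code files
USER_STAGING_GB = 2       # per-user staging area for files leaving the SDE


def total_storage_gb(num_users: int) -> int:
    """Estimate total storage for a project with `num_users` users.

    Assumption (not confirmed by the SDE documentation): per-user
    allocations are additive to the shared project allocations.
    """
    if num_users < 0:
        raise ValueError("num_users must be non-negative")
    per_user = USER_WORKING_GB + USER_HOME_GB + USER_STAGING_GB
    shared = PROJECT_DATA_GB + GROUP_SPACE_GB
    return shared + num_users * per_user


if __name__ == "__main__":
    # Hypothetical project with three users.
    print(total_storage_gb(3))
```

Under these assumptions, each user accounts for 24 GB and the shared space for 85 GB, so a hypothetical three-user project would total 157 GB.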
If increases in CPU, RAM or storage resources are necessary, reach out to [email protected] to request a review by RCS. Such changes may incur additional fees to acquire and provide said resources.
The Research Computing Services (RCS) team handles software installation and updates, but user licenses must be provided by the project. Currently the SDE supports many statistical and analysis packages, including Stata, R, Stan, QGIS, and more. Other programs used in the past, depending on licensing availability, include SPSS and SAS.
The standard offering is four accounts: Principal Investigator (PI), two Research Assistants (RAs), and Data Security Officer (DSO). For both security and system access volume, we ask project applicants to restrict project membership to as few users as necessary. If more researchers require access, this can be accommodated, but additional costs may apply.
Users must have a UNI and VPN access to use the SDE. Outside collaborators can get a UNI and VPN access through appropriate department-level HR status. CUIT’s RCS team manages accounts for the SDE for Columbia-affiliated users.
Researchers using the SDE system must identify a Data Security Officer (DSO). Often this is someone from the researchers' local IT group. The DSO will need to be added to the project's IRB protocol (if applicable). The DSO is responsible for:
- Loading and removing the restricted-use data
- Retrieving output on behalf of their project members
- Ensuring that all materials exported from the SDE do not violate the data use agreement or their project’s data handling requirements
- Conducting a training session for the researchers on how to securely access and manage their project data stored on the SDE
The SDE is priced at $552 per project, per year, which includes two user accounts and one Data Security Officer account. Discounts are available for bulk purchases. Contact [email protected] to discuss.
Project Onboarding Process
Send an email to [email protected] to get started. Please provide:
- Your UNI and department/school at Columbia
- The name of your data-provider and the type of sensitive data you expect to receive (e.g. PHI, RHI)
- The name(s) of your PI (if not yourself)
- Any questions you may have
A member of CUIT's Research Services department will get back to you to go over your information and review the SDE requirements and restrictions.
After confirming the SDE is a good fit for your project, you will need to gather updated paperwork:
- Proof of data provider approval for using the SDE (if the data provider is a non-Columbia entity). Typically this takes the form of a data agreement (DUA or DAA), modified to stipulate that the SDE will be used and signed by both the data provider and SPA. If no such formal agreement exists, written approval from the data provider must be obtained.
Generally, you should discuss with the data provider what data security information they need to include in their DUA/DAA. Please reach out to Research Services at [email protected] for the SDE Data Security Plan and assistance with language to provide Data Providers and Columbia IRB.
If your data provider is within Columbia, please provide documentation of this.
- Proof of IRB approval* for using the SDE. If you have an existing IRB protocol, it must be modified to stipulate the SDE is being used. For the "System ID numbers" sub-question, you should reference the CUIMC Security RSAM registration ID number, 3868, which confirms the SDE was certified by CUIMC Security for HIPAA compliance. You should also add all users that will be accessing the SDE to the IRB protocol, including any Research Staff and the DSO.
After approval, you can provide proof via a PDF copy of the approval email or the downloaded protocol "data sheet".
*Alternatively, you can provide proof of IRB protocol exemption.
To formally apply for the SDE, complete this Qualtrics SDE application form and submit it to CUIT Research Services. The form requires:
- Baseline project information
- Names and contact information for all PI(s), Research Staff and DSO
- Document upload: Proof of data provider approval with SPA signature for using the SDE (if data provider is a Columbia entity, attach proof of this instead)
- Document upload: Proof of IRB approval/exemption for using the SDE
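The application requirements listed above can be treated as a simple checklist before submitting. The following is only an illustrative sketch: the `SdeApplication` class, its field names, and the `missing_items` helper are hypothetical and do not correspond to the actual Qualtrics form, and the sketch assumes that at minimum a PI and a DSO contact are needed.

```python
from dataclasses import dataclass, field


@dataclass
class SdeApplication:
    """Hypothetical stand-in for the items the application form asks for."""
    project_info: str = ""
    # Maps a role ("PI", "Research Staff", "DSO") to a list of contacts.
    contacts: dict = field(default_factory=dict)
    # For a Columbia data provider, this would be proof of that status instead.
    provider_proof_uploaded: bool = False
    irb_proof_uploaded: bool = False


def missing_items(app: SdeApplication) -> list:
    """Return a list of checklist items that are still missing."""
    missing = []
    if not app.project_info.strip():
        missing.append("baseline project information")
    # Assumption: every project needs at least one PI and one DSO contact.
    for role in ("PI", "DSO"):
        if not app.contacts.get(role):
            missing.append(f"contact information for {role}")
    if not app.provider_proof_uploaded:
        missing.append("proof of data provider approval")
    if not app.irb_proof_uploaded:
        missing.append("proof of IRB approval or exemption")
    return missing


if __name__ == "__main__":
    draft = SdeApplication(project_info="Example project",
                           contacts={"PI": ["A. Researcher"]})
    for item in missing_items(draft):
        print("Missing:", item)
```

An empty list from `missing_items` would suggest the application is ready to submit; Research Services remains the authority on whether an application is complete.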
After receiving a complete application, Research Services will generate a custom SDE User Agreement. All PI(s), Research Staff, and your DSO must sign this document (Adobe Signature is accepted).
This agreement, along with your ARC chartstring information for payment ($552/year/project), should be uploaded to this Qualtrics user agreement and payment form.
Research Services will set up a time to train you and your DSO on how to use the SDE. Training covers the basics of accessing the SDE and conducting analysis on it, as well as an overview of the data security measures in place and how they are enforced.
After training, your DSO is permitted to upload your project's data in a manner compliant with the agreement of the data provider. At this point, PIs and Research Staff may begin their analysis on the SDE.
If you haven't requested to have your project retired and deleted within a year's time, CUIT will reach out to confirm whether you'd like to extend your SDE contract for another year, at a cost of $552/project/year.
Representative Projects
Restricted Datasets
- Add Health
The SDE hosts several projects using the University of North Carolina’s Adolescent Health (Add Health) restricted datasets. Projects have focused on a variety of areas, including the relationship between genetic factors and social outcomes. One example is a project testing phenotype distinctions and social mobility among second-generation Latinos.
- NLSY
The SDE has hosted several Bureau of Labor Statistics National Longitudinal Survey of Youth (NLSY) projects covering a range of research questions, such as the degree to which economic advantage is transmitted.
Government Datasets
- VAT Tax Project
In agreement with a city government in South America, Columbia researchers are analyzing the economic and fiscal impacts of special tax treatments in Value Added Tax (VAT) systems using government-provided tax data.
- State Health Data
In a multi-part study leveraging state-level Department of Health data, researchers at Columbia are using the SDE to compare and analyze the effects of prenatal care on subsequent fertility.
Proprietary Data
Researchers who collect sensitive PII and PHI data may use the SDE to securely analyze that data within their projects, provided the data meets the research design criteria approved by the IRB and is used within the parameters of the consent granted by the subjects. Several projects have moved their survey data and even lab findings to the SDE for analysis by their group.
Data Stewardship
Some researchers have found the SDE useful as a way to securely provide curated sensitive data to students and other junior researchers.