Research Data Transfer — Globus

Secure, efficient and reliable file transfer service for large data transfers within Columbia and to external collaborators.

Globus is a unified, high-performance data-transfer and sharing platform that lets you move large and complex datasets directly between any two applications, systems, or local machines, without having to download the data and then upload it again. You can use Globus for…

  • Data transfers between HPC Clusters or your servers.
  • Data transfers between a server and your laptop.
  • Transferring / sharing data with researchers and collaborators at other institutions.
  • Data transfers between supported cloud storage applications and any of the above.
  • Automated data transfers between any of the above.

Transfers happen unattended (with a confirmation email when complete), data verification is on by default, and encryption is enforced.
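
If you would rather script a transfer than use the web app, the request can be described by a small JSON document. The Python sketch below builds one with checksum verification and encryption switched on. The field names follow the Globus Transfer API's "transfer" document as we understand it; the collection IDs, the paths, and the helper name build_transfer_document are hypothetical placeholders. A real submission would also need an authenticated client and a submission ID issued by the service, neither of which is shown here.

```python
import json


def build_transfer_document(source_collection, destination_collection, items,
                            label="example transfer"):
    """Return a dict shaped like a Globus Transfer API 'transfer' document.

    items is a list of (source_path, destination_path, recursive) tuples.
    This only builds the document; it does not contact Globus.
    """
    return {
        "DATA_TYPE": "transfer",
        "source_endpoint": source_collection,
        "destination_endpoint": destination_collection,
        "label": label,
        "verify_checksum": True,      # verify data after it is written
        "encrypt_data": True,         # encrypt the data channel
        "notify_on_succeeded": True,  # email when the task completes
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": recursive,
            }
            for src, dst, recursive in items
        ],
    }


# Placeholder collection IDs and paths, for illustration only.
document = build_transfer_document(
    "00000000-0000-0000-0000-000000000001",
    "00000000-0000-0000-0000-000000000002",
    [("/data/run42/", "/archive/run42/", True)],
)
print(json.dumps(document, indent=2))
```

Keeping the document as plain data makes it easy to review or log what a scheduled job is about to send before anything is submitted.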

Announcements

  • RCS now offers a dedicated, managed Globus endpoint for users who need collections to transfer their data but do not have the IT resources or expertise to set up their own endpoint to host the collection. This is a good solution for moving data from an SRCPAC HPC cluster to Box, AWS S3, Google Drive, etc. If you need a collection, email [email protected] to begin the process (a ServiceNow ticket will be created automatically in your name).
  • A Globus request form is now available for CUIMC users who need a Globus Collection (or for any Columbia users transferring sensitive data).

Globus terminology

Globus uses its own terminology. High-level definitions are provided for quick reference; see Handling Collections vs Endpoints for more detail.

Why use Globus to transfer and share data?

  • Fast: If your transfers involve large files or large collections of files (TBs, or even PBs) and take 15 minutes or more, Globus is highly recommended to speed them up. Globus is an efficient alternative to scp, sftp, and rsync over ssh, which are best suited to small datasets.
  • Reliable: If your data transfers may be interrupted due to an unreliable connection or exceeded disk quota, Globus is a great solution since it automatically resumes your data transfer in the case of temporary disconnections.
  • Secure: Globus integrates with the grid security infrastructure and adds encryption to both the data and control channels for moving data between two endpoints (e.g. your computer, HPC clusters, Google Drive, OneDrive, etc.). The data moves directly between the source and destination endpoints; it cannot be accessed or stored by Globus, only by the GridFTP servers running on your managed endpoints.

NOTE: At this time, all CUIMC users, and any users with CUIMC data, must request a collection via CUIMC's Globus Request form, which is certified for sensitive data (RHI, PHI, PII). Morningside users with sensitive data can set up their own "high assurance" endpoint following a risk assessment from the CUIT Risk Management team.

  • Convenient: With Columbia's Globus Connect and Open Access subscriptions, you can create a data-sharing endpoint on almost any device: your laptop or personal desktop, campus HPC clusters, lab servers, Google Drive, Amazon S3 bucket, Box, OneDrive, and more.
  • Collaborative: You can securely transfer data both in and outside of Columbia using Globus. The basic Globus transfer service is free for all non-profit organizations, so transferring data to external collaborators outside of Columbia is likely free for them as well!

How do I get started with Globus?

If you are new to Globus, follow these steps to create your Columbia Globus account:

  1. Navigate to the Globus login page.
  2. Select Columbia University from the drop-down (you can type the first letters to narrow results).
  3. Log in with your UNI and UNI password, and authenticate with Duo.
  4. Select your preferred permission level for releasing your account information to Globus. Many users select the middle option.
  5. To join Columbia's Globus subscription, see below.

If you already have a Globus account from another organization, log in as described above and choose Link to an existing account. The Identity Linking Tutorial explains in detail how Identity Linking works.

  1. Request access to the Columbia University Standard subscription.
  2. While you wait to be approved, download Globus Connect Personal to set up a data transfer endpoint on your own Mac, Windows or Linux system. 
  3. Optional: Follow Globus' tutorial to practice sharing data.
  4. Optional: If you plan to share data from your computer directly with another Globus user, you must enable sharing in your Globus Connect Personal app. Click the Globus app icon (in the upper-right toolbar on Macs, the lower-right toolbar on Windows), select Preferences, choose the Access section, and check the Sharable box.

1. Log into Globus with your @columbia.edu identity.

2. Open Globus Connect Personal on your computer (see above to install GCP).

3. Navigate to the File Manager in Globus from the left-hand navigation panel.

4. Enter the name of your Globus Connect Personal collection at the top of the left panel (or vice versa). Tip: the name of your collection can also be found under Bookmarks --> Your Collections

5. Enter "CUIT Ginsburg Google Drive" at the top of the right panel (or vice versa).

Globus File Manager screen with Personal Collection and CUIT Ginsburg Google Drive collections entered as endpoints

6. On the left, select the file(s) you would like to transfer.

7. On the right, select the destination where you would like the files to be transferred to (MyDrive is the top-level location for LionMail Drive). If you don't select a specific folder, the file(s) will be dropped in the generic top-level Drive location.

8. Click the Start button at the top on the side you will be sending the data from. You will see a pop-up indicating that the transfer is in progress.

9. You will receive an automated email from Globus Notification <[email protected]> when the transfer is complete. You can also monitor progress using the Activity page in Globus (accessible from the left-hand navigation panel).

Globus File Manager with left-hand Start button circled and "Transfer request submitted successfully" pop-up on right

1. Log into Globus with your @columbia.edu identity.

2. Open Globus Connect Personal on your computer (see above to install GCP).

3. Navigate to the File Manager in Globus from the left-hand navigation panel.

4. Enter the name of your Globus Connect Personal collection at the top of the left panel (or vice versa). Tip: the name of your collection can also be found under Bookmarks --> Your Collections

5. Search for the name of the CUIT HPC cluster at the top of the right panel (or vice versa). All users that have an HPC account will have automatic access to their cluster's collection.

Globus Web App page asking for permission to connect to CUIT HPC cluster collection

6. Once you select the cluster, you will need to authenticate your HPC account within Globus. Click Allow.

7. On the left, select the file(s) you would like to transfer.

8. On the right, specify the destination where you would like the files to be transferred to.

9. Click the Start button at the top on the side you will be sending the data from. You will see a pop-up indicating that the transfer is in progress.

10. You will receive an automated email from Globus Notification <[email protected]> when the transfer is complete. You can also monitor progress using the Activity page in Globus (accessible from the left-hand navigation panel).
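
The same laptop-to-cluster transfer can also be scripted. The sketch below uses the Globus Python SDK (globus_sdk) and assumes you already hold a Transfer access token from a Globus login flow, which is not shown. The collection IDs, paths, and the helper names build_items and submit_transfer are hypothetical, and the authentication step described above may still need to be completed for the cluster collection before a scripted submission is accepted.

```python
def build_items(local_dir, cluster_dir, names):
    """Pair each file name with its source and destination path."""
    src_root = local_dir.rstrip("/")
    dst_root = cluster_dir.rstrip("/")
    return [(f"{src_root}/{name}", f"{dst_root}/{name}") for name in names]


def submit_transfer(access_token, source_collection, destination_collection, items):
    """Submit one transfer task and return its task ID.

    Assumes access_token is a valid Globus Transfer token obtained elsewhere.
    """
    # Imported here so build_items can be used without the SDK installed.
    import globus_sdk

    client = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(access_token)
    )
    task_data = globus_sdk.TransferData(
        source_endpoint=source_collection,
        destination_endpoint=destination_collection,
        label="laptop to cluster",
        verify_checksum=True,
        encrypt_data=True,
    )
    for source_path, destination_path in items:
        task_data.add_item(source_path, destination_path)
    response = client.submit_transfer(task_data)
    return response["task_id"]


# Hypothetical paths; nothing is submitted until submit_transfer is called.
items = build_items("/Users/me/results/", "/home/uni/results", ["a.csv", "b.csv"])
print(items)
```

As with a transfer started from the File Manager, the task runs unattended once submitted, and its progress appears on the Activity page.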

FAQ

If you (or your department) don't have the IT expertise or resources to establish your own Globus server (aka Globus endpoint), you can reach out to CUIT Research Computing Services by emailing [email protected], and we can help you set up a Globus collection (e.g. for Box, Google Drive, AWS S3, or other storage) on our managed Globus endpoint.

Please bear in mind:

  • RCS' managed endpoint is under Globus' "Standard" subscription, which does not allow sensitive data of any kind.
  • Any collections that RCS establishes are meant to be short-term; however, feel free to reach out and we can help you strategize on a longer-term solution.

Globus uses GridFTP, a high-performance extension to FTP that is optimized for high-bandwidth, wide-area networks and provides more reliable, higher-performance file transfer and synchronization than ftp, scp, or rsync. GridFTP automatically tunes parameters to maximize bandwidth by selecting the most appropriate settings for concurrency and parallelism on every transfer task.

That said, Globus transfers are still subject to your local environment's constraints, including:

  • Local network speed (run a speed test to check your current speed)
  • Endpoints: Transfers involving a personal endpoint are likely to be slower than transfers between institutional endpoints. If you are transferring to a storage device, then SSDs with USB-C or USB 3.0 connectors are recommended to optimize speed (rather than HDDs or SSDs with older connectors)
  • Resources: The load or available resources (RAM, CPU, etc.) of the source and destination collections
  • Storage systems: The performance of the source and destination storage systems
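
To get a rough feel for how these constraints interact, you can treat the slowest stage as the limit on the whole transfer. The Python sketch below does that arithmetic. The rates are made-up illustrative numbers, not measurements of any Columbia system, and the estimate ignores protocol overhead and the extra cost of moving many small files.

```python
def estimate_transfer_minutes(size_bytes, rates_mbps):
    """Estimate transfer time when the slowest stage limits throughput.

    rates_mbps maps a stage name to its sustained rate in megabits per second.
    Returns (name of the bottleneck stage, estimated minutes).
    """
    if not rates_mbps:
        raise ValueError("at least one stage rate is required")
    bottleneck = min(rates_mbps, key=rates_mbps.get)
    rate = rates_mbps[bottleneck]
    if rate <= 0:
        raise ValueError("rates must be positive")
    seconds = size_bytes * 8 / (rate * 1_000_000)
    return bottleneck, seconds / 60


# Illustrative numbers only: a 480 GB dataset and three assumed stage rates.
stage, minutes = estimate_transfer_minutes(
    480 * 10**9,
    {"local network": 400, "source storage": 1600, "destination storage": 2400},
)
print(f"{stage}: about {minutes:.0f} minutes")  # local network: about 160 minutes
```

In this example the local network is the limiting stage, so a faster disk at either end would not shorten the transfer.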