Google Cloud presents: GCP Fundamentals for Research
Transcript
All right, that's 1:02. I know people will continue trickling in, but we have a packed agenda, so we want to get started. So, good afternoon everybody. I'm Jessica Eaton. I'm from CIT research computing services and I'm happy to see you all here for our Google cloud fundamentals session. my team research computing services organizes research computing trainings and workshops like this one and our primary mission is to support Colombia's researchers by providing central services like HPC access secure analysis enclave for sensitive large data transfer tools and just consulting in general.
So, please email us if you are trying to figure out research resources at Colombia and we'll be happy to point you in the right direction, even if it's just not us. and here is our agenda for today's session. I'm not going to spend a time reading this because it's packed, but we plan to save questions for the end. And you can feel free to drop questions in the chat as we go and we'll come back to them unless they're very easily answerable in the chat. we chose the topics from the registration email survey and if there's something you want to dig into more later or something missing, you can let us know in the feedback survey after the session.
Our main presenter today will be Russ Goldenbroit, our partner at Google that we meet with at least monthly. Russ has worked at Google for nearly 10 years and is a principal solutions architect specializing in cloud computing. We also have our Google account executive Nikki Bernstein here to help answer questions. And then finally, a quick disclar This session is being recorded. It will be posted online on a UNI protected web page which I'll share with you guys in a follow-up email. Before I give the mic to Russ, I'm going to kick us off with some Colombia specific information about Google Cloud.
So Colombia has a formal business associate agreement with Google which means we get access to all the GCP modules and we provide some extra security and privacy protections on default as well as integrating the GCP login with Colombia's single sign on with your uni and payment with Colombia's ecosystem. comes the technicalities. you have to be a Morningside faculty or staff member to be able to leverage Colombia linked GCP projects and you must provide a Colombia arc chart string which will be used to pay for your cloud usage. So if this is you, you can create a project via the request form linked on CIP/GCP page which I just dropped.
the rates that are charged are in sync with Google's public cloud pricing. so I know from looking at the registration that many people here are students who do not have access to a chart string or they may be staff or member faculty members that maybe they do have a chart string but they're concerned about running up costs. So in these cases postocs and grad students can remember that their PIs often can get one of these Colombia linked GCP projects for their labs. But for everyone else Russ is going to go over some resources that students or researchers can use to apply for free GCP credits.
also towards the end of the webinar, he's going to go over how to analyze your billing and how to set up GCP budgets and alerts so you can sort of reign in runaway cost. and again, I'm going to send you all the links after the session. So, You can just pay attention now. I'll send it over to Russ for the fun stuff.
the fun stuff. Very nice. thank you for the introduction Jess so it's a pleasure to be here with all of you today. So as Jess was mentioning just a sort of introduction I am working with Google as a cloud engineer for quite some time. and in that time primarily the people and organizations that I've been working with are a lot of different educational institutions, academic medical centers, researchers. so I have a plethora of use cases and experiences that have gone on through the years.
my whole goal is really to educate during this session and maybe really go over some of the more basic foundational things and things that have been of value to different organizations like yours throughout the years of my tenure at Google that you may or may not be aware of. So without further ado, I'm going to start sharing my screen and Jess mentioned this is being recorded. So you won't have to actually I guess take notes but you can refer to this but we'll get the links al together as well. So the first thing that I wanted to start out with is I'm assuming that you can see my screen. is that accurate? Good. Awesome.
So the first thing that I wanted to go over is the fact that Google is one of several major providers out there and so it is likely that individuals who have any background in cloud computing have probably worked with some of our competitors like AWS or Azure. this is public documentation that Google posts to essentially make it easier for individuals to be able to map services that they may be used to using in either AWS or Azure to the equivalent in Google Cloud. to give some examples and it's filterable.
So for example if you're looking for let's say a compute offering for example you just need access to virtual machines you can easily filter the list and look at the various offerings that are available Google's core offering be being compute engine but perhaps you're more familiar with Amazon's EC2 or what Azure virtual machines which is what Microsoft calls it This is just an easy way for you to be able to sift through the equivalent resources and services so that you can get a sense of what services do I need to know of and begin with if I'm used to working in a different cloud provider.
So that can be very helpful and we'll link to this resource but wanted to sort of start out with that assuming that you may be using other cloud providers for various tasks the other thing that Jess mentioned this is something that I'm very familiar with just because of my background in working with researchers across the United States at different universities ies, academic medical centers. essentially Google understands that, research costs money and, researchers tend to have, either grants or perhaps the funding is a little bit more scarce, especially during, some of the times more recently.
So Google has a research credits program that is exclusively for either faculty, PhD researchers where you can apply for a grant of credits. I believe it's up to $1,000 for Ph students and postocs and for faculties, staff, PIs, you can get up to $5,000 in GCP credits. And basically it's not meant to fund the full research of course.
It's meant to basically build the scaffolding do benchmarking testing to essentially make sure that everything is good so that when you're ready to run larger scale computational tasks and perhaps that you have grant funding for a lot of the foundations will probably be set up already. and the form is actually quite easy. So, it's really just a simple form that gets submitted with a handful of questions about your project and some of the details about the Google Cloud account that you'd be working in.
and then once that gets approved, the credits get sent to the billing account that's attached to your Google Cloud project and then you'll have those credits available to you for the work that you're doing. So at a high level that is the research credits program. so I think that that will be probably of use to a good majority of you that are working in the research space. after this I wanted to talk a little bit about I wanted to get into the heart of it. So as Jess was mentioning we have a packed agenda.
and I understand that some individuals here are most likely not as familiar, perhaps never even seen anything in the Google Cloud ecosystem or environment. And so I wanted to go over some high-level concepts, show you what it's like to work in the console, and then go over a number of different examples, make it very live for you to really be able to see what it's like to do certain tasks that really are quite common when people are trying to either run jobs or spin up applications.
so that's my goal is to get you a little bit more comfortable from that perspective. So what we're going to be focusing on today and I'm going to go over this at a very high level first so that you understand what this is about. the first thing that you need to understand and this is a little bit different from the way that AWS works or Microsoft. So if you're familiar with other cloud providers this will be a little bit different but as you can imagine the source of identity for cloud is basically your Google cloud accounts. So in the back end you have identities that are provisioned from by Colombia that give you access to essentially any of these GCP console accounts that allow you to spin up virtual machines and things of that nature.
I should say that I'm not sure how the internal provisioning works at Colombia, but we've worked with the team and so that I'm sure you can reach out to the central IT group here if you need accounts provisioned like Jessica was mentioning earlier. but your identities are provided here. Google cloud has a specific way in which projects or basically if you're trying to do research or anything and you need access to computational resources the way it works is your organization Colombia as a whole has an organization this is my demo environment so you see my name at this fake domain
But Columbia has an organization that is dedicated for the overarching administration for all the different projects and resources groups and different applications and researchers have within the university. And then underneath all of that is where all of your projects will live. So very high level, but these are all obviously demo projects for that I've been working with. But in a general sense, when you go in and decide you're going to do some sort of new project, new app, etc.
you will get access to a project basically by filling out a few different pieces of information and one of them being is the billing account which I'm going to go into a little bit more detail but this billing account is where all of the costs are managed and I'll talk about that in a little bit more but the project is where
your apps, your workloads, everything lives and it's a resource segmentation. That's what it does. It basically separates your individual projects from maybe the person next to you or somebody else in your department. And you can create as many of these as you want. but generally speaking, this is how the segmentation works. So going to go back. this is the homepage essentially. So based on what I've been talking about, we have a highle project name that I'm working in. I wanted to just point out a couple of different things that you'll be most likely working with if you spin up any type of resources in Google Cloud has a number of different ways that you can interact with resources. So the first way is the guey. So it has a pretty robust guey.
So, if you actually click on this hamburger menu on the left side and you look at all pro products, you can actually literally just kind of scroll through and click whatever you're interested in talking about or working on I should say. And then at the top, since Google is a search company, if you weren't familiar with something and you didn't know know where it existed, you could actually search for it in here. So, if I really wanted to, I can type in compute engine and I could actually see all of the possibilities that I could search for that have compute engine and I can then click on it and that will actually bring me to the guey area where a lot of my let's say virtual machine instances are sitting and different metrics around the virtual machines. So, at a high level the guey exists.
The second way that you can actually interact with all the resources is by clicking this button at the top. It says activate cloudshell and this is just an authorization. cloud shell is essentially a DBN instance that is running in the background. not specifically within your project, but it's tied to your project. And it basically allows you to enter different CLI commands to interact with the variety of different resources that are occurring in your organization.
there's an SDK called Cloud which is essentially the SDK that you can use either here or via Python libraries or a number of different libraries that allow you to script and do more programmatic interaction with different Google Cloud resources. before I start going in and showing you the first thing I'm going to show you is actually the virtual machines because I think that's actually a good starting point for a lot of individuals when you're running whether it's computational jobs or standing of an application on a VM.
this is sort of like where people begin usually before they get into a lot of the managed services that I'll talk a little bit about. But generally speaking, I wanted to give you a taste of what it's like to actually work in the console, perform several actions that a lot of individual researchers might work with. So, let me quickly Whoops. I have a script that I want to show you guys for demonstration purposes.
So I created this so for demonstration purposes this is an example of I don't know if you guys can see my cloud shell good okay so in here we actually can see the SDK working as in cloud shell and for example if I wanted to create a virtual machine. In this case, I'm calling it demo VM. I'm specifying the type of machine where I want it located. I'm basically all of the details of this machine are flagged.
and this will I mean a lot of researchers or statistitians or people that are comfortable with essentially programmatic languages prefer to work in cloud shell or in these types different scripting libraries so that they're able to automate a lot of functions like creating virtual machines or other services. and so I just want to give you a taste. Actually, in the interest of time, I actually created this instance of so I'll just show you the result. but if you actually go to view instances, you can actually see all of the various virtual machines that I have running. One of them being this demo VM that I created.
just so when you do I could have created this completely in the guey which is quite a nice feature that Google having worked with the other providers it is particularly nice in terms of how the guey is laid out for individuals that are maybe not as familiar with the libraries or more of the programmatic ways to interact. and so it's nice where re let's say I was going to create a new virtual machine with a specific configuration. I can easily first of all just click around and see all of the specs for the variety of different machines that might be of interest to me.
but you also get to see a monthly estimate which is obviously very important to understand because as a researcher just in general we're all cost constrained. So obviously spinning up this type of instance that costs $36,000 a month. It's probably good to know to have an estimate of how much this thing is going to cost compared to some other virtual machine. So that's always very nice. the other thing that's really nice is part of the advanced configurations. So what's nice about Google Cloud is you're able to create custom machines. based on a specific series meaning basically it's a CPU platform that's being used under the hood.
you can specify the exact amount of cores and memory that you require for that machine so that you don't overutilize the resources. that's just something worthy to note. The other thing is that's great about this is that after you've configured it, you can actually click this button at the top which is the equivalent code and it provides you essentially what you've selected in three different formats either CLI Terraform if you're familiar about automating it. I'm not going to get into that right now but usually at a scale that's very helpful and of course REST API. So this can be very helpful once you get the hang of setting up the different customizations that are available.
So the machine configuration, the o the storage, the security, the networking, everything is relatively simple clickth through or if you are more well-versed in CLI then you can use the cloud shell. Now outside of this I think what's very helpful to take a look at is the ways in which the different services that interact with each other. So let's say you have this virtual machine, this demo VM that's stood up. I'm going to go back and you essentially want it to do something.
a lot of times when I'm dealing with researchers a lot of researchers historically have stored data in let's say Google Drive and so if I actually like so for the purposes of this demonstration I have a Google Drive account the identity is actually connected to the same Google cloud
code instance and I just created a sample text file. It just says this is the test file. But the idea here is that let's say you have an application that a bunch of could be anything. It could be like genomics data or some other test data that you're storing in Google Drive for some reason. there are other options but you want to move it into the cloud so that you can do let's say some sort of analysis or computational assessment on it. the great part I think about just in general the Google cloud ecosystem and its tiein to workspace is that you either through CLI or through let's go back to where's that VM instance? Okay, here it is.
through some scripting means you can actually scripted to essentially move data from let's say one place to another so that you can run different computational tasks. So for this purpose Google has an very simple object level storage that is available to store large volumes of data at a relatively cheap price. Meaning this is incredibly cheap to store data I have a variety of different things buckets in here that I've used for a variety of different demos that I've worked with.
but what I actually want to do here is take that Google Drive file that I had created. Where is that? that I had created that's in Google Drive. And I want to be able to essentially move that file over to Google Cloud Storage so that I could do some sort of analysis or whatever on it using a bunch of the to Google Cloud tools that we have. so again I created a script here. but generally speaking what this is doing is that the script is reading from Google Drive and it's copying it over to a Google Cloud Storage bucket.
So that if I hit enter so I've now saved that file on my cloud shell and then I can run this Python script right that's located in on here it's basically going to look for that test file that text file
and move it into the storage bucket. And so after you're running it on cloud shell it's essentially transferring that file over to here or hopefully it does I have to go back. whoops let's see where did it drive to So okay here it is. And so you can actually see it here. and this can be used for a lot of computational tasks what have you. very powerful in general just to be able to do different analyses. simple if you're quite simple if you're very used to working in cloud environments but it's really just to convey how do you start working with the different tools that are in Google Cloud.
So, in general, I'm going to leave the virtual machines where they are at the moment and I'm going to move over to something that may be more interesting to data analytics individuals. So if you're a statistician or you're working in a variety of different statistics programs, you're probably familiar working with things like Jupyter notebooks. and what I wanted to point out here is if we go into so Google Cloud has a product that's actually changed names many different times, but I believe it's now called workbench is essentially managed Jupyter notebooks.
So for people that are trying to do any type of analysis or create something that's sharable in terms of I'm going through a certain number of let's say some tests Jupyter notebooks are a great mechanism to essentially save the flow of work that you're doing and be able to share it with other researchers or contributors that can replicate it or use it for one of their own use cases or whatever it may be. so workbench is this managed Jupyter notebook.
the first thing I want to just preface here is that why is this interesting and why is this a good thing for researchers to really know about and the first part of it is that these notebooks run on Google cloud infrastructure. So, as you can kind of see in this column with a few of these notebooks that I have, they all have different amounts of vCPU and RAM. Kind of like what we were talking about with the virtual machines. This piece is completely configurable. So, if you're doing let's say large training of machine learning models as an instance, for example, I mean, you would need a lot more vCPU and RAM and potentially some things like GPUs.
And with workbench when you create a notebook you actually can specify a lot of the specific configurations the machine the amount of disk the type the network that you're using GPUs etc which it's very good because otherwise you'd have to spin up individual virtual machines and configure them independent of running the Jupyter notebooks. And so that this is a really powerful tool for individuals because you as an analyst or someone who's doing some sort of research into something specific do not have to worry about managing the underlying infrastructure. You can basically just kind of click through the prompts and it will create modify existing infrastructure to meet your needs. So that's a very powerful tool.
I always like to point out now inside workbench another distinction I wanted to make collab so collab is a free service that a lot of researchers tend to and statistitians etc tend to be well aware of because it offers Jupyter notebooks basically for free this is a little bit different so we're the difference between collab and there's actually collab enterprise and workbench. The biggest distinctions is the specifying the underlying infrastructure. So if you need some very robust compute and GPUs and things of that nature, those are available and modifiable for you for your notebooks. the other thing is security.
A lot of researchers, especially if you have regulated workloads that you may be working with, may have strict requirements you couldn't use the consumer version of collab with let's say medical data or things like that. Google Cloud has things like HIPPA compliance or other regulatory compliance and v a variety of different controls that essentially will provide you the mechanism to be able to control it and then do a variety of different analyses on these notebooks. So that's the one thing I wanted to point out there.
so that's said, let me actually open one of these notebooks. So I just have to figure out one which. So pardon me for a second. Let's open the last one. what's happening. what's happening here is essentially you have a variety of different virtual machines that are that run in the background that support the Jupyter Lab instances.
So the first thing that happens when you're trying to work and this is actually kind of nice because idle instances will shut down and save you cost so that if you for some reason forget the fact that you're running Jupyter Lab if you're not using it the system will realize after some time and essentially suspend it so that you're not going to pay for the virtual machine that's running. otherwise you'll have to actually close it out. and you can just actually come in here and easily start or stop instances. probably should have started this before because it does take some time to create the instance and then start Jupyter Lab on top of it.
but let's see if this is going to start up anytime soon. So you can see it's actually still provisioning the instance. what's nice again I think in general about Google Cloud is that a lot of information is available to you to get a sense of what's happening in the instance itself. So if you knew nothing about it and you were sort of just clicking around you'd be able to figure out it's provisioning right now so it's taking some time to actually set up. What's also nice is that this little right this will be your best friend when you're starting out in Google Cloud.
There's this little learn tab. This will actually understand where you're working essentially in the space that you're in. And you can actually find addition different reference documentation and training that may be applicable to the tools that you're in at the time. And so it will actually be a great help for you if you know nothing about u the tool and you need a learning experience. All right. So it looks like that this was provisioned. Let's go back and click open Jupyter Lab again. And what it does is it's going to open up the instance in a separate tab in your browser.
And so, for those that are, used to sort of the Jupiter experience, this will probably seem pretty, familiar to you. So, the first thing that we're in the launcher at the top what you're going to see is the machine that we've provisioned with it. So again this is configurable if you are finding things to be slow or taking too much time and you want to sort of beef it up you can modify the underlying infrastructure as needed.
I think somebody here probably is very familiar with the v varieties of different notebooks and things that are available. But I created a couple of different ones that are familiar to me and I think analysts for the purposes of this I'll just go into the R version. Whoops. That's the console. Let's see what I have here. So, I want to Where is that? So, I want to find my notebook for the first piece. Where is that? here we go.
or no. Is it this one? Nope. R. Let's see. here we go. So, I wanted to give an example so that this hit a little bit home. so, I created basically an R data analysis notebook to give it a little bit more substance. very high level.
and again, for many of you that are analysts or, using this on a day-to-day basis, this example may seem trivial, but what I'm really trying to sort of mark home mark here is essentially the ease in which you're able to get to work with the right supporting infrastructure in Google Cloud to do your analyses and share these notebooks and the outputs of these notebooks with the appropriate people to sort of move your research or whatever you're working on and so we have a sample our data analysis. This has a bunch of code that's being used to run through. you can just click up top here which is going to run the notebook. you don't have to worry about all the red. It's just a bunch of warnings.
But this particular code is actually looking at some built-in data sets that are part of the R distribution and doing some analyses on them in terms of let's say I think this was flower samples and there were also sample data. So it's finding different distributions averages standard deviations creating some plots and so forth and so on. there's a very large u amount of things that you can do in Jupiter and then I could then share this notebook if I wanted to with maybe a collaborator or somebody else that is working on what I'm working on and so that they could either replicate it or use it to do something else.
So this is the workbench tool. and really again what it is is this ability for you to sort of spin up these instances that have either, whatever resources you need, GPUs, storage, etc. what's really nice about this also is that since it's integrated into Google Cloud Workbench you could realistically very easily actually use Workbench in conjunction with other cloud tools. So for example I was showing before moving data from a virtual machine to cloud storage. all of the data storage for example could be in cloud storage that you're working with and maybe the analysis and all the compute resources that you're working with could be in this notebook.
And so it's very easy to create scripts and basically notebooks that would be able to interact with Google Cloud Storage or other services to really sort of accelerate your work and whatever you're sort of working on. So that is all I kind of wanted to talk about in terms of the data analytics piece around workbench. The next thing that I wanted to work talk about is something one of the cooler things that we're doing in the space and I think everybody is pretty familiar and recognizes the advent of artificial intelligence and the v variety of different tools that are available.
So Google's AI model, Gemini, if you're not very familiar, is slowly being integrated into every service. And so re finding you're going to find that it's also integrated into Google Cloud. So, if you actually come over here and you look at this little star thing, it will bring up a brand new tool called cloud assist, it's essentially just a cloud assistant to that you can query against your Google cloud instance.
So all of the data is your data meaning one else is getting insights into the underlying storage or resources but if you wanted to for example I mean it's even saying here's an example show me my compute engine insights and I wanted to just know something about what's going on in my environment. it's using Gemini to essentially query a bunch of the metadata and really provide me some information back and as you can see it's giving me code that I could easily run to get the insights. I think in the future what we're going to see is this is going to be a lot more agentic meaning it's going to be able to actually perform actions on your behalf.
So, it's going to become easier and easier for you to use natural language and essentially get what you want out of the platform. I just wanted to show that because it's very helpful in case you get stuck and you really are like, I don't know what to do here and let me ask Gemini. could be very helpful. that being said, the actual platform that Google uses, that I'll quickly go through is called Vertex AI that you may or may not be familiar with, but I wanted to just touch on it at a very high level because this is where a lot of the research is going today. this is where a lot of essentially anybody who's working on AI platforms they're probably going to be interested in taking advantage of these things.
So if you're familiar with AWS and their bedrock offering and Azure has various offerings I think through pilot Google's approach to AI is a little bit different because we actually hosting a platform we have a platform that's called vertex AI and within that platform is essentially everything an individual might need
to create and deploy an application that uses AI and that's very different from other providers because for example if you were using open AIS their entire model for the most part has been like here's an API you can use it for whatever you want but we're not giving you infrastructure we're not giving you ways to test it's just essentially here's an API that you can hit against and use the response as you'd like. Google's approach is different because they want to help essentially across the full spectrum of application deployment. And so there are a number of different tools here that allow you to essentially do things like deploy the models against a v like your applications. essentially with vertex AI studio you can actually test out a variety of the models.
and then there's model garden, which I'll walk through in a second. there actually is a platform to pick and choose a variety of different models that are available. let me show you a couple of these right now just so you can get a flavor. But if you go into Vertex AI Studio, there's a lot of different things you can do here. But the main purpose of Vert.Exi X AI Studio is actually to test and tune the different models that you may be using in your application. why this is nice that you can see if you go up here to models there's a bunch of different models that Gemini has out right now but it's not just Gemini's models that are available to you in Vert.Ex AI.
This is one of the biggest factors that's different between a lot of the providers is that you can see anthropics models that are third-party or llama's models are available and then there's other models that you can have available that are open source that you may want to bring into your experimentation or testing with. This is part of the value ad. and then why does this different from just some generic chat experience where is that you can actually adjust key factors that affect the model responses like temperature or how much of the output is sent out.
this is important because from a cost perspective, this is something you want to keep an eye on because the more input and the more output that you have the more costly it becomes. But we don't have time to go through a lot of this, but I did want to touch on some of the different tools that are available. if we go back and I just wanted to quickly show you one more thing with Vertex AI, if I can get to where is that? here it is.
So if you go to the model garden, what's nice about this is that model garden is truly a sort of compendium of different models that are available to anyone who's trying to do something with them. It's not just trying to push Gemini's model. I think Google realizes and probably most providers realize that there isn't one model that's the best. It's really b depends on the use case. and so what's nice and what's cool is especially the ones that I've worked with in the healthcare and life sciences space for example let's say you were looking at radiology type models or this is the I'm doing it by task. sorry it should be able to actually do that.
where are those? okay. So for example, let's say there's some obvious always always disclaimers when you're using a model, but generally speaking, a lot of these models that are perhaps open- sourced and deployable with a click of a button on Vertex AI and allow researchers to use them as foundational models so that they can accelerate their research without having to sort of retrain or create their own essentially.
has been very helpful to some of the researchers and groups that we've worked with. And these will continue to expand especially in the healthcare and life sciences space where we're doing a lot of good work with researchers across a lot of different universities. so in the interest of time I know we only have 10 minutes left. I did want to touch on billing because that's going to be really important. so if we go into again the search bar and we go to billing, the first thing that's going to pop up is again, a billing account is essentially your payment mechanism.
depending on how organizations decide to do it they can have chargeback mechanisms or you can just input a credit card but essentially the billing account has some sort of payment mechanism attached to it and you can attach that billing account to as many projects as you want because you could have different things you're working on. but that's how you're going to sort of spend for these associated billing accounts. So, if I go to this linked billing account, what it's going to actually buil bring up is this reporting dashboard. there's a lot of different things that are available in terms of insights and things that you can discover. whoops, I went to a different desktop. Okay, no, I'm good.
so the initial dashboard that you're going to get is essentially going to be your overall costs. So if you hover over things, it's a very quick and easy way to see how much you're spending. it also will even forecast based on historical spend. I will say that if you're using it for high performance computing or any type of computational tasks this can be less helpful unless you have done some type of benchmarking exercise. So, for example, if I ran a benchmark on March 18th and it told me that something was $16 and I know the exact specs of the job that I ran then I can use that data to sort of extrapolate how much it might cost to run the same thing at scale.
but if you don't do that then this historical context could be somewhat meaningless because this doesn't represent what you're about to run from in terms of a computational task. So I always recommend that if for researchers specifically that are doing more computational type tasks that they do benchmarking exercises. and what I mean by that is they spin up virtual machines and such or however they're doing their computational tasks. and they run jobs at a smaller scale so they can get a sense for how much it's going to cost at a larger scale. so that's one mechanism. as you can kind of see there is this concept of budgeting and alerting.
So, this is pretty I'll click on there's a section under cost management on the left under billing that is dedicated to budgeting and alerting. and basically what this is doing is I can create essentially a budget for my project or I can actually segment it however So depending on if for example I want to look at all of the projects in my organization or if I just want to click on perhaps and do one of the different projects and then I could even break it down by service. So all the v variety of different services that are in Google cloud that I'm using actively through a bunch of different demos and things of that nature.
I could actually click on, specific service just to get a sense of what is the cost that's historically been spent on that service. And then what allows me to do after I do that, I have to create a I don't know. Test. You have to obviously call it something. I just pick something at random. Whoops. I did not. Yep. And so then basically you can specify a budget in dollars for it. And what this is basically saying this is just the comparison. So if I say I'm going to $100 and it's going or maybe I should say a thousand. It's going to set that threshold essentially and the real thing what comes into play here is threshold rules.
So the email that's attached to your account as the owner essentially of the project is going to get alerted when any of these situations come true. So for example when at this point it says 50 90 and 100% of different amounts it will trigger email alerts and that will go to the variety of different users based on what you click. you can also create some very fancy notifications for example you could use pubsub which is basically a messaging buffer so that these messages could go somewhere of your choice via SMS to somebody's cell phone or a slack channel or really wherever you want.
but you can do some very fancy things around setting these alert thresholds. And then I think one of the last things that I like to talk about is really the reporting aspect. And so if you go on the lefth hand side of billing under cost management and you go to reporting you're actually going to see a ton of different information. this is in preview, so you probably wouldn't see this, but in the future, you're going to have Gemini's cloud assist that's going to actually be helping you understand metrics and you can actually use it to help build queries against your data. And that's going to make it a lot easier for you to be able to see what's happening, what are you spending dollars on.
and so what's nice about this is that you have all these filters that are available that allow you to group the variety of different SKUs, the actual time line, meaning when were these charges incurred. you can do all sorts of slicing and dicing to essentially create reports that are exportable or downloadable let's say via CSV or integradeable with a variety of different systems and you can do a lot of different and this is really helpful for example if you're trying to give this to higherups or give this to someone to so that they understand what your
work costs. maybe you want to include it as part of a grant that maybe you mark so you have this sort of benchmark that's available. and there's a lot of information available but the purpose the point is that these reports are customizable and available to the users. Now I think this is a good place to stop. we could obviously go on and on forever and I recognize that this is relatively short session for everybody involved. will be following up with some labs that are essentially contained environments that people can play around with.
some of the concepts around creating virtual machines, working with a variety of services to really just get a sense of and familiarity with Google Cloud. and so I'll be setting those up and setting it to everybody who's here so that they have access to it for probably around a month. We can't keep it forever. but that should help in giving perhaps some guidance. and I know we're pretty much at time so I don't know if there are questions, but I could also possibly answer them asynchronously if there are
We had one quick question in the chat regarding the pricing. so is it possible if you commit to a certain amount of spend in the future, rather than just paying as you go, can you get a discount? Is there discounted options? Basically,
So the answer to that is depending on the sca so it depends on the scale. But the general idea is that so Nikki and I we are part of the account team. and essentially if you need access to larger volumes of resources, we work with a lot of researchers that can basically come to us and say I need 100 GPUs for my research and I can't get it on site or anywhere. We work with them to basically create reserved instances at very highly discounted rates and that's how we get them that access for a year or something.
All right.
All right. I know we're at 2 o'clock. I'm sorry, folks. this was really a survey course. if there are certain aspects that you really want to dig into, please let me know. we'll be sending out the feedback survey. I know there are a few questions that were really pointed that I might throw to Russ for a little followup after this. and yeah, thank you Nikki, thanks everybody who came.
Take care.
We hope to see you soon and there's lots of trainings coming up including HPC trainings, in Linux, intro to Python, intro to R our refresher from the libraries. So, please check us out. All right. thank everybody. Have a good day.
So, please email us if you are trying to figure out research resources at Colombia and we'll be happy to point you in the right direction, even if it's just not us. and here is our agenda for today's session. I'm not going to spend a time reading this because it's packed, but we plan to save questions for the end. And you can feel free to drop questions in the chat as we go and we'll come back to them unless they're very easily answerable in the chat. we chose the topics from the registration email survey and if there's something you want to dig into more later or something missing, you can let us know in the feedback survey after the session.
Our main presenter today will be Russ Goldenbroit, our partner at Google that we meet with at least monthly. Russ has worked at Google for nearly 10 years and is a principal solutions architect specializing in cloud computing. We also have our Google account executive Nikki Bernstein here to help answer questions. And then finally, a quick disclar This session is being recorded. It will be posted online on a UNI protected web page which I'll share with you guys in a follow-up email. Before I give the mic to Russ, I'm going to kick us off with some Colombia specific information about Google Cloud.
So Colombia has a formal business associate agreement with Google which means we get access to all the GCP modules and we provide some extra security and privacy protections on default as well as integrating the GCP login with Colombia's single sign on with your uni and payment with Colombia's ecosystem. comes the technicalities. you have to be a Morningside faculty or staff member to be able to leverage Colombia linked GCP projects and you must provide a Colombia arc chart string which will be used to pay for your cloud usage. So if this is you, you can create a project via the request form linked on CIP/GCP page which I just dropped.
the rates that are charged are in sync with Google's public cloud pricing. so I know from looking at the registration that many people here are students who do not have access to a chart string or they may be staff or member faculty members that maybe they do have a chart string but they're concerned about running up costs. So in these cases postocs and grad students can remember that their PIs often can get one of these Colombia linked GCP projects for their labs. But for everyone else Russ is going to go over some resources that students or researchers can use to apply for free GCP credits.
also towards the end of the webinar, he's going to go over how to analyze your billing and how to set up GCP budgets and alerts so you can sort of reign in runaway cost. and again, I'm going to send you all the links after the session. So, You can just pay attention now. I'll send it over to Russ for the fun stuff.
the fun stuff. Very nice. thank you for the introduction Jess so it's a pleasure to be here with all of you today. So as Jess was mentioning just a sort of introduction I am working with Google as a cloud engineer for quite some time. and in that time primarily the people and organizations that I've been working with are a lot of different educational institutions, academic medical centers, researchers. so I have a plethora of use cases and experiences that have gone on through the years.
my whole goal is really to educate during this session and maybe really go over some of the more basic foundational things and things that have been of value to different organizations like yours throughout the years of my tenure at Google that you may or may not be aware of. So without further ado, I'm going to start sharing my screen and Jess mentioned this is being recorded. So you won't have to actually I guess take notes but you can refer to this but we'll get the links al together as well. So the first thing that I wanted to start out with is I'm assuming that you can see my screen. is that accurate? Good. Awesome.
So the first thing that I wanted to go over is the fact that Google is one of several major providers out there and so it is likely that individuals who have any background in cloud computing have probably worked with some of our competitors like AWS or Azure. this is public documentation that Google posts to essentially make it easier for individuals to be able to map services that they may be used to using in either AWS or Azure to the equivalent in Google Cloud. to give some examples and it's filterable.
So for example if you're looking for let's say a compute offering for example you just need access to virtual machines you can easily filter the list and look at the various offerings that are available Google's core offering be being compute engine but perhaps you're more familiar with Amazon's EC2 or what Azure virtual machines which is what Microsoft calls it This is just an easy way for you to be able to sift through the equivalent resources and services so that you can get a sense of what services do I need to know of and begin with if I'm used to working in a different cloud provider.
So that can be very helpful and we'll link to this resource but wanted to sort of start out with that assuming that you may be using other cloud providers for various tasks the other thing that Jess mentioned this is something that I'm very familiar with just because of my background in working with researchers across the United States at different universities ies, academic medical centers. essentially Google understands that, research costs money and, researchers tend to have, either grants or perhaps the funding is a little bit more scarce, especially during, some of the times more recently.
So Google has a research credits program that is exclusively for either faculty, PhD researchers where you can apply for a grant of credits. I believe it's up to $1,000 for Ph students and postocs and for faculties, staff, PIs, you can get up to $5,000 in GCP credits. And basically it's not meant to fund the full research of course.
It's meant to basically build the scaffolding do benchmarking testing to essentially make sure that everything is good so that when you're ready to run larger scale computational tasks and perhaps that you have grant funding for a lot of the foundations will probably be set up already. and the form is actually quite easy. So, it's really just a simple form that gets submitted with a handful of questions about your project and some of the details about the Google Cloud account that you'd be working in.
and then once that gets approved, the credits get sent to the billing account that's attached to your Google Cloud project and then you'll have those credits available to you for the work that you're doing. So at a high level that is the research credits program. so I think that that will be probably of use to a good majority of you that are working in the research space. after this I wanted to talk a little bit about I wanted to get into the heart of it. So as Jess was mentioning we have a packed agenda.
and I understand that some individuals here are most likely not as familiar, perhaps never even seen anything in the Google Cloud ecosystem or environment. And so I wanted to go over some high-level concepts, show you what it's like to work in the console, and then go over a number of different examples, make it very live for you to really be able to see what it's like to do certain tasks that really are quite common when people are trying to either run jobs or spin up applications.
so that's my goal is to get you a little bit more comfortable from that perspective. So what we're going to be focusing on today and I'm going to go over this at a very high level first so that you understand what this is about. the first thing that you need to understand and this is a little bit different from the way that AWS works or Microsoft. So if you're familiar with other cloud providers this will be a little bit different but as you can imagine the source of identity for cloud is basically your Google cloud accounts. So in the back end you have identities that are provisioned from by Colombia that give you access to essentially any of these GCP console accounts that allow you to spin up virtual machines and things of that nature.
I should say that I'm not sure how the internal provisioning works at Colombia, but we've worked with the team and so that I'm sure you can reach out to the central IT group here if you need accounts provisioned like Jessica was mentioning earlier. but your identities are provided here. Google cloud has a specific way in which projects or basically if you're trying to do research or anything and you need access to computational resources the way it works is your organization Colombia as a whole has an organization this is my demo environment so you see my name at this fake domain
But Columbia has an organization that is dedicated for the overarching administration for all the different projects and resources groups and different applications and researchers have within the university. And then underneath all of that is where all of your projects will live. So very high level, but these are all obviously demo projects for that I've been working with. But in a general sense, when you go in and decide you're going to do some sort of new project, new app, etc.
you will get access to a project basically by filling out a few different pieces of information and one of them being is the billing account which I'm going to go into a little bit more detail but this billing account is where all of the costs are managed and I'll talk about that in a little bit more but the project is where
your apps, your workloads, everything lives and it's a resource segmentation. That's what it does. It basically separates your individual projects from maybe the person next to you or somebody else in your department. And you can create as many of these as you want. but generally speaking, this is how the segmentation works. So going to go back. this is the homepage essentially. So based on what I've been talking about, we have a highle project name that I'm working in. I wanted to just point out a couple of different things that you'll be most likely working with if you spin up any type of resources in Google Cloud has a number of different ways that you can interact with resources. So the first way is the guey. So it has a pretty robust guey.
So, if you actually click on this hamburger menu on the left side and you look at all pro products, you can actually literally just kind of scroll through and click whatever you're interested in talking about or working on I should say. And then at the top, since Google is a search company, if you weren't familiar with something and you didn't know know where it existed, you could actually search for it in here. So, if I really wanted to, I can type in compute engine and I could actually see all of the possibilities that I could search for that have compute engine and I can then click on it and that will actually bring me to the guey area where a lot of my let's say virtual machine instances are sitting and different metrics around the virtual machines. So, at a high level the guey exists.
The second way that you can actually interact with all the resources is by clicking this button at the top. It says activate cloudshell and this is just an authorization. cloud shell is essentially a DBN instance that is running in the background. not specifically within your project, but it's tied to your project. And it basically allows you to enter different CLI commands to interact with the variety of different resources that are occurring in your organization.
there's an SDK called Cloud which is essentially the SDK that you can use either here or via Python libraries or a number of different libraries that allow you to script and do more programmatic interaction with different Google Cloud resources. before I start going in and showing you the first thing I'm going to show you is actually the virtual machines because I think that's actually a good starting point for a lot of individuals when you're running whether it's computational jobs or standing of an application on a VM.
this is sort of like where people begin usually before they get into a lot of the managed services that I'll talk a little bit about. But generally speaking, I wanted to give you a taste of what it's like to actually work in the console, perform several actions that a lot of individual researchers might work with. So, let me quickly Whoops. I have a script that I want to show you guys for demonstration purposes.
So I created this so for demonstration purposes this is an example of I don't know if you guys can see my cloud shell good okay so in here we actually can see the SDK working as in cloud shell and for example if I wanted to create a virtual machine. In this case, I'm calling it demo VM. I'm specifying the type of machine where I want it located. I'm basically all of the details of this machine are flagged.
and this will I mean a lot of researchers or statistitians or people that are comfortable with essentially programmatic languages prefer to work in cloud shell or in these types different scripting libraries so that they're able to automate a lot of functions like creating virtual machines or other services. and so I just want to give you a taste. Actually, in the interest of time, I actually created this instance of so I'll just show you the result. but if you actually go to view instances, you can actually see all of the various virtual machines that I have running. One of them being this demo VM that I created.
just so when you do I could have created this completely in the guey which is quite a nice feature that Google having worked with the other providers it is particularly nice in terms of how the guey is laid out for individuals that are maybe not as familiar with the libraries or more of the programmatic ways to interact. and so it's nice where re let's say I was going to create a new virtual machine with a specific configuration. I can easily first of all just click around and see all of the specs for the variety of different machines that might be of interest to me.
but you also get to see a monthly estimate which is obviously very important to understand because as a researcher just in general we're all cost constrained. So obviously spinning up this type of instance that costs $36,000 a month. It's probably good to know to have an estimate of how much this thing is going to cost compared to some other virtual machine. So that's always very nice. the other thing that's really nice is part of the advanced configurations. So what's nice about Google Cloud is you're able to create custom machines. based on a specific series meaning basically it's a CPU platform that's being used under the hood.
you can specify the exact amount of cores and memory that you require for that machine so that you don't overutilize the resources. that's just something worthy to note. The other thing is that's great about this is that after you've configured it, you can actually click this button at the top which is the equivalent code and it provides you essentially what you've selected in three different formats either CLI Terraform if you're familiar about automating it. I'm not going to get into that right now but usually at a scale that's very helpful and of course REST API. So this can be very helpful once you get the hang of setting up the different customizations that are available.
So the machine configuration, the o the storage, the security, the networking, everything is relatively simple clickth through or if you are more well-versed in CLI then you can use the cloud shell. Now outside of this I think what's very helpful to take a look at is the ways in which the different services that interact with each other. So let's say you have this virtual machine, this demo VM that's stood up. I'm going to go back and you essentially want it to do something.
a lot of times when I'm dealing with researchers a lot of researchers historically have stored data in let's say Google Drive and so if I actually like so for the purposes of this demonstration I have a Google Drive account the identity is actually connected to the same Google cloud
code instance and I just created a sample text file. It just says this is the test file. But the idea here is that let's say you have an application that a bunch of could be anything. It could be like genomics data or some other test data that you're storing in Google Drive for some reason. there are other options but you want to move it into the cloud so that you can do let's say some sort of analysis or computational assessment on it. the great part I think about just in general the Google cloud ecosystem and its tiein to workspace is that you either through CLI or through let's go back to where's that VM instance? Okay, here it is.
through some scripting means you can actually scripted to essentially move data from let's say one place to another so that you can run different computational tasks. So for this purpose Google has an very simple object level storage that is available to store large volumes of data at a relatively cheap price. Meaning this is incredibly cheap to store data I have a variety of different things buckets in here that I've used for a variety of different demos that I've worked with.
but what I actually want to do here is take that Google Drive file that I had created. Where is that? that I had created that's in Google Drive. And I want to be able to essentially move that file over to Google Cloud Storage so that I could do some sort of analysis or whatever on it using a bunch of the to Google Cloud tools that we have. so again I created a script here. but generally speaking what this is doing is that the script is reading from Google Drive and it's copying it over to a Google Cloud Storage bucket.
So that if I hit enter so I've now saved that file on my cloud shell and then I can run this Python script right that's located in on here it's basically going to look for that test file that text file
and move it into the storage bucket. And so after you're running it on cloud shell it's essentially transferring that file over to here or hopefully it does I have to go back. whoops let's see where did it drive to So okay here it is. And so you can actually see it here. and this can be used for a lot of computational tasks what have you. very powerful in general just to be able to do different analyses. simple if you're quite simple if you're very used to working in cloud environments but it's really just to convey how do you start working with the different tools that are in Google Cloud.
So, in general, I'm going to leave the virtual machines where they are at the moment and I'm going to move over to something that may be more interesting to data analytics individuals. So if you're a statistician or you're working in a variety of different statistics programs, you're probably familiar working with things like Jupyter notebooks. and what I wanted to point out here is if we go into so Google Cloud has a product that's actually changed names many different times, but I believe it's now called workbench is essentially managed Jupyter notebooks.
So for people that are trying to do any type of analysis or create something that's sharable in terms of I'm going through a certain number of let's say some tests Jupyter notebooks are a great mechanism to essentially save the flow of work that you're doing and be able to share it with other researchers or contributors that can replicate it or use it for one of their own use cases or whatever it may be. so workbench is this managed Jupyter notebook.
the first thing I want to just preface here is that why is this interesting and why is this a good thing for researchers to really know about and the first part of it is that these notebooks run on Google cloud infrastructure. So, as you can kind of see in this column with a few of these notebooks that I have, they all have different amounts of vCPU and RAM. Kind of like what we were talking about with the virtual machines. This piece is completely configurable. So, if you're doing let's say large training of machine learning models as an instance, for example, I mean, you would need a lot more vCPU and RAM and potentially some things like GPUs.
And with workbench when you create a notebook you actually can specify a lot of the specific configurations the machine the amount of disk the type the network that you're using GPUs etc which it's very good because otherwise you'd have to spin up individual virtual machines and configure them independent of running the Jupyter notebooks. And so that this is a really powerful tool for individuals because you as an analyst or someone who's doing some sort of research into something specific do not have to worry about managing the underlying infrastructure. You can basically just kind of click through the prompts and it will create modify existing infrastructure to meet your needs. So that's a very powerful tool.
I always like to point out now inside workbench another distinction I wanted to make collab so collab is a free service that a lot of researchers tend to and statistitians etc tend to be well aware of because it offers Jupyter notebooks basically for free this is a little bit different so we're the difference between collab and there's actually collab enterprise and workbench. The biggest distinctions is the specifying the underlying infrastructure. So if you need some very robust compute and GPUs and things of that nature, those are available and modifiable for you for your notebooks. the other thing is security.
A lot of researchers, especially if you have regulated workloads that you may be working with, may have strict requirements you couldn't use the consumer version of collab with let's say medical data or things like that. Google Cloud has things like HIPPA compliance or other regulatory compliance and v a variety of different controls that essentially will provide you the mechanism to be able to control it and then do a variety of different analyses on these notebooks. So that's the one thing I wanted to point out there.
so that's said, let me actually open one of these notebooks. So I just have to figure out one which. So pardon me for a second. Let's open the last one. what's happening. what's happening here is essentially you have a variety of different virtual machines that are that run in the background that support the Jupyter Lab instances.
So the first thing that happens when you're trying to work and this is actually kind of nice because idle instances will shut down and save you cost so that if you for some reason forget the fact that you're running Jupyter Lab if you're not using it the system will realize after some time and essentially suspend it so that you're not going to pay for the virtual machine that's running. otherwise you'll have to actually close it out. and you can just actually come in here and easily start or stop instances. probably should have started this before because it does take some time to create the instance and then start Jupyter Lab on top of it.
but let's see if this is going to start up anytime soon. So you can see it's actually still provisioning the instance. what's nice again I think in general about Google Cloud is that a lot of information is available to you to get a sense of what's happening in the instance itself. So if you knew nothing about it and you were sort of just clicking around you'd be able to figure out it's provisioning right now so it's taking some time to actually set up. What's also nice is that this little right this will be your best friend when you're starting out in Google Cloud.
There's this little learn tab. This will actually understand where you're working essentially in the space that you're in. And you can actually find addition different reference documentation and training that may be applicable to the tools that you're in at the time. And so it will actually be a great help for you if you know nothing about u the tool and you need a learning experience. All right. So it looks like that this was provisioned. Let's go back and click open Jupyter Lab again. And what it does is it's going to open up the instance in a separate tab in your browser.
And so, for those that are, used to sort of the Jupiter experience, this will probably seem pretty, familiar to you. So, the first thing that we're in the launcher at the top what you're going to see is the machine that we've provisioned with it. So again this is configurable if you are finding things to be slow or taking too much time and you want to sort of beef it up you can modify the underlying infrastructure as needed.
I think somebody here probably is very familiar with the v varieties of different notebooks and things that are available. But I created a couple of different ones that are familiar to me and I think analysts for the purposes of this I'll just go into the R version. Whoops. That's the console. Let's see what I have here. So, I want to Where is that? So, I want to find my notebook for the first piece. Where is that? here we go.
or no. Is it this one? Nope. R. Let's see. here we go. So, I wanted to give an example so that this hit a little bit home. so, I created basically an R data analysis notebook to give it a little bit more substance. very high level.
and again, for many of you that are analysts or, using this on a day-to-day basis, this example may seem trivial, but what I'm really trying to sort of mark home mark here is essentially the ease in which you're able to get to work with the right supporting infrastructure in Google Cloud to do your analyses and share these notebooks and the outputs of these notebooks with the appropriate people to sort of move your research or whatever you're working on and so we have a sample our data analysis. This has a bunch of code that's being used to run through. you can just click up top here which is going to run the notebook. you don't have to worry about all the red. It's just a bunch of warnings.
But this particular code is actually looking at some built-in data sets that are part of the R distribution and doing some analyses on them in terms of let's say I think this was flower samples and there were also sample data. So it's finding different distributions averages standard deviations creating some plots and so forth and so on. there's a very large u amount of things that you can do in Jupiter and then I could then share this notebook if I wanted to with maybe a collaborator or somebody else that is working on what I'm working on and so that they could either replicate it or use it to do something else.
So this is the workbench tool. and really again what it is is this ability for you to sort of spin up these instances that have either, whatever resources you need, GPUs, storage, etc. what's really nice about this also is that since it's integrated into Google Cloud Workbench you could realistically very easily actually use Workbench in conjunction with other cloud tools. So for example I was showing before moving data from a virtual machine to cloud storage. all of the data storage for example could be in cloud storage that you're working with and maybe the analysis and all the compute resources that you're working with could be in this notebook.
And so it's very easy to create scripts and basically notebooks that would be able to interact with Google Cloud Storage or other services to really sort of accelerate your work and whatever you're sort of working on. So that is all I kind of wanted to talk about in terms of the data analytics piece around workbench. The next thing that I wanted to work talk about is something one of the cooler things that we're doing in the space and I think everybody is pretty familiar and recognizes the advent of artificial intelligence and the v variety of different tools that are available.
So Google's AI model, Gemini, if you're not very familiar, is slowly being integrated into every service. And so re finding you're going to find that it's also integrated into Google Cloud. So, if you actually come over here and you look at this little star thing, it will bring up a brand new tool called cloud assist, it's essentially just a cloud assistant to that you can query against your Google cloud instance.
So all of the data is your data meaning one else is getting insights into the underlying storage or resources but if you wanted to for example I mean it's even saying here's an example show me my compute engine insights and I wanted to just know something about what's going on in my environment. it's using Gemini to essentially query a bunch of the metadata and really provide me some information back and as you can see it's giving me code that I could easily run to get the insights. I think in the future what we're going to see is this is going to be a lot more agentic meaning it's going to be able to actually perform actions on your behalf.
So, it's going to become easier and easier for you to use natural language and essentially get what you want out of the platform. I just wanted to show that because it's very helpful in case you get stuck and you really are like, I don't know what to do here and let me ask Gemini. could be very helpful. that being said, the actual platform that Google uses, that I'll quickly go through is called Vertex AI that you may or may not be familiar with, but I wanted to just touch on it at a very high level because this is where a lot of the research is going today. this is where a lot of essentially anybody who's working on AI platforms they're probably going to be interested in taking advantage of these things.
So if you're familiar with AWS and their bedrock offering and Azure has various offerings I think through pilot Google's approach to AI is a little bit different because we actually hosting a platform we have a platform that's called vertex AI and within that platform is essentially everything an individual might need
to create and deploy an application that uses AI and that's very different from other providers because for example if you were using open AIS their entire model for the most part has been like here's an API you can use it for whatever you want but we're not giving you infrastructure we're not giving you ways to test it's just essentially here's an API that you can hit against and use the response as you'd like. Google's approach is different because they want to help essentially across the full spectrum of application deployment. And so there are a number of different tools here that allow you to essentially do things like deploy the models against a v like your applications. essentially with vertex AI studio you can actually test out a variety of the models.
and then there's model garden, which I'll walk through in a second. there actually is a platform to pick and choose a variety of different models that are available. let me show you a couple of these right now just so you can get a flavor. But if you go into Vertex AI Studio, there's a lot of different things you can do here. But the main purpose of Vert.Exi X AI Studio is actually to test and tune the different models that you may be using in your application. why this is nice that you can see if you go up here to models there's a bunch of different models that Gemini has out right now but it's not just Gemini's models that are available to you in Vert.Ex AI.
This is one of the biggest factors that's different between a lot of the providers is that you can see anthropics models that are third-party or llama's models are available and then there's other models that you can have available that are open source that you may want to bring into your experimentation or testing with. This is part of the value ad. and then why does this different from just some generic chat experience where is that you can actually adjust key factors that affect the model responses like temperature or how much of the output is sent out.
this is important because from a cost perspective, this is something you want to keep an eye on because the more input and the more output that you have the more costly it becomes. But we don't have time to go through a lot of this, but I did want to touch on some of the different tools that are available. if we go back and I just wanted to quickly show you one more thing with Vertex AI, if I can get to where is that? here it is.
So if you go to the model garden, what's nice about this is that model garden is truly a sort of compendium of different models that are available to anyone who's trying to do something with them. It's not just trying to push Gemini's model. I think Google realizes and probably most providers realize that there isn't one model that's the best. It's really b depends on the use case. and so what's nice and what's cool is especially the ones that I've worked with in the healthcare and life sciences space for example let's say you were looking at radiology type models or this is the I'm doing it by task. sorry it should be able to actually do that.
where are those? okay. So for example, let's say there's some obvious always always disclaimers when you're using a model, but generally speaking, a lot of these models that are perhaps open- sourced and deployable with a click of a button on Vertex AI and allow researchers to use them as foundational models so that they can accelerate their research without having to sort of retrain or create their own essentially.
has been very helpful to some of the researchers and groups that we've worked with. And these will continue to expand especially in the healthcare and life sciences space where we're doing a lot of good work with researchers across a lot of different universities. so in the interest of time I know we only have 10 minutes left. I did want to touch on billing because that's going to be really important. so if we go into again the search bar and we go to billing, the first thing that's going to pop up is again, a billing account is essentially your payment mechanism.
depending on how organizations decide to do it they can have chargeback mechanisms or you can just input a credit card but essentially the billing account has some sort of payment mechanism attached to it and you can attach that billing account to as many projects as you want because you could have different things you're working on. but that's how you're going to sort of spend for these associated billing accounts. So, if I go to this linked billing account, what it's going to actually buil bring up is this reporting dashboard. there's a lot of different things that are available in terms of insights and things that you can discover. whoops, I went to a different desktop. Okay, no, I'm good.
so the initial dashboard that you're going to get is essentially going to be your overall costs. So if you hover over things, it's a very quick and easy way to see how much you're spending. it also will even forecast based on historical spend. I will say that if you're using it for high performance computing or any type of computational tasks this can be less helpful unless you have done some type of benchmarking exercise. So, for example, if I ran a benchmark on March 18th and it told me that something was $16 and I know the exact specs of the job that I ran then I can use that data to sort of extrapolate how much it might cost to run the same thing at scale.
but if you don't do that then this historical context could be somewhat meaningless because this doesn't represent what you're about to run from in terms of a computational task. So I always recommend that if for researchers specifically that are doing more computational type tasks that they do benchmarking exercises. and what I mean by that is they spin up virtual machines and such or however they're doing their computational tasks. and they run jobs at a smaller scale so they can get a sense for how much it's going to cost at a larger scale. so that's one mechanism. as you can kind of see there is this concept of budgeting and alerting.
So, this is pretty I'll click on there's a section under cost management on the left under billing that is dedicated to budgeting and alerting. and basically what this is doing is I can create essentially a budget for my project or I can actually segment it however So depending on if for example I want to look at all of the projects in my organization or if I just want to click on perhaps and do one of the different projects and then I could even break it down by service. So all the v variety of different services that are in Google cloud that I'm using actively through a bunch of different demos and things of that nature.
I could actually click on, specific service just to get a sense of what is the cost that's historically been spent on that service. And then what allows me to do after I do that, I have to create a I don't know. Test. You have to obviously call it something. I just pick something at random. Whoops. I did not. Yep. And so then basically you can specify a budget in dollars for it. And what this is basically saying this is just the comparison. So if I say I'm going to $100 and it's going or maybe I should say a thousand. It's going to set that threshold essentially and the real thing what comes into play here is threshold rules.
So the email that's attached to your account as the owner essentially of the project is going to get alerted when any of these situations come true. So for example when at this point it says 50 90 and 100% of different amounts it will trigger email alerts and that will go to the variety of different users based on what you click. you can also create some very fancy notifications for example you could use pubsub which is basically a messaging buffer so that these messages could go somewhere of your choice via SMS to somebody's cell phone or a slack channel or really wherever you want.
but you can do some very fancy things around setting these alert thresholds. And then I think one of the last things that I like to talk about is really the reporting aspect. And so if you go on the lefth hand side of billing under cost management and you go to reporting you're actually going to see a ton of different information. this is in preview, so you probably wouldn't see this, but in the future, you're going to have Gemini's cloud assist that's going to actually be helping you understand metrics and you can actually use it to help build queries against your data. And that's going to make it a lot easier for you to be able to see what's happening, what are you spending dollars on.
and so what's nice about this is that you have all these filters that are available that allow you to group the variety of different SKUs, the actual time line, meaning when were these charges incurred. you can do all sorts of slicing and dicing to essentially create reports that are exportable or downloadable let's say via CSV or integradeable with a variety of different systems and you can do a lot of different and this is really helpful for example if you're trying to give this to higherups or give this to someone to so that they understand what your
work costs. maybe you want to include it as part of a grant that maybe you mark so you have this sort of benchmark that's available. and there's a lot of information available but the purpose the point is that these reports are customizable and available to the users. Now I think this is a good place to stop. we could obviously go on and on forever and I recognize that this is relatively short session for everybody involved. will be following up with some labs that are essentially contained environments that people can play around with.
some of the concepts around creating virtual machines, working with a variety of services to really just get a sense of and familiarity with Google Cloud. and so I'll be setting those up and setting it to everybody who's here so that they have access to it for probably around a month. We can't keep it forever. but that should help in giving perhaps some guidance. and I know we're pretty much at time so I don't know if there are questions, but I could also possibly answer them asynchronously if there are
We had one quick question in the chat regarding the pricing. so is it possible if you commit to a certain amount of spend in the future, rather than just paying as you go, can you get a discount? Is there discounted options? Basically,
So the answer to that is depending on the sca so it depends on the scale. But the general idea is that so Nikki and I we are part of the account team. and essentially if you need access to larger volumes of resources, we work with a lot of researchers that can basically come to us and say I need 100 GPUs for my research and I can't get it on site or anywhere. We work with them to basically create reserved instances at very highly discounted rates and that's how we get them that access for a year or something.
All right.
All right. I know we're at 2 o'clock. I'm sorry, folks. this was really a survey course. if there are certain aspects that you really want to dig into, please let me know. we'll be sending out the feedback survey. I know there are a few questions that were really pointed that I might throw to Russ for a little followup after this. and yeah, thank you Nikki, thanks everybody who came.
Take care.
We hope to see you soon and there's lots of trainings coming up including HPC trainings, in Linux, intro to Python, intro to R our refresher from the libraries. So, please check us out. All right. thank everybody. Have a good day.
