Classic: Introduction to Linux Workshop 2025 | HPC Training Series #1
Transcript
Okay. Welcome, everyone, to our intro series. This is a four-part series that we host every semester.
These workshops, we hope, get everyone who's new to our cluster familiarized with our HPC setup. As you can see, let's go to the next slide. This is the roadmap and what we hope to achieve over the coming weeks.
It will be a four-week workshop. Today will be the intro to Linux. Most of our clusters are RHEL 8 and RHEL 9 systems.
So, for this first workshop, we hope you take away the basics: how to navigate our systems, create files and directories, and some basic troubleshooting techniques in case you ever run into issues with your workflows. The second workshop will be intro to Bash. Bash is the scripting language that you'll be using to communicate with Slurm.
Slurm is the heartbeat of our cluster. With Slurm and Bash, you'll be requesting GPUs, cores, and RAM, and you'll also be specifying your input and output directories. Our third week will be intro to Python in HPC, not intro to Python, which will give you an understanding of how to optimize your scripts and workflows using Python, and some of the main advantages of using GPUs over CPUs in your workflows.
And the last workshop will give you an overview of HPC and tie these other elements together, with some examples of how HPC is being used in real-world scenarios. So, who are we? Well, my name is Mac Short. I'm the manager of the HPC team.
I work alongside or manage six engineers, Helena, Wacas, Al, Syed, Sam, and we're also joined on a part-time basis by two engineers called Benny and Ryan. So, what do we do? So, we serve the research community here at Columbia with high-performance computing, be it the hardware, which could be the storage, the physical nodes themselves, rollouts, and expansions. We also serve and manage the software, be it the scheduler, as I mentioned before, lab archive, Globus for large-scale data transfers, and anything in between.
We currently serve and maintain three clusters, Terremoto, Ginsburg, and Insomnia. If you were to put all these resources in one pool, we'd be serving close to 32,000 CPU cores, approximately 80 terabytes of error-correcting RAM, over 220 GPUs, and four petabytes of parallel file system. Next slide, Wacas.
So, before I hand the mic over to Wacas, we have a couple of house rules before we get started. Number one, please try not to interrupt Wacas when he's in mid-flight. If you have questions or run into difficulties, please paste them in the Zoom chat window, and one of my admins will address those concerns for you.
We will be reserving 15 minutes at the end, so for any questions that you want to raise vocally, we will hold that time for you. Lastly, this session will be recorded, and these slides and the recording will be distributed to everyone who attended today. So, you have the luxury of just following the session now and following along later at your own pace.
So, with that, Wacas, it's over to you, and work your magic. Thank you, Max, for the great introduction. Hi, my name is Wacas Hanif, and I'm part of the HPC group here at Columbia, and I'm very excited to actually do this for you guys.
So, for now, turning off the camera for a bit to keep the connection smooth, and let's rock and roll. All right. So, first off, what is Linux? Linux is an operating system that manages hardware and software, allowing users to interact with their computers.
It's known for being open-source, secure, and highly customizable. Unlike Windows or macOS, Linux is free and comes in many different versions called distributions. It powers everything from servers and supercomputers to smartphones and embedded devices.
In short, Linux is everywhere, even if you don't realize you're using it. Why Linux? It is open-source, so you have free access to the source code, which allows customization and transparency. It runs on everything from desktops to servers and embedded systems, with strong permission management and fast, community-driven updates.
It rarely crashes and has long uptimes, making it ideal for critical systems. It's efficient at multitasking and great for handling heavy workloads. Many distributions are free, reducing software expenses, and a large active community provides help and resources.
Why I chose Linux. So, when I started with Linux, my main goal was to host my own website. As a student, I didn't have the budget for expensive Windows server licenses, and Windows required costly software like Microsoft IIS for web hosting, which cost approximately $2,000, and then Microsoft SQL Server for the database, which cost around $4,000.
Linux provided a free and open-source alternative with Apache for web hosting and MySQL for the database, both widely used and community-supported. Plus, Linux is known for its strong security and reliability, making it a great long-term choice. So, by choosing Linux, I had full control over my system without spending a dime on licensing.
So, before we jump into navigating Linux, here's the login that you'll be using to get to the system. You should have received the welcome email. The system name is insomnia.rcs.columbia.edu, and you put your UNI in front of the host name.
So, I'm going to slice it down for a little bit. All right. So, for Windows 10 and 11, you can use the command prompt that's built in, or you can download PuTTY, which would be needed for older versions of Windows like 7 and 8. If you haven't done this already, you can find PuTTY for free by searching on the Columbia CUIT homepage, taking the first result, following the link to the download page, downloading it, and double-clicking to run PuTTY.exe. If you're on a Mac, you would run the program called Terminal, and to get to that, you click the magnifying glass and search for it.
Just type terminal. I'll give two minutes, and we'll start then. I'm going to show you how I will be approaching the cluster.
So, I'm going to make it a little bigger. ssh myuni@insomnia.rcs.columbia.edu. Turn a few things on. Just give me a second.
All right. So, when you hit enter, you will see something like this. Don't panic.
I just wanted to talk a little bit about this. There's a name for this. It's actually called trust on first use.
It's the message you get the first time you log into any Linux server. So, you would definitely want to type yes and press enter. What matters is that your entire session is encrypted, so anybody trying to sit in the middle of the session between you and the server would not be able to tell what is happening, and would not be able to sniff your username or password.
So, I'm going to type yes, and then you will receive a push notification for multi-factor authentication before accessing the cluster. Simply choose your preferred method, like the Duo mobile app or a phone call, or whatever the message offers. Oops.
If you wait too long, it prompts you again. It prompted me for the two-factor authentication, and now it will ask for the password. So, just type your password carefully and hit enter. If you are entering your password and don't see anything appearing on the screen, don't worry.
That's normal. Linux does not show password characters, not even asterisks or dots. For security reasons, this prevents anyone from guessing your password length just by looking.
So, I'm going to enter my password now. All right. So, we are at our server.
So, we'll take a second here, and if anybody did not get access, you can go to the chat, and my colleagues will be glad to help you out. By the way, I'm using MobaXterm, which you can also use to connect to any of the clusters. It's a free third-party tool.
So, I assume that everyone has access. So, now that everyone has access to the server, let's start with some basic commands to gather system information. First, we can check the system's name using hostname.
So, I'm going to type in hostname, and it displays the name of the system, useful for identifying machines, especially on networks. All right. So, next, I'm going to type in uptime, and it's showing me how long the system has been running.
It's great for checking the system's stability, and finally, we are moving towards another command, which is date. It prints the current system date and time and helps in scheduling tasks or verifying the system's time settings. All right.
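As a minimal sketch of the three commands just shown, the lines below can be pasted into any shell. The variable name today is only an illustration and is not part of the session.

```shell
# Print the machine's name, how long it has been running, and the current time.
hostname
uptime
date

# Command output can also be captured in a variable (illustrative name) for reuse.
today=$(date)
echo "Current date and time: $today"
```

The output differs from machine to machine, so no expected output is shown here.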
Moving on to the next command, which is pwd. It's short for print working directory, or some people say present working directory. It's a simple but essential Linux command.
It displays the full path of the current directory you're in. This is useful when navigating the file system, especially in complex directory structures. So, it prints out where you are: your current directory, and the directories above it that lead down to it.
All right. So, let's try another one, cd. So, this command is used to change the directories in Linux.
cd slash is going to move you to the root directory, which is the topmost level of the file system, and from here, all other directories branch out. So, for example, we are here at wh2612, which is my username, and if I type cd and slash, which is the root directory, it's going to take me there, and immediately you see that your location has changed, and if I do pwd, it's going to show me where I am right now. So, I'm going to go back to my home directory, which is slash home, wh2612.
All right. So, the next command is for making directories. The mkdir command is used to create directories.
This creates a new directory. I'm going to type in mkdir test and hit enter. Oh, so, errors are your best friends in Linux.
So, if you read the error, it says cannot create directory test, file exists. So, I'm going to check here. Yes, there is a test.
Either a directory or a file. So, I'm going to create another directory using a different name. Maybe test2.
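The round trip just described can be sketched as follows. The variable names start_dir and root_dir are made up for this sketch; it returns to wherever you started rather than to a specific home path.

```shell
# Remember the starting directory so we can come back to it.
start_dir=$(pwd)

# Move to the root directory, the top of the file system.
cd /
root_dir=$(pwd)
echo "Now in: $root_dir"

# Go back to where we started and confirm the location.
cd "$start_dir"
pwd
```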
Okay. So, that's how you create the directories. Up next, let's see what this command does.
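A minimal sketch of that mkdir sequence, run in a throwaway directory from mktemp so that nothing in your home directory is touched. The demo variable and the echo message are additions for this sketch.

```shell
# Work in a scratch directory (illustrative; the session used the home directory).
demo=$(mktemp -d)
cd "$demo"

mkdir test                                           # creates the directory
mkdir test || echo "read the error: the name exists"  # second attempt fails with "File exists"
mkdir test2                                          # a different name works

ls
```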
So, basically, the touch command is used to create an empty file or update the timestamps of an existing file. This creates a new file. I'm going to create a file.
So, touch, space, and my file name. This created an empty file. And if I want to see this file's timestamp, it got created at 2:21.
And if I want to update the timestamp, I simply do touch and then the name of the file again, and it will update the modification time of the file. All right. So, let's move on to the next one.
Pwd and remove a directory. So, how can we remove a directory which I created just now? rm is the command to remove a file or directory. If you are removing a file, you can just type rm and the name of the file, myfile2.
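Both uses of touch can be sketched like this, again in a scratch directory. The file name myfile.txt is assumed from later in the session.

```shell
demo=$(mktemp -d)
cd "$demo"

touch myfile.txt      # the file does not exist yet, so an empty file is created
ls -l myfile.txt      # the listing shows its size (0) and timestamp

touch myfile.txt      # the file exists now, so only its timestamp is refreshed
ls -l myfile.txt
```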
It will just ask if you want to remove it. You type y, yes. It will remove the file.
And if you want to remove a directory, you will need a flag, which is rm -r and then the name of the directory. Okay. So, let's say you want to remove this test2; it is asking if you want to remove the directory or not.
Yes, I want to remove this directory. Okay. So, for the files, you don't need this hyphen r flag.
But for the directories, yes, you need this hyphen r flag. And let's try to create an error. I'm going to create test2 again, this directory, and I'm going to delete this directory or remove this directory without hyphen r flag.
rm space test2. And it gave me an error saying cannot remove test2, is a directory. All right.
So, I need rm with this hyphen r flag and test2 to delete this directory. Yes. All right.
So, moving on to the next. So, if I cd into the test directory, change directory to test, and if I do pwd, it will show me where I am right now. Okay.
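The difference between rm and rm -r can be sketched as below. One assumption worth stating: the yes/no prompt seen in the session suggests rm is set up to ask for confirmation on the cluster (for example through an rm -i alias); a plain rm in a script does not ask.

```shell
demo=$(mktemp -d)
cd "$demo"
touch myfile2.txt
mkdir test2

rm myfile2.txt                                      # plain rm is enough for a file
rm test2 || echo "rm without -r refuses a directory"  # fails with "Is a directory"
rm -r test2                                         # -r removes the directory and its contents

ls -a
```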
And if I want to go back one directory, I just type cd space dot dot. Okay. It moves up one level to the parent directory.
And if I hit enter, and if I do pwd, now the path has changed. Before I was in test directory. Now I am in my home directory.
All right. I'm going to cd into the test directory again. And if you look, there's another change you're seeing.
You're seeing that the prompt changed from tilde to test. So, if I simply enter cd, and I don't enter dot dot, it will take me back to the home directory, which is tilde. So, there is a difference.
If you enter cd space dot dot, it will take you one directory up. But wherever you are in the system, if you want to go back to your home directory, which is tilde, you just type cd, and it will take you back to your home directory. So, that's cd with no arguments and no flags.
Okay. So, running pwd again. And here I am.
I am in my home directory. All right. We covered this one, cd without arguments.
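Here is a small sketch contrasting cd dot dot with a bare cd. It uses a scratch directory in place of the home directory, and parent_dir is an illustrative variable name.

```shell
demo=$(mktemp -d)
mkdir "$demo/test"

cd "$demo/test"
pwd                   # path ends in /test

cd ..                 # up one level, to the parent directory
parent_dir=$(pwd)
echo "$parent_dir"

cd && pwd             # no arguments: straight back to your home directory
cd ~ && pwd           # the tilde is shorthand for the same place
```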
We covered tilde paths. Let's take a look at it. So, I'm going to go inside cd into test directory.
And I just typed cd test. It's a relative path. It looks for the test directory in the current directory.
Since this test directory was in my home directory, it took me there. It changed the directory from my home to the test. All right? But there is another path, which is called absolute path, which starts from slash.
If you have noticed, when I do pwd, it starts from slash, and then goes through the directory structure, insomnia, and then home, and then my user name, and then test. This is called the absolute path. So, the absolute path is basically the full path, which starts from the root and goes directly to the test directory.
So, if I go back one directory, and I type cd slash insomnia, and then home, and then my user name, wh2612, and then test, it will serve the same purpose as this. But this time, I'm using the absolute path instead of the relative path.
I can use the relative path only if I am in the directory where this test directory exists. But for the absolute path, I don't need that. Okay? So, if I hit enter, it still got me into the test directory.
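To make the comparison concrete, this sketch reaches the same directory both ways. The scratch directory stands in for the home directory, and via_relative and via_absolute are made-up names.

```shell
demo=$(mktemp -d)          # stands in for your home directory
mkdir "$demo/test"

cd "$demo"
cd test                    # relative path: looked up inside the current directory
via_relative=$(pwd)

cd /                       # move somewhere else entirely
cd "$demo/test"            # absolute path: starts with / and works from anywhere
via_absolute=$(pwd)

echo "$via_relative"
echo "$via_absolute"
```

Both echo lines print the same path, which is the point: the two forms name the same directory.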
So, two key takeaways. Absolute path begins with slash, specifying the full path from the root directory. Relative path does not start with slash.
It is relative to the current working directory. If you have noticed, I was using something which was completing my names. And that's my best friend, tab.
So, see, for example, if I go back to my home directory, I'm in my home directory. And if I want to go to the test directory, using the absolute path, what I'm going to do is cd slash. I just entered ins, and then I hit tab, tab, and it completed the path.
It completed the name. So, that's tab completion, to autocomplete the file or directory name starting with ins. All right? And then I'm going to type ho, and then tab.
It completed the home. And I just entered wh, and I'm going to use tab this time, but it's not going to complete it, because there are more directories with wh. Tab, tab.
So, there is another directory, wh2526. That's another user. And this one is wh2581.
So, there are multiple directories showing. But if I do wh26 and then tab, it completed, because there was nothing wh26 other than 12, which is my user name. So, tab is your best friend.
So, for example, I'm going to use the ls command. And if I type m and press tab, it immediately printed the full name of the file, which is myfile.txt. So, it helps you quickly complete commands, file names, or directory names, just by pressing tab. It works for commands as well as file and directory names, saving time and reducing errors.
The last thing is very important, reduce errors. All right. So, moving on to the next, as you were seeing, I was using something in ls.
What is ls? It basically lists the contents of the current directory, giving you a quick view of what's there. So, I'm going to ls, and this is listing the contents of my home directory, whatever it is here in my home directory. So, it is listing the contents of the directory.
And we can use multiple flags with ls. I'm going to use ls -l, which gives a long listing of the contents, meaning that if you need more details, this command gives you a deeper look at your files using the -l flag. So, I'm going to use that.
And I'm seeing the difference. When I was using ls, it was just printing the names of whatever my home directory contained. But now, it is giving me more information about the contents.
Same files, same directories, but with more information. So, what is this information? ls -l tells me different things: who owns this file or directory, which group it belongs to, then the size of the file, and the date and time when it was last changed, which is today. All right.
So, after the permissions and the link count at the start of each line, you get the name of the owner of this file or directory, then the group it belongs to, then the size, and then the last modification date and time. And of course, the file name or directory name. So, I'm going to play more with the ls command.
So, it lists the contents of the directory, and there is another flag, hyphen a. It lists all files, including the hidden ones, those starting with a dot. So, I'm going to do that, ls -a. So, you will see.
So, before, when I did ls, it showed me some files and directories, and now I'm going to do ls -a. Now, you're seeing a difference. There are some files starting with a dot.
This option reveals the files and directories that are normally hidden, and then if I do ls -al, or maybe ls -la, it combines the detailed listing with the hidden files, giving you a comprehensive view of everything in the directory. I'm going to talk a little bit about .bash_profile and .bashrc. These store your preferences or configuration settings, such as for your shell or other applications. You can use them to automatically set up your environment when you log in, so you don't have to manually configure your system each time.
They are hidden from plain ls to keep them out of sight of the normal user and avoid cluttering the directory. So if you see the output of ls, it is very clean, and if you see ls -a, it has a lot of files, a lot of hidden files. So dot files are a convenient way to save and manage your personalized settings without affecting the visibility of other files.
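The ls variants covered so far can be tried side by side with a sketch like this one. The dot file name .hidden_settings is invented for the example; in a real home directory you would see files such as .bashrc instead.

```shell
demo=$(mktemp -d)
cd "$demo"
touch myfile.txt .hidden_settings   # the second name is a made-up dot file
mkdir test

ls          # names only, dot files left out
ls -l       # long listing: permissions, owner, group, size, modification time
ls -a       # also shows entries whose names start with a dot
ls -al      # long listing that includes the dot files

# Keep both listings to compare them (illustrative variable names).
visible=$(ls)
everything=$(ls -a)
```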
All right, moving on to the next. I'm gonna clear my screen with just simply typing clear and it cleared out all the clutter. All right, so some basic file operations.
So I'm gonna create a file. Let's see what is here with me. So I'm gonna copy this myfile2.txt to this directory test.
I'm gonna go inside the test directory to see what is there already. I'm gonna remove this myfile3 and myfile8 and now I'm gonna list the contents. There's nothing here in my test directory.
I'm gonna go back to my home directory and I'm gonna list the content of my directory and now I'm gonna simply copy this myfile2.txt to this test directory. So cp, that's how the syntax goes. Name of the file which is myfile2.txt space and then the name of the directory where I want to copy this file.
All right, so hit enter. If I check my home directory contents, my file is still there, and a copy landed in the test directory. myfile2.txt is there, and by the way, if you want to list the contents of this test directory without going into it, you can just type ls and the name of the directory, and it will list the contents. If you check your present working directory, or print working directory, you are still in your home, but you listed the contents of your test directory, which has myfile2.txt. So I'm going to create another file, test4, and run ls.
Here is the test4 file, and I'm gonna copy this test4 file to the test directory, and if I list the contents of the test directory, there is test4. All right, so secure copy, scp. It's a command-line tool for securely transferring files between computers over SSH.
It is useful when you need to quickly copy files between local and remote systems or even between two remote servers. The basic syntax is similar to cp, but it includes a username and remote host. To send a file from a local machine to a remote server, we need to specify the source file and the destination directory on the remote server, prefixed with the username and host.
So I'm gonna quickly copy a file from my local machine to this server. So let's see what we have here in test with ls. Currently this test directory has these two files, and now I go back to my local machine and exit this out.
Okay, clear this out, and if I list the contents of my desktop, which is my Windows laptop, I have a lot of different things, and I want to copy over this file I created. Okay, so I'm gonna try to copy this intro to linux.txt over to my server inside the test directory. So the syntax will be scp, and then I'm gonna enter the name of the file, using tab, and then my remote server, where I'm gonna use my username, wh2612 at insomnia.rcs.columbia.edu, and then colon.
Now I'm gonna specify the path of where I want to put this file to. So here I'm gonna use the absolute path. Okay, so insomnia001 colon and my username wh2612 and then to the test directory.
Let's see if I'll be successful. All right, of course, it's gonna ask the MFA. Okay, I'm gonna do that and for some reasons raising.
Okay, approve and then password. All right, if you see this 100% success that means that our file was copied over to the remote server which is insomnia from our local machine and we can verify that here by listing the content of our test directory. Here it is intro to linux.txt. All right, that's how you copy the files or directories from your local machine to a remote server or from remote server to a remote server or maybe you can copy that from remote server to your local machine as well.
But we don't recommend that. We recommend Globus which is another tool we use for the large files to transfer from the servers. All right, moving on to the next.
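The shape of the scp command used above can be sketched as follows. Everything except the host name is a placeholder: myuni stands for your own UNI, notes.txt is a hypothetical local file, and test/ is written as a relative remote path, which scp resolves from your remote home directory. The sketch only prints the command so you can review it; it does not connect to the cluster.

```shell
# Placeholders: replace these with your own UNI and file name.
remote_user="myuni"
remote_host="insomnia.rcs.columbia.edu"
local_file="notes.txt"      # hypothetical file on your laptop
remote_dir="test/"          # relative to your home directory on the remote side

# General form: scp SOURCE USER@HOST:DESTINATION
scp_command="scp $local_file $remote_user@$remote_host:$remote_dir"

# Print the command for review; run it yourself once it looks right.
echo "$scp_command"
```

When you run the printed command for real, expect the same multi-factor and password prompts as for ssh.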
Copying and removing the directories. Okay, I think I showed you how to copy the files and removing the file. So I'm going to remove the files.
I have shown you removing the directories and the files. Okay. Okay, so before, oops, what happened? All right, so before I showed you how to copy the files, now I'm going to show you how to move the files.
Ls, myfile2.txt is there. I'm going to remove this myfile2.txt from test. All right, so my test directory does not have this myfile2, but my present working directory which is my home has this file.
And I'm going to move this file this time, not the copy, myfile2.txt to test directory. And if I list the content of my current working directory which is home, this file should not be here because this got moved to the test directory. Okay, so I'm going to go inside the test directory and list the contents.
Here it is. And by the way, we can achieve one more goal with the mv command. We can move directories, we can move files, and we can rename files using the mv command as well.
So, I'm going to change the name of myfile2.txt using the mv command. So, what I'm going to do is type mv. Okay, got a little bit frozen here.
Okay, mv myfile2.txt to maybe anything else, anything.txt. Okay, I'm going to list the contents of my test directory. So, there is no myfile2.txt, but I can see anything.txt here. So, that's how you can rename files as well.
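The copy, move, and rename steps from this part of the session can be replayed with a sketch like the one below, using the same file names in a scratch directory.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir test
touch myfile2.txt test4

cp myfile2.txt test           # copy: the original stays where it was
ls test                       # list a directory without entering it

rm test/myfile2.txt           # clear the copy before demonstrating a move
mv myfile2.txt test           # move: the file leaves the current directory
mv test4 test

cd test
mv myfile2.txt anything.txt   # the same command, used here to rename
ls
```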
All right, so moving on to next command which is cat and which is very useful. It is basically used to display the contents of a file when followed by the file name. So, I'm going to cat and I created a file named columbia.txt. And if I want to print that, so I'm seeing the contents of this file.
Okay, it printed all of the content of that file here on my screen. Okay, so that's how you read the files. So, if I want to read the contents of the test and then to the file I created intro to linux.txt, so that's how I can do that.
Oops, there we are. That's how you read the files. Let's check user information.
So, I'm going to clear my screen. Clear. And if I type the id command, it displays the details about the user, which is myself, wh2612, including my user ID, group ID, and any additional groups I belong to.
And then if I type groups, it lists all the groups that my user id is a member of. So, id is useful for checking a user's user id and group id and group memberships. As compared to the groups, it's a simpler way to see which groups a user belongs to.
Okay. Now, I'm going to check, okay, what is going on here? All right. So, this command, whoami, written as one word. It simply prints the user name of the currently logged-in user.
Compare that with who space am space i. There is a difference. whoami, all together, is useful to quickly confirm which user account you're operating under. who am i, with spaces, helps track the original login user, especially in multi-user or remote sessions.
All right. And I'm going to type another command, who. It lists all the currently logged-in users along with their session details, such as terminal, login time, and remote host, if applicable.
So, it is useful for monitoring active users on a multi-user system. It helps identify who else is logged in and from where. So, it's showing their IPs as well, where they're logged in from, and the time they logged in.
All right. So, that's what who command does. Okay.
I'm going to show you what the w command does. It's a little bit different, with more detail. It shows who is logged in and what they're doing, including their login time, idle time, and the command they're currently running.
So, w provides a more comprehensive view of users' activity on the system, helping you monitor user sessions and activity in real time. Okay. I'm going to move to the next command, which is man.
man is short for manual pages. It is used to access the manual page for a command, providing detailed information on its usage, options, and syntax. So, if I do man ls, it shows you the manual for the ls command.
You can also search inside it. If I want to find the hyphen l flag or maybe the hyphen a flag, well, hyphen a is already in front of me, so I'm going to search for the hyphen l flag by typing a slash followed by hyphen l. It highlights all the hyphen l occurrences inside the manual page. So, I'm going to move on to the next.
Next. You can simply type n to move to the next occurrence of hyphen l. Here it is. So, it's showing me what the hyphen l flag does.
Use a long listing format. Okay. Okay.
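For reference, the user-information commands from this part of the session are collected in the sketch below. The output depends entirely on the machine and on who is logged in, so none is shown. The guard in front of w is an addition for this sketch, because w may not be installed on minimal systems.

```shell
id          # user ID, primary group ID, and all group memberships
groups      # just the names of the groups you belong to
whoami      # the user name you are currently operating as
who am i    # the original login session (may print nothing outside a login session)
who         # everyone currently logged in

# w shows who is logged in and what they are running; skip it if it is not installed.
command -v w >/dev/null && w

# Capture the current user name for reuse (illustrative variable name).
current_user=$(whoami)
echo "Operating as: $current_user"
```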
Now, let's explore more about essential file manipulation tools. And I'm going to clear my screen. All right.
So, grep. It's a very powerful search tool used to kind of filter text based on patterns useful for scanning log files, configuration files, large datasets. Let me show you an example.
Okay. Here. Grep.
And I'm going to grep the word university from my file columbia.txt. It displays all the lines in the columbia.txt file that contain the word university. All right. Now, let's take a look at the pipe, how the pipe works.
So, as I told you, if you want to read a file, you can simply type cat and the name of the file. Okay. So, I'm going to do that, columbia.
The pipe is going to take the output from the cat command and hold it. Okay. So, it allows chaining commands, passing the output of one command as input to another.
And I'm going to grep this university word or maybe location. So, I'm going to grep this word from the output of this cat command. This is going to read the file columbia.txt, send the output into the pipe, and then give that output to grep.
Okay. So, if it returns nothing, that means that this file does not have this. There may be a spelling mistake.
Maybe we can find the university. There it is. So, we did the grep on university and we used the grep command only.
But this time we just read the file, save the output in the pipe and then pass it over to grep. And then that is showing us the output of the grep. Okay.
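A minimal sketch of both forms, grep on a file and grep at the end of a pipe. The contents of columbia.txt here are made-up sample lines, since the actual file from the session is not reproduced in the transcript.

```shell
demo=$(mktemp -d)
cd "$demo"

# Made-up sample text standing in for the file used in the session.
printf '%s\n' \
  'Columbia is a university in New York City.' \
  'The main campus is in Morningside Heights.' \
  'The university runs several HPC clusters.' > columbia.txt

grep university columbia.txt          # grep reads the file itself
cat columbia.txt | grep university    # cat's output is piped into grep

# Both forms select the same lines (illustrative variable names).
direct=$(grep university columbia.txt)
piped=$(cat columbia.txt | grep university)
```

grep is case-sensitive by default, so a search that returns nothing is often just a capitalization or spelling mismatch.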
All right. Let me clear my screen. Okay.
So, tail and head. Head and tail are basically useful for previewing large files without opening them. It's great for checking the beginning or end of the logs, configuration files or datasets.
So, I'm going to use head columbia.txt, which shows the first ten lines of columbia.txt by default, counting the blank lines between the lines as well. So, I'm going to head columbia.txt and it's showing me the first ten lines, and it's including these blank lines as well. Okay.
And if I want to specify how many lines from the beginning of the file, I will use the hyphen n flag, which stands for number. And I'm going to use five to specify that only five lines should be shown to me. There it is.
So, it's basically showing me the five lines, including the blank lines. And tail prints out the last lines, ten lines by default. I'm going to tail columbia.txt. So, the tail command is printing the last ten lines, including the blank lines.
And if I want to specify how many lines, I can just specify five. And it's showing me only five lines, including the blank lines. All right.
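head and tail are easiest to see on a file with numbered lines, so this sketch builds a 20-line sample file first. The file name lines.txt and its contents are invented for the example.

```shell
demo=$(mktemp -d)
cd "$demo"

# Build a 20-line sample file (made up for this sketch).
i=1
while [ "$i" -le 20 ]; do
  echo "line $i"
  i=$((i + 1))
done > lines.txt

head lines.txt          # first 10 lines by default
head -n 5 lines.txt     # first 5 lines
tail lines.txt          # last 10 lines by default
tail -n 5 lines.txt     # last 5 lines: line 16 through line 20

first_of_tail=$(tail -n 5 lines.txt | head -n 1)
echo "$first_of_tail"   # prints "line 16"
```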
So far, so good. Moving on to the editing. Okay.
So, when it comes to text editors on Linux, there's no single obvious choice, but vi stands out for Linux admins. Why vi? It's simple, yet powerful, but can feel tricky at first. It's preinstalled on nearly all Linux distros, Unix systems, and even macOS.
And when a server crashes, vi is often the only editor available, making it essential for quick fixes. That's why many Linux admins, including myself, rely on it. It's always there when you need it.
So, I'm going to edit this file with vi. You can type vi and columbia.txt, if you want. You can follow along.
You can copy any text from the internet into a file and follow along. All right. So, for some essential vi commands, first, there are two modes.
First is insert, and the second is escape. So, if I hit i, you see the change here at the bottom. It says I'm in insert mode.
What that means is I can type in anything. Okay. But if I hit the escape key, I'm in escape mode.
So, the insert indicator is gone. If I want to type anything, I cannot do that unless I press i again, or o, which opens a new line in insert mode. Oops. So, there are basically two modes.
One is insert mode, one is escape mode, which the vi documentation calls normal or command mode. Okay. And while you're in escape mode, if you want to go to the bottom of the file, you just press shift G. It will take you all the way down.
And if you press gg, it will take you all the way to the top. All right. And if you want to go to the end of the line, while you're in escape mode, you just press shift and 4, which is the dollar sign. It will take you to the end of the line.
And if you want to go to the start of the line, you just press shift 6, the caret, which takes you to the start of the line. All right. So, if you want to delete a whole line, you don't need to do it while you're in insert mode.
You can just hit dd, which is going to delete this line. So, I'm going to delete this line just by pressing dd. And if I want to undo it while I'm in escape mode, I just press u, which is short for undo.
Okay. So, so far in the editing tool, we have learned a few things. How to go into insert mode by pressing i, and how to go to escape mode by pressing escape. And if you want to go to the bottom of the file, shift G; top of the file, gg; end of the line, shift 4; start of the line, shift 6. And if you want to delete a line while in escape mode, you just press dd.
And if you want to undo it, you just press u. All right. So, these are some basic commands of the editor. You can learn more, but for the sake of time, I'm going to move forward.
A few more basic things in the Vim editor I wanted to show you: while you're in escape mode, if you want to save the file, you press colon, then w to write and q to quit, so :wq. If you leave out the w, :q quits without saving, and if you have made changes Vim will refuse to quit unless you type :q! to discard them. Okay.
So, :wq and then Enter is going to save the file with your changes. Okay. And if you want to show line numbers, while in escape mode, press colon and then type set number.
It's going to number the lines. And if you want to remove them, :set nonumber, written as one word.
And if you want to turn on spell check while you're editing, you can do that too. While in escape mode, :set spell will check the spelling and highlight whatever is wrong. And :set nospell will disable it.
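To keep line numbers or spell check switched on every time Vim starts, the same settings can be placed in a Vim startup file. The following is a minimal sketch, assuming the personal Vim configuration lives in ~/.vimrc (the usual default). It writes to a temporary file so that nothing real is touched, and vimrc_demo is a made-up name for illustration.

```shell
# Write the settings to a throwaway file; for real use the target would be ~/.vimrc.
vimrc_demo="$(mktemp)"

# Inside a vimrc the leading colon is dropped: ":set number" becomes "set number".
printf '%s\n' 'set number' 'set spell' > "$vimrc_demo"

# Show what would end up in the startup file.
cat "$vimrc_demo"
```

Typing :set nonumber or :set nospell inside Vim still turns either option off for the current session.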
All right. I'm going to quit out of this file. And I'm going to show you what output redirection is.
So, the greater-than operator (>) redirects output to a file. For example, I have the date command, which I showed you in the beginning, and it prints out the date. Okay.
And if I want to save the output of this command, I'm going to use the redirection sign: date, then >, then the name of the file.
If a file with that name already exists, this redirection sign will replace the contents of the file. So, I'm going to show you what I'm talking about by going into the test directory. I'm going to read the content of the intro to Linux.txt file.
Oops. And then cat intro to Linux. This file has this content.
And what I was talking about was, if I save the output of this command to my file, it's going to replace the content. So, be very careful about this. And if I read the intro to Linux file now, it's going to print out different content compared to before.
So, to summarize, this redirection sign redirects the output of a command to a file, removing the old content. If you want to append to the file instead, use two greater-than signs (>>). All right.
So, moving on to the next command. Okay. One more example left.
ls -l. So, this will print out the content of this directory. And if I save this output to the same file, intro to Linux, and then read the file, it should have different content now.
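The redirection steps above can be sketched as a short script. It runs in a temporary directory so no real file is overwritten, and the file name intro_to_linux.txt and the variable names are placeholders for illustration. The >> operator is shown next to > to make the difference between appending and replacing visible.

```shell
# Work in a throwaway directory so no real file gets overwritten.
demo_dir="$(mktemp -d)"
out="$demo_dir/intro_to_linux.txt"   # hypothetical file name for the demo

echo "original content" > "$out"     # > creates the file, or replaces what was there
date > "$out"                        # the old line is gone; only the date remains
lines_after_overwrite="$(wc -l < "$out")"

ls -l "$demo_dir" >> "$out"          # >> appends to the file instead of replacing it
cat "$out"
```

After the last command the file holds the date line followed by the directory listing, because the second redirection appended rather than overwrote.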
Okay. Moving on to the next one, I'm going to use echo. Echo is great for scripting, debugging, and quickly displaying values.
It's simple, yet essential for automating tasks in Linux. I'm going to type echo hi, and it prints hi to the screen. It's commonly used to display messages and print variables.
And even if you want to write to a file, you can do that. Echo hi, then the redirection sign, and then intro to Linux. And if I read this, now this file has different content.
The content has changed because I redirected the output of this echo to this file. Okay. All right.
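As a small illustration of echo printing a message, printing a variable, and writing to a file, here is a sketch that uses a temporary file. The names greeting and msg_file are invented for the example.

```shell
greeting="hi"                        # hypothetical variable for the demo
echo "$greeting"                     # prints the value of the variable
echo "My home directory is $HOME"    # variables expand inside double quotes

msg_file="$(mktemp)"                 # throwaway file so nothing real is overwritten
echo "$greeting" > "$msg_file"       # write the message to the file (replaces content)
echo "second line" >> "$msg_file"    # append a second line
cat "$msg_file"
```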
I'm going to move on to the system health commands, some of the essential ones. These tools are essential for troubleshooting servers, but they're just as useful for your home machine as well.
So we rely on these tools on a daily basis, especially when servers are running slow or encountering issues. For instance, if a server is sluggish, we start with these tools to pinpoint the problem, whether it's resource usage or free disk space or network issues. So knowing how to use these commands is crucial for effective troubleshooting.
I'm going to use free first. It's displaying how much memory is used and how much memory is free, but this is not human readable, so I'm going to make it human readable. It's now making a bit more sense.
It's showing me the swap space as well. Here it is. So free -h provides a readable report on your system's memory usage.
The -h option I entered shows the memory in human-readable units, while -m, as in free -m, shows it in megabytes; before, with no option, it was showing kibibytes. Moving on to the next command, which is top. It's a powerful tool for system monitoring.
It allows you to spot performance issues quickly and manage processes. It's an essential tool for any Linux admin to keep track of system health. It's basically perfect for real-time system monitoring.
It shows real-time CPU usage and lists all the running processes with their PIDs. And there are many options to customize the display. Moving on to another command, htop.
So, it's basically an enhanced alternative to top. It has a visually appealing, color-coded interface with an intuitive layout. Unlike top, which requires you to type in a PID for actions, htop allows you to scroll through the processes and interact with them using the arrow keys.
All right? So, it basically makes your tasks easier to manage. htop displays CPU and memory usage as graphical bars, offers process tree views, and allows users to customize the display. It also supports filtering and searching for specific processes, which top lacks by default.
All right? So, I'm going to quit. And you can quit by just pressing q. Next is vmstat. So, I'm going to show you vmstat.
It's short for virtual memory statistics. It's a lightweight tool that provides a real-time snapshot of system performance. It reports on CPU usage, memory, swap, disk I/O, and system processes.
So, it basically helps diagnose performance bottlenecks. Unlike more detailed tools like top and htop, vmstat gives an overview of system health with minimal resource usage, making it useful for quick diagnostics.
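When a server feels sluggish, it can help to capture one snapshot from these tools into a file and read it afterwards. The sketch below assumes a Linux system with the procps versions of free, top, and vmstat, and it checks for each tool first because not every machine has them installed. The report file name is a placeholder. htop is left out because it is interactive only.

```shell
report="$(mktemp)"    # hypothetical report file for the demo

{
    date                                   # timestamp for the snapshot
    if command -v free >/dev/null 2>&1; then
        free -h                            # memory and swap in human-readable units
    fi
    if command -v vmstat >/dev/null 2>&1; then
        vmstat 1 3                         # one report per second, three reports
    fi
    if command -v top >/dev/null 2>&1; then
        top -b -n 1 | head -n 15           # -b: batch mode, -n 1: a single refresh
    fi
} > "$report" 2>&1

cat "$report"
```

Redirecting the whole group at once keeps the output of every command in one file, which is easier to attach to a support request than separate screenshots.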
Moving on to the next one, ps. It's very useful for checking processes.
I'm going to use this to see the processes owned by users. And I'm going to use the aux flags.
Okay. So, first, only ps without the flags. The aux output is very detailed and covers different users.
Plain ps is useful for checking your own processes, giving you a quick overview of what's running in your current session. For more details, as I mentioned, you can use ps aux to see all the processes on the system. Plain ps only lists the processes running in your current session.
And I'm going to show you ps -e. So, that's a lot. A lot of information.
It provides a full system-wide view of running processes, including system processes and those owned by users. So, that's a lot of processes. It is useful for administrators to monitor system activity and troubleshoot resource issues.
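The ps variants mentioned above can be compared side by side. This is a sketch that assumes the standard ps found on Linux systems; the output is trimmed with head so the screen does not fill up, and the variable names are illustrative.

```shell
ps | head -n 5                  # processes attached to the current terminal session

me="$(id -un)"                  # current user name
ps -u "$me" | head -n 5         # processes owned by that user, across sessions

ps aux | head -n 5              # every process, with owner, CPU, and memory columns

total="$(ps -e | wc -l)"        # ps -e prints one header line plus one line per process
echo "ps -e printed $total lines"
```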
As I mentioned, ps aux lists every process with a lot more information. So, in Linux, every file or directory is represented by an inode, which holds the metadata like permissions, ownership, and size, but not the file name or content. Okay.
So, each new file consumes one inode. So, even if you have disk space left, you might not be able to create more files if the inode limit is reached. And I'm going to clear my screen.
And if I want to check the system-wide inode usage, I'm going to type df -i, which shows how many inodes are used and how many are free for each file system. So, this is the file system, and let me make it a bit more human readable. So, it is showing me the file system name, how many inodes it has, how many are used, how many are free, and the percentage used.
Okay. Why is it essential for you? We'll look at how to check the usage of your home directory and your scratch space. This is important to ensure you're not running out of space when you're running jobs or storing data.
So, I'm going to use df -i on my home directory, and this time I'm going to use the absolute path. So, this is showing how many inodes my home directory has as a user. Okay.
And how many inodes I have consumed and how many are free. So, basically, I have around 60 million inodes available, if I'm not wrong.
Yeah. Around 60 million inodes available for me. And I have consumed 29 million.
Apologies. And I'm going to show you how to check the inode usage of your scratch space. Okay.
So, around 600 million inodes for scratch space. The scratch space is a much larger area for temporary storage. So, it's often used for job computations.
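The inode checks can be sketched as follows. The example uses $HOME for the home directory; the scratch path differs per cluster and is not given here, so substitute your own scratch directory in the same way. The second half uses a temporary directory with made-up file names to show that each file and directory accounts for one inode.

```shell
df -i "$HOME"        # inodes total, used, free, and percent used for the home file system
df -ih "$HOME"       # the same, with counts abbreviated (K, M) for readability

# Each file or directory consumes one inode: count what a directory tree holds.
demo_dir="$(mktemp -d)"
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.txt"
entries="$(find "$demo_dir" | wc -l)"     # the directory itself plus three files
echo "$demo_dir accounts for $entries inodes"
```

Running the same find and wc pipeline on a project directory gives a quick estimate of how many inodes it consumes, which matters for workflows that produce many small files.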
All right. So, how do you end this SSH session? It's simple.
You can just type exit. You can type logout. You can press Ctrl+D. You will be logged out of the session.
Any questions or concerns, we are available. Outside of these introductory sessions, Columbia offers additional training materials and workshops, such as Software Carpentry classes, where you can receive training in Unix and in programming with languages like Python. Links to these resources are listed on the slide, and you will be able to get the slides after the class.
Here are some books that really helped me, both in learning Linux and in passing the Red Hat Certified System Administrator exam. These books cover everything from the basics to advanced topics, making them great for anyone looking to dive deep into Linux. If you're serious about mastering Linux, these are highly recommended reads.
Over to you, Max. Oh, well done. Good job.
I always find the hardest thing is trying to break down these commands into layman's terms. Great job. As stated at the beginning, next week will be intro to Bash.
I highly recommend that anyone who is new to our cluster attend that workshop. As previously stated, Slurm is the heart of our cluster. If you can't interact with Slurm, that means you can't run any jobs on our cluster.
You need Bash to be able to interact with Slurm. You're going to be using Bash to request resources such as GPUs, RAM, and CPU cores. You're also going to specify your input and output files and directories. So, next week is key, and I look forward to seeing everyone.
Also, as I promised before, we have 15 minutes at the end for anyone who has any questions for myself or Wacas. Caitlin, how's your SCP going from your end? Hi. I just pasted my syntax in.
It seems like it logs me out when I try to copy this file and I'm not sure why. I was wondering if it was because I'm under the free group or if that matters. That shouldn't matter at all.
What I'm going to do for you: my team and I will touch base with you after the call and we'll go over the syntax. I think you're using a Windows machine. That's probably what's causing the problem.
So, we'll look at the backslashes versus the forward slashes. So, we'll touch base with you after the call and we'll drop you an email. Okay.
Thank you. Thank you. Any other questions? I've used other clusters before that use OnDemand or something similar for file transfer.
Is there anything like that or is it all just SCP? We have Globus. You've probably used it before. It's like FileZilla on steroids.
The great thing about Globus: number one, you have a 10-gig data transfer on and off our cluster. Number two, it checkpoints. So, if you're transferring, let's say, 50 gigs, and your transfer were to fail halfway through,
Globus checkpoints all those transfers, so you're able to pick up back where it left off. So, we have Globus, which is one, and we're looking to implement Open OnDemand on all our clusters over the next six weeks. Awesome.
Cool. Yeah. Both of them are pretty easy to use.
So, it's pretty cool. Nice. Any other questions? Okay.
Great. All right. Wacas, thanks again.
Like I said, it's a real skill to be able to break that syntax down into layman's terms. I always thought this Linux workshop was one of the hardest. And I hope to see you guys next week.
Like I said, it's pretty essential for you guys to attend each one. Each one builds up to the intro to HPC workshop. So, next week, we're gonna show you how to liaise and talk to the scheduler.
In the third workshop, we'll give you the intro to Python in HPC. We're gonna streamline your workflows. And obviously, on our clusters, you have multiple resources.
You're gonna have your GPUs, your CPUs, and RAM. We're gonna show you how to optimize all those resources using Python. And then the end game is to get you to the intro to HPC.
So, I hope to see you all there. And thank you, Wacas. And hopefully, see you guys next week.
Thank you, everyone. Thank you. Goodbye.
We currently serve and maintain three clusters, Terramoto, Ginsberg, and Insomnia. If you were to put all these resources in one pool, we'll be serving close to 32,000 CPU cores, approximately 80 terabytes worth of error correcting RAM, over 220 GPUs, and four petabytes of parallel file system. Next slide, Wacas.
So, before I hand the mic over to Wacas, we have a couple of house rules before we get started. Number one, please try not to interrupt Wacas when he's in mid-flight. Any questions, if you run into difficulties, please paste them in a Zoom chat window, and one of my admins will address those concerns for you.
We will be preserving 15 minutes at the end, so any questions that you want to raise vocally, we will reserve that time for you to address your questions. Lastly, this session will be recorded, and these slides and the recording will be distributed to anyone who's attended today. So, you have the luxury of actually just following the sessions now and just following along later your own pace.
So, with that, Wacas, it's over to you, and work your magic. Thank you, Max, for the great introduction. Hi, my name is Wacas Hanif, and I'm part of the HPC group here at Columbia, and I'm very excited to actually do this for you guys.
So, for now, turning off the camera for a bit to keep the connection smooth, and let's rock and roll. All right. So, first off, what is Linux? Linux is an operating system that manages hardware and software, allowing users to interact with their computers.
It's known for being open-source, secure, and highly customizable, unlike Windows or MacOS. Linux is free and comes in many different versions called distributions. It powers everything from servers and supercomputers to smartphones and embedded devices.
In short, Linux is everywhere, even if you don't realize you're using it. Why Linux? It is free access to the source code. It is open-source, of course, so you have free access to the source code, allows customization and transparency, runs on everything from desktops to servers and embedded systems, strong permission management and fast communication and community-driven updates.
It rarely crashes with long uptime, making it ideal for critical systems. It's efficient, multitasking, and great for handling heavy workloads. Many distributions are free, reducing the software expenses, large active community providing help and resources.
Why I chose Linux. So, when I started with Linux, my main goal was to host my own website. As a student, I didn't have the budget for expensive Windows server licenses, so Windows required costly software like Microsoft IIS for web hosting, which costs approximately $2,000, and then Microsoft SQL server for database, which costs around $4,000.
Linux provided a free and open-source alternative with Apache for web hosting and MySQL for database, both widely used and community-supported. Plus, Linux is known for its strong security and reliability, so it's making it a great long-term choice. So, by choosing Linux, I had full control over my system without spending a dime on licensing.
So, before we jump into navigating Linux, here's the login that you'll be using to get to the system. You must have received the welcome email. The system name is insomnia.rcs.columbia.edu, and you would use your uni prefacing the domain name.
So, I'm going to slice it down for a little bit. All right. So, for Windows 10 and 11, you can actually use the command prompt that's built in, or you can download PuTTY, which would be needed for older versions of Windows like 7 and 8. If you haven't done this already, you can find PuTTY for free on the ColumbusCUIT homepage, and you can get the first result on the link to the download page and download it, and double-click to run PuTTY.exe. If you're on a Mac, you would run the program called terminal, and to get to that, you use magnifying glass, and you can search for it.
Just type terminal. I'll give two minutes, and we'll start then. I'm going to show you how I will be approaching the cluster.
So, I'm going to make it a little bigger. ssh myuni at insomnia.rcs.columbia.edu. Turn a few things on. Just give me a second.
All right. So, when you hit enter, you will see something like this. Don't panic.
I just wanted to talk a little bit about this. There's a name for this. It's actually called trust on first use.
The message you get the first time you log into any Linux server. So, you would definitely want to type yes and press enter. This is nothing but it's encrypting your entire session, so anybody that is trying to be in the middle of the session between you and the server would not be able to tell that what has happened, and it wouldn't be able to sniff your username or password.
So, I'm going to type yes, and then you will now receive a push notification for multi-factor authentication before accessing the cluster. So, simply just choose your preferred method, like Duo, mobile app, or phone call, or whatever the message is. Oops.
If you stay too long, and it prompted me for the two-factor authentication, and now it will ask the password. So, just type your password carefully and hit enter. If you are entering your password and don't see anything appearing on the screen, don't worry.
That's normal. Linux does not show password characters, not even asterisks or dots. For security reasons, this prevents anyone from guessing your password length just by looking.
So, I'm going to enter my password now. All right. So, we are at our server.
So, we'll take a second here, and if anybody did not have access, you can go to the chat, and my colleagues will be glad to help you out. By the way, I'm using the MobX term, which is also where you can access to MobX from any cluster. It's an open-source third-party tool.
So, I assume that everyone has the access. So, now that everyone has the access to the server, let's start with some basic commands to gather system information. First, we can check the system's name using the hostname.
So, I'm going to type in hostname, and it is showing me it displays the name of the system useful for identifying the machines, especially in the networks. All right. So, next, I'm going to type in uptime, and it's basically showing me how long the system has been running.
It's great for checking the system's stability, and finally, we are moving towards another command, which is date. It prints the current system date and time and helps in scheduling tasks or verifying the system's time settings. All right.
Moving on to the next command, which is pwd. It's short for print working directory, or some people say present working directory. It's a simple but essential Linux command.
It displays the full path of the current directory you're in. This is useful when navigating the file system, especially in complex directory structures. So, it is basically to print out the directory structure of where you are, what is your current directory, and what directories were before your current working directory.
All right. So, let's try another one, cd. So, this command is used to change the directories in Linux.
cd slash, it's going to move you to the root directory, which is the topmost level of the file system, and from here, all other directories branch out. So, see, for example, we are here at wh2612, which is my username, and if I type cd and slash, which is the root directory, it's going to take me there, and immediately you see that you are your location is changed, and if I do pwd, it's going to show me this, that where I am right now. So, I'm going to go back to my home directory, which is slash home, wh2612.
All right. So, next command is to making the directories. mkdir command is used to create the directories.
This creates a new directory. I'm going to type in mkdir test and hit enter. Oh, so, errors are your best friends in Linux.
So, if you read the error, it says cannot create directory test file exists. So, I'm going to check here. Yes, there is a test.
Either directory or file. So, I'm going to create another directory using different name. Maybe test2.
Okay. So, that's how you create the directories. Up next, let's see what this command does.
So, basically, touch command is used to create an empty file or update the timestamps of an existing file. This creates a new file. I'm going to create a file name.
My touch space, my file name. So, this basically created an empty file. And if I want to see this file's timestamp, so, this got created at 221.
And if I want to update the timestamp, I will just simply do touch and then name of the file and it will update the time creation of the file. All right. So, let's move on to the next one.
Pwd and remove a directory. So, how can we remove a directory which I created just now? So, rm is the command to remove a file or directory. If you are removing a file, you can just type rm and name of the file, my file2.
It will just ask if you want to remove it. You type y, yes. It will remove the file.
And if you want to remove a directory, you will need a flag, which is rm-r and then name of the directory. Okay. So, let's say if you want to remove this test2, it is asking if you want to remove the directory or not.
Yes, I want to remove this directory. Okay. So, for the files, you don't need this hyphen r flag.
But for the directories, yes, you need this hyphen r flag. And let's try to create an error. I'm going to create test2 again, this directory, and I'm going to delete this directory or remove this directory without hyphen r flag.
Rm space test2. And it gave me an error saying cannot remove test2 is a directory. All right.
So, I need this hyphen r flag test2 to delete this directory. Yes. All right.
So, moving on to the next. So, if I go into cd test directory, chain directory to the test, and if I do pwd, it will show me where I am right now. Okay.
And if I want to go back one directory, I just type cd space dot dot. Okay. It moves up one level to the parent directory.
And if I hit enter, and if I do pwd, now the path has changed. Before I was in test directory. Now I am in my home directory.
All right. I'm going to go back into the cd into test directory again. And if you see, there's another change you're facing.
You're seeing that this change from tilde to test. So, if I simply enter cd, and I don't enter dot dot, it will take me back to the home directory, which is tilde. So, there is a difference.
If you cd space, if you enter cd space dot dot, it will take back one directory up. But wherever you are in the system, if you want to go back to your home directory, which is tilde, you just type cd, it will take you back to your home directory. So, just like cd with no arguments, or no flag.
Okay. So, running pwd again. And here I am.
I am in my home directory. All right. We covered this one, cd without arguments.
We covered tilde paths. Let's take a look at it. So, I'm going to go inside cd into test directory.
And I just typed cd test. It's a relative path. It looks for the test directory in the current directory.
Since this test directory was in my home directory, it took me there. It changed the directory from my home to the test. All right? But there is another path, which is called absolute path, which starts from slash.
If you have noticed that, when I do pwd, it starts from slash, and then goes from the directory structure insomnia, and then home, and then my user name, and then test. This is called the absolute path. So, absolute path is basically the full path of the which starts from the root, and goes directly to the test directory.
So, if I go back one directory, and I type cd slash insomnia, and then home, and then my user name, 2612, and then test. It will serve the same purpose as this. But this time, I'm using the absolute path instead of the relative path.
I can use the relative path if I am in the directory where this test directory exists. But for the relative path, I don't need that. Okay? So, if I hit enter, it still got me into the test directory.
So, two key takeaways. Absolute path begins with slash, specifying the full path from the root directory. Relative path does not start with slash.
It is relative to the current working directory. If you have noticed, I was using something which was completing my names. And that's my best friend, tab.
So, see, for example, if I go back to my home directory, I'm in my home directory. And if I want to go to the test directory, using the absolute path, what I'm going to do is cd slash. I just entered ins, and then I hit tab, tab, and it completed the path.
It completed the name. So, tab completion to autocomplete the file, our directory name, starting with ins. All right? And then I'm going to hit ho, and then tab.
It completed the home. And I just entered wh, and I'm going to use tab this time, but it's not going to complete it, because there are more directories with wh. Tab, tab.
So, there is another directory, wh2526. That's another user. And this one is wh2581.
So, there are multiple directories showing. But if I do wh26 and then tab, it completed, because there was nothing wh26 other than 12, which is my user name. So, tab is your best friend.
So, see, for example, I'm going to use the ls command. And if I type m, and I do tab, so it immediately told me that it printed the full name of the file, which is my file.txt. So, it helps, basically, quickly completing the commands, file names, our directory names, just by pressing tab. It works both for both commands and files and directory names to save time and reduce the errors.
The last thing is very important, reduce errors. All right. So, moving on to the next, as you were seeing, I was using something in ls.
What is ls? It basically lists the contents of the current directory, giving you a quick view of what's there. So, I'm going to ls, and this is listing the contents of my home directory, whatever it is here in my home directory. So, it is listing the contents of the directory.
And we can use multiple flags with ls. I'm going to use ls-l, which is going to long list the contents, meaning that if you need more details, this command gives you a deeper look at your files using the l-l flag. So, I'm going to use that.
And I'm seeing the difference. When I was using ls, it was just printing whatever contents of my home directory had. But now, it is giving me more information about the content.
Same file, same directories, but having more information. So, what is this information? So, ls-l, it basically tells me different things that who owns this file or directory, okay, who is the owner of that, which group it does belongs to, the third column is the group associated with the file or directory, and then the size of the file, and the last date and time and when it was changed, which is today. All right.
So, first column, name of the owner of this file or directory, second, the group it belongs to, third, the size of the file or directory, and the fourth column, it's showing the last modification date and time. And of course, file name or directory name. So, I'm going to play more with the ls command.
So, it lists the content of the directory, and there is another flag, hyphen a. So, it lists all the hidden files, those starting with a dot. So, I'm going to do that, ls-a. So, you will see.
So, before, when I did ls, it showed me some files and directories, and I'm going to do that, ls-a. Now, you're seeing a difference. There are some files starting with dot.
These are the files, this option reveals the files and directories that are normally hidden, and then if I do ls-al or maybe la, so it combines the detailed listing with the hidden files, giving you a comprehensive view of everything in the directory. I'm going to talk a little bit about .bash profile and .bashrc. So, these are basically, they store your preferences or configuration settings, such as for your shell or other applications. You can use them to automatically set up your environment when you log in, so you don't have to manually configure your system each time.
They were originally meant to be hidden from the ls command keeping them out of sight of the normal user to avoid the cluttering the directory. So if you see output of ls it is very clean and if you see ls-a it has a lot of files, a lot of hidden files. So dot files are a convenient way to save and manage your personalized settings without affecting the visibility of other files.
All right, moving on to the next. I'm gonna clear my screen with just simply typing clear and it cleared out all the clutter. All right, so some basic file operations.
So I'm gonna create a file. Let's see what is here with me. So I'm gonna copy this myfile2.txt to this directory test.
I'm gonna go inside the test directory to see what is there already. I'm gonna remove this myfile3 and myfile8 and now I'm gonna list the contents. There's nothing here in my test directory.
I'm gonna go back to my home directory and I'm gonna list the content of my directory and now I'm gonna simply copy this myfile2.txt to this test directory. So cp, that's how the syntax goes. Name of the file which is myfile2.txt space and then the name of the directory where I want to copy this file.
All right, so hit enter. If I check my home directory content, my file is still there, and a copy landed in the test directory. Myfile2.txt is there. And by the way, if you want to list the contents of the test directory without going into it, you can just type ls and the name of the directory, and it will list the contents. If you then check your present working directory with pwd, print working directory, you are still in your home, but you listed the contents of your test directory, which has myfile2.txt. So I'm going to create another file, test4, and run ls.
Here is the test4 file, and I'm gonna copy this test4 file to the test directory, and if I list the contents of the test directory, there is test4. All right, so secure copy protocol, scp. It's a command line tool for securely transferring files between computers over SSH.
It is useful when you need to quickly copy files between local and remote systems or even between two remote servers. The basic syntax is similar to cp, but it includes a username and remote host. To send a file from a local machine to a remote server, we need to specify the source file and then the destination directory under our user on the remote server.
So I'm gonna quickly copy a file from my local machine to this server. So let's see what we have here in test with ls. So currently this test directory has these two files, and now I go back to my local machine by exiting out.
Okay, clear this out, and if I list the contents of my desktop, which is on my Windows laptop, I have a lot of different things, and I want to copy over this file I created. Okay, so I'm gonna try to copy this intro to linux.txt over to my server, inside the test directory. So the syntax will be scp, then the name of the file, using tab to complete it, then my remote server, where I'm gonna use my username, wh2612 at insomnia.rcs.columbia.edu, and then a colon.
Now I'm gonna specify the path of where I want to put this file to. So here I'm gonna use the absolute path. Okay, so insomnia001 colon and my username wh2612 and then to the test directory.
Let's see if I'll be successful. All right, of course, it's gonna ask the MFA. Okay, I'm gonna do that and for some reasons raising.
Okay, approve and then password. All right, if you see this 100% success that means that our file was copied over to the remote server which is insomnia from our local machine and we can verify that here by listing the content of our test directory. Here it is intro to linux.txt. All right, that's how you copy the files or directories from your local machine to a remote server or from remote server to a remote server or maybe you can copy that from remote server to your local machine as well.
But we don't recommend that. We recommend Globus which is another tool we use for the large files to transfer from the servers. All right, moving on to the next.
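As a rough sketch of the scp syntax described above, the snippet below only builds and prints the two commands rather than running them, since a real transfer needs your own account, MFA approval, and password. The username your_uni, the file notes.txt, and the remote directory test are placeholders assumed for illustration; only the host name comes from the session.

```shell
# Placeholders (assumptions): replace these with your own values.
user="your_uni"                      # your cluster username
host="insomnia.rcs.columbia.edu"     # login host used in the session
local_file="notes.txt"               # hypothetical file on your local machine
remote_dir="test"                    # a path without a leading slash is relative to your remote home

# Local machine -> remote server: source first, then user@host:destination.
upload="scp $local_file $user@$host:$remote_dir/"

# Remote server -> local machine: the trailing dot means "the current directory".
download="scp $user@$host:$remote_dir/$local_file ."

# Print the commands for review; paste one into your terminal when it looks right.
echo "$upload"
echo "$download"
```

Everything before the colon says where to connect and as whom, and everything after it is a path on the remote side.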
Copying and removing the directories. Okay, I think I showed you how to copy the files and removing the file. So I'm going to remove the files.
I have shown you removing the directories and the files. Okay. Okay, so before, oops, what happened? All right, so before I showed you how to copy the files, now I'm going to show you how to move the files.
Ls, myfile2.txt is there. I'm going to remove this myfile2.txt from test. All right, so my test directory does not have this myfile2, but my present working directory which is my home has this file.
And I'm going to move this file this time, not copy it: mv myfile2.txt to the test directory. And if I list the contents of my current working directory, which is home, this file should not be here, because it got moved to the test directory. Okay, so I'm going to go inside the test directory and list the contents.
Here it is. And by the way, we can achieve one more goal with the mv command. We can move directories, we can move files, but we can rename files using the mv command as well.
So, I'm going to change the name of myfile2.txt using the mv command. So, what I'm going to do is type mv. Okay, got a little bit frozen here.
Okay, mv myfile2.txt to maybe anything else, anything.txt. Okay, I'm going to list the contents of my test directory. So, there is no myfile2.txt, but I can see anything.txt here. So, that's how you can rename files as well.
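The copy, move, and rename steps above can be replayed in a throwaway directory. This is a minimal sketch: the sandbox directory and the sample file content are assumptions made so that nothing in your real home directory is touched.

```shell
# Work in a temporary directory so nothing in your real home is changed.
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir test                          # destination directory
echo "sample text" > myfile2.txt    # a small file to practice on

cp myfile2.txt test/                # copy: the original stays where it was
ls                                  # myfile2.txt and test are both listed
ls test                             # list test without changing into it

rm test/myfile2.txt                 # remove the copy again
mv myfile2.txt test/                # move: the file leaves the current directory
mv test/myfile2.txt test/anything.txt   # the same command, used as a rename
ls test                             # only anything.txt remains
```

The difference to remember is that cp leaves two files behind, while mv always leaves exactly one.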
All right, so moving on to the next command, which is cat, and it is very useful. It is basically used to display the contents of a file when followed by the file name. So, I'm going to use cat on a file I created, named columbia.txt. And when I run that, I'm seeing the contents of this file.
Okay, it printed all of the content of that file here on my screen. Okay, so that's how you read files. So, if I want to read the file I created inside test, intro to linux.txt, I give cat the path: test, and then the file name.
Oops, there we are. That's how you read the files. Let's check user information.
So, I'm going to clear my screen. Clear. And if I type the id command, it displays the details about the user, which is myself, wh2612, including my user ID, group ID, and any additional groups I belong to.
And then if I type groups, it lists all the groups that my user ID is a member of. So, id is useful for checking a user's user ID, group ID, and group memberships. By comparison, groups is a simpler way to see which groups a user belongs to.
Okay. Now, I'm going to check, okay, what is going on here? All right. So, this command, whoami, typed as one word. It simply prints the username of the currently logged-in user.
Compare that with who, space, am, space, i. So, there is a difference. Whoami, all together, is useful to quickly confirm which user account you're operating under. Who am i, with the spaces, helps track the original login user, especially in multi-user or remote sessions.
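The identity commands above can be tried side by side. This is a small sketch; the two variable names are introduced here only for illustration, and the output depends entirely on your own account.

```shell
id          # user ID, primary group ID, and every group you belong to
groups      # just the group names
whoami      # username of the account you are currently acting as
who am i    # original login for this terminal; may print nothing outside a login session

# id -un prints the same effective username that whoami reports.
current_user=$(whoami)
login_name=$(id -un)
echo "whoami: $current_user, id -un: $login_name"
```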
All right. And I'm going to type another command, who. So, it basically lists all the currently logged-in users along with their session details, such as terminal, login time, and remote host, if applicable.
So, it is useful for monitoring active users on a multi-user system. It helps identify who else is logged in and from where. So, it's showing their IPs as well, where they're logged in from, and the time they logged in.
All right. So, that's what who command does. Okay.
I'm going to show you what the w command does. It's a little bit different, a bit more detailed. It shows who is logged in and what they're doing, including their login time, idle time, and the command they're currently running.
So, w provides a more comprehensive view of users' activity on the system, helping you monitor user sessions and activity in real time. Okay. I'm going to move to the next command, which is man.
Man is basically short for manual. So, it is used to access the manual pages for a command, providing detailed information on its usage, options, and syntax. So, if I do man ls, it shows you the manual for the ls command.
So, you can also search. If I want to search for the hyphen l flag, or maybe the hyphen a flag, well, hyphen a is right in front of me, so I'm going to search for the hyphen l flag. It shows you all the occurrences of hyphen l inside the manual page. So, I'm going to move on to the next.
Next. And you can simply type n to move to the next occurrence of hyphen l. Here it is. So, it's showing me what the hyphen l flag does.
Use a long listing format. Okay. Okay.
Now, let's explore more about essential file manipulation tools. And I'm going to clear my screen. All right.
So, grep. It's a very powerful search tool used to filter text based on patterns, useful for scanning log files, configuration files, and large datasets. Let me show you an example.
Okay. Here. Grep.
And I'm going to grep the university word from my file Columbia.txt. So, it displays all the lines in Columbia.txt file that contain the word university. All right. Now, let's take a look at pipe, how pipe works.
So, as I told you that if you want to read a file, you can simply type cat and name of the file. Okay. So, I'm going to do that, Columbia.
So, the pipe is basically going to take the output from the cat command and hand it over as input to the next command. Okay. So, pipes allow chaining commands, passing the output of one command as input to another.
And I'm going to grep this university word, or maybe location. So, I'm going to grep this word from the output of the cat command. So cat is going to read the file Columbia.txt and send its output into the pipe, and the pipe gives that output to grep.
Okay. So, if it returns nothing, that means that this file does not have this word. There may be a spelling mistake.
Maybe we can find university. There it is. So, before, we did the grep on university and we used the grep command only.
But this time we read the file with cat, sent the output into the pipe, and passed it over to grep. And then that is showing us the output of the grep. Okay.
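The two ways of searching shown above, grep on the file directly and cat piped into grep, can be compared in a short sketch. The sample file and its three lines are made up for illustration and are not the file used in the session.

```shell
sandbox=$(mktemp -d)

# A made-up three-line sample file.
cat > "$sandbox/columbia.txt" <<'EOF'
This line mentions the university.
This line does not.
The university appears here as well.
EOF

# grep reads the file itself and prints every line containing the word.
direct=$(grep university "$sandbox/columbia.txt")

# Same result through a pipe: cat writes the file to its output,
# and the pipe feeds that output to grep as its input.
piped=$(cat "$sandbox/columbia.txt" | grep university)

echo "$direct"

# A word that never occurs prints nothing, so we report that ourselves.
grep location "$sandbox/columbia.txt" || echo "no lines matched"
```

Both forms give the same lines. The pipe form earns its keep when the text comes from another command rather than from a file.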
All right. Let me clear my screen. Okay.
So, tail and head. Head and tail are basically useful for previewing large files without opening them. It's great for checking the beginning or end of the logs, configuration files or datasets.
So, I'm going to use head Columbia.txt, which shows the first ten lines of Columbia.txt by default, including the blank lines between the lines as well. So, I'm going to head Columbia.txt, and it's showing me the first ten lines, and it's including these blank lines as well. Okay.
And if I want to specify how many lines from the beginning of the file, I will use the hyphen n flag, which stands for number. And I'm going to use five to specify that only five lines should be shown to me. There it is.
So, it's basically showing me five lines, including the blank ones. And tail basically prints out the last lines, ten lines by default. I'm going to run tail on Columbia.txt. So, the tail command is printing the last ten lines, including the blank ones.
And if I want to specify how many lines, I can just specify five. And it's showing me only five lines, including the blank ones. All right.
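The head and tail behavior above is easy to check on a file whose lines are just numbers. This is a minimal sketch; numbers.txt is a hypothetical 20-line file generated on the spot.

```shell
sandbox=$(mktemp -d)
seq 1 20 > "$sandbox/numbers.txt"    # hypothetical file: one number per line, 1 to 20

head "$sandbox/numbers.txt"          # first 10 lines (the default)
head -n 5 "$sandbox/numbers.txt"     # first 5 lines
tail "$sandbox/numbers.txt"          # last 10 lines (the default)
tail -n 5 "$sandbox/numbers.txt"     # last 5 lines

# Collect the results on one line each so they are easy to compare.
first_five=$(head -n 5 "$sandbox/numbers.txt" | tr '\n' ' ')
last_five=$(tail -n 5 "$sandbox/numbers.txt" | tr '\n' ' ')
echo "first five: $first_five"
echo "last five:  $last_five"
```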
So far, so good. Moving on to the editing. Okay.
So, in Linux, when it comes to text editors, there's no single obvious choice, but vi stands out for Linux admins. Why vi? It's simple, yet powerful, but can feel tricky at first. It's preinstalled on nearly all Linux distros, Unix systems, and even macOS.
And when a server crashes, vi is often the only editor available, making it essential for quick fixes. That's why many Linux admins, including myself, rely on it. It's always there when you need it.
So, I'm going to edit this file with vi. You can type vi and Columbia.txt. If you want, you can follow along.
You can copy any text from the internet into a file, and then you can follow along. All right. So, some essentials of vi: there are two modes.
First is insert, and the second is escape, which you will also see called command mode. So, if I hit i, you see the change here at the bottom. It says I'm in insert mode.
What it does is I can type in anything. Okay. But if I hit the escape key, I'm in escape mode.
So, the insert is gone. If I want to type anything, I cannot do that unless I press O. Oops. So, there are basically two modes.
One is insert mode, one is escape mode. Okay. And while you're in escape mode, if you want to go to the bottom of the file, you just press shift G. It will take you all the way down.
And if you press g twice, gg, it will take you all the way to the top. All right. And if you want to go to the end of the line, while you're in escape mode, you just press shift and 4, the dollar sign. It will take you to the end of the line.
And if you want to go to the start of the line, you just press shift 6, the caret, which is going to take you to the start of the line. All right. So, if you want to delete a whole line, you don't need to be in insert mode.
You can just hit dd, which is going to delete this line. So, I'm going to delete this line just by pressing dd. And if I want to undo it while I'm in escape mode, I just press u, which is short for undo.
Okay. So, so far in the editing tool, we have learned a few things. We go into insert mode by pressing i, and we go to escape mode by pressing escape. And if you want to go to the bottom of the file, shift G; top of the file, gg; end of the line, shift 4; start of the line, shift 6. And if you want to delete a line while in escape mode, you just press dd.
And if you want to undo it, you just press u. All right. So, these are some basic commands of the editor. You can learn more, but for the sake of time, I'm going to move forward.
One more basic thing on the Vim editor I still wanted to show you: while you're in escape mode, if you want to save the file, you press colon, then w to write, and then q to quit. If you leave out the w and type only colon q, it will not save your changes, and if there are unsaved changes it will refuse to quit until you add an exclamation mark, colon q exclamation mark, which quits and throws the changes away. Okay.
So, colon w q, and hit enter, and it's going to save the file with your changes. Okay. And if you want to show line numbers, while in escape mode, press colon and then set number.
It's going to number the lines. And if you want to remove them, set nonumber. It's written together as one word.
And it will remove the numbers. And if you want spell check while you're editing, you can do that too. While in escape mode, colon set spell will check the spelling and highlight whatever is wrong. And set nospell will disable it.
All right. I'm going to quit out of this file. And I'm going to show you what the output redirection is.
So, for example, this greater-than operator redirects output to a file. So, for example, I have the date command, which I showed you in the beginning, and it prints out the date. Okay.
And if I want to save the output of this command, I'm going to use the redirection sign. Date, then the redirection sign, then the file name.
If a file already exists with that name, this redirection sign will replace the contents of the file. So, I'm going to show you what I'm talking about by going into the test directory. I'm going to read the content of the intro to Linux.txt file.
Oops. And then cat intro to Linux. This file has this content.
And what I was talking about was, if I save the output of this command to my file, it's going to replace the content. So, be very careful about this. And if I read the intro to Linux file now, it's going to print out different content compared to before.
So, to summarize, this redirection sign redirects the output of a command to a file. It's useful for saving output, but it removes the old content. If you want to add to the end of the file instead, you use two greater-than signs. All right.
So, moving on to the next command. Okay. One more example left.
ls -l. So, this will print out the content of this directory. And if I save this output to the same file, intro to Linux, and then read this file, it should have different content now.
Okay. Moving on to the next, I'm going to use echo. Echo is basically great for scripting, debugging, and quickly displaying values.
It's simple, yet essential for automating tasks in Linux. I'm going to use echo hi, and it prints out hi to the screen. It's commonly used to display messages and print variables.
And even if you want to write to a file, you can do that. Echo hi, then the redirection sign, and then intro to Linux. And if I want to read this, now this file has different content.
The content has changed because I redirected the output of this echo to this file. Okay. All right.
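The overwrite behavior demonstrated above, and the append operator as its counterpart, can be sketched in a few lines. The file notes.txt is a hypothetical name in a temporary directory, used so that no real file gets replaced.

```shell
sandbox=$(mktemp -d)
out="$sandbox/notes.txt"           # hypothetical file name

echo "original content" > "$out"   # > creates the file, or replaces its contents
date > "$out"                      # the old line is gone; only the date remains
echo hi > "$out"                   # replaced again: the file now holds just "hi"
echo "second line" >> "$out"       # >> appends to the end instead of replacing

cat "$out"
```

After these four writes the file holds two lines, hi and second line, because only the last write used the append form.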
I'm going to move on to the system health commands, some of the essential ones. These tools are essential for troubleshooting servers, but they're just as useful for your home machine as well.
So we rely on these tools on a daily basis, especially when servers are running slow or encountering issues. For instance, if a server is sluggish, we start with these tools to pinpoint the problem, whether it's resource usage or free disk space or network issues. So knowing how to use these commands is crucial for effective troubleshooting.
I'm going to use free first. It's displaying how much memory is used and how much memory is free, but this is not human readable, so I'm going to make it human readable. Now it's making a bit more sense.
It's showing me the swap space as well. Here it is. So free -hm provides a detailed report on your system's memory usage.
The hyphen h option I entered shows the memory in human readable units, while m, as I'm showing you here, shows it in megabytes. Before, with no options, it was showing kibibytes. Moving on to the next command, which is top. It's a powerful tool for system monitoring.
It allows you to spot performance issues quickly and manage processes. It's an essential tool for any Linux admin to keep track of the system health. It's basically perfect for real-time system monitoring.
It shows real-time CPU usage and lists all the running processes with their PIDs. And there are many options to customize the display. Moving on to another command, htop.
So, it's basically an enhanced version of top. It has a visually appealing, color-coded interface with an intuitive layout. Unlike top, which requires you to type in a PID for actions, htop allows you to scroll through the processes and interact with them using the arrow keys.
All right? So, it basically makes your tasks easier to manage. So, htop displays CPU and memory usage as graphical bars, offers process tree views, and allows users to customize the display. It also supports filtering and searching for specific processes, which top lacks by default.
All right? So, I'm going to quit. And you can quit by just pressing q. Vmstat. So, I'm going to show you vmstat.
It is short for virtual memory statistics. It's a lightweight tool that provides a real-time snapshot of the system performance. It reports on CPU usage, memory, swap, disk I/O, and system processes.
So, it basically helps diagnose performance bottlenecks. Unlike more detailed tools like top and htop, vmstat gives an overview of system health with minimal resource usage, making it useful for quick diagnostics.
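For a quick, non-interactive look with the tools above, something like the following sketch can be used. The function name snapshot is made up for illustration, each tool is skipped if it is not installed, and the top flags assume the Linux procps version of top.

```shell
# Print a one-shot health snapshot using whichever tools are installed.
snapshot() {
    if command -v free >/dev/null 2>&1; then
        free -h                      # memory and swap in human-readable units
    fi
    if command -v vmstat >/dev/null 2>&1; then
        vmstat 1 3                   # three samples, one second apart
    fi
    if command -v top >/dev/null 2>&1; then
        top -b -n 1 | head -n 15     # batch mode: one refresh, no interactive screen
    fi
    return 0
}

snapshot
```

The first line vmstat prints is an average since boot, so the second and third samples are the ones that reflect what is happening right now.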
Moving on to the next. Ps. It's very useful for checking processes.
I'm going to use this. It shows processes owned by the user. And I'm going to use aux.
Okay. So, first, only ps without the flags. The aux flag is very detailed about different users.
Plain ps is useful for checking the processes you own, giving you a quick overview of what's running in your current session. For more details, as I mentioned, you can use ps aux to see all the processes on the system.
And I'm going to show you ps -e. So, that's a lot. A lot of information.
It basically provides a full system-wide view of running processes, including system processes and those owned by the users. So, that's a lot of processes. It is useful for administrators to monitor system activity and troubleshoot resource issues.
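The three ps variants above differ mainly in how much they show. A small sketch, trimming the long listings with head so the output stays readable; the last command, which counts your own processes, goes beyond what was shown in the session and assumes the procps version of ps found on most Linux systems.

```shell
ps                    # processes attached to your current terminal session
ps -e | head -n 5     # every process on the system, first few lines only
ps aux | head -n 5    # every process, with owner, CPU, and memory columns

# Count only your own processes, wherever they were started.
mine=$(ps -u "$(id -u)" -o pid=,comm= | wc -l)
echo "processes owned by me: $mine"
```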
As I mentioned, ps aux lists every process with a lot more information. So, in Linux, every file or directory is represented by an inode, which holds the metadata like permissions, ownership, and size, but not the file name or content. Okay.
So, each new file consumes one inode. So, even if you have disk space left, you might not be able to create more files if the inode limit is reached. And I'm going to clear my screen.
And if I want to check the system-wide inode usage, I'm going to type df -i, which shows how many inodes are used and how many are free for each file system. So, this is the file system, and let me make it a bit more human readable. So, it is showing me the file system name, how many inodes it has, how many are used, how many are free, and the percentage used.
Okay. Why is it essential for you? We'll look at how to check the usage of your home directory and your scratch space. This is important to ensure you're not running out of space when you're running jobs or storing data.
So, I'm going to use df -i on my home directory, and this time I'm going to use the absolute path. So, this is basically showing how many inodes my home directory has as a user. Okay.
And how many inodes I have consumed and how many are free. So, basically, I have around maybe 60 million inodes available. If I'm not wrong.
Yeah. Around 60 million inodes available for me. And I have consumed 29 million.
Apologies. And I'm going to show you how to check the inode usage of your scratch space. Okay.
So, around 600 million inodes for scratch space. The scratch space is a much larger area for temporary storage. So, often it's used for the job computations.
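The inode checks above can be sketched as follows. The variable names are made up for illustration, the column position assumes the GNU df layout, and the scratch path is cluster-specific, so it is left out here; you would pass your own scratch directory the same way as the home directory.

```shell
target="${HOME:-/}"      # directory to inspect; falls back to / if HOME is unset

df -i                    # inode totals, used, free, and use% for every file system
df -i "$target"          # only the file system that holds the target directory
df -h "$target"          # the same file system measured in disk space instead

# Pull out the inode use% column (fifth column in GNU df output; -P keeps each entry on one line).
inode_use=$(df -iP "$target" | awk 'NR==2 {print $5}')
echo "inode use for $target: $inode_use"
```

Checking both df -i and df -h matters because a file system can run out of inodes while still showing free disk space.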
All right. So, how to kind of end this SSH session. It's simple.
You can just type exit. You can type logout. You can press Ctrl-D. You will be logged out of the session.
Any questions or concerns, we are available. Outside of these introductory sessions, Columbia offers additional training materials and workshops, such as Software Carpentry classes, where you can receive training in Unix and programming courses like Python. Links to these resources are listed on the slide, and you will be able to get those slides after the class.
Here are some books that really helped me, both in learning Linux and in passing the Red Hat Certified System Administrator exam. These books cover everything from the basics to advanced topics, making them great for anyone looking to dive deep into Linux. If you're serious about mastering Linux, these are highly recommended reads.
Over to you, Max. Oh, well done. Good job.
I always find the hardest thing is trying to break down these commands into layman's terms. Great job. As stated at the beginning, next week will be intro to Bash.
I highly recommend anyone who is new to our cluster to attend that workshop. As previously stated, Slurm is the heart of our cluster. If you can't interact with Slurm, that means you can't run any jobs in our cluster.
You need Bash to be able to interact with it. You're going to be using Bash to harness resources such as the GPUs, the RAM, the CPU cores. You're going to actually specify your input and output files and directories as well. So, next week is key and I look forward to seeing everyone.
Also, as I promised before, we have 15 minutes at the end for anyone who has any questions for myself or Wacas. Caitlin, how's your SCP going from your end? Hi. I just pasted my syntax in.
It seems like it logs me out when I try to copy this file and I'm not sure why. I was wondering if it was because I'm under the free group or if that matters. That shouldn't matter at all.
What I'm going to do for you, my team and I will touch base with you after the call and we'll go over the syntax. I think you're using a Windows machine. That's probably what's causing the problem.
So, I'd look at the backslashes versus the forward slashes. So, we'll touch base with you after the call and we'll drop you an email. Okay.
Thank you. Thank you. Any other questions? I've used other clusters before that use Open OnDemand or something similar for file transfer.
Is there anything like that, or is it all just SCP? We have Globus. You've probably used it before. It's like FileZilla on steroids.
The great thing about Globus, number one, you have a 10 gig data transfer on and off our cluster. Number two, it checkpoints. So, say you're transferring 50 gig and your transfer were to fail halfway through.
Globus checkpoints all those transfers, so you're able to pick back up where it left off. So, we have Globus, which is one, and we're looking to implement Open OnDemand on all our clusters over the next six weeks. Awesome.
Cool. Yeah. Both of them are pretty easy to use.
So, it's pretty cool. Nice. Any other questions? Okay.
Great. All right. Wacas, thanks again.
Like I said, it's a real skill to be able to break that syntax down into layman's terms. I always thought this Linux workshop was one of the hardest. And I hope to see you guys next week.
Like I said, it's pretty essential for you guys to attend each one. Each one leads up to the intro to HPC workshop. So, next week, we're gonna show you how to liaise and talk to a scheduler.
The third workshop, we'll give you the intro to Python. We're gonna streamline your workflows. And obviously, on our clusters, you have multiple resources.
You're gonna have your GPUs, your CPUs and RAM. We're gonna show you how to optimize all those resources using Python. And the end game is to get you to the intro to HPC.
So, I hope to see you all there. And thank you, Wacas. And hopefully, see you guys next week.
Thank you, everyone. Thank you. Goodbye.
