Understanding /proc
Last week I created a small ps clone in Ruby.
I did it purely out of curiosity: I wanted to know how ps works and how it knows about all the currently running processes. You can find the project here.
Cool things I learned in the process:
- procfs is a virtual filesystem that exposes process data!
- A ps clone is merely file reading (really).
Exploring procfs
First, I searched the web and read about ps and Linux processes. This was my first introduction to procfs, which I read about in TLDP (awesome documentation btw).
In summary, every Linux process can be found in the /proc folder. This folder is of type procfs which, as I said before, is a virtual filesystem: its files don’t live on disk, and the kernel generates their contents from in-memory data when you read them. This is why, if you run ls -l /proc, you’ll notice that most files and folders have size 0.
$ ls -l /proc
dr-xr-xr-x. 9 root root 0 Sep 25 22:10 1
dr-xr-xr-x. 9 root root 0 Oct 1 10:38 10
dr-xr-xr-x. 9 root root 0 Oct 1 12:46 101
dr-xr-xr-x. 9 root root 0 Oct 1 12:46 102
...
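You can see this from Ruby too. A minimal sketch, assuming a Linux machine with /proc mounted: File.size reports the size procfs advertises, which is 0, yet reading the file returns real content because the kernel builds it on the fly. /proc/self always points to the /proc/<pid> folder of the process doing the reading.

```ruby
# /proc/self resolves to the /proc/<pid> folder of whoever reads it
path = "/proc/self/status"

size_on_disk = File.size(path)   # procfs reports 0 for most files
contents = File.read(path)       # but the kernel fills them in on read

puts "File.size: #{size_on_disk}"
puts "bytes actually read: #{contents.bytesize}"
```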
Inside /proc there is one folder for each running process, named after its pid. So I opened one of these folders to see what I could learn about a running process just by reading its files.
$ ls -l /proc/<pid>
total 0
dr-xr-xr-x. 2 fredrb fredrb 0 Sep 28 23:15 attr
-rw-r--r--. 1 root root 0 Oct 1 10:46 autogroup
-r--------. 1 root root 0 Oct 1 10:46 auxv
-r--r--r--. 1 root root 0 Sep 28 23:15 cgroup
--w-------. 1 root root 0 Oct 1 10:46 clear_refs
-r--r--r--. 1 root root 0 Sep 28 22:41 cmdline
-rw-r--r--. 1 root root 0 Oct 1 10:46 comm
-rw-r--r--. 1 root root 0 Oct 1 10:46 coredump_filter
...
Ok, now I have a bunch of files like autogroup, gid_map and maps, and I have no idea what they’re for. A good starting point would be reading their documentation. But why on earth shouldn’t I just open them?
So I went through the files one by one, and most of them were completely unreadable to me, until I hit the pot of gold:
$ cat /proc/<pid>/status
Name: chrome
State: S (sleeping)
Tgid: 3054
Ngid: 0
Pid: 3054
PPid: 2934
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 64
Groups: 10 1000 1001
VmPeak: 1305996 kB
VmSize: 1232520 kB
...
This is great! Finally something human-readable. It contains general data about the process, like its state, memory usage and owner. But is this all I need?
Not satisfied with exploring /proc by hand, I decided to run ps under strace to see whether it accesses any of the files I found.
$ strace -o ./strace_log ps aux
strace logs every system call a program makes. So I filtered the strace output for the open system call (newer systems log it as openat, which grep open also matches) and, as I suspected (maybe I didn’t), ps was opening the same files I had checked first:
$ cat ./strace_log | grep open
[...]
open("/proc/1/stat", O_RDONLY) = 6
open("/proc/1/status", O_RDONLY) = 6
[...]
open("/proc/2/stat", O_RDONLY) = 6
open("/proc/2/status", O_RDONLY) = 6
open("/proc/2/cmdline", O_RDONLY) = 6
open("/proc/3/stat", O_RDONLY) = 6
open("/proc/3/status", O_RDONLY) = 6
open("/proc/3/cmdline", O_RDONLY) = 6
[...]
Ok, so ps reads the stat, status and cmdline files. Now all we need to do is parse them and extract what we need.
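For reference, cmdline holds the full command line with arguments separated by NUL bytes (\0) instead of spaces, and it is empty for kernel threads. Below is a minimal sketch, assuming a Linux /proc; read_cmdline and display_command are hypothetical helpers, not part of the project, and the [name] fallback imitates how ps aux shows kernel threads such as [kthreadd], using the comm file from the directory listing above.

```ruby
# Hypothetical helper: the command line of a process as an array of arguments
def read_cmdline(pid)
  raw = File.binread("/proc/#{pid}/cmdline")
  # Arguments are NUL-separated, usually with a trailing NUL;
  # String#split drops the trailing empty string
  raw.split("\0")
end

# Hypothetical helper: kernel threads have an empty cmdline,
# so fall back to "[name]" taken from the comm file
def display_command(pid)
  args = read_cmdline(pid)
  return args.join(" ") unless args.empty?

  name = File.read("/proc/#{pid}/comm").strip
  "[#{name}]"
end

puts display_command(Process.pid)
```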
The code
The implementation turned out to be fairly simple: it comes down to reading files and displaying their contents in an organized manner.
Process data structure
We want to display our data as a table, where each process is one row. Let’s use the following class to represent a row:
# One row of the process table
class ProcessData
  attr_reader :pid, :name, :user, :state, :rss

  def initialize pid, name, user, state, rss
    @pid = pid
    @name = name   # parse_name was never defined; store the name as-is
    @user = user
    @state = state
    @rss = rss
  end
end
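Since ProcessData only stores values, Ruby’s built-in Struct could generate an equivalent class for us. A sketch of that alternative, not what the project uses; the ProcessRow name is hypothetical, and keyword_init: true (Ruby 2.5+) makes each field explicit at the call site.

```ruby
# Struct generates the constructor and the attribute readers.
# Unlike attr_reader, it also generates writers (e.g. row.name = ...).
ProcessRow = Struct.new(:pid, :name, :user, :state, :rss, keyword_init: true)

row = ProcessRow.new(pid: "3054", name: "chrome", user: "fredrb",
                     state: "S (sleeping)", rss: nil)
puts row.name   # => chrome
puts row.state  # => S (sleeping)
```

The trade-off is mutability: if rows should stay read-only, the hand-written class is the safer choice.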
Finding pids of running processes
Let’s recap what we know so far:
- The /proc folder contains one sub-folder per process
- Each process folder is named after its pid
So gathering a list of all current pids should be easy:
def get_current_pids
  pids = []
  Dir.foreach("/proc") do |d|
    pids.push(d) if is_process_folder?(d)
  end
  pids
end
To be a valid process folder, an entry must fulfill two requirements:
- It’s a folder (duh?)
- Its name contains only digits (a regular expression checks this; converting with to_i would also accept names like "123abc")
def is_process_folder? folder
  File.directory?("/proc/#{folder}") && folder.match?(/\A\d+\z/)
end
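The two methods can also be collapsed into a single expression. A minimal sketch, assuming Ruby 2.5+ for Dir.children (which, unlike Dir.foreach, already skips . and ..); current_pids is a hypothetical name.

```ruby
# Hypothetical equivalent of get_current_pids + is_process_folder?
def current_pids
  Dir.children("/proc")
     .grep(/\A\d+\z/)                               # names made only of digits
     .select { |d| File.directory?("/proc/#{d}") }  # that are actually folders
     .sort_by(&:to_i)                               # numeric order, like ps
end

puts current_pids.first(5).inspect
```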
Extracting process data
Now that we know every pid in the system, we can write a method that exposes the data in /proc/<pid>/status for any of them.
But first, let’s analyze the file.
$ cat /proc/<pid>/status
Name: chrome
State: S (sleeping)
...
Uid: 1000 1000 1000 1000
Each line of this file follows the pattern Key:\t[values], so we can extract every piece of data the same way. However, some lines hold a single value and others hold a list of values (like Uid).
def get_process_data pid
  proc_data = {}
  # File.foreach yields one line at a time and closes the file at EOF
  File.foreach("/proc/#{pid}/status") do |line|
    data = line.strip.split("\t")
    key = data.delete_at(0).downcase   # e.g. "name:", "uid:", "vmrss:"
    # Strip each value: memory lines are padded, e.g. "VmRSS:\t    8444 kB"
    proc_data[key] = data.map(&:strip)
  end
  proc_data
end
The method above results in the following structure:
get_process_data 2917
=> {"name:"=>["chrome"],
"state:"=>["S (sleeping)"],
"tgid:"=>["2917"],
"ngid:"=>["0"],
"pid:"=>["2917"],
"ppid:"=>["1"],
"tracerpid:"=>["0"],
"uid:"=>["1000", "1000", "1000", "1000"],
...
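To check the parsing logic without a live /proc, the same Key:\t[values] split can be applied to a plain string. A sketch with a hypothetical parse_status helper and a made-up sample modeled on the output above; it drops the trailing colon from keys (so the key is "uid" rather than "uid:") and strips the padding the kernel puts in front of memory values.

```ruby
# Hypothetical helper: parse the text of a /proc/<pid>/status file into a Hash
def parse_status(text)
  text.each_line.each_with_object({}) do |line, data|
    key, *values = line.strip.split("\t")
    next if key.nil?                       # skip blank lines
    # "Name:" -> "name"; memory values carry leading spaces ("    8444 kB")
    data[key.chomp(":").downcase] = values.map(&:strip)
  end
end

sample = "Name:\tchrome\n" \
         "State:\tS (sleeping)\n" \
         "Uid:\t1000\t1000\t1000\t1000\n" \
         "VmRSS:\t    8444 kB\n"

status = parse_status(sample)
p status["uid"]    # => ["1000", "1000", "1000", "1000"]
p status["vmrss"]  # => ["8444 kB"]
```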
Reading user data
The mapping between uids and user names is kept in the /etc/passwd file, so to show the correct user name we must also read and parse this file.
For the sake of simplicity, let’s just read the whole file and save it in a Hash keyed by uid, with the user name as value.
def get_users
  users = {}
  # Each line looks like name:password:uid:gid:gecos:home:shell
  File.foreach("/etc/passwd") do |line|
    data = line.strip.split(":")
    users[data[2]] = data[0]   # uid (as a String) => user name
  end
  users
end
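Reading /etc/passwd directly misses users that come from other sources configured in NSS, such as LDAP. Ruby’s standard etc library asks the C library instead, through getpwuid(3). A minimal sketch of that alternative; user_name is a hypothetical helper, and it falls back to the numeric uid when no entry exists, which is also what ps prints in that case.

```ruby
require "etc"

# Hypothetical helper: map a uid (String or Integer) to a user name
def user_name(uid)
  Etc.getpwuid(Integer(uid)).name
rescue ArgumentError
  uid.to_s # unknown uid (or not a number): show it as-is
end

puts user_name(0)    # => root
puts user_name("0")  # => root
```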
Creating process records
So far we have found the pids in the system, read the status file and extracted the data. What we have to do now is to filter and organize this data into a single record that will be presented to the user.
def create_process pid
  data = get_process_data pid
  name = data["name:"][0]
  user_id = data["uid:"][0]      # first Uid column is the real uid
  state = data["state:"][0]
  # Kernel threads have no VmRSS line, so rss stays nil for them
  rss = data["vmrss:"][0] if data["vmrss:"]
  # Re-reads /etc/passwd for every process: simple, not fast
  user = get_users[user_id]
  ProcessData.new(pid, name, user, state, rss)
end

current_processes = get_current_pids
current_processes.each { |p|
  process = create_process p
  puts "#{process.pid}\t#{process.name}\t#{process.user}\t#{process.state}\t#{process.rss}"
}
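We read VmRSS because we want the resident set size: the part of the process’s memory that currently sits in physical RAM, not what has been swapped out to disk. Kernel threads such as kthreadd have no VmRSS line, which is why the code checks for nil and why their memory column is empty in the output below.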
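The stat file that ps opens also carries a resident size: field 24 (rss) in the proc(5) man page, counted in pages rather than kB, so it has to be multiplied by the page size. A sketch under those assumptions; read_stat is a hypothetical helper, and its figure can differ slightly from VmRSS. Field 2 is the command name in parentheses and may itself contain spaces, so the sketch only splits the text after the last ')'.

```ruby
require "etc"

# Hypothetical helper: returns [state, rss_in_kB] from /proc/<pid>/stat
def read_stat(pid)
  stat = File.read("/proc/#{pid}/stat")
  # Field 2 is "(comm)" and may contain spaces or ')', so cut after the last ')'
  fields = stat[(stat.rindex(")") + 2)..-1].split(" ")
  # fields[0] is field 3 (state), so field n of proc(5) is fields[n - 3]
  state = fields[0]
  rss_pages = fields[24 - 3].to_i             # field 24: rss, in pages
  page_size = Etc.sysconf(Etc::SC_PAGESIZE)   # commonly 4096 bytes
  [state, rss_pages * page_size / 1024]
end

state, rss_kb = read_stat(Process.pid)
puts "state=#{state} rss=#{rss_kb} kB"
```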
Extra (formatting)
You can use printf to lay out each ProcessData as a table row and get prettier output.
format = "%6s\t%-15s\t%-10s\t%-10s\t%-10s\n"
printf(format, "PID", "NAME", "USER", "STATE", "MEMORY")
printf(format, "------", "---------------", "----------", "----------", "----------")
current_processes.each { |p|
  process = create_process p
  printf(format, process.pid, process.name, process.user, process.state, process.rss)
}
Result:
PID NAME USER STATE MEMORY
------ --------------- ---------- ---------- ----------
1 systemd root S (sleeping) 8444 kB
2 kthreadd root S (sleeping)
3 ksoftirqd/0 root S (sleeping)
...
Conclusion
There is a lot of information to be found under the /proc folder. This post only covers basic data like name, state and resident memory, but if you dig deeper into those files you will find much more, like memory mappings and CPU usage.
Exploring this part of Linux was very interesting, and hopefully you learned something new from it.
#linux #ruby #command-line #clone
If you hated this post and can't keep it to yourself, consider sending me an e-mail at fred.rbittencourt@gmail.com or complaining to me on Twitter/X at @derfrb. I'm also occasionally responsive to positive comments.