I often use pidof to find the PID of a running program. It works well with dhclient, dnsmasq and other executable binaries. But pidof runs into problems when trying to find the PID of script files that in turn invoke other programs. Take, for example, the Python 2 program deluge-gtk BitTorrent client.
In the first case, pidof fails to return any PID for the deluge-gtk executable file. In the second case, grepping for deluge-gtk in the output of ps -eF (all processes, extra full format) correctly returns the PID of the BitTorrent client which is executed by Python 2.
Let's take a look at the contents of the deluge-gtk executable file:
In 2016, many people would probably think of using Python modules such as BeautifulSoup, urllib, or requests for scraping and parsing web pages. While this is a good choice, in some cases it can be quicker to scrape web pages using the text browser lynx and parsing the results using grep, awk, and sed.
My use case is as follows: I want to programatically generate a list of rpm packages from Fedora's EPEL X (5, 6, 7), CentOS vault, CentOS mirror, and HP DL server firmware sites. I want this list to be comparable to the output of rpm -qa on RHEL machines. Here are some sample URL's for sites showing rpm package lists:
If you visit any of these links you will find that the basic format is the same -- from the left, the first field is an icon, the second field is the rpm filename, the third field is the date in YYYY-MM-DD, the fourth field is time in HH:MM, and the fifth field is file size.
Here is my bash script which parses file list html pages into a simple text file:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
You can see that lynx renders the page from HTML into regular text and dumps this output to a file if you pass the -dump option. But this is not enough, because lynx by default inserts a newline character in lines greater than 79 characters. To avoid this problem, you must manually set the line width to something larger. The maximum width in lynx is 990 characters, so I specified this value through the option -width=990. Finally the -nolist option removes the list of links that lynx inserts at the bottom of the page.
Using grep I then extract just the lines containing the string ".rpm". Next I replace all tabs with 4 spaces using sed and then use awk to print just the filename field. Finally I use sed to remove the ".rpm" extension from the filenames to make the output identical to the format of rpm -qa. Note that the last sed statement might not render correctly in your browser because I use mathjax on my blog. Unfortunately, the characters I am trying to express are also the tags for a mathjax expression; The sed snippet should appear as follows:
sed "s:\openparens\.rpm\closeparens::g" "${F3}" > "$2"
I have replaced '(' and ')' with openparens and closeparens, respectively due to my blog's mathjax plugin incorrectly interpreting the above expression as a mathjax statement.
If you don't escape ".rpm" with backslashes, '.' will be interpreted as a regex "match any character" which would match strings like "-rpm", ".rpm", "redhat-rpm-config", etc. This is undesirable.
BTW this script is for informational and educational purposes only. It would actually be easier to just invoke lynx with lynx -dump -listonly ... and skip the data munging steps of replacing tabs with spaces using sed. If you do it this way you will get just the links to rpm files from EPEL, CentOS mirror, etc. Then you can return just the filename from each link's path with awk:
CentOS maintains binary compatibility with Red Hat Enterprise Linux, so applications which run on certain versions of RHEL should be able to run without changes on analogous versions of CentOS. Recently, however, a client asked me why executable binaries from the initscripts package (which contains /bin/ipcalc, /bin/usleep, etc) on RHEL 6.X have slightly different file sizes with those from the CentOS 6.X initscripts package.
First, I needed to verify that the source code in the initscripts srpm's for RHEL and CentOS were identical.
I downloaded initscripts-9.03.46-1.el6.src.rpm for RHEL 6.6 from the Redhat partner site and I downloaded initscripts-9.03.46-1.el6.centos.src.rpm from CentOS vault at the following url:
I also did the same for the CentOS 6.6 initscripts package. I then renamed the directories for the extracted srpm's and then used the meld GUI diff tool to compare the .../src as well as the entire extracted initscripts srpm directories for RHEL 6.6 and CentOS 6.6.
As you can see below, the contents of the srpm's are identical:
Compiler options are contained within .../src/Makefile and the options are identical, as you can see from the diff results above. So the binary size differences are not due to differences in the source code, compiler options, or rpm Specfile between RHEL and CentOS.
Next, I did a simple C program compilation test of my own using gcc on a stock installation of RHEL 6.6 and CentOS 6.6.
Here is a simple hello world one-liner I have named hello.c:
#include int main(void) { printf("hello world!\n"); }
If I compile it with gcc with the following options
gcc hello.c -O0 -std=c99 -Wall -Werror -o hello
I still get slightly different file sizes on RHEL and CentOS:
RHEL 6.6 [root@localhost pset1]# ls -al hello -rwxr-xr-x. 1 root root 6473 Aug 9 08:31 hello
CentOS 6.6 [root@localhost pset1]# ls -al hello -rwxr-xr-x. 1 root root 6425 Aug 11 06:15 hello
This is a difference of 48 bytes.
I then used objdump from bintools to examine the assembly code in the compile hello object files. I renamed each object file as hello_rhel66 and hello_cent66, respectively. I am using the -s option with objdump so I can see full contents that also converts hex strings to ASCII.
Apparently the contents of the .interp and .comments section differ between the two binaries. I believe the same holds true for each of the individual binaries from the initscripts package on RHEL 6.6 and CentOS 6.6. Each of the object files may contain different comments and time stamps which will lead to different binary file sizes.
A client recently asked my company to help them convert Red Hat Enterprise Linux 5.11 and 6.8 to the comparable versions of CentOS. Before I present the scripts I wrote to automate this process, I believe that there is a lot of value in a RHEL subscription and better yet, Red Hat employs full-time kernel developers who submit patches and updates to the Linux open source project. In effect, supporting Red Hat indirectly supports the development of Linux as an operating system.
That being said, CentOS has full binary compatibility with RHEL and the only difference is that RHEL-specific logos and artwork are not present in CentOS. On the Internet there are lots of blog posts on converting from RHEL to CentOS but many of these guides are incomplete or provide incorrect information. In order to save others the trial-and-error required to fine-tune this process, I am sharing the scripts I wrote to automate the conversion process. For the purposes of this post, I will be converting RHEL 5.11 and 6.8 to Cent 5.11 and 6.8 but the process applies to all 5.X and 6.X versions.
Step 1
Generate a baseline of all packages from the RHEL installation DVD. The script I wrote takes one parameter, the path to the installation iso mount point and outputs a text file that can be compared with the local output of rpm -qa (which lists all packages currently installed on a RHEL system).
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Step 2
Generate a list of packages which differ from the baseline packages on the RHEL installation ISO. This step is necessary because when you do the RHEL to CentOS conversion, packages will be updated to stock CentOS package versions. If you have some errata packages installed on your RHEL system, these will be updated to stock CentOS package versions. Therefore you need to generate a list of local packages which differ from the rpm's on the RHEL installation DVD/iso.
RHEL 5.X only supports Python 2.4.3, so this is the Python version I wrote the script in. 2.4 was a new experience for me as I am accustomed to using the argparse module from Python 2.7.X instead of optparse. Also 2.4 doesn't support the file opening idiom with open(filename, 'r') as foo: ... instead you have to do something like
f = open(filename 'r') ... f.close()
and remember to manually close file objects after opening them. Fortunately, Python 2.6.X which is used on RHEL 6.X, and Python 2.7.X which is used on RHEL 7.X are backwards-compatible with 2.4.X, so my script works fine on RHEL 6.X, too.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
One issue I encountered when writing the Python script above is that rpm -qa on RHEL 5.X by default does not return the package architecture (i.e. i386, x86_64, i686). To also see package architecture you have to edit /usr/lib/rpm/macros and add the architecture field in the following format:
Fortunately, this is the default on RHEL 6.X so this setting only needs to be changed for RHEL 5.X so you can compare the baseline rpm package list generated in Step 1 with rpm -qa in Step 2.
Step 3
Run the conversion scripts for RHEL 5.11 and RHEL 6.8. Note that before running the scripts you must have mounted the appropriate CentOS installation iso on a mount point which you will specify to the script as a parameter.
Here is the script for RHEL 5.X:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The big difference between RHEL 5.X and 6.X is that for 6.X you need to rebuild the initial ramdisk image to remove the RHEL progress bar at boot.
Note that the script edits entries in /boot/grub/grub.conf so that there will be no references to RHEL in the grub boot menu.
Step 4
Reboot the system and manually upgrade the packages highlighted by rhel-baseline-diff.py in the output file pkg-diff.txt (upgrading errata rpm's to equivalent CentOS versions will require you to separately download and install the relevant packages).
Questions and comments (especially about how to improve the scripts) are welcome!