Webalizer across multiple vhosts

Was looking for a way to get webalizer to include stats for all the vhosts on my server and cobbled together this script to run in cron…

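#!/bin/bash
# Assumed settings, inferred from the LogFile line in the webalizer config below
logloc=/var/log/apache2
file=$logloc/webalizer

cd "$logloc" || exit 1
# Plain logs first, then the gzipped rotations; error logs, scripts and our own output are skipped
ls | egrep -v 'error|gz|sh|webalizer' | xargs cat > "$file"
ls | grep gz | egrep -v 'error|sh|webalizer' | xargs gunzip -c >> "$file"
echo "Sorting $file"
# Sort on the timestamp in field 4: year, month, day, hour, minute, second
sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n "$file" > "$file.sorted"
echo "Running Webalizer"
webalizer 2> /dev/null
rm "$file"
rm "$file.sorted"
echo "Complete"
exit 0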

This grabs data from all my vhosts’ access.log files in my apache2 log directory, concatenates them into one file, sorts it into date order, and then runs all of it through webalizer. It’s handy to see just how much punishment the web server is putting up with, and what is being accessed the most.

Oh, for this to work you have to set up webalizer with the following line of config:

LogFile /var/log/apache2/webalizer.sorted
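
If you want to convince yourself that the sort keys really line up with the Apache timestamp, here is a small sketch. The three log lines are made up for illustration, and I am assuming a sort that understands the M (month) modifier, with LC_ALL=C so the English month abbreviations are recognized:

```shell
# Hypothetical sample lines in Apache common log format, deliberately out of order
sample=$(mktemp)
cat > "$sample" <<'EOF'
10.0.0.2 - - [02/Feb/2009:08:00:00 +0000] "GET /b HTTP/1.0" 200 10
10.0.0.1 - - [15/Dec/2008:23:59:59 +0000] "GET /a HTTP/1.0" 200 10
10.0.0.3 - - [02/Feb/2009:07:30:00 +0000] "GET /c HTTP/1.0" 200 10
EOF

# Same keys as the cron script: year, month, day, hour, minute, second of field 4
sorted=$(LC_ALL=C sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n \
    -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n "$sample")
printf '%s\n' "$sorted"
rm -f "$sample"
```

The December 2008 line should come out first, followed by the two February 2009 lines in time order.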

Obtaining network interface information with dladm

Remember the good old days when you had to use ndd to know whether network interfaces on your machine negotiated the bandwidth and duplex settings correctly? And to make matters worse, some interfaces would have slightly different ndd getters to obtain that information, which was fairly frustrating sometimes. Well, it’s been a long time coming, but with Solaris 10 you don’t have to do that any more. The newfangled dladm utility takes care of abstracting the details of the underlying network interface driver and reports the details of the available network interfaces in a simple but quite useful format. All you have to do is invoke dladm with the “show-dev” parameter (as on one of my systems):

# dladm show-dev
bge0 link: up speed: 100 Mbps duplex: full
bge1 link: up speed: 1000 Mbps duplex: full
bge2 link: up speed: 1000 Mbps duplex: full
bge3 link: unknown speed: 0 Mbps duplex: unknown
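
Since the output is one interface per line, it is easy to post-process. The following is only a sketch: it assumes the column layout shown above, and it feeds awk the sample output from a variable instead of calling dladm, so it runs anywhere. On a real Solaris 10 box you would pipe dladm show-dev into the same awk filter.

```shell
# Sample output copied from the listing above (stands in for: dladm show-dev)
sample='bge0 link: up speed: 100 Mbps duplex: full
bge1 link: up speed: 1000 Mbps duplex: full
bge2 link: up speed: 1000 Mbps duplex: full
bge3 link: unknown speed: 0 Mbps duplex: unknown'

# Field 3 is the link state and field 8 the duplex setting; report anything not up/full
report=$(printf '%s\n' "$sample" |
    awk '$3 != "up" || $8 != "full" { print $1 ": link " $3 ", duplex " $8 }')
printf '%s\n' "$report"
```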

Finding Files with Spaces in Filenames

I guess I’m too old school: I still don’t like using a blank space as a separator between words in a file name. At every opportunity I replace the blank spaces with underscores, because white space triggers bugs in many file handling scripts. Unless every command in the script that works with file names accounts properly for blank spaces, you’re risking some funny behavior from your otherwise “working-fine” script. One frequent annoyance is running find on a directory full of files with names containing spaces and trying to pipe the output to xargs for further processing. Say I’m searching for files containing either the “foo” or the “bar” string in a directory with the following file names:

# ls -1
My Expenses
My Trip
Things To Do

Well, running find is not looking so good with the regular arguments:

# find . -type f -print | xargs egrep "(foo|bar)"
egrep: can't open ./My
egrep: can't open Expenses
egrep: can't open ./My
egrep: can't open Trip
egrep: can't open ./Things
egrep: can't open To
egrep: can't open Do

Not very useful. find prints one file name per line, but xargs splits its input on any white space, so each name containing spaces is broken into several arguments. The way to make this work is to have find terminate each file name with a null character and make xargs honor these delimiters, so it becomes very simple:

# find . -type f -print0 | xargs -0 egrep "(foo|bar)"
My Expenses:foo
My Trip:bar

One big drawback of this technique is that it is not universal: if you don’t have the GNU versions of find and xargs, well, you’re pretty much out of luck, which applies to most of us running “classic” Unix systems. The other technique is slightly more subtle and uses the -exec switch in find, but of course you lose some of the niceties associated with xargs:

# find . -type f -exec egrep -l "(foo|bar)" "{}" \;
./My Expenses
./My Trip
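
A middle ground worth knowing about is the “+” terminator for -exec, which is part of the POSIX find specification and batches file names into as few command invocations as possible, much like xargs does. Here is a sketch that recreates the three files above in a scratch directory (the file contents are made up) and searches them this way:

```shell
# Recreate the example directory in a temporary location
workdir=$(mktemp -d)
printf 'foo\n' > "$workdir/My Expenses"
printf 'bar\n' > "$workdir/My Trip"
printf 'nothing here\n' > "$workdir/Things To Do"

# "{} +" passes many file names to a single grep, each as its own argument
matches=$(cd "$workdir" && find . -type f -exec grep -El 'foo|bar' {} + | sort)
printf '%s\n' "$matches"
rm -rf "$workdir"
```

Whether an older vendor find supports “+” is something to check on the system in question.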

It’s Official – Linux is better than Windows

Yep, you read this right. I’m officially declaring Linux to be more usable than Windows (provided of course that there is parity in available application software). The reason I’m saying this is that Linux, and Ubuntu in particular, has passed the litmus test of usability in my book: my wife has declared Ubuntu Linux to be more pleasant to use than Windows XP. I can’t say that my wife is passionate about computers. She is a journalist and sees computers, and by extension operating systems, as just another tool that gets her job done. So there are no emotional ties to any vendor or particular implementation on her part. Her approach to computers is purely utilitarian, and whatever OS gives her the least amount of anguish wins at the end of the day.

My wife has been a long-time user of Windows and, believe it or not, it took a little bit of convincing to move her to a Mac, which she now absolutely loves. So she’s on Mac OS X on her desktop, but the laptop she’s been lugging around has been running Windows XP up until now. And as with any Windows installation that somehow mysteriously disintegrates over time (slow boot times, annoyingly slow wake-ups from hibernation, growing suspicions that the laptop is infected with a virus, etc.), my wife started dreading using her laptop and would simply postpone her work till she got to the desktop, just to avoid the pain and anguish of using it. I can’t blame her, I would have done the same thing. So I decided to shrink the Windows partition and make the laptop dual bootable with Ubuntu Linux.

Installation went without a single hitch and the choice of software satisfied all of my wife’s needs for her professional work: Firefox for web browsing, Thunderbird for email and calendar, OpenOffice for office applications, and vpnc to connect to a Cisco-based VPN. What do you know, my wife started using the laptop again and she loves it! The laptop feels a fair bit snappier than with Windows XP, there is a stronger feeling of security with Linux, and sharing files between the laptop and other computers is actually easier. I don’t think my wife will want to go back to Windows any more.

So there you have it: if a fairly non-technical computer user who is purely utilitarian in her approach to operating systems finds Linux more usable than Windows, well, it is a clear win for Linux, and Ubuntu in particular, in my book. Very well done, Ubuntu folks! I really hope the goodness of Linux keeps on catching on among non-techies.

How to Gain Root Access in Ubuntu

In Ubuntu you are restricted to using a normal user. You must type sudo before any useful command to get it to work. When you are ready to move up to root status on your machine, here is how you can do so.

username@my-machine~# sudo su -
Enter password: (put in your password)

You can also set root’s password by doing this as a normal user:

username@my-machine~# sudo passwd root
New UNIX password:
Retype new UNIX password:

After doing that you may now change to root by doing just:

username@my-machine~# su -

How to Remove Powered By PHPlist Image

Go to the site root, for example /var/www/phplist.
Edit admin/sendmaillib.php.
Find the following section and replace all of it with one of the shorter assignments shown after it:

$html["signature"] = $PoweredByImage;#'<div align="center" id="signature"><a href="http://www.phplist.com"><img src="powerphplist.png" width=88 height=31 title="Powered by PHPlist" alt="Powered by PHPlist" border="0"></a></div>';
# oops, accidentally became spyware, never intended that, so take it out again :-)
$html["signature"] = preg_replace('/src=".*power-phplist.png"/','src="powerphplist.png"',$html["signature"]);
} else {
$html["signature"] = $PoweredByText;


$html["signature"] = "This is my new signature!!";

Or even

$html["signature"] = ""; //I will come back and add one later, maybe..

Linux Physical to Virtual Conversion in XenServer ( P2V )

When you need to convert a CentOS machine into a Virtual Machine on XenServer, you may find you’re out of luck. I know I did, as I scoured the internet for a solution and never quite found one.
What you need:

1. A physical machine with Linux installed. In this case we will use CentOS 5.
2. A XenServer Host which you can make a VM on.
3. Network connectivity between the two.
4. Root access to both machines.
Physical access to the physical machine is not needed if the steps are done correctly.

1. First on the physical server run the command:

# mount

What you are looking for here are any directories which are mounted from other devices, NFS mounts, etc.: anything not local to the disk. Keep track of the paths you find.
2. Next you will need to make sure that when you start runlevel 1, your networking and sshd will come up. In many cases running init 1 from another runlevel will preserve these. Better safe than sorry.

# chkconfig --level 1 sshd on
# chkconfig --level 1 network on

On Debian or other Linux systems you may need to use other commands instead. If in doubt you could probably get away with:

# echo /etc/init.d/network start >> /root/.bashrc
# echo /etc/init.d/sshd start >> /root/.bashrc

Or wherever your sshd and network init scripts are located.

3. Next you need to go into runlevel 1 on the physical machine (beginning of down time for some, many or all services):

# init 1

4. After this is done, get back into the box if you were disconnected.
5. Now on the XenServer Host you need to create a VM. What worked for me was installing the VM with the exact same version of Linux as the machine you are trying to migrate.
6. After installing the VM, enter it into runlevel 1 as well:

# init 1

7. rsync the physical server to the virtual server, by issuing this command. Each of the directories you may have found which are non-local filesystems will need an exclude. On the VM run:

# rsync -av --numeric-ids --delete --progress --exclude /sys --exclude /boot --exclude /dev --exclude /proc --exclude /etc/mtab --exclude /etc/fstab --exclude /etc/udev/rules.d --exclude /lib/modules physical-server-ip-or-hostname:/ /

8. Shut the physical box down:

# init 0

9. If you are planning on using both the VM and the physical machine, change the IP settings on the VM to the new ones:

# vi /etc/sysconfig/network-scripts/ifcfg-eth0
# vi /etc/hosts
# vi /etc/sysconfig/network

10. Remember to change IP settings in things like mysql user permissions, httpd.conf, conf.d/* files, etc.
11. Reboot the VM:

# init 6
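
Step 7 says every non-local mount found in step 1 needs its own exclude, which gets tedious to type. Here is a sketch of one way to assemble the command. The build_excludes helper and the two extra mount points are hypothetical, and the script only prints the rsync command so you can review it before running it on the VM:

```shell
# Print an "--exclude PATH" pair for the base paths from step 7 plus any extras given
build_excludes() {
    for path in /sys /boot /dev /proc /etc/mtab /etc/fstab \
        /etc/udev/rules.d /lib/modules "$@"; do
        printf '%s %s ' --exclude "$path"
    done
}

# Hypothetical non-local mount points noted in step 1 (no spaces in the paths)
extra_mounts="/mnt/nfs /backup"

# Build the command as text only; nothing is copied here
cmd=$(echo rsync -av --numeric-ids --delete --progress \
    $(build_excludes $extra_mounts) physical-server-ip-or-hostname:/ /)
echo "$cmd"
```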

Exit Strategy for BASH Scripts

For my more serious BASH scripts I like to have a predefined exit strategy laid out at the top of the script, so it is easy to look up the exit values and to change how the script responds in different situations. Here’s how I do it.

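 # The function wrapper and the case labels are reconstructed; the numbering is assumed
 exiter() {
 case $1 in
   1) echo You must be root to run this script ;;
   2) echo You specified a file that does not exist ;;
   3) echo This is yet another error message ;;
 esac
 exit $1
 }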

And now all I need to do to give an error would be for example:

[[ -f $some_file ]] && echo 'tacos' >> $some_file || exiter 2

Nice, huh.
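
One caveat with the one-liner form: in “A && B || C”, C also runs when A succeeds but B fails, so a failed append would be reported as a missing file. The sketch below shows that behavior with true and false, then an if/else variant that separates the two cases. The append_or_fail name and the path are made up for illustration:

```shell
# Pitfall: the fallback runs even though the first test succeeded
outcome=$( true && false || echo "fallback ran" )
echo "$outcome"

# Hypothetical if/else variant: the else branch runs only when the file is missing
append_or_fail() {
    if [ -f "$1" ]; then
        echo 'tacos' >> "$1"
    else
        return 2
    fi
}

status=0
append_or_fail /nonexistent/some_file || status=$?
echo "status: $status"
```

In a real script the else branch would call the exit function instead of returning.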

Remove Duplicate Files in Subdirectories Using MD5sum

Sometimes I find myself in a situation where I have combined tons of directories and files into one parent directory and I want to delete all duplicate copies of files. For example, I may have combined thousands of MP3s into one directory. Even though the names may be different, some of the files may be the same. This is a script which keeps only the first file found from each set of identical files. In other words, if you have 3 songs that are called the same thing or even different things, but are in fact the same exact file, this script will leave you with only 1.

Warning! This script does not ask you any questions and it tells no lies. It will systematically destroy all matching files without a second thought. It will also follow symlinks, so you’ve been warned.

# clear out previous md5sums
echo > /md5s

# this will find all gz files. I wrote this script in Solaris so some things are a bit more generic. For example on Linux you can use -iname instead of -name, to get .Gz, .GZ, .gZ and .gz. However the Solaris version I was on did not support this. Also you can usually leave out the '.' in Linux.
# to search other files simply change the value inside the quotation marks, examples: "*.mp3". On many versions of find you can use more advanced syntax with boolean operators as well.
# note: looping over find output like this splits file names that contain spaces
for x in $( find . -name "*.gz" )
do
          sum=$( md5sum "$x" | cut -d' ' -f1 )
          echo trying $sum
          if grep $sum /md5s
          then
                    echo removing duplicate: $x
                    # remove this rm line to do a dry run
                    rm "$x"
          else
                    # first time we see this checksum: remember it and keep the file
                    echo $sum >> /md5s
          fi
done
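
If your file names contain spaces, or you just want a report before deleting anything, here is a dry-run sketch. It assumes a find that supports “-exec … {} +” and a GNU-style md5sum that prints “checksum  filename”, and it only lists the files that would be removed. The scratch files are made up for the demonstration:

```shell
# Scratch directory with two identical files and one different file
workdir=$(mktemp -d)
printf 'same\n' > "$workdir/a one.gz"
printf 'same\n' > "$workdir/b two.gz"
printf 'different\n' > "$workdir/c.gz"

# Checksum everything, sort by name, then print every file whose checksum was already seen
dupes=$(cd "$workdir" &&
    find . -type f -name '*.gz' -exec md5sum {} + |
    LC_ALL=C sort -k 2 |
    awk 'seen[$1]++ { sub(/^[^ ]+ +/, ""); print }')
printf '%s\n' "$dupes"
rm -rf "$workdir"
```

Nothing is deleted from your own data here; the list is just something to review before deciding what to remove.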

Linux msync braindamage

About msync() system call

When contemplating a possible implementation of a log-based transactional system and looking at the UNIX API, it seems natural to employ the msync() function for memory-mapped log files.

     msync - synchronize memory with physical storage

     int msync(void *addr, size_t len, int flags); 

Indeed msync specification states the following:

The msync() function should be used by programs that require a memory object to be in a known state; for example, in building transaction facilities.

The idea is that the log file is mmapped into the address space of the process, the log data is written to memory, and the OS virtual memory mechanism synchronizes it with disk. This way there is no need to allocate an in-memory buffer for log data and call write() when the buffer is full. Instead, when a transaction is to be committed, exactly the portion of the mmapped log that contains the transaction data is msynced, and that’s it. Concurrently, the data for other transactions can be written further down the log and stay cached in memory, avoiding unnecessary I/O. Adding to the appeal of msync() is the existence of two modes, MS_ASYNC and MS_SYNC:

When MS_ASYNC is specified, msync() shall return immediately once all the write operations are initiated or queued for servicing; when MS_SYNC is specified, msync() shall not return until all write operations are completed as defined for synchronized I/O data integrity completion.

I can’t help but think that msync() was introduced to UNIX specifically to cater to DBMS people. This cannot be a coincidence. This is just what one would want when developing a DBMS engine.

Okay, so far I have referred to POSIX and UNIX. However, the implementation of the POSIX API that probably deserves the most attention these days is Linux. Just checking the Linux msync() man page, it seems that everything is good. It pretty much conforms to the POSIX specification. Or so it says.

Once you start wondering what the situation is on the ground, the picture becomes more complicated. One interesting tidbit can be found in the FreeBSD man page:

The msync() system call is obsolete since BSD implements a coherent file system buffer cache. However, it may be used to associate dirty VM pages with file system buffers and thus cause them to be flushed to physical media sooner rather than later.

This is confusing. The purpose of msync is to ensure data integrity. I understand that if a process crashes, its modified mmapped data still remains in the system cache and at some point will be synchronized with physical storage. So far so good. But what if the whole system crashes? Without msync() this will result in data loss. Or are they saying that their msync() merely causes the page flush to happen somewhat earlier but not right away? So on FreeBSD there is no big difference whether you msync() or not, as it provides no integrity guarantee anyway? Well, I don’t have answers to these questions, since for now I don’t want to spend much time researching FreeBSD. I’m more focused on Linux.

About msync() on Linux

So what exactly does msync() do on Linux? In a fairly recent Linux release the following comment could be found in the file linux/mm/msync.c:

 * MS_SYNC syncs the entire file - including mappings.
 * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
 * Nor does it marks the relevant pages dirty (it used to up to 2.6.17).
 * Now it doesn't do anything, since dirty pages are properly tracked.
 * The application may now run fsync() to
 * write out the dirty pages and wait on the writeout and check the result.
 * Or the application may run fadvise(FADV_DONTNEED) against the fd to start
 * async writeout immediately.
 * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to
 * applications.

So let me summarize the current status of msync() on Linux:

  • msync(…, MS_ASYNC) is effectively a no-op
  • msync(…, MS_SYNC) is effectively equivalent to fsync()

The bottom line is that msync() is completely useless on Linux. It cannot help with the transaction log idea described above and, for that matter, it cannot help with anything else. The comment in the source code suggests using other system calls. At the same time the Linux man page for msync() is absolutely misleading. It makes it appear that everything’s fine, that it fully implements the UNIX specification.

Okay, should we stop here? Or is there more to learn yet? Sure there is.