Monday, May 20, 2013

All good things must come to an end...

Yesterday morning, my wife was doing her weekly Xubuntu updates (I taught her well!). After the updates she rebooted, and... nothing. It didn't come back. I tried, but all I heard was a weird noise coming out of her Dell Inspiron Zino. I figured it would be a dead hard drive, but in the meantime, I swapped my old desktop for her Zino. I put a 500GB drive in the old desktop, popped in the Xubuntu 12.04 LTS CD and did a quick install, then tried to see if I could put her old hard drive in the USB enclosure and get something out of it. No luck, this drive was well and truly dead. No biggie, the backups had run Sunday morning at 2:00 AM, so I installed the nfs-common package, then mounted the backup share from our file server and copied the contents of her home folder over, and within about two hours, she was back up and running.

After a bit of thought, I decided I would put another hard drive in her old computer and use it as my secondary desktop. It has somewhat lower specs than the desktop I traded with her, but it will be more than fine for what I want it for. I installed a hard drive that already had OpenBSD 5.3 on it, which I had installed just a couple of weeks ago. On boot-up, it gave me an error, then came up in single user mode with hard drive issues. After a few choice words, I figured I could just re-install OpenBSD on it, so I grabbed the CD set, popped the x86-64 CD in and rebooted. Nothing. I wound up with some cryptic message, basically "CD-ROM: EF". Hmm, perhaps the i386 CD... Nothing. Damn! Could the DVD/CD drive have gone bad, or perhaps the SATA circuitry on the motherboard? On a whim, I stuck the Debian 7.0 (Wheezy) x86-64 netinst CD in and rebooted. This time, it came up and displayed the Debian installation menu. I decided to try the OpenBSD CD one more time, so I stuck it in and rebooted. Same error as before. Back in went the Debian netinst CD, and after a moderately long amount of time, I now have a working Debian 7.0 desktop.

I'm not really sure what was up with the OpenBSD CDs and the DVD/CD burner/player on the Zino, but the system seems to be working fairly well right now. My intent is to delve into the Django Python web framework, and this system will work just fine for that.

The moral of this story is, all hardware fails, eventually. Plan for it, make backups. All of our important files from our home partitions on our desktops are backed up to the file server each morning at 2:00 AM. Once a week, all of the files on the file server are backed up to an external USB drive. Is this a perfect backup scenario? No, but for now, it works. When I can think of some way to get a copy of the backups off-site, I may implement that as well.
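For anyone wanting to set up something similar, here is a minimal Python sketch of what a nightly copy job could look like. It assumes rsync is installed and the backup share is already mounted. The function names and any paths you pass in are made up for illustration; this is not the actual script running on these machines.

```python
import subprocess


def build_rsync_command(source, destination):
    """Return an rsync command that mirrors source into destination."""
    # -a preserves permissions, ownership and timestamps.
    # --delete removes files from the backup that no longer exist in source.
    # The trailing slash on the source copies its contents, not the directory itself.
    return ["rsync", "-a", "--delete", source.rstrip("/") + "/", destination]


def run_backup(source, destination, timeout=3600):
    """Run the backup; return True on success, False on any failure."""
    try:
        result = subprocess.run(build_rsync_command(source, destination),
                                timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # rsync is missing, could not be started, or ran past the timeout.
        return False
    return result.returncode == 0
```

A script calling run_backup() could then be scheduled from cron with a "0 2 * * *" entry to match the 2:00 AM run described above.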

Sunday, May 12, 2013

The Linux console ain't dead yet!

Like most computer users, I do the majority of my work (play?) from a GUI desktop, but unlike most users, I use a mix of GUI and console tools. I'm not dogmatic about using console tools for everything, though. With the rich content we now have, some things just work better in a GUI (e.g., graphics editing/viewing tools, web browser, email client). Some content, on the other hand, still works just fine in a console environment.

One Internet service I use a lot is IRC. I monitor three different IRC channels (not at the same time!), and have used several IRC clients. I started out using an IRC client named "ScrollZ", but moved to "Epic" because ScrollZ wasn't available for Slackware, and I didn't feel like building it from source. I then flirted with "XChat" for a while, but I find most GUI applications tend to take up too much desktop space (why is that, I wonder?), so I went back to using Epic. After a while, curiosity got the better of me, and I decided to give the "Irssi" IRC client a try. The learning curve from the simpler Epic and ScrollZ IRC clients, to Irssi is fairly steep, but not insurmountable. The kind folks at Irssi even make a nice "Startup HOWTO" to help newbies (like me) get accustomed to Irssi. I've been using Irssi for a few weeks now, and have become more accustomed to it. Here is a screenshot of my chat connection to the Vintage Computer channel today:


Another really nice feature of using console programs is being able to multiplex two or more programs, in the same xterm window, at the same time. This is accomplished by using terminal multiplexer programs like "screen" or "tmux". I currently use screen on my Linux computers, and tmux on OpenBSD systems. I use screen, with one xterm window, to run my calendar and RSS reader programs. This works well for me, because I don't tend to need to watch my calendar program all day, just once or twice during the day, which allows me to leave my RSS reader as the program that is in the foreground of the xterm window. The calendar program I use (and have been using for several years now) is "Calcurse". Calcurse is a wonderfully simple, yet full-featured NCurses-based console program that provides me with my calendar for appointments and my to-do list. One really nice feature of Calcurse is you can call it from the command line, with certain parameters, in order to just display appointments or to-do list items directly to the terminal, without having to start the user interface. Here is a screenshot of my Calcurse program:


The other program I run along with Calcurse within screen is my RSS reader. I have used an RSS reader named "Canto" for a few years now, but I have recently been experimenting with a different RSS reader: "Newsbeuter". So far, I'm finding I like Newsbeuter much better than Canto. Newsbeuter is written in C++, and is very fast and responsive. It is also very configurable, and fairly simple to operate (once you get the basics down). Here is a screenshot of Newsbeuter, running in my screen xterm window:


Of course, my editor of choice is "Vim". I've been using Vim for many years now, and have become very accustomed to its keybindings and peculiarities. All of the console programs I've been discussing in this post have been configured to use vi/Vim keybindings. Here's the obligatory screenshot of Vim, editing one of my C programs:


The moral of the story here is, Linux tends to lend itself to using mixed-mode applications, and there is an extremely rich library of console-based programs that have been used and refined for years. Don't forget or forsake console programs, just because you like a nice GUI on your desktop.

Thursday, April 25, 2013

Rebuilding my file server

So, I finally got un-lazy yesterday and rebuilt my home web/file server. I've had the pieces/parts for a while now, I just couldn't work up the motivation to actually do the work. Since I was planning on replacing the hard drives wholesale, I decided not to back up the system first, hoping that the current hard drives wouldn't crap out when powered down. Before I began, I took a quick uptime check: 309 days, 22:00 hours. *sniff*. Then I powered it down and opened the case to begin the work.

Wow, the amount of dust and dust bunnies that had built up was kind of epic. I grabbed the home vacuum cleaner and used the hose and soft brush attachment to remove the vast majority of the dust from the system (had to take out the NIC in order to get behind it). Now the hardware part began. The first thing I did was replace the CMOS battery. In retrospect, I probably should have done that part with the power cord still attached, but I didn't. I'll remember this in the future. The next step was the removal of the two existing hard drives (primary: Seagate Barracuda 160GB, secondary: Western Digital Caviar Blue 500GB), which gave me better access to the RAM slots. After replacing the two 512MB DIMMs with four 1GB DIMMs, I installed the two brand new 1TB Western Digital Blue hard drives and closed the case. OK, hardware upgrade complete, now for the software upgrade.

My server usually runs headless, but I connected a keyboard and LCD monitor up to it for the OS install. I had previously found some guides for creating a RAID1 array on Slackware (actually, this would work on pretty much any distro), so I popped the Slackware Linux 14.0 DVD (x86-64 side up, of course ;) ) in the drive and powered up the server. During the first part of the Slackware install, you log in as root (no password), create your partitions and format them with your desired file system. Setting them up for a RAID1 array isn't much different. The main difference is you choose "FD" (Linux raid partition with autodetect using persistent superblock) for your main partitions, and of course "82" (Linux swap) for your swap partition. If your drives are identical, you only need to partition the first drive (call it /dev/sda). After you've finished with the partitioning, save it and quit. Oh, don't forget to mark your root ("/") partition as bootable! After this, I copied the partition information to my second drive (call it /dev/sdb), using this command:

sfdisk -d /dev/sda | sfdisk /dev/sdb

Once the drives had been properly partitioned, it was time to create the RAID1 arrays. I started with the root partition, then the swap partition, then the /home partition and finally the server partition:

root partition:
mdadm --create /dev/md0 --level 1 --raid-devices 2 /dev/sda2 /dev/sdb2 --metadata=0.90
swap partition:
mdadm --create /dev/md1 --level 1 --raid-devices 2 /dev/sda1 /dev/sdb1
/home partition:
mdadm --create /dev/md2 --level 1 --raid-devices 2 /dev/sda3 /dev/sdb3
/usr/local/pub partition (where the file shares and web server document root will be):
mdadm --create /dev/md3 --level 1 --raid-devices 2 /dev/sda4 /dev/sdb4

After the RAID arrays were created, I ran the "mkswap" command to make the swap partition usable:

mkswap /dev/md1

Now I began the Slackware setup. This followed the standard Slackware install, with the exception of using "/dev/mdx" for the partitions, instead of the usual "/dev/sdx" or "/dev/hdx". After the installation completed, before the first reboot into the system, I had to do a couple more things to ensure the RAID1 arrays would be picked up. The first was to update /etc/lilo.conf to use "/dev/md0" as the boot partition. To do that, you have to chroot into the new build:

chroot /mnt

Then edit /etc/lilo.conf and add this line:

raid-extra-boot = mbr-only

And edit the boot stanza to change it from "/dev/sdax" to "/dev/md0".

Exit from the editor and run the /sbin/lilo command to pick up the changes, then exit from chroot and save the RAID1 array data to the /etc/mdadm.conf file:

exit
mdadm -D --scan >> /mnt/etc/mdadm.conf

After this, I rebooted into my new RAID1 array, Slackware-powered file server. The rest was just configuration stuff, getting the web server set up correctly, getting the Subversion repositories set up and running correctly under Apache, copying the files and directories over from the old hard drives and then setting up NFS and Samba for file sharing. I'm still working on the last of it, but for the most part, my web/file server is up and running. Oh, I took an old 1TB Western Digital Caviar drive, put it in my external USB SATA enclosure and set the whole thing up on the top of the file server. I created one huge 1TB partition, formatted as EXT3, and plan to use that to back up the server.

Don't think this all went as smoothly as it shows above. This evolution was a major learning process for me. The first time I set it up and installed Slackware, I had forgotten to store the RAID array information to /etc/mdadm.conf, and on the first reboot, I only had my swap and root partition. I could have recovered from that, but I opted to do the entire thing over again, as a learning experience for me. I also ran into issues because of my faux pas with the CMOS battery. I use a stand-alone NIC on this box, because the internal NIC that is built into the motherboard doesn't work well under Linux (or at least it didn't before), and when I replaced the CMOS battery, that was re-enabled, so I had to fiddle around to disable the internal NIC in CMOS, then figure out how to tell Slackware that the stand-alone NIC was actually eth0.
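One way to catch a missing or degraded array early (like that first reboot where only root and swap came up) is to read /proc/mdstat after booting. Below is a rough Python sketch that parses it. The function names are my own, and it assumes the usual md layout where each array has a header line followed by a status such as [UU]; treat it as a starting point, not a tested monitoring tool.

```python
import re


def parse_mdstat(text):
    """Map each md device name to its member-status string, e.g. 'UU'."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            arrays[current] = None
            continue
        # The status looks like [UU]; an underscore marks a missing member.
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            arrays[current] = status.group(1)
            current = None
    return arrays


def problem_arrays(text, expected):
    """Return the expected arrays that are absent or have a failed member."""
    found = parse_mdstat(text)
    return sorted(name for name in expected
                  if found.get(name) is None or "_" in found[name])
```

On the server itself you would pass in the contents of /proc/mdstat and the list of arrays you expect, for example md0 through md3 for the layout described above.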

Here are some of the documents I used to set up the RAID arrays:

Configuring RAID1 on Slackware 12.0

Slackware 14 (Software) RAID Installation on HDD with Advanced Format (4KB sectors)

Friday, April 19, 2013

My trials with a NAS appliance

I've been working up the intestinal fortitude to rebuild my file server for the past few months now. I purchased 4GB of RAM (a nice upgrade from the 1GB it came with), and two 1TB Western Digital drives (to replace the hodge-podge of drives I currently have in it) that I plan to run in a RAID1 array. I realize what an incredible pain in the back-side this will be, so I've been putting it off.

The other week I was talking with my wife about NAS appliances, and we both decided that it would be a fairly cheap investment (around $200 US, without drives (which I already owned)), so I ordered a Synology DS212j from amazon.com. I had read up on the specs of this little appliance, and it sounded just like what I needed. Heck, you could even get Subversion to run over HTTP on it! When it arrived, I installed the drives and began the setup. It was surprisingly easy, actually. The GUI is fairly well thought out. Once I had created the RAID1 disk set and the volumes, I created the users (my wife and myself), and shared the volumes over NFS and hit a brick wall. Yep, came to a screeching halt. A little bit of background...

I set up my first Linux server back in late 1998. It was an old 486 DX4-120 with 32MB of RAM that I installed Red Hat 5.1 upon. Back then, when I created our user accounts, I was given a UID and GID of 500, and my wife got 501. For the past 14+ years, I've carried those UID/GIDs with us, on every Linux/Unix desktop and server we've owned. Those of you that are fairly new to Linux will realize that modern Linux distros start the user UID at 1000 now. The Synology software starts theirs at 1024, and that is the admin user! My user account started at 1029.

Anybody reading this blog post, who is half-way familiar with NFS, will realize that NFS doesn't use username and password for authentication... it uses, you guessed it, the user's UID. This presents a particularly vexing problem because of a couple of things:

  • The version of NFS on the Synology DS212j does not allow UID mapping.
  • The Synology software will not allow UIDs below 1000.
  • If you manually change the user's UID to less than 1000, the user won't appear in the GUI.
  • The only way to assign shares to users is through the GUI.
  • The Synology software will periodically scan for UIDs less than 1000, and remove them from /etc/shadow.
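To make the UID problem concrete: a file's metadata stores only the numeric owner, and the user name is looked up locally on each machine. The small Python sketch below shows that lookup. The function name is my own invention, and it only inspects local files; it does not talk to NFS at all.

```python
import os
import pwd


def owner_of(path):
    """Return (uid, local_name); local_name is None if no local user has that UID."""
    # The filesystem records only the numeric UID.
    uid = os.stat(path).st_uid
    try:
        # The name comes from this machine's own user database.
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        name = None
    return uid, name
```

Run against a file on an NFS mount, a function like this would report whatever UID the server stored, so a file owned by UID 1029 on the NAS simply does not belong to a desktop user with UID 500, regardless of matching user names.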

These were basically insurmountable obstacles for me, so I ended up removing my drives, re-packing the Synology DS212j, returning it to amazon.com and writing a fairly scathing review on Amazon's site. The thing that really chapped my back-side on this is that the issue has been reported repeatedly since around June 2011 (the earliest I can find on the Internet), and it still hasn't been fixed.

Now I must proceed with my original plans to rebuild my home file server. I'll let you know how that turns out...

Tuesday, April 2, 2013

Beware of Python's Bite

This entry is a bit of a departure from my usual entry. This one is a warning to programmers that come to Python from a Perl background.

I began using Perl in mid-1998, while working as a consultant for the Naval Aviation Depot in Jacksonville, Florida. My partner and I were doing web development in support of the T-45 Engineering group. Our primary development tool was called "Tango", which was a proprietary RAD tool for creating server-side CGI. We stepped into the Perl world totally by accident, when we were asked to add the ability to upload pictures and files to the site. I immediately fell head over heels for Perl, which quickly became my go-to tool for pretty much anything web-related, or for files that needed slicing and dicing.

Fast forward to May of 2005, and now I'm working as a consultant for the IBM office out of Greenville, South Carolina. This team had adopted Python as their "official" go-to tool for pretty much anything they needed to do, so I dutifully purchased a copy of the O'Reilly "Learning Python" book, and began to learn. It took me less time to adapt to Python's idiosyncrasies than I would have thought, and before long I was heavily into Python as well. For my later jobs, I would use Perl or Python more or less interchangeably, depending on the environment and the need, and do so to this day.

Working in Perl, we rapidly grow accustomed to its "fail silently" feature. Sometimes this is a blessing, and other times it is a royal pain in the ass. Those of us that migrated from Perl to Python tend to miss the fact that Python does not fail silently. On the contrary, it usually fails loudly and all over your terminal, with error messages and stack traces. One thing I had not counted on was its ability to lock up and not exit correctly upon receipt of an error. This past week I was asked to create a monitor script to scrape the /var/log/messages file on our Linux servers, checking for a SCSI error that would show up from time to time, when our servers would lose connection to their SAN shares. I looked around at my stable of pre-built scripts and identified one I had written in Python a year or so ago as one that could be easily modified and pressed into service without much of a delay. I dutifully modified the file, did a quick test on one of the servers that had been experiencing the issue, then pushed the script out to all 108 of our servers in this one application. I then pushed out an updated crontab for our service account to call this script once a minute, to check for the error.
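For illustration, the log-scraping part of a monitor like this can be done without spawning any external processes. The sketch below is not the script from this story. The regular expression is a placeholder, since the actual SCSI message text is not given here, and the function names are hypothetical.

```python
import re

# Hypothetical pattern: substitute the exact message you are hunting for.
SCSI_ERROR = re.compile(r"scsi.*error", re.IGNORECASE)


def find_matches(lines, pattern=SCSI_ERROR):
    """Return the lines that match pattern, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_log(path, pattern=SCSI_ERROR):
    """Scan a log file and return the matching lines."""
    # The with-block closes the file handle even if reading raises an error,
    # so repeated runs cannot leak open files from this function.
    with open(path, "r", errors="replace") as handle:
        return find_matches(handle, pattern)
```

Calling scan_log("/var/log/messages") would return the matching lines, assuming the account running it is allowed to read that file.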

Eight hours later, that entire application infrastructure was locked up tight as a drum. We couldn't even SSH into the servers, because each time we tried, we would receive a "too many open files" error, and get kicked out. A coordinated reboot of all 108 of the servers was implemented, and the offending cron job was stopped.

This blog entry serves as a retrospective for me. Something to look back on the next time I decide to create a "quick-and-dirty" monitoring script. It should also serve as a warning to any other Python programmers, who came into Python via a language like Perl. The root cause of the issue was that I had not wrapped two subprocess.Popen commands in a "try...except" block. This caused a cascading failure because the subprocess.Popen command was inside of a multi-threaded function. Normally, when subprocess.Popen encounters an error it will dump a stack trace and cause the program to exit. For some reason, placing this inside of a separate thread caused the script to hang. When the script hung, it left two system processes running, and 12 files open. Couple this with directly running the script from cron, every minute, and you can see how after 480 minutes or so, your server resources could be depleted to the point of lockup.
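Here is a minimal sketch of the kind of guard that was missing. It is not the original script, and the exact way that script hung is not reconstructed here. It assumes Python 3.5 or later, where subprocess.run() (a wrapper around Popen) accepts a timeout, and the function names are invented for the example.

```python
import subprocess
import threading


def run_command(args, results, timeout=30):
    """Run args and append (returncode, output) or (None, error) to results."""
    try:
        completed = subprocess.run(args, stdout=subprocess.PIPE,
                                   stderr=subprocess.STDOUT, timeout=timeout)
        results.append((completed.returncode, completed.stdout))
    except (OSError, subprocess.SubprocessError) as exc:
        # OSError covers a missing or non-executable command;
        # SubprocessError covers TimeoutExpired, after run() has killed the child.
        results.append((None, exc))


def run_in_thread(args, timeout=30):
    """Run a command in a worker thread without letting it block forever."""
    results = []
    worker = threading.Thread(target=run_command,
                              args=(args, results, timeout), daemon=True)
    worker.start()
    # Wait a little longer than the command's own timeout, then give up.
    worker.join(timeout + 5)
    if not results:
        return (None, TimeoutError("worker thread did not finish"))
    return results[0]
```

The two points that matter are that every exception raised inside the thread is caught and reported back to the caller, and that both the child process and the join have a timeout, so a stuck command cannot keep the script alive indefinitely.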

What have I learned from this issue? My main take-away is, there is no monitoring issue so important that it causes the developer to not do a thorough testing cycle prior to deployment of his/her script. My second take-away is to always wrap any kind of Python I/O request in a "try...except" block, to ensure we catch any exceptions, and handle them appropriately. My third take-away is to never, ever, trust Python to be run directly from cron again. Always write a wrapper shell script to check to see if an existing instance of the Python script is already running, prior to running another instance of the same script. That one feature would have saved a lot of people from spending a lot of time fixing an issue that never should have existed in the first place.
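The wrapper described above is a shell script. As an alternative, the same "only one instance at a time" check can live inside the Python script itself by taking a non-blocking file lock. This is a sketch for Linux/Unix only (it uses the fcntl module), and the function name and lock path are hypothetical.

```python
import fcntl
import os


def acquire_lock(path):
    """Return a file descriptor holding the lock, or None if it is already taken."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes flock() fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    # Keep the descriptor open for the life of the script; the kernel
    # releases the lock when it is closed or the process exits.
    return fd
```

A cron-driven script would call acquire_lock() first and exit right away when it returns None. A hung earlier run keeps holding the lock, so later runs bow out instead of piling up.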

So, to any Python programmers, or aspiring Python programmers that might stumble across this blog entry, always beware of Python's bite. Be more pro-active in your development. Use the proper exception handling techniques. Do not allow any kind of script to be run from cron, without some kind of check to ensure you don't get runaway processes. Be thorough in your testing. Do both positive and negative testing (did your script find what it was looking for?). But most importantly, do not allow yourself to become too rushed or too complacent. Always be in charge.

Saturday, March 16, 2013

It's March Already?

So, I'm stuck here at work, on a triage call, and decided it was time to update this blog. There hasn't been much going on in my open source world this month. Today does mark a full month using CrunchBang Linux, and I'm starting to enjoy it more.

I kept having issues with the NVidia GT-530 card I was using. Come to find out, it hijacked my audio away from the built-in sound card on my system, and I had to jump through hoops to get it switched back. That was the last straw, so, on March 1st, I logged in to TigerDirect and ordered an MSI R5450 Radeon HD card. It arrived on March 6th, and I quickly renamed the existing xorg.conf file (the one built by the NVidia setup tool), shut down the computer, removed the NVidia card and replaced it with this new MSI Radeon card. I crossed my fingers and restarted the computer, and it came up and just worked. No more issues!

I have to say I'm really liking CrunchBang Linux. CrunchBang Linux is a fairly light-weight distro, and even running the compositor, it is still very nimble and doesn't gobble down all of my RAM. The only down-side to using the Openbox window manager (if you can call it a down-side) is having to manually edit the ~/.config/openbox/menu.xml file to add a launcher for new applications that I've installed. This is not a show stopper, just a minor annoyance. You can either edit this file manually, using your favorite editor, or you can use the GUI menu option under "Settings -> Openbox -> Edit menu.xml". After you are done, you will need to restart Openbox (through "Settings -> Openbox -> Restart", or from the console by issuing "openbox --restart").

I have to give my "Stamp of Approval TM" to CrunchBang Linux. It has been many years since I've run Debian, but this Debian-based distro is lean and mean, light and useful. The real test is to see how long I run it without getting the urge to replace it with something else... ;)

Saturday, February 16, 2013

Back on Crunchbang Linux

I'm supposed to be working on a Python program to parse XML files for work (using the ElementTree module), but it is a nice Saturday morning here in Florida, and I'm listening to some nice Jazz (Bill Evans, Escape), so I thought I'd update this blog instead.

I broke down and installed the latest Crunchbang x86-64 to my main desktop last weekend. You know, the one I so rabidly wrote about installing Slackware 14.0 on. For months. Well, as much as I like Slackware, I just got tired of having to compile slackbuild scripts every time I wanted some new application. Crunchbang is based on Debian GNU/Linux, and "Waldorf", the name of the new Crunchbang release, is based on Debian Wheezy (Debian 7.0). Crunchbang is a fairly lightweight distro, and uses the Openbox window manager.

The Crunchbang installation was not without problems. That same Nouveau driver issue that plagued me with Slackware 14.0 also reared its ugly head with Crunchbang. Fortunately, the remedy was (mostly) the same: edit the Grub (in Crunchbang's case) boot stanza to insert "nomodeset", get it to boot in VESA mode, then install the commercial NVidia kernel module and drivers. The NVidia package takes care of blacklisting the Nouveau driver (which was a nice touch). After reboot I thought my problems were over, but for some reason, my mouse wouldn't work! After Googling for about an hour, and finding nothing, in desperation (because I was about this --> <-- close to ripping Crunchbang off and installing Xubuntu 12.04), I tried unplugging the mouse and plugging it into another USB port. Voila, it worked! So I unplugged it from that one and plugged it back into the original USB port, and it worked there as well. Shaking my head, I rebooted the machine to make sure it came back with full support, including the mouse. It did, and it has been running fine since. Strange.

I'm not going to bore you with yet another screenshot of my desktop, but what I would like to share is some info (and a screenshot) of the excellent terminal emulator that comes with Crunchbang... Terminator. According to the Terminator website: "Terminator is a cross-platform GPL terminal emulator with advanced features not yet found elsewhere." Indeed. This is one whiz-bang terminal emulator. You can have tabs (just like other "popular" terminal emulators), and you can also split your terminal window into multiple terminal windows:



Cool, right? Yeah, I thought so ;)

On the file server front, I've got two 2GB RAM modules and two 1TB hard drives waiting on me to make the time to rebuild the server. Maybe this weekend will prove a good time to get this done. Got to work next weekend, so it won't happen then.