Friday, November 12, 2010

Enabling ECC memory in Linux without BIOS support

I build computers for reliability and low(er) power, and I was doing so long before the somewhat recent green kick. In particular, I want ECC memory, and a lot of it, and a good power supply. I don't care about CPU speed or the video card. I like to leave my Linux box up for months, even a year, and ECC memory is necessary for this. I used to have to buy specific chipsets for Intel processors, but in the past 3 years I have chosen AMD processors largely because they support ECC. The AMD Athlon CPUs have a built-in memory controller, and it has supported unbuffered ECC RAM all this time. So any motherboard is largely fine... or so I thought.

I finally assembled my new system with a Phenom II X4 and a lovely Gigabyte GA-MA785GM-US2H M/B with nice copper wiring and good capacitors. I chose this M/B since it has the latest AMD 785G video, and it supports DDR2, which was cheaper than DDR3 when buying ECC RAM. (I've been buying Kingston ECC RAM, and for this system it was 8G of KVR533D2E4K2/4G since it was amazingly cheap.) But this stupid mother ***** does not support ECC in the BIOS, which is a bit odd as the CPU talks to the memory directly. Apparently Gigabyte does not provide for this in their BIOS settings; see the response from Gigabyte at http://forums.amd.com/forum/messageview.cfm?catid=21&threadid=123883.

I had the following failures:
  1. Running memtest86+ v4.10, the memory is not recognized as ECC. Argh.
  2. Flashing the latest BIOS for this M/B did not help. Argh.
  3. I tried adding ecc_enable_override as a kernel boot parameter in GRUB, but that did not work. Argh.
To make a long story short, the solution is that you can force the Linux kernel module that enables ECC to load via:

% modprobe -v amd64_edac_mod ecc_enable_override=1
To verify that ECC was turned on, run:
% dmesg | grep -i edac
And you should see something like:

[ 658.399849] EDAC amd64_edac: Ver: 3.3.0 Sep 19 2010
[ 658.400082] EDAC amd64: This node reports that Memory ECC is currently disabled, set F3x44[22] (0000:00:18.3).
[ 658.400102] EDAC amd64: Forcing ECC checking on!
[ 658.400198] EDAC MC: F10h CPU detected
[ 658.400230] EDAC MC: DCT0 chip selects:
[ 658.400236] EDAC MC: 0: 1024MB 1: 1024MB
[ 658.400242] EDAC MC: 2: 1024MB 3: 1024MB
[ 658.400246] EDAC MC: 4: 0MB 5: 0MB
[ 658.400251] EDAC MC: 6: 0MB 7: 0MB
[ 658.400254] EDAC MC: DCT1 chip selects:
[ 658.400259] EDAC MC: 0: 1024MB 1: 1024MB
[ 658.400263] EDAC MC: 2: 1024MB 3: 1024MB
[ 658.400267] EDAC MC: 4: 0MB 5: 0MB
[ 658.400271] EDAC MC: 6: 0MB 7: 0MB
[ 658.400333] EDAC amd64: This node reports that DRAM ECC is currently Disabled; ENABLING now
[ 658.400339] EDAC amd64: Hardware accepted DRAM ECC Enable
[ 658.401685] EDAC MC0: Giving out device to 'amd64_edac' 'Family 10h': DEV 0000:00:18.2
[ 658.401731] EDAC PCI0: Giving out device to module 'amd64_edac' controller 'EDAC PCI controller': DEV '0000:00:18.2' (POLLED)

The Linux modules that deal with ECC are labelled "edac". Some other commands you can run are lsmod (to verify the amd64_edac_mod module is loaded) and dmidecode --type memory (to see how the BIOS is reporting memory, which shows non-ECC RAM in this particular case).
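
As a rough sketch, the verification step above can be wrapped in a small shell function. The name edac_ecc_status is my own invention, and the strings it matches are simply the ones from the log shown above; other kernel versions may word these messages differently.

```shell
# Hypothetical helper: reads kernel log text on stdin and reports whether
# the amd64 EDAC driver says ECC ended up enabled. The matched strings are
# taken from the dmesg output shown above and may differ on other kernels.
edac_ecc_status() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'Hardware accepted DRAM ECC Enable'; then
        echo "enabled"
    elif printf '%s\n' "$log" | grep -q 'EDAC'; then
        echo "edac-loaded-ecc-unconfirmed"
    else
        echo "no-edac-messages"
    fi
}

# Typical use (needs permission to read the kernel log):
#   dmesg | edac_ecc_status
```

Once the driver is loaded, the EDAC sysfs tree (for example /sys/devices/system/edac/mc/mc0/ce_count) should also expose a running count of corrected errors, assuming your kernel provides that interface.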

Sunday, May 30, 2010

What shocks me about the falling prices of tech

Technology gets better and cheaper over time. The advances in computer speed, memory size, hard drive size, networking speed are revolutionary but to some degree expected. While it is still a bit hard to stomach, 32 core machines and 5TB hard drives will arrive.

But the cutting edge stuff like a mobile phone that can surf the web and run for 3+ hours still costs decent money.

What shocks me is you can get a low-end CPU and low-end M/B (motherboard) for $29 ($39 with $10 rebate at Fry's). These days, a low-end M/B comes standard with audio, networking and video. It supports SATA and USB. The CPU and the integrated video are better, and especially cooler, than the best you could buy in 2002 for any amount of money. It's akin to saying wait 8 years and the phone they give away for $5 will be better and smaller than the top-end iPhone or Android phone.

The kicker is the price. At $29 retail, the M/B and the CPU together must cost less than that to make. I can imagine the CPU costs, say, $8; after all, it is just a piece of silicon and some packaging (yes, I'm greatly oversimplifying). But a M/B is a big (from a tech standpoint) assembly of many parts. There are at least 100 parts that need to be assembled. The chipset alone is as complicated as the CPU. At this price level there are no margins, which means this level of technology is so well understood that anybody can do it for peanuts.

Amazing.

Tuesday, May 25, 2010

The passing of the torch from laptops to cellphones

For over a decade, people have been predicting the coming big thing would be a mobile computer you have with you all the time. It is obvious that the smartphone is this device. The Apple iPhone was the breakthrough device.

What became obvious to me a year ago (shame on me for not blogging about it then) was that this shift has already happened. Yep, it will be a completely done deal in another year.

The existing competition for a mobile computer is, well, a mobile "computer", aka the laptop. And while laptops still outsell smart phones, the epiphany was realizing that a high-end smart phone costs more than the median laptop. And people are lining up in droves to buy them.

The indicators are various.
  1. Price is the ultimate indicator of worth. Including the cost of a data plan, a smart phone is much more expensive than a laptop over 3 years.
  2. The developer mindset has changed. The coolest apps to write are for mobile devices. Traditional desktop devices are passe.
  3. My own behaviour. I check my email on my high-end Android phone as much as on my computer. It is so easy.

Sunday, May 23, 2010

The end of public boredom as we know it

We've all passed the time in a line for the DMV, waiting for a bus/train, or even sitting solo at lunch. But this semi-lonely, semi-bored, semi-waste of time has ended, thanks to the smartphone.

Now you can look busy, intently surfing the web, checking your idle calendar and continually verifying you still don't have any new email.

Monday, January 18, 2010

Be SMART, get advance warning that your hard drive is starting to fail

All modern hard drives support S.M.A.R.T., in which the hard drive (HD) runs self tests and reports on the status of internal metrics (seek errors, block errors, temperature, being dropped, etc). If a HD starts to do poorly in some of the internal metrics, it's one sign the drive might be starting to fail. Unfortunately, predicting when the drive will actually fail is impossible right now, akin to predicting earthquakes. But it does give you some advance warning to redouble the backup efforts or to switch out that drive if it holds valuable data.

There exist many tools to access the SMART interface depending on your OS. The key thing is that the HD itself does all the work. The tools just provide access to the HD.

The beauty of SMART is that
  • you can run the SMART tests on a running system on a mounted HD in use! Since the HD does the SMART scheduling internally, it can figure out when and how to continue with its own test while getting normal OS requests.
  • it's pretty easy to do a manual scan. For Linux, install smartmontools, which provides the smartctl command, and then start issuing commands. There's a lot of documentation on the web.
  • with a bit of work you can set up regular SMART background tests and have alerts sent when a HD falls below the built-in "failure" thresholds.
The main drawbacks are
  • a thorough "long" test takes several hours. On a 1.5TB drive, it takes 4-8 hours.
  • you need to have the HD hooked up directly via its native interface, namely PATA or SATA. If a drive is hooked up via a USB enclosure, smartctl will claim the drive does not support SMART. Ugh. (I assume eSATA will work, as this is fundamentally SATA).
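
A minimal sketch of a manual scan with smartctl follows. The device name /dev/sda in the comments is only an example, and the helper smart_health is a hypothetical name of mine; it assumes the ATA-style "overall-health" line that smartctl -H prints, which other drive types may word differently.

```shell
# Hypothetical helper: reads `smartctl -H` output on stdin and prints
# PASSED, FAILED, or UNKNOWN. It assumes the ATA-style line
#   SMART overall-health self-assessment test result: PASSED
smart_health() {
    result=$(sed -n 's/^SMART overall-health self-assessment test result: *//p' | head -n 1)
    case "$result" in
        PASSED)  echo "PASSED" ;;
        FAILED*) echo "FAILED" ;;
        *)       echo "UNKNOWN" ;;
    esac
}

# Typical manual session (run as root; /dev/sda is an example device):
#   smartctl -H /dev/sda | smart_health   # overall health verdict
#   smartctl -t long /dev/sda             # start the long self-test in the background
#   smartctl -l selftest /dev/sda         # read the self-test log afterwards
#   smartctl -A /dev/sda                  # dump the vendor attribute table
```

The long test runs inside the drive, so the smartctl -t command returns right away and you check the self-test log later.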
I'm not going to repeat all the information out there. But here's some decent links.

System tools in Linux and Unix

These keep changing over the years, but here's a quick rundown.

Hardware related:

ethtool, mii-tool: view / set ethernet device settings and the MII settings

lspci: show PCI devices

lshw: show hardware

lscpu, lshal, lsusb: show CPU, HAL and USB information. Note the USB information is often the USB controllers on your M/B, not the devices actually hooked up.

Linux Kernel

lsmod, rmmod, modprobe: manage Linux kernel modules (which is how most device drivers are loaded)

lsof, fuser: show processes using files/directories/file systems. Very useful if you want to unmount a disk, say a removable drive, and you are told the device is still in use. Note that if you run the samba SMB/CIFS file server, it will continue to use a filesystem if at any time in the past a client used that file system. I restart samba in this case.

mount, umount: mount and unmount filesystems. A trick I often use is to remount a filesystem as read-only or read-write via
mount -o remount,ro /dev/... 
or 
mount -o remount,rw /dev/...
which is much faster than unmounting and remounting.
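
To confirm that a remount took effect, one option is to look at the options column in /proc/mounts. The sketch below does that with a hypothetical helper, mount_mode (my name, not a standard tool); the path /mnt/backup in the comment is just an example.

```shell
# Hypothetical helper: reads /proc/mounts-style lines on stdin and prints
# "ro" or "rw" for the given mount point (nothing if it is not mounted).
# Fields are: device, mount point, fs type, options, dump, pass.
mount_mode() {
    awk -v mp="$1" '$2 == mp {
        n = split($4, opts, ",")
        mode = "rw"
        for (i = 1; i <= n; i++) if (opts[i] == "ro") mode = "ro"
    }
    END { if (mode != "") print mode }'
}

# Typical use after a remount (Linux only; /mnt/backup is an example path):
#   mount_mode /mnt/backup < /proc/mounts
```

The helper compares each option exactly, so an option such as errors=remount-ro is not mistaken for ro.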

Changing the DHCP (or any other service) config file in Ubuntu 9.10

This applies to Ubuntu 9.10 Karmic Koala. I did not have this problem in 8.10 Intrepid Ibex.
I keep my system config files, such as the one for dhcpd (the DHCP daemon), in a separate directory with all my system tweaks. I then set up a symbolic link from /etc/dhcp3/dhcpd.conf --> my config.

But when I started the dhcpd server via /etc/init.d/dhcp3-server restart
I kept getting the error message "Can't open /etc/dhcp3/dhcpd.conf: Permission denied"

Moving the config to /tmp also did not work, as tested using the -t flag, via
dhcpd3 -t -cf /tmp/dhcpd.conf

Finally, after some hunting, aka Googling, I found the problem was AppArmor, which restricts the files and directories that various services can use. So I added the following line to /etc/apparmor.d/usr.sbin.dhcpd3, where the dir should be the physical dir you get from cd -P in bash, namely the path with all symlinks resolved:

...
/dir/holding/my/dhcpd/config/** r,
...

I then restarted apparmor via /etc/init.d/apparmor restart

And now I'm golden.
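
For what it's worth, here is a small sketch of how the physical directory and the matching rule could be generated. The function name apparmor_read_rule is hypothetical, and I'm assuming the usual AppArmor convention that a file rule ends with a comma.

```shell
# Hypothetical helper: prints an AppArmor read rule for the physical
# (symlink-free) directory behind the given path. `cd -P` resolves the
# symlinks, as described above; the trailing comma is the assumed
# AppArmor rule terminator.
apparmor_read_rule() {
    physical=$(cd -P -- "$1" && pwd -P) || return 1
    printf '%s/** r,\n' "$physical"
}

# Example (the path is a placeholder for your own config directory):
#   apparmor_read_rule /path/to/config/dir
```

Paste the printed line into the profile and restart AppArmor as above.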

Sunday, January 3, 2010

Essential maintenance of your Windows computer

Over the winter break, my brother brought up his wife's Win XP box which had been rendered unbootable due to a virus downloaded by their son. I pulled out the hard drive or HD and connected it as an external drive to a working XP box. I then spent many hours
  1. removing the viruses. I used MS Security Essentials and then some newly downloaded AV software from malwarebytes.com

  2. trying to get the cleaned HD to boot, to no avail. Even booting via "Safe Mode with Command Prompt", which is the least demanding boot process, would hang in the middle.

  3. copying the personal data in C:\Documents And Settings to a safe place. On this 2005 Acer, this consisted of copying from the main XP partition (80G) to the second "data" partition on the HD (also 80G), as Acer had conveniently split the HD into two partitions, the second being just for data. But I worried that the blind re-install might overwrite both partitions, so I also copied the personal data to another HD. And finally my brother copied select folders to another computer.

  4. finally, running the Acer recovery CD 1. It turns out the recovery was a Norton Ghost image that was 2.0 GB which spanned 3 CDs. Now that the data was safe, the restore took very little time.

About 6 months ago, my own laptop got infected. I was able to remove the virus but the computer was no longer stable.

The moral: Recovering personal data and trying to remove the malware is very time consuming and stressful, as you don't want to trash anything accidentally and there is significant uncertainty about what happened and what to do. You can easily waste 8-16 hours here. After you finally realize a clean install is necessary, reinstalling the OS is really pretty fast (less than an hour) and stress free.

Here are the essential actions you must do if you own a computer and don't want to lose your data or spend significant time/money/anguish trying to recover precious things. These are all obvious. And fortunately easier than ever before.

  1. (a) Make periodic backups, say every 1-6 months. If you skip this step, eventual disaster is almost certain due to hardware failures, a virus infection, or user error. If you do this step and nothing else, the damage is contained. External USB hard drives are $100 or less for 1 TB (!) of storage at all major retailers (Target, Costco, Longs, CVS, Walmart, etc).
    (b) pick a backup program of your choice and use it. Even just dumb copying "Documents and Settings" to a new folder with the date, say "docs-2010-01-05", gets you most of the protection you need.
  2. Install anti-malware (virus, rootkit, spyware) protection. Use your favorite program if you have one.
    Genuine: If you don't have a preference or don't have any protection, download the excellent free Microsoft Security Essentials, which is free, lightweight and fast, comprehensive in that it protects against all sorts of malware, easy to use with a clean UI, and perpetual in that it does not have a time limit. Did I mention it was free too? Reviews of it are very good, with it catching most malware, and it seems to be getting better. The one catch: you must be running a licensed or "genuine" copy of Windows XP, Vista or 7.
    Not necessarily genuine: If you are not running a genuine copy of Windows, I suggest installing the free Google Pack and choosing its AV and anti-spyware component. (About 4 years ago a nice, super basic Norton Security Scan was included, but about 2 years ago Symantec changed it to an annoying crippled Security Scan.) As of today the PC Tools AV and anti-spyware package is included, which I have not tested.
  3. Enable automatic updates for Windows. Microsoft releases patches regularly, some of which protect against real threats.
  4. Use Firefox as your browser, and accept the updates. As of Jan 2010, FF is the best browser out there for security and overall usability. It keeps track of known bad web sites and will often warn you if a web site is trying to install something funny.
    What about the other browsers?
    (i) Chrome is very nice, with the ability to kill specific web pages that are causing problems, but it needs support for plugins before I can offer my highest recommendation.
    (ii) IE 8 is the best browser Microsoft has ever produced but IE 7 is only good and IE 6 is just plain scary. IE is not updated very often either, so it's simplest to stay away.
And that's it. These are the essential actions, so I've kept it short.

Saturday, January 2, 2010

Assigning / mapping hard drive letters in Windows XP

Do a Google search for "assign usb drive letters" and you basically get the following instructions. I used Windows XP.

Note this technique works for external hard drives that are currently mounted.
  1. Right-click: My Computer -> Manage
    or
    Start Menu -> Settings -> Control Panel -> Administrative Tools -> Computer Management
  2. In the left pane of the Computer Management window, left-click on Disk Management. (You may need to open / expand the "Storage" item in the left pane.)
  3. In the bottom right pane, find the disk partition whose drive letter you want to change. Right-click on that partition, which should have the undesired drive letter and typically has the words "NTFS... Healthy". Choose the option "Change Drive Letter and Paths ..."
  4. Fill in the desired drive letter and you are done. This mapping should persist in the future too.
Keywords: assign drive letters Windows XP, map drive letters Windows XP, drive letters for removable drives.