Software RAID and EVMS
Software RAID
The purpose of RAID (Redundant Array of Inexpensive Disks) is to spread information across several hard disk partitions (or even physical disks) in order to provide redundancy, lower latency, higher bandwidth for reading and writing, and recoverability from crashes. It can also be used to form a single virtual hard disk from the sum of the partitions. The main objectives are better performance and better data security; unfortunately, improving one usually means giving up some of the other. Two common techniques are disk striping and disk mirroring, known respectively as RAID level 0 (or RAID-0) and RAID level 1 (or RAID-1).
A hardware RAID controller can be quite expensive. Software RAID can take on the same tasks without the hardware cost, and it even works with IDE disks, which makes it an interesting and viable replacement for hardware RAID.
Beware that, as the name implies, all of this is done in software and therefore takes processing capacity away from the system. If you need a lot of processing power, this can be a costly solution for you. So if you need real power and are thinking of buying a machine with hot swap, multiple processors, a fast SCSI subsystem, very large disks and so on, software RAID might not be the right solution: a hardware RAID controller would add little to the cost of such a machine, and performance would be better.
Also note that the RAID layer is independent of the file system layer, i.e., you can have any of the available file systems using your RAID set.
RAID levels
The current RAID for GNU/Linux supports the following levels:
In linear mode, two or more disks are combined into one virtual device. The disks are appended to each other, so writing to the RAID device will fill up disk 0 first, then disk 1 and so on. Disks do not have to be of the same size.
There is no redundancy in this level. If one disk crashes you will most probably lose all your data. You may, however, be lucky enough to recover some data, since the file system will just be missing one large consecutive chunk of data.
Read and write performance will not increase for single reads or writes. But if several users use the device, it may happen that one user is mostly accessing files on the first disk while another is accessing files that reside on the second disk. In that case, you will see a performance gain.
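To make the "appended to each other" layout concrete, here is a minimal sketch that works out which disk a given offset on a linear array falls on. The function name linear_location and the unit-less sizes are made up for illustration; this is a simplified model, not the md driver's actual code.

```shell
# linear_location OFFSET SIZE0 SIZE1 ...
# Hypothetical helper: walk the member disks in order and report which
# one holds the given offset, and where on that disk it lands.
linear_location() {
    offset=$1; shift
    disk=0
    for size in "$@"; do
        if [ "$offset" -lt "$size" ]; then
            echo "disk=$disk offset=$offset"
            return 0
        fi
        # Not on this disk: skip past it and try the next one.
        offset=$((offset - size))
        disk=$((disk + 1))
    done
    echo "offset beyond end of array" >&2
    return 1
}

# Two disks of 100 and 200 units: offset 150 lands on the second disk.
linear_location 150 100 200    # prints: disk=1 offset=50
```

Because disk 0 must fill up completely before disk 1 is touched, a single sequential reader or writer only ever keeps one disk busy at a time.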
RAID-0, also called stripe mode, is like linear mode, except that reads and writes are done in parallel to the devices. The devices should have approximately the same size because they fill up equally due to parallel access. If one device is much larger than the others, the extra space is still used by the RAID device, but that disk will be accessed alone during writes near the end of the RAID device, which reduces performance.
Like linear, there's no redundancy in this level either. But unlike linear mode, you will not be able to rescue any data if a drive fails. If you remove a drive from a RAID level 0 set, the RAID device will not just miss one consecutive block of data, it will be filled with small holes all over the device. Tools like fsck will probably not be able to recover much from such a device.
The read and write performance will increase, because reads and writes are done in parallel on the devices. This is usually the main reason for running RAID-0.
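The parallelism comes from the way consecutive chunks rotate over the member disks. The following sketch assumes equally sized disks and a simple round-robin layout; stripe_location is a hypothetical name, and the real driver handles unequal disks in a more elaborate way.

```shell
# stripe_location CHUNK NDISKS
# Hypothetical helper: map a logical chunk number (0-based) on the RAID
# device to a member disk and a chunk index on that disk.
stripe_location() {
    chunk=$1
    ndisks=$2
    disk=$((chunk % ndisks))      # chunks rotate round-robin over the disks
    offset=$((chunk / ndisks))    # position of the chunk on that disk
    echo "disk=$disk offset=$offset"
}

# Six consecutive chunks on a three-disk set touch every disk twice.
for c in 0 1 2 3 4 5; do
    echo "chunk $c -> $(stripe_location "$c" 3)"
done
```

The same mapping also shows why a failed drive leaves small holes all over the device: every Nth chunk is gone.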
RAID-1 is the first mode which actually has redundancy. It can be used on two or more disks with zero or more spare disks. This mode maintains an exact mirror of the information on one disk on the other disk(s). Ideally, the disks should be of equal size; if one disk is larger than another, your RAID device will be the size of the smallest disk.
If up to N-1 disks are removed (or crash), where N is the number of disks in the mirror, all data is still intact. If there are spare disks available, and if the system (e.g., SCSI drivers or IDE chipset) survived the crash, reconstruction of the mirror will begin on one of the spare disks immediately after the drive fault is detected.
Read performance will usually scale close to the number of disks multiplied by the performance of each individual disk, while write performance is the same as on one device, or perhaps even less. Reads can be done in parallel, but when writing, the CPU must transfer N times as much data to the disks as it usually would (remember, N identical copies of all data must be sent to the disks), where N is the number of physical disks.
RAID-4 is not used very often. It can be used on three or more disks. Instead of completely mirroring the information, it keeps parity information on one drive and writes data to the other disks in a RAID-0-like way. Because one disk is reserved for parity information, the size of the array will be (N-1)*S, where N is the number of disks and S is the size of the smallest drive in the array. As in RAID-1, the disks should either be of equal size, or you will have to content yourself with S being the size of the smallest disk.
If one drive fails, the parity information can be used to reconstruct all data. If two drives fail, all data is lost.
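Parity reconstruction is easier to follow with numbers. The sketch below assumes the usual XOR parity scheme and uses three made-up values standing in for the chunks on three data disks; xor_parity is a hypothetical helper, not part of any RAID tool.

```shell
# xor_parity VALUE...
# Hypothetical helper: XOR all arguments together.
xor_parity() {
    p=0
    for v in "$@"; do
        p=$((p ^ v))
    done
    echo "$p"
}

# Made-up contents of one chunk on each of three data disks.
d0=165 d1=60 d2=15

# The parity disk stores the XOR of the data chunks.
parity=$(xor_parity "$d0" "$d1" "$d2")

# Pretend the disk holding d1 died: XOR the parity with the survivors.
rebuilt=$(xor_parity "$parity" "$d0" "$d2")

echo "parity=$parity rebuilt=$rebuilt"    # prints: parity=150 rebuilt=60
```

With two disks missing there are two unknowns and only one parity value, which is why a second failure loses all data.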
The reason this level is not used more frequently is that the parity information is kept on one drive. This information must be updated every time one of the other disks is written to, so the parity disk becomes a bottleneck unless it is a lot faster than the other disks. However, if you just happen to have a lot of slow disks and one very fast one, this RAID level can be very useful.
RAID-5 is perhaps the most useful RAID mode when one wishes to combine a larger number of physical disks and still maintain some redundancy. It can be used on three or more disks, with zero or more spare disks. The resulting RAID-5 device size will be (N-1)*S, just like RAID-4. The big difference between RAID-4 and RAID-5 is that the parity information is distributed evenly between the participating drives, avoiding the bottleneck problem of RAID-4.
If one of the disks fails, all data is still intact because of the parity information. If spare disks are available, reconstruction will begin immediately after the device failure. If two disks fail simultaneously, all data is lost: RAID-5 can survive one disk failure, but not two or more.
Both read and write performance usually increase, but it's hard to predict how much.
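The size rules given for the different levels can be summarized in a small calculator. This is only a sketch of the formulas in the text: raid_capacity is a hypothetical function, sizes are in whatever unit you pass in, and metadata overhead and chunk rounding are ignored.

```shell
# raid_capacity LEVEL SIZE...
# Hypothetical helper: usable size of an array built from the given
# member sizes, following the rules described for each level.
raid_capacity() {
    level=$1; shift
    n=$#
    sum=0
    min=$1
    for s in "$@"; do
        sum=$((sum + s))
        if [ "$s" -lt "$min" ]; then min=$s; fi
    done
    case $level in
        linear|0) echo "$sum" ;;                  # all space is used
        1)        echo "$min" ;;                  # size of the smallest disk
        4|5)      echo $(( (n - 1) * min )) ;;    # (N-1)*S
        *)        echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

raid_capacity 5 100 100 120    # prints: 200
raid_capacity 1 80 100         # prints: 80
```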
It is, however, very important to understand that RAID is not a substitute for good backups. Some RAID levels will make your systems immune to data loss from single-disk failures, but RAID will not allow you to recover from an accidental rm -rf /.
Since software RAID can be used with most block devices, it doesn't matter what hardware you have: SCSI, IDE or a mixture of these. This allows for interesting combinations and makes it possible to have RAID-10, that is, a RAID-0 array built on top of RAID-1 sets.
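As a rough illustration of such a layered setup, the sketch below prints the three mdadm commands one might use to stripe across two mirrored pairs. It only prints them and does not touch any disk; raid10_plan, the md device numbers and the partition names are assumptions made for the example.

```shell
# raid10_plan DEV1 DEV2 DEV3 DEV4
# Hypothetical helper: print (do not run) the commands for a RAID-0
# array built on top of two RAID-1 pairs.
raid10_plan() {
    echo "mdadm --create /dev/md0 --level=1 --raid-devices=2 $1 $2"
    echo "mdadm --create /dev/md1 --level=1 --raid-devices=2 $3 $4"
    echo "mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1"
}

# Example partitions; substitute your own before running anything.
raid10_plan /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
```

The file system would then be created on the top-level device (/dev/md2 in this sketch), not on the mirrors underneath it.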
What can be done with GNU/Linux's implementation of software RAID?
GNU/Linux makes RAID available through the use of the md driver. It includes support for the following features:
- Automatic Hot Reconstruction: if the array is inconsistent due to a power outage or a replaced disk, it will be rebuilt in the background while the system is running.
- Hot Spare: a standby disk will get used if one of the disks in the array fails.
- Hot Swap: disks can be changed in a running array.
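With mdadm, replacing a disk in a running array typically comes down to marking the old member as failed, removing it, and adding the new one. The sketch below just prints that sequence; hot_swap_plan and the device names are hypothetical, and whether the physical swap is safe while powered on depends on your hardware.

```shell
# hot_swap_plan MD_DEVICE OLD_MEMBER NEW_MEMBER
# Hypothetical helper: print (do not run) the mdadm commands used to
# swap one member of a running array for another.
hot_swap_plan() {
    md=$1 old=$2 new=$3
    echo "mdadm $md --fail $old"      # mark the old member as faulty
    echo "mdadm $md --remove $old"    # detach it from the array
    echo "mdadm $md --add $new"       # add the replacement; rebuild follows
}

# Example device names; substitute your own.
hot_swap_plan /dev/md0 /dev/sdb1 /dev/sdc1
```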
Configuring software RAID
Since the kernel already has the required support, you will just have to install the mdadm package to have the required tools. There is also an alternative set of tools in the raidtools package, but mdadm alone is enough. Both tools can be used in a complementary way.
The main differences between those tools are:
- mdadm can diagnose, monitor and gather detailed information about your arrays.
- mdadm is a single centralized program rather than a collection of separate programs, so there is a common syntax for every RAID management command.
- mdadm can perform almost all of its functions without having a configuration file and does not use one by default.
Also, if a configuration file is needed, mdadm will help with management of its contents.
We assume you have enough free disks to set up the desired RAID level and have the mdadm package installed.
If you're building your software RAID with IDE disks, remember to put only one disk per IDE bus. If you have more than one disk on the bus, performance will be reduced.
Let's suppose we want to create a linear RAID array from the partitions sda3 and sdb2. The command to type is:
# mdadm --create --verbose /dev/md0 --level=linear --raid-devices=2 /dev/sda3 /dev/sdb2
mdadm: chunk size defaults to 32K
mdadm: array /dev/md0 started.
To be sure that the RAID array is running, you can check the contents of the /proc/mdstat file.
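In its simplest form the check is just cat /proc/mdstat. For scripts, a small test like the following can be handy. It assumes that a running array is listed on a line beginning with its name followed by " : active"; md_is_active and its optional second argument (an alternative file to read) are inventions for this sketch.

```shell
# md_is_active NAME [FILE]
# Hypothetical helper: succeed if the named md array is listed as
# active. FILE defaults to /proc/mdstat.
md_is_active() {
    name=$1
    file=${2:-/proc/mdstat}
    grep -q "^$name : active" "$file" 2>/dev/null
}

if md_is_active md0; then
    echo "md0 is running"
else
    echo "md0 is not running (or /proc/mdstat is unavailable)"
fi
```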
After creating the RAID array, we can create a file system on it as with any other block device:
# mkreiserfs /dev/md0
For different RAID levels you will have to add the number of the desired level in the --level option. Type a 0 for RAID-0, a 1 for RAID-1, and so on.
Spare disks can be specified with the -x or --spare-devices option. Give the number of spare devices there, and remember that --raid-devices counts only the active disks: the number of devices listed on the command line must equal --raid-devices plus --spare-devices.
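The following sketch puts the --level and --spare-devices options together. It relies on mdadm's documented rule that the devices listed must number --raid-devices plus --spare-devices, and it only prints the resulting command; mdadm_create_cmd and the partition names are hypothetical.

```shell
# mdadm_create_cmd MD_DEVICE LEVEL SPARES DEVICE...
# Hypothetical helper: print (do not run) an mdadm --create command,
# deriving the number of active devices from the devices listed.
mdadm_create_cmd() {
    md=$1 level=$2 spares=$3
    shift 3
    active=$(( $# - spares ))    # active disks = listed devices - spares
    echo "mdadm --create --verbose $md --level=$level --raid-devices=$active --spare-devices=$spares $*"
}

# RAID-5 over three active partitions plus one spare (example names).
mdadm_create_cmd /dev/md0 5 1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
```

Deriving the active count from the device list avoids the most common mistake, which is to count the spares twice.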
Please refer to the mdadm(8) manual page for more details.
Enterprise Volume Management System (EVMS) is a more ambitious project which provides a mix of the features available in software RAID and Logical Volume Management (LVM).
Currently, EVMS recognizes:
- All locally attached disks
- DOS-style disk partitions (this is the kind used by GNU/Linux)
- GPT disk partitions (mainly used on IA-64)
- S/390 disk partitions (CDL/LDL)
- BSD disk partitions
- Macintosh disk partitions
- GNU/Linux MD/Software RAID devices
- GNU/Linux LVM volume groups and logical volumes
In addition to providing compatibility with these existing systems, EVMS also provides new functionality that can be built on top of any of the above volumes that EVMS already recognizes. Features that are currently included are:
- Bad Block Relocation
- Linear Drive Linking
- Generic Snapshotting
In addition to these volume-level features, the EVMS tools provide convenient integration with numerous file system tools, to allow tasks such as mkfs and fsck directly from the EVMS user interfaces. Currently, the following file systems are supported:
- ext2 and ext3
Since there's already support in the kernel, all that is needed to start with EVMS is to install some packages.
Two interfaces are available from the command line: a CLI (Command Line Interface) and an ncurses interface. We recommend the ncurses interface for interactive work and the CLI only for scripting purposes.
Thus, the evms-ncurses package and all its dependencies will have to be installed: urpmi evms-ncurses.
The first step is to check whether the values provided in the sample /etc/evms.conf file suit your needs. If they do not, make a backup copy of this file (you can always recover it by reinstalling, but making a backup when starting with a new program is always good practice) and change the values using your favorite text editor.
You also need to make sure that both the /proc and /sys file systems are mounted whenever EVMS is started or being used.
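One way to check this from a script is to look up both mount points in /proc/mounts, as in the sketch below. is_mounted is a hypothetical helper, and its optional second argument (an alternative mounts file) exists only so the function can be tried without touching the real system.

```shell
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: succeed if MOUNTPOINT appears as the second field
# of a line in the mounts file (default: /proc/mounts).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}" 2>/dev/null
}

for fs in /proc /sys; do
    if is_mounted "$fs"; then
        echo "$fs is mounted"
    else
        echo "$fs is NOT mounted; mount it before starting EVMS"
    fi
done
```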
To create new EVMS partitions, you should run evmsn to enter the ncurses interface.
The first screen shown is the Volumes panel. Navigating through this interface will give you an idea of how EVMS represents your disks.
Since no EVMS volume has been created yet, all the names of the compatibility volumes (the ones EVMS can manipulate with plugins) will be similar to those we have always been using, but with evms in the path: /dev/hda1, for example, becomes /dev/evms/hda1.
To use the EVMS-enabled volumes, first save the configuration by choosing Actions -> Save from the menu: press A, then S, and confirm that you want to save the changes. The next step is to edit /etc/fstab and replace the partition names with their evms counterparts (for example, /dev/evms/hda1). After this change, you can reboot your machine, and on the next start you will be running with EVMS-enabled file systems.
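The /etc/fstab change can be previewed with a small filter before you edit the real file. The sketch below rewrites /dev/hdXN and /dev/sdXN names in the first field to /dev/evms/..., and deliberately leaves the root file system entry alone, since moving the root file system to EVMS needs extra setup. evms_fstab and the sample entries are made up for illustration; review the output by hand before using it.

```shell
# evms_fstab: read fstab lines on stdin, write them to stdout with the
# device name in the first field rewritten to its /dev/evms/ form.
# Lines whose mount point is / are passed through unchanged.
evms_fstab() {
    sed '/^[^[:space:]][^[:space:]]*[[:space:]][[:space:]]*\/[[:space:]]/!s|^/dev/\([hs]d[a-z][0-9][0-9]*\)\([[:space:]]\)|/dev/evms/\1\2|'
}

# Made-up sample entries; on a real system, feed it a copy of /etc/fstab.
evms_fstab <<'EOF'
/dev/hda1  /      ext3  defaults  1 1
/dev/hda5  /home  ext3  defaults  1 2
EOF
```

Writing the result to a separate file and comparing it with the original is safer than editing /etc/fstab in place.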
Changing the root file system to EVMS is more complex: it requires configuring the boot loader and adding an initrd ramdisk with EVMS support. That kind of configuration is beyond the scope of this guide.