[NTLUG:Discuss] Re: looking for raid & controller advice -- "FRAID" card = "software RAID"
Bryan J. Smith
b.j.smith at ieee.org
Sat Dec 4 10:14:38 CST 2004
[ FYI, there is a further discussion of this in the April 2004 issue
of Sys Admin magazine, in the article "Dissecting ATA RAID Options." ]
On Sat, 2004-12-04 at 02:54, Kevin Brannen wrote:
> I need to build a file server for my church. Redundancy is a must,
> since I've lost several drives in the recent past (I'm not too keen on
> the WD2000 right now--I might even have some used ones to sell soon).
> I'm thinking a cheap way to solve this (as opposed to buying a NAS
> solution) is to get a semi-low cost computer, add 1G of RAM for lots of
> cache, and stick a 3ware 7506-4LP in it with 3 250G EIDE drives in a
> RAID-5 config,
Why not 4 drives for the same storage in RAID-0+1?
It will be much, much faster.
> running Linux and serving the files out the Gb network
> port with a Samba server. (Yes, the 2 clients are Win2k, ugh!) So far
> so good. I can get all the parts new, including a spare 4th drive for
> $1500, maybe somewhat less.
> However, not ever having used any of the 3ware products with Linux, I've
> got a few questions:
> * How big is the onboard cache? Can't find that on the website.
You have to understand the difference between a "traditional buffering"
RAID controller and a "3Ware Non-Blocking I/O" RAID controller.
The former is a microcontroller with Dynamic RAM (DRAM). DRAM is cheap
(including Synchronous Dynamic RAM, SDRAM). But DRAM is a simple cell
design. This means that on a read request, it could be dozens of cycles
before the data comes through. Furthermore, on a write, the
microcontroller (uC) now becomes the "bottleneck" -- because _all_
access is "buffered." With an i960 design, 40-60MBps is the maximum,
which is why I highly recommend a StrongARM/XScale controller -- ATA
RAID cards using these have just become available (see my previous
post). Not to mention that ATA drives are already non-blocking, direct
I/O -- unlike SCSI, which is more suited to "buffered" I/O.
The latter is an ASIC (application specific integrated circuit) designed
specifically for PCI-to-ATA controller bus arbitration. I.e., data
moves almost _directly_ from the PCI bus to the ATA controller --
non-blocking. For SCSI drives, this matters little. But for ATA
drives, which are already direct access/non-blocking, it's ideal! And
to use non-blocking I/O, you need a 0 wait state memory technology,
hence Static RAM (SRAM -- _not_ to be confused with SDRAM). It is a
_true_ cache. The catch is that SRAM is expensive -- it's not a simple
cell, but a full combinational boolean logic (CBL) circuit.
The uC+DRAM v. ASIC+SRAM approach is like comparing a Linux system as a
router to a Layer-3 switch. The former is a buffering setup with lots
of layers, meaning it's slower, but it can buffer a lot more traffic
(as well as do other things). It's ideal when you don't care about
speed, but may send lots of burst traffic occasionally. The latter
buffers less traffic, but it's a directly wired switch with a
non-blocking cache, so it's a hell of a lot faster. You want the latter
when you have a lot of traffic regularly and want to switch it as fast
as you can.
uC+DRAM is ideal for RAID-5 writes, because RAID-5 writes have to be
buffered and are slower because of the massive amount of data that has
to be read _first_. It's not the XOR operation that slows down RAID-5,
but the reading of the data to calculate the XOR. That's why uC+DRAM
boards have 64-256MB of SDRAM these days -- and, again, I _highly_
recommend a StrongARM or XScale (e.g., i8033x) I/O processor. Otherwise
an old i960 (e.g., i8030x) I/O processor won't break 50MBps by much.
ASIC+SRAM is ideal where the _entire_ path is non-blocking -- memory-
mapped PCI I/O straight to non-blocking storage (ATA). With RAID-0, 1 and
0+1, you are merely pushing data directly to/from the drives, with the
ASIC providing doubling (RAID-1 or 0+1 writes), and read interleaving
(RAID-1 or 0+1 reads), or direct striping (RAID-0 or 0+1). It can also
cache a small portion for read/writes, as well as its own, embedded OS,
in 0 wait state SRAM. That's 3Ware. The cache is the size of the
SRAM. And that means during an extended set of random, RAID-5 writes,
it can "overflow." The following table is from my FAQ (which is not
available right now because my hosting provider is down):
3ware Product Form Bit at MHz (MBps) ASIC SRAM SDRAM ATA
Channels (MBps)
===================== ======= ============== ===== ==== ===== =====================
Escalade 6200 Half 32 at 33 (133) 32 0.5? (2) UDMA4 (66)
Escalade 6400 FULL 32 at 33 (133) 32 0.5 (4) UDMA4 (66)
Escalade 6410 Half 32 at 33 (133) 32 0.5? (4) UDMA4 (66)
Escalade 6800 FULL 32 at 33 (133) 32 1.0? (8) UDMA4 (66)
--------------------- ------- -------------- ----- ---- ----- ---------------------
Escalade 7210 Half 64 at 33 (266) 64 0.5 (2) UDMA5/6 (100/133)
Escalade 7000-2LP Half-LP 32 at 33 (133) 64? 1.0 (2) UDMA6 (133)
Escalade 7006-2LP Half-LP 32 at 66 (266) 64? 1.0 (2) UDMA6 (133)
Escalade 7410 Half 64 at 33 (266) 64 1.0 (4) UDMA5/6 (100/133)
Escalade 7450/7500-4 Half 64 at 33 (266) 64 2.0 (4) UDMA6 (133)
Escalade 7500-4LP Half-LP 64 at 33 (266) 64 2.0 (4) UDMA6 (133)
Escalade 7506-4LP Half-LP 64 at 66 (533) 64 2.0 (4) UDMA6 (133)
Escalade 7800 FULL 64 at 33 (266) 64 1.0 (8) UDMA5/6 (100/133)
Escalade 7850/7500-8 Half 64 at 33 (266) 64 2.0 (8) UDMA6 (133)
Escalade 7506-8 Half 64 at 66 (533) 64 2.0 (8) UDMA6 (133)
Escalade 7500-12 FULL 64 at 33 (266) 64 4.0 (12) UDMA6 (133)
Escalade 7506-12 FULL 64 at 66 (533) 64 4.0 (12) UDMA6 (133)
--------------------- ------- -------------- ----- ---- ----- ---------------------
Escalade 8000-2 Half-LP 64 at 33 (266) 64 1.0 (2) SATA1 (150)
Escalade 8500-4LP Half-LP 64 at 33 (266) 64 2.0 (4) SATA1 (150)
Escalade 8506-4LP Half-LP 64 at 66 (533) 64 2.0 (4) SATA1 (150)
Escalade 8500-8 Half 64 at 33 (266) 64 2.0 (8) SATA1 (150)
Escalade 8506-8 Half 64 at 66 (533) 64 2.0 (8) SATA1 (150)
Escalade 8500-12 Half 64 at 33 (266) 64 4.0 (12) SATA1 (150)
Escalade 8506-12 Half 64 at 66 (533) 64 4.0 (12) SATA1 (150)
--------------------- ------- -------------- ----- ---- ----- ---------------------
Escalade 9500S-4LP Half-LP 64 at 66 (533) 64 2.0? 128 (4) SATA1 (150)
Escalade 9500S-8/MI Half 64 at 66 (533) 64 2.0? 128 (8/2MI) SATA1 (150)
Escalade 9500S-12/MI Half 64 at 66 (533) 64 4.0? 128 (12/4MI) SATA1 (150)
Only the 9500S series now leverages _both_ SRAM + DRAM for the
_ultimate_ performance _regardless_ of RAID level. But you'll pay for
it.
In a nutshell, _no_ sub-$500 RAID-5 uC+DRAM controller I've seen can
match 3Ware 7000/8000 at RAID-0+1 in write performance. With the cost
of ATA drives being so low, it's much more price/performance effective
to go RAID-0+1 IMHO. Unless you are talking 8+ drives.
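A quick sanity check on usable capacity backs this up -- a sketch in
shell arithmetic, using the 250G drives from the original question:

```shell
# Usable capacity in GB, assuming 250G drives (per the original question).
drive=250

# RAID-5 with n drives yields (n-1) * drive of usable space:
raid5_3=$(( (3 - 1) * drive ))

# RAID-0+1 with n drives yields (n/2) * drive:
raid01_4=$(( (4 / 2) * drive ))

echo "RAID-5,   3 drives: ${raid5_3}G usable"
echo "RAID-0+1, 4 drives: ${raid01_4}G usable"
```

Same 500G either way -- the extra drive buys you speed, not capacity.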
> * It advertises Linux support,
3Ware has had _stock_ kernel support since 2.2.15 (yes, that's _2.2_,
not 2.4).
But as with any "intelligent" controller, you should match the firmware
with the OS. I also started these tables in my FAQ:
Linux Kernel Red Hat Kernel 3w-xxxx Driver 3w-9xxx Driver Rec. S/W
Release(s)
============ ============== ============== ============== ===================
(many more to come)
2.4.9 - 1.02.00.007 none 6.8, 7.3.2
- 2.4.9-e.25 1.02.00.027 none 6.9, 7.5.1/2
2.4.18 - 1.02.00.025 none 6.9, 7.5
- 2.4.18-27.8.0 1.02.00.029 none 6.9, 7.5.1/2
2.4.20 - 1.02.00.031 none 6.9, 7.5.3
(many more to come)
------------ -------------- -------------- -------------- -------------------
(many more to come)
2.6.8 2.6.8-1.541 1.26.00.039 2.26.02.001 6.9, 7.7.0/1, 9
(many more to come)
> and software called Disk Manager. Does DM work under Linux?
Yes, there is a specific version for Linux, along with a CLI (command
line interface) version (the two are mutually exclusive). The regular
(non-CLI) DM appears as a web server, and you can then pull up a web
browser to it. It only allows local access as root by default.
Understand that an "intelligent" RAID card puts all of the "RAID brains"
_on_ the card itself -- in its "firmware." It has an on-board
intelligence (an ASIC in the case of 3Ware, a microcontroller in the
case of others), so this is what "drives" _all_ of the RAID operations.
DM is _not_ required. It merely lets you, the admin, oversee RAID build
operations. The on-board firmware _will_ do them _regardless_ of DM
running or not. But it's highly recommended so the card can send
additional information to the system/admin. BTW, I've used the DM2
interface from the 9000 series with 7000/8000 series cards. It's vastly
improved over DM.
3Ware is also working with kernel/service developers to better integrate
its services into the standard kernel/services that most Linux distros
come with. E.g., the SMART daemon, so the 3Ware driver can send SMART
info directly to the Linux service, instead of just via its DM
interface.
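As a sketch of what that integration looks like with smartmontools
(the "-d 3ware,N" device type is real; the port count and notification
address below are assumptions -- adjust for your card and site):

```shell
# /etc/smartd.conf sketch for disks behind a 3ware controller.
# smartmontools addresses each physical port as "-d 3ware,N" through
# the array device (/dev/sda here).
/dev/sda -d 3ware,0 -a -m admin@example.org
/dev/sda -d 3ware,1 -a -m admin@example.org
/dev/sda -d 3ware,2 -a -m admin@example.org
/dev/sda -d 3ware,3 -a -m admin@example.org
```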
> Is it useful? Or do you just tell the card via a BIOS like tool to go
> RAID-5 and the card handles it all automatically and Linux sees the
> card as 1 big drive.
_Both_. _All_ "intelligent" RAID cards have _both_ a BIOS _and_ an
on-board intelligence. That's how they differ from the "FRAID" cards.
With "FRAID" cards, the OS driver/manager puts _all_ of the "work" on
your main CPU. They are _required_. In fact, you _think_ it looks like
"1 big drive" from the standpoint of the OS, but it's the driver that is
doing that, using your CPU and taxing your system interconnect. Yes,
the CPU is talking _directly_ to the ATA drives, and using the FRAID
driver software. A very, very _poor_ solution, especially
performance-wise.
With an "intelligent" RAID card, the OS driver is merely a "dumb block
device driver." The ASIC or microcontroller (uC) _completely_hides_ the
discs from the system. The system _only_ talks to the ASIC/uC. It
can_not_ talk directly to the drive. Because all of that "RAID
intelligence" is on the board in firmware, the "dumb block device
driver" is then very simple, and can be fully GPL -- which the 3w-xxxx
driver is! That's why it has been in the _stock_ kernel since 2.2.15!
The DM software merely gives you an interface into the "brains" on the
card. Otherwise, the card does everything _autonomously_ of the
driver. The ASIC/uC catches failures, rebuilds drives, etc... with_out_
DM even being loaded. The driver merely lets the OS read/write from/to
the array, _not_ drive its RAID functionality.
This is different from the "FRAID" cards: if you boot a FRAID
card without loading the OS driver, the OS just sees the drives as
"standalone." The FRAID driver is required for the OS to "put it
together as 1 big drive." And the FRAID tools are required to "fix" the
RAID array, _with_ your "main" CPU. FRAID has 0 "intelligence"
on-board, it uses your main CPU for everything.
> * Will this card demand to be the "first drive"?
That's a BIOS setup issue. It's up to your BIOS settings on how you let
the 3Ware card take control of your Int13h disk services. But yes, the
3Ware card does have a BIOS.
Again, you seem to be focused on the 16-bit, Int13h "BIOS" services
aspects. They are _not_ used once the OS loads. _All_ off-chipset
ATA/SCSI cards, RAID or not, offer a "BIOS" for booting. So there is
_no_ difference between a "regular" ATA card, a "FRAID" ATA card or an
"intelligent" RAID card -- they _all_ have BIOSes.
> I've got an extra PCI EIDE card in my home computer that insists on being
> hda-hdd. I could live with this but would prefer the MB drive be hda,
> and these drives be hde-hdh.
The first 3Ware array will be /dev/sda, the next /dev/sdb, etc...
assuming you have _no_ other SCSI drives/arrays.
If you are modifying an existing system, you will need to build an
initrd (initial ram/root disk) with the SCSI module, 3Ware card and SCSI
disk drivers. You may also need to tell GRUB to map BIOS disk 80h (C:)
to /dev/sda, if the 3Ware card is booting.
If you are installing a distro new on the 3Ware card, it should do all
this for you.
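A sketch of that retrofit on a Red Hat-style system (the module alias
and mkinitrd invocation are the usual 2.4-era mechanics; the kernel
version and paths are illustrative):

```shell
# 1. Load the 3ware driver as the SCSI host adapter at boot
#    (append to /etc/modules.conf on a 2.4 system):
#      alias scsi_hostadapter 3w-xxxx
# 2. Rebuild the initrd so 3w-xxxx and the SCSI disk driver are
#    present before the root filesystem mounts:
mkinitrd -f /boot/initrd-2.4.20.img 2.4.20
# 3. If the 3ware array is the boot disk, map BIOS disk 80h to it
#    in /boot/grub/device.map:
#      (hd0) /dev/sda
```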
> * It advertises hot-swap (ain't gonna do it!) and hot-spare.
Yes. Using only 1 ATA drive per channel, this is _very_safe_.
> How does it tell you when it has lost a drive?
It beeps.
If DM is loaded and running, it can send you an e-mail and other alerts.
Depending on the hot-swap bay device, it can blink the light.
> Something in /var/log/messages or at next boot up?
That too.
Understand that DM gives you _lots_ of _automated_ options.
The e-mails are very nice -- especially to a pager.
This is in addition to all the standard Linux logging.
Yes, the 3Ware driver sends messages to /var/log/messages.
The 3Ware card _also_ stores messages in its on-board EEPROM.
So in DM, I can read messages back 18+ months -- noting every _single_
sector failure -- stuff that is long gone from /var/log/messages.
Understand that most "intelligent" RAID cards do all this.
They have the on-board brains, RAM, EEPROM, etc...
You must be used to cheapy "FRAID" cards that don't. ;->
> Or something in /proc/whatever that needs to be monitored?
That's what DM taps.
But yes, the 3w-xxxx card sets up a /proc interface.
E.g. (on a very old 3Ware Escalade 6410):
$ ls /proc/scsi
3w-xxxx device_info scsi sg
$ ls /proc/scsi/3w-xxxx/
0
$ cat /proc/scsi/3w-xxxx/0
scsi0: 3ware Storage Controller
Driver version: 1.26.00.039
Current commands posted: 0
Max commands posted: 254
Current pending commands: 0
Max pending commands: 0
Last sgl length: 4
Max sgl length: 32
Last sector count: 32
Max sector count: 256
Resets: 0
Aborts: 0
AEN's: 0
$ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: 3ware Model: Logical Disk 0 Rev: 1.2
Type: Direct-Access ANSI SCSI revision: ffffffff
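Since that /proc file is plain text, a cron job can watch it directly.
A minimal sketch (the path, the saved sample file, and the mail step
are assumptions -- adapt to your controller number and site):

```shell
#!/bin/sh
# Watchdog sketch: pull the AEN (alert/event notification) count out
# of the 3w-xxxx /proc status file and flag it if non-zero.
PROCFILE=${PROCFILE:-/proc/scsi/3w-xxxx/0}

aen_count() {
    # Print the number after "AEN's:" in the controller status file.
    awk -F: '/AEN/ { gsub(/ /, "", $2); print $2 }' "$1"
}

# Demonstrate against a saved copy of the status output:
cat > /tmp/3w-sample <<'EOF'
scsi0: 3ware Storage Controller
Driver version: 1.26.00.039
Resets: 0
Aborts: 0
AEN's: 0
EOF

count=$(aen_count /tmp/3w-sample)
echo "AENs: $count"
# In a real cron job, you might follow with:
#   [ "$count" -gt 0 ] && echo "3ware AENs: $count" | mail -s "RAID alert" admin
```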
> An annoying audible alarm would work, the Promise card says
> it has that.
3Ware does that _regardless_ of anything.
With DM, you have all sorts of other alerting options.
E-mail is very nice.
> * If in the future I want to add 1 more disk because I have room on the
> controller, will the card naturally just "expand"? (if i have to tell
> it in some setup tool, that's OK) Or will I have to save everything
> off, rebuild the whole array, then restore the data? If the latter,
> then maybe I need to add the 4th drive in up front. :-)
DM2 is supposed to allow dynamic rebuilding into a new, expanded
layout. I have not tried this, though -- and I would _never_ do it.
I would create a 2nd array instead. It's faster and safer.
> * On single drive systems, I like to use a journaling file system (I
> prefer ReiserFS on Suse and ext3 on RH). For RAID-5, does a journaling
> FS matter?
No. Volume management is independent of journaling.
Additionally, the 3w-xxxx driver _does_ do a "flush" on shutdown. If
you've seen how newer kernels "flush" the ATA devices (because most ATA
drives have 1-8MB of SDRAM buffer), the 3w-xxxx driver does a "flush" of
its SRAM (and SRAM+SDRAM in the case of the 3w-9xxx driver) at that same
point before shutdown.
> Or because of the redundancy will the faster but potentially
> less reliable ext2 do just fine?
Ext2 is _not_ "less reliable." Journaling does _not_ increase
reliability**, that is a common and poor assumption.
Journaling _only_ improves recovery time when a filesystem is left
"inconsistent" (like on a power failure or improper shutdown).
[ **NOTE: The only exception I know of is Ext3 with full data
journaling _and_ the use of EEPROM memory for the journal. _Then_ you
can have near-guaranteed commits and near-full recovery of those commits
when the filesystem attempts remount after going inconsistent. ]
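For reference, full data journaling on ext3 is just a mount option --
by default ext3 only journals metadata. A sketch (device and mount
point are illustrative):

```shell
# Mount ext3 with full data journaling (the mode the note above
# refers to), instead of the default metadata-only "data=ordered":
mount -t ext3 -o data=journal /dev/sda2 /srv/files

# Or persistently, in /etc/fstab:
#   /dev/sda2  /srv/files  ext3  data=journal  1 2
```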
> (Note, the system *will* be on a UPS, so any outage < 5min probably
> won't even be noticed.)
> For those that have read this far. Has anyone used a Promise TX4000?
Yes, the Promise "FastTraks" are "FRAID" cards. I.e., there is 0
difference between it and a "regular" ATA card.
The Promise "SuperTraks" are their "intelligent" RAID cards. Early
versions use a very slow i960 (33MHz!) that wasn't capable of even
40MBps. Later versions still seem to be i960 (66-100MHz i8030x)
and are still sluggish at RAID-0, 1 or 0+1.
FRAID cards mean all of the "RAID brains" are in the OS driver. These
brains are licensed from another company, and they can_not_ be GPL.
So you'll either need to load the:
1. Non-GPL vendor FRAID driver, or
2. GPL ataraid "core logic" and the GPL pdcraid "Promise interface"
#1 is kernel-specific, typically, and may not work. #2 is independently
developed, and may not work at all -- or, worse yet, seem to work
while the assumed format/organization is _not_ the same.
> It's *half* the price of the 3ware one,
And the Promise SuperTrak is mega-$$$ more than the Promise FastTrak.
Why?
"Intelligent" SuperTrak:
i960 + DRAM + ATA controller(s)
"FRAID" FastTrak:
ATA controller-only
Pure hardware cost.
> advertises a XOR engine and resizing capability (for adding that 4th drive
> to give me a 750G device).
I recommend _against_ RAID-5 _unless_ you buy a 3Ware Escalade 9500S with
128-256MB SDRAM SO-DIMM (as well as the on-board 2MB SRAM).
The small 1-4MB SRAM of the 3Ware Escalade 7000/8000 series can easily
overflow with lots of random, RAID-5 writes.
And in any case, _nothing_ beats a 3Ware Escalade series at RAID-0+1 --
especially writes, as well as interleaved reads.
> Since the 3ware is about $240 and the Promise is about $110,
> the difference is almost the cost of my spare drive. Is there any
> reason I should not go for the Promise card? (looking for good & bad
> experiences)
It's a FRAID card. They are basically considered "hell" for Linux
because all of the "brains" are in their drivers, and that's a GPL issue.
They also differ *0* from a "regular" ATA card. In fact, there are
often "hacks" to upload the Promise FastTrak BIOS into a $35 Promise
"regular" ATA card.
You're paying $75 extra for _software_, *0* hardware. And your system
interconnect gets the added 2x transfer requirement for mirroring.
With an "intelligent" RAID card, you avoid all that. And since there is
no "brains" in the driver, but on the card itself, there is a 100% GPL
driver. Not only for 3Ware in its 3w-xxxx/9xxx (which are in the stock
kernels), but for the Promise _SuperTrak_ as well.
> Linux also gives me the option of using Software RAID, but that will
> require a 4-channel EIDE card because of the number of drives I want to
> use. Does anyone know if the Promise TX4000 will support a non-RAID
> config; i.e. just be an EIDE controller and not impose HW-RAID on me?
That's _exactly_ what a TX4000 is!
It's a "regular" ATA card with some "trick" 16-bit BIOS and a "trick"
32-bit OS driver. If you don't load the "trick" driver, it is a
"regular" ATA card!
In fact, that's what you'll get when you load Linux on it!
> (please don't tell me how bad Software RAID is, I'm not trying to start
> a big discussion about that, but this is an option I must give due
> diligence to)
But the Promise TX4000 _is_ "software RAID"! That's what a "FRAID" card
does! It does 100% of the RAID in the 32-bit OS driver!
The only difference is that it has a 16-bit BIOS setup and boot. Once
the 32-bit OS loads, it's 100% _useless_.
In fact, it's better to use Linux LVM/MD "software RAID" than to use a
"FRAID" card, because Linux knows how to organize the data far more
optimally than the FRAID card does.
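For comparison, a sketch of the Linux MD route on a plain ATA card
(device names are assumptions -- drives on a 4-channel add-in card
typically appear as hde..hdh; requires root and the md driver):

```shell
# Create a 4-disk RAID-5 md array out of one partition per drive:
mdadm --create /dev/md0 --level=5 --raid-devices=4 \
      /dev/hde1 /dev/hdf1 /dev/hdg1 /dev/hdh1

# Watch the initial parity build:
cat /proc/mdstat

# Then put a filesystem on the array as usual:
mke2fs -j /dev/md0
```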
--
Bryan J. Smith b.j.smith at ieee.org
--------------------------------------------------------------------
Subtotal Cost of Ownership (SCO) for Windows being less than Linux
Total Cost of Ownership (TCO) assumes experts for the former, costly
retraining for the latter, omitted "software assurance" costs in
compatible desktop OS/apps for the former, no free/legacy reuse for
latter, and no basic security, patch or downtime comparison at all.