1. Introduction
1.1 SCSI's flexibility
One of SCSI's great features is its flexibility. One can connect up to
7 (Narrow SCSI) or even 15 (Wide SCSI) devices to a single adapter, each
identified by its unique SCSI-id. If one uses several LUNs (at most 8) per
SCSI-id, even more devices can be connected to a single adapter.
Furthermore, external devices can easily be connected and disconnected, up
to a maximum of 15 x 8 = 120 devices per adapter.
1.2 The Linux kernel and SCSI
SCSI's flexibility poses a problem for the Linux kernel. The kernel has
a fixed major device number for all SCSI devices, so it has "only" 256
possible minor numbers to assign to all possible SCSI devices. However, it
also needs these minor numbers to uniquely identify the partitions on these
devices. Assuming 16 minor numbers per device (one for the whole device and
up to 15 for its partitions), the kernel would need 120 x 16 = 1920 minor
numbers to identify all possible devices and partitions on a single
adapter, which far exceeds the maximum of 256.
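The arithmetic above can be checked with a small C sketch. It assumes the
classic layout of the Linux SCSI disk driver, where each disk gets a block of
16 consecutive minor numbers (offset 0 is the whole disk, 1 to 15 are its
partitions); the function and macro names are hypothetical, chosen only for
this illustration.

```c
#include <assert.h>

#define MINORS_PER_MAJOR 256 /* minor numbers available under one major */
#define MINORS_PER_DISK   16 /* whole disk + up to 15 partitions */
#define WIDE_TARGETS      15 /* SCSI-ids 0..15 minus the adapter itself */
#define LUNS_PER_TARGET    8

/* Minor number of partition `part` (0 = whole disk) on the disk that was
 * given index `disk_index` (0 = sda, 1 = sdb, ...). */
static int sd_minor(int disk_index, int part)
{
    assert(disk_index >= 0);
    assert(part >= 0 && part < MINORS_PER_DISK);
    return disk_index * MINORS_PER_DISK + part;
}

/* How many whole disks fit into one major number: 256 / 16. */
static int disks_per_major(void)
{
    return MINORS_PER_MAJOR / MINORS_PER_DISK;
}

/* Minors a static (non-dynamic) scheme would need to cover every
 * possible device and partition on one Wide adapter: 15 x 8 x 16. */
static int minors_needed_static(void)
{
    return WIDE_TARGETS * LUNS_PER_TARGET * MINORS_PER_DISK;
}
```

For instance, sd_minor(1, 3) gives 19, the minor number of the third
partition on the second disk. With 16 minors per disk, one major number only
covers 16 disks, far fewer than the 120 possible devices.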
To solve this problem, the kernel assigns minor SCSI device numbers
dynamically: it assigns them only to devices that are actually connected,
in order of their SCSI-ids (mostly ascending).
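The dynamic allocation described above can be modelled roughly as follows.
This is a simplified sketch, not the kernel's code: it assumes a single
adapter, one LUN per SCSI-id, ascending scan order, and a hypothetical
`present[]` table saying which SCSI-ids answer.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 16

/* Walk the SCSI-ids 0..nids-1 in ascending order and hand out the next
 * disk index (0 = sda, 1 = sdb, ...) only to ids where a disk is
 * actually present. Returns the disk index given to `id`, or -1 if no
 * disk answers at that id. */
static int disk_index_of(const bool present[MAX_IDS], int nids, int id)
{
    int next = 0;

    assert(nids > 0 && nids <= MAX_IDS);
    assert(id >= 0 && id < nids);
    for (int i = 0; i < nids; i++) {
        if (!present[i])
            continue;          /* empty ids consume no minor numbers */
        if (i == id)
            return next;
        next++;
    }
    return -1;
}
```

With disks at SCSI-ids 0 and 8 only, the disk at id 8 gets index 1 (sdb),
even though its SCSI-id is 8: the index depends on how many disks sit at
lower ids, not on the id itself.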
2. An impractical Linux SCSI problem
2.1 General
A result of the dynamic allocation of minor device numbers is that Linux
cannot uniquely identify a SCSI device with a given SCSI-id by its
minor device number, because its minor depends on the number of connected
SCSI devices with a lower SCSI-id. This means that connecting an external
SCSI device would change the minors of all internal devices with a higher
SCSI-id, forcing one to change /etc/fstab when connecting the external
device and to change it back when disconnecting it.
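A small sketch makes the effect concrete. It reuses the simplified model of
dynamic allocation (ascending scan, one disk per SCSI-id; the function names
are hypothetical) and compares two configurations: internal disks at
SCSI-ids 0 and 8, before and after an external disk at SCSI-id 5 is added.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 16

/* Give every present disk the next free index in ascending SCSI-id
 * order (index 0 = sda). Ids without a disk get -1. */
static void assign_indices(const bool present[MAX_IDS], int index[MAX_IDS])
{
    int next = 0;
    for (int id = 0; id < MAX_IDS; id++)
        index[id] = present[id] ? next++ : -1;
}

/* Count disks present in both configurations whose index (and hence
 * device name and minor numbers) differs between them. */
static int count_renamed(const bool before[MAX_IDS], const bool after[MAX_IDS])
{
    int ib[MAX_IDS], ia[MAX_IDS], changed = 0;

    assign_indices(before, ib);
    assign_indices(after, ia);
    for (int id = 0; id < MAX_IDS; id++)
        if (before[id] && after[id] && ib[id] != ia[id])
            changed++;
    return changed;
}
```

In this example the internal disk at SCSI-id 8 moves from index 1 (sdb) to
index 2 (sdc) once the external disk at id 5 is connected, which is exactly
the kind of change that breaks /etc/fstab entries.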
2.2 Wide SCSI
Normally, "Narrow" SCSI allows up to 8 SCSI-ids (from 0 to 7); Wide SCSI,
however, allows up to 16 SCSI-ids (from 0 to 15). Hence one can connect
up to 15 devices to a Wide SCSI adapter (the adapter itself takes one id).
Wide SCSI is "backward" compatible, so one can connect both Wide and Narrow
devices to a Wide SCSI adapter, which is convenient given that most
CD-ROMs, tape devices and "older" disks are still Narrow.
When both Narrow and Wide devices are connected to a Wide adapter, one
might prefer to assign the high SCSI-ids (from 8 to 15) to Wide devices,
leaving the lower SCSI-ids available for Narrow devices. Unfortunately,
this would inevitably result in the problem described in 2.1: an external
Narrow disk would necessarily have a lower SCSI-id than the internal Wide
devices, and would hence change their minor device numbers.
3. A solution
3.1 The cause of the problem
As described in 2.1, the problem is caused by the fact that the kernel
assigns minor SCSI device numbers in order of the devices' SCSI-ids. This
implies that one could solve the problem by making the kernel scan the
devices in a different order.
3.2 The scanorder fix
The fix is a small patch that lets one specify either a boot parameter
"scsi_scanorder" or a scsi_mod module parameter "scanorder", which makes
the kernel scan the SCSI devices in the specified order.
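The patch itself is not reproduced in this document, so the following is
only a sketch of one way to turn a list of preferred SCSI-ids into a full
scan order. It assumes the rule suggested by the examples later in this
section: the listed ids come first, the remaining ids follow in ascending
order, and ids that do not exist on the adapter are ignored. The function
name `build_scan_order` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 16

/* Build the order in which SCSI-ids 0..nids-1 are probed: first the
 * preferred ids in the order given (ids outside the adapter's range
 * and duplicates are skipped), then all remaining ids in ascending
 * order. Always writes exactly nids entries to `order`. */
static void build_scan_order(const int *pref, int npref, int nids,
                             int order[MAX_IDS])
{
    bool used[MAX_IDS] = {false};
    int n = 0;

    assert(nids > 0 && nids <= MAX_IDS);
    for (int i = 0; i < npref; i++) {
        int id = pref[i];
        if (id < 0 || id >= nids || used[id])
            continue;          /* e.g. id 8 on a Narrow adapter */
        used[id] = true;
        order[n++] = id;
    }
    for (int id = 0; id < nids; id++)
        if (!used[id])
            order[n++] = id;
}
```

Since minor numbers are handed out in scan order, putting the internal
disks' SCSI-ids first in this list keeps their minors stable no matter
which external devices are connected.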
The current implementation, however, has some "raw edges" which may need
to be removed. One of them is that a scanorder can be bound to a specific
host adapter only by its io address or its base address. I don't consider
this ideal, because inserting another PCI board may change these
addresses. A PCI slot id would be more useful in those cases, but it is
not a field in "struct Scsi_Host". This may not be a real obstacle, in
which case the actual problem is that I'm lazy :-)
Another "raw edge" is that I'm not sure whether it runs on non-Intel
hardware. This was my main reason to introduce the "base=.." qualifier.
I'm not sure, though, whether the scanorder fix is portable now...
For example, "scsi_scanorder=io=0x330:1,6;base=0xfdffa000:2,3,10;1,8" means
that the devices of the adapter at io=0x330 (e.g. an AHA1542) are scanned
in the order 1,6,0,2,3,4,5,7. The adapter with base=0xfdffa000 (e.g. an
AHA2940UW) is scanned in the order 2,3,10,0,1,4,5,6,7,8,9,11,12,13,14,15.
Any other Wide adapters are scanned in the order
1,8,0,2,3,4,5,6,7,9,10,11,12,13,14,15, and other Narrow adapters in the
order 1,0,2,3,4,5,6,7. In other words, the listed SCSI-ids are scanned
first, followed by the remaining ids in ascending order; ids that do not
exist on an adapter (such as 8 on a Narrow adapter) are ignored.
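The parameter syntax in the example above can be parsed with a few lines of
C. This is a hypothetical re-implementation for illustration only, not the
patch's parser: `struct clause`, `parse_scanorder` and `pick_clause` are
made-up names, and it assumes that an adapter matching no "io=" or "base="
clause falls back to the first unqualified clause.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IDS     16
#define MAX_CLAUSES  8

enum qual { Q_ANY, Q_IO, Q_BASE };  /* which adapter a clause applies to */

struct clause {
    enum qual qual;
    unsigned long addr;             /* io port or base address */
    int ids[MAX_IDS];               /* preferred SCSI-ids, in order */
    int nids;
};

/* Parse e.g. "io=0x330:1,6;base=0xfdffa000:2,3,10;1,8" into clauses.
 * Returns the number of clauses, or -1 on a syntax error. */
static int parse_scanorder(const char *s, struct clause *c, int max)
{
    int n = 0;

    while (*s) {
        struct clause *cl;
        char *end;

        if (n == max)
            return -1;
        cl = &c[n];
        cl->qual = Q_ANY;
        cl->addr = 0;
        cl->nids = 0;
        if (strncmp(s, "io=", 3) == 0 || strncmp(s, "base=", 5) == 0) {
            cl->qual = (s[0] == 'i') ? Q_IO : Q_BASE;
            s += (cl->qual == Q_IO) ? 3 : 5;
            cl->addr = strtoul(s, &end, 0);   /* base 0 accepts "0x..." */
            if (end == s || *end != ':')
                return -1;
            s = end + 1;
        }
        for (;;) {                            /* comma-separated ids */
            long id = strtol(s, &end, 10);
            if (end == s || id < 0 || id >= MAX_IDS || cl->nids == MAX_IDS)
                return -1;
            cl->ids[cl->nids++] = (int)id;
            s = end;
            if (*s != ',')
                break;
            s++;
        }
        if (*s == ';')
            s++;
        else if (*s != '\0')
            return -1;
        n++;
    }
    return n;
}

/* Pick the clause for an adapter: an io= or base= match wins,
 * otherwise the first unqualified clause; NULL if none applies. */
static const struct clause *pick_clause(const struct clause *c, int n,
                                        unsigned long io, unsigned long base)
{
    const struct clause *fallback = NULL;

    for (int i = 0; i < n; i++) {
        if (c[i].qual == Q_IO && c[i].addr == io)
            return &c[i];
        if (c[i].qual == Q_BASE && c[i].addr == base)
            return &c[i];
        if (c[i].qual == Q_ANY && fallback == NULL)
            fallback = &c[i];
    }
    return fallback;
}
```

The id list of the chosen clause would then be expanded into a full scan
order: listed ids first, the rest ascending.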
4. Alternatives
4.1 Choosing low SCSI-ids for internal devices
One alternative to this fix might be to choose low SCSI-ids for the
internal devices. As described in 2.2, this limits flexibility when the
internal devices are Wide and the external devices are Narrow.
4.2 Devfs
There is a patch that implements devfs. It results in a kernel-generated
pseudo /dev directory in which the SCSI devices are given names reflecting
their actual SCSI-ids. Devfs is not yet included in the kernel, so the
described fix might be of help until it is.
4.3 SCSIdev
This is a program that runs during boot. Like devfs, it creates /dev
entries which reflect the actual SCSI-ids etc.
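The idea behind both devfs and SCSIdev is a name derived from the device's
SCSI address rather than from its position in the scan. As a sketch, the
function below builds a path in roughly the style devfs uses for SCSI disks
(host/bus/target/lun, then "disc" or "partN"); treat the exact layout as an
assumption, and `scsi_disk_path` as a hypothetical helper.

```c
#include <stdio.h>

/* Build a devfs-style name for a SCSI disk (part == 0) or one of its
 * partitions (part > 0). The name depends only on the device's own
 * host/bus/target/lun address, so it does not change when other
 * devices are connected. Returns snprintf's result. */
static int scsi_disk_path(char *buf, size_t len, int host, int bus,
                          int target, int lun, int part)
{
    if (part == 0)
        return snprintf(buf, len,
                        "/dev/scsi/host%d/bus%d/target%d/lun%d/disc",
                        host, bus, target, lun);
    return snprintf(buf, len,
                    "/dev/scsi/host%d/bus%d/target%d/lun%d/part%d",
                    host, bus, target, lun, part);
}
```

A disk at SCSI-id 8 thus keeps the same name whether or not an external
disk with a lower SCSI-id is connected.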
4.4 Why the scanorder fix?
Alternative 4.1 restricts one's flexibility. Alternative 4.2 is still under
development. Alternative 4.3 fixes a low-level problem at a high level.
The scanorder fix fixes the low-level problem at a low level, where it
originates. It's simple, it's optional, and it doesn't hurt.
5. Where to get the scanorder fix
The "latest and the greatest":
http://flits102-126.flits.rug.nl/~rolf/scanorder/scanorder-2.2.14.patch.gz
Previous ones:
http://flits102-126.flits.rug.nl/~rolf/scanorder/scanorder-2.2.10-990730.patch.gz
http://flits102-126.flits.rug.nl/~rolf/scanorder/scanorder-2.2.10-990705.patch.gz
http://flits102-126.flits.rug.nl/~rolf/scanorder/scanorder-2.2.11-990816.patch.gz
http://flits102-126.flits.rug.nl/~rolf/scanorder/scanorder-2.3.13-990818.patch.gz
http://flits102-126.flits.rug.nl/~rolf/scanorder/scanorder-2.3.16-990904.patch.gz
6. Note
I posted a similar fix on linux-kernel about a year ago. This resulted
in a lot of feedback, from which I learned about the alternatives in
section 4. The feedback also gave me reason to think that connecting both
Wide devices with high SCSI-ids and Narrow devices was a bad idea. It was
pointed out that the 16 bits of the Wide SCSI bus are used for arbitration
when devices want to "have the bus". Because Narrow SCSI only sees the
lower 8 bits, arbitration would fail when Wide devices use the high
SCSI-ids (and hence the upper 8 bits). It was also pointed out, however,
that when several devices want the bus, there is a ranking based on their
SCSI-ids: 7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8.
This means that the highest-ranked SCSI-id gets the bus, and that the
Narrow SCSI-ids (0 to 7) get it before the Wide-only SCSI-ids (8 to 15).
To me this seems to solve the problem: when both a Wide (high SCSI-id)
and a Narrow device want the bus, the Narrow device gets it, so it doesn't
need to know about the Wide devices. I'm not sure, however....
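The ranking quoted above can be expressed as a simple priority function.
This sketch only models the priority order; it says nothing about whether a
Narrow device can electrically see the upper data bits. The names `arb_rank`
and `arbitrate` are hypothetical.

```c
#include <assert.h>

/* Arbitration rank of a SCSI-id on a 16-bit bus, following the order
 * 7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8. A larger value means a higher
 * priority: id 7 maps to 15, id 0 to 8, id 15 to 7 and id 8 to 0. */
static int arb_rank(int id)
{
    assert(id >= 0 && id < 16);
    return id < 8 ? id + 8 : id - 8;
}

/* Return the SCSI-id that wins arbitration among n requesting ids. */
static int arbitrate(const int *ids, int n)
{
    int winner;

    assert(n > 0);
    winner = ids[0];
    for (int i = 1; i < n; i++)
        if (arb_rank(ids[i]) > arb_rank(winner))
            winner = ids[i];
    return winner;
}
```

For example, when a Wide disk at SCSI-id 8 and a Narrow disk at SCSI-id 5
both request the bus, arbitrate() returns 5: every id from 0 to 7 outranks
every id from 8 to 15.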
...but there's practice as well. I run a 2.2.14 kernel on a dual PII with
an AHA2940U/UW. I have two Wide internal SCSI disks (SCSI-ids 0 and 8) and
a Narrow CD-ROM (SCSI-id 6). I sometimes connect an external Narrow disk
(SCSI-id 5), and I _NEVER_ have any problems at all.
7. Feedback
Any feedback can be mailed to me: rolf@flits102-126.flits.rug.nl. Feedback
that helps me would be especially useful. For me, the minor SCSI device
allocation _IS_ a problem, and I think I'm not the only one. So anything
that helps me solve this problem (it needn't be my scanorder fix) is
welcome. Don't just tell me that it's wrong; tell me how to do it right.
Thanks.