Nico Kadel-Garcia <nka...@gmail.com> writes:
> If the RAID1 is configured correctly, it should never write to the
> "degraded" part of the array. This is one of the tricky parts of
> software RAID: it still allows direct access to that part of the array
> from the normal operating system tools. If you corrupt it behind the
> back of software RAID, well, re-assembling it is going to be a problem.
> Normally that "disconnected" drive would be marked as out of sync at
> boot time, and restoring the array would cause the active disk to be
> mirrored to the second disk. That's why restoring the array takes so
> long: it has to read all of one disk, and verify and potentially write
> to all of the second one. But that kind of problem is inevitable if
> you have removable drives in RAID1, such as USB drives.
> Why did the drive go offline, and when? And are you using software or
> hardware RAID? How did the second drive get written to?
I am using software RAID on two USB drives. I know that
re-syncing can take ages, and I am prepared for that. What I must
prevent is that one drive gets written to and then the other one
gets written to independently, so that divergent versions emerge
and neither of the two drives is the "old" one which can safely
be overwritten with a mirror of the "current" one.
In other words: At each point in time, both drives must have the
same content, or one of them must have only obsolete content.
Let's call the drives A and B. Assume that I remove drive B by
pulling the USB plug. Then I do "touch current-drive" to mark
the remaining drive. Then I shut down the system, re-connect drive
B and boot again. In all my experiments, this led to a degraded
array being assembled with partitions from drive A. So far this
is what I needed. I can then re-add the partitions from drive B
with something like "mdadm /dev/mdX -a /dev/sdXX".
However, I also did the following experiment: after pulling the
plug on B, writing the file "current-drive" to A and finally
shutting down, I booted with only B connected. The system got up
and did its fsck (as expected, since the filesystems on B were
not cleanly unmounted before). I then shut the system down,
re-connected drive A and booted again.
In some cases, drive A was used to build the degraded array, and
in some cases drive B was used. I did not detect a pattern here.
This is not very reassuring. One must keep in mind that this
series of events may also occur unprovoked: just think of an
unreliable USB hub.
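
To state the two experiments more precisely, here is a toy model in
Python. It is only a sketch of the requirement, not of how md decides
anything: the per-drive write histories, the function name classify()
and the strings it returns are all made up for illustration. A drive
counts as "obsolete" when its history is a strict prefix of the other
drive's history.

```python
def classify(history_a, history_b):
    """Compare two write histories (hypothetical model, not md's logic)."""
    # Count how many leading writes the two drives have in common.
    common = 0
    for write_a, write_b in zip(history_a, history_b):
        if write_a != write_b:
            break
        common += 1

    # Writes each drive received after the histories stopped matching.
    a_extra = len(history_a) - common
    b_extra = len(history_b) - common

    if a_extra == 0 and b_extra == 0:
        return "in sync"
    if b_extra == 0:
        return "B is obsolete, mirror A -> B"
    if a_extra == 0:
        return "A is obsolete, mirror B -> A"
    return "diverged, no safe mirror source"


# Writes both drives saw while the array was still complete.
shared = ["w1", "w2"]

# Experiment 1: pull B, write to A, boot with both drives connected.
experiment_1 = classify(shared + ["touch current-drive"], shared)

# Experiment 2: pull B, write to A, then boot with only B (fsck writes to B).
experiment_2 = classify(shared + ["touch current-drive"],
                        shared + ["fsck on B"])

print(experiment_1)
print(experiment_2)
```

In the first experiment B is merely behind and can be overwritten. In
the second, both drives have received writes the other one lacks,
which is exactly the situation I need to rule out.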
You wrote that the "disconnected" drive would be marked as out of
sync at boot time. I presume it shows up as a kernel message like this:
md: kicking non-fresh sda1 from array!
But by what criteria is a drive categorized as "non-fresh"?