path: root/drivers/block/drbd/drbd_main.c
Age | Commit message | Author
2012-12-06 | drbd: Remove obsolete check | Philipp Reisner
Smatch complained about this redundant check. The check was introduced on 2006-09-13. On 2007-07-24 the body of the function was enclosed by get_ldev()/put_ldev() reference counting. Since then the check has been useless and misleading. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09 | Merge branch 'drbd-8.4_ed6' into for-3.8-drivers-drbd-8.4_ed6 | Philipp Reisner
2012-11-09 | drbd: fix regression: potential NULL pointer dereference | Philipp Reisner
recent commit drbd: always write bitmap on detach introduced a bitmap writeout during detach, which obviously needs some meta data device to write to. Unfortunately, that same error path may be taken if we fail to attach, e.g. due to UUID mismatch, after we changed state to D_ATTACHING, but before the lower level device pointer is even assigned. We need to test for presence of mdev->ldev. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
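The guard described above can be sketched in user-space C as follows. This is a minimal illustration, not the actual patch: `struct device`, `struct backing_dev` and `write_bitmap_on_detach()` are hypothetical stand-ins for the real DRBD structures and code paths.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DRBD device and its lower level device. */
struct backing_dev { int bitmap_writes; };
struct device { struct backing_dev *ldev; };

/* Returns true if a bitmap writeout was attempted, false if it was skipped. */
static bool write_bitmap_on_detach(struct device *mdev)
{
    /* A failed attach can reach this path before ldev is assigned. */
    if (mdev->ldev == NULL)
        return false;
    mdev->ldev->bitmap_writes++; /* stands in for the real writeout */
    return true;
}
```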
2012-11-09 | drbd: always write bitmap on detach | Lars Ellenberg
If we detach due to local read-error (which sets a bit in the bitmap), stay Primary, and then re-attach (which re-reads the bitmap from disk), we potentially lost the "out-of-sync" (or, "bad block") information in the bitmap. Always (try to) write out the changed bitmap pages before going diskless. That way, we don't lose the bit for the bad block, the next resync will fetch it from the peer, and rewrite it locally, which may result in block reallocation in some lower layer (or the hardware), and thereby "heal" the bad blocks. If the bitmap writeout errors out as well, we will (again: try to) mark the "we need a full sync" bit in our super block, if it was a READ error; writes are covered by the activity log already. If that superblock does not make it to disk either, we are sorry. Maybe we just lost an entire disk or controller (or iSCSI connection), and there actually are no bad blocks at all, so we don't need to re-fetch from the peer, there is no "auto-healing" necessary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
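The fallback chain in this message (bitmap writeout first, then the "full sync" bit in the superblock for READ errors only) can be sketched as a small decision function. The enum and function names are assumptions made for illustration; the real driver performs I/O rather than taking booleans.

```c
#include <stdbool.h>

/* Hypothetical outcomes of trying to persist state before going diskless. */
enum detach_outcome {
    BITMAP_WRITTEN,     /* changed bitmap pages reached the disk */
    FULL_SYNC_FLAGGED,  /* bitmap failed, "need full sync" bit was stored */
    NO_FLAG_NEEDED,     /* bitmap failed after a WRITE error: the activity log covers it */
    NOTHING_PERSISTED   /* neither bitmap nor superblock made it to disk */
};

static enum detach_outcome persist_on_detach(bool bitmap_write_ok,
                                             bool was_read_error,
                                             bool superblock_write_ok)
{
    if (bitmap_write_ok)
        return BITMAP_WRITTEN;
    if (!was_read_error)
        return NO_FLAG_NEEDED;
    return superblock_write_ok ? FULL_SYNC_FLAGGED : NOTHING_PERSISTED;
}
```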
2012-11-09 | drbd: Call drbd_md_sync() explicitly after a state change on the connection | Philipp Reisner
Without this, the meta-data gets updated after 5 seconds by the md_sync_timer. Better to do it immediately after a state change. If the asender detects a network failure, it may take a bit until the worker processes the corresponding after-conn-state-change work item. The worker might be blocked in sending something, i.e. it takes until it gets into its timeout. That is 6 seconds by default, which is longer than the 5 seconds of the md_sync_timer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09 | drbd: fix potential list_add corruption | Lars Ellenberg
If the md_sync_timer triggers a second time, while the work queued during the first time is still pending, this could result in list_add() of an already added item, and corrupt the work item list. This likely only triggered because of the erroneous batch-dequeueing of work items fixed with "drbd: dequeue single work items in wait_for_work()". Still, skip queueing if md_sync_work is already queued. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
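Why a double add corrupts the list, and how "skip queueing if already queued" avoids it, can be shown with a tiny circular list. This is a user-space sketch loosely modelled on the kernel's list_head idiom; `struct node`, `queue_tail()` and `queue_once()` are illustrative names, not DRBD functions.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list; a self-linked node is "not queued". */
struct node { struct node *next, *prev; };

static void node_init(struct node *n) { n->next = n; n->prev = n; }

static bool node_unlinked(const struct node *n) { return n->next == n; }

/* Unconditional tail insert: calling this twice for the same item would
 * overwrite its links and corrupt the queue. */
static void queue_tail(struct node *item, struct node *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Queue the item only if it is not already linked; returns true if queued. */
static bool queue_once(struct node *item, struct node *head)
{
    if (!node_unlinked(item))
        return false;
    queue_tail(item, head);
    return true;
}
```

A timer handler that fires twice would call `queue_once()` both times; the second call is a no-op as long as the work item is still on the queue.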
2012-11-09 | drbd: Protect accesses to the uuid set with a spinlock | Philipp Reisner
There are at least the worker context, the receiver context, and the context of receiving netlink packets. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09 | drbd: disambiguation, s/P_DISCARD_WRITE/P_SUPERSEDED/ | Lars Ellenberg
To avoid confusion with REQ_DISCARD aka TRIM, rename our "discard concurrent write acks" from P_DISCARD_WRITE to P_SUPERSEDED. At the same time, rename the drbd request event DISCARD_WRITE to CONFLICT_RESOLVED. It already triggers either successful completion or restart of the request, depending on our RQ_POSTPONED flag. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09 | drbd: announce FLUSH/FUA capability to upper layers | Lars Ellenberg
In 8.4, we may have bios spanning two activity log extents. Fixup drbd_al_begin_io() and drbd_al_complete_io() to deal with zero sized bios. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: report congestion if we are waiting for some userland callback | Lars Ellenberg
If the drbd worker thread is synchronously waiting for some userland callback, we don't want some casual pageout to block on us. Have drbd_congested() report congestion in that case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: differentiate between normal and forced detach | Lars Ellenberg
Aborting local requests (not waiting for completion from the lower level disk) is dangerous: if the master bio has been completed to upper layers, data pages may be re-used for other things already. If local IO is still pending and later completes, this may cause crashes or corrupt unrelated data. Only abort local IO if explicitly requested. Intended use case is a lower level device that turned into a tarpit, not completing io requests, not even doing error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: allow read requests to be retried after force-detach | Lars Ellenberg
Sometimes, a lower level block device turns into a tar-pit, not completing requests at all, not even doing error completion. We can force-detach from such a tar-pit block device, either by disk-timeout, or by drbdadm detach --force. Queueing for retry only from the request destruction path (kref hit 0) makes it impossible to retry affected read requests from the peer until the local IO completion has happened, as the locally submitted bio holds a reference on the drbd request object. If we could only complete READs when the local completion finally happens, we would not need to force-detach in the first place. Instead, queue for retry where we would otherwise have done the error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: base completion and destruction of requests on ref counts | Lars Ellenberg
cherry-picked and adapted from drbd 9 devel branch. The logic for when to get or put a reference is in mod_rq_state(). To not get confused in the freeze/thaw respectively resend/restart paths, or when cleaning up requests waiting for P_BARRIER_ACK, this also introduces additional state flags: RQ_COMPLETION_SUSP, and RQ_EXP_BARR_ACK. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
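The general idea of tying destruction to a reference count can be sketched with a plain counter. This is only an illustration under simplifying assumptions: the driver uses the kernel's kref and decides when to get or put inside mod_rq_state(), none of which is reproduced here, and `struct request`, `req_get()` and `req_put()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical request object: one reference per outstanding user,
 * for example a locally submitted bio or a pending network ack. */
struct request {
    int refs;
    bool destroyed;
};

static void req_get(struct request *r) { r->refs++; }

/* Drops one reference; returns true if this put destroyed the request. */
static bool req_put(struct request *r)
{
    if (--r->refs > 0)
        return false;
    r->destroyed = true; /* stands in for freeing the object */
    return true;
}
```

With two holders, the first `req_put()` leaves the request alive and only the last one destroys it, which is why a local bio that never completes keeps its request around.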
2012-11-08 | drbd: __drbd_make_request() is now void | Lars Ellenberg
The previous commit causes __drbd_make_request() to always return 0. Change it to void. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: remove struct drbd_tl_epoch objects (barrier works) | Lars Ellenberg
cherry-picked and adapted from drbd 9 devel branch. DRBD requests (struct drbd_request) are already on the per resource transfer log list, and carry their epoch number. We do not need to additionally link them on other ring lists in other structs. The drbd sender thread can recognize by itself when to send a P_BARRIER, by tracking the currently processed epoch, and how many writes have been processed for that epoch. If the epoch of the request to be processed does not match the currently processed epoch, and any writes have been processed in it, a P_BARRIER for this last processed epoch is sent out first. The new epoch then becomes the currently processed epoch. To not get stuck in drbd_al_begin_io() waiting for P_BARRIER_ACK, the sender thread also needs to handle the case when the current epoch was closed already, but no new requests are queued yet, and send out P_BARRIER as soon as possible. This is done by comparing the per resource "current transfer log epoch" (tconn->current_tle_nr) with the per connection "currently processed epoch number" (tconn->send.current_epoch_nr), while waiting for new requests to be processed in wait_for_work(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
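The epoch tracking described in this message can be sketched as a small state machine. This is a user-space approximation under stated assumptions: `struct sender_state`, `maybe_send_barrier()` and `process_write()` are hypothetical names, and "sending" a barrier is reduced to reporting which epoch number it would carry.

```c
#include <stdbool.h>

/* Hypothetical per-connection sender bookkeeping. */
struct sender_state {
    bool seen_any_write;            /* false until the first write is processed */
    unsigned current_epoch_nr;      /* epoch currently being processed */
    unsigned current_epoch_writes;  /* writes processed in that epoch */
};

/* Compare the tracked epoch with `epoch`. If they differ and the tracked
 * epoch contained writes, report a barrier for it in *barrier_nr.
 * Also usable while idle, with the resource's current epoch number. */
static bool maybe_send_barrier(struct sender_state *s, unsigned epoch,
                               unsigned *barrier_nr)
{
    bool send = false;

    if (!s->seen_any_write)
        return false;
    if (s->current_epoch_nr != epoch) {
        if (s->current_epoch_writes > 0) {
            *barrier_nr = s->current_epoch_nr;
            s->current_epoch_writes = 0;
            send = true;
        }
        s->current_epoch_nr = epoch; /* the new epoch becomes current */
    }
    return send;
}

/* Account one write request that belongs to epoch `req_epoch`. */
static bool process_write(struct sender_state *s, unsigned req_epoch,
                          unsigned *barrier_nr)
{
    bool send;

    if (!s->seen_any_write) {
        s->seen_any_write = true;
        s->current_epoch_nr = req_epoch;
        s->current_epoch_writes = 0;
    }
    send = maybe_send_barrier(s, req_epoch, barrier_nr);
    s->current_epoch_writes++;
    return send;
}
```

Calling `maybe_send_barrier()` from the idle path corresponds to the comparison of the resource's current epoch with the sender's processed epoch while waiting for work.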
2012-11-08 | drbd: move the drbd_work_queue from drbd_socket to drbd_connection | Lars Ellenberg
cherry-picked and adapted from drbd 9 devel branch. In 8.4, we don't distinguish between "resource work" and "connection work" yet, we have one worker for both, as we still have only one connection. We only ever used the "data.work", no need to keep the "meta.work" around. Move tconn->data.work to tconn->sender_work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: allow to dequeue batches of work at a time | Lars Ellenberg
cherry-picked and adapted from drbd 9 devel branch. In 8.4, we still use drbd_queue_work_front(), so in normal operation, we can not dequeue batches, but only single items. Still, followup commits will wake the worker without explicitly queueing a work item, so up() is replaced by a simple wake_up(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: transfer log epoch numbers are now per resource | Lars Ellenberg
cherry-picked from drbd 9 devel branch. In preparation of multiple connections, the "barrier number" or "epoch number" needs to be tracked per-resource, not per connection. The sequence number space will not be reset anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: rename drbd_restart_write to drbd_restart_request | Lars Ellenberg
Meanwhile, this is used to restart failed READ requests as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: use bitmap_parse instead of __bitmap_parse | Philipp Reisner
The buffer 'sc.cpu_mask' is a kernel buffer. If bitmap_parse is used instead of __bitmap_parse the extra parameter that indicates a kernel buffer is not needed. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: ignore volume number for drbd barrier packet exchange | Lars Ellenberg
Transfer log epochs, and therefore P_BARRIER packets, are per resource, not per volume. We must not associate them with "some random volume". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: fix potential deadlock during "restart" of conflicting writes | Lars Ellenberg
w_restart_write(), run from worker context, calls __drbd_make_request() and further drbd_al_begin_io(, delegate=true), which then potentially deadlocks. The previous patch moved a BUG_ON to expose such call paths, which would now be triggered. Also, if we call __drbd_make_request() from resource worker context, like w_restart_write() did, and that should block for whatever reason (!drbd_state_is_stable(), resource suspended, ...), we potentially deadlock the whole resource, as the worker is needed for state changes and other things. Create a dedicated retry workqueue for this instead. Also make sure that inc_ap_bio()/dec_ap_bio() are properly paired, even if do_retry() needs to retry itself, in case __drbd_make_request() returns != 0. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: don't pretend that barrier_nr == 0 was special | Lars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: remove now unused seq_num member from struct drbd_request | Lars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: fix potential data corruption and protocol error | Lars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: explicitly clear unused dp_flags in drbd_send_block | Lars Ellenberg
We send left-over garbage from the previous packet in P_DATA_REPLY and P_RS_DATA_REPLY packets. That's bad behaviour. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
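The fix amounts to writing every header field each time a reused packet buffer is prepared. A minimal sketch, assuming a hypothetical header layout: `struct data_packet` and `prepare_reply()` are illustrative and not the driver's wire format or functions.

```c
#include <stdint.h>

/* Hypothetical packet header living in a reused send buffer. */
struct data_packet {
    uint64_t sector;
    uint64_t block_id;
    uint32_t seq_num;
    uint32_t dp_flags;
};

/* Fill in a reply header; fields the reply does not use are cleared
 * explicitly so nothing from the previous packet leaks onto the wire. */
static void prepare_reply(struct data_packet *p, uint64_t sector,
                          uint64_t block_id)
{
    p->sector = sector;
    p->block_id = block_id;
    p->seq_num = 0;
    p->dp_flags = 0;
}
```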
2012-11-08 | drbd: Move list of epochs from mdev to tconn | Philipp Reisner
This is necessary since the transfer_log on the sending side is also per tconn. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Move write_ordering from mdev to tconn | Philipp Reisner
This is necessary in order to prepare the move of the (receiver side) epoch list from the device (mdev) to the connection (tconn) objects. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Move the CREATE_BARRIER flag from connection to device | Philipp Reisner
That is necessary since the whole transfer log is per connection(tconn) and not per device(mdev). This bug caused list corruption on the worker list. When a barrier is queued for sending in the context of one device, another device did not see the CREATE_BARRIER bit, and queued the same object again -> list corruption. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Fixes from the drbd-8.3 branch | Philipp Reisner
* drbd-8.3:
  drbd: O_SYNC gives EIO on ramdisks for some kernels (e.g. RHEL6).
  drbd: send intermediate state change results to the peer
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Fixes from the drbd-8.3 branch | Philipp Reisner
* drbd-8.3:
  drbd: fix spurious meta data IO "error"
  drbd: Fixed a race condition between detach and start of resync
  drbd: fix harmless race to not trigger an ASSERT
  drbd: Derive sync-UUIDs only from the bitmap-uuid if it is non-zero
  drbd: Fixed current UUID generation (regression introduced recently, after 8.3.11)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: detach must not try to abort non-local requests | Lars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Get rid of MR_{READ,WRITE}_SHIFT | Andreas Gruenbacher
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Fix the data-integrity-alg setting | Andreas Gruenbacher
The last data-integrity-alg fix made data integrity checking work when the algorithm was changed for an established connection, but the common case of configuring the algorithm before connecting was still broken. Fix that. Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Turn tl_apply() into tl_abort_disk_io() | Andreas Gruenbacher
There is no need to overly generalize this function; it only makes the code harder to understand. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: on attach, enforce clean meta data | Lars Ellenberg
Detection of unclean shutdown has moved into user space. The kernel code will, whenever it updates the meta data, mark it as "unclean", and will refuse to attach to such unclean meta data. "drbdadm up" now schedules "drbdmeta apply-al", which will apply the activity log to the bitmap, and/or reinitialize it, if necessary, as well as set a "clean" indicator flag. This moves a bit of code out of kernel space. As a side effect, it also prevents some 8.3 module from accidentally ignoring the 8.4 style activity log, if someone should downgrade, whether on purpose, or accidentally because he changed kernel versions without providing an 8.4 for the new kernel, and the new kernel comes with in-tree 8.3. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
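The division of labour between kernel and user space can be sketched as a one-flag protocol. Everything here is a simplification with hypothetical names: `struct meta_data`, `md_mark_updated()`, `userspace_apply_al()` and `try_attach()` only model the "clean" indicator, not the on-disk format or the actual tools.

```c
#include <stdbool.h>

/* Hypothetical view of the meta data: only the clean indicator matters here. */
struct meta_data { bool clean; };

/* Kernel side: any meta data update marks it unclean. */
static void md_mark_updated(struct meta_data *md) { md->clean = false; }

/* User space side: applying the activity log to the bitmap ends by
 * setting the clean indicator again. */
static void userspace_apply_al(struct meta_data *md) { md->clean = true; }

/* Kernel side: refuse to attach to unclean meta data. Returns 0 or -1. */
static int try_attach(const struct meta_data *md)
{
    return md->clean ? 0 : -1;
}
```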
2012-11-08 | drbd: detach from frozen backing device | Philipp Reisner
* drbd-8.3:
  documentation: Documented detach's --force and disk's --disk-timeout
  drbd: Implemented the disk-timeout option
  drbd: Force flag for the detach operation
  drbd: Allow new IOs while the local disk is in FAILED state
  drbd: Bitmap IO functions can not return prematurely if the disk breaks
  drbd: Added a kref to bm_aio_ctx
  drbd: Hold a reference to ldev while doing meta-data IO
  drbd: Keep a reference to the bio until the completion handler finished
  drbd: Implemented wait_until_done_or_disk_failure()
  drbd: Replaced md_io_mutex by an atomic: md_io_in_use
  drbd: moved md_io into mdev
  drbd: Immediately allow completion of IOs, that wait for IO completions on a failed disk
  drbd: Keep a reference to barrier acked requests
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Improve compatibility with drbd's older than 8.3.7 | Philipp Reisner
Regression introduced with the 8.3.11 commit "drbd: Take a more conservative approach when deciding max_bio_size". Never ever tell an older drbd that we support more than 32KiB in a single data request (packet). Never believe an older drbd that it supports more than 32KiB in a single data request (packet). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
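The two rules can be sketched as a single clamp. This is an illustration only: how the driver decides that a peer is "older" is not shown in this log, so it is passed in as a flag, and `negotiated_max_bio_size()` and `LEGACY_MAX_BIO_SIZE` are hypothetical names.

```c
#include <stdbool.h>

/* 32 KiB limit for a single data request with an older peer. */
#define LEGACY_MAX_BIO_SIZE (32u * 1024u)

/* Pick the request size to use with a peer. With an older peer, neither
 * our own capability nor the peer's claim may exceed the legacy limit. */
static unsigned negotiated_max_bio_size(bool peer_is_older, unsigned ours,
                                        unsigned peer_claimed)
{
    unsigned size = ours < peer_claimed ? ours : peer_claimed;

    if (peer_is_older && size > LEGACY_MAX_BIO_SIZE)
        size = LEGACY_MAX_BIO_SIZE;
    return size;
}
```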
2012-11-08 | drbd: Rename --dry-run to --tentative | Andreas Gruenbacher
drbdadm already has a --dry-run option, so this option cannot directly be passed through to drbdsetup. Rename the drbdsetup option to resolve this conflict. For backward compatibility, make --dry-run an alias of --tentative. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
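Keeping the old spelling as an alias can be as simple as accepting both strings for the same option. A minimal sketch, assuming a hypothetical helper `is_tentative_option()`; the real drbdsetup option parsing is not reproduced here.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true for the new option name and for its backward-compatible alias. */
static bool is_tentative_option(const char *arg)
{
    return strcmp(arg, "--tentative") == 0 || strcmp(arg, "--dry-run") == 0;
}
```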
2012-11-08 | drbd: Allow to pass resource options to the new-resource command | Andreas Gruenbacher
This is equivalent to how the attach and connect commands work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Convert the generic netlink interface to accept connection endpoints | Andreas Gruenbacher
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Use DRBD_MINOR_COUNT_DEF in one more place | Andreas Gruenbacher
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Send PROTOCOL_UPDATE packets when appropriate | Philipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Receiving part for the PROTOCOL_UPDATE packet | Philipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: protect updates to integrits_tfm by tconn->data->mutex | Philipp Reisner
Since we need to hold that mutex anyways to make sure the peer gets that change in the right position in the data stream, it makes a lot of sense to use the same mutex to ensure existence of the tfm. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Refer to resync-rate consistently throughout the code | Andreas Gruenbacher
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Rename the want_lose field/flag to discard_my_data | Andreas Gruenbacher
This is what it is called in config files and on the command line as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Removing drbd_cfg_rwsem | Philipp Reisner
* Updates to all configuration items are done under genl_lock(), including removal of mdevs or tconns.
* All non-sleeping read sides are protected by RCU.
* All sleeping read sides keep reference counts to keep the objects alive.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Use RCU for the drbd_tconns list | Philipp Reisner
Preparing removal of drbd_cfg_rwsem Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08 | drbd: Refcounting for mdev objects | Philipp Reisner
Preparing removal of drbd_cfg_rwsem Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>