aboutsummaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2012-11-09Merge branch 'drbd-8.4_ed6' into for-3.8-drivers-drbd-8.4_ed6Philipp Reisner
2012-11-09drbd: Broadcast sync progress no more often than once per secondPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: don't try to clear bits once the disk has failedPhilipp Reisner
If the disk has failed already, there is no point trying to change the bitmap. drbd_set_out_of_sync() already had this safeguard, time to add it to drbd_set_in_sync() as well. This also prevents some warning messages, like FIXME asender in bm_change_bits_to, bitmap locked for 'detach' by worker if our disk fails during resync, while there are some resync acks queued up. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: fix regression: potential NULL pointer dereferencePhilipp Reisner
recent commit drbd: always write bitmap on detach introduced a bitmap writeout during detach, which obviously needs some meta data device to write to. Unfortunately, that same error path may be taken if we fail to attach, e.g. due to UUID mismatch, after we changed state to D_ATTACHING, but before the lower level device pointer is even assigned. We need to test for presence of mdev->ldev. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Fix clearing of MDF_AL_DISABLEDPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: log request sector offset and size for IO errorsLars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: always write bitmap on detachLars Ellenberg
If we detach due to local read-error (which sets a bit in the bitmap), stay Primary, and then re-attach (which re-reads the bitmap from disk), we potentially lost the "out-of-sync" (or, "bad block") information in the bitmap. Always (try to) write out the changed bitmap pages before going diskless. That way, we don't lose the bit for the bad block, the next resync will fetch it from the peer, and rewrite it locally, which may result in block reallocation in some lower layer (or the hardware), and thereby "heal" the bad blocks. If the bitmap writeout errors out as well, we will (again: try to) mark the "we need a full sync" bit in our super block, if it was a READ error; writes are covered by the activity log already. If that superblock does not make it to disk either, we are sorry. Maybe we just lost an entire disk or controller (or iSCSI connection), and there actually are no bad blocks at all, so we don't need to re-fetch from the peer, there is no "auto-healing" necessary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: wait for meta data IO completion even with failed disk, unless ↵Lars Ellenberg
force-detached The intention of force-detach is to be able to deal with a completely unresponsive lower level IO stack, which does not even deliver error completions anymore, but no completion at all. In all other cases, we must still wait for the meta data IO completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: a few more GFP_KERNEL -> GFP_NOIOLars Ellenberg
This has not yet been observed, but conceivably, when using GFP_KERNEL allocations from drbd_md_sync(), drbd_flush_after_epoch() or receive_SyncParam(), we could trigger additional IO to our own device, or an other device in a criss-cross setup, and end up in a local deadlock, or potentially a distributed deadlock in a criss-cross setup involving the peer blocked in a similar way waiting for us to make progress. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: fix potential deadlock during bitmap (re-)allocationLars Ellenberg
The former comment arguing that GFP_KERNEL was good enough was wrong: it did not take resize into account at all, and assumed the only path leading here was the normal attach on a still secondary device, so no deadlock would be possible. Both resize on a Primary, or attach on a diskless Primary, could potentially deadlock. drbd_bm_resize() is called while IO to the respective device is suspended, so we must use GFP_NOIO to avoid potential deadlock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: use list_move_tail instead of list_del/list_add_tailLars Ellenberg
Using list_move_tail() instead of list_del() + list_add_tail(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: panic on delayed completion of aborted requestsPhilipp Reisner
"aborting" requests, or force-detaching the disk, is intended for completely blocked/hung local backing devices which do no longer complete requests at all, not even do error completions. In this situation, usually a hard-reset and failover is the only way out. By "aborting", basically faking a local error-completion, we allow for a more graceful swichover by cleanly migrating services. Still the affected node has to be rebooted "soon". By completing these requests, we allow the upper layers to re-use the associated data pages. If later the local backing device "recovers", and now DMAs some data from disk into the original request pages, in the best case it will just put random data into unused pages; but typically it will corrupt meanwhile completely unrelated data, causing all sorts of damage. Which means delayed successful completion, especially for READ requests, is a reason to panic(). We assume that a delayed *error* completion is OK, though we still will complain noisily about it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Fix comparison of is_valid_transition()'s return codePhilipp Reisner
is_valid_transition() might return SS_NOTHING_TO_DO. The condition function _req_st_cond() returned SS_NOTHING_TO_DO, which caused the wait_event to abort too early. Therefore drbd_req_state() did not consume the next CL_ST_CHG_SUCCESS or SS_CW_FAILED_BY_PEER causing serve disruption of the state machine logic... Detaching from a single volue was one way to trigger this bug. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Remove duplicate codePhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: differentiate early and later "postponing" of requestsLars Ellenberg
We use the RQ_POSTPONED flag to mark a request for several reasons. It may be a conflicting request in a dual-primary setup, where conflict detection and resolution on the peer decided that this request needs to be re-submitted, it needs to re-enter drbd_make_request() to fix the data divergence caused by these conflicting, partially overlapping, quasi-simultaneous requests. In this case we need to mark the corresponding area as out-of-sync, before we call drbd_al_complete_io(). We also use the RQ_POSTPONED flag to just "push back" a request, before even processing it, if IO is suspended for some reason. In this case, as this request was neither submitted nor sent yet, we must not touch the bitmap. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Fix postponed requestsPhilipp Reisner
A postponed request might has RQ_IN_ACT_LOG already set, but is POSTPONED before it gets something in the RQ_LOCAL_MASK set. Up to now this caused a left-over active extent. Fix that by only testing for the RQ_IN_ACT_LOG bit in drbd_req_destroy() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Call drbd_md_sync() explicitly after a state change on the connectionPhilipp Reisner
Without this, the meta-data gets updates after 5 seconds by the md_sync_timer. Better to do it immeditaly after a state change. If the asender detects a network failure, it may take a bit until the worker processes the according after-conn-state-change work item. The worker might be blocked in sending something, i.e. it takes until it gets into its timeout. That is 6 seconds by default which is longer than the 5 seconds of the md_sync_timer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Fix postponed requestsPhilipp Reisner
* Postponed requests should not set or clear out-of-sync marks * When a request gets postponed we need to drop its reference mdev->local_cnt (put_ldev()). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Imporve the error reporting of failed conn state changesPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Fix the way the STATE_SENT bit is clearedPhilipp Reisner
With merging the commit 'drbd: Delay/reject other state changes while establishing a connection' the condition check for clearing the flag was wrong. Move the bit clearing to the __drbd_set_state() function in order to have it already cleared for the other parts of the function. I.e. clearing the susp_fen in the after_state_ch() function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Do not check aspects that are not subject to change in ↵Philipp Reisner
_conn_requests_state() When _conn_requests_state() is used to change other parts of the state than the connection, do not check for a valid connection transition. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Improve readability of IO resuming after freeze due to no data accessPhilipp Reisner
The previous way of doing the state change was also okay since the state change on the susp flag gets propagated from the mdev to the tconn. Fortunately all this goes away in drbd-9.0 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Fix IO resuming after connection was established while executing the ↵Philipp Reisner
fence handler Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: fix potential list_add corruptionLars Ellenberg
If the md_sync_timer triggers a second time, while the work queued during the first time is still pending, this could result in list_add() of an already added item, and corrupt the work item list. This likely only triggered because of the erroneous batch-dequeueing of work items fixed with drbd: dequeue single work items in wait_for_work() Still, skip queueing if md_sync_work is already queued. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: dequeue single work items in wait_for_work()Lars Ellenberg
As long as we still use drbd_queue_work_front(), we must only dequeue the single first item during normal operation. The comment in drbd_worker() even says so, but bc8a5a1 drbd: remove struct drbd_tl_epoch objects (barrier works) introduced the batch dequeueing again via list_splice_init() in wait_for_work(). Change back to list_move() of the first item, if any. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: mutex_unlock "... must no be used in interrupt context"Lars Ellenberg
Documentation of mutex_unlock says we must not use it in interrupt context. So do not call it while holding the spin_lock_irq, but give up the spinlock temporarily. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Fix a race condition that can lead to a BUG()Philipp Reisner
If the preconditions for a state change change after the wait_event() we might hit the BUG() statement in conn_set_state(). With holding the spin_lock while evaluating the condition AND until the actual state change we ensure the the preconditions can not change anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: temporarily suspend io in drbd_adm_disk_optsLars Ellenberg
drbd_adm_disk_opts() does wait_event(mdev->al_wait, lc_try_lock(mdev->act_log)); drbd_al_shrink(mdev); If the device is very busy, this can take a very long time to succeed. Fix this by temporarily suspending IO, then quickly change the settings, and resume. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: don't send out P_BARRIER with stale informationLars Ellenberg
We must only send P_BARRIER for epochs we actually sent P_DATA in. If we (re-)establish a connection, we reinitialized the send.current_epoch_nr, but forgot to reset send.current_epoch_writes. This could result in a spurious P_BARRIER with stale epoch information, and a disconnect/reconnect cycle once the then "unexpected" P_BARRIER_ACK is received: BAD! BarrierAck #28823 received, expected #28829! Introduce re_init_if_first_write() and maybe_send_barrier() helpers, and call them appropriately for read/write/set-out-of-sync requests. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: properly call drbd_rs_cancel_all() in drbd_disconnected()Lars Ellenberg
drbd_disconnected() is supposed to clear the resync lru cache, by calling drbd_rs_cancel_all(). We must do so before we call drbd_flush_workqueue(), as at least the callback w_restart_disk_io() may wait for resync progres, and would otherwise deadlock. drbd_finish_peer_reqs() may again populate that cache, which will then potentially be stale after the next resync handshake and bitmap exchange, we have to do it again after that. A stale resync lru cache causes no harm but ugly messages like this: BAD! sector=196608s enr=6 rs_left=-256 rs_failed=0 count=256 cstate=SyncTarget Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Remove dead codePhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Avoid NetworkFailure state during disconnectPhilipp Reisner
Disconnecting is a cluster wide state change. In case the peer node agrees to the state transition, it sends back the fact on the meta-data connection and closes both sockets. In case the node node that initiated the state transfer sees the closing action on the data-socket, before the P_STATE_CHG_REPLY packet, it was going into one of the network failure states. At least with the fencing option set to something else thatn "dont-care", the unclean shutdown of the connection causes a short IO freeze or a fence operation. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Protect accesses to the uuid set with a spinlockPhilipp Reisner
There is at least the worker context, the receiver context, the context of receiving netlink packts. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Write all pages of the bitmap after an online resizePhilipp Reisner
We need to write the whole bitmap after we moved the meta data due to an online resize operation. With the support for one peta byte devices bitmap IO was optimized to only write out touched pages. This optimization must be turned off when writing the bitmap after an online resize. This issue was introduced with drbd-8.3.10. The impact of this bug is that after an online resize, the next resync could become larger than expected. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Fix completion of requests while the device is suspendedPhilipp Reisner
In various places (E.g. CONNECTION_LOST_WHILE_PENDING) the RQ_COMPLETION_SUSP mask is passed in the clear set to mod_rq_state(). The issue was that it tried to clear the RQ_COMPLETION_SUSP bit out of the state mask first, and eventuelly set it afterwards, in the drbd_req_put_completion_ref() function. Fixed that by moving the reference getting out of drbd_req_put_completion_ref() into the mod_rq_state(), before the place where the extra reference might be put. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Don't unregister socket state_change callback from within the callbackAndreas Gruenbacher
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: disambiguation, s/ERR_DISCARD/ERR_DISCARD_IMPOSSIBLE/Lars Ellenberg
If for some reason (typically "split-brained" cluster manager) drbd replica data has diverged, we can chose a victim, and reconnect using "--discard-my-data", causing the victim to become sync-target, fetching all changed blocks from the peer. If we are Primary, we are potentially in use, and we refuse to "roll back" changes to the data below the page cache and other users. Rename the error symbol for this to ERR_DISCARD_IMPOSSIBLE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: disambiguation, s/DISCARD_CONCURRENT/RESOLVE_CONFLICTS/Lars Ellenberg
We don't discard anything here, really. We resolve conflicting, concurrent writes to overlapping data blocks. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: disambiguation, s/P_DISCARD_WRITE/P_SUPERSEDED/Lars Ellenberg
To avoid confusion with REQ_DISCARD aka TRIM, rename our "discard concurrent write acks" from P_DISCARD_WRITE to P_SUPERSEDED. At the same time, rename the drbd request event DISCARD_WRITE to CONFLICT_RESOLVED. It already triggers both successful completion or restart of the request, depending on our RQ_POSTPONED flag. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: cleanup, drop unused structLars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: NEG_ACK does not imply a barrier-ackLars Ellenberg
Don't drop a request from the transfer log just because it was NEG_ACKED. We need it around to be able to verify P_BARRIER_ACKs against the transver log. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: only start a new epoch, if the current epoch contains writesLars Ellenberg
Almost all code paths calling start_new_tl_epoch() guarded it with if (... current_tle_writes > 0 ... ). Just move that inside start_new_tl_epoch(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Finish requests that completed while IO was frozenPhilipp Reisner
Requests of an acked epoch are stored on the barrier_acked_requests list. In case the private bio of such a request completes while IO on the drbd device is suspended [req_mod(completed_ok)] then the request stays there. When thawing IO because the fence_peer handler returned, then we use tl_clear() to apply the connection_lost_while_pending event to all requests on the transfer-log and the barrier_acked_requests list. Up to now the connection_lost_while_pending event was not applied on requests on the barrier_acked_requests list. Fixed that. I.e. now the connection_lost_while_pending and resend events are applied to requests on the barrier_acked_requests list. For that it is necessary that the resend event finishes (local only) READS correctly. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Fix a potential issue with the DISCARD_CONCURRENT flagLars Ellenberg
The DISCARD_CONCURRENT flag should be set on one node and cleared on the other node. As the code was before it was theoretical possible that a node accepts the meta socket, but has to close it later on, and keeps the DISCARD_CONCURRENT flag. Correct this by moving the clear_bit(DISCARD_CONCURRENT) where the packet gets sent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: fix drbd wire compatibility for empty flushesLars Ellenberg
DRBD has a concept of request epochs or reorder-domains, which are separated on the wire by P_BARRIER packets. Older DRBD is not able to handle zero-sized requests at all, so we need to map empty flushes to these drbd barriers. These are the equivalent of empty flushes, and by default trigger flushes on the receiving side anyways (unless not supported or explicitly disabled), so there is no need to handle this differently in newer drbd either. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: More random to the connect logicPhilipp Reisner
Since the listening socket is open all the time, it was possible to get into stable "initial packet S crossed" loops. * when both sides realize in the drbd_socket_okay() call at the end of the loop that the other side closed the main socket you had the chance to get into a stable loop with repeated "packet S crossed" messages. * when both sides do not realize with the drbd_socket_okay() call at the end of the loop that the other side closed the main socket you had the chance to get into a stable loop with alternating "packet S crossed" "packet M crossed" messages. In order to break out these stable loops randomize the behaviour if such a crossing of P_INITIAL_DATA or P_INITIAL_META packets is detected. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Try to connec to peer only once per cyclePhilipp Reisner
Since now our listening socket is open all the time we will get connection tries of the peer always in. No need to try it three times. This is valid when connecting to older peers as well, it simply increases the probability that the new version DRBD will accept a connection instead that it will establish one. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Remove redundant and wrong test for NULL simplification in conn_connect()Philipp Reisner
Since the drbd_socket_okay() function itself tests if the the socket is NULL, the explicit test "if (sock.socket && &msock.socket)" was redundent. Apart from that the address opperator ('&') before msock.socket rendered the test pointless. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: pass some more information to userspace.Philipp Marek
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: announce FLUSH/FUA capability to upper layersLars Ellenberg
In 8.4, we may have bios spanning two activity log extents. Fixup drbd_al_begin_io() and drbd_al_complete_io() to deal with zero sized bios. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>