aboutsummaryrefslogtreecommitdiff
path: root/drivers/block/drbd/drbd_nl.c
AgeCommit message (Collapse)Author
2012-12-06drbd: Fix drbdsetup wait-connect, wait-sync etc... commandsPhilipp Reisner
This was introduces when moving the code over from the 8.3 codebase with commit 328e0f125bf41f4f33f684db22015f92cb44fe56 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-12-06drbd: respect no-md-barriers setting also when changed online via disk-optionsLars Ellenberg
We need to propagate the configuration into the flag bits, or it won't be effective. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09Merge branch 'drbd-8.4_ed6' into for-3.8-drivers-drbd-8.4_ed6Philipp Reisner
2012-11-09drbd: Broadcast sync progress no more often than once per secondPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Fix clearing of MDF_AL_DISABLEDPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: always write bitmap on detachLars Ellenberg
If we detach due to local read-error (which sets a bit in the bitmap), stay Primary, and then re-attach (which re-reads the bitmap from disk), we potentially lost the "out-of-sync" (or, "bad block") information in the bitmap. Always (try to) write out the changed bitmap pages before going diskless. That way, we don't lose the bit for the bad block, the next resync will fetch it from the peer, and rewrite it locally, which may result in block reallocation in some lower layer (or the hardware), and thereby "heal" the bad blocks. If the bitmap writeout errors out as well, we will (again: try to) mark the "we need a full sync" bit in our super block, if it was a READ error; writes are covered by the activity log already. If that superblock does not make it to disk either, we are sorry. Maybe we just lost an entire disk or controller (or iSCSI connection), and there actually are no bad blocks at all, so we don't need to re-fetch from the peer, there is no "auto-healing" necessary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Call drbd_md_sync() explicitly after a state change on the connectionPhilipp Reisner
Without this, the meta-data gets updates after 5 seconds by the md_sync_timer. Better to do it immeditaly after a state change. If the asender detects a network failure, it may take a bit until the worker processes the according after-conn-state-change work item. The worker might be blocked in sending something, i.e. it takes until it gets into its timeout. That is 6 seconds by default which is longer than the 5 seconds of the md_sync_timer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: temporarily suspend io in drbd_adm_disk_optsLars Ellenberg
drbd_adm_disk_opts() does wait_event(mdev->al_wait, lc_try_lock(mdev->act_log)); drbd_al_shrink(mdev); If the device is very busy, this can take a very long time to succeed. Fix this by temporarily suspending IO, then quickly change the settings, and resume. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Protect accesses to the uuid set with a spinlockPhilipp Reisner
There is at least the worker context, the receiver context, the context of receiving netlink packts. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: Write all pages of the bitmap after an online resizePhilipp Reisner
We need to write the whole bitmap after we moved the meta data due to an online resize operation. With the support for one peta byte devices bitmap IO was optimized to only write out touched pages. This optimization must be turned off when writing the bitmap after an online resize. This issue was introduced with drbd-8.3.10. The impact of this bug is that after an online resize, the next resync could become larger than expected. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: disambiguation, s/ERR_DISCARD/ERR_DISCARD_IMPOSSIBLE/Lars Ellenberg
If for some reason (typically "split-brained" cluster manager) drbd replica data has diverged, we can chose a victim, and reconnect using "--discard-my-data", causing the victim to become sync-target, fetching all changed blocks from the peer. If we are Primary, we are potentially in use, and we refuse to "roll back" changes to the data below the page cache and other users. Rename the error symbol for this to ERR_DISCARD_IMPOSSIBLE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: disambiguation, s/DISCARD_CONCURRENT/RESOLVE_CONFLICTS/Lars Ellenberg
We don't discard anything here, really. We resolve conflicting, concurrent writes to overlapping data blocks. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: pass some more information to userspace.Philipp Marek
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: introduce stop-sector to online verifyLars Ellenberg
We now can schedule only a specific range of sectors for online verify, or interrupt a running verify without interrupting the connection. Had to bump the protocol version differently, we are now 101. Added verify_can_do_stop_sector() { protocol >= 97 && protocol != 100; } Also, the return value convention for worker callbacks has changed, we returned "true/false" for "keep the connection up" in 8.3, we return 0 for success and <= for failure in 8.4. Affected: receive_state() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: flush drbd work queue before invalidate/invalidate remoteLars Ellenberg
If you do back to back wait-sync/invalidate on a Primary in a tight loop, during application IO load, you could trigger a race: kernel: block drbd6: FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? Fix this by changing the order of the drbd_queue_work() and the wake_up() in dec_ap_pending(), and adding the additional drbd_flush_workqueue() before requesting the full sync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: do not reset rs_pending_cnt too earlyLars Ellenberg
Fix asserts like block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 < 0 ! We reset the resync lru cache and related information (rs_pending_cnt), once we successfully finished a resync or online verify, or if the replication connection is lost. We also need to reset it if a resync or online verify is aborted because a lower level disk failed. In that case the replication link is still established, and we may still have packets queued in the network buffers which want to touch rs_pending_cnt. We do not have any synchronization mechanism to know for sure when all such pending resync related packets have been drained. To avoid this counter to go negative (and violate the ASSERT that it will always be >= 0), just do not reset it when we lose a disk. It is good enough to make sure it is re-initialized before the next resync can start: reset it when we re-attach a disk. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: report congestion if we are waiting for some userland callbackLars Ellenberg
If the drbd worker thread is synchronously waiting for some userland callback, we don't want some casual pageout to block on us. Have drbd_congested() report congestion in that case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: differentiate between normal and forced detachLars Ellenberg
Aborting local requests (not waiting for completion from the lower level disk) is dangerous: if the master bio has been completed to upper layers, data pages may be re-used for other things already. If local IO is still pending and later completes, this may cause crashes or corrupt unrelated data. Only abort local IO if explicitly requested. Intended use case is a lower level device that turned into a tarpit, not completing io requests, not even doing error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: take error path in drbd_adm_down if interrupted by signalLars Ellenberg
drbd_adm_down() does adm_detach(), which can fail with various error codes, or be interrupted by a signal. The interrupted by signal case was not properly handled, leading to block drbd0: ASSERT( mdev->state.disk == D_DISKLESS && mdev->state.conn == C_STANDALONE ) in drbd/drbd_worker.c and further to destroying objects while still in use, and resulting crashes. Detect the interruption, and take the error path out. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: remove struct drbd_tl_epoch objects (barrier works)Lars Ellenberg
cherry-picked and adapted from drbd 9 devel branch DRBD requests (struct drbd_request) are already on the per resource transfer log list, and carry their epoch number. We do not need to additionally link them on other ring lists in other structs. The drbd sender thread can recognize itself when to send a P_BARRIER, by tracking the currently processed epoch, and how many writes have been processed for that epoch. If the epoch of the request to be processed does not match the currently processed epoch, any writes have been processed in it, a P_BARRIER for this last processed epoch is send out first. The new epoch then becomes the currently processed epoch. To not get stuck in drbd_al_begin_io() waiting for P_BARRIER_ACK, the sender thread also needs to handle the case when the current epoch was closed already, but no new requests are queued yet, and send out P_BARRIER as soon as possible. This is done by comparing the per resource "current transfer log epoch" (tconn->current_tle_nr) with the per connection "currently processed epoch number" (tconn->send.current_epoch_nr), while waiting for new requests to be processed in wait_for_work(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: New disk option al-updatesPhilipp Reisner
By disabling al-updates one might increase performace. The price for that is that in case a crashed primary (that had al-updates disabled) is reintegraded, it will receive a full-resync instead of a bitmap based resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Stop using NLA_PUT*().Andreas Gruenbacher
These macros no longer exist in kernel version v3.5-rc1. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: fix race between drbdadm invalidate/verify and finishing resyncLars Ellenberg
When a resync or online verify is finished or aborted, drbd does a bulk write-out of changed bitmap pages. If *in that very moment* a new verify or resync is triggered, this can race: ASSERT( !test_bit(BITMAP_IO, &mdev->flags) ) in drbd_main.c FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? and similar. This can be observed with e.g. tight invalidate loops in test scripts, and probably has no real-life implication. Still, that race can be solved by first quiescen the device, before starting a new resync or verify. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Delay/reject other state changes while establishing a connectionPhilipp Reisner
Changes to the role and disk state should be delayed or rejected while we establish a connection. This is necessary, since the peer will base its resync decision on the UUIDs and the state we sent in the drbd_connect() function. The most prominent example for this race is becoming primary after sending state and UUIDs and before the state changes to C_WF_CONNECTION. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Fixed processing of disk-barrier, disk-flushes and disk-drainPhilipp Reisner
Since drbd_bump_write_ordering() is called in the attaching process while the disk state is D_ATTACHING, it was not considering these three flags during attach. A call to this function was missing form drbd_adm_disk_opts(). Fixed both issues. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Reinstate disabling AL updates with invalidate-remotePhilipp Reisner
Commit d0ef827e (drbd: switch configuration interface from connector to genetlink) introduced a regression by removing the ability to set all bits in the out of sync bitmap and to suspend updates to the activity log of a disconnected device via the invalidate-remote management call. Credits for reporting the issue are going to Arne Redlich. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Move write_ordering from mdev to tconnPhilipp Reisner
This is necessary in order to prepare the move of the (receiver side) epoch list from the device (mdev) to the connection (tconn) objects. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Fixes from the drbd-8.3 branchPhilipp Reisner
* drbd-8.3: drbd: O_SYNC gives EIO on ramdisks for some kernels (eg. RHEL6). drbd: send intermediate state change results to the peer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Fixes from the drbd-8.3 branchPhilipp Reisner
* drbd-8.3: drbd: fix spurious meta data IO "error" drbd: Fixed a race condition between detach and start of resync drbd: fix harmless race to not trigger an ASSERT drbd: Derive sync-UUIDs only from the bitmap-uuid if it is non-zero drbd: Fixed current UUID generation (regression introduced recently, after 8.3.11) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: fix "stalled" empty resyncPhilipp Reisner
With sync-after dependencies, given "lucky" timing of pause/unpause events, and the end of an empty (0 bits set) resync was sometimes not detected on the SyncTarget, leading to a "stalled" SyncSource state. Fixed this by expecting not only "Inconsistent -> UpToDate" but also "Consistent -> UpToDate" transitions for the peer disk state to end a resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: fix connect failure with all default net-optionsLars Ellenberg
If no net-options are configured (all on their default), no DRBD_NLA_NET_CONF will be passed to the kernel. The kernel must not require its presence, there is no required option in there. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Update some outdated comments to match the codeAndreas Gruenbacher
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Remove unused codePhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Fix the data-integrity-alg settingAndreas Gruenbacher
The last data-integrity-alg fix made data integrity checking work when the algorithm was changed for an established connection, but the common case of configuring the algorithm before connecting was still broken. Fix that. Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Allow to create devices with a minor number > minor_countAndreas Gruenbacher
The minor_count module/kernel parameter serves to scale the size of drbd's internal memory pool, but it is no longer a limit for the number of minors or the minor number. (Minor numbers can be arbitrarily high within the allowed limit of 2^20.) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: report net config even for resources without a single volumeLars Ellenberg
Currently it is legal (though unusual) to create and connect a resource, before adding in all necessary volumes. We should include the network configuration details, even if we don't have a single volume (yet). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Correctly handle resources without volumesPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Fixed removal of volumes/devices from connected resourcesPhilipp Reisner
When removing a volume/device we need to switch the connection status of the peer back into WFReportParams. Before this fix it was left in Connected state. That means that the peer device continued to inform us about state changes, etc... But we deleted that minor -> protocol error. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: on attach, enforce clean meta dataLars Ellenberg
Detection of unclean shutdown has moved into user space. The kernel code will, whenever it updates the meta data, mark it as "unclean", and will refuse to attach to such unclean meta data. "drbdadm up" now schedules "drbdmeta apply-al", which will apply the activity log to the bitmap, and/or reinitialize it, if necessary, as well as set a "clean" indicator flag. This moves a bit code out of kernel space. As a side effect, it also prevents some 8.3 module from accidentally ignoring the 8.4 style activity log, if someone should downgrade, whether on purpose, or accidentally because he changed kernel versions without providing an 8.4 for the new kernel, and the new kernel comes with in-tree 8.3. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: detach from frozen backing devicePhilipp Reisner
* drbd-8.3: documentation: Documented detach's --force and disk's --disk-timeout drbd: Implemented the disk-timeout option drbd: Force flag for the detach operation drbd: Allow new IOs while the local disk in in FAILED state drbd: Bitmap IO functions can not return prematurely if the disk breaks drbd: Added a kref to bm_aio_ctx drbd: Hold a reference to ldev while doing meta-data IO drbd: Keep a reference to the bio until the completion handler finished drbd: Implemented wait_until_done_or_disk_failure() drbd: Replaced md_io_mutex by an atomic: md_io_in_use drbd: moved md_io into mdev drbd: Immediately allow completion of IOs, that wait for IO completions on a failed disk drbd: Keep a reference to barrier acked requests Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Fixed state transitions in case reading meta data failesPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Improve compatibility with drbd's older than 8.3.7Philipp Reisner
Regression introduced with 8.3.11 commit: drbd: Take a more conservative approach when deciding max_bio_size Never ever tell an older drbd, that we support more than 32KiB in a single data request (packet). Never believe an older drbd, that is supports more than 32KiB in a single data request (packet) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Remove dead codeAndreas Gruenbacher
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Allow to pass resource options to the new-resource commandAndreas Gruenbacher
This is equivalent to how the attach and connect commands work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Convert the generic netlink interface to accept connection endpointsAndreas Gruenbacher
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Rename DRBD_ADM_NEED_{CONN -> RESOURCE}Andreas Gruenbacher
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Split off netlink mandatory attribute handling into separate fileAndreas Gruenbacher
Duplicate this file in the kernel module and in user space; both sides need it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Also need to check for DRBD_GENLA_F_MANDATORY flags before ↵Andreas Gruenbacher
nla_find_nested() This is done by introducing drbd_nla_find_nested() which handles the flag before calling nla_find_nested(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: Use the terminology suggested by the command names in the source code ↵Andreas Gruenbacher
and messages Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-08drbd: spelling fix: too smallLars Ellenberg
It is not "to small", but "too small". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>