path: root/net
AgeCommit message (Collapse)Author
2014-02-06mac80211: release the channel in error path in start_apEmmanuel Grumbach
When the driver cannot start the AP or when the assignement of the beacon goes wrong, we need to unassign the vif. Cc: stable@vger.kernel.org Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06cfg80211: send scan results from work queueJohannes Berg
Due to the previous commit, when a scan finishes, it is in theory possible to hit the following sequence: 1. interface starts being removed 2. scan is cancelled by driver and cfg80211 is notified 3. scan done work is scheduled 4. interface is removed completely, rdev->scan_req is freed, event sent to userspace but scan done work remains pending 5. new scan is requested on another virtual interface 6. scan done work runs, freeing the still-running scan To fix this situation, hang on to the scan done message and block new scans while that is the case, and only send the message from the work function, regardless of whether the scan_req is already freed from interface removal. This makes step 5 above impossible and changes step 6 to be 5. scan done work runs, sending the scan done message As this can't work for wext, so we send the message immediately, but this shouldn't be an issue since we still return -EBUSY. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06cfg80211: fix scan done raceJohannes Berg
When an interface/wdev is removed, any ongoing scan should be cancelled by the driver. This will make it call cfg80211, which only queues a work struct. If interface/wdev removal is quick enough, this can leave the scan request pending and processed only after the interface is gone, causing a use-after-free. Fix this by making sure the scan request is not pending after the interface is destroyed. We can't flush or cancel the work item due to locking concerns, but when it'll run it shouldn't find anything to do. This leaves a potential issue, if a new scan gets requested before the work runs, it prematurely stops the running scan, potentially causing another crash. I'll fix that in the next patch. This was particularly observed with P2P_DEVICE wdevs, likely because freeing them is quicker than freeing netdevs. Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Fixes: 4a58e7c38443 ("cfg80211: don't "leak" uncompleted scans") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06mac80211: avoid deadlock revealed by lockdepEmmanuel Grumbach
sdata->u.ap.request_smps_work can’t be flushed synchronously under wdev_lock(wdev) since ieee80211_request_smps_ap_work itself locks the same lock. While at it, reset the driver_smps_mode when the ap is stopped to its default: OFF. This solves: ====================================================== [ INFO: possible circular locking dependency detected ] 3.12.0-ipeer+ #2 Tainted: G O ------------------------------------------------------- rmmod/2867 is trying to acquire lock: ((&sdata->u.ap.request_smps_work)){+.+...}, at: [<c105b8d0>] flush_work+0x0/0x90 but task is already holding lock: (&wdev->mtx){+.+.+.}, at: [<f9b32626>] cfg80211_stop_ap+0x26/0x230 [cfg80211] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&wdev->mtx){+.+.+.}: [<c10aefa9>] lock_acquire+0x79/0xe0 [<c1607a1a>] mutex_lock_nested+0x4a/0x360 [<fb06288b>] ieee80211_request_smps_ap_work+0x2b/0x50 [mac80211] [<c105cdd8>] process_one_work+0x198/0x450 [<c105d469>] worker_thread+0xf9/0x320 [<c10669ff>] kthread+0x9f/0xb0 [<c1613397>] ret_from_kernel_thread+0x1b/0x28 -> #0 ((&sdata->u.ap.request_smps_work)){+.+...}: [<c10ae9df>] __lock_acquire+0x183f/0x1910 [<c10aefa9>] lock_acquire+0x79/0xe0 [<c105b917>] flush_work+0x47/0x90 [<c105d867>] __cancel_work_timer+0x67/0xe0 [<c105d90f>] cancel_work_sync+0xf/0x20 [<fb0765cc>] ieee80211_stop_ap+0x8c/0x340 [mac80211] [<f9b3268c>] cfg80211_stop_ap+0x8c/0x230 [cfg80211] [<f9b0d8f9>] cfg80211_leave+0x79/0x100 [cfg80211] [<f9b0da72>] cfg80211_netdev_notifier_call+0xf2/0x4f0 [cfg80211] [<c160f2c9>] notifier_call_chain+0x59/0x130 [<c106c6de>] __raw_notifier_call_chain+0x1e/0x30 [<c106c70f>] raw_notifier_call_chain+0x1f/0x30 [<c14f8213>] call_netdevice_notifiers_info+0x33/0x70 [<c14f8263>] call_netdevice_notifiers+0x13/0x20 [<c14f82a4>] __dev_close_many+0x34/0xb0 [<c14f83fe>] dev_close_many+0x6e/0xc0 [<c14f9c77>] rollback_registered_many+0xa7/0x1f0 [<c14f9dd4>] unregister_netdevice_many+0x14/0x60 [<fb06f4d9>] ieee80211_remove_interfaces+0xe9/0x170 [mac80211] [<fb055116>] ieee80211_unregister_hw+0x56/0x110 [mac80211] [<fa3e9396>] iwl_op_mode_mvm_stop+0x26/0xe0 [iwlmvm] [<f9b9d8ca>] _iwl_op_mode_stop+0x3a/0x70 [iwlwifi] [<f9b9d96f>] iwl_opmode_deregister+0x6f/0x90 [iwlwifi] [<fa405179>] __exit_compat+0xd/0x19 [iwlmvm] [<c10b8bf9>] SyS_delete_module+0x179/0x2b0 [<c1613421>] sysenter_do_call+0x12/0x32 Fixes: 687da132234f ("mac80211: implement SMPS for AP") Cc: <stable@vger.kernel.org> [3.13] Reported-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06cfg80211: re-enable 5/10 MHz supportJohannes Berg
Unfortunately I forgot this during the merge window, but the patch seems small enough to go in as a fix. The userspace API bug that was the reason for disabling it has long been fixed. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06nl80211: Reset split_start when netlink skb is exhaustedPontus Fuchs
When the netlink skb is exhausted split_start is left set. In the subsequent retry, with a larger buffer, the dump is continued from the failing point instead of from the beginning. This was causing my rt28xx based USB dongle to now show up when running "iw list" with an old iw version without split dump support. Cc: stable@vger.kernel.org Fixes: 3713b4e364ef ("nl80211: allow splitting wiphy information in dumps") Signed-off-by: Pontus Fuchs <pontus.fuchs@gmail.com> [avoid the entire workaround when state->split is set] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06mac80211: move roc cookie assignment earlierEliad Peller
ieee80211_start_roc_work() might add a new roc to existing roc, and tell cfg80211 it has already started. However, this might happen before the roc cookie was set, resulting in REMAIN_ON_CHANNEL (started) event with null cookie. Consequently, it can make wpa_supplicant go out of sync. Fix it by setting the roc cookie earlier. Cc: stable@vger.kernel.org Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-02-06netfilter: nf_tables: add reject module for NFPROTO_INETPatrick McHardy
Add a reject module for NFPROTO_INET. It does nothing but dispatch to the AF-specific modules based on the hook family. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-06netfilter: nft_reject: split up reject module into IPv4 and IPv6 specifc partsPatrick McHardy
Currently the nft_reject module depends on symbols from ipv6. This is wrong since no generic module should force IPv6 support to be loaded. Split up the module into AF-specific and a generic part. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-06netfilter: nf_tables: add AF specific expression supportPatrick McHardy
For the reject module, we need to add AF-specific implementations to get rid of incorrect module dependencies. Try to load an AF-specific module first and fall back to generic modules. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-06netfilter: nft_ct: fix missing NFT_CT_L3PROTOCOL key in validity checksPatrick McHardy
The key was missing in the list of valid keys, add it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-06netfilter: nf_tables: fix potential oops when dumping setsPatrick McHardy
Commit c9c8e48597 (netfilter: nf_tables: dump sets in all existing families) changed nft_ctx_init_from_setattr() to only look up the address family if it is not NFPROTO_UNSPEC. However if it is NFPROTO_UNSPEC and a table attribute is given, nftables_afinfo_lookup() will dereference the NULL afi pointer. Fix by checking for non-NULL afi and also move a check added by that commit to the proper position. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-05netfilter: nf_tables: fix overrun in nf_tables_set_alloc_name()Patrick McHardy
The map that is used to allocate anonymous sets is indeed BITS_PER_BYTE * PAGE_SIZE long. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-05netfilter: nf_conntrack: don't release a conntrack with non-zero refcntPablo Neira Ayuso
With this patch, the conntrack refcount is initially set to zero and it is bumped once it is added to any of the list, so we fulfill Eric's golden rule which is that all released objects always have a refcount that equals zero. Andrey Vagin reports that nf_conntrack_free can't be called for a conntrack with non-zero ref-counter, because it can race with nf_conntrack_find_get(). A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero ref-counter says that this conntrack is used. So when we release a conntrack with non-zero counter, we break this assumption. CPU1 CPU2 ____nf_conntrack_find() nf_ct_put() destroy_conntrack() ... init_conntrack __nf_conntrack_alloc (set use = 1) atomic_inc_not_zero(&ct->use) (use = 2) if (!l4proto->new(ct, skb, dataoff, timeouts)) nf_conntrack_free(ct); (use = 2 !!!) ... __nf_conntrack_alloc (set use = 1) if (!nf_ct_key_equal(h, tuple, zone)) nf_ct_put(ct); (use = 0) destroy_conntrack() /* continue to work with CT */ After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get" another bug was triggered in destroy_conntrack(): <4>[67096.759334] ------------[ cut here ]------------ <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211! ... <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB <4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack] <4>[67096.760255] Call Trace: <4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30 <4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack] <4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack] <4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4] <4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0 <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20 <4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110 <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20 <4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910 <4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0 <4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140 <4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140 <4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60 <4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20 <4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40 <4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190 <4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200 <4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290 <4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210 <4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5 I have reused the original title for the RFC patch that Andrey posted and most of the original patch description. Cc: Eric Dumazet <edumazet@google.com> Cc: Andrew Vagin <avagin@parallels.com> Cc: Florian Westphal <fw@strlen.de> Reported-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Andrew Vagin <avagin@parallels.com>
2014-02-05netfilter: nf_nat_h323: fix crash in nf_ct_unlink_expect_report()Alexey Dobriyan
Similar bug fixed in SIP module in 3f509c6 ("netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation"). BUG: unable to handle kernel paging request at 00100104 IP: [<f8214f07>] nf_ct_unlink_expect_report+0x57/0xf0 [nf_conntrack] ... Call Trace: [<c0244bd8>] ? del_timer+0x48/0x70 [<f8215687>] nf_ct_remove_expectations+0x47/0x60 [nf_conntrack] [<f8211c99>] nf_ct_delete_from_lists+0x59/0x90 [nf_conntrack] [<f8212e5e>] death_by_timeout+0x14e/0x1c0 [nf_conntrack] [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack] [<c024442d>] call_timer_fn+0x1d/0x80 [<c024461e>] run_timer_softirq+0x18e/0x1a0 [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack] [<c023e6f3>] __do_softirq+0xa3/0x170 [<c023e650>] ? __local_bh_enable+0x70/0x70 <IRQ> [<c023e587>] ? irq_exit+0x67/0xa0 [<c0202af6>] ? do_IRQ+0x46/0xb0 [<c027ad05>] ? clockevents_notify+0x35/0x110 [<c066ac6c>] ? common_interrupt+0x2c/0x40 [<c056e3c1>] ? cpuidle_enter_state+0x41/0xf0 [<c056e6fb>] ? cpuidle_idle_call+0x8b/0x100 [<c02085f8>] ? arch_cpu_idle+0x8/0x30 [<c027314b>] ? cpu_idle_loop+0x4b/0x140 [<c0273258>] ? cpu_startup_entry+0x18/0x20 [<c066056d>] ? rest_init+0x5d/0x70 [<c0813ac8>] ? start_kernel+0x2ec/0x2f2 [<c081364f>] ? repair_env_string+0x5b/0x5b [<c0813269>] ? i386_start_kernel+0x33/0x35 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-05netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_getAndrey Vagin
Lets look at destroy_conntrack: hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); ... nf_conntrack_free(ct) kmem_cache_free(net->ct.nf_conntrack_cachep, ct); net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU. The hash is protected by rcu, so readers look up conntracks without locks. A conntrack is removed from the hash, but in this moment a few readers still can use the conntrack. Then this conntrack is released and another thread creates conntrack with the same address and the equal tuple. After this a reader starts to validate the conntrack: * It's not dying, because a new conntrack was created * nf_ct_tuple_equal() returns true. But this conntrack is not initialized yet, so it can not be used by two threads concurrently. In this case BUG_ON may be triggered from nf_nat_setup_info(). Florian Westphal suggested to check the confirm bit too. I think it's right. task 1 task 2 task 3 nf_conntrack_find_get ____nf_conntrack_find destroy_conntrack hlist_nulls_del_rcu nf_conntrack_free kmem_cache_free __nf_conntrack_alloc kmem_cache_alloc memset(&ct->tuplehash[IP_CT_DIR_MAX], if (nf_ct_is_dying(ct)) if (!nf_ct_tuple_equal() I'm not sure, that I have ever seen this race condition in a real life. Currently we are investigating a bug, which is reproduced on a few nodes. In our case one conntrack is initialized from a few tasks concurrently, we don't have any other explanation for this. <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322! ... <4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>] [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat] ... <4>[46267.085549] Call Trace: <4>[46267.085622] [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat] <4>[46267.085697] [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat] <4>[46267.085770] [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat] <4>[46267.085843] [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat] <4>[46267.085919] [<ffffffff814841b9>] nf_iterate+0x69/0xb0 <4>[46267.085991] [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0 <4>[46267.086063] [<ffffffff81484374>] nf_hook_slow+0x74/0x110 <4>[46267.086133] [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0 <4>[46267.086207] [<ffffffff814b5890>] ? dst_output+0x0/0x20 <4>[46267.086277] [<ffffffff81495204>] ip_output+0xa4/0xc0 <4>[46267.086346] [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910 <4>[46267.086419] [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0 <4>[46267.086491] [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50 <4>[46267.086562] [<ffffffff81444d67>] sock_sendmsg+0x117/0x140 <4>[46267.086638] [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20 <4>[46267.086712] [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40 <4>[46267.086785] [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80 <4>[46267.086858] [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20 <4>[46267.086936] [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90 <4>[46267.087006] [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90 <4>[46267.087081] [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0 <4>[46267.087151] [<ffffffff81445599>] sys_sendto+0x139/0x190 <4>[46267.087229] [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0 <4>[46267.087303] [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200 <4>[46267.087378] [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290 <4>[46267.087454] [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210 <4>[46267.087531] [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210 <4>[46267.087607] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5 <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74 <1>[46267.088023] RIP [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Florian Westphal <fw@strlen.de> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-05netfilter: nf_tables: fix oops when deleting a chain with referencesPatrick McHardy
The following commands trigger an oops: # nft -i nft> add table filter nft> add chain filter input { type filter hook input priority 0; } nft> add chain filter test nft> add rule filter input jump test nft> delete chain filter test We need to check the chain use counter before allowing destruction since we might have references from sets or jump rules. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=69341 Reported-by: Matthew Ife <deleriux1@gmail.com> Tested-by: Matthew Ife <deleriux1@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-05netfilter: nft_ct: fix unconditional dump of 'dir' attrArturo Borrero
We want to make sure that the information that we get from the kernel can be reinjected without troubles. The kernel shouldn't return an attribute that is not required, or even prohibited. Dumping unconditionally NFTA_CT_DIRECTION could lead an application in userspace to interpret that the attribute was originally set, while it was not. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-04openvswitch: Suppress error messages on megaflow updatesAndy Zhou
With subfacets, we'd expect megaflow updates message to carry the original micro flow. If not, EINVAL is returned and kernel logs an error message. Now that the user space subfacet layer is removed, it is expected that flow updates can arrive with a micro flow other than the original. Change the return code to EEXIST and remove the kernel error log message. Reported-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-02-04openvswitch: Fix ovs_flow_free() ovs-lock assert.Pravin B Shelar
ovs_flow_free() is not called under ovs-lock during packet execute path (ovs_packet_cmd_execute()). Since packet execute does not touch flow->mask, there is no need to take that lock either. So move assert in case where flow->mask is checked. Found by code inspection. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-02-04openvswitch: Fix ovs_dp_cmd_msg_size()Daniele Di Proietto
commit 43d4be9cb55f3bac5253e9289996fd9d735531db (openvswitch: Allow user space to announce ability to accept unaligned Netlink messages) introduced OVS_DP_ATTR_USER_FEATURES netlink attribute in datapath responses, but the attribute size was not taken into account in ovs_dp_cmd_msg_size(). Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-02-04openvswitch: Fix kernel panic on ovs_flow_freeAndy Zhou
Both mega flow mask's reference counter and per flow table mask list should only be accessed when holding ovs_mutex() lock. However this is not true with ovs_flow_table_flush(). The patch fixes this bug. Reported-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-02-04openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performedThomas Graf
While the zerocopy method is correctly omitted if user space does not support unaligned Netlink messages. The attribute is still not padded correctly as skb_zerocopy() will not ensure padding and the attribute size is no longer pre calculated though nla_reserve() which ensured padding previously. This patch applies appropriate padding if a linear data copy was performed in skb_zerocopy(). Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-02-04rtnetlink: fix oops in rtnl_link_get_slave_info_data_sizeFernando Luis Vazquez Cao
We should check whether rtnetlink link operations are defined before calling get_slave_size(). Without this, the following oops can occur when adding a tap device to OVS. [ 87.839553] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 87.839595] IP: [<ffffffff813d47c0>] if_nlmsg_size+0xf0/0x220 [...] [ 87.840651] Call Trace: [ 87.840664] [<ffffffff813d694b>] ? rtmsg_ifinfo+0x2b/0x100 [ 87.840688] [<ffffffff813c8340>] ? __netdev_adjacent_dev_insert+0x150/0x1a0 [ 87.840718] [<ffffffff813d6a50>] ? rtnetlink_event+0x30/0x40 [ 87.840742] [<ffffffff814b4144>] ? notifier_call_chain+0x44/0x70 [ 87.840768] [<ffffffff813c8946>] ? __netdev_upper_dev_link+0x3c6/0x3f0 [ 87.840798] [<ffffffffa0678d6c>] ? netdev_create+0xcc/0x160 [openvswitch] [ 87.840828] [<ffffffffa06781ea>] ? ovs_vport_add+0x4a/0xd0 [openvswitch] [ 87.840857] [<ffffffffa0670139>] ? new_vport+0x9/0x50 [openvswitch] [ 87.840884] [<ffffffffa067279e>] ? ovs_vport_cmd_new+0x11e/0x210 [openvswitch] [ 87.840915] [<ffffffff813f3efa>] ? genl_family_rcv_msg+0x19a/0x360 [ 87.840941] [<ffffffff813f40c0>] ? genl_family_rcv_msg+0x360/0x360 [ 87.840967] [<ffffffff813f4139>] ? genl_rcv_msg+0x79/0xc0 [ 87.840991] [<ffffffff813b6cf9>] ? __kmalloc_reserve.isra.25+0x29/0x80 [ 87.841018] [<ffffffff813f2389>] ? netlink_rcv_skb+0xa9/0xc0 [ 87.841042] [<ffffffff813f27cf>] ? genl_rcv+0x1f/0x30 [ 87.841064] [<ffffffff813f1988>] ? netlink_unicast+0xe8/0x1e0 [ 87.841088] [<ffffffff813f1d9a>] ? netlink_sendmsg+0x31a/0x750 [ 87.841113] [<ffffffff813aee96>] ? sock_sendmsg+0x86/0xc0 [ 87.841136] [<ffffffff813c960d>] ? __netdev_update_features+0x4d/0x200 [ 87.841163] [<ffffffff813ca94e>] ? ethtool_get_value+0x2e/0x50 [ 87.841188] [<ffffffff813af269>] ? ___sys_sendmsg+0x359/0x370 [ 87.841212] [<ffffffff813da686>] ? dev_ioctl+0x1a6/0x5c0 [ 87.841236] [<ffffffff8109c210>] ? autoremove_wake_function+0x30/0x30 [ 87.841264] [<ffffffff813ac59d>] ? sock_do_ioctl+0x3d/0x50 [ 87.841288] [<ffffffff813aca68>] ? sock_ioctl+0x1e8/0x2c0 [ 87.841312] [<ffffffff811934bf>] ? do_vfs_ioctl+0x2cf/0x4b0 [ 87.841335] [<ffffffff813afeb9>] ? __sys_sendmsg+0x39/0x70 [ 87.841362] [<ffffffff814b86f9>] ? system_call_fastpath+0x16/0x1b [ 87.841386] Code: c0 74 10 48 89 ef ff d0 83 c0 07 83 e0 fc 48 98 49 01 c7 48 89 ef e8 d0 d6 fe ff 48 85 c0 0f 84 df 00 00 00 48 8b 90 08 07 00 00 <48> 8b 8a a8 00 00 00 31 d2 48 85 c9 74 0c 48 89 ee 48 89 c7 ff [ 87.841529] RIP [<ffffffff813d47c0>] if_nlmsg_size+0xf0/0x220 [ 87.841555] RSP <ffff880221aa5950> [ 87.841569] CR2: 00000000000000a8 [ 87.851442] ---[ end trace e42ab217691b4fc2 ]--- Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-04net/ipv4: Use proper RCU APIs for writer-side in udp_offload.cShlomo Pongratz
RCU writer side should use rcu_dereference_protected() and not rcu_dereference(), fix that. This also removes the "suspicious RCU usage" warning seen when running with CONFIG_PROVE_RCU. Also, don't use rcu_assign_pointer/rcu_dereference for pointers which are invisible beyond the udp offload code. Fixes: b582ef0 ('net: Add GRO support for UDP encapsulating protocols') Reported-by: Eric Dumazet <edumazet@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-04ipvs: fix AF assignment in ip_vs_conn_new()Michal Kubecek
If a fwmark is passed to ip_vs_conn_new(), it is passed in vaddr, not daddr. Therefore we should set AF to AF_UNSPEC in vaddr assignment (like we do in ip_vs_ct_in_get()), otherwise we may copy only first 4 bytes of an IPv6 address into cp->daddr. Signed-off-by: Bogdano Arendartchuk <barendartchuk@suse.com> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-02-03ip_tunnel: fix panic in ip_tunnel_xmit()Eric Dumazet
Setting rt variable to NULL at the beginning of ip_tunnel_xmit() missed possible use of this variable as a scratch value. Also fixes a possible dst leak in tunnel_dst_check() : If we had to call tunnel_dst_reset(), we forgot to release the reference on dst. Merges tunnel_dst_get()/tunnel_dst_check() into a single tunnel_rtable_get() function for clarity. Many thanks to Tommi for his report and tests. Fixes: 7d442fab0a67 ("ipv4: Cache dst in tunnels") Reported-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Tommi Rantala <tt.rantala@gmail.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-03libceph: fix error handling in ceph_osdc_init()Ilya Dryomov
msgpool_op_reply message pool isn't destroyed if workqueue construction fails. Fix it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-31Merge tag 'nfs-for-3.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights: - Fix several races in nfs_revalidate_mapping - NFSv4.1 slot leakage in the pNFS files driver - Stable fix for a slot leak in nfs40_sequence_done - Don't reject NFSv4 servers that support ACLs with only ALLOW aces" * tag 'nfs-for-3.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: initialize the ACL support bits to zero. NFSv4.1: Cleanup NFSv4.1: Clean up nfs41_sequence_done NFSv4: Fix a slot leak in nfs40_sequence_done NFSv4.1 free slot before resending I/O to MDS nfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATING NFS: Fix races in nfs_revalidate_mapping sunrpc: turn warn_gssd() log message into a dprintk() NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping nfs: handle servers that support only ALLOW ACE type.
2014-01-30x86, x32: Correct invalid use of user timespec in the kernelPaX Team
The x32 case for the recvmsg() timout handling is broken: asmlinkage long compat_sys_recvmmsg(int fd, struct compat_mmsghdr __user *mmsg, unsigned int vlen, unsigned int flags, struct compat_timespec __user *timeout) { int datagrams; struct timespec ktspec; if (flags & MSG_CMSG_COMPAT) return -EINVAL; if (COMPAT_USE_64BIT_TIME) return __sys_recvmmsg(fd, (struct mmsghdr __user *)mmsg, vlen, flags | MSG_CMSG_COMPAT, (struct timespec *) timeout); ... The timeout pointer parameter is provided by userland (hence the __user annotation) but for x32 syscalls it's simply cast to a kernel pointer and is passed to __sys_recvmmsg which will eventually directly dereference it for both reading and writing. Other callers to __sys_recvmmsg properly copy from userland to the kernel first. The bug was introduced by commit ee4fa23c4bfc ("compat: Use COMPAT_USE_64BIT_TIME in net/compat.c") and should affect all kernels since 3.4 (and perhaps vendor kernels if they backported x32 support along with this code). Note that CONFIG_X86_X32_ABI gets enabled at build time and only if CONFIG_X86_X32 is enabled and ld can build x32 executables. Other uses of COMPAT_USE_64BIT_TIME seem fine. This addresses CVE-2014-0038. Signed-off-by: PaX Team <pageexec@freemail.hu> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.4+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-30Merge tag 'linux-can-fixes-for-3.14-20140129' of ↵David S. Miller
git://gitorious.org/linux-can/linux-can linux-can-fixes-for-3.14-20140129 Marc Kleine-Budde says: ==================== Arnd Bergmann provides a fix for the flexcan driver, enabling compilation on all combinations of big and little endian on ARM and PowerPc. A patch by Ira W. Snyder fixes uninitialized variable warnings in the janz-ican3 driver. Rostislav Lisovy contributes a patch to propagate the SO_PRIORITY of raw sockets to skbs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-30can: add destructor for self generated skbsOliver Hartkopp
Self generated skbuffs in net/can/bcm.c are setting a skb->sk reference but no explicit destructor which is enforced since Linux 3.11 with commit 376c7311bdb6 (net: add a temporary sanity check in skb_orphan()). This patch adds some helper functions to make sure that a destructor is properly defined when a sock reference is assigned to a CAN related skb. To create an unshared skb owned by the original sock a common helper function has been introduced to replace open coded functions to create CAN echo skbs. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Andre Naujoks <nautsch2@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-30net/ipv4: Use non-atomic allocation of udp offloads structure instanceOr Gerlitz
Since udp_add_offload() can be called from non-sleepable context e.g under this call tree from the vxlan driver use case: vxlan_socket_create() <-- holds the spinlock -> vxlan_notify_add_rx_port() -> udp_add_offload() <-- schedules we should allocate the udp_offloads structure in atomic manner. Fixes: b582ef0 ('net: Add GRO support for UDP encapsulating protocols') Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-30Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block IO changes from Jens Axboe: "The major piece in here is the immutable bio_ve series from Kent, the rest is fairly minor. It was supposed to go in last round, but various issues pushed it to this release instead. The pull request contains: - Various smaller blk-mq fixes from different folks. Nothing major here, just minor fixes and cleanups. - Fix for a memory leak in the error path in the block ioctl code from Christian Engelmayer. - Header export fix from CaiZhiyong. - Finally the immutable biovec changes from Kent Overstreet. This enables some nice future work on making arbitrarily sized bios possible, and splitting more efficient. Related fixes to immutable bio_vecs: - dm-cache immutable fixup from Mike Snitzer. - btrfs immutable fixup from Muthu Kumar. - bio-integrity fix from Nic Bellinger, which is also going to stable" * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits) xtensa: fixup simdisk driver to work with immutable bio_vecs block/blk-mq-cpu.c: use hotcpu_notifier() blk-mq: for_each_* macro correctness block: Fix memory leak in rw_copy_check_uvector() handling bio-integrity: Fix bio_integrity_verify segment start bug block: remove unrelated header files and export symbol blk-mq: uses page->list incorrectly blk-mq: use __smp_call_function_single directly btrfs: fix missing increment of bi_remaining Revert "block: Warn and free bio if bi_end_io is not set" block: Warn and free bio if bi_end_io is not set blk-mq: fix initializing request's start time block: blk-mq: don't export blk_mq_free_queue() block: blk-mq: make blk_sync_queue support mq block: blk-mq: support draining mq queue dm cache: increment bi_remaining when bi_end_io is restored block: fixup for generic bio chaining block: Really silence spurious compiler warnings block: Silence spurious compiler warnings block: Kill bio_pair_split() ...
2014-01-30Merge branch 'for-3.14' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: - Handle some loose ends from the vfs read delegation support. (For example nfsd can stop breaking leases on its own in a fewer places where it can now depend on the vfs to.) - Make life a little easier for NFSv4-only configurations (thanks to Kinglong Mee). - Fix some gss-proxy problems (thanks Jeff Layton). - miscellaneous bug fixes and cleanup * 'for-3.14' of git://linux-nfs.org/~bfields/linux: (38 commits) nfsd: consider CLAIM_FH when handing out delegation nfsd4: fix delegation-unlink/rename race nfsd4: delay setting current_fh in open nfsd4: minor nfs4_setlease cleanup gss_krb5: use lcm from kernel lib nfsd4: decrease nfsd4_encode_fattr stack usage nfsd: fix encode_entryplus_baggage stack usage nfsd4: simplify xdr encoding of nfsv4 names nfsd4: encode_rdattr_error cleanup nfsd4: nfsd4_encode_fattr cleanup minor svcauth_gss.c cleanup nfsd4: better VERIFY comment nfsd4: break only delegations when appropriate NFSD: Fix a memory leak in nfsd4_create_session sunrpc: get rid of use_gssp_lock sunrpc: fix potential race between setting use_gss_proxy and the upcall rpc_clnt sunrpc: don't wait for write before allowing reads from use-gss-proxy file nfsd: get rid of unused function definition Define op_iattr for nfsd4_open instead using macro NFSD: fix compile warning without CONFIG_NFSD_V3 ...
2014-01-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Several fixups, of note: 1) Fix unlock of not held spinlock in RXRPC code, from Alexey Khoroshilov. 2) Call pci_disable_device() from the correct shutdown path in bnx2x driver, from Yuval Mintz. 3) Fix qeth build on s390 for some configurations, from Eugene Crosser. 4) Cure locking bugs in bond_loadbalance_arp_mon(), from Ding Tianhong. 5) Must do netif_napi_add() before registering netdevice in sky2 driver, from Stanislaw Gruszka. 6) Fix lost bug fix during merge due to code movement in ieee802154, noticed and fixed by the eagle eyed Stephen Rothwell. 7) Get rid of resource leak in xen-netfront driver, from Annie Li. 8) Bounds checks in qlcnic driver are off by one, from Manish Chopra. 9) TPROXY can leak sockets when TCP early demux is enabled, fix from Holger Eitzenberger" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (32 commits) qeth: fix build of s390 allmodconfig bonding: fix locking in bond_loadbalance_arp_mon() tun: add device name(iff) field to proc fdinfo entry DT: net: davinci_emac: "ti, davinci-no-bd-ram" property is actually optional DT: net: davinci_emac: "ti, davinci-rmii-en" property is actually optional bnx2x: Fix generic option settings net: Fix warning on make htmldocs caused by skbuff.c llc: remove noisy WARN from llc_mac_hdr_init qlcnic: Fix loopback test failure qlcnic: Fix tx timeout. qlcnic: Fix initialization of vlan list. qlcnic: Correct off-by-one errors in bounds checks net: Document promote_secondaries net: gre: use icmp_hdr() to get inner ip header i40e: Add missing braces to i40e_dcb_need_reconfig() xen-netfront: fix resource leak in netfront net: 6lowpan: fixup for code movement hyperv: Add support for physically discontinuous receive buffer sky2: initialize napi before registering device net: Fix memory leak if TPROXY used with TCP early demux ...
2014-01-29can: Propagate SO_PRIORITY of raw sockets to skbsRostislav Lisovy
This allows controlling certain queueing disciplines by setting the socket's SO_PRIORITY option. For example, with the default pfifo_fast queueing discipline, which provides three priorities, socket priority TC_PRIO_CONTROL means higher than default and TC_PRIO_BULK means lower than default. Signed-off-by: Rostislav Lisovy <lisovy@gmail.com> Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2014-01-28net: Fix warning on make htmldocs caused by skbuff.cMasanari Iida
This patch fixed following Warning while executing "make htmldocs". Warning(/net/core/skbuff.c:2164): No description found for parameter 'from' Warning(/net/core/skbuff.c:2164): Excess function parameter 'source' description in 'skb_zerocopy' Replace "@source" with "@from" fixed the warning. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-28Merge tag 'rxrpc-20140126' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== RxRPC fixes Here are some small AF_RXRPC fixes. (1) Fix a place where a spinlock is taken conditionally but is released unconditionally. (2) Fix a double-free that happens when cleaning up on a checksum error. (3) Fix handling of CHECKSUM_PARTIAL whilst delivering messages to userspace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-28llc: remove noisy WARN from llc_mac_hdr_initDave Jones
Sending malformed llc packets triggers this spew, which seems excessive. WARNING: CPU: 1 PID: 6917 at net/llc/llc_output.c:46 llc_mac_hdr_init+0x85/0x90 [llc]() device type not supported: 0 CPU: 1 PID: 6917 Comm: trinity-c1 Not tainted 3.13.0+ #95 0000000000000009 00000000007e257d ffff88009232fbe8 ffffffffac737325 ffff88009232fc30 ffff88009232fc20 ffffffffac06d28d ffff88020e07f180 ffff88009232fec0 00000000000000c8 0000000000000000 ffff88009232fe70 Call Trace: [<ffffffffac737325>] dump_stack+0x4e/0x7a [<ffffffffac06d28d>] warn_slowpath_common+0x7d/0xa0 [<ffffffffac06d30c>] warn_slowpath_fmt+0x5c/0x80 [<ffffffffc01736d5>] llc_mac_hdr_init+0x85/0x90 [llc] [<ffffffffc0173759>] llc_build_and_send_ui_pkt+0x79/0x90 [llc] [<ffffffffc057cdba>] llc_ui_sendmsg+0x23a/0x400 [llc2] [<ffffffffac605d8c>] sock_sendmsg+0x9c/0xe0 [<ffffffffac185a37>] ? might_fault+0x47/0x50 [<ffffffffac606321>] SYSC_sendto+0x121/0x1c0 [<ffffffffac011847>] ? syscall_trace_enter+0x207/0x270 [<ffffffffac6071ce>] SyS_sendto+0xe/0x10 [<ffffffffac74aaa4>] tracesys+0xdd/0xe2 Until 2009, this was a printk, when it was changed in bf9ae5386bc: "llc: use dev_hard_header". Let userland figure out what -EINVAL means by itself. Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "This is a big batch. From Ilya we have: - rbd support for more than ~250 mapped devices (now uses same scheme that SCSI does for device major/minor numbering) - crush updates for new mapping behaviors (will be needed for coming erasure coding support, among other things) - preliminary support for tiered storage pools There is also a big series fixing a pile cephfs bugs with clustered MDSs from Yan Zheng, ACL support for cephfs from Guangliang Zhao, ceph fscache improvements from Li Wang, improved behavior when we get ENOSPC from Josh Durgin, some readv/writev improvements from Majianpeng, and the usual mix of small cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (76 commits) ceph: cast PAGE_SIZE to size_t in ceph_sync_write() ceph: fix dout() compile warnings in ceph_filemap_fault() libceph: support CEPH_FEATURE_OSD_CACHEPOOL feature libceph: follow redirect replies from osds libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} libceph: follow {read,write}_tier fields on osd request submission libceph: add ceph_pg_pool_by_id() libceph: CEPH_OSD_FLAG_* enum update libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg() libceph: introduce and start using oid abstraction libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN libceph: move ceph_file_layout helpers to ceph_fs.h libceph: start using oloc abstraction libceph: dout() is missing a newline libceph: add ceph_kv{malloc,free}() and switch to them libceph: support CEPH_FEATURE_EXPORT_PEER ceph: add imported caps when handling cap export message ceph: add open export target session helper ceph: remove exported caps when handling cap import message ceph: handle session flush message ...
2014-01-28Merge tag 'nfs-for-3.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: - stable fix for an infinite loop in RPC state machine - stable fix for a use after free situation in the NFSv4 trunking discovery - stable fix for error handling in the NFSv4 trunking discovery - stable fix for the page write update code - stable fix for the NFSv4.1 mount time security negotiation - stable fix for the NFSv4 open code. - O_DIRECT locking fixes - fix an Oops in the pnfs file commit code - RPC layer needs finer grained handling of connection errors - more RPC GSS upcall fixes" * tag 'nfs-for-3.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (30 commits) pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_done pnfs: fix BUG in filelayout_recover_commit_reqs nfs4: fix discover_server_trunking use after free NFSv4.1: Handle errors correctly in nfs41_walk_client_list nfs: always make sure page is up-to-date before extending a write to cover the entire page nfs: page cache invalidation for dio nfs: take i_mutex during direct I/O reads nfs: merge nfs_direct_write into nfs_file_direct_write nfs: merge nfs_direct_read into nfs_file_direct_read nfs: increment i_dio_count for reads, too nfs: defer inode_dio_done call until size update is done nfs: fix size updates for aio writes nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME NFSv4.1: Fix a race in nfs4_write_inode NFSv4.1: Don't trust attributes if a pNFS LAYOUTCOMMIT is outstanding point to the right include file in a comment (left over from a9004abc3) NFS: dprintk() should not print negative fileids and inode numbers nfs: fix dead code of ipv6_addr_scope sunrpc: Fix infinite loop in RPC state machine SUNRPC: Add tracepoint for socket errors ...
2014-01-27net: gre: use icmp_hdr() to get inner ip headerDuan Jiong
When dealing with icmp messages, the skb->data points the ip header that triggered the sending of the icmp message. In gre_cisco_err(), the parse_gre_header() is called, and the iptunnel_pull_header() is called to pull the skb at the end of the parse_gre_header(), so the skb->data doesn't point the inner ip header. Unfortunately, the ipgre_err still needs those ip addresses in inner ip header to look up tunnel by ip_tunnel_lookup(). So just use icmp_hdr() to get inner ip header instead of skb->data. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-27net: 6lowpan: fixup for code movementStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-27net: Fix memory leak if TPROXY used with TCP early demuxHolger Eitzenberger
I see a memory leak when using a transparent HTTP proxy using TPROXY together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable): unreferenced object 0xffff88008cba4a40 (size 1696): comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s) hex dump (first 32 bytes): 0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00 .. j@..7..2..... 02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9 [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5 [<ffffffff812702cf>] sk_clone_lock+0x14/0x283 [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b [<ffffffff8129a893>] netlink_broadcast+0x14/0x16 [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3 [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0 [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55 [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725 [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154 [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514 [<ffffffff8127aa77>] process_backlog+0xee/0x1c5 [<ffffffff8127c949>] net_rx_action+0xa7/0x200 [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157 But there are many more, resulting in the machine going OOM after some days. From looking at the TPROXY code, and with help from Florian, I see that the memory leak is introduced in tcp_v4_early_demux(): void tcp_v4_early_demux(struct sk_buff *skb) { /* ... */ iph = ip_hdr(skb); th = tcp_hdr(skb); if (th->doff < sizeof(struct tcphdr) / 4) return; sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo, iph->saddr, th->source, iph->daddr, ntohs(th->dest), skb->skb_iif); if (sk) { skb->sk = sk; where the socket is assigned unconditionally to skb->sk, also bumping the refcnt on it. This is problematic, because in our case the skb has already a socket assigned in the TPROXY target. This then results in the leak I see. The very same issue seems to be with IPv6, but haven't tested. Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-27libceph: follow redirect replies from osdsIlya Dryomov
Follow redirect replies from osds, for details see ceph.git commit fbbe3ad1220799b7bb00ea30fce581c5eadaf034. v1 (current) version of redirect reply consists of oloc and oid, which expands to pool, key, nspace, hash and oid. However, server-side code that would populate anything other than pool doesn't exist yet, and hence this commit adds support for pool redirects only. To make sure that future server-side updates don't break us, we decode all fields and, if any of key, nspace, hash or oid have a non-default value, error out with "corrupt osd_op_reply ..." message. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-27libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}Ilya Dryomov
Rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} before introducing r_target_{oloc,oid} needed for redirects. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-27libceph: follow {read,write}_tier fields on osd request submissionIlya Dryomov
Overwrite ceph_osd_request::r_oloc.pool with read_tier for read ops and write_tier for write and read+write ops (aka basic tiering support). {read,write}_tier are part of pg_pool_t since v9. This commit bumps our pg_pool_t decode compat version from v7 to v9, all new fields except for {read,write}_tier are ignored. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-27libceph: add ceph_pg_pool_by_id()Ilya Dryomov
"Lookup pool info by ID" function is hidden in osdmap.c. Expose it to the rest of libceph. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-27libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()Ilya Dryomov
Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename it to ceph_oloc_oid_to_pg() to make its purpose more clear. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>