path: root/net
AgeCommit message (Collapse)Author
2018-09-17net/ipv6: do not copy dst flags on rt initPeter Oskolkov
DST_NOCOUNT in dst_entry::flags tracks whether the entry counts toward route cache size (net->ipv6.sysctl.ip6_rt_max_size). If the flag is NOT set, dst_ops::pcpuc_entries counter is incremented in dist_init() and decremented in dst_destroy(). This flag is tied to allocation/deallocation of dst_entry and should not be copied from another dst/route. Otherwise it can happen that dst_ops::pcpuc_entries counter grows until no new routes can be allocated because the counter reached ip6_rt_max_size due to DST_NOCOUNT not set and thus no counter decrements on gc-ed routes. Fixes: 3b6761d18bc1 ("net/ipv6: Move dst flags to booleans in fib entries") Cc: David Ahern <dsahern@gmail.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17Revert "kcm: remove any offset before parsing messages"David S. Miller
This reverts commit 072222b488bc55cce92ff246bdc10115fd57d3ab. I just read that this causes regressions. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17kcm: remove any offset before parsing messagesDominique Martinet
The current code assumes kcm users know they need to look for the strparser offset within their bpf program, which is not documented anywhere and examples laying around do not do. The actual recv function does handle the offset well, so we can create a temporary clone of the skb and pull that one up as required for parsing. The pull itself has a cost if we are pulling beyond the head data, measured to 2-3% latency in a noisy VM with a local client stressing that path. The clone's impact seemed too small to measure. This bug can be exhibited easily by implementing a "trivial" kcm parser taking the first bytes as size, and on the client sending at least two such packets in a single write(). Note that bpf sockmap has the same problem, both for parse and for recv, so it would pulling twice or a real pull within the strparser logic if anyone cares about that. Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17tls: fix currently broken MSG_PEEK behaviorDaniel Borkmann
In kTLS MSG_PEEK behavior is currently failing, strace example: [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3 [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4 [pid 2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("")}, 16) = 0 [pid 2430] listen(4, 10) = 0 [pid 2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("")}, [16]) = 0 [pid 2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("")}, 16) = 0 [pid 2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("")}, [16]) = 5 [pid 2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2430] close(4) = 0 [pid 2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14 [pid 2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11 [pid 2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64 As can be seen from strace, there are two TLS records sent, i) 'test_read_peek' and ii) '_mult_recs\0' where we end up peeking 'test_read_peektest_read_peektest'. This is clearly wrong, and what happens is that given peek cannot call into tls_sw_advance_skb() to unpause strparser and proceed with the next skb, we end up looping over the current one, copying the 'test_read_peek' over and over into the user provided buffer. Here, we can only peek into the currently held skb (current, full TLS record) as otherwise we would end up having to hold all the original skb(s) (depending on the peek depth) in a separate queue when unpausing strparser to process next records, minimally intrusive is to return only up to the current record's size (which likely was what c46234ebb4d1 ("tls: RX path for ktls") originally intended as well). Thus, after patch we properly peek the first record: [pid 2046] wait4(2075, <unfinished ...> [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3 [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4 [pid 2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("")}, 16) = 0 [pid 2075] listen(4, 10) = 0 [pid 2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("")}, [16]) = 0 [pid 2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("")}, 16) = 0 [pid 2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("")}, [16]) = 5 [pid 2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2075] close(4) = 0 [pid 2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14 [pid 2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11 [pid 2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14 Fixes: c46234ebb4d1 ("tls: RX path for ktls") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17ipv6: fix possible use-after-free in ip6_xmit()Eric Dumazet
In the unlikely case ip6_xmit() has to call skb_realloc_headroom(), we need to call skb_set_owner_w() before consuming original skb, otherwise we risk a use-after-free. Bring IPv6 in line with what we do in IPv4 to fix this. Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-09-16 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix end boundary calculation in BTF for the type section, from Martin. 2) Fix and revert subtraction of pointers that was accidentally allowed for unprivileged programs, from Alexei. 3) Fix bpf_msg_pull_data() helper by using __GFP_COMP in order to avoid a warning in linearizing sg pages into a single one for large allocs, from Tushar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16udp6: add missing checks on edumux packet processingPaolo Abeni
Currently the UDPv6 early demux rx code path lacks some mandatory checks, already implemented into the normal RX code path - namely the checksum conversion and no_check6_rx check. Similar to the previous commit, we move the common processing to an UDPv6 specific helper and call it from both edemux code path and normal code path. In respect to the UDPv4, we need to add an explicit check for non zero csum according to no_check6_rx value. Reported-by: Jianlin Shi <jishi@redhat.com> Suggested-by: Xin Long <lucien.xin@gmail.com> Fixes: c9f2c1ae123a ("udp6: fix socket leak on early demux") Fixes: 2abb7cdc0dc8 ("udp: Add support for doing checksum unnecessary conversion") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16udp4: fix IP_CMSG_CHECKSUM for connected socketsPaolo Abeni
commit 2abb7cdc0dc8 ("udp: Add support for doing checksum unnecessary conversion") left out the early demux path for connected sockets. As a result IP_CMSG_CHECKSUM gives wrong values for such socket when GRO is not enabled/available. This change addresses the issue by moving the csum conversion to a common helper and using such helper in both the default and the early demux rx path. Fixes: 2abb7cdc0dc8 ("udp: Add support for doing checksum unnecessary conversion") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-14net/sched: act_sample: fix NULL dereference in the data pathDavide Caratti
Matteo reported the following splat, testing the datapath of TC 'sample': BUG: KASAN: null-ptr-deref in tcf_sample_act+0xc4/0x310 Read of size 8 at addr 0000000000000000 by task nc/433 CPU: 0 PID: 433 Comm: nc Not tainted 4.19.0-rc3-kvm #17 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014 Call Trace: kasan_report.cold.6+0x6c/0x2fa tcf_sample_act+0xc4/0x310 ? dev_hard_start_xmit+0x117/0x180 tcf_action_exec+0xa3/0x160 tcf_classify+0xdd/0x1d0 htb_enqueue+0x18e/0x6b0 ? deref_stack_reg+0x7a/0xb0 ? htb_delete+0x4b0/0x4b0 ? unwind_next_frame+0x819/0x8f0 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 __dev_queue_xmit+0x722/0xca0 ? unwind_get_return_address_ptr+0x50/0x50 ? netdev_pick_tx+0xe0/0xe0 ? save_stack+0x8c/0xb0 ? kasan_kmalloc+0xbe/0xd0 ? __kmalloc_track_caller+0xe4/0x1c0 ? __kmalloc_reserve.isra.45+0x24/0x70 ? __alloc_skb+0xdd/0x2e0 ? sk_stream_alloc_skb+0x91/0x3b0 ? tcp_sendmsg_locked+0x71b/0x15a0 ? tcp_sendmsg+0x22/0x40 ? __sys_sendto+0x1b0/0x250 ? __x64_sys_sendto+0x6f/0x80 ? do_syscall_64+0x5d/0x150 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 ? __sys_sendto+0x1b0/0x250 ? __x64_sys_sendto+0x6f/0x80 ? do_syscall_64+0x5d/0x150 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 ip_finish_output2+0x495/0x590 ? ip_copy_metadata+0x2e0/0x2e0 ? skb_gso_validate_network_len+0x6f/0x110 ? ip_finish_output+0x174/0x280 __tcp_transmit_skb+0xb17/0x12b0 ? __tcp_select_window+0x380/0x380 tcp_write_xmit+0x913/0x1de0 ? __sk_mem_schedule+0x50/0x80 tcp_sendmsg_locked+0x49d/0x15a0 ? tcp_rcv_established+0x8da/0xa30 ? tcp_set_state+0x220/0x220 ? clear_user+0x1f/0x50 ? iov_iter_zero+0x1ae/0x590 ? __fget_light+0xa0/0xe0 tcp_sendmsg+0x22/0x40 __sys_sendto+0x1b0/0x250 ? __ia32_sys_getpeername+0x40/0x40 ? _copy_to_user+0x58/0x70 ? poll_select_copy_remaining+0x176/0x200 ? __pollwait+0x1c0/0x1c0 ? ktime_get_ts64+0x11f/0x140 ? kern_select+0x108/0x150 ? core_sys_select+0x360/0x360 ? vfs_read+0x127/0x150 ? kernel_write+0x90/0x90 __x64_sys_sendto+0x6f/0x80 do_syscall_64+0x5d/0x150 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fefef2b129d Code: ff ff ff ff eb b6 0f 1f 80 00 00 00 00 48 8d 05 51 37 0c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41 RSP: 002b:00007fff2f5350c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 000056118d60c120 RCX: 00007fefef2b129d RDX: 0000000000002000 RSI: 000056118d629320 RDI: 0000000000000003 RBP: 000056118d530370 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000 R13: 000056118d5c2a10 R14: 000056118d5c2a10 R15: 000056118d5303b8 tcf_sample_act() tried to update its per-cpu stats, but tcf_sample_init() forgot to allocate them, because tcf_idr_create() was called with a wrong value of 'cpustats'. Setting it to true proved to fix the reported crash. Reported-by: Matteo Croce <mcroce@redhat.com> Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR") Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13socket: fix struct ifreq size in compat ioctlJohannes Berg
As reported by Reobert O'Callahan, since Viro's commit to kill dev_ifsioc() we attempt to copy too much data in compat mode, which may lead to EFAULT when the 32-bit version of struct ifreq sits at/near the end of a page boundary, and the next page isn't mapped. Fix this by passing the approprate compat/non-compat size to copy and using that, as before the dev_ifsioc() removal. This works because only the embedded "struct ifmap" has different size, and this is only used in SIOCGIFMAP/SIOCSIFMAP which has a different handler. All other parts of the union are naturally compatible. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=199469. Fixes: bf4405737f9f ("kill dev_ifsioc()") Reported-by: Robert O'Callahan <robert@ocallahan.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13gso_segment: Reset skb->mac_len after modifying network headerToke Høiland-Jørgensen
When splitting a GSO segment that consists of encapsulated packets, the skb->mac_len of the segments can end up being set wrong, causing packet drops in particular when using act_mirred and ifb interfaces in combination with a qdisc that splits GSO packets. This happens because at the time skb_segment() is called, network_header will point to the inner header, throwing off the calculation in skb_reset_mac_len(). The network_header is subsequently adjust by the outer IP gso_segment handlers, but they don't set the mac_len. Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6 gso_segment handlers, after they modify the network_header. Many thanks to Eric Dumazet for his help in identifying the cause of the bug. Acked-by: Dave Taht <dave.taht@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2018-09-13 A few Bluetooth fixes for the 4.19-rc series: - Fixed rw_semaphore leak in hci_ldisc - Fixed local Out-of-Band pairing data handling Let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13tls: clear key material from kernel memory when do_tls_setsockopt_conf failsSabrina Dubroca
Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13tls: zero the crypto information from tls_context before freeingSabrina Dubroca
This contains key material in crypto_send_aes_gcm_128 and crypto_recv_aes_gcm_128. Introduce union tls_crypto_context, and replace the two identical unions directly embedded in struct tls_context with it. We can then use this union to clean up the memory in the new tls_ctx_free() function. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13tls: don't copy the key out of tls12_crypto_info_aes_gcm_128Sabrina Dubroca
There's no need to copy the key to an on-stack buffer before calling crypto_aead_setkey(). Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13neighbour: confirm neigh entries when ARP packet is receivedVasily Khoruzhick
Update 'confirmed' timestamp when ARP packet is received. It shouldn't affect locktime logic and anyway entry can be confirmed by any higher-layer protocol. Thus it makes sense to confirm it when ARP packet is received. Fixes: 77d7123342dc ("neighbour: update neigh timestamps iff update is effective") Signed-off-by: Vasily Khoruzhick <vasilykh@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13net: rtnl_configure_link: fix dev flags changes arg to __dev_notify_flagsRoopa Prabhu
This fix addresses https://bugzilla.kernel.org/show_bug.cgi?id=201071 Commit 5025f7f7d506 wrongly relied on __dev_change_flags to notify users of dev flag changes in the case when dev->rtnl_link_state = RTNL_LINK_INITIALIZED. Fix it by indicating flag changes explicitly to __dev_notify_flags. Fixes: 5025f7f7d506 ("rtnetlink: add rtnl_link_state check in rtnl_configure_link") Reported-By: Liam mcbirnie <liam.mcbirnie@boeing.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13net_sched: notify filter deletion when deleting a chainCong Wang
When we delete a chain of filters, we need to notify user-space we are deleting each filters in this chain too. Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi") Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13ipv6: use rt6_info members when dst is set in rt6_fill_nodeXin Long
In inet6_rtm_getroute, since Commit 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes"), it has used rt->from to dump route info instead of rt. However for some route like cache, some of its information like flags or gateway is not the same as that of the 'from' one. It caused 'ip route get' to dump the wrong route information. In Jianlin's testing, the output information even lost the expiration time for a pmtu route cache due to the wrong fib6_flags. So change to use rt6_info members for dst addr, src addr, flags and gateway when it tries to dump a route entry without fibmatch set. v1->v2: - not use rt6i_prefsrc. - also fix the gw dump issue. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12bpf: use __GFP_COMP while allocating pageTushar Dave
Helper bpg_msg_pull_data() can allocate multiple pages while linearizing multiple scatterlist elements into one shared page. However, if the shared page has size > PAGE_SIZE, using copy_page_to_iter() causes below warning. e.g. [ 6367.019832] WARNING: CPU: 2 PID: 7410 at lib/iov_iter.c:825 page_copy_sane.part.8+0x0/0x8 To avoid above warning, use __GFP_COMP while allocating multiple contiguous pages. Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data") Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-12tipc: check return value of __tipc_dump_start()Cong Wang
When __tipc_dump_start() fails with running out of memory, we have no reason to continue, especially we should avoid calling tipc_dump_done(). Fixes: 8f5c5fcf3533 ("tipc: call start and done ops directly in __tipc_nl_compat_dumpit()") Reported-and-tested-by: syzbot+3f8324abccfbf8c74a9f@syzkaller.appspotmail.com Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12rds: fix two RCU related problemsCong Wang
When a rds sock is bound, it is inserted into the bind_hash_table which is protected by RCU. But when releasing rds sock, after it is removed from this hash table, it is freed immediately without respecting RCU grace period. This could cause some use-after-free as reported by syzbot. Mark the rds sock with SOCK_RCU_FREE before inserting it into the bind_hash_table, so that it would be always freed after a RCU grace period. The other problem is in rds_find_bound(), the rds sock could be freed in between rhashtable_lookup_fast() and rds_sock_addref(), so we need to extend RCU read lock protection in rds_find_bound() to close this race condition. Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com> Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: rds-devel@oss.oracle.com Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oarcle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11erspan: fix error handling for erspan tunnelHaishuang Yan
When processing icmp unreachable message for erspan tunnel, tunnel id should be erspan_net_id instead of ipgre_net_id. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11erspan: return PACKET_REJECT when the appropriate tunnel is not foundHaishuang Yan
If erspan tunnel hasn't been established, we'd better send icmp port unreachable message after receive erspan packets. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11tcp: rate limit synflood warnings furtherWillem de Bruijn
Convert pr_info to net_info_ratelimited to limit the total number of synflood warnings. Commit 946cedccbd73 ("tcp: Change possible SYN flooding messages") rate limits synflood warnings to one per listener. Workloads that open many listener sockets can still see a high rate of log messages. Syzkaller is one frequent example. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for you net tree: 1) Remove duplicated include at the end of UDP conntrack, from Yue Haibing. 2) Restore conntrack dependency on xt_cluster, from Martin Willi. 3) Fix splat with GSO skbs from the checksum target, from Florian Westphal. 4) Rework ct timeout support, the template strategy to attach custom timeouts is not correct since it will not work in conjunction with conntrack zones and we have a possible free after use when removing the rule due to missing refcounting. To fix these problems, do not use conntrack template at all and set custom timeout on the already valid conntrack object. This fix comes with a preparation patch to simplify timeout adjustment by initializating the first position of the timeout array for all of the existing trackers. Patchset from Florian Westphal. 5) Fix missing dependency on from IPv4 chain NAT type, from Florian. 6) Release chain reference counter from the flush path, from Taehee Yoo. 7) After flushing an iptables ruleset, conntrack hooks are unregistered and entries are left stale to be cleaned up by the timeout garbage collector. No TCP tracking is done on established flows by this time. If ruleset is reloaded, then hooks are registered again and TCP tracking is restored, which considers packets to be invalid. Clear window tracking to exercise TCP flow pickup from the middle given that history is lost for us. Again from Florian. 8) Fix crash from netlink interface with CONFIG_NF_CONNTRACK_TIMEOUT=y and CONFIG_NF_CT_NETLINK_TIMEOUT=n. 9) Broken CT target due to returning incorrect type from ctnl_timeout_find_get(). 10) Solve conntrack clash on NF_REPEAT verdicts too, from Michal Vaner. 11) Missing conversion of hashlimit sysctl interface to new API, from Cong Wang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11Bluetooth: Use correct tfm to generate OOB dataMatias Karhumaa
In case local OOB data was generated and other device initiated pairing claiming that it has got OOB data, following crash occurred: [ 222.847853] general protection fault: 0000 [#1] SMP PTI [ 222.848025] CPU: 1 PID: 42 Comm: kworker/u5:0 Tainted: G C 4.18.0-custom #4 [ 222.848158] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 222.848307] Workqueue: hci0 hci_rx_work [bluetooth] [ 222.848416] RIP: 0010:compute_ecdh_secret+0x5a/0x270 [bluetooth] [ 222.848540] Code: 0c af f5 48 8b 3d 46 de f0 f6 ba 40 00 00 00 be c0 00 60 00 e8 b7 7b c5 f5 48 85 c0 0f 84 ea 01 00 00 48 89 c3 e8 16 0c af f5 <49> 8b 47 38 be c0 00 60 00 8b 78 f8 48 83 c7 48 e8 51 84 c5 f5 48 [ 222.848914] RSP: 0018:ffffb1664087fbc0 EFLAGS: 00010293 [ 222.849021] RAX: ffff8a5750d7dc00 RBX: ffff8a5671096780 RCX: ffffffffc08bc32a [ 222.849111] RDX: 0000000000000000 RSI: 00000000006000c0 RDI: ffff8a5752003800 [ 222.849192] RBP: ffffb1664087fc60 R08: ffff8a57525280a0 R09: ffff8a5752003800 [ 222.849269] R10: ffffb1664087fc70 R11: 0000000000000093 R12: ffff8a5674396e00 [ 222.849350] R13: ffff8a574c2e79aa R14: ffff8a574c2e796a R15: 020e0e100d010101 [ 222.849429] FS: 0000000000000000(0000) GS:ffff8a5752500000(0000) knlGS:0000000000000000 [ 222.849518] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 222.849586] CR2: 000055856016a038 CR3: 0000000110d2c005 CR4: 00000000000606e0 [ 222.849671] Call Trace: [ 222.849745] ? sc_send_public_key+0x110/0x2a0 [bluetooth] [ 222.849825] ? sc_send_public_key+0x115/0x2a0 [bluetooth] [ 222.849925] smp_recv_cb+0x959/0x2490 [bluetooth] [ 222.850023] ? _cond_resched+0x19/0x40 [ 222.850105] ? mutex_lock+0x12/0x40 [ 222.850202] l2cap_recv_frame+0x109d/0x3420 [bluetooth] [ 222.850315] ? l2cap_recv_frame+0x109d/0x3420 [bluetooth] [ 222.850426] ? __switch_to_asm+0x34/0x70 [ 222.850515] ? __switch_to_asm+0x40/0x70 [ 222.850625] ? __switch_to_asm+0x34/0x70 [ 222.850724] ? __switch_to_asm+0x40/0x70 [ 222.850786] ? __switch_to_asm+0x34/0x70 [ 222.850846] ? __switch_to_asm+0x40/0x70 [ 222.852581] ? __switch_to_asm+0x34/0x70 [ 222.854976] ? __switch_to_asm+0x40/0x70 [ 222.857475] ? __switch_to_asm+0x40/0x70 [ 222.859775] ? __switch_to_asm+0x34/0x70 [ 222.861218] ? __switch_to_asm+0x40/0x70 [ 222.862327] ? __switch_to_asm+0x34/0x70 [ 222.863758] l2cap_recv_acldata+0x266/0x3c0 [bluetooth] [ 222.865122] hci_rx_work+0x1c9/0x430 [bluetooth] [ 222.867144] process_one_work+0x210/0x4c0 [ 222.868248] worker_thread+0x41/0x4d0 [ 222.869420] kthread+0x141/0x160 [ 222.870694] ? process_one_work+0x4c0/0x4c0 [ 222.871668] ? kthread_create_worker_on_cpu+0x90/0x90 [ 222.872896] ret_from_fork+0x35/0x40 [ 222.874132] Modules linked in: algif_hash algif_skcipher af_alg rfcomm bnep btusb btrtl btbcm btintel snd_intel8x0 cmac intel_rapl_perf vboxvideo(C) snd_ac97_codec bluetooth ac97_bus joydev ttm snd_pcm ecdh_generic drm_kms_helper snd_timer snd input_leds drm serio_raw fb_sys_fops soundcore syscopyarea sysfillrect sysimgblt mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci psmouse libahci i2c_piix4 video e1000 pata_acpi [ 222.883153] fbcon_switch: detected unhandled fb_set_par error, error code -16 [ 222.886774] fbcon_switch: detected unhandled fb_set_par error, error code -16 [ 222.890503] ---[ end trace 6504aa7a777b5316 ]--- [ 222.890541] RIP: 0010:compute_ecdh_secret+0x5a/0x270 [bluetooth] [ 222.890551] Code: 0c af f5 48 8b 3d 46 de f0 f6 ba 40 00 00 00 be c0 00 60 00 e8 b7 7b c5 f5 48 85 c0 0f 84 ea 01 00 00 48 89 c3 e8 16 0c af f5 <49> 8b 47 38 be c0 00 60 00 8b 78 f8 48 83 c7 48 e8 51 84 c5 f5 48 [ 222.890555] RSP: 0018:ffffb1664087fbc0 EFLAGS: 00010293 [ 222.890561] RAX: ffff8a5750d7dc00 RBX: ffff8a5671096780 RCX: ffffffffc08bc32a [ 222.890565] RDX: 0000000000000000 RSI: 00000000006000c0 RDI: ffff8a5752003800 [ 222.890571] RBP: ffffb1664087fc60 R08: ffff8a57525280a0 R09: ffff8a5752003800 [ 222.890576] R10: ffffb1664087fc70 R11: 0000000000000093 R12: ffff8a5674396e00 [ 222.890581] R13: ffff8a574c2e79aa R14: ffff8a574c2e796a R15: 020e0e100d010101 [ 222.890586] FS: 0000000000000000(0000) GS:ffff8a5752500000(0000) knlGS:0000000000000000 [ 222.890591] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 222.890594] CR2: 000055856016a038 CR3: 0000000110d2c005 CR4: 00000000000606e0 This commit fixes a bug where invalid pointer to crypto tfm was used for SMP SC ECDH calculation when OOB was in use. Solution is to use same crypto tfm than when generating OOB material on generate_oob() function. This bug was introduced in commit c0153b0b901a ("Bluetooth: let the crypto subsystem generate the ecc privkey"). Bug was found by fuzzing kernel SMP implementation using Synopsys Defensics. Signed-off-by: Matias Karhumaa <matias.karhumaa@gmail.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-11Bluetooth: SMP: Fix trying to use non-existent local OOB dataJohan Hedberg
A remote device may claim that it has received our OOB data, even though we never geneated it. Add a new flag to track whether we actually have OOB data, and ignore the remote peer's flag if haven't generated OOB data. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-11netfilter: xt_hashlimit: use s->file instead of s->privateCong Wang
After switching to the new procfs API, it is supposed to retrieve the private pointer from PDE_DATA(file_inode(s->file)), s->private is no longer referred. Fixes: 1cd671827290 ("netfilter/x_tables: switch to proc_create_seq_private") Reported-by: Sami Farin <hvtaifwkbgefbaei@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Sami Farin <hvtaifwkbgefbaei@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11netfilter: nfnetlink_queue: Solve the NFQUEUE/conntrack clash for NF_REPEATMichal 'vorner' Vaner
NF_REPEAT places the packet at the beginning of the iptables chain instead of accepting or rejecting it right away. The packet however will reach the end of the chain and continue to the end of iptables eventually, so it needs the same handling as NF_ACCEPT and NF_DROP. Fixes: 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracks") Signed-off-by: Michal 'vorner' Vaner <michal.vaner@avast.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11netfilter: cttimeout: ctnl_timeout_find_get() returns incorrect pointer to typePablo Neira Ayuso
Compiler did not catch incorrect typing in the rcu hook assignment. % nfct add timeout test-tcp inet tcp established 100 close 10 close_wait 10 % iptables -I OUTPUT -t raw -p tcp -j CT --timeout test-tcp dmesg - xt_CT: Timeout policy `test-tcp' can only be used by L3 protocol number 25000 The CT target bails out with incorrect layer 3 protocol number. Fixes: 6c1fd7dc489d ("netfilter: cttimeout: decouple timeout policy from nfnetlink_cttimeout object") Reported-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUTPablo Neira Ayuso
Now that cttimeout support for nft_ct is in place, these should depend on CONFIG_NF_CONNTRACK_TIMEOUT otherwise we can crash when dumping the policy if this option is not enabled. [ 71.600121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [...] [ 71.600141] CPU: 3 PID: 7612 Comm: nft Not tainted 4.18.0+ #246 [...] [ 71.600188] Call Trace: [ 71.600201] ? nft_ct_timeout_obj_dump+0xc6/0xf0 [nft_ct] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11netfilter: conntrack: reset tcp maxwin on re-registerFlorian Westphal
Doug Smythies says: Sometimes it is desirable to temporarily disable, or clear, the iptables rule set on a computer being controlled via a secure shell session (SSH). While unwise on an internet facing computer, I also do it often on non-internet accessible computers while testing. Recently, this has become problematic, with the SSH session being dropped upon re-load of the rule set. The problem is that when all rules are deleted, conntrack hooks get unregistered. In case the rules are re-added later, its possible that tcp window has moved far enough so that all packets are considered invalid (out of window) until entry expires (which can take forever, default established timeout is 5 days). Fix this by clearing maxwin of existing tcp connections on register. v2: don't touch entries on hook removal. v3: remove obsolete expiry check. Reported-by: Doug Smythies <dsmythies@telus.net> Fixes: 4d3a57f23dec59 ("netfilter: conntrack: do not enable connection tracking unless needed") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-09ip: frags: fix crash in ip_do_fragment()Taehee Yoo
A kernel crash occurrs when defragmented packet is fragmented in ip_do_fragment(). In defragment routine, skb_orphan() is called and skb->ip_defrag_offset is set. but skb->sk and skb->ip_defrag_offset are same union member. so that frag->sk is not NULL. Hence crash occurrs in skb->sk check routine in ip_do_fragment() when defragmented packet is fragmented. test commands: %iptables -t nat -I POSTROUTING -j MASQUERADE %hping3 -s 1000 -p 2000 -d 60000 splat looks like: [ 261.069429] kernel BUG at net/ipv4/ip_output.c:636! [ 261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3 [ 261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600 [ 261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c [ 261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202 [ 261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004 [ 261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8 [ 261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395 [ 261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4 [ 261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000 [ 261.174169] FS: 00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000 [ 261.183012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0 [ 261.198158] Call Trace: [ 261.199018] ? dst_output+0x180/0x180 [ 261.205011] ? save_trace+0x300/0x300 [ 261.209018] ? ip_copy_metadata+0xb00/0xb00 [ 261.213034] ? sched_clock_local+0xd4/0x140 [ 261.218158] ? kill_l4proto+0x120/0x120 [nf_conntrack] [ 261.223014] ? rt_cpu_seq_stop+0x10/0x10 [ 261.227014] ? find_held_lock+0x39/0x1c0 [ 261.233008] ip_finish_output+0x51d/0xb50 [ 261.237006] ? ip_fragment.constprop.56+0x220/0x220 [ 261.243011] ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack] [ 261.250152] ? rcu_is_watching+0x77/0x120 [ 261.255010] ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4] [ 261.261033] ? nf_hook_slow+0xb1/0x160 [ 261.265007] ip_output+0x1c7/0x710 [ 261.269005] ? ip_mc_output+0x13f0/0x13f0 [ 261.273002] ? __local_bh_enable_ip+0xe9/0x1b0 [ 261.278152] ? ip_fragment.constprop.56+0x220/0x220 [ 261.282996] ? nf_hook_slow+0xb1/0x160 [ 261.287007] raw_sendmsg+0x21f9/0x4420 [ 261.291008] ? dst_output+0x180/0x180 [ 261.297003] ? sched_clock_cpu+0x126/0x170 [ 261.301003] ? find_held_lock+0x39/0x1c0 [ 261.306155] ? stop_critical_timings+0x420/0x420 [ 261.311004] ? check_flags.part.36+0x450/0x450 [ 261.315005] ? _raw_spin_unlock_irq+0x29/0x40 [ 261.320995] ? _raw_spin_unlock_irq+0x29/0x40 [ 261.326142] ? cyc2ns_read_end+0x10/0x10 [ 261.330139] ? raw_bind+0x280/0x280 [ 261.334138] ? sched_clock_cpu+0x126/0x170 [ 261.338995] ? check_flags.part.36+0x450/0x450 [ 261.342991] ? __lock_acquire+0x4500/0x4500 [ 261.348994] ? inet_sendmsg+0x11c/0x500 [ 261.352989] ? dst_output+0x180/0x180 [ 261.357012] inet_sendmsg+0x11c/0x500 [ ... ] v2: - clear skb->sk at reassembly routine.(Eric Dumarzet) Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPCVakul Garg
tls_sw_sendmsg() allocates plaintext and encrypted SG entries using function sk_alloc_sg(). In case the number of SG entries hit MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for current SG index to '0'. This leads to calling of function tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes kernel crash. To fix this, set the number of SG elements to the number of elements in plaintext/encrypted SG arrays in case sk_alloc_sg() returns -ENOSPC. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-07tcp: really ignore MSG_ZEROCOPY if no SO_ZEROCOPYVincent Whitchurch
According to the documentation in msg_zerocopy.rst, the SO_ZEROCOPY flag was introduced because send(2) ignores unknown message flags and any legacy application which was accidentally passing the equivalent of MSG_ZEROCOPY earlier should not see any new behaviour. Before commit f214f915e7db ("tcp: enable MSG_ZEROCOPY"), a send(2) call which passed the equivalent of MSG_ZEROCOPY without setting SO_ZEROCOPY would succeed. However, after that commit, it fails with -ENOBUFS. So it appears that the SO_ZEROCOPY flag fails to fulfill its intended purpose. Fix it. Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-07net_sched: properly cancel netlink dump on failureCong Wang
When nla_put*() fails after nla_nest_start(), we need to call nla_nest_cancel() to cancel the message, otherwise we end up calling nla_nest_end() like a success. Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key") Cc: Davide Caratti <dcaratti@redhat.com> Cc: Simon Horman <simon.horman@netronome.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-06tipc: call start and done ops directly in __tipc_nl_compat_dumpit()Cong Wang
__tipc_nl_compat_dumpit() uses a netlink_callback on stack, so the only way to align it with other ->dumpit() call path is calling tipc_dump_start() and tipc_dump_done() directly inside it. Otherwise ->dumpit() would always get NULL from cb->args[]. But tipc_dump_start() uses sock_net(cb->skb->sk) to retrieve net pointer, the cb->skb here doesn't set skb->sk, the net pointer is saved in msg->net instead, so introduce a helper function __tipc_dump_start() to pass in msg->net. Ying pointed out cb->args[0...3] are already used by other callbacks on this call path, so we can't use cb->args[0] any more, use cb->args[4] instead. Fixes: 9a07efa9aea2 ("tipc: switch to rhashtable iterator") Reported-and-tested-by: syzbot+e93a2c41f91b8e2c7d9b@syzkaller.appspotmail.com Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05net/iucv: declare iucv_path_table_empty() as staticJulian Wiedmann
Fixes a compile warning. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05net/af_iucv: fix skb handling on HiperTransport xmit errorJulian Wiedmann
When sending an skb, afiucv_hs_send() bails out on various error conditions. But currently the caller has no way of telling whether the skb was freed or not - resulting in potentially either a) leaked skbs from iucv_send_ctrl(), or b) double-free's from iucv_sock_sendmsg(). As dev_queue_xmit() will always consume the skb (even on error), be consistent and also free the skb from all other error paths. This way callers no longer need to care about managing the skb. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05net/af_iucv: drop inbound packets with invalid flagsJulian Wiedmann
Inbound packets may have any combination of flag bits set in their iucv header. If we don't know how to handle a specific combination, drop the skb instead of leaking it. To clarify what error is returned in this case, replace the hard-coded 0 with the corresponding macro. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05net/sched: fix memory leak in act_tunnel_key_init()Davide Caratti
If users try to install act_tunnel_key 'set' rules with duplicate values of 'index', the tunnel metadata are allocated, but never released. Then, kmemleak complains as follows: # tc a a a tunnel_key set src_ip dst_ip id 42 index 111 # echo clear > /sys/kernel/debug/kmemleak # tc a a a tunnel_key set src_ip dst_ip id 42 index 111 Error: TC IDR already exists. We have an error talking to the kernel # echo scan > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8800574e6c80 (size 256): comm "tc", pid 5617, jiffies 4298118009 (age 57.990s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff ................ 81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00 .$.............. backtrace: [<00000000b7afbf4e>] tunnel_key_init+0x8a5/0x1800 [act_tunnel_key] [<000000007d98fccd>] tcf_action_init_1+0x698/0xac0 [<0000000099b8f7cc>] tcf_action_init+0x15c/0x590 [<00000000dc60eebe>] tc_ctl_action+0x336/0x5c2 [<000000002f5a2f7d>] rtnetlink_rcv_msg+0x357/0x8e0 [<000000000bfe7575>] netlink_rcv_skb+0x124/0x350 [<00000000edab656f>] netlink_unicast+0x40f/0x5d0 [<00000000b322cdcb>] netlink_sendmsg+0x6e8/0xba0 [<0000000063d9d490>] sock_sendmsg+0xb3/0xf0 [<00000000f0d3315a>] ___sys_sendmsg+0x654/0x960 [<00000000c06cbd42>] __sys_sendmsg+0xd3/0x170 [<00000000ce72e4b0>] do_syscall_64+0xa5/0x470 [<000000005caa2d97>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<00000000fac1b476>] 0xffffffffffffffff This problem theoretically happens also in case users attempt to setup a geneve rule having wrong configuration data, or when the kernel fails to allocate 'params_new'. Ensure that tunnel_key_init() releases the tunnel metadata also in the above conditions. Addresses-Coverity-ID: 1373974 ("Resource leak") Fixes: d0f6dd8a914f4 ("net/sched: Introduce act_tunnel_key") Fixes: 0ed5269f9e41f ("net/sched: add tunnel option support to act_tunnel_key") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05tipc: orphan sock in tipc_release()Cong Wang
Before we unlock the sock in tipc_release(), we have to detach sk->sk_socket from sk, otherwise a parallel tipc_sk_fill_sock_diag() could stil read it after we free this socket. Fixes: c30b70deb5f4 ("tipc: implement socket diagnostics for AF_TIPC") Reported-and-tested-by: syzbot+48804b87c16588ad491d@syzkaller.appspotmail.com Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Must perform TXQ teardown before unregistering interfaces in mac80211, from Toke Høiland-Jørgensen. 2) Don't allow creating mac80211_hwsim with less than one channel, from Johannes Berg. 3) Division by zero in cfg80211, fix from Johannes Berg. 4) Fix endian issue in tipc, from Haiqing Bai. 5) BPF sockmap use-after-free fixes from Daniel Borkmann. 6) Spectre-v1 in mac80211_hwsim, from Jinbum Park. 7) Missing rhashtable_walk_exit() in tipc, from Cong Wang. 8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when kvzalloc() tries to use kmalloc() pages. From Eric Dumazet. 9) Fix deadlock in hv_netvsc, from Dexuan Cui. 10) Do not restart timewait timer on RST, from Florian Westphal. 11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev. 12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu. 13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai. 14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu. 15) Use-after-free in act_ife, fix from Cong Wang. 16) Missing hold to meta module in act_ife, from Vlad Buslov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits) net: phy: sfp: Handle unimplemented hwmon limits and alarms net: sched: action_ife: take reference to meta module act_ife: fix a potential use-after-free net/mlx5: Fix SQ offset in QPs with small RQ tipc: correct spelling errors for tipc_topsrv_queue_evt() comments tipc: correct spelling errors for struct tipc_bc_base's comment bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA. bnxt_en: Clean up unused functions. bnxt_en: Fix firmware signaled resource change logic in open. sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel sctp: fix invalid reference to the index variable of the iterator net/ibm/emac: wrong emac_calc_base call was used by typo net: sched: null actions array pointer before releasing action vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition r8169: add support for NCube 8168 network card ip6_tunnel: respect ttl inherit for ip6tnl mac80211: shorten the IBSS debug messages mac80211: don't Tx a deauth frame if the AP forbade Tx mac80211: Fix station bandwidth setting after channel switch mac80211: fix a race between restart and CSA flows ...
2018-09-04net: sched: action_ife: take reference to meta moduleVlad Buslov
Recent refactoring of add_metainfo() caused use_all_metadata() to add metainfo to ife action metalist without taking reference to module. This causes warning in module_put called from ife action cleanup function. Implement add_metainfo_and_get_ops() function that returns with reference to module taken if metainfo was added successfully, and call it from use_all_metadata(), instead of calling __add_metainfo() directly. Example warning: [ 646.344393] WARNING: CPU: 1 PID: 2278 at kernel/module.c:1139 module_put+0x1cb/0x230 [ 646.352437] Modules linked in: act_meta_skbtcindex act_meta_mark act_meta_skbprio act_ife ife veth nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc mlx5_ib ib_uverbs ib_core intel_rapl sb_edac x86_pkg_temp_thermal mlx5_core coretemp kvm_intel kvm nfsd igb irqbypass crct10dif_pclmul devlink crc32_pclmul mei_me joydev ses crc32c_intel enclosure auth_rpcgss i2c_algo_bit ioatdma ptp mei pps_core ghash_clmulni_intel iTCO_wdt iTCO_vendor_support pcspkr dca ipmi_ssif lpc_ich target_core_mod i2c_i801 ipmi_si ipmi_devintf pcc_cpufreq wmi ipmi_msghandler nfs_acl lockd acpi_pad acpi_power_meter grace sunrpc mpt3sas raid_class scsi_transport_sas [ 646.425631] CPU: 1 PID: 2278 Comm: tc Not tainted 4.19.0-rc1+ #799 [ 646.432187] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 646.440595] RIP: 0010:module_put+0x1cb/0x230 [ 646.445238] Code: f3 66 94 02 e8 26 ff fa ff 85 c0 74 11 0f b6 1d 51 30 94 02 80 fb 01 77 60 83 e3 01 74 13 65 ff 0d 3a 83 db 73 e9 2b ff ff ff <0f> 0b e9 00 ff ff ff e8 59 01 fb ff 85 c0 75 e4 48 c7 c2 20 62 6b [ 646.464997] RSP: 0018:ffff880354d37068 EFLAGS: 00010286 [ 646.470599] RAX: 0000000000000000 RBX: ffffffffc0a52518 RCX: ffffffff8c2668db [ 646.478118] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffc0a52518 [ 646.485641] RBP: ffffffffc0a52180 R08: fffffbfff814a4a4 R09: fffffbfff814a4a3 [ 646.493164] R10: ffffffffc0a5251b R11: fffffbfff814a4a4 R12: 1ffff1006a9a6e0d [ 646.500687] R13: 00000000ffffffff R14: ffff880362bab890 R15: dead000000000100 [ 646.508213] FS: 00007f4164c99800(0000) GS:ffff88036fe40000(0000) knlGS:0000000000000000 [ 646.516961] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 646.523080] CR2: 00007f41638b8420 CR3: 0000000351df0004 CR4: 00000000001606e0 [ 646.530595] Call Trace: [ 646.533408] ? find_symbol_in_section+0x260/0x260 [ 646.538509] tcf_ife_cleanup+0x11b/0x200 [act_ife] [ 646.543695] tcf_action_cleanup+0x29/0xa0 [ 646.548078] __tcf_action_put+0x5a/0xb0 [ 646.552289] ? nla_put+0x65/0xe0 [ 646.555889] __tcf_idr_release+0x48/0x60 [ 646.560187] tcf_generic_walker+0x448/0x6b0 [ 646.564764] ? tcf_action_dump_1+0x450/0x450 [ 646.569411] ? __lock_is_held+0x84/0x110 [ 646.573720] ? tcf_ife_walker+0x10c/0x20f [act_ife] [ 646.578982] tca_action_gd+0x972/0xc40 [ 646.583129] ? tca_get_fill.constprop.17+0x250/0x250 [ 646.588471] ? mark_lock+0xcf/0x980 [ 646.592324] ? check_chain_key+0x140/0x1f0 [ 646.596832] ? debug_show_all_locks+0x240/0x240 [ 646.601839] ? memset+0x1f/0x40 [ 646.605350] ? nla_parse+0xca/0x1a0 [ 646.609217] tc_ctl_action+0x215/0x230 [ 646.613339] ? tcf_action_add+0x220/0x220 [ 646.617748] rtnetlink_rcv_msg+0x56a/0x6d0 [ 646.622227] ? rtnl_fdb_del+0x3f0/0x3f0 [ 646.626466] netlink_rcv_skb+0x18d/0x200 [ 646.630752] ? rtnl_fdb_del+0x3f0/0x3f0 [ 646.634959] ? netlink_ack+0x500/0x500 [ 646.639106] netlink_unicast+0x2d0/0x370 [ 646.643409] ? netlink_attachskb+0x340/0x340 [ 646.648050] ? _copy_from_iter_full+0xe9/0x3e0 [ 646.652870] ? import_iovec+0x11e/0x1c0 [ 646.657083] netlink_sendmsg+0x3b9/0x6a0 [ 646.661388] ? netlink_unicast+0x370/0x370 [ 646.665877] ? netlink_unicast+0x370/0x370 [ 646.670351] sock_sendmsg+0x6b/0x80 [ 646.674212] ___sys_sendmsg+0x4a1/0x520 [ 646.678443] ? copy_msghdr_from_user+0x210/0x210 [ 646.683463] ? lock_downgrade+0x320/0x320 [ 646.687849] ? debug_show_all_locks+0x240/0x240 [ 646.692760] ? do_raw_spin_unlock+0xa2/0x130 [ 646.697418] ? _raw_spin_unlock+0x24/0x30 [ 646.701798] ? __handle_mm_fault+0x1819/0x1c10 [ 646.706619] ? __pmd_alloc+0x320/0x320 [ 646.710738] ? debug_show_all_locks+0x240/0x240 [ 646.715649] ? restore_nameidata+0x7b/0xa0 [ 646.720117] ? check_chain_key+0x140/0x1f0 [ 646.724590] ? check_chain_key+0x140/0x1f0 [ 646.729070] ? __fget_light+0xbc/0xd0 [ 646.733121] ? __sys_sendmsg+0xd7/0x150 [ 646.737329] __sys_sendmsg+0xd7/0x150 [ 646.741359] ? __ia32_sys_shutdown+0x30/0x30 [ 646.746003] ? up_read+0x53/0x90 [ 646.749601] ? __do_page_fault+0x484/0x780 [ 646.754105] ? do_syscall_64+0x1e/0x2c0 [ 646.758320] do_syscall_64+0x72/0x2c0 [ 646.762353] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 646.767776] RIP: 0033:0x7f4163872150 [ 646.771713] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24 [ 646.791474] RSP: 002b:00007ffdef7d6b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 646.799721] RAX: ffffffffffffffda RBX: 0000000000000024 RCX: 00007f4163872150 [ 646.807240] RDX: 0000000000000000 RSI: 00007ffdef7d6bd0 RDI: 0000000000000003 [ 646.814760] RBP: 000000005b8b9482 R08: 0000000000000001 R09: 0000000000000000 [ 646.822286] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdef7dad20 [ 646.829807] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000679bc0 [ 646.837360] irq event stamp: 6083 [ 646.841043] hardirqs last enabled at (6081): [<ffffffff8c220a7d>] __call_rcu+0x17d/0x500 [ 646.849882] hardirqs last disabled at (6083): [<ffffffff8c004f06>] trace_hardirqs_off_thunk+0x1a/0x1c [ 646.859775] softirqs last enabled at (5968): [<ffffffff8d4004a1>] __do_softirq+0x4a1/0x6ee [ 646.868784] softirqs last disabled at (6082): [<ffffffffc0a78759>] tcf_ife_cleanup+0x39/0x200 [act_ife] [ 646.878845] ---[ end trace b1b8c12ffe51e657 ]--- Fixes: 5ffe57da29b3 ("act_ife: fix a potential deadlock") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-04act_ife: fix a potential use-after-freeCong Wang
Immediately after module_put(), user could delete this module, so e->ops could be already freed before we call e->ops->release(). Fix this by moving module_put() after ops->release(). Fixes: ef6980b6becb ("introduce IFE action") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03Merge tag 'mac80211-for-davem-2018-09-03' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Here are quite a large number of fixes, notably: * various A-MSDU building fixes (currently only affects mt76) * syzkaller & spectre fixes in hwsim * TXQ vs. teardown fix that was causing crashes * embed WMM info in reg rule, bad code here had been causing crashes * one compilation issue with fix from Arnd (rfkill-gpio includes) * fixes for a race and bad data during/after channel switch * nl80211: a validation fix, attribute type & unit fixes along with other small fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03tipc: correct spelling errors for tipc_topsrv_queue_evt() commentsZhenbo Gao
tipc_conn_queue_evt -> tipc_topsrv_queue_evt tipc_send_work -> tipc_conn_send_work tipc_send_to_sock -> tipc_conn_send_to_sock Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03tipc: correct spelling errors for struct tipc_bc_base's commentZhenbo Gao
Trivial fix for two spelling mistakes. Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabelXin Long
When users set params.spp_address and get a trans, ipv6_flowlabel flag should be applied into this trans. But even if this one is not an ipv6 trans, it should not go to apply it into all other transes of the asoc but simply ignore it. Fixes: 0b0dce7a36fb ("sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>