AgeCommit message (Collapse)Author
2019-10-25Enable IMX296 camera sensorimx296-db410cManivannan Sadhasivam
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2019-10-25media: i2c: Add IMX296 CMOS image sensor driverManivannan Sadhasivam
Add driver for Sony IMX296 CMOS image sensor driver. The driver only supports I2C interface for programming and MIPI CSI-2 for sensor output. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2019-10-25dt-bindings: media: i2c: Add IMX296 CMOS sensor bindingManivannan Sadhasivam
Add devicetree binding for IMX296 CMOS image sensor. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2019-01-29Merge tag 'v4.14.96' into release/qcomlt-4.14Aníbal Limón
This is the 4.14.96 stable release Conflicts: drivers/mmc/host/sdhci-msm.c
2019-01-26Linux 4.14.96Greg Kroah-Hartman
2019-01-26ipmi:ssif: Fix handling of multi-part return messagesCorey Minyard
commit 7d6380cd40f7993f75c4bde5b36f6019237e8719 upstream. The block number was not being compared right, it was off by one when checking the response. Some statistics wouldn't be incremented properly in some cases. Check to see if that middle-part messages always have 31 bytes of data. Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: stable@vger.kernel.org # 4.4 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-26PCI: dwc: Move interrupt acking into the proper callbackMarc Zyngier
commit 3f7bb2ec20ce07c02b2002349d256c91a463fcc5 upstream. The write to the status register is really an ACK for the HW, and should be treated as such by the driver. Let's move it to the irq_ack() callback, which will prevent people from moving it around in order to paper over other bugs. Fixes: 8c934095fa2f ("PCI: dwc: Clear MSI interrupt status after it is handled, not before") Fixes: 7c5925afbc58 ("PCI: dwc: Move MSI IRQs allocation to IRQ domains hierarchical API") Link: https://lore.kernel.org/linux-pci/20181113225734.8026-1-marc.zyngier@arm.com/ Reported-by: Trent Piepho <tpiepho@impinj.com> Tested-by: Niklas Cassel <niklas.cassel@linaro.org> Tested-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com> Tested-by: Stanimir Varbanov <svarbanov@mm-sol.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-26drm/i915/gvt: Fix mmap range checkZhenyu Wang
commit 51b00d8509dc69c98740da2ad07308b630d3eb7d upstream. This is to fix missed mmap range check on vGPU bar2 region and only allow to map vGPU allocated GMADDR range, which means user space should support sparse mmap to get proper offset for mmap vGPU aperture. And this takes care of actual pgoff in mmap request as original code always does from beginning of vGPU aperture. Fixes: 659643f7d814 ("drm/i915/gvt/kvmgt: add vfio/mdev support to KVMGT") Cc: "Monroy, Rodrigo Axel" <rodrigo.axel.monroy@intel.com> Cc: "Orrala Contreras, Alfredo" <alfredo.orrala.contreras@intel.com> Cc: stable@vger.kernel.org # v4.10+ Reviewed-by: Hang Yuan <hang.yuan@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-26cifs: allow disabling insecure dialects in the configSteve French
commit 7420451f6a109f7f8f1bf283f34d08eba3259fb3 upstream. allow disabling cifs (SMB1 ie vers=1.0) and vers=2.0 in the config for the build of cifs.ko if want to always prevent mounting with these less secure dialects. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Jeremy Allison <jra@samba.org> Cc: Alakesh Haloi <alakeshh@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-26ipmi:pci: Blacklist a Realtek "IPMI" deviceCorey Minyard
commit bc48fa1b9d3b04106055b27078da824cd209865a upstream. Realtek has some sort of "Virtual" IPMI device on the PCI bus as a KCS controller, but whatever it is, it's not one. Ignore it if seen. [ Commit 13d0b35c (ipmi_si: Move PCI setup to another file) from Linux 4.15-rc1 has not been back ported, so the PCI code is still in `drivers/char/ipmi/ipmi_si_intf.c`, requiring to apply the commit manually. This fixes a 100 s boot delay on the HP EliteDesk 705 G4 MT with Linux 4.14.94. ] Reported-by: Chris Chiu <chiu@endlessm.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Tested-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-26nfs: fix a deadlock in nfs client initializationScott Mayhew
commit c156618e15101a9cc8c815108fec0300a0ec6637 upstream. The following deadlock can occur between a process waiting for a client to initialize in while walking the client list during nfsv4 server trunking detection and another process waiting for the nfs_clid_init_mutex so it can initialize that client: Process 1 Process 2 --------- --------- spin_lock(&nn->nfs_client_lock); list_add_tail(&CLIENTA->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); spin_lock(&nn->nfs_client_lock); list_add_tail(&CLIENTB->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); mutex_lock(&nfs_clid_init_mutex); nfs41_walk_client_list(clp, result, cred); nfs_wait_client_init_complete(CLIENTA); (waiting for nfs_clid_init_mutex) Make sure nfs_match_client() only evaluates clients that have completed initialization in order to prevent that deadlock. This patch also fixes v4.0 trunking behavior by not marking the client NFS_CS_READY until the clientid has been confirmed. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Qian Lu <luqia@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-26mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smapsMichal Hocko
[ Upstream commit 7550c6079846a24f30d15ac75a941c8515dbedfb ] Patch series "THP eligibility reporting via proc". This series of three patches aims at making THP eligibility reporting much more robust and long term sustainable. The trigger for the change is a regression report [2] and the long follow up discussion. In short the specific application didn't have good API to query whether a particular mapping can be backed by THP so it has used VMA flags to workaround that. These flags represent a deep internal state of VMAs and as such they should be used by userspace with a great deal of caution. A similar has happened for [3] when users complained that VM_MIXEDMAP is no longer set on DAX mappings. Again a lack of a proper API led to an abuse. The first patch in the series tries to emphasise that that the semantic of flags might change and any application consuming those should be really careful. The remaining two patches provide a more suitable interface to address [2] and provide a consistent API to query the THP status both for each VMA and process wide as well. [1] http://lkml.kernel.org/r/20181120103515.25280-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com [3] http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz This patch (of 3): Even though vma flags exported via /proc/<pid>/smaps are explicitly documented to be not guaranteed for future compatibility the warning doesn't go far enough because it doesn't mention semantic changes to those flags. And they are important as well because these flags are a deep implementation internal to the MM code and the semantic might change at any time. Let's consider two recent examples: http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz : commit e1fb4a086495 "dax: remove VM_MIXEDMAP for fsdax and device dax" has : removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the : mean time certain customer of ours started poking into /proc/<pid>/smaps : and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA : flags, the application just fails to start complaining that DAX support is : missing in the kernel. http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com : Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") : introduced a regression in that userspace cannot always determine the set : of vmas where thp is ineligible. : Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps : to determine if a vma is eligible to be backed by hugepages. : Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to : be disabled and emit "nh" as a flag for the corresponding vmas as part of : /proc/pid/smaps. After the commit, thp is disabled by means of an mm : flag and "nh" is not emitted. : This causes smaps parsing libraries to assume a vma is eligible for thp : and ends up puzzling the user on why its memory is not backed by thp. In both cases userspace was relying on a semantic of a specific VMA flag. The primary reason why that happened is a lack of a proper interface. While this has been worked on and it will be fixed properly, it seems that our wording could see some refinement and be more vocal about semantic aspect of these flags as well. Link: http://lkml.kernel.org/r/20181211143641.3503-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Paul Oppenheimer <bepvte@gmail.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26mm/swap: use nr_node_ids for avail_lists in swap_info_structAaron Lu
[ Upstream commit 66f71da9dd38af17dc17209cdde7987d4679a699 ] Since a2468cc9bfdf ("swap: choose swap device according to numa node"), avail_lists field of swap_info_struct is changed to an array with MAX_NUMNODES elements. This made swap_info_struct size increased to 40KiB and needs an order-4 page to hold it. This is not optimal in that: 1 Most systems have way less than MAX_NUMNODES(1024) nodes so it is a waste of memory; 2 It could cause swapon failure if the swap device is swapped on after system has been running for a while, due to no order-4 page is available as pointed out by Vasily Averin. Solve the above two issues by using nr_node_ids(which is the actual possible node number the running system has) for avail_lists instead of MAX_NUMNODES. nr_node_ids is unknown at compile time so can't be directly used when declaring this array. What I did here is to declare avail_lists as zero element array and allocate space for it when allocating space for swap_info_struct. The reason why keep using array but not pointer is plist_for_each_entry needs the field to be part of the struct, so pointer will not work. This patch is on top of Vasily Averin's fix commit. I think the use of kvzalloc for swap_info_struct is still needed in case nr_node_ids is really big on some systems. Link: http://lkml.kernel.org/r/20181115083847.GA11129@intel.com Signed-off-by: Aaron Lu <aaron.lu@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26mm/page-writeback.c: don't break integrity writeback on ->writepage() errorBrian Foster
[ Upstream commit 3fa750dcf29e8606e3969d13d8e188cc1c0f511d ] write_cache_pages() is used in both background and integrity writeback scenarios by various filesystems. Background writeback is mostly concerned with cleaning a certain number of dirty pages based on various mm heuristics. It may not write the full set of dirty pages or wait for I/O to complete. Integrity writeback is responsible for persisting a set of dirty pages before the writeback job completes. For example, an fsync() call must perform integrity writeback to ensure data is on disk before the call returns. write_cache_pages() unconditionally breaks out of its processing loop in the event of a ->writepage() error. This is fine for background writeback, which had no strict requirements and will eventually come around again. This can cause problems for integrity writeback on filesystems that might need to clean up state associated with failed page writeouts. For example, XFS performs internal delayed allocation accounting before returning a ->writepage() error, where applicable. If the current writeback happens to be associated with an unmount and write_cache_pages() completes the writeback prematurely due to error, the filesystem is unmounted in an inconsistent state if dirty+delalloc pages still exist. To handle this problem, update write_cache_pages() to always process the full set of pages for integrity writeback regardless of ->writepage() errors. Save the first encountered error and return it to the caller once complete. This facilitates XFS (or any other fs that expects integrity writeback to process the entire set of dirty pages) to clean up its internal state completely in the event of persistent mapping errors. Background writeback continues to exit on the first error encountered. [akpm@linux-foundation.org: fix typo in comment] Link: http://lkml.kernel.org/r/20181116134304.32440-1-bfoster@redhat.com Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26ocfs2: fix panic due to unrecovered local allocJunxiao Bi
[ Upstream commit 532e1e54c8140188e192348c790317921cb2dc1c ] mount.ocfs2 ignore the inconsistent error that journal is clean but local alloc is unrecovered. After mount, local alloc not empty, then reserver cluster didn't alloc a new local alloc window, reserveration map is empty(ocfs2_reservation_map.m_bitmap_len = 0), that triggered the following panic. This issue was reported at https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html and was advised to fixed during mount. But this is a very unusual inconsistent state, usually journal dirty flag should be cleared at the last stage of umount until every other things go right. We may need do further debug to check that. Any way to avoid possible futher corruption, mount should be abort and fsck should be run. (mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered! found = 6518, set = 6518, taken = 8192, off = 15912372 ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode. o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode. o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device o2net: Accepted connection from node yvwsoa17p (num 7) at o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes ------------[ cut here ]------------ kernel BUG at fs/ocfs2/reservations.c:507! invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2 Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018 task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000 RIP: 0010:[<ffffffffa05e96a8>] [<ffffffffa05e96a8>] __ocfs2_resv_find_window+0x498/0x760 [ocfs2] Call Trace: ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2] ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2] __ocfs2_claim_clusters+0x178/0x360 [ocfs2] ocfs2_claim_clusters+0x1f/0x30 [ocfs2] ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2] ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2] ocfs2_write_begin+0x13e/0x230 [ocfs2] generic_perform_write+0xbf/0x1c0 __generic_file_write_iter+0x19c/0x1d0 ocfs2_file_write_iter+0x589/0x1360 [ocfs2] __vfs_write+0xb8/0x110 vfs_write+0xa9/0x1b0 SyS_write+0x46/0xb0 system_call_fastpath+0x18/0xd7 Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff <0f> 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85 RIP __ocfs2_resv_find_window+0x498/0x760 [ocfs2] RSP <ffff8800ea4db668> ---[ end trace 566f07529f2edf3c ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26scsi: megaraid: fix out-of-bound array accessesQian Cai
[ Upstream commit c7a082e4242fd8cd21a441071e622f87c16bdacc ] UBSAN reported those with MegaRAID SAS-3 3108, [ 77.467308] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32 [ 77.475402] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]' [ 77.481677] CPU: 16 PID: 333 Comm: kworker/16:1 Not tainted 4.20.0-rc5+ #1 [ 77.488556] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018 [ 77.495791] Workqueue: events work_for_cpu_fn [ 77.500154] Call trace: [ 77.502610] dump_backtrace+0x0/0x2c8 [ 77.506279] show_stack+0x24/0x30 [ 77.509604] dump_stack+0x118/0x19c [ 77.513098] ubsan_epilogue+0x14/0x60 [ 77.516765] __ubsan_handle_out_of_bounds+0xfc/0x13c [ 77.521767] mr_update_load_balance_params+0x150/0x158 [megaraid_sas] [ 77.528230] MR_ValidateMapInfo+0x2cc/0x10d0 [megaraid_sas] [ 77.533825] megasas_get_map_info+0x244/0x2f0 [megaraid_sas] [ 77.539505] megasas_init_adapter_fusion+0x9b0/0xf48 [megaraid_sas] [ 77.545794] megasas_init_fw+0x1ab4/0x3518 [megaraid_sas] [ 77.551212] megasas_probe_one+0x2c4/0xbe0 [megaraid_sas] [ 77.556614] local_pci_probe+0x7c/0xf0 [ 77.560365] work_for_cpu_fn+0x34/0x50 [ 77.564118] process_one_work+0x61c/0xf08 [ 77.568129] worker_thread+0x534/0xa70 [ 77.571882] kthread+0x1c8/0x1d0 [ 77.575114] ret_from_fork+0x10/0x1c [ 89.240332] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32 [ 89.248426] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]' [ 89.254700] CPU: 16 PID: 95 Comm: kworker/u130:0 Not tainted 4.20.0-rc5+ #1 [ 89.261665] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018 [ 89.268903] Workqueue: events_unbound async_run_entry_fn [ 89.274222] Call trace: [ 89.276680] dump_backtrace+0x0/0x2c8 [ 89.280348] show_stack+0x24/0x30 [ 89.283671] dump_stack+0x118/0x19c [ 89.287167] ubsan_epilogue+0x14/0x60 [ 89.290835] __ubsan_handle_out_of_bounds+0xfc/0x13c [ 89.295828] MR_LdRaidGet+0x50/0x58 [megaraid_sas] [ 89.300638] megasas_build_io_fusion+0xbb8/0xd90 [megaraid_sas] [ 89.306576] megasas_build_and_issue_cmd_fusion+0x138/0x460 [megaraid_sas] [ 89.313468] megasas_queue_command+0x398/0x3d0 [megaraid_sas] [ 89.319222] scsi_dispatch_cmd+0x1dc/0x8a8 [ 89.323321] scsi_request_fn+0x8e8/0xdd0 [ 89.327249] __blk_run_queue+0xc4/0x158 [ 89.331090] blk_execute_rq_nowait+0xf4/0x158 [ 89.335449] blk_execute_rq+0xdc/0x158 [ 89.339202] __scsi_execute+0x130/0x258 [ 89.343041] scsi_probe_and_add_lun+0x2fc/0x1488 [ 89.347661] __scsi_scan_target+0x1cc/0x8c8 [ 89.351848] scsi_scan_channel.part.3+0x8c/0xc0 [ 89.356382] scsi_scan_host_selected+0x130/0x1f0 [ 89.361002] do_scsi_scan_host+0xd8/0xf0 [ 89.364927] do_scan_async+0x9c/0x320 [ 89.368594] async_run_entry_fn+0x138/0x420 [ 89.372780] process_one_work+0x61c/0xf08 [ 89.376793] worker_thread+0x13c/0xa70 [ 89.380546] kthread+0x1c8/0x1d0 [ 89.383778] ret_from_fork+0x10/0x1c This is because when populating Driver Map using firmware raid map, all non-existing VDs set their ldTgtIdToLd to 0xff, so it can be skipped later. From drivers/scsi/megaraid/megaraid_sas_base.c , memset(instance->ld_ids, 0xff, MEGASAS_MAX_LD_IDS); From drivers/scsi/megaraid/megaraid_sas_fp.c , /* For non existing VDs, iterate to next VD*/ if (ld >= (MAX_LOGICAL_DRIVES_EXT - 1)) continue; However, there are a few places that failed to skip those non-existing VDs due to off-by-one errors. Then, those 0xff leaked into MR_LdRaidGet(0xff, map) and triggered the out-of-bound accesses. Fixes: 51087a8617fe ("megaraid_sas : Extended VD support") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()Yanjiang Jin
[ Upstream commit e57b2945aa654e48f85a41e8917793c64ecb9de8 ] We must free all irqs during shutdown, else kexec's 2nd kernel would hang in pqi_wait_for_completion_io() as below: Call trace: pqi_wait_for_completion_io pqi_submit_raid_request_synchronous.constprop.78+0x23c/0x310 [smartpqi] pqi_configure_events+0xec/0x1f8 [smartpqi] pqi_ctrl_init+0x814/0xca0 [smartpqi] pqi_pci_probe+0x400/0x46c [smartpqi] local_pci_probe+0x48/0xb0 pci_device_probe+0x14c/0x1b0 really_probe+0x218/0x3fc driver_probe_device+0x70/0x140 __driver_attach+0x11c/0x134 bus_for_each_dev+0x70/0xc8 driver_attach+0x30/0x38 bus_add_driver+0x1f0/0x294 driver_register+0x74/0x12c __pci_register_driver+0x64/0x70 pqi_init+0xd0/0x10000 [smartpqi] do_one_initcall+0x60/0x1d8 do_init_module+0x64/0x1f8 load_module+0x10ec/0x1350 __se_sys_finit_module+0xd4/0x100 __arm64_sys_finit_module+0x28/0x34 el0_svc_handler+0x104/0x160 el0_svc+0x8/0xc This happens only in the following combinations: 1. smartpqi is built as module, not built-in; 2. We have a disk connected to smartpqi card; 3. Both kexec's 1st and 2nd kernels use this disk as Rootfs' mount point. Signed-off-by: Yanjiang Jin <yanjiang.jin@hxt-semitech.com> Acked-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26scsi: smartpqi: correct lun reset issuesKevin Barnett
[ Upstream commit 2ba55c9851d74eb015a554ef69ddf2ef061d5780 ] Problem: The Linux kernel takes a logical volume offline after a LUN reset. This is generally accompanied by this message in the dmesg output: Device offlined - not ready after error recovery Root Cause: The root cause is a "quirk" in the timeout handling in the Linux SCSI layer. The Linux kernel places a 30-second timeout on most media access commands (reads and writes) that it send to device drivers. When a media access command times out, the Linux kernel goes into error recovery mode for the LUN that was the target of the command that timed out. Every command that timed out is kept on a list inside of the Linux kernel to be retried later. The kernel attempts to recover the command(s) that timed out by issuing a LUN reset followed by a TEST UNIT READY. If the LUN reset and TEST UNIT READY commands are successful, the kernel retries the command(s) that timed out. Each SCSI command issued by the kernel has a result field associated with it. This field indicates the final result of the command (success or error). When a command times out, the kernel places a value in this result field indicating that the command timed out. The "quirk" is that after the LUN reset and TEST UNIT READY commands are completed, the kernel checks each command on the timed-out command list before retrying it. If the result field is still "timed out", the kernel treats that command as not having been successfully recovered for a retry. If the number of commands that are in this state are greater than two, the kernel takes the LUN offline. Fix: When our RAIDStack receives a LUN reset, it simply waits until all outstanding commands complete. Generally, all of these outstanding commands complete successfully. Therefore, the fix in the smartpqi driver is to always set the command result field to indicate success when a request completes successfully. This normally isn’t necessary because the result field is always initialized to success when the command is submitted to the driver. So when the command completes successfully, the result field is left untouched. But in this case, the kernel changes the result field behind the driver’s back and then expects the field to be changed by the driver as the commands that timed-out complete. Reviewed-by: Dave Carroll <david.carroll@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26IB/usnic: Fix potential deadlockParvi Kaustubhi
[ Upstream commit 8036e90f92aae2784b855a0007ae2d8154d28b3c ] Acquiring the rtnl lock while holding usdev_lock could result in a deadlock. For example: usnic_ib_query_port() | mutex_lock(&us_ibdev->usdev_lock) | ib_get_eth_speed() | rtnl_lock() rtnl_lock() | usnic_ib_netdevice_event() | mutex_lock(&us_ibdev->usdev_lock) This commit moves the usdev_lock acquisition after the rtnl lock has been released. This is safe to do because usdev_lock is not protecting anything being accessed in ib_get_eth_speed(). Hence, the correct order of holding locks (rtnl -> usdev_lock) is not violated. Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26sysfs: Disable lockdep for driver bind/unbind filesDaniel Vetter
[ Upstream commit 4f4b374332ec0ae9c738ff8ec9bed5cd97ff9adc ] This is the much more correct fix for my earlier attempt at: https://lkml.org/lkml/2018/12/10/118 Short recap: - There's not actually a locking issue, it's just lockdep being a bit too eager to complain about a possible deadlock. - Contrary to what I claimed the real problem is recursion on kn->count. Greg pointed me at sysfs_break_active_protection(), used by the scsi subsystem to allow a sysfs file to unbind itself. That would be a real deadlock, which isn't what's happening here. Also, breaking the active protection means we'd need to manually handle all the lifetime fun. - With Rafael we discussed the task_work approach, which kinda works, but has two downsides: It's a functional change for a lockdep annotation issue, and it won't work for the bind file (which needs to get the errno from the driver load function back to userspace). - Greg also asked why this never showed up: To hit this you need to unregister a 2nd driver from the unload code of your first driver. I guess only gpus do that. The bug has always been there, but only with a recent patch series did we add more locks so that lockdep built a chain from unbinding the snd-hda driver to the acpi_video_unregister call. Full lockdep splat: [12301.898799] ============================================ [12301.898805] WARNING: possible recursive locking detected [12301.898811] 4.20.0-rc7+ #84 Not tainted [12301.898815] -------------------------------------------- [12301.898821] bash/5297 is trying to acquire lock: [12301.898826] 00000000f61c6093 (kn->count#39){++++}, at: kernfs_remove_by_name_ns+0x3b/0x80 [12301.898841] but task is already holding lock: [12301.898847] 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190 [12301.898856] other info that might help us debug this: [12301.898862] Possible unsafe locking scenario: [12301.898867] CPU0 [12301.898870] ---- [12301.898874] lock(kn->count#39); [12301.898879] lock(kn->count#39); [12301.898883] *** DEADLOCK *** [12301.898891] May be due to missing lock nesting notation [12301.898899] 5 locks held by bash/5297: [12301.898903] #0: 00000000cd800e54 (sb_writers#4){.+.+}, at: vfs_write+0x17f/0x1b0 [12301.898915] #1: 000000000465e7c2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd3/0x190 [12301.898925] #2: 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190 [12301.898936] #3: 00000000414ef7ac (&dev->mutex){....}, at: device_release_driver_internal+0x34/0x240 [12301.898950] #4: 000000003218fbdf (register_count_mutex){+.+.}, at: acpi_video_unregister+0xe/0x40 [12301.898960] stack backtrace: [12301.898968] CPU: 1 PID: 5297 Comm: bash Not tainted 4.20.0-rc7+ #84 [12301.898974] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011 [12301.898982] Call Trace: [12301.898989] dump_stack+0x67/0x9b [12301.898997] __lock_acquire+0x6ad/0x1410 [12301.899003] ? kernfs_remove_by_name_ns+0x3b/0x80 [12301.899010] ? find_held_lock+0x2d/0x90 [12301.899017] ? mutex_spin_on_owner+0xe4/0x150 [12301.899023] ? find_held_lock+0x2d/0x90 [12301.899030] ? lock_acquire+0x90/0x180 [12301.899036] lock_acquire+0x90/0x180 [12301.899042] ? kernfs_remove_by_name_ns+0x3b/0x80 [12301.899049] __kernfs_remove+0x296/0x310 [12301.899055] ? kernfs_remove_by_name_ns+0x3b/0x80 [12301.899060] ? kernfs_name_hash+0xd/0x80 [12301.899066] ? kernfs_find_ns+0x6c/0x100 [12301.899073] kernfs_remove_by_name_ns+0x3b/0x80 [12301.899080] bus_remove_driver+0x92/0xa0 [12301.899085] acpi_video_unregister+0x24/0x40 [12301.899127] i915_driver_unload+0x42/0x130 [i915] [12301.899160] i915_pci_remove+0x19/0x30 [i915] [12301.899169] pci_device_remove+0x36/0xb0 [12301.899176] device_release_driver_internal+0x185/0x240 [12301.899183] unbind_store+0xaf/0x180 [12301.899189] kernfs_fop_write+0x104/0x190 [12301.899195] __vfs_write+0x31/0x180 [12301.899203] ? rcu_read_lock_sched_held+0x6f/0x80 [12301.899209] ? rcu_sync_lockdep_assert+0x29/0x50 [12301.899216] ? __sb_start_write+0x13c/0x1a0 [12301.899221] ? vfs_write+0x17f/0x1b0 [12301.899227] vfs_write+0xb9/0x1b0 [12301.899233] ksys_write+0x50/0xc0 [12301.899239] do_syscall_64+0x4b/0x180 [12301.899247] entry_SYSCALL_64_after_hwframe+0x49/0xbe [12301.899253] RIP: 0033:0x7f452ac7f7a4 [12301.899259] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 aa f0 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83 [12301.899273] RSP: 002b:00007ffceafa6918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [12301.899282] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f452ac7f7a4 [12301.899288] RDX: 000000000000000d RSI: 00005612a1abf7c0 RDI: 0000000000000001 [12301.899295] RBP: 00005612a1abf7c0 R08: 000000000000000a R09: 00005612a1c46730 [12301.899301] R10: 000000000000000a R11: 0000000000000246 R12: 000000000000000d [12301.899308] R13: 0000000000000001 R14: 00007f452af4a740 R15: 000000000000000d Looking around I've noticed that usb and i2c already handle similar recursion problems, where a sysfs file can unbind the same type of sysfs somewhere else in the hierarchy. Relevant commits are: commit 356c05d58af05d582e634b54b40050c73609617b Author: Alan Stern <stern@rowland.harvard.edu> Date: Mon May 14 13:30:03 2012 -0400 sysfs: get rid of some lockdep false positives commit e9b526fe704812364bca07edd15eadeba163ebfb Author: Alexander Sverdlin <alexander.sverdlin@nsn.com> Date: Fri May 17 14:56:35 2013 +0200 i2c: suppress lockdep warning on delete_device Implement the same trick for driver bind/unbind. v2: Put the macro into bus.c (Greg). Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Cc: Arend van Spriel <aspriel@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Bartosz Golaszewski <brgl@bgdev.pl> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Cc: Vivek Gautam <vivek.gautam@codeaurora.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26ALSA: bebob: fix model-id of unit for Apogee EnsembleTakashi Sakamoto
[ Upstream commit 644b2e97405b0b74845e1d3c2b4fe4c34858062b ] This commit fixes hard-coded model-id for an unit of Apogee Ensemble with a correct value. This unit uses DM1500 ASIC produced ArchWave AG (formerly known as BridgeCo AG). I note that this model supports three modes in the number of data channels in tx/rx streams; 8 ch pairs, 10 ch pairs, 18 ch pairs. The mode is switched by Vendor-dependent AV/C command, like: $ cd linux-firewire-utils $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0600000000 (8ch pairs) $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0601000000 (10ch pairs) $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0602000000 (18ch pairs) When switching between different mode, the unit disappears from IEEE 1394 bus, then appears on the bus with different combination of stream formats. In a mode of 18 ch pairs, available sampling rate is up to 96.0 kHz, else up to 192.0 kHz. $ ./hinawa-config-rom-printer /dev/fw1 { 'bus-info': { 'adj': False, 'bmc': True, 'chip_ID': 21474898341, 'cmc': True, 'cyc_clk_acc': 100, 'generation': 2, 'imc': True, 'isc': True, 'link_spd': 2, 'max_ROM': 1, 'max_rec': 512, 'name': '1394', 'node_vendor_ID': 987, 'pmc': False}, 'root-directory': [ ['HARDWARE_VERSION', 19], [ 'NODE_CAPABILITIES', { 'addressing': {'64': True, 'fix': True, 'prv': False}, 'misc': {'int': False, 'ms': False, 'spt': True}, 'state': { 'atn': False, 'ded': False, 'drq': True, 'elo': False, 'init': False, 'lst': True, 'off': False}, 'testing': {'bas': False, 'ext': False}}], ['VENDOR', 987], ['DESCRIPTOR', 'Apogee Electronics'], ['MODEL', 126702], ['DESCRIPTOR', 'Ensemble'], ['VERSION', 5297], [ 'UNIT', [ ['SPECIFIER_ID', 41005], ['VERSION', 65537], ['MODEL', 126702], ['DESCRIPTOR', 'Ensemble']]], [ 'DEPENDENT_INFO', [ ['SPECIFIER_ID', 2037], ['VERSION', 1], [(58, 'IMMEDIATE'), 16777159], [(59, 'IMMEDIATE'), 1048576], [(60, 'IMMEDIATE'), 16777159], [(61, 'IMMEDIATE'), 6291456]]]]} Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26clocksource/drivers/integrator-ap: Add missing of_node_put()Yangtao Li
[ Upstream commit 5eb73c831171115d3b4347e1e7124a5a35d8086c ] The function of_find_node_by_path() acquires a reference to the node returned by it and that reference needs to be dropped by its caller. integrator_ap_timer_init_of() doesn't do that. The pri_node and the sec_node are used as an identifier to compare against the current node, so we can directly drop the refcount after getting the node from the path as it is not used as pointer. By dropping the refcount right after getting it, a single variable is needed instead of two. Fix this by use a single variable and drop the refcount right after of_find_node_by_path(). Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls.Javier Barrio
[ Upstream commit 41c4f85cdac280d356df1f483000ecec4a8868be ] Commit 1fa5efe3622db58cb8c7b9a50665e9eb9a6c7e97 (ext4: Use generic helpers for quotaon and quotaoff) made possible to call quotactl(Q_XQUOTAON/OFF) on ext4 filesystems with sysfile quota support. This leads to calling dquot_enable/disable without s_umount held in excl. mode, because quotactl_cmd_onoff checks only for Q_QUOTAON/OFF. The following WARN_ON_ONCE triggers (in this case for dquot_enable, ext4, latest Linus' tree): [ 117.807056] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: quota,prjquota [...] [ 155.036847] WARNING: CPU: 0 PID: 2343 at fs/quota/dquot.c:2469 dquot_enable+0x34/0xb9 [ 155.036851] Modules linked in: quota_v2 quota_tree ipv6 af_packet joydev mousedev psmouse serio_raw pcspkr i2c_piix4 intel_agp intel_gtt e1000 ttm drm_kms_helper drm agpgart fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_core input_leds kvm_intel kvm irqbypass qemu_fw_cfg floppy evdev parport_pc parport button crc32c_generic dm_mod ata_generic pata_acpi ata_piix libata loop ext4 crc16 mbcache jbd2 usb_storage usbcore sd_mod scsi_mod [ 155.036901] CPU: 0 PID: 2343 Comm: qctl Not tainted 4.20.0-rc6-00025-gf5d582777bcb #9 [ 155.036903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 155.036911] RIP: 0010:dquot_enable+0x34/0xb9 [ 155.036915] Code: 41 56 41 55 41 54 55 53 4c 8b 6f 28 74 02 0f 0b 4d 8d 7d 70 49 89 fc 89 cb 41 89 d6 89 f5 4c 89 ff e8 23 09 ea ff 85 c0 74 0a <0f> 0b 4c 89 ff e8 8b 09 ea ff 85 db 74 6a 41 8b b5 f8 00 00 00 0f [ 155.036918] RSP: 0018:ffffb09b00493e08 EFLAGS: 00010202 [ 155.036922] RAX: 0000000000000001 RBX: 0000000000000008 RCX: 0000000000000008 [ 155.036924] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9781b67cd870 [ 155.036926] RBP: 0000000000000002 R08: 0000000000000000 R09: 61c8864680b583eb [ 155.036929] R10: ffffb09b00493e48 R11: ffffffffff7ce7d4 R12: ffff9781b7ee8d78 [ 155.036932] R13: ffff9781b67cd800 R14: 0000000000000004 R15: ffff9781b67cd870 [ 155.036936] FS: 00007fd813250b88(0000) GS:ffff9781ba000000(0000) knlGS:0000000000000000 [ 155.036939] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 155.036942] CR2: 00007fd812ff61d6 CR3: 000000007c882000 CR4: 00000000000006b0 [ 155.036951] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 155.036953] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 155.036955] Call Trace: [ 155.037004] dquot_quota_enable+0x8b/0xd0 [ 155.037011] kernel_quotactl+0x628/0x74e [ 155.037027] ? do_mprotect_pkey+0x2a6/0x2cd [ 155.037034] __x64_sys_quotactl+0x1a/0x1d [ 155.037041] do_syscall_64+0x55/0xe4 [ 155.037078] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 155.037105] RIP: 0033:0x7fd812fe1198 [ 155.037109] Code: 02 77 0d 48 89 c1 48 c1 e9 3f 75 04 48 8b 04 24 48 83 c4 50 5b c3 48 83 ec 08 49 89 ca 48 63 d2 48 63 ff b8 b3 00 00 00 0f 05 <48> 89 c7 e8 c1 eb ff ff 5a c3 48 63 ff b8 bb 00 00 00 0f 05 48 89 [ 155.037112] RSP: 002b:00007ffe8cd7b050 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3 [ 155.037116] RAX: ffffffffffffffda RBX: 00007ffe8cd7b148 RCX: 00007fd812fe1198 [ 155.037119] RDX: 0000000000000000 RSI: 00007ffe8cd7cea9 RDI: 0000000000580102 [ 155.037121] RBP: 00007ffe8cd7b0f0 R08: 000055fc8eba8a9d R09: 0000000000000000 [ 155.037124] R10: 00007ffe8cd7b074 R11: 0000000000000206 R12: 00007ffe8cd7b168 [ 155.037126] R13: 000055fc8eba8897 R14: 0000000000000000 R15: 0000000000000000 [ 155.037131] ---[ end trace 210f864257175c51 ]--- and then the syscall proceeds without s_umount locking. This patch locks the superblock ->s_umount sem. in exclusive mode for all Q_XQUOTAON/OFF quotactls too in addition to Q_QUOTAON/OFF. AFAICT, other than ext4, only xfs and ocfs2 are affected by this change. The VFS will now call in xfs_quota_* functions with s_umount held, which wasn't the case before. This looks good to me but I can not say for sure. Ext4 and ocfs2 where already beeing called with s_umount exclusive via quota_quotaon/off which is basically the same. Signed-off-by: Javier Barrio <javier.barrio.mart@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26dm snapshot: Fix excessive memory usage and workqueue stallsNikos Tsironis
[ Upstream commit 721b1d98fb517ae99ab3b757021cf81db41e67be ] kcopyd has no upper limit to the number of jobs one can allocate and issue. Under certain workloads this can lead to excessive memory usage and workqueue stalls. For example, when creating multiple dm-snapshot targets with a 4K chunk size and then writing to the origin through the page cache. Syncing the page cache causes a large number of BIOs to be issued to the dm-snapshot origin target, which itself issues an even larger (because of the BIO splitting taking place) number of kcopyd jobs. Running the following test, from the device mapper test suite [1], dmtest run --suite snapshot -n many_snapshots_of_same_volume_N , with 8 active snapshots, results in the kcopyd job slab cache growing to 10G. Depending on the available system RAM this can lead to the OOM killer killing user processes: [463.492878] kthreadd invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=1, oom_score_adj=0 [463.492894] kthreadd cpuset=/ mems_allowed=0 [463.492948] CPU: 7 PID: 2 Comm: kthreadd Not tainted 4.19.0-rc7 #3 [463.492950] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [463.492952] Call Trace: [463.492964] dump_stack+0x7d/0xbb [463.492973] dump_header+0x6b/0x2fc [463.492987] ? lockdep_hardirqs_on+0xee/0x190 [463.493012] oom_kill_process+0x302/0x370 [463.493021] out_of_memory+0x113/0x560 [463.493030] __alloc_pages_slowpath+0xf40/0x1020 [463.493055] __alloc_pages_nodemask+0x348/0x3c0 [463.493067] cache_grow_begin+0x81/0x8b0 [463.493072] ? cache_grow_begin+0x874/0x8b0 [463.493078] fallback_alloc+0x1e4/0x280 [463.493092] kmem_cache_alloc_node+0xd6/0x370 [463.493098] ? copy_process.part.31+0x1c5/0x20d0 [463.493105] copy_process.part.31+0x1c5/0x20d0 [463.493115] ? __lock_acquire+0x3cc/0x1550 [463.493121] ? __switch_to_asm+0x34/0x70 [463.493129] ? kthread_create_worker_on_cpu+0x70/0x70 [463.493135] ? finish_task_switch+0x90/0x280 [463.493165] _do_fork+0xe0/0x6d0 [463.493191] ? kthreadd+0x19f/0x220 [463.493233] kernel_thread+0x25/0x30 [463.493235] kthreadd+0x1bf/0x220 [463.493242] ? kthread_create_on_cpu+0x90/0x90 [463.493248] ret_from_fork+0x3a/0x50 [463.493279] Mem-Info: [463.493285] active_anon:20631 inactive_anon:4831 isolated_anon:0 [463.493285] active_file:80216 inactive_file:80107 isolated_file:435 [463.493285] unevictable:0 dirty:51266 writeback:109372 unstable:0 [463.493285] slab_reclaimable:31191 slab_unreclaimable:3483521 [463.493285] mapped:526 shmem:4903 pagetables:1759 bounce:0 [463.493285] free:33623 free_pcp:2392 free_cma:0 ... [463.493489] Unreclaimable slab info: [463.493513] Name Used Total [463.493522] bio-6 1028KB 1028KB [463.493525] bio-5 1028KB 1028KB [463.493528] dm_snap_pending_exception 236783KB 243789KB [463.493531] dm_exception 41KB 42KB [463.493534] bio-4 1216KB 1216KB [463.493537] bio-3 439396KB 439396KB [463.493539] kcopyd_job 6973427KB 6973427KB ... [463.494340] Out of memory: Kill process 1298 (ruby2.3) score 1 or sacrifice child [463.494673] Killed process 1298 (ruby2.3) total-vm:435740kB, anon-rss:20180kB, file-rss:4kB, shmem-rss:0kB [463.506437] oom_reaper: reaped process 1298 (ruby2.3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB Moreover, issuing a large number of kcopyd jobs results in kcopyd hogging the CPU, while processing them. As a result, processing of work items, queued for execution on the same CPU as the currently running kcopyd thread, is stalled for long periods of time, hurting performance. Running the aforementioned test we get, in dmesg, messages like the following: [67501.194592] BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 27s! [67501.195586] Showing busy workqueues and worker pools: [67501.195591] workqueue events: flags=0x0 [67501.195597] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195611] pending: cache_reap [67501.195641] workqueue mm_percpu_wq: flags=0x8 [67501.195645] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195656] pending: vmstat_update [67501.195682] workqueue kblockd: flags=0x18 [67501.195687] pwq 5: cpus=2 node=0 flags=0x0 nice=-20 active=1/256 [67501.195698] pending: blk_timeout_work [67501.195753] workqueue kcopyd: flags=0x8 [67501.195757] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195768] pending: do_work [dm_mod] [67501.195802] workqueue kcopyd: flags=0x8 [67501.195806] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195817] pending: do_work [dm_mod] [67501.195834] workqueue kcopyd: flags=0x8 [67501.195838] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195848] pending: do_work [dm_mod] [67501.195881] workqueue kcopyd: flags=0x8 [67501.195885] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195896] pending: do_work [dm_mod] [67501.195920] workqueue kcopyd: flags=0x8 [67501.195924] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=2/256 [67501.195935] in-flight: 67:do_work [dm_mod] [67501.195945] pending: do_work [dm_mod] [67501.195961] pool 8: cpus=4 node=0 flags=0x0 nice=0 hung=27s workers=3 idle: 129 23765 The root cause for these issues is the way dm-snapshot uses kcopyd. In particular, the lack of an explicit or implicit limit to the maximum number of in-flight COW jobs. The merging path is not affected because it implicitly limits the in-flight kcopyd jobs to one. Fix these issues by using a semaphore to limit the maximum number of in-flight kcopyd jobs. We grab the semaphore before allocating a new kcopyd job in start_copy() and start_full_bio() and release it after the job finishes in copy_callback(). The initial semaphore value is configurable through a module parameter, to allow fine tuning the maximum number of in-flight COW jobs. Setting this parameter to zero initializes the semaphore to INT_MAX. A default value of 2048 maximum in-flight kcopyd jobs was chosen. This value was decided experimentally as a trade-off between memory consumption, stalling the kernel's workqueues and maintaining a high enough throughput. Re-running the aforementioned test: * Workqueue stalls are eliminated * kcopyd's job slab cache uses a maximum of 130MB * The time taken by the test to write to the snapshot-origin target is reduced from 05m20.48s to 03m26.38s [1] https://github.com/jthornber/device-mapper-test-suite Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26tools lib subcmd: Don't add the kernel sources to the include pathArnaldo Carvalho de Melo
[ Upstream commit ece9804985b57e1ccd83b1fb6288520955a29d51 ] At some point we decided not to directly include kernel sources files when building tools/perf/, but when tools/lib/subcmd/ was forked from tools/perf it somehow ended up adding it via these two lines in its Makefile: CFLAGS += -I$(srctree)/include/uapi CFLAGS += -I$(srctree)/include As $(srctree) points to the kernel sources. Removing those lines and keeping just: CFLAGS += -I$(srctree)/tools/include/ Is enough to build tools/perf and tools/objtool. This fixes the build when building from the sources in environments such as the Android NDK crossbuilding from a fedora:26 system: subcmd-util.h:11:15: error: expected ',' or ';' before 'void' static inline void report(const char *prefix, const char *err, va_list params) ^ In file included from /git/perf/include/uapi/linux/stddef.h:2:0, from /git/perf/include/uapi/linux/posix_types.h:5, from /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/sys/types.h:36, from /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/unistd.h:33, from run-command.c:2: subcmd-util.h:18:17: error: '__no_instrument_function__' attribute applies only to functions The /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/sys/types.h file that includes linux/posix_types.h ends up getting the one in the kernel sources causing the breakage. Fix it. Test built tools/objtool/ too. Reported-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 4b6ab94eabe4 ("perf subcmd: Create subcmd library") Link: https://lkml.kernel.org/n/tip-5lhaoecrj12t0bqwvpiu14sm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26dm kcopyd: Fix bug causing workqueue stallsNikos Tsironis
[ Upstream commit d7e6b8dfc7bcb3f4f3a18313581f67486a725b52 ] When using kcopyd to run callbacks through dm_kcopyd_do_callback() or submitting copy jobs with a source size of 0, the jobs are pushed directly to the complete_jobs list, which could be under processing by the kcopyd thread. As a result, the kcopyd thread can continue running completed jobs indefinitely, without releasing the CPU, as long as someone keeps submitting new completed jobs through the aforementioned paths. Processing of work items, queued for execution on the same CPU as the currently running kcopyd thread, is thus stalled for excessive amounts of time, hurting performance. Running the following test, from the device mapper test suite [1], dmtest run --suite snapshot -n parallel_io_to_many_snaps_N , with 8 active snapshots, we get, in dmesg, messages like the following: [68899.948523] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 95s! [68899.949282] Showing busy workqueues and worker pools: [68899.949288] workqueue events: flags=0x0 [68899.949295] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 [68899.949306] pending: vmstat_shepherd, cache_reap [68899.949331] workqueue mm_percpu_wq: flags=0x8 [68899.949337] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949345] pending: vmstat_update [68899.949387] workqueue dm_bufio_cache: flags=0x8 [68899.949392] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [68899.949400] pending: work_fn [dm_bufio] [68899.949423] workqueue kcopyd: flags=0x8 [68899.949429] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949437] pending: do_work [dm_mod] [68899.949452] workqueue kcopyd: flags=0x8 [68899.949458] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 [68899.949466] in-flight: 13:do_work [dm_mod] [68899.949474] pending: do_work [dm_mod] [68899.949487] workqueue kcopyd: flags=0x8 [68899.949493] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949501] pending: do_work [dm_mod] [68899.949515] workqueue kcopyd: flags=0x8 [68899.949521] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949529] pending: do_work [dm_mod] [68899.949541] workqueue kcopyd: flags=0x8 [68899.949547] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949555] pending: do_work [dm_mod] [68899.949568] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=95s workers=4 idle: 27130 27223 1084 Fix this by splitting the complete_jobs list into two parts: A user facing part, named callback_jobs, and one used internally by kcopyd, retaining the name complete_jobs. dm_kcopyd_do_callback() and dispatch_job() now push their jobs to the callback_jobs list, which is spliced to the complete_jobs list once, every time the kcopyd thread wakes up. This prevents kcopyd from hogging the CPU indefinitely and causing workqueue stalls. Re-running the aforementioned test: * Workqueue stalls are eliminated * The maximum writing time among all targets is reduced from 09m37.10s to 06m04.85s and the total run time of the test is reduced from 10m43.591s to 7m19.199s [1] https://github.com/jthornber/device-mapper-test-suite Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26dm crypt: use u64 instead of sector_t to store iv_offsetAliOS system security
[ Upstream commit 8d683dcd65c037efc9fb38c696ec9b65b306e573 ] The iv_offset in the mapping table of crypt target is a 64bit number when IV algorithm is plain64, plain64be, essiv or benbi. It will be assigned to iv_offset of struct crypt_config, cc_sector of struct convert_context and iv_sector of struct dm_crypt_request. These structures members are defined as a sector_t. But sector_t is 32bit when CONFIG_LBDAF is not set in 32bit kernel. In this situation sector_t is not big enough to store the 64bit iv_offset. Here is a reproducer. Prepare test image and device (loop is automatically allocated by cryptsetup): # dd if=/dev/zero of=tst.img bs=1M count=1 # echo "tst"|cryptsetup open --type plain -c aes-xts-plain64 \ --skip 500000000000000000 tst.img test On 32bit system (use IV offset value that overflows to 64bit; CONFIG_LBDAF if off) and device checksum is wrong: # dmsetup table test --showkeys 0 2048 crypt aes-xts-plain64 dfa7cfe3c481f2239155739c42e539ae8f2d38f304dcc89d20b26f69daaf0933 3551657984 7:0 0 # sha256sum /dev/mapper/test 533e25c09176632b3794f35303488c4a8f3f965dffffa6ec2df347c168cb6c19 /dev/mapper/test On 64bit system (and on 32bit system with the patch), table and checksum is now correct: # dmsetup table test --showkeys 0 2048 crypt aes-xts-plain64 dfa7cfe3c481f2239155739c42e539ae8f2d38f304dcc89d20b26f69daaf0933 500000000000000000 7:0 0 # sha256sum /dev/mapper/test 5d16160f9d5f8c33d8051e65fdb4f003cc31cd652b5abb08f03aa6fce0df75fc /dev/mapper/test Signed-off-by: AliOS system security <alios_sys_security@linux.alibaba.com> Tested-and-Reviewed-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is setTaehee Yoo
[ Upstream commit 06aa151ad1fc74a49b45336672515774a678d78d ] If same destination IP address config is already existing, that config is just used. MAC address also should be same. However, there is no MAC address checking routine. So that MAC address checking routine is added. test commands: %iptables -A INPUT -p tcp -i lo -d --dport 80 \ -j CLUSTERIP --new --hashmode sourceip \ --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1 %iptables -A INPUT -p tcp -i lo -d --dport 80 \ -j CLUSTERIP --new --hashmode sourceip \ --clustermac 01:00:5e:00:00:21 --total-nodes 2 --local-node 1 After this patch, above commands are disallowed. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26perf parse-events: Fix unchecked usage of strncpy()Arnaldo Carvalho de Melo
[ Upstream commit bd8d57fb7e25e9fcf67a9eef5fa13aabe2016e07 ] The strncpy() function may leave the destination string buffer unterminated, better use strlcpy() that we have a __weak fallback implementation for systems without it. This fixes this warning on an Alpine Linux Edge system with gcc 8.2: util/parse-events.c: In function 'print_symbol_events': util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy(name, syms->symbol, MAX_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'print_symbol_events.constprop', inlined from 'print_events' at util/parse-events.c:2508:2: util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy(name, syms->symbol, MAX_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'print_symbol_events.constprop', inlined from 'print_events' at util/parse-events.c:2511:2: util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy(name, syms->symbol, MAX_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 947b4ad1d198 ("perf list: Fix max event string size") Link: https://lkml.kernel.org/n/tip-b663e33bm6x8hrkie4uxh7u2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26perf svghelper: Fix unchecked usage of strncpy()Arnaldo Carvalho de Melo
[ Upstream commit 2f5302533f306d5ee87bd375aef9ca35b91762cb ] The strncpy() function may leave the destination string buffer unterminated, better use strlcpy() that we have a __weak fallback implementation for systems without it. In this specific case this would only happen if fgets() was buggy, as its man page states that it should read one less byte than the size of the destination buffer, so that it can put the nul byte at the end of it, so it would never copy 255 non-nul chars, as fgets reads into the orig buffer at most 254 non-nul chars and terminates it. But lets just switch to strlcpy to keep the original intent and silence the gcc 8.2 warning. This fixes this warning on an Alpine Linux Edge system with gcc 8.2: In function 'cpu_model', inlined from 'svg_cpu_box' at util/svghelper.c:378:2: util/svghelper.c:337:5: error: 'strncpy' output may be truncated copying 255 bytes from a string of length 255 [-Werror=stringop-truncation] strncpy(cpu_m, &buf[13], 255); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Fixes: f48d55ce7871 ("perf: Add a SVG helper library file") Link: https://lkml.kernel.org/n/tip-xzkoo0gyr56gej39ltivuh9g@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26perf intel-pt: Fix error with config term "pt=0"Adrian Hunter
[ Upstream commit 1c6f709b9f96366cc47af23c05ecec9b8c0c392d ] Users should never use 'pt=0', but if they do it may give a meaningless error: $ perf record -e intel_pt/pt=0/u uname Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (intel_pt/pt=0/u). Fix that by forcing 'pt=1'. Committer testing: # perf record -e intel_pt/pt=0/u uname Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (intel_pt/pt=0/u). /bin/dmesg | grep -i perf may provide additional information. # perf record -e intel_pt/pt=0/u uname pt=0 doesn't make sense, forcing pt=1 Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.020 MB perf.data ] # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/b7c5b4e5-9497-10e5-fd43-5f3e4a0fe51d@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26tty/serial: do not free trasnmit buffer page under port lockSergey Senozhatsky
[ Upstream commit d72402145ace0697a6a9e8e75a3de5bf3375f78d ] LKP has hit yet another circular locking dependency between uart console drivers and debugobjects [1]: CPU0 CPU1 rhltable_init() __init_work() debug_object_init uart_shutdown() /* db->lock */ /* uart_port->lock */ debug_print_object() free_page() printk() call_console_drivers() debug_check_no_obj_freed() /* uart_port->lock */ /* db->lock */ debug_print_object() So there are two dependency chains: uart_port->lock -> db->lock And db->lock -> uart_port->lock This particular circular locking dependency can be addressed in several ways: a) One way would be to move debug_print_object() out of db->lock scope and, thus, break the db->lock -> uart_port->lock chain. b) Another one would be to free() transmit buffer page out of db->lock in UART code; which is what this patch does. It makes sense to apply a) and b) independently: there are too many things going on behind free(), none of which depend on uart_port->lock. The patch fixes transmit buffer page free() in uart_shutdown() and, additionally, in uart_port_startup() (as was suggested by Dmitry Safonov). [1] https://lore.kernel.org/lkml/20181211091154.GL23332@shao2-debian/T/#u Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Waiman Long <longman@redhat.com> Cc: Dmitry Safonov <dima@arista.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26btrfs: improve error handling of btrfs_add_linkJohannes Thumshirn
[ Upstream commit 1690dd41e0cb1dade80850ed8a3eb0121b96d22f ] In the error handling block, err holds the return value of either btrfs_del_root_ref() or btrfs_del_inode_ref() but it hasn't been checked since it's introduction with commit fe66a05a0679 (Btrfs: improve error handling for btrfs_insert_dir_item callers) in 2012. If the error handling in the error handling fails, there's not much left to do and the abort either happened earlier in the callees or is necessary here. So if one of btrfs_del_root_ref() or btrfs_del_inode_ref() failed, abort the transaction, but still return the original code of the failure stored in 'ret' as this will be reported to the user. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26mmc: atmel-mci: do not assume idle after atmci_request_endJonas Danielsson
[ Upstream commit ae460c115b7aa50c9a36cf78fced07b27962c9d0 ] On our AT91SAM9260 board we use the same sdio bus for wifi and for the sd card slot. This caused the atmel-mci to give the following splat on the serial console: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 538 at drivers/mmc/host/atmel-mci.c:859 atmci_send_command+0x24/0x44 Modules linked in: CPU: 0 PID: 538 Comm: mmcqd/0 Not tainted 4.14.76 #14 Hardware name: Atmel AT91SAM9 [<c000fccc>] (unwind_backtrace) from [<c000d3dc>] (show_stack+0x10/0x14) [<c000d3dc>] (show_stack) from [<c0017644>] (__warn+0xd8/0xf4) [<c0017644>] (__warn) from [<c0017704>] (warn_slowpath_null+0x1c/0x24) [<c0017704>] (warn_slowpath_null) from [<c033bb9c>] (atmci_send_command+0x24/0x44) [<c033bb9c>] (atmci_send_command) from [<c033e984>] (atmci_start_request+0x1f4/0x2dc) [<c033e984>] (atmci_start_request) from [<c033f3b4>] (atmci_request+0xf0/0x164) [<c033f3b4>] (atmci_request) from [<c0327108>] (mmc_start_request+0x280/0x2d0) [<c0327108>] (mmc_start_request) from [<c032800c>] (mmc_start_areq+0x230/0x330) [<c032800c>] (mmc_start_areq) from [<c03366f8>] (mmc_blk_issue_rw_rq+0xc4/0x310) [<c03366f8>] (mmc_blk_issue_rw_rq) from [<c03372c4>] (mmc_blk_issue_rq+0x118/0x5ac) [<c03372c4>] (mmc_blk_issue_rq) from [<c033781c>] (mmc_queue_thread+0xc4/0x118) [<c033781c>] (mmc_queue_thread) from [<c002daf8>] (kthread+0x100/0x118) [<c002daf8>] (kthread) from [<c000a580>] (ret_from_fork+0x14/0x34) ---[ end trace 594371ddfa284bd6 ]--- This is: WARN_ON(host->cmd); This was fixed on our board by letting atmci_request_end determine what state we are in. Instead of unconditionally setting it to STATE_IDLE on STATE_END_REQUEST. Signed-off-by: Jonas Danielsson <jonas@orbital-systems.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26kconfig: fix memory leak when EOF is encountered in quotationMasahiro Yamada
[ Upstream commit fbac5977d81cb2b2b7e37b11c459055d9585273c ] An unterminated string literal followed by new line is passed to the parser (with "multi-line strings not supported" warning shown), then handled properly there. On the other hand, an unterminated string literal at end of file is never passed to the parser, then results in memory leak. [Test Code] ----------(Kconfig begin)---------- source "Kconfig.inc" config A bool "a" -----------(Kconfig end)----------- --------(Kconfig.inc begin)-------- config B bool "b\No new line at end of file ---------(Kconfig.inc end)--------- [Summary from Valgrind] Before the fix: LEAK SUMMARY: definitely lost: 16 bytes in 1 blocks ... After the fix: LEAK SUMMARY: definitely lost: 0 bytes in 0 blocks ... Eliminate the memory leak path by handling this case. Of course, such a Kconfig file is wrong already, so I will add an error message later. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26kconfig: fix file name and line number of warn_ignored_character()Masahiro Yamada
[ Upstream commit 77c1c0fa8b1477c5799bdad65026ea5ff676da44 ] Currently, warn_ignore_character() displays invalid file name and line number. The lexer should use current_file->name and yylineno, while the parser should use zconf_curname() and zconf_lineno(). This difference comes from that the lexer is always going ahead of the parser. The parser needs to look ahead one token to make a shift/reduce decision, so the lexer is requested to scan more text from the input file. This commit fixes the warning message from warn_ignored_character(). [Test Code] ----(Kconfig begin)---- / -----(Kconfig end)----- [Output] Before the fix: <none>:0:warning: ignoring unsupported character '/' After the fix: Kconfig:1:warning: ignoring unsupported character '/' Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26arm64: Fix minor issues with the dcache_by_line_op macroWill Deacon
[ Upstream commit 33309ecda0070506c49182530abe7728850ebe78 ] The dcache_by_line_op macro suffers from a couple of small problems: First, the GAS directives that are currently being used rely on assembler behavior that is not documented, and probably not guaranteed to produce the correct behavior going forward. As a result, we end up with some undefined symbols in cache.o: $ nm arch/arm64/mm/cache.o ... U civac ... U cvac U cvap U cvau This is due to the fact that the comparisons used to select the operation type in the dcache_by_line_op macro are comparing symbols not strings, and even though it seems that GAS is doing the right thing here (undefined symbols by the same name are equal to each other), it seems unwise to rely on this. Second, when patching in a DC CVAP instruction on CPUs that support it, the fallback path consists of a DC CVAU instruction which may be affected by CPU errata that require ARM64_WORKAROUND_CLEAN_CACHE. Solve these issues by unrolling the various maintenance routines and using the conditional directives that are documented as operating on strings. To avoid the complexity of nested alternatives, we move the DC CVAP patching to __clean_dcache_area_pop, falling back to a branch to __clean_dcache_area_poc if DCPOP is not supported by the CPU. Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26clk: imx6q: reset exclusive gates on initLucas Stach
[ Upstream commit f7542d817733f461258fd3a47d77da35b2d9fc81 ] The exclusive gates may be set up in the wrong way by software running before the clock driver comes up. In that case the exclusive setup is locked in its initial state, as the complementary function can't be activated without disabling the initial setup first. To avoid this lock situation, reset the exclusive gates to the off state and allow the kernel to provide the proper setup. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Dong Aisheng <Aisheng.dong@nxp.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26selftests: do not macro-expand failed assertion expressionsDmitry V. Levin
[ Upstream commit b708a3cc9600390ccaa2b68a88087dd265154b2b ] I've stumbled over the current macro-expand behaviour of the test harness: $ gcc -Wall -xc - <<'__EOF__' TEST(macro) { int status = 0; ASSERT_TRUE(WIFSIGNALED(status)); } TEST_HARNESS_MAIN __EOF__ $ ./a.out [==========] Running 1 tests from 1 test cases. [ RUN ] global.macro <stdin>:4:global.macro:Expected 0 (0) != (((signed char) (((status) & 0x7f) + 1) >> 1) > 0) (0) global.macro: Test terminated by assertion [ FAIL ] global.macro [==========] 0 / 1 tests passed. [ FAILED ] With this change the output of the same test looks much more comprehensible: [==========] Running 1 tests from 1 test cases. [ RUN ] global.macro <stdin>:4:global.macro:Expected 0 (0) != WIFSIGNALED(status) (0) global.macro: Test terminated by assertion [ FAIL ] global.macro [==========] 0 / 1 tests passed. [ FAILED ] The issue is very similar to the bug fixed in glibc assert(3) three years ago: https://sourceware.org/bugzilla/show_bug.cgi?id=18604 Cc: Shuah Khan <shuah@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: linux-kselftest@vger.kernel.org Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <shuah@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26scsi: target: use consistent left-aligned ASCII INQUIRY dataDavid Disseldorp
[ Upstream commit 0de263577de5d5e052be5f4f93334e63cc8a7f0b ] spc5r17.pdf specifies: 4.3.1 ASCII data field requirements ASCII data fields shall contain only ASCII printable characters (i.e., code values 20h to 7Eh) and may be terminated with one or more ASCII null (00h) characters. ASCII data fields described as being left-aligned shall have any unused bytes at the end of the field (i.e., highest offset) and the unused bytes shall be filled with ASCII space characters (20h). LIO currently space-pads the T10 VENDOR IDENTIFICATION and PRODUCT IDENTIFICATION fields in the standard INQUIRY data. However, the PRODUCT REVISION LEVEL field in the standard INQUIRY data as well as the T10 VENDOR IDENTIFICATION field in the INQUIRY Device Identification VPD Page are zero-terminated/zero-padded. Fix this inconsistency by using space-padding for all of the above fields. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bryant G. Ly <bly@catalogicsoftware.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26net: call sk_dst_reset when set SO_DONTROUTEyupeng
[ Upstream commit 0fbe82e628c817e292ff588cd5847fc935e025f2 ] after set SO_DONTROUTE to 1, the IP layer should not route packets if the dest IP address is not in link scope. But if the socket has cached the dst_entry, such packets would be routed until the sk_dst_cache expires. So we should clean the sk_dst_cache when a user set SO_DONTROUTE option. Below are server/client python scripts which could reprodue this issue: server side code: ========================================================================== import socket import struct import time s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.bind(('', 9000)) s.listen(1) sock, addr = s.accept() sock.setsockopt(socket.SOL_SOCKET, socket.SO_DONTROUTE, struct.pack('i', 1)) while True: sock.send(b'foo') time.sleep(1) ========================================================================== client side code: ========================================================================== import socket import time s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.connect(('server_address', 9000)) while True: data = s.recv(1024) print(data) ========================================================================== Signed-off-by: yupeng <yupeng0921@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26media: venus: core: Set dma maximum segment sizeVivek Gautam
[ Upstream commit de2563bce7a157f5296bab94f3843d7d64fb14b4 ] Turning on CONFIG_DMA_API_DEBUG_SG results in the following error: [ 460.308650] ------------[ cut here ]------------ [ 460.313490] qcom-venus aa00000.video-codec: DMA-API: mapping sg segment longer than device claims to support [len=4194304] [max=65536] [ 460.326017] WARNING: CPU: 3 PID: 3555 at src/kernel/dma/debug.c:1301 debug_dma_map_sg+0x174/0x254 [ 460.338888] Modules linked in: venus_dec venus_enc videobuf2_dma_sg videobuf2_memops hci_uart btqca bluetooth venus_core v4l2_mem2mem videobuf2_v4l2 videobuf2_common ath10k_snoc ath10k_core ath lzo lzo_compress zramjoydev [ 460.375811] CPU: 3 PID: 3555 Comm: V4L2DecoderThre Tainted: G W 4.19.1 #82 [ 460.384223] Hardware name: Google Cheza (rev1) (DT) [ 460.389251] pstate: 60400009 (nZCv daif +PAN -UAO) [ 460.394191] pc : debug_dma_map_sg+0x174/0x254 [ 460.398680] lr : debug_dma_map_sg+0x174/0x254 [ 460.403162] sp : ffffff80200c37d0 [ 460.406583] x29: ffffff80200c3830 x28: 0000000000010000 [ 460.412056] x27: 00000000ffffffff x26: ffffffc0f785ea80 [ 460.417532] x25: 0000000000000000 x24: ffffffc0f4ea1290 [ 460.423001] x23: ffffffc09e700300 x22: ffffffc0f4ea1290 [ 460.428470] x21: ffffff8009037000 x20: 0000000000000001 [ 460.433936] x19: ffffff80091b0000 x18: 0000000000000000 [ 460.439411] x17: 0000000000000000 x16: 000000000000f251 [ 460.444885] x15: 0000000000000006 x14: 0720072007200720 [ 460.450354] x13: ffffff800af536e0 x12: 0000000000000000 [ 460.455822] x11: 0000000000000000 x10: 0000000000000000 [ 460.461288] x9 : 537944d9c6c48d00 x8 : 537944d9c6c48d00 [ 460.466758] x7 : 0000000000000000 x6 : ffffffc0f8d98f80 [ 460.472230] x5 : 0000000000000000 x4 : 0000000000000000 [ 460.477703] x3 : 000000000000008a x2 : ffffffc0fdb13948 [ 460.483170] x1 : ffffffc0fdb0b0b0 x0 : 000000000000007a [ 460.488640] Call trace: [ 460.491165] debug_dma_map_sg+0x174/0x254 [ 460.495307] vb2_dma_sg_alloc+0x260/0x2dc [videobuf2_dma_sg] [ 460.501150] __vb2_queue_alloc+0x164/0x374 [videobuf2_common] [ 460.507076] vb2_core_reqbufs+0xfc/0x23c [videobuf2_common] [ 460.512815] vb2_reqbufs+0x44/0x5c [videobuf2_v4l2] [ 460.517853] v4l2_m2m_reqbufs+0x44/0x78 [v4l2_mem2mem] [ 460.523144] v4l2_m2m_ioctl_reqbufs+0x1c/0x28 [v4l2_mem2mem] [ 460.528976] v4l_reqbufs+0x30/0x40 [ 460.532480] __video_do_ioctl+0x36c/0x454 [ 460.536610] video_usercopy+0x25c/0x51c [ 460.540572] video_ioctl2+0x38/0x48 [ 460.544176] v4l2_ioctl+0x60/0x74 [ 460.547602] do_video_ioctl+0x948/0x3520 [ 460.551648] v4l2_compat_ioctl32+0x60/0x98 [ 460.555872] __arm64_compat_sys_ioctl+0x134/0x20c [ 460.560718] el0_svc_common+0x9c/0xe4 [ 460.564498] el0_svc_compat_handler+0x2c/0x38 [ 460.568982] el0_svc_compat+0x8/0x18 [ 460.572672] ---[ end trace ce209b87b2f3af88 ]--- >From above warning one would deduce that the sg segment will overflow the device's capacity. In reality, the hardware can accommodate larger sg segments. So, initialize the max segment size properly to weed out this warning. Based on a similar patch sent by Sean Paul for mdss: https://patchwork.kernel.org/patch/10671457/ Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> Acked-by: Stanimir Varbanov <stanimir.varbanov@linaro.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26media: firewire: Fix app_info parameter type in avc_ca{,_app}_infoNathan Chancellor
[ Upstream commit b2e9a4eda11fd2cb1e6714e9ad3f455c402568ff ] Clang warns: drivers/media/firewire/firedtv-avc.c:999:45: warning: implicit conversion from 'int' to 'char' changes value from 159 to -97 [-Wconstant-conversion] app_info[0] = (EN50221_TAG_APP_INFO >> 16) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ drivers/media/firewire/firedtv-avc.c:1000:45: warning: implicit conversion from 'int' to 'char' changes value from 128 to -128 [-Wconstant-conversion] app_info[1] = (EN50221_TAG_APP_INFO >> 8) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ drivers/media/firewire/firedtv-avc.c:1040:44: warning: implicit conversion from 'int' to 'char' changes value from 159 to -97 [-Wconstant-conversion] app_info[0] = (EN50221_TAG_CA_INFO >> 16) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ drivers/media/firewire/firedtv-avc.c:1041:44: warning: implicit conversion from 'int' to 'char' changes value from 128 to -128 [-Wconstant-conversion] app_info[1] = (EN50221_TAG_CA_INFO >> 8) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ 4 warnings generated. Change app_info's type to unsigned char to match the type of the member msg in struct ca_msg, which is the only thing passed into the app_info parameter in this function. Link: https://github.com/ClangBuiltLinux/linux/issues/105 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26powerpc/pseries/cpuidle: Fix preempt warningBreno Leitao
[ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ] When booting a pseries kernel with PREEMPT enabled, it dumps the following warning: BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is pseries_processor_idle_init+0x5c/0x22c CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828 Call Trace: [c000000429437ab0] [c0000000009c8878] dump_stack+0xec/0x164 (unreliable) [c000000429437b00] [c0000000005f2f24] check_preemption_disabled+0x154/0x160 [c000000429437b90] [c000000000cab8e8] pseries_processor_idle_init+0x5c/0x22c [c000000429437c10] [c000000000010ed4] do_one_initcall+0x64/0x300 [c000000429437ce0] [c000000000c54500] kernel_init_freeable+0x3f0/0x500 [c000000429437db0] [c0000000000112dc] kernel_init+0x2c/0x160 [c000000429437e20] [c00000000000c1d0] ret_from_kernel_thread+0x5c/0x6c This happens because the code calls get_lppaca() which calls get_paca() and it checks if preemption is disabled through check_preemption_disabled(). Preemption should be disabled because the per CPU variable may make no sense if there is a preemption (and a CPU switch) after it reads the per CPU data and when it is used. In this device driver specifically, it is not a problem, because this code just needs to have access to one lppaca struct, and it does not matter if it is the current per CPU lppaca struct or not (i.e. when there is a preemption and a CPU migration). That said, the most appropriate fix seems to be related to avoiding the debug_smp_processor_id() call at get_paca(), instead of calling preempt_disable() before get_paca(). Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26powerpc/xmon: Fix invocation inside lock regionBreno Leitao
[ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ] Currently xmon needs to get devtree_lock (through rtas_token()) during its invocation (at crash time). If there is a crash while devtree_lock is being held, then xmon tries to get the lock but spins forever and never get into the interactive debugger, as in the following case: int *ptr = NULL; raw_spin_lock_irqsave(&devtree_lock, flags); *ptr = 0xdeadbeef; This patch avoids calling rtas_token(), thus trying to get the same lock, at crash time. This new mechanism proposes getting the token at initialization time (xmon_init()) and just consuming it at crash time. This would allow xmon to be possible invoked independent of devtree_lock being held or not. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26pstore/ram: Do not treat empty buffers as validJoel Fernandes (Google)
[ Upstream commit 30696378f68a9e3dad6bfe55938b112e72af00c2 ] The ramoops backend currently calls persistent_ram_save_old() even if a buffer is empty. While this appears to work, it is does not seem like the right thing to do and could lead to future bugs so lets avoid that. It also prevents misleading prints in the logs which claim the buffer is valid. I got something like: found existing buffer, size 0, start 0 When I was expecting: no valid data in buffer (sig = ...) This bails out early (and reports with pr_debug()), since it's an acceptable state. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26clk: imx: make mux parent strings constA.s. Dong
[ Upstream commit 9e5ef7a57ca75a1b9411c46caeeb6881124284a3 ] As the commit 2893c379461a ("clk: make strings in parent name arrays const"), let's make the parent strings const, otherwise we may meet the following warning when compiling: drivers/clk/imx/clk-imx7ulp.c: In function 'imx7ulp_clocks_init': drivers/clk/imx/clk-imx7ulp.c:73:35: warning: passing argument 5 of 'imx_clk_mux_flags' discards 'const' qualifier from pointer target type clks[IMX7ULP_CLK_APLL_PRE_SEL] = imx_clk_mux_flags("apll_pre_sel", base + 0x508, 0, 1, pll_pre_sels, ARRAY_SIZE(pll_pre_sels), CLK_SET_PARENT_GATE); ^ In file included from drivers/clk/imx/clk-imx7ulp.c:23:0: drivers/clk/imx/clk.h:200:27: note: expected 'const char **' but argument is of type 'const char * const*' ... Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Michael Turquette <mturquette@baylibre.com> Cc: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26jffs2: Fix use of uninitialized delayed_work, lockdep breakageDaniel Santos
[ Upstream commit a788c5272769ddbcdbab297cf386413eeac04463 ] jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER is defined then a write buffer is available and has been initialized. However, this does is not the case when the mtd device has no out-of-band buffer: int jffs2_nand_flash_setup(struct jffs2_sb_info *c) { if (!c->mtd->oobsize) return 0; ... The resulting call to cancel_delayed_work_sync passing a uninitialized (but zeroed) delayed_work struct forces lockdep to become disabled. [ 90.050639] overlayfs: upper fs does not support tmpfile. [ 90.652264] INFO: trying to register non-static key. [ 90.662171] the code is fine but needs lockdep annotation. [ 90.673090] turning off the locking correctness validator. [ 90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0 [ 90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7 [ 90.713349] 80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff [ 90.730020] 00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420 [ 90.746690] 6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90 [ 90.763362] 80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000 [ 90.780033] ... [ 90.784902] Call Trace: [ 90.789793] [<8000f0d8>] show_stack+0xb8/0x148 [ 90.798659] [<8005a000>] register_lock_class+0x270/0x55c [ 90.809247] [<8005cb64>] __lock_acquire+0x13c/0xf7c [ 90.818964] [<8005e314>] lock_acquire+0x194/0x1dc [ 90.828345] [<8003f27c>] flush_work+0x200/0x24c [ 90.837374] [<80041dfc>] __cancel_work_timer+0x158/0x210 [ 90.847958] [<801a8770>] jffs2_sync_fs+0x20/0x54 [ 90.857173] [<80125cf4>] iterate_supers+0xf4/0x120 [ 90.866729] [<80158fc4>] sys_sync+0x44/0x9c [ 90.875067] [<80014424>] syscall_common+0x34/0x58 Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Reviewed-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26rxe: IB_WR_REG_MR does not capture MR's iova fieldChuck Lever
[ Upstream commit b024dd0eba6e6d568f69d63c5e3153aba94c23e3 ] FRWR memory registration is done with a series of calls and WRs. 1. ULP invokes ib_dma_map_sg() 2. ULP invokes ib_map_mr_sg() 3. ULP posts an IB_WR_REG_MR on the Send queue Step 2 generates an iova. It is permissible for ULPs to change this iova (with certain restrictions) between steps 2 and 3. rxe_map_mr_sg captures the MR's iova but later when rxe processes the REG_MR WR, it ignores the MR's iova field. If a ULP alters the MR's iova after step 2 but before step 3, rxe never captures that change. When the remote sends an RDMA Read targeting that MR, rxe looks up the R_key, but the altered iova does not match the iova stored in the MR, causing the RDMA Read request to fail. Reported-by: Anna Schumaker <schumaker.anna@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26selinux: always allow mounting submountsOndrej Mosnacek
[ Upstream commit 2cbdcb882f97a45f7475c67ac6257bbc16277dfe ] If a superblock has the MS_SUBMOUNT flag set, we should always allow mounting it. These mounts are done automatically by the kernel either as part of mounting some parent mount (e.g. debugfs always mounts tracefs under "tracing" for compatibility) or they are mounted automatically as needed on subdirectory accesses (e.g. NFS crossmnt mounts). Since such automounts are either an implicit consequence of the parent mount (which is already checked) or they can happen during regular accesses (where it doesn't make sense to check against the current task's context), the mount permission check should be skipped for them. Without this patch, attempts to access contents of an automounted directory can cause unexpected SELinux denials. In the current kernel tree, the MS_SUBMOUNT flag is set only via vfs_submount(), which is called only from the following places: - AFS, when automounting special "symlinks" referencing other cells - CIFS, when automounting "referrals" - NFS, when automounting subtrees - debugfs, when automounting tracefs In all cases the submounts are meant to be transparent to the user and it makes sense that if mounting the master is allowed, then so should be the automounts. Note that CAP_SYS_ADMIN capability checking is already skipped for (SB_KERNMOUNT|SB_SUBMOUNT) in: - sget_userns() in fs/super.c: if (!(flags & (SB_KERNMOUNT|SB_SUBMOUNT)) && !(type->fs_flags & FS_USERNS_MOUNT) && !capable(CAP_SYS_ADMIN)) return ERR_PTR(-EPERM); - sget() in fs/super.c: /* Ensure the requestor has permissions over the target filesystem */ if (!(flags & (SB_KERNMOUNT|SB_SUBMOUNT)) && !ns_capable(user_ns, CAP_SYS_ADMIN)) return ERR_PTR(-EPERM); Verified internally on patched RHEL 7.6 with a reproducer using NFS+httpd and selinux-tesuite. Fixes: 93faccbbfa95 ("fs: Better permission checking for submounts") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org>