path: root/arch/alpha
AgeCommit message (Collapse)Author
2020-04-02mm: introduce fault_signal_pending()Peter Xu
For most architectures, we've got a quick path to detect fatal signal after a handle_mm_fault(). Introduce a helper for that quick path. It cleans the current codes a bit so we don't need to duplicate the same check across archs. More importantly, this will be an unified place that we handle the signal immediately right after an interrupted page fault, so it'll be much easier for us if we want to change the behavior of handling signals later on for all the archs. Note that currently only part of the archs are using this new helper, because some archs have their own way to handle signals. In the follow up patches, we'll try to apply this helper to all the rest of archs. Another note is that the "regs" parameter in the new helper is not used yet. It'll be used very soon. Now we kept it in this patch only to avoid touching all the archs again in the follow up patches. [peterx@redhat.com: fix sparse warnings] Link: http://lkml.kernel.org/r/20200311145921.GD479302@xz-x1 Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220155353.8676-4-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02asm-generic: make more kernel-space headers mandatoryMasahiro Yamada
Change a header to mandatory-y if both of the following are met: [1] At least one architecture (except um) specifies it as generic-y in arch/*/include/asm/Kbuild [2] Every architecture (except um) either has its own implementation (arch/*/include/asm/*.h) or specifies it as generic-y in arch/*/include/asm/Kbuild This commit was generated by the following shell script. ----------------------------------->8----------------------------------- arches=$(cd arch; ls -1 | sed -e '/Kconfig/d' -e '/um/d') tmpfile=$(mktemp) grep "^mandatory-y +=" include/asm-generic/Kbuild > $tmpfile find arch -path 'arch/*/include/asm/Kbuild' | xargs sed -n 's/^generic-y += \(.*\)/\1/p' | sort -u | while read header do mandatory=yes for arch in $arches do if ! grep -q "generic-y += $header" arch/$arch/include/asm/Kbuild && ! [ -f arch/$arch/include/asm/$header ]; then mandatory=no break fi done if [ "$mandatory" = yes ]; then echo "mandatory-y += $header" >> $tmpfile for arch in $arches do sed -i "/generic-y += $header/d" arch/$arch/include/asm/Kbuild done fi done sed -i '/^mandatory-y +=/d' include/asm-generic/Kbuild LANG=C sort $tmpfile >> include/asm-generic/Kbuild ----------------------------------->8----------------------------------- One obvious benefit is the diff stat: 25 files changed, 52 insertions(+), 557 deletions(-) It is tedious to list generic-y for each arch that needs it. So, mandatory-y works like a fallback default (by just wrapping asm-generic one) when arch does not have a specific header implementation. See the following commits: def3f7cefe4e81c296090e1722a76551142c227c a1b39bae16a62ce4aae02d958224f19316d98b24 It is tedious to convert headers one by one, so I processed by a shell script. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Link: http://lkml.kernel.org/r/20200210175452.5030-1-masahiroy@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-30Merge tag 'irq-core-2020-03-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Treewide: - Cleanup of setup_irq() which is not longer required because the memory allocator is available early. Most cleanup changes come through the various maintainer trees, so the final removal of setup_irq() is postponed towards the end of the merge window. Core: - Protection against unsafe invocation of interrupt handlers and unsafe interrupt injection including a fixup of the offending PCI/AER error injection mechanism. Invoking interrupt handlers from arbitrary contexts, i.e. outside of an actual interrupt, can cause inconsistent state on the fragile x86 interrupt affinity changing hardware trainwreck. Drivers: - Second wave of support for the new ARM GICv4.1 - Multi-instance support for Xilinx and PLIC interrupt controllers - CPU-Hotplug support for PLIC - The obligatory new driver for X1000 TCU - Enhancements, cleanups and fixes all over the place" * tag 'irq-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) unicore32: Replace setup_irq() by request_irq() sh: Replace setup_irq() by request_irq() hexagon: Replace setup_irq() by request_irq() c6x: Replace setup_irq() by request_irq() alpha: Replace setup_irq() by request_irq() irqchip/gic-v4.1: Eagerly vmap vPEs irqchip/gic-v4.1: Add VSGI property setup irqchip/gic-v4.1: Add VSGI allocation/teardown irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer irqchip/gic-v4.1: Plumb set_vcpu_affinity SGI callbacks irqchip/gic-v4.1: Plumb get/set_irqchip_state SGI callbacks irqchip/gic-v4.1: Plumb mask/unmask SGI callbacks irqchip/gic-v4.1: Add initial SGI configuration irqchip/gic-v4.1: Plumb skeletal VSGI irqchip irqchip/stm32: Retrigger both in eoi and unmask callbacks irqchip/gic-v3: Move irq_domain_update_bus_token to after checking for NULL domain irqchip/xilinx: Do not call irq_set_default_host() irqchip/xilinx: Enable generic irq multi handler irqchip/xilinx: Fill error code when irq domain registration fails irqchip/xilinx: Add support for multiple instances ...
2020-03-29alpha: Replace setup_irq() by request_irq()afzal mohammed
request_irq() is preferred over setup_irq(). Invocations of setup_irq() occur after memory allocators are ready. setup_irq() was required in older kernels as the memory allocator was not available during early boot. Hence replace setup_irq() by request_irq(). Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Matt Turner <mattst88@gmail.com> Link: https://lkml.kernel.org/r/51f8ae7da9f47a23596388141933efa2bdef317b.1585320721.git.afzal.mohd.ma@gmail.com
2020-03-27futex: arch_futex_atomic_op_inuser() calling conventions changeAl Viro
Move access_ok() in and pagefault_enable()/pagefault_disable() out. Mechanical conversion only - some instances don't really need a separate access_ok() at all (e.g. the ones only using get_user()/put_user(), or architectures where access_ok() is always true); we'll deal with that in followups. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-09Merge tag 'kbuild-v5.6-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - fix randconfig to generate a sane .config - rename hostprogs-y / always to hostprogs / always-y, which are more natual syntax. - optimize scripts/kallsyms - fix yes2modconfig and mod2yesconfig - make multiple directory targets ('make foo/ bar/') work * tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: make multiple directory targets work kconfig: Invalidate all symbols after changing to y or m. kallsyms: fix type of kallsyms_token_table[] scripts/kallsyms: change table to store (strcut sym_entry *) scripts/kallsyms: rename local variables in read_symbol() kbuild: rename hostprogs-y/always to hostprogs/always-y kbuild: fix the document to use extra-y for vmlinux.lds kconfig: fix broken dependency in randconfig-generated .config
2020-02-04proc: convert everything to "struct proc_ops"Alexey Dobriyan
The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in seq_file.h. Conversion rule is: llseek => proc_lseek unlocked_ioctl => proc_ioctl xxx => proc_xxx delete ".owner = THIS_MODULE" line [akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c] [sfr@canb.auug.org.au: fix kernel/sched/psi.c] Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04kbuild: rename hostprogs-y/always to hostprogs/always-yMasahiro Yamada
In old days, the "host-progs" syntax was used for specifying host programs. It was renamed to the current "hostprogs-y" in 2004. It is typically useful in scripts/Makefile because it allows Kbuild to selectively compile host programs based on the kernel configuration. This commit renames like follows: always -> always-y hostprogs-y -> hostprogs So, scripts/Makefile will look like this: always-$(CONFIG_BUILD_BIN2C) += ... always-$(CONFIG_KALLSYMS) += ... ... hostprogs := $(always-y) $(always-m) I think this makes more sense because a host program is always a host program, irrespective of the kernel configuration. We want to specify which ones to compile by CONFIG options, so always-y will be handier. The "always", "hostprogs-y", "hostprogs-m" will be kept for backward compatibility for a while. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-01-29Merge tag 'threads-v5.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread management updates from Christian Brauner: "Sargun Dhillon over the last cycle has worked on the pidfd_getfd() syscall. This syscall allows for the retrieval of file descriptors of a process based on its pidfd. A task needs to have ptrace_may_access() permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and Andy) on the target. One of the main use-cases is in combination with seccomp's user notification feature. As a reminder, seccomp's user notification feature was made available in v5.0. It allows a task to retrieve a file descriptor for its seccomp filter. The file descriptor is usually handed of to a more privileged supervising process. The supervisor can then listen for syscall events caught by the seccomp filter of the supervisee and perform actions in lieu of the supervisee, usually emulating syscalls. pidfd_getfd() is needed to expand its uses. There are currently two major users that wait on pidfd_getfd() and one future user: - Netflix, Sargun said, is working on a service mesh where users should be able to connect to a dns-based VIP. When a user connects to e.g. that runs e.g. service "foo" they will be redirected to an envoy process. This service mesh uses seccomp user notifications and pidfd to intercept all connect calls and instead of connecting them to connects them to e.g. - LXD uses the seccomp notifier heavily to intercept and emulate mknod() and mount() syscalls for unprivileged containers/processes. With pidfd_getfd() more uses-cases e.g. bridging socket connections will be possible. - The patchset has also seen some interest from the browser corner. Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a broker process. In the future glibc will start blocking all signals during dlopen() rendering this type of sandbox impossible. Hence, in the future Firefox will switch to a seccomp-user-nofication based sandbox which also makes use of file descriptor retrieval. The thread for this can be found at https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html With pidfd_getfd() it is e.g. possible to bridge socket connections for the supervisee (binding to a privileged port) and taking actions on file descriptors on behalf of the supervisee in general. Sargun's first version was using an ioctl on pidfds but various people pushed for it to be a proper syscall which he duely implemented as well over various review cycles. Selftests are of course included. I've also added instructions how to deal with merge conflicts below. There's also a small fix coming from the kernel mentee project to correctly annotate struct sighand_struct with __rcu to fix various sparse warnings. We've received a few more such fixes and even though they are mostly trivial I've decided to postpone them until after -rc1 since they came in rather late and I don't want to risk introducing build warnings. Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is needed to avoid allocation recursions triggerable by storage drivers that have userspace parts that run in the IO path (e.g. dm-multipath, iscsi, etc). These allocation recursions deadlock the device. The new prctl() allows such privileged userspace components to avoid allocation recursions by setting the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE flags. The patch carries the necessary acks from the relevant maintainers and is routed here as part of prctl() thread-management." * tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim sched.h: Annotate sighand_struct with __rcu test: Add test for pidfd getfd arch: wire up pidfd_getfd syscall pid: Implement pidfd_getfd syscall vfs, fdtable: Add fget_task helper
2020-01-29Merge branch 'work.openat2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull openat2 support from Al Viro: "This is the openat2() series from Aleksa Sarai. I'm afraid that the rest of namei stuff will have to wait - it got zero review the last time I'd posted #work.namei, and there had been a leak in the posted series I'd caught only last weekend. I was going to repost it on Monday, but the window opened and the odds of getting any review during that... Oh, well. Anyway, openat2 part should be ready; that _did_ get sane amount of review and public testing, so here it comes" From Aleksa's description of the series: "For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Furthermore, the need for some sort of control over VFS's path resolution (to avoid malicious paths resulting in inadvertent breakouts) has been a very long-standing desire of many userspace applications. This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset (which was a variant of David Drysdale's O_BENEATH patchset[4] which was a spin-off of the Capsicum project[5]) with a few additions and changes made based on the previous discussion within [6] as well as others I felt were useful. In line with the conclusions of the original discussion of AT_NO_JUMPS, the flag has been split up into separate flags. However, instead of being an openat(2) flag it is provided through a new syscall openat2(2) which provides several other improvements to the openat(2) interface (see the patch description for more details). The following new LOOKUP_* flags are added: LOOKUP_NO_XDEV: Blocks all mountpoint crossings (upwards, downwards, or through absolute links). Absolute pathnames alone in openat(2) do not trigger this. Magic-link traversal which implies a vfsmount jump is also blocked (though magic-link jumps on the same vfsmount are permitted). LOOKUP_NO_MAGICLINKS: Blocks resolution through /proc/$pid/fd-style links. This is done by blocking the usage of nd_jump_link() during resolution in a filesystem. The term "magic-links" is used to match with the only reference to these links in Documentation/, but I'm happy to change the name. It should be noted that this is different to the scope of ~LOOKUP_FOLLOW in that it applies to all path components. However, you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it will *not* fail (assuming that no parent component was a magic-link), and you will have an fd for the magic-link. In order to correctly detect magic-links, the introduction of a new LOOKUP_MAGICLINK_JUMPED state flag was required. LOOKUP_BENEATH: Disallows escapes to outside the starting dirfd's tree, using techniques such as ".." or absolute links. Absolute paths in openat(2) are also disallowed. Conceptually this flag is to ensure you "stay below" a certain point in the filesystem tree -- but this requires some additional to protect against various races that would allow escape using "..". Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it can trivially beam you around the filesystem (breaking the protection). In future, there might be similar safety checks done as in LOOKUP_IN_ROOT, but that requires more discussion. In addition, two new flags are added that expand on the above ideas: LOOKUP_NO_SYMLINKS: Does what it says on the tin. No symlink resolution is allowed at all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an fd for the symlink as long as no parent path had a symlink component. LOOKUP_IN_ROOT: This is an extension of LOOKUP_BENEATH that, rather than blocking attempts to move past the root, forces all such movements to be scoped to the starting point. This provides chroot(2)-like protection but without the cost of a chroot(2) for each filesystem operation, as well as being safe against race attacks that chroot(2) is not. If a race is detected (as with LOOKUP_BENEATH) then an error is generated, and similar to LOOKUP_BENEATH it is not permitted to cross magic-links with LOOKUP_IN_ROOT. The primary need for this is from container runtimes, which currently need to do symlink scoping in userspace[7] when opening paths in a potentially malicious container. There is a long list of CVEs that could have bene mitigated by having RESOLVE_THIS_ROOT (such as CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a few). In order to make all of the above more usable, I'm working on libpathrs[8] which is a C-friendly library for safe path resolution. It features a userspace-emulated backend if the kernel doesn't support openat2(2). Hopefully we can get userspace to switch to using it, and thus get openat2(2) support for free once it's ready. Future work would include implementing things like RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow programs to be sure they don't hit DoSes though stale NFS handles)" * 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Documentation: path-lookup: include new LOOKUP flags selftests: add openat2(2) selftests open: introduce openat2(2) syscall namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution namei: LOOKUP_IN_ROOT: chroot-like scoped resolution namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution namei: LOOKUP_NO_XDEV: block mountpoint crossing namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution namei: LOOKUP_NO_SYMLINKS: block symlink resolution namei: allow set_root() to produce errors namei: allow nd_jump_link() to produce errors nsfs: clean-up ns_get_path() signature to return int namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-29Merge tag 'tty-5.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here are the big set of tty and serial driver updates for 5.6-rc1 Included in here are: - dummy_con cleanups (touches lots of arch code) - sysrq logic cleanups (touches lots of serial drivers) - samsung driver fixes (wasn't really being built) - conmakeshash move to tty subdir out of scripts - lots of small tty/serial driver updates All of these have been in linux-next for a while with no reported issues" * tag 'tty-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (140 commits) tty: n_hdlc: Use flexible-array member and struct_size() helper tty: baudrate: SPARC supports few more baud rates tty: baudrate: Synchronise baud_table[] and baud_bits[] tty: serial: meson_uart: Add support for kernel debugger serial: imx: fix a race condition in receive path serial: 8250_bcm2835aux: Document struct bcm2835aux_data serial: 8250_bcm2835aux: Use generic remapping code serial: 8250_bcm2835aux: Allocate uart_8250_port on stack serial: 8250_bcm2835aux: Suppress register_port error on -EPROBE_DEFER serial: 8250_bcm2835aux: Suppress clk_get error on -EPROBE_DEFER serial: 8250_bcm2835aux: Fix line mismatch on driver unbind serial_core: Remove unused member in uart_port vt: Correct comment documenting do_take_over_console() vt: Delete comment referencing non-existent unbind_con_driver() arch/xtensa/setup: Drop dummy_con initialization arch/x86/setup: Drop dummy_con initialization arch/unicore32/setup: Drop dummy_con initialization arch/sparc/setup: Drop dummy_con initialization arch/sh/setup: Drop dummy_con initialization arch/s390/setup: Drop dummy_con initialization ...
2020-01-28Merge branch 'efi-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this cycle were: - Cleanup of the GOP [graphics output] handling code in the EFI stub - Complete refactoring of the mixed mode handling in the x86 EFI stub - Overhaul of the x86 EFI boot/runtime code - Increase robustness for mixed mode code - Add the ability to disable DMA at the root port level in the EFI stub - Get rid of RWX mappings in the EFI memory map and page tables, where possible - Move the support code for the old EFI memory mapping style into its only user, the SGI UV1+ support code. - plus misc fixes, updates, smaller cleanups. ... and due to interactions with the RWX changes, another round of PAT cleanups make a guest appearance via the EFI tree - with no side effects intended" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) efi/x86: Disable instrumentation in the EFI runtime handling code efi/libstub/x86: Fix EFI server boot failure efi/x86: Disallow efi=old_map in mixed mode x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld efi/x86: avoid KASAN false positives when accessing the 1: 1 mapping efi: Fix handling of multiple efi_fake_mem= entries efi: Fix efi_memmap_alloc() leaks efi: Add tracking for dynamically allocated memmaps efi: Add a flags parameter to efi_memory_map efi: Fix comment for efi_mem_type() wrt absent physical addresses efi/arm: Defer probe of PCIe backed efifb on DT systems efi/x86: Limit EFI old memory map to SGI UV machines efi/x86: Avoid RWX mappings for all of DRAM efi/x86: Don't map the entire kernel text RW for mixed mode x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd efi/libstub/x86: Fix unused-variable warning efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode efi/libstub/x86: Use const attribute for efi_is_64bit() efi: Allow disabling PCI busmastering on bridges during boot efi/x86: Allow translating 64-bit arguments for mixed mode calls ...
2020-01-18open: introduce openat2(2) syscallAleksa Sarai
/* Background. */ For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Userspace also has a hard time figuring out whether a particular flag is supported on a particular kernel. While it is now possible with contemporary kernels (thanks to [3]), older kernels will expose unknown flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during openat(2) time matches modern syscall designs and is far more fool-proof. In addition, the newly-added path resolution restriction LOOKUP flags (which we would like to expose to user-space) don't feel related to the pre-existing O_* flag set -- they affect all components of path lookup. We'd therefore like to add a new flag argument. Adding a new syscall allows us to finally fix the flag-ignoring problem, and we can make it extensible enough so that we will hopefully never need an openat3(2). /* Syscall Prototype. */ /* * open_how is an extensible structure (similar in interface to * clone3(2) or sched_setattr(2)). The size parameter must be set to * sizeof(struct open_how), to allow for future extensions. All future * extensions will be appended to open_how, with their zero value * acting as a no-op default. */ struct open_how { /* ... */ }; int openat2(int dfd, const char *pathname, struct open_how *how, size_t size); /* Description. */ The initial version of 'struct open_how' contains the following fields: flags Used to specify openat(2)-style flags. However, any unknown flag bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR) will result in -EINVAL. In addition, this field is 64-bits wide to allow for more O_ flags than currently permitted with openat(2). mode The file mode for O_CREAT or O_TMPFILE. Must be set to zero if flags does not contain O_CREAT or O_TMPFILE. resolve Restrict path resolution (in contrast to O_* flags they affect all path components). The current set of flags are as follows (at the moment, all of the RESOLVE_ flags are implemented as just passing the corresponding LOOKUP_ flag). RESOLVE_NO_XDEV => LOOKUP_NO_XDEV RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS RESOLVE_BENEATH => LOOKUP_BENEATH RESOLVE_IN_ROOT => LOOKUP_IN_ROOT open_how does not contain an embedded size field, because it is of little benefit (userspace can figure out the kernel open_how size at runtime fairly easily without it). It also only contains u64s (even though ->mode arguably should be a u16) to avoid having padding fields which are never used in the future. Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE is no longer permitted for openat(2). As far as I can tell, this has always been a bug and appears to not be used by userspace (and I've not seen any problems on my machines by disallowing it). If it turns out this breaks something, we can special-case it and only permit it for openat(2) but not openat2(2). After input from Florian Weimer, the new open_how and flag definitions are inside a separate header from uapi/linux/fcntl.h, to avoid problems that glibc has with importing that header. /* Testing. */ In a follow-up patch there are over 200 selftests which ensure that this syscall has the correct semantics and will correctly handle several attack scenarios. In addition, I've written a userspace library[4] which provides convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care must be taken when using RESOLVE_IN_ROOT'd file descriptors with other syscalls). During the development of this patch, I've run numerous verification tests using libpathrs (showing that the API is reasonably usable by userspace). /* Future Work. */ Additional RESOLVE_ flags have been suggested during the review period. These can be easily implemented separately (such as blocking auto-mount during resolution). Furthermore, there are some other proposed changes to the openat(2) interface (the most obvious example is magic-link hardening[5]) which would be a good opportunity to add a way for userspace to restrict how O_PATH file descriptors can be re-opened. Another possible avenue of future work would be some kind of CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace which openat2(2) flags and fields are supported by the current kernel (to avoid userspace having to go through several guesses to figure it out). [1]: https://lwn.net/Articles/588444/ [2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com [3]: commit 629e014bb834 ("fs: completely ignore unknown open flags") [4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523 [5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/ [6]: https://youtu.be/ggD-eb3yPVs Suggested-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-14arch/alpha/setup: Drop dummy_con initializationArvind Sankar
con_init in tty/vt.c will now set conswitchp to dummy_con if it's unset. Drop it from arch setup code. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20191218214506.49252-4-nivedita@alum.mit.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-13arch: wire up pidfd_getfd syscallSargun Dhillon
This wires up the pidfd_getfd syscall for all architectures. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-06remove ioremap_nocache and devm_ioremap_nocacheChristoph Hellwig
ioremap has provided non-cached semantics by default since the Linux 2.6 days, so remove the additional ioremap_nocache interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Arnd Bergmann <arnd@arndb.de>
2019-12-10mm/vmalloc: Add empty <asm/vmalloc.h> headers and use them from ↵Ingo Molnar
<linux/vmalloc.h> In the x86 MM code we'd like to untangle various types of historic header dependency spaghetti, but for this we'd need to pass to the generic vmalloc code various vmalloc related defines that customarily come via the <asm/page.h> low level arch header. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-04alpha: use pgtable-nopud instead of 4level-fixupMike Rapoport
Patch series "mm: remove __ARCH_HAS_4LEVEL_HACK", v13. These patches convert several architectures to use page table folding and remove __ARCH_HAS_4LEVEL_HACK along with include/asm-generic/4level-fixup.h. For the nommu configurations the folding is already implemented by the generic code so the only change was to use the appropriate header file. As for the rest, the changes are mostly about mechanical replacement of pgd accessors with pud/pmd ones and the addition of higher levels to page table traversals. With Vineet's patches from "elide extraneous generated code for folded p4d/pud/pmd" series [1] there is a small shrink of the kernel size of about -0.01% for the defconfig builds. This patch (of 13): It is not likely alpha will have 5-level page tables. Replace usage of include/asm-generic/4level-fixup.h and implied __ARCH_HAS_4LEVEL_HACK with include/asm-generic/pgtable-nopud.h and adjust page table manipulation macros and functions accordingly. Link: http://lkml.kernel.org/r/1572938135-31886-2-git-send-email-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Peter Rosin <peda@axentia.se> Cc: Richard Weinberger <richard@nod.at> Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Creasey <sammy@sammy.net> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Cc: Anatoly Pugachev <matorola@gmail.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-03Merge tag 'pci-v5.5-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Warn if a host bridge has no NUMA info (Yunsheng Lin) - Add PCI_STD_NUM_BARS for the number of standard BARs (Denis Efremov) Resource management: - Fix boot-time Embedded Controller GPE storm caused by incorrect resource assignment after ACPI Bus Check Notification (Mika Westerberg) - Protect pci_reassign_bridge_resources() against concurrent addition/removal (Benjamin Herrenschmidt) - Fix bridge dma_ranges resource list cleanup (Rob Herring) - Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters to control the MMIO and prefetchable MMIO window sizes of hotplug bridges independently (Nicholas Johnson) - Fix MMIO/MMIO_PREF window assignment that assigned more space than desired (Nicholas Johnson) - Only enforce bus numbers from bridge EA if the bridge has EA devices downstream (Subbaraya Sundeep) - Consolidate DT "dma-ranges" parsing and convert all host drivers to use shared parsing (Rob Herring) Error reporting: - Restore AER capability after resume (Mayurkumar Patel) - Add PoisonTLPBlocked AER counter (Rajat Jain) - Use for_each_set_bit() to simplify AER code (Andy Shevchenko) - Fix AER kernel-doc (Andy Shevchenko) - Add "pcie_ports=dpc-native" parameter to allow native use of DPC even if platform didn't grant control over AER (Olof Johansson) Hotplug: - Avoid returning prematurely from sysfs requests to enable or disable a PCIe hotplug slot (Lukas Wunner) - Don't disable interrupts twice when suspending hotplug ports (Mika Westerberg) - Fix deadlocks when PCIe ports are hot-removed while suspended (Mika Westerberg) Power management: - Remove unnecessary ASPM locking (Bjorn Helgaas) - Add support for disabling L1 PM Substates (Heiner Kallweit) - Allow re-enabling Clock PM after it has been disabled (Heiner Kallweit) - Add sysfs attributes for controlling ASPM link states (Heiner Kallweit) - Remove CONFIG_PCIEASPM_DEBUG, including "link_state" and "clk_ctl" sysfs files (Heiner Kallweit) - Avoid AMD FCH XHCI USB PME# from D0 defect that prevents wakeup on USB 2.0 or 1.1 connect events (Kai-Heng Feng) - Move power state check out of pci_msi_supported() (Bjorn Helgaas) - Fix incorrect MSI-X masking on resume and revert related nvme quirk for Kingston NVME SSD running FW E8FK11.T (Jian-Hong Pan) - Always return devices to D0 when thawing to fix hibernation with drivers like mlx4 that used legacy power management (previously we only did it for drivers with new power management ops) (Dexuan Cui) - Clear PCIe PME Status even for legacy power management (Bjorn Helgaas) - Fix PCI PM documentation errors (Bjorn Helgaas) - Use dev_printk() for more power management messages (Bjorn Helgaas) - Apply D2 delay as milliseconds, not microseconds (Bjorn Helgaas) - Convert xen-platform from legacy to generic power management (Bjorn Helgaas) - Removed unused .resume_early() and .suspend_late() legacy power management hooks (Bjorn Helgaas) - Rearrange power management code for clarity (Rafael J. Wysocki) - Decode power states more clearly ("4" or "D4" really refers to "D3cold") (Bjorn Helgaas) - Notice when reading PM Control register returns an error (~0) instead of interpreting it as being in D3hot (Bjorn Helgaas) - Add missing link delays required by the PCIe spec (Mika Westerberg) Virtualization: - Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRI (Bjorn Helgaas) - Allow VFs to use PRI (the PF PRI is shared by the VFs, but the code previously didn't recognize that) (Kuppuswamy Sathyanarayanan) - Allow VFs to use PASID (the PF PASID capability is shared by the VFs, but the code previously didn't recognize that) (Kuppuswamy Sathyanarayanan) - Disconnect PF and VF ATS enablement, since ATS in PFs and associated VFs can be enabled independently (Kuppuswamy Sathyanarayanan) - Cache PRI and PASID capability offsets (Kuppuswamy Sathyanarayanan) - Cache the PRI PRG Response PASID Required bit (Bjorn Helgaas) - Consolidate ATS declarations in linux/pci-ats.h (Krzysztof Wilczynski) - Remove unused PRI and PASID stubs (Bjorn Helgaas) - Removed unnecessary EXPORT_SYMBOL_GPL() from ATS, PRI, and PASID interfaces that are only used by built-in IOMMU drivers (Bjorn Helgaas) - Hide PRI and PASID state restoration functions used only inside the PCI core (Bjorn Helgaas) - Add a DMA alias quirk for the Intel VCA NTB (Slawomir Pawlowski) - Serialize sysfs sriov_numvfs reads vs writes (Pierre Crégut) - Update Cavium ACS quirk for ThunderX2 and ThunderX3 (George Cherian) - Fix the UPDCR register address in the Intel ACS quirk (Steffen Liebergeld) - Unify ACS quirk implementations (Bjorn Helgaas) Amlogic Meson host bridge driver: - Fix meson PERST# GPIO polarity problem (Remi Pommarel) - Add DT bindings for Amlogic Meson G12A (Neil Armstrong) - Fix meson clock names to match DT bindings (Neil Armstrong) - Add meson support for Amlogic G12A SoC with separate shared PHY (Neil Armstrong) - Add meson extended PCIe PHY functions for Amlogic G12A USB3+PCIe combo PHY (Neil Armstrong) - Add arm64 DT for Amlogic G12A PCIe controller node (Neil Armstrong) - Add commented-out description of VIM3 USB3/PCIe mux in arm64 DT (Neil Armstrong) Broadcom iProc host bridge driver: - Invalidate iProc PAXB address mapping before programming it (Abhishek Shah) - Fix iproc-msi and mvebu __iomem annotations (Ben Dooks) Cadence host bridge driver: - Refactor Cadence PCIe host controller to use as a library for both host and endpoint (Tom Joseph) Freescale Layerscape host bridge driver: - Add layerscape LS1028a support (Xiaowei Bao) Intel VMD host bridge driver: - Add VMD bus 224-255 restriction decode (Jon Derrick) - Add VMD 8086:9A0B device ID (Jon Derrick) - Remove Keith from VMD maintainer list (Keith Busch) Marvell ARMADA 3700 / Aardvark host bridge driver: - Use LTSSM state to build link training flag since Aardvark doesn't implement the Link Training bit (Remi Pommarel) - Delay before training Aardvark link in case PERST# was asserted before the driver probe (Remi Pommarel) - Fix Aardvark issues with Root Control reads and writes (Remi Pommarel) - Don't rely on jiffies in Aardvark config access path since interrupts may be disabled (Remi Pommarel) - Fix Aardvark big-endian support (Grzegorz Jaszczyk) Marvell ARMADA 370 / XP host bridge driver: - Make mvebu_pci_bridge_emul_ops static (Ben Dooks) Microsoft Hyper-V host bridge driver: - Add hibernation support for Hyper-V virtual PCI devices (Dexuan Cui) - Track Hyper-V pci_protocol_version per-hbus, not globally (Dexuan Cui) - Avoid kmemleak false positive on hv hbus buffer (Dexuan Cui) Mobiveil host bridge driver: - Change mobiveil csr_read()/write() function names that conflict with riscv arch functions (Kefeng Wang) NVIDIA Tegra host bridge driver: - Fix Tegra CLKREQ dependency programming (Vidya Sagar) Renesas R-Car host bridge driver: - Remove unnecessary header include from rcar (Andrew Murray) - Tighten register index checking for rcar inbound range programming (Marek Vasut) - Fix rcar inbound range alignment calculation to improve packing of multiple entries (Marek Vasut) - Update rcar MACCTLR setting to match documentation (Yoshihiro Shimoda) - Clear bit 0 of MACCTLR before PCIETCTLR.CFINIT per manual (Yoshihiro Shimoda) - Add Marek Vasut and Yoshihiro Shimoda as R-Car maintainers (Simon Horman) Rockchip host bridge driver: - Make rockchip 0V9 and 1V8 power regulators non-optional (Robin Murphy) Socionext UniPhier host bridge driver: - Set uniphier to host (RC) mode always (Kunihiko Hayashi) Endpoint drivers: - Fix endpoint driver sign extension problem when shifting page number to phys_addr_t (Alan Mikhak) Misc: - Add NumaChip SPDX header (Krzysztof Wilczynski) - Replace EXTRA_CFLAGS with ccflags-y (Krzysztof Wilczynski) - Remove unused includes (Krzysztof Wilczynski) - Removed unused sysfs attribute groups (Ben Dooks) - Remove PTM and ASPM dependencies on PCIEPORTBUS (Bjorn Helgaas) - Add PCIe Link Control 2 register field definitions to replace magic numbers in AMDGPU and Radeon CIK/SI (Bjorn Helgaas) - Fix incorrect Link Control 2 Transmit Margin usage in AMDGPU and Radeon CIK/SI PCIe Gen3 link training (Bjorn Helgaas) - Use pcie_capability_read_word() instead of pci_read_config_word() in AMDGPU and Radeon CIK/SI (Frederick Lawler) - Remove unused pci_irq_get_node() Greg Kroah-Hartman) - Make asm/msi.h mandatory and simplify PCI_MSI_IRQ_DOMAIN Kconfig (Palmer Dabbelt, Michal Simek) - Read all 64 bits of Switchtec part_event_bitmap (Logan Gunthorpe) - Fix erroneous intel-iommu dependency on CONFIG_AMD_IOMMU (Bjorn Helgaas) - Fix bridge emulation big-endian support (Grzegorz Jaszczyk) - Fix dwc find_next_bit() usage (Niklas Cassel) - Fix pcitest.c fd leak (Hewenliang) - Fix typos and comments (Bjorn Helgaas) - Fix Kconfig whitespace errors (Krzysztof Kozlowski)" * tag 'pci-v5.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (160 commits) PCI: Remove PCI_MSI_IRQ_DOMAIN architecture whitelist asm-generic: Make msi.h a mandatory include/asm header Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T" PCI/MSI: Fix incorrect MSI-X masking on resume PCI/MSI: Move power state check out of pci_msi_supported() PCI/MSI: Remove unused pci_irq_get_node() PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer PCI: hv: Change pci_protocol_version to per-hbus PCI: hv: Add hibernation support PCI: hv: Reorganize the code in preparation of hibernation MAINTAINERS: Remove Keith from VMD maintainer PCI/ASPM: Remove PCIEASPM_DEBUG Kconfig option and related code PCI/ASPM: Add sysfs attributes for controlling ASPM link states PCI: Fix indentation drm/radeon: Prefer pcie_capability_read_word() drm/radeon: Replace numbers with PCI_EXP_LNKCTL2 definitions drm/radeon: Correct Transmit Margin masks drm/amdgpu: Prefer pcie_capability_read_word() PCI: uniphier: Set mode register to host mode drm/amdgpu: Replace numbers with PCI_EXP_LNKCTL2 definitions ...
2019-12-01Merge tag 'y2038-cleanups-5.5' of ↵Linus Torvalds
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull y2038 cleanups from Arnd Bergmann: "y2038 syscall implementation cleanups This is a series of cleanups for the y2038 work, mostly intended for namespace cleaning: the kernel defines the traditional time_t, timeval and timespec types that often lead to y2038-unsafe code. Even though the unsafe usage is mostly gone from the kernel, having the types and associated functions around means that we can still grow new users, and that we may be missing conversions to safe types that actually matter. There are still a number of driver specific patches needed to get the last users of these types removed, those have been submitted to the respective maintainers" Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/ * tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits) y2038: alarm: fix half-second cut-off y2038: ipc: fix x32 ABI breakage y2038: fix typo in powerpc vdso "LOPART" y2038: allow disabling time32 system calls y2038: itimer: change implementation to timespec64 y2038: move itimer reset into itimer.c y2038: use compat_{get,set}_itimer on alpha y2038: itimer: compat handling to itimer.c y2038: time: avoid timespec usage in settimeofday() y2038: timerfd: Use timespec64 internally y2038: elfcore: Use __kernel_old_timeval for process times y2038: make ns_to_compat_timeval use __kernel_old_timeval y2038: socket: use __kernel_old_timespec instead of timespec y2038: socket: remove timespec reference in timestamping y2038: syscalls: change remaining timeval to __kernel_old_timeval y2038: rusage: use __kernel_old_timeval y2038: uapi: change __kernel_time_t to __kernel_old_time_t y2038: stat: avoid 'time_t' in 'struct stat' y2038: ipc: remove __kernel_time_t reference from headers y2038: vdso: powerpc: avoid timespec references ...
2019-11-28Merge tag 'ioremap-5.5' of git://git.infradead.org/users/hch/ioremapLinus Torvalds
Pull generic ioremap support from Christoph Hellwig: "This adds the remaining bits for an entirely generic ioremap and iounmap to lib/ioremap.c. To facilitate that, it cleans up the giant mess of weird ioremap variants we had with no users outside the arch code. For now just the three newest ports use the code, but there is more than a handful others that can be converted without too much work. Summary: - clean up various obsolete ioremap and iounmap variants - add a new generic ioremap implementation and switch csky, nds32 and riscv over to it" * tag 'ioremap-5.5' of git://git.infradead.org/users/hch/ioremap: (21 commits) nds32: use generic ioremap csky: use generic ioremap csky: remove ioremap_cache riscv: use the generic ioremap code lib: provide a simple generic ioremap implementation sh: remove __iounmap nios2: remove __iounmap hexagon: remove __iounmap m68k: rename __iounmap and mark it static arch: rely on asm-generic/io.h for default ioremap_* definitions asm-generic: don't provide ioremap for CONFIG_MMU asm-generic: ioremap_uc should behave the same with and without MMU xtensa: clean up ioremap x86: Clean up ioremap() parisc: remove __ioremap nios2: remove __ioremap alpha: remove the unused __ioremap wrapper hexagon: clean up ioremap ia64: rename ioremap_nocache to ioremap_uc unicore32: remove ioremap_cached ...
2019-11-26Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this cycle were: - Cross-arch changes to move the linker sections for NOTES and EXCEPTION_TABLE into the RO_DATA area, where they belong on most architectures. (Kees Cook) - Switch the x86 linker fill byte from x90 (NOP) to 0xcc (INT3), to trap jumps into the middle of those padding areas instead of sliding execution. (Kees Cook) - A thorough cleanup of symbol definitions within x86 assembler code. The rather randomly named macros got streamlined around a (hopefully) straightforward naming scheme: SYM_START(name, linkage, align...) SYM_END(name, sym_type) SYM_FUNC_START(name) SYM_FUNC_END(name) SYM_CODE_START(name) SYM_CODE_END(name) SYM_DATA_START(name) SYM_DATA_END(name) etc - with about three times of these basic primitives with some label, local symbol or attribute variant, expressed via postfixes. No change in functionality intended. (Jiri Slaby) - Misc other changes, cleanups and smaller fixes" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits) x86/entry/64: Remove pointless jump in paranoid_exit x86/entry/32: Remove unused resume_userspace label x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o m68k: Convert missed RODATA to RO_DATA x86/vmlinux: Use INT3 instead of NOP for linker fill bytes x86/mm: Report actual image regions in /proc/iomem x86/mm: Report which part of kernel image is freed x86/mm: Remove redundant address-of operators on addresses xtensa: Move EXCEPTION_TABLE to RO_DATA segment powerpc: Move EXCEPTION_TABLE to RO_DATA segment parisc: Move EXCEPTION_TABLE to RO_DATA segment microblaze: Move EXCEPTION_TABLE to RO_DATA segment ia64: Move EXCEPTION_TABLE to RO_DATA segment h8300: Move EXCEPTION_TABLE to RO_DATA segment c6x: Move EXCEPTION_TABLE to RO_DATA segment arm64: Move EXCEPTION_TABLE to RO_DATA segment alpha: Move EXCEPTION_TABLE to RO_DATA segment x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment x86/vmlinux: Actually use _etext for the end of the text segment vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA ...
2019-11-25Merge tag 'printk-for-5.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Allow to print symbolic error names via new %pe modifier. - Use pr_warn() instead of the remaining pr_warning() calls. Fix formatting of the related lines. - Add VSPRINTF entry to MAINTAINERS. * tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (32 commits) checkpatch: don't warn about new vsprintf pointer extension '%pe' MAINTAINERS: Add VSPRINTF tools lib api: Renaming pr_warning to pr_warn ASoC: samsung: Use pr_warn instead of pr_warning lib: cpu_rmap: Use pr_warn instead of pr_warning trace: Use pr_warn instead of pr_warning dma-debug: Use pr_warn instead of pr_warning vgacon: Use pr_warn instead of pr_warning fs: afs: Use pr_warn instead of pr_warning sh/intc: Use pr_warn instead of pr_warning scsi: Use pr_warn instead of pr_warning platform/x86: intel_oaktrail: Use pr_warn instead of pr_warning platform/x86: asus-laptop: Use pr_warn instead of pr_warning platform/x86: eeepc-laptop: Use pr_warn instead of pr_warning oprofile: Use pr_warn instead of pr_warning of: Use pr_warn instead of pr_warning macintosh: Use pr_warn instead of pr_warning idsn: Use pr_warn instead of pr_warning ide: Use pr_warn instead of pr_warning crypto: n2: Use pr_warn instead of pr_warning ...
2019-11-15y2038: use compat_{get,set}_itimer on alphaArnd Bergmann
The itimer handling for the old alpha osf_setitimer/osf_getitimer system calls is identical to the compat version of getitimer/setitimer, so just use those directly. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-15y2038: rusage: use __kernel_old_timevalArnd Bergmann
There are two 'struct timeval' fields in 'struct rusage'. Unfortunately the definition of timeval is now ambiguous when used in user space with a libc that has a 64-bit time_t, and this also changes the 'rusage' definition in user space in a way that is incompatible with the system call interface. While there is no good solution to avoid all ambiguity here, change the definition in the kernel headers to be compatible with the kernel ABI, using __kernel_old_timeval as an unambiguous base type. In previous discussions, there was also a plan to add a replacement for rusage based on 64-bit timestamps and nanosecond resolution, i.e. 'struct __kernel_timespec'. I have patches for that as well, if anyone thinks we should do that. Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-11alpha: remove the unused __ioremap wrapperChristoph Hellwig
No need for the additional namespace pollution. Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-11-04alpha: Move EXCEPTION_TABLE to RO_DATA segmentKees Cook
Since the EXCEPTION_TABLE is read-only, collapse it into RO_DATA. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Richard Henderson <rth@twiddle.net> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Will Deacon <will@kernel.org> Cc: x86@kernel.org Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20191029211351.13243-18-keescook@chromium.org
2019-11-04vmlinux.lds.h: Replace RW_DATA_SECTION with RW_DATAKees Cook
Rename RW_DATA_SECTION to RW_DATA. (Calling this a "section" is a lie, since it's multiple sections and section flags cannot be applied to the macro.) Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390 Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Will Deacon <will@kernel.org> Cc: x86-ml <x86@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20191029211351.13243-14-keescook@chromium.org
2019-11-04vmlinux.lds.h: Replace RODATA with RO_DATAKees Cook
There's no reason to keep the RODATA macro: replace the callers with the expected RO_DATA macro. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Will Deacon <will@kernel.org> Cc: x86-ml <x86@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20191029211351.13243-12-keescook@chromium.org
2019-11-04vmlinux.lds.h: Move NOTES into RO_DATAKees Cook
The .notes section should be non-executable read-only data. As such, move it to the RO_DATA macro instead of being per-architecture defined. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390 Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Will Deacon <will@kernel.org> Cc: x86-ml <x86@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20191029211351.13243-11-keescook@chromium.org
2019-11-04vmlinux.lds.h: Move Program Header restoration into NOTES macroKees Cook
In preparation for moving NOTES into RO_DATA, make the Program Header assignment restoration be part of the NOTES macro itself. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390 Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Will Deacon <will@kernel.org> Cc: x86-ml <x86@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20191029211351.13243-10-keescook@chromium.org
2019-11-04vmlinux.lds.h: Provide EMIT_PT_NOTE to indicate export of .notesKees Cook
In preparation for moving NOTES into RO_DATA, provide a mechanism for architectures that want to emit a PT_NOTE Program Header to do so. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390 Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Will Deacon <will@kernel.org> Cc: x86-ml <x86@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20191029211351.13243-9-keescook@chromium.org
2019-11-04alpha: Rename PT_LOAD identifier "kernel" to "text"Kees Cook
In preparation for moving NOTES into RO_DATA, rename the linker script internal identifier for the PT_LOAD Program Header from "kernel" to "text" to match other architectures. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Richard Henderson <rth@twiddle.net> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Will Deacon <will@kernel.org> Cc: x86-ml <x86@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20191029211351.13243-5-keescook@chromium.org
2019-10-18alpha: Use pr_warn instead of pr_warningKefeng Wang
As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of pr_warning"), removing pr_warning so all logging messages use a consistent <prefix>_warn style. Let's do it. Link: http://lkml.kernel.org/r/20191018031850.48498-1-wangkefeng.wang@huawei.com To: linux-kernel@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-10-14PCI: Add PCI_STD_NUM_BARS for the number of standard BARsDenis Efremov
Code that iterates over all standard PCI BARs typically uses PCI_STD_RESOURCE_END. However, that requires the unusual test "i <= PCI_STD_RESOURCE_END" rather than something the typical "i < PCI_STD_NUM_BARS". Add a definition for PCI_STD_NUM_BARS and change loops to use the more idiomatic C style to help avoid fencepost errors. Link: https://lore.kernel.org/r/20190927234026.23342-1-efremov@linux.com Link: https://lore.kernel.org/r/20190927234308.23935-1-efremov@linux.com Link: https://lore.kernel.org/r/20190916204158.6889-3-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Sebastian Ott <sebott@linux.ibm.com> # arch/s390/ Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> # video/fbdev/ Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com> # pci/controller/dwc/ Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> # scsi/pm8001/ Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi/pm8001/ Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # memstick/
2019-09-25mm: introduce MADV_PAGEOUTMinchan Kim
When a process expects no accesses to a certain memory range for a long time, it could hint kernel that the pages can be reclaimed instantly but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_PAGEOUT hint to madvise(2) syscall. MADV_PAGEOUT can be used by a process to mark a memory range as not expected to be used for a long time so that kernel reclaims *any LRU* pages instantly. The hint can help kernel in deciding which pages to evict proactively. A note: It doesn't apply SWAP_CLUSTER_MAX LRU page isolation limit intentionally because it's automatically bounded by PMD size. If PMD size(e.g., 256) makes some trouble, we could fix it later by limit it to SWAP_CLUSTER_MAX[1]. - man-page material MADV_PAGEOUT (since Linux x.x) Do not expect access in the near future so pages in the specified regions could be reclaimed instantly regardless of memory pressure. Thus, access in the range after successful operation could cause major page fault but never lose the up-to-date contents unlike MADV_DONTNEED. Pages belonging to a shared mapping are only processed if a write access is allowed for the calling process. MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/ [minchan@kernel.org: clear PG_active on MADV_PAGEOUT] Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com [akpm@linux-foundation.org: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: introduce MADV_COLDMinchan Kim
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7. - Background The Android terminology used for forking a new process and starting an app from scratch is a cold start, while resuming an existing app is a hot start. While we continually try to improve the performance of cold starts, hot starts will always be significantly less power hungry as well as faster so we are trying to make hot start more likely than cold start. To increase hot start, Android userspace manages the order that apps should be killed in a process called ActivityManagerService. ActivityManagerService tracks every Android app or service that the user could be interacting with at any time and translates that into a ranked list for lmkd(low memory killer daemon). They are likely to be killed by lmkd if the system has to reclaim memory. In that sense they are similar to entries in any other cache. Those apps are kept alive for opportunistic performance improvements but those performance improvements will vary based on the memory requirements of individual workloads. - Problem Naturally, cached apps were dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidate for swap. Under investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed(we measured performance swapping out vs. zapping the memory by killing a process. Unsurprisingly, zapping is 10x times faster even though we use zram which is much faster than real storage) so kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap. - Approach The approach we chose was to use a new interface to allow userspace to proactively reclaim entire processes by leveraging platform information. This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages that are known to be cold from userspace and to avoid races with lmkd by reclaiming apps as soon as they entered the cached state. Additionally, it could provide many chances for platform to use much information to optimize memory efficiency. To achieve the goal, the patchset introduce two new options for madvise. One is MADV_COLD which will deactivate activated pages and the other is MADV_PAGEOUT which will reclaim private pages instantly. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_PAGEOUT is similar to MADV_DONTNEED in a way that it hints the kernel that memory region is not currently needed and should be reclaimed immediately; MADV_COLD is similar to MADV_FREE in a way that it hints the kernel that memory region is not currently needed and should be reclaimed when memory pressure rises. This patch (of 5): When a process expects no accesses to a certain memory range, it could give a hint to kernel that the pages can be reclaimed when memory pressure happens but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_COLD hint to madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint can help kernel in deciding which pages to evict early during memory pressure. It works for every LRU pages like MADV_[DONTNEED|FREE]. IOW, It moves active file page -> inactive file LRU active anon page -> inacdtive anon LRU Unlike MADV_FREE, it doesn't move active anonymous pages to inactive file LRU's head because MADV_COLD is a little bit different symantic. MADV_FREE means it's okay to discard when the memory pressure because the content of the page is *garbage* so freeing such pages is almost zero overhead since we don't need to swap out and access afterward causes just minor fault. Thus, it would make sense to put those freeable pages in inactive file LRU to compete other used-once pages. It makes sense for implmentaion point of view, too because it's not swapbacked memory any longer until it would be re-dirtied. Even, it could give a bonus to make them be reclaimed on swapless system. However, MADV_COLD doesn't mean garbage so reclaiming them requires swap-out/in in the end so it's bigger cost. Since we have designed VM LRU aging based on cost-model, anonymous cold pages would be better to position inactive anon's LRU list, not file LRU. Furthermore, it would help to avoid unnecessary scanning if system doesn't have a swap device. Let's start simpler way without adding complexity at this moment. However, keep in mind, too that it's a caveat that workloads with a lot of pages cache are likely to ignore MADV_COLD on anonymous memory because we rarely age anonymous LRU lists. * man-page material MADV_COLD (since Linux x.x) Pages in the specified regions will be treated as less-recently-accessed compared to pages in the system with similar access frequencies. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages. MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [akpm@linux-foundation.org: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24mm: consolidate pgtable_cache_init() and pgd_cache_init()Mike Rapoport
Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem cache for page table allocations on several architectures that do not use PAGE_SIZE tables for one or more levels of the page table hierarchy. Most architectures do not implement these functions and use __weak default NOP implementation of pgd_cache_init(). Since there is no such default for pgtable_cache_init(), its empty stub is duplicated among most architectures. Rename the definitions of pgd_cache_init() to pgtable_cache_init() and drop empty stubs of pgtable_cache_init(). Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Will Deacon <will@kernel.org> [arm64] Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24mm: remove quicklist page table cachesNicholas Piggin
Patch series "mm: remove quicklist page table caches". A while ago Nicholas proposed to remove quicklist page table caches [1]. I've rebased his patch on the curren upstream and switched ia64 and sh to use generic versions of PTE allocation. [1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com This patch (of 3): Remove page table allocator "quicklists". These have been around for a long time, but have not got much traction in the last decade and are only used on ia64 and sh architectures. The numbers in the initial commit look interesting but probably don't apply anymore. If anybody wants to resurrect this it's in the git history, but it's unhelpful to have this code and divergent allocator behaviour for minor archs. Also it might be better to instead make more general improvements to page allocator if this is still so slow. Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-20Merge tag 'kbuild-v5.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - add modpost warn exported symbols marked as 'static' because 'static' and EXPORT_SYMBOL is an odd combination - break the build early if gold linker is used - optimize the Bison rule to produce .c and .h files by a single pattern rule - handle PREEMPT_RT in the module vermagic and UTS_VERSION - warn CONFIG options leaked to the user-space except existing ones - make single targets work properly - rebuild modules when module linker scripts are updated - split the module final link stage into scripts/Makefile.modfinal - fix the missed error code in merge_config.sh - improve the error message displayed on the attempt of the O= build in unclean source tree - remove 'clean-dirs' syntax - disable -Wimplicit-fallthrough warning for Clang - add CONFIG_CC_OPTIMIZE_FOR_SIZE_O3 for ARC - remove ARCH_{CPP,A,C}FLAGS variables - add $(BASH) to run bash scripts - change *CFLAGS_<basetarget>.o to take the relative path to $(obj) instead of the basename - stop suppressing Clang's -Wunused-function warnings when W=1 - fix linux/export.h to avoid genksyms calculating CRC of trimmed exported symbols - misc cleanups * tag 'kbuild-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (63 commits) genksyms: convert to SPDX License Identifier for lex.l and parse.y modpost: use __section in the output to *.mod.c modpost: use MODULE_INFO() for __module_depends export.h, genksyms: do not make genksyms calculate CRC of trimmed symbols export.h: remove defined(__KERNEL__), which is no longer needed kbuild: allow Clang to find unused static inline functions for W=1 build kbuild: rename KBUILD_ENABLE_EXTRA_GCC_CHECKS to KBUILD_EXTRA_WARN kbuild: refactor scripts/Makefile.extrawarn merge_config.sh: ignore unwanted grep errors kbuild: change *FLAGS_<basetarget>.o to take the path relative to $(obj) modpost: add NOFAIL to strndup modpost: add guid_t type definition kbuild: add $(BASH) to run scripts with bash-extension kbuild: remove ARCH_{CPP,A,C}FLAGS kbuild,arc: add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3 for ARC kbuild: Do not enable -Wimplicit-fallthrough for clang for now kbuild: clean up subdir-ymn calculation in Makefile.clean kbuild: remove unneeded '+' marker from cmd_clean kbuild: remove clean-dirs syntax kbuild: check clean srctree even earlier ...
2019-09-04dma-mapping: explicitly wire up ->mmap and ->get_sgtableChristoph Hellwig
While the default ->mmap and ->get_sgtable implementations work for the majority of our dma_map_ops impementations they are inherently safe for others that don't use the page allocator or CMA and/or use their own way of remapping not covered by the common code. So remove the defaults if these methods are not wired up, but instead wire up the default implementations for all safe instances. Fixes: e1c7e324539a ("dma-mapping: always provide the dma_map_ops based implementation") Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-08-22kbuild: add CONFIG_ASM_MODVERSIONSMasahiro Yamada
Add CONFIG_ASM_MODVERSIONS. This allows to remove one if-conditional nesting in scripts/Makefile.build. scripts/Makefile.build is run every time Kbuild descends into a sub-directory. So, I want to avoid $(wildcard ...) evaluation where possible although computing $(wildcard ...) is so cheap that it may not make measurable performance difference. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2019-07-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "VM: - z3fold fixes and enhancements by Henry Burns and Vitaly Wool - more accurate reclaimed slab caches calculations by Yafang Shao - fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by Christoph Hellwig - !CONFIG_MMU fixes by Christoph Hellwig - new novmcoredd parameter to omit device dumps from vmcore, by Kairui Song - new test_meminit module for testing heap and pagealloc initialization, by Alexander Potapenko - ioremap improvements for huge mappings, by Anshuman Khandual - generalize kprobe page fault handling, by Anshuman Khandual - device-dax hotplug fixes and improvements, by Pavel Tatashin - enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V - add pte_devmap() support for arm64, by Robin Murphy - unify locked_vm accounting with a helper, by Daniel Jordan - several misc fixes core/lib: - new typeof_member() macro including some users, by Alexey Dobriyan - make BIT() and GENMASK() available in asm, by Masahiro Yamada - changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better code generation, by Alexey Dobriyan - rbtree code size optimizations, by Michel Lespinasse - convert struct pid count to refcount_t, by Joel Fernandes get_maintainer.pl: - add --no-moderated switch to skip moderated ML's, by Joe Perches misc: - ptrace PTRACE_GET_SYSCALL_INFO interface - coda updates - gdb scripts, various" [ Using merge message suggestion from Vlastimil Babka, with some editing - Linus ] * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (100 commits) fs/select.c: use struct_size() in kmalloc() mm: add account_locked_vm utility function arm64: mm: implement pte_devmap support mm: introduce ARCH_HAS_PTE_DEVMAP mm: clean up is_device_*_page() definitions mm/mmap: move common defines to mman-common.h mm: move MAP_SYNC to asm-generic/mman-common.h device-dax: "Hotremove" persistent memory that is used like normal RAM mm/hotplug: make remove_memory() interface usable device-dax: fix memory and resource leak if hotplug fails include/linux/lz4.h: fix spelling and copy-paste errors in documentation ipc/mqueue.c: only perform resource calculation if user valid include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures scripts/gdb: add helpers to find and list devices scripts/gdb: add lx-genpd-summary command drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl kernel/pid.c: convert struct pid count to refcount_t drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining() select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR ...
2019-07-16arch/*: remove unused isa_page_to_bus()Stephen Kitt
isa_page_to_bus() is deprecated and is no longer used anywhere. Remove it entirely. Link: http://lkml.kernel.org/r/20190613161155.16946-1-steve@sk2.org Signed-off-by: Stephen Kitt <steve@sk2.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-15arch: mark syscall number 435 reserved for clone3Christian Brauner
A while ago Arnd made it possible to give new system calls the same syscall number on all architectures (except alpha). To not break this nice new feature let's mark 435 for clone3 as reserved on all architectures that do not yet implement it. Even if an architecture does not plan to implement it this ensures that new system calls coming after clone3 will have the same number on all architectures. Signed-off-by: Christian Brauner <christian@brauner.io> Cc: linux-arch@vger.kernel.org Cc: linux-alpha@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-mips@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Link: https://lore.kernel.org/r/20190714192205.27190-2-christian@brauner.io Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-07-12Merge tag 'kconfig-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kconfig updates from Masahiro Yamada: - always require argument for --defconfig and remove the hard-coded arch/$(ARCH)/defconfig path - make arch/$(SRCARCH)/configs/defconfig the new default of defconfig - some code cleanups * tag 'kconfig-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: remove meaningless if-conditional in conf_read() kconfig: Fix spelling of sym_is_changable unicore32: rename unicore32_defconfig to defconfig kconfig: make arch/*/configs/defconfig the default of KBUILD_DEFCONFIG kconfig: add static qualifier to expand_string() kconfig: require the argument of --defconfig kconfig: remove always false ifeq ($(KBUILD_DEFCONFIG,) conditional
2019-07-12alpha: switch to generic version of pte allocationMike Rapoport
alpha allocates PTE pages with __get_free_page() and uses GFP_KERNEL | __GFP_ZERO for the allocations. Switch it to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The alpha pte_free() and pte_free_kernel() versions are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/1557296232-15361-3-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> Cc: Helge Deller <deller@gmx.de> Cc: Ley Foon Tan <lftan@altera.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Creasey <sammy@sammy.net> Cc: Vincent Chen <deanbo422@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Some highlights from this development cycle: 1) Big refactoring of ipv6 route and neigh handling to support nexthop objects configurable as units from userspace. From David Ahern. 2) Convert explored_states in BPF verifier into a hash table, significantly decreased state held for programs with bpf2bpf calls, from Alexei Starovoitov. 3) Implement bpf_send_signal() helper, from Yonghong Song. 4) Various classifier enhancements to mvpp2 driver, from Maxime Chevallier. 5) Add aRFS support to hns3 driver, from Jian Shen. 6) Fix use after free in inet frags by allocating fqdirs dynamically and reworking how rhashtable dismantle occurs, from Eric Dumazet. 7) Add act_ctinfo packet classifier action, from Kevin Darbyshire-Bryant. 8) Add TFO key backup infrastructure, from Jason Baron. 9) Remove several old and unused ISDN drivers, from Arnd Bergmann. 10) Add devlink notifications for flash update status to mlxsw driver, from Jiri Pirko. 11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski. 12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes. 13) Various enhancements to ipv6 flow label handling, from Eric Dumazet and Willem de Bruijn. 14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van der Merwe, and others. 15) Various improvements to axienet driver including converting it to phylink, from Robert Hancock. 16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean. 17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana Radulescu. 18) Add devlink health reporting to mlx5, from Moshe Shemesh. 19) Convert stmmac over to phylink, from Jose Abreu. 20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from Shalom Toledo. 21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera. 22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel. 23) Track spill/fill of constants in BPF verifier, from Alexei Starovoitov. 24) Support bounded loops in BPF, from Alexei Starovoitov. 25) Various page_pool API fixes and improvements, from Jesper Dangaard Brouer. 26) Just like ipv4, support ref-countless ipv6 route handling. From Wei Wang. 27) Support VLAN offloading in aquantia driver, from Igor Russkikh. 28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy. 29) Add flower GRE encap/decap support to nfp driver, from Pieter Jansen van Vuuren. 30) Protect against stack overflow when using act_mirred, from John Hurley. 31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen. 32) Use page_pool API in netsec driver, Ilias Apalodimas. 33) Add Google gve network driver, from Catherine Sullivan. 34) More indirect call avoidance, from Paolo Abeni. 35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan. 36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek. 37) Add MPLS manipulation actions to TC, from John Hurley. 38) Add sending a packet to connection tracking from TC actions, and then allow flower classifier matching on conntrack state. From Paul Blakey. 39) Netfilter hw offload support, from Pablo Neira Ayuso" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits) net/mlx5e: Return in default case statement in tx_post_resync_params mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync(). net: dsa: add support for BRIDGE_MROUTER attribute pkt_sched: Include const.h net: netsec: remove static declaration for netsec_set_tx_de() net: netsec: remove superfluous if statement netfilter: nf_tables: add hardware offload support net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload net: flow_offload: add flow_block_cb_is_busy() and use it net: sched: remove tcf block API drivers: net: use flow block API net: sched: use flow block API net: flow_offload: add flow_block_cb_{priv, incref, decref}() net: flow_offload: add list handling functions net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free() net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_* net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND net: flow_offload: add flow_block_cb_setup_simple() net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC ...
2019-07-10Merge tag 'pidfd-updates-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd updates from Christian Brauner: "This adds two main features. - First, it adds polling support for pidfds. This allows process managers to know when a (non-parent) process dies in a race-free way. The notification mechanism used follows the same logic that is currently used when the parent of a task is notified of a child's death. With this patchset it is possible to put pidfds in an {e}poll loop and get reliable notifications for process (i.e. thread-group) exit. - The second feature compliments the first one by making it possible to retrieve pollable pidfds for processes that were not created using CLONE_PIDFD. A lot of processes get created with traditional PID-based calls such as fork() or clone() (without CLONE_PIDFD). For these processes a caller can currently not create a pollable pidfd. This is a problem for Android's low memory killer (LMK) and service managers such as systemd. Both patchsets are accompanied by selftests. It's perhaps worth noting that the work done so far and the work done in this branch for pidfd_open() and polling support do already see some adoption: - Android is in the process of backporting this work to all their LTS kernels [1] - Service managers make use of pidfd_send_signal but will need to wait until we enable waiting on pidfds for full adoption. - And projects I maintain make use of both pidfd_send_signal and CLONE_PIDFD [2] and will use polling support and pidfd_open() too" [1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22 https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22 https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22 [2] https://github.com/lxc/lxc/blob/aab6e3eb73c343231cdde775db938994fc6f2803/src/lxc/start.c#L1753 * tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: add pidfd_open() tests arch: wire-up pidfd_open() pid: add pidfd_open() pidfd: add polling selftests pidfd: add polling support
2019-07-08Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull force_sig() argument change from Eric Biederman: "A source of error over the years has been that force_sig has taken a task parameter when it is only safe to use force_sig with the current task. The force_sig function is built for delivering synchronous signals such as SIGSEGV where the userspace application caused a synchronous fault (such as a page fault) and the kernel responded with a signal. Because the name force_sig does not make this clear, and because the force_sig takes a task parameter the function force_sig has been abused for sending other kinds of signals over the years. Slowly those have been fixed when the oopses have been tracked down. This set of changes fixes the remaining abusers of force_sig and carefully rips out the task parameter from force_sig and friends making this kind of error almost impossible in the future" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits) signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus signal: Remove the signal number and task parameters from force_sig_info signal: Factor force_sig_info_to_task out of force_sig_info signal: Generate the siginfo in force_sig signal: Move the computation of force into send_signal and correct it. signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal signal: Remove the task parameter from force_sig_fault signal: Use force_sig_fault_to_task for the two calls that don't deliver to current signal: Explicitly call force_sig_fault on current signal/unicore32: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from ptrace_break signal/nds32: Remove tsk parameter from send_sigtrap signal/riscv: Remove tsk parameter from do_trap signal/sh: Remove tsk parameter from force_sig_info_fault signal/um: Remove task parameter from send_sigtrap signal/x86: Remove task parameter from send_sigtrap signal: Remove task parameter from force_sig_mceerr signal: Remove task parameter from force_sig signal: Remove task parameter from force_sigsegv ...