aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-06-12mm, hugetlb: return a reserved page to a reserved pool if failedfor-broonie/lsk-hugetlbJoonsoo Kim
If we fail with a reserved page, just calling put_page() is not sufficient, because put_page() invoke free_huge_page() at last step and it doesn't know whether a page comes from a reserved pool or not. So it doesn't do anything related to reserved count. This makes reserve count lower than how we need, because reserve count already decrease in dequeue_huge_page_vma(). This patch fix this situation. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-12mm, hugetlb: grab a page_table_lock after page_cache_releaseJoonsoo Kim
We don't need to grab a page_table_lock when we try to release a page. So, defer to grab a page_table_lock. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-12mm, hugetlb: remove useless check about mapping typeJoonsoo Kim
is_vma_resv_set(vma, HPAGE_RESV_OWNER) implys that this mapping is for private. So we don't need to check whether this mapping is for shared or not. This patch is just for clean-up. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-12mm, hugetlb: fix subpool accounting handlingJoonsoo Kim
If we alloc hugepage with avoid_reserve, we don't dequeue reserved one. So, we should check subpool counter when avoid_reserve. This patch implement it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-12mm, hugetlb: change variable name reservations to resvJoonsoo Kim
'reservations' is so long name as a variable and we use 'resv_map' to represent 'struct resv_map' in other place. To reduce confusion and unreadability, change it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-12mm, hugetlb: protect reserved pages when soft offlining a hugepageJoonsoo Kim
Don't use the reserve pool when soft offlining a hugepage. Check we have free pages outside the reserve pool before we dequeue the huge page. Otherwise, we can steal other's reserve page. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-12mm, hugetlb: decrement reserve count if VM_NORESERVE alloc page cacheJoonsoo Kim
If a vma with VM_NORESERVE allocate a new page for page cache, we should check whether this area is reserved or not. If this address is already reserved by other process(in case of chg == 0), we should decrement reserve count, because this allocated page will go into page cache and currently, there is no way to know that this page comes from reserved pool or not when releasing inode. This may introduce over-counting problem to reserved count. With following example code, you can easily reproduce this situation. Assume 2MB, nr_hugepages = 100 size = 20 * MB; flag = MAP_SHARED; p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0); if (p == MAP_FAILED) { fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); return -1; } flag = MAP_SHARED | MAP_NORESERVE; q = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0); if (q == MAP_FAILED) { fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); } q[0] = 'c'; After finish the program, run 'cat /proc/meminfo'. You can see below result. HugePages_Free: 100 HugePages_Rsvd: 1 To fix this, we should check our mapping type and tracked region. If our mapping is VM_NORESERVE, VM_MAYSHARE and chg is 0, this imply that current allocated page will go into page cache which is already reserved region when mapping is created. In this case, we should decrease reserve count. As implementing above, this patch solve the problem. [akpm@linux-foundation.org: fix spelling in comment] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Hillf Danton <dhillf@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-12mm, hugetlb: remove decrement_hugepage_resv_vma()Joonsoo Kim
Now, Checking condition of decrement_hugepage_resv_vma() and vma_has_reserves() is same, so we can clean-up this function with vma_has_reserves(). Additionally, decrement_hugepage_resv_vma() has only one call site, so we can remove function and embed it into dequeue_huge_page_vma() directly. This patch implement it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Hillf Danton <dhillf@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-12mm, hugetlb: add VM_NORESERVE check in vma_has_reserves()Joonsoo Kim
If we map the region with MAP_NORESERVE and MAP_SHARED, we can skip to check reserve counting and eventually we cannot be ensured to allocate a huge page in fault time. With following example code, you can easily find this situation. Assume 2MB, nr_hugepages = 100 fd = hugetlbfs_unlinked_fd(); if (fd < 0) return 1; size = 200 * MB; flag = MAP_SHARED; p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0); if (p == MAP_FAILED) { fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); return -1; } size = 2 * MB; flag = MAP_ANONYMOUS | MAP_SHARED | MAP_HUGETLB | MAP_NORESERVE; p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, -1, 0); if (p == MAP_FAILED) { fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); } p[0] = '0'; sleep(10); During executing sleep(10), run 'cat /proc/meminfo' on another process. HugePages_Free: 99 HugePages_Rsvd: 100 Number of free should be higher or equal than number of reserve, but this aren't. This represent that non reserved shared mapping steal a reserved page. Non reserved shared mapping should not eat into reserve space. If we consider VM_NORESERVE in vma_has_reserve() and return 0 which mean that we don't have reserved pages, then we check that we have enough free pages in dequeue_huge_page_vma(). This prevent to steal a reserved page. With this change, above test generate a SIGBUG which is correct, because all free pages are reserved and non reserved shared mapping can't get a free page. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Hillf Danton <dhillf@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-12mm, hugetlb: do not use a page in page cache for cow optimizationJoonsoo Kim
Currently, we use a page with mapped count 1 in page cache for cow optimization. If we find this condition, we don't allocate a new page and copy contents. Instead, we map this page directly. This may introduce a problem that writting to private mapping overwrite hugetlb file directly. You can find this situation with following code. size = 20 * MB; flag = MAP_SHARED; p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0); if (p == MAP_FAILED) { fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); return -1; } p[0] = 's'; fprintf(stdout, "BEFORE STEAL PRIVATE WRITE: %c\n", p[0]); munmap(p, size); flag = MAP_PRIVATE; p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0); if (p == MAP_FAILED) { fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); } p[0] = 'c'; munmap(p, size); flag = MAP_SHARED; p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0); if (p == MAP_FAILED) { fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); return -1; } fprintf(stdout, "AFTER STEAL PRIVATE WRITE: %c\n", p[0]); munmap(p, size); We can see that "AFTER STEAL PRIVATE WRITE: c", not "AFTER STEAL PRIVATE WRITE: s". If we turn off this optimization to a page in page cache, the problem is disappeared. So, I change the trigger condition of optimization. If this page is not AnonPage, we don't do optimization. This makes this optimization turning off for a page cache. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Hillf Danton <dhillf@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-12mm, hugetlb: remove redundant list_empty check in gather_surplus_pages()Joonsoo Kim
If list is empty, list_for_each_entry_safe() doesn't do anything. So, this check is redundant. Remove it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Hillf Danton <dhillf@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-12mm, hugetlb: fix and clean-up node iteration code to alloc or freeJoonsoo Kim
Current node iteration code have a minor problem which do one more node rotation if we can't succeed to allocate. For example, if we start to allocate at node 0, we stop to iterate at node 0. Then we start to allocate at node 1 for next allocation. I introduce new macros "for_each_node_mask_to_[alloc|free]" and fix and clean-up node iteration code to alloc or free. This makes code more understandable. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Hillf Danton <dhillf@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-12mm, hugetlb: clean-up alloc_huge_page()Joonsoo Kim
Unify successful allocation paths to make the code more readable. There are no functional changes. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-09Merge branch 'v3.10/topic/misc' into linux-linaro-lskAlex Shi
2014-06-09arm64: avoid multiple evaluation of ptr in get_user/put_user()AKASHI Takahiro
get_user() is defined as a function macro in arm64, and trace_get_user() calls it as followed: get_user(ch, ptr++); Since the second parameter occurs twice in the definition, 'ptr++' is unexpectedly evaluated twice and trace_get_user() will generate a bogus string from user-provided one. As a result, some ftrace sysfs operations, like "echo FUNCNAME > set_ftrace_filter," hit this case and eventually fail. This patch fixes the issue both in get_user() and put_user(). Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> [catalin.marinas@arm.com: added __user type annotation and s/optr/__p/] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> (cherry picked from commit 1f65c13efef69b6dc908e588f91a133641d8475c) Signed-off-by: Alex Shi <alex.shi@linaro.org> Conflicts: arch/arm64/include/asm/uaccess.h
2014-06-09Merge tag v3.10.42 into linux-linaro-lskAlex Shi
This is the 3.10.42 stable release
2014-06-07Linux 3.10.42Greg Kroah-Hartman
2014-06-07futex: Make lookup_pi_state more robustThomas Gleixner
commit 54a217887a7b658e2650c3feff22756ab80c7339 upstream. The current implementation of lookup_pi_state has ambigous handling of the TID value 0 in the user space futex. We can get into the kernel even if the TID value is 0, because either there is a stale waiters bit or the owner died bit is set or we are called from the requeue_pi path or from user space just for fun. The current code avoids an explicit sanity check for pid = 0 in case that kernel internal state (waiters) are found for the user space address. This can lead to state leakage and worse under some circumstances. Handle the cases explicit: Waiter | pi_state | pi->owner | uTID | uODIED | ? [1] NULL | --- | --- | 0 | 0/1 | Valid [2] NULL | --- | --- | >0 | 0/1 | Valid [3] Found | NULL | -- | Any | 0/1 | Invalid [4] Found | Found | NULL | 0 | 1 | Valid [5] Found | Found | NULL | >0 | 1 | Invalid [6] Found | Found | task | 0 | 1 | Valid [7] Found | Found | NULL | Any | 0 | Invalid [8] Found | Found | task | ==taskTID | 0/1 | Valid [9] Found | Found | task | 0 | 0 | Invalid [10] Found | Found | task | !=taskTID | 0/1 | Invalid [1] Indicates that the kernel can acquire the futex atomically. We came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit. [2] Valid, if TID does not belong to a kernel thread. If no matching thread is found then it indicates that the owner TID has died. [3] Invalid. The waiter is queued on a non PI futex [4] Valid state after exit_robust_list(), which sets the user space value to FUTEX_WAITERS | FUTEX_OWNER_DIED. [5] The user space value got manipulated between exit_robust_list() and exit_pi_state_list() [6] Valid state after exit_pi_state_list() which sets the new owner in the pi_state but cannot access the user space value. [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set. [8] Owner and user space value match [9] There is no transient state which sets the user space TID to 0 except exit_robust_list(), but this is indicated by the FUTEX_OWNER_DIED bit. See [4] [10] There is no transient state which leaves owner and user space TID out of sync. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07futex: Always cleanup owner tid in unlock_piThomas Gleixner
commit 13fbca4c6ecd96ec1a1cfa2e4f2ce191fe928a5e upstream. If the owner died bit is set at futex_unlock_pi, we currently do not cleanup the user space futex. So the owner TID of the current owner (the unlocker) persists. That's observable inconsistant state, especially when the ownership of the pi state got transferred. Clean it up unconditionally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07futex: Validate atomic acquisition in futex_lock_pi_atomic()Thomas Gleixner
commit b3eaa9fc5cd0a4d74b18f6b8dc617aeaf1873270 upstream. We need to protect the atomic acquisition in the kernel against rogue user space which sets the user space futex to 0, so the kernel side acquisition succeeds while there is existing state in the kernel associated to the real owner. Verify whether the futex has waiters associated with kernel state. If it has, return -EINVAL. The state is corrupted already, so no point in cleaning it up. Subsequent calls will fail as well. Not our problem. [ tglx: Use futex_top_waiter() and explain why we do not need to try restoring the already corrupted user space state. ] Signed-off-by: Darren Hart <dvhart@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 ↵Thomas Gleixner
in futex_requeue(..., requeue_pi=1) commit e9c243a5a6de0be8e584c604d353412584b592f8 upstream. If uaddr == uaddr2, then we have broken the rule of only requeueing from a non-pi futex to a pi futex with this call. If we attempt this, then dangling pointers may be left for rt_waiter resulting in an exploitable condition. This change brings futex_requeue() in line with futex_wait_requeue_pi() which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()") [ tglx: Compare the resulting keys as well, as uaddrs might be different depending on the mapping ] Fixes CVE-2014-3153. Reported-by: Pinkie Pie Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07ath9k: protect tid->sched checkStanislaw Gruszka
[ Upstream commit 21f8aaee0c62708654988ce092838aa7df4d25d8 ] We check tid->sched without a lock taken on ath_tx_aggr_sleep(). That is race condition which can result of doing list_del(&tid->list) twice (second time with poisoned list node) and cause crash like shown below: [424271.637220] BUG: unable to handle kernel paging request at 00100104 [424271.637328] IP: [<f90fc072>] ath_tx_aggr_sleep+0x62/0xe0 [ath9k] ... [424271.639953] Call Trace: [424271.639998] [<f90f6900>] ? ath9k_get_survey+0x110/0x110 [ath9k] [424271.640083] [<f90f6942>] ath9k_sta_notify+0x42/0x50 [ath9k] [424271.640177] [<f809cfef>] sta_ps_start+0x8f/0x1c0 [mac80211] [424271.640258] [<c10f730e>] ? free_compound_page+0x2e/0x40 [424271.640346] [<f809e915>] ieee80211_rx_handlers+0x9d5/0x2340 [mac80211] [424271.640437] [<c112f048>] ? kmem_cache_free+0x1d8/0x1f0 [424271.640510] [<c1345a84>] ? kfree_skbmem+0x34/0x90 [424271.640578] [<c10fc23c>] ? put_page+0x2c/0x40 [424271.640640] [<c1345a84>] ? kfree_skbmem+0x34/0x90 [424271.640706] [<c1345a84>] ? kfree_skbmem+0x34/0x90 [424271.640787] [<f809dde3>] ? ieee80211_rx_handlers_result+0x73/0x1d0 [mac80211] [424271.640897] [<f80a07a0>] ieee80211_prepare_and_rx_handle+0x520/0xad0 [mac80211] [424271.641009] [<f809e22d>] ? ieee80211_rx_handlers+0x2ed/0x2340 [mac80211] [424271.641104] [<c13846ce>] ? ip_output+0x7e/0xd0 [424271.641182] [<f80a1057>] ieee80211_rx+0x307/0x7c0 [mac80211] [424271.641266] [<f90fa6ee>] ath_rx_tasklet+0x88e/0xf70 [ath9k] [424271.641358] [<f80a0f2c>] ? ieee80211_rx+0x1dc/0x7c0 [mac80211] [424271.641445] [<f90f82db>] ath9k_tasklet+0xcb/0x130 [ath9k] Bug report: https://bugzilla.kernel.org/show_bug.cgi?id=70551 Reported-and-tested-by: Max Sydorenko <maxim.stargazer@gmail.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> [ xl: backported to 3.10: adjusted context ] Signed-off-by: Xiangyu Lu <luxiangyu@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07media: V4L2: fix VIDIOC_CREATE_BUFS in 64- / 32-bit compatibility modeGuennadi Liakhovetski
commit 97d9d23dda6f37d90aefeec4ed619d52df525382 upstream. If a struct contains 64-bit fields, it is aligned on 64-bit boundaries within containing structs in 64-bit compilations. This is the case with struct v4l2_window, which contains pointers and is embedded into struct v4l2_format, and that one is embedded into struct v4l2_create_buffers. Unlike some other structs, used as a part of the kernel ABI as ioctl() arguments, that are packed, these structs aren't packed. This isn't a problem per se, but the ioctl-compat code for VIDIOC_CREATE_BUFS contains a bug, that triggers in such 64-bit builds. That code wrongly assumes, that in struct v4l2_create_buffers, struct v4l2_format immediately follows the __u32 memory field, which in fact isn't the case. This bug wasn't visible until now, because until recently hardly any applications used this ioctl() and mostly embedded 32-bit only drivers implemented it. This is changing now with addition of this ioctl() to some USB drivers, e.g. UVC. This patch fixes the bug by copying parts of struct v4l2_create_buffers separately. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07media: V4L2: ov7670: fix a wrong index, potentially Oopsing the kernel from ↵Guennadi Liakhovetski
user-space commit cfece5857ca51d1dcdb157017aba226f594e9dcf upstream. Commit 75e2bdad8901a0b599e01a96229be922eef1e488 "ov7670: allow configuration of image size, clock speed, and I/O method" uses a wrong index to iterate an array. Apart from being wrong, it also uses an unchecked value from user-space, which can cause access to unmapped memory in the kernel, triggered by a normal desktop user with rights to use V4L2 devices. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07media: fc2580: fix tuning failure on 32-bit archAntti Palosaari
commit 8845cc6415ec28ef8d57b3fb81c75ef9bce69c5f upstream. There was some frequency calculation overflows which caused tuning failure on 32-bit architecture. Use 64-bit numbers where needed in order to avoid calculation overflows. Thanks for the Finnish person, who asked remain anonymous, reporting, testing and suggesting the fix. Signed-off-by: Antti Palosaari <crope@iki.fi> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07iommu/amd: Fix interrupt remapping for aliased devicesAlex Williamson
commit e028a9e6b8a637af09ac4114083280df4a7045f1 upstream. An apparent cut and paste error prevents the correct flags from being set on the alias device resulting in MSI on conventional PCI devices failing to work. This also produces error events from the IOMMU like: AMD-Vi: Event logged [INVALID_DEVICE_REQUEST device=00:14.4 address=0x000000fdf8000000 flags=0x0a00] Where 14.4 is a PCIe-to-PCI bridge with a device behind it trying to use MSI interrupts. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07libceph: fix corruption when using page_count 0 page in rbdChunwei Chen
commit 178eda29ca721842f2146378e73d43e0044c4166 upstream. It has been reported that using ZFSonLinux on rbd will result in memory corruption. The bug report can be found here: https://github.com/zfsonlinux/spl/issues/241 http://tracker.ceph.com/issues/7790 The reason is that ZFS will send pages with page_count 0 into rbd, which in turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with page_count 0, as it will do get_page and put_page, and erroneously free the page. This type of issue has been noted before, and handled in iscsi, drbd, etc. So, rbd should also handle this. This fix address this issue by fall back to slower sendmsg when page_count 0 detected. Cc: Sage Weil <sage@inktank.com> Cc: Yehuda Sadeh <yehuda@inktank.com> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07powerpc: Fix 64 bit builds with binutils 2.24Guenter Roeck
commit 7998eb3dc700aaf499f93f50b3d77da834ef9e1d upstream. With binutils 2.24, various 64 bit builds fail with relocation errors such as arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e': (.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI against symbol `interrupt_base_book3e' defined in .text section in arch/powerpc/kernel/built-in.o arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e': (.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI against symbol `interrupt_end_book3e' defined in .text section in arch/powerpc/kernel/built-in.o The assembler maintainer says: I changed the ABI, something that had to be done but unfortunately happens to break the booke kernel code. When building up a 64-bit value with lis, ori, shl, oris, ori or similar sequences, you now should use @high and @higha in place of @h and @ha. @h and @ha (and their associated relocs R_PPC64_ADDR16_HI and R_PPC64_ADDR16_HA) now report overflow if the value is out of 32-bit signed range. ie. @h and @ha assume you're building a 32-bit value. This is needed to report out-of-range -mcmodel=medium toc pointer offsets in @toc@h and @toc@ha expressions, and for consistency I did the same for all other @h and @ha relocs. Replacing @h with @high in one strategic location fixes the relocation errors. This has to be done conditionally since the assembler either supports @h or @high but not both. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07crypto: s390 - fix aes,des ctr mode concurrency finding.Harald Freudenberger
commit 3901c1124ec5099254a9396085f7798153a7293f upstream. An additional testcase found an issue with the last series of patches applied: the fallback solution may not save the iv value after operation. This very small fix just makes sure the iv is copied back to the walk/desc struct. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07crypto: caam - add allocation failure handling in SPRINTFCAT macroHoria Geanta
commit 27c5fb7a84242b66bf1e0b2fe6bf40d19bcc5c04 upstream. GFP_ATOMIC memory allocation could fail. In this case, avoid NULL pointer dereference and notify user. Cc: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Horia Geanta <horia.geanta@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07i2c: s3c2410: resume race fixOlof Johansson
commit ce78cc071f5f541480e381cc0241d37590041a9d upstream. Don't unmark the device as suspended until after it's been re-setup. The main race would be w.r.t. an i2c driver that gets resumed at the same time (asyncronously), that is allowed to do a transfer since suspended is set to 0 before reinit, but really should have seen the -EIO return instead. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07i2c: designware: Mask all interrupts during i2c controller enableDu, Wenkai
commit 47bb27e78867997040a228328f2a631c3c7f2c82 upstream. There have been "i2c_designware 80860F41:00: controller timed out" errors on a number of Baytrail platforms. The issue is caused by incorrect value in Interrupt Mask Register (DW_IC_INTR_MASK) when i2c core is being enabled. This causes call to __i2c_dw_enable() to immediately start the transfer which leads to timeout. There are 3 failure modes observed: 1. Failure in S0 to S3 resume path The default value after reset for DW_IC_INTR_MASK is 0x8ff. When we start the first transaction after resuming from system sleep, TX_EMPTY interrupt is already unmasked because of the hardware default. 2. Failure in normal operational path This failure happens rarely and is hard to reproduce. Debug trace showed that DW_IC_INTR_MASK had value of 0x254 when failure occurred, which meant TX_EMPTY was unmasked. 3. Failure in S3 to S0 suspend path This failure also happens rarely and is hard to reproduce. Adding debug trace that read DW_IC_INTR_MASK made this failure not reproducible. But from ISR call trace we could conclude TX_EMPTY was unmasked when problem occurred. The patch masks all interrupts before the controller is enabled to resolve the faulty DW_IC_INTR_MASK conditions. Signed-off-by: Wenkai Du <wenkai.du@intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> [wsa: improved the comment and removed typo in commit msg] Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07i2c: rcar: bail out on zero length transfersWolfram Sang
commit d7653964c590ba846aa11a8f6edf409773cbc492 upstream. This hardware does not support zero length transfers. Instead, the driver does one (random) byte transfers currently with undefined results for the slaves. We now bail out. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07ACPI / blacklist: Add dmi_enable_osi_linux quirk for Asus EEE PC 1015PXHans de Goede
commit f6e6e1b9fee88c90586787b71dc49bb3ce62bb89 upstream. Without this this EEE PC exports a non working WMI interface, with this it exports a working "good old" eeepc_laptop interface, fixing brightness control not working as well as rfkill being stuck in a permanent wireless blocked state. This is not an ideal way to fix this, but various attempts to fix this otherwise have failed, see: References: https://bugzilla.redhat.com/show_bug.cgi?id=1067181 Reported-and-tested-by: lou.cardone@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07libata: clean up ZPODD when a port is detachedLevente Kurusa
commit a6f9bf4d2f965b862b95213303d154e02957eed8 upstream. When a ZPODD device is unbound via sysfs, the ACPI notify handler is not removed. This causes panics as observed in Bug #74601. The panic only happens when the wake happens from outside the kernel (i.e. inserting a media or pressing a button). Add a loop to ata_port_detach which loops through the port's devices and checks if zpodd is enabled, if so call zpodd_exit. Reviewed-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Levente Kurusa <levex@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07dm crypt: fix cpu hotplug crash by removing per-cpu structureMikulas Patocka
commit 610f2de3559c383caf8fbbf91e9968102dff7ca0 upstream. The DM crypt target used per-cpu structures to hold pointers to a ablkcipher_request structure. The code assumed that the work item keeps executing on a single CPU, so it didn't use synchronization when accessing this structure. If a CPU is disabled by writing 0 to /sys/devices/system/cpu/cpu*/online, the work item could be moved to another CPU. This causes dm-crypt crashes, like the following, because the code starts using an incorrect ablkcipher_request: smpboot: CPU 7 is now offline BUG: unable to handle kernel NULL pointer dereference at 0000000000000130 IP: [<ffffffffa1862b3d>] crypt_convert+0x12d/0x3c0 [dm_crypt] ... Call Trace: [<ffffffffa1864415>] ? kcryptd_crypt+0x305/0x470 [dm_crypt] [<ffffffff81062060>] ? finish_task_switch+0x40/0xc0 [<ffffffff81052a28>] ? process_one_work+0x168/0x470 [<ffffffff8105366b>] ? worker_thread+0x10b/0x390 [<ffffffff81053560>] ? manage_workers.isra.26+0x290/0x290 [<ffffffff81058d9f>] ? kthread+0xaf/0xc0 [<ffffffff81058cf0>] ? kthread_create_on_node+0x120/0x120 [<ffffffff813464ac>] ? ret_from_fork+0x7c/0xb0 [<ffffffff81058cf0>] ? kthread_create_on_node+0x120/0x120 Fix this bug by removing the per-cpu definition. The structure ablkcipher_request is accessed via a pointer from convert_context. Consequently, if the work item is rescheduled to a different CPU, the thread still uses the same ablkcipher_request. This change may undermine performance improvements intended by commit c0297721 ("dm crypt: scale to multiple cpus") on select hardware. In practice no performance difference was observed on recent hardware. But regardless, correctness is more important than performance. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07powerpc/tm: Fix crash when forking inside a transactionMichael Neuling
commit 621b5060e823301d0cba4cb52a7ee3491922d291 upstream. When we fork/clone we currently don't copy any of the TM state to the new thread. This results in a TM bad thing (program check) when the new process is switched in as the kernel does a tmrechkpt with TEXASR FS not set. Also, since R1 is from userspace, we trigger the bad kernel stack pointer detection. So we end up with something like this: Bad kernel stack pointer 0 at c0000000000404fc cpu 0x2: Vector: 700 (Program Check) at [c00000003ffefd40] pc: c0000000000404fc: restore_gprs+0xc0/0x148 lr: 0000000000000000 sp: 0 msr: 9000000100201030 current = 0xc000001dd1417c30 paca = 0xc00000000fe00800 softe: 0 irq_happened: 0x01 pid = 0, comm = swapper/2 WARNING: exception is not recoverable, can't continue The below fixes this by flushing the TM state before we copy the task_struct to the clone. To do this we go through the tmreclaim patch, which removes the checkpointed registers from the CPU and transitions the CPU out of TM suspend mode. Hence we need to call tmrechkpt after to restore the checkpointed state and the TM mode for the current task. To make this fail from userspace is simply: tbegin li r0, 2 sc <boom> Kudos to Adhemerval Zanella Neto for finding this. Signed-off-by: Michael Neuling <mikey@neuling.org> cc: Adhemerval Zanella Neto <azanella@br.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [Backported to 3.10: context adjust] Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07target: Don't allow setting WC emulation if device doesn't supportAndy Grover
commit 07b8dae38b09bcfede7e726f172e39b5ce8390d9 upstream. Just like for pSCSI, if the transport sets get_write_cache, then it is not valid to enable write cache emulation for it. Return an error. see https://bugzilla.redhat.com/show_bug.cgi?id=1082675 Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07Target/iser: Fix iscsit_accept_np and rdma_cm racy flowSagi Grimberg
commit 531b7bf4bd795d9a09eac92504322a472c010bc8 upstream. RDMA CM and iSCSI target flows are asynchronous and completely uncorrelated. Relying on the fact that iscsi_accept_np will be called after CM connection request event and will wait for it is a mistake. When attempting to login to a few targets this flow is racy and unpredictable, but for parallel login to dozens of targets will race and hang every time. The correct synchronizing mechanism in this case is pending on a semaphore rather than a wait_for_event. We keep the pending interruptible for iscsi_np cleanup stage. (Squash patch to remove dead code into parent - nab) Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07Target/iser: Fix wrong connection requests list additionSagi Grimberg
commit 9fe63c88b1d59f1ce054d6948ccd3096496ecedb upstream. Should be adding list_add_tail($new, $head) and not the other way around. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07PCI: shpchp: Check bridge's secondary (not primary) bus speedMarcel Apfelbaum
commit 93fa9d32670f5592c8e56abc9928fc194e1e72fc upstream. When a new device is added below a hotplug bridge, the bridge's secondary bus speed and the device's bus speed must match. The shpchp driver previously checked the bridge's *primary* bus speed, not the secondary bus speed. This caused hot-add errors like: shpchp 0000:00:03.0: Speed of bus ff and adapter 0 mismatch Check the secondary bus speed instead. [bhelgaas: changelog] Link: https://bugzilla.kernel.org/show_bug.cgi?id=75251 Fixes: 3749c51ac6c1 ("PCI: Make current and maximum bus speeds part of the PCI core") Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07genirq: Provide irq_force_affinity fallback for non-SMPArnd Bergmann
commit 4c88d7f9b0d5fb0588c3386be62115cc2eaa8f9f upstream. Patch 01f8fa4f01d "genirq: Allow forcing cpu affinity of interrupts" added an irq_force_affinity() function, and 30ccf03b4a6 "clocksource: Exynos_mct: Use irq_force_affinity() in cpu bringup" subsequently uses it. However, the driver can be used with CONFIG_SMP disabled, but the function declaration is only available for CONFIG_SMP, leading to this build error: drivers/clocksource/exynos_mct.c:431:3: error: implicit declaration of function 'irq_force_affinity' [-Werror=implicit-function-declaration] irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu)); This patch introduces a dummy helper function for the non-SMP case that always returns success, to get rid of the build error. Since the patches causing the problem are marked for stable backports, this one should be as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Kukjin Kim <kgene.kim@samsung.com> Link: http://lkml.kernel.org/r/5619084.0zmrrIUZLV@wuerfel Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86-64, modify_ldt: Make support for 16-bit segments a runtime optionLinus Torvalds
commit fa81511bb0bbb2b1aace3695ce869da9762624ff upstream. Checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels disabled 16-bit segments on 64-bit kernels due to an information leak. However, it does seem that people are genuinely using Wine to run old 16-bit Windows programs on Linux. A proper fix for this ("espfix64") is coming in the upcoming merge window, but as a temporary fix, create a sysctl to allow the administrator to re-enable support for 16-bit segments. It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If you hit this issue and care about your old Windows program more than you care about a kernel stack address information leak, you can do echo 1 > /proc/sys/abi/ldt16 as root (add it to your startup scripts), and you should be ok. The sysctl table is only added if you have COMPAT support enabled on x86-64, but I assume anybody who runs old windows binaries very much does that ;) Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07metag: Reduce maximum stack size to 256MBJames Hogan
commit d71f290b4e98a39f49f2595a13be3b4d5ce8e1f1 upstream. Specify the maximum stack size for arches where the stack grows upward (parisc and metag) in asm/processor.h rather than hard coding in fs/exec.c so that metag can specify a smaller value of 256MB rather than 1GB. This fixes a BUG on metag if the RLIMIT_STACK hard limit is increased beyond a safe value by root. E.g. when starting a process after running "ulimit -H -s unlimited" it will then attempt to use a stack size of the maximum 1GB which is far too big for metag's limited user virtual address space (stack_top is usually 0x3ffff000): BUG: failure at fs/exec.c:589/shift_arg_pages()! Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: linux-parisc@vger.kernel.org Cc: linux-metag@vger.kernel.org Cc: John David Anglin <dave.anglin@bell.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07metag: fix memory barriersMikulas Patocka
commit 2425ce84026c385b73ae72039f90d042d49e0394 upstream. Volatile access doesn't really imply the compiler barrier. Volatile access is only ordered with respect to other volatile accesses, it isn't ordered with respect to general memory accesses. Gcc may reorder memory accesses around volatile access, as we can see in this simple example (if we compile it with optimization, both increments of *b will be collapsed to just one): void fn(volatile int *a, long *b) { (*b)++; *a = 10; (*b)++; } Consequently, we need the compiler barrier after a write to the volatile variable, to make sure that the compiler doesn't reorder the volatile write with something else. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07ASoC: wm8962: Update register CLASS_D_CONTROL_1 to be non-volatileCharles Keepax
commit 44330ab516c15dda8a1e660eeaf0003f84e43e3f upstream. The register CLASS_D_CONTROL_1 is marked as volatile because it contains a bit, DAC_MUTE, which is also mirrored in the ADC_DAC_CONTROL_1 register. This causes problems for the "Speaker Switch" control, which will report an error if the CODEC is suspended because it relies on a volatile register. To resolve this issue mark CLASS_D_CONTROL_1 as non-volatile and manually keep the register cache in sync by updating both bits when changing the mute status. Reported-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Tested-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Mark Brown <broonie@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07xen-blkfront: restore the non-persistent data pathRoger Pau Monne
commit bfe11d6de1c416cea4f3f0f35f864162063ce3fa upstream. When persistent grants were added they were always used, even if the backend doesn't have this feature (there's no harm in always using the same set of pages). This restores the old data path when the backend doesn't have persistent grants, removing the burden of doing a memcpy when it is not actually needed. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reported-by: Felipe Franciosi <felipe.franciosi@citrix.com> Cc: Felipe Franciosi <felipe.franciosi@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: Fix up whitespace issues] Tested-by: Felipe Franciosi <felipe@paradoxo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07xen-blkfront: revoke foreign access for grants not mapped by the backendRoger Pau Monne
commit fbe363c476afe8ec992d3baf682670a4bd1b6ce6 upstream. There's no need to keep the foreign access in a grant if it is not persistently mapped by the backend. This allows us to free grants that are not mapped by the backend, thus preventing blkfront from hoarding all grants. The main effect of this is that blkfront will only persistently map the same grants as the backend, and it will always try to use grants that are already mapped by the backend. Also the number of persistent grants in blkfront is the same as in blkback (and is controlled by the value in blkback). Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Matt Wilson <msw@amazon.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree()Jianyu Zhan
commit 5a838c3b60e3a36ade764cf7751b8f17d7c9c2da upstream. pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) + BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long) It hardly could be ever bigger than PAGE_SIZE even for large-scale machine, but for consistency with its couterpart pcpu_mem_zalloc(), use pcpu_mem_free() instead. Commit b4916cb17c26 ("percpu: make pcpu_free_chunk() use pcpu_mem_free() instead of kfree()") addressed this problem, but missed this one. tj: commit message updated Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 099a19d91ca4 ("percpu: allow limited allocation before slab is online) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07bus: mvebu-mbus: allow several windows with the same target/attributeThomas Petazzoni
commit b566e782be32145664d96ada3e389f17d32742e5 upstream. Having multiple windows with the same target and attribute is actually legal, and can be useful for PCIe windows, when PCIe BARs have a size that isn't a power of two, and we therefore need to create several MBus windows to cover the PCIe BAR for a given PCIe interface. Fixes: fddddb52a6c4 ('bus: introduce an Marvell EBU MBus driver') Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Link: https://lkml.kernel.org/r/1397823593-1932-7-git-send-email-thomas.petazzoni@free-electrons.com Tested-by: Neil Greatorex <neil@fatboyfat.co.uk> Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>