aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2012-12-20vfs: turn is_dir argument to kern_path_create into a lookup_flags argJeff Layton
Where we can pass in LOOKUP_DIRECTORY or LOOKUP_REVAL. Any other flags passed in here are currently ignored. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20vfs: add a retry_estale helper function to handle retries on ESTALEJeff Layton
This function is expected to be called from path-based syscalls to help them decide whether to try the lookup and call again in the event that they got an -ESTALE return back on an earier try. Currently, we only retry the call once on an ESTALE error, but in the event that we decide that that's not enough in the future, we should be able to change the logic in this helper without too much effort. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20Merge branch 'fscache' of ↵Al Viro
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into for-linus
2012-12-20vfs: Remove useless function prototypesAlessio Igor Bogani
Commit 8e22cc88d68ca1a46d7d582938f979eb640ed30f removes the (un)lock_super function definitions but forgets to remove their prototypes. Signed-off-by: Alessio Igor Bogani <abogani@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20mm: drop vmtruncateMarco Stornelli
Removed vmtruncate Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20vfs: drop vmtruncateMarco Stornelli
Removed vmtruncate Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20FS-Cache: Mark cancellation of in-progress operationDavid Howells
Mark as cancelled an operation that is in progress rather than pending at the time it is cancelled, and call fscache_complete_op() to cancel an operation so that blocked ops can be started. Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20FS-Cache: Convert the object event ID #defines into an enumDavid Howells
Convert the fscache_object event IDs from #defines into an enum. Also add an extra label to the enum to carry the event count and redefine the event mask in terms of that. Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20VFS: Make more complete truncate operation available to CacheFilesDavid Howells
Make a more complete truncate operation available to CacheFiles (including security checks and suchlike) so that it can use this to clear invalidated cache files. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20FS-Cache: Provide proper invalidationDavid Howells
Provide a proper invalidation method rather than relying on the netfs retiring the cookie it has and getting a new one. The problem with this is that isn't easy for the netfs to make sure that it has completed/cancelled all its outstanding storage and retrieval operations on the cookie it is retiring. Instead, have the cache provide an invalidation method that will cancel or wait for all currently outstanding operations before invalidating the cache, and will cause new operations to queue up behind that. Whilst invalidation is in progress, some requests will be rejected until the cache can stack a barrier on the operation queue to cause new operations to be deferred behind it. Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20FS-Cache: Fix operation state management and accountingDavid Howells
Fix the state management of internal fscache operations and the accounting of what operations are in what states. This is done by: (1) Give struct fscache_operation a enum variable that directly represents the state it's currently in, rather than spreading this knowledge over a bunch of flags, who's processing the operation at the moment and whether it is queued or not. This makes it easier to write assertions to check the state at various points and to prevent invalid state transitions. (2) Add an 'operation complete' state and supply a function to indicate the completion of an operation (fscache_op_complete()) and make things call it. The final call to fscache_put_operation() can then check that an op in the appropriate state (complete or cancelled). (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better govern the state of an object: (a) The ->n_ops is now the number of extant operations on the object and is now decremented by fscache_put_operation() only. (b) The ->n_in_progress is simply the number of objects that have been taken off of the object's pending queue for the purposes of being run. This is decremented by fscache_op_complete() only. (c) The ->n_exclusive is the number of exclusive ops that have been submitted and queued or are in progress. It is decremented by fscache_op_complete() and by fscache_cancel_op(). fscache_put_operation() and fscache_operation_gc() now no longer try to clean up ->n_exclusive and ->n_in_progress. That was leading to double decrements against fscache_cancel_op(). fscache_cancel_op() now no longer decrements ->n_ops. That was leading to double decrements against fscache_put_operation(). fscache_submit_exclusive_op() now decides whether it has to queue an op based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter will persist in being true even after all preceding operations have been cancelled or completed. Furthermore, if an object is active and there are runnable ops against it, there must be at least one op running. (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and provide a function to record completion of the pages as they complete. When n_pages reaches 0, the operation is deemed to be complete and fscache_op_complete() is called. Add calls to fscache_retrieval_complete() anywhere we've finished with a page we've been given to read or allocate for. This includes places where we just return pages to the netfs for reading from the server and where accessing the cache fails and we discard the proposed netfs page. The bugs in the unfixed state management manifest themselves as oopses like the following where the operation completion gets out of sync with return of the cookie by the netfs. This is possible because the cache unlocks and returns all the netfs pages before recording its completion - which means that there's nothing to stop the netfs discarding them and returning the cookie. FS-Cache: Cookie 'NFS.fh' still has outstanding reads ------------[ cut here ]------------ kernel BUG at fs/fscache/cookie.c:519! invalid opcode: 0000 [#1] SMP CPU 1 Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090 /DG965RY RIP: 0010:[<ffffffffa007050a>] [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache] RSP: 0018:ffff8800368cfb00 EFLAGS: 00010282 RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000 RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000 R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98 R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370 FS: 0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040) Stack: ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0 ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0 ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91 Call Trace: [<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs] [<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs] [<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs] [<ffffffff810d8d47>] evict+0xa1/0x15c [<ffffffff810d8e2e>] dispose_list+0x2c/0x38 [<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b [<ffffffff810c56b7>] prune_super+0xd5/0x140 [<ffffffff8109b615>] shrink_slab+0x102/0x1ab [<ffffffff8109d690>] balance_pgdat+0x2f2/0x595 [<ffffffff8103e009>] ? process_timeout+0xb/0xb [<ffffffff8109dba3>] kswapd+0x270/0x289 [<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46 [<ffffffff8109d933>] ? balance_pgdat+0x595/0x595 [<ffffffff8104bf7a>] kthread+0x7f/0x87 [<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10 [<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0 [<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe [<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53 [<ffffffff813ad6b0>] ? gs_change+0xb/0xb Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20FS-Cache: Make cookie relinquishment wait for outstanding readsDavid Howells
Make fscache_relinquish_cookie() log a warning and wait if there are any outstanding reads left on the cookie it was given. Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20CacheFiles: Fix the marking of cached pagesDavid Howells
Under some circumstances CacheFiles defers the marking of pages with PG_fscache so that it can take advantage of pagevecs to reduce the number of calls to fscache_mark_pages_cached() and the netfs's hook to keep track of this. There are, however, two problems with this: (1) It can lead to the PG_fscache mark being applied _after_ the page is set PG_uptodate and unlocked (by the call to fscache_end_io()). (2) CacheFiles's ref on the page is dropped immediately following fscache_end_io() - and so may not still be held when the mark is applied. This can lead to the page being passed back to the allocator before the mark is applied. Fix this by, where appropriate, marking the page before calling fscache_end_io() and releasing the page. This means that we can't take advantage of pagevecs and have to make a separate call for each page to the marking routines. The symptoms of this are Bad Page state errors cropping up under memory pressure, for example: BUG: Bad page state in process tar pfn:002da page:ffffea0000009fb0 count:0 mapcount:0 mapping: (null) index:0x1447 page flags: 0x1000(private_2) Pid: 4574, comm: tar Tainted: G W 3.1.0-rc4-fsdevel+ #1064 Call Trace: [<ffffffff8109583c>] ? dump_page+0xb9/0xbe [<ffffffff81095916>] bad_page+0xd5/0xea [<ffffffff81095d82>] get_page_from_freelist+0x35b/0x46a [<ffffffff810961f3>] __alloc_pages_nodemask+0x362/0x662 [<ffffffff810989da>] __do_page_cache_readahead+0x13a/0x267 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267 [<ffffffff81098d7b>] ra_submit+0x1c/0x20 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a [<ffffffff81098ee2>] ? ondemand_readahead+0x163/0x29a [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs] [<ffffffff810c22c4>] do_sync_read+0xba/0xfa [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a [<ffffffff810c2a79>] sys_read+0x45/0x6c [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b As can be seen, PG_private_2 (== PG_fscache) is set in the page flags. Instrumenting fscache_mark_pages_cached() to verify whether page->mapping was set appropriately showed that sometimes it wasn't. This led to the discovery that sometimes the page has apparently been reclaimed by the time the marker got to see it. Reported-by: M. Stevens <m@tippett.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2012-12-20vfs: remove DCACHE_NEED_LOOKUPJeff Layton
The code that relied on that flag was ripped out of btrfs quite some time ago, and never added back. Josef indicated that he was going to take a different approach to the problem in btrfs, and that we could just eliminate this flag. Cc: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20Merge tag 'virtio-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio update from Rusty Russell: "Some nice cleanups, and even a patch my wife did as a "live" demo for Latinoware 2012. There's a slightly non-trivial merge in virtio-net, as we cleaned up the virtio add_buf interface while DaveM accepted the mq virtio-net patches." * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (27 commits) virtio_console: Add support for remoteproc serial virtio_console: Merge struct buffer_token into struct port_buffer virtio: add drv_to_virtio to make code clearly virtio: use dev_to_virtio wrapper in virtio virtio-mmio: Fix irq parsing in command line parameter virtio_console: Free buffers from out-queue upon close virtio: Convert dev_printk(KERN_<LEVEL> to dev_<level>( virtio_console: Use kmalloc instead of kzalloc virtio_console: Free buffer if splice fails virtio: tools: make it clear that virtqueue_add_buf() no longer returns > 0 virtio: scsi: make it clear that virtqueue_add_buf() no longer returns > 0 virtio: rpmsg: make it clear that virtqueue_add_buf() no longer returns > 0 virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0 virtio: console: make it clear that virtqueue_add_buf() no longer returns > 0 virtio: make virtqueue_add_buf() returning 0 on success, not capacity. virtio: console: don't rely on virtqueue_add_buf() returning capacity. virtio_net: don't rely on virtqueue_add_buf() returning capacity. virtio-net: remove unused skb_vnet_hdr->num_sg field virtio-net: correct capacity math on ring full virtio: move queue_index and num_free fields into core struct virtqueue. ...
2012-12-20Merge tag 'sound-3.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This update contains overall only driver-specific fixes. Slightly large LOC are seen in usb-audio driver for a couple of new device quirks and cs42l71 ASoC driver for enhanced features. The others are a few small (regression) fixes HD-audio, and yet other small / trival ASoC fixes." * tag 'sound-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Support for Digidesign Mbox 2 USB sound card: ALSA: HDA: Fix sound resume hang ALSA: hda - bug fix for invalid connection list of Haswell HDMI codec pins ALSA: hda - Fix the wrong pincaps set in ALC861VD dallas/hp fixup ALSA: hda - Set codec->single_adc_amp flag for Realtek codecs ASoC: atmel-ssc: change disable to disable in dts node ASoC: Prevent pop_wait overwrite ALSA: usb-audio: ignore-quirk for HP Wireless Audio ALSA: hda - Always turn on pins for HDMI/DP ALSA: hda - Fix pin configuration of HP Pavilion dv7 ASoC: core: Fix splitting of log messages ASoC: cs42l73: Change VSPIN/VSPOUT to VSPINOUT ASoC: cs42l73: Add DAPM events for power down. ASoC: cs42l73: Add DMIC's as DAPM inputs. ASoC: sigmadsp: Fix endianness conversion issue ASoC: tpa6130a2: Use devm_* APIs
2012-12-20Merge tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
Pull ARM SoC fixes from Olof Johansson: "This is a batch of fixes for arm-soc platforms, most of it is for OMAP but there are others too (i.MX, Tegra, ep93xx). Fixes warnings, some broken platforms and drivers, etc. A bit all over the map really." There was some concern about commit 68136b10 ("RM: sunxi: Change device tree naming scheme for sunxi"), but Tony says: "Looks like that's trivial to fix as needed, no need to rebuild the branch to fix that AFAIK. The fix can be done once Olof is available online again. Linus, I suggest that you go ahead and pull this if there are no other issues with this branch." * tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (32 commits) ARM: sunxi: Change device tree naming scheme for sunxi ARM: ux500: fix missing include ARM: u300: delete custom pin hog code ARM: davinci: fix build break due to missing include ARM: exynos: Fix warning due to missing 'inline' in stub ARM: imx: Move platform-mx2-emma to arch/arm/mach-imx/devices ARM i.MX51 clock: Fix regression since enabling MIPI/HSP clocks ARM: dts: mx27: Fix the AIPI bus for FEC ARM: OMAP2+: common: remove use of vram ARM: OMAP3/4: cpuidle: fix sparse and checkpatch warnings ARM: OMAP4: clock data: DPLLs are missing bypass clocks in their parent lists ARM: OMAP4: clock data: div_iva_hs_clk is a power-of-two divider ARM: OMAP4: Fix EMU clock domain always on ARM: OMAP4460: Workaround ABE DPLL failing to turn-on ARM: OMAP4: Enhance support for DPLLs with 4X multiplier ARM: OMAP4: Add function table for non-M4X dplls ARM: OMAP4: Update timer clock aliases ARM: OMAP: Move plat/omap-serial.h to include/linux/platform_data/serial-omap.h ARM: dts: Add build target for omap4-panda-a4 ARM: dts: OMAP2420: Correct H4 board memory size ...
2012-12-20dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFERMaarten Lankhorst
Documentation says that code requiring dma-buf should add it to select, so inline fallbacks are not going to be used. A link error will make it obvious what went wrong, instead of silently doing nothing at runtime. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Rob Clark <rob.clark@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2012-12-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Really fix tuntap SKB use after free bug, from Eric Dumazet. 2) Adjust SKB data pointer to point past the transport header before calling icmpv6_notify() so that the headers are in the state which that function expects. From Duan Jiong. 3) Fix ambiguities in the new tuntap multi-queue APIs. From Jason Wang. 4) mISDN needs to use del_timer_sync(), from Konstantin Khlebnikov. 5) Don't destroy mutex after freeing up device private in mac802154, fix also from Konstantin Khlebnikov. 6) Fix INET request socket leak in TCP and DCCP, from Christoph Paasch. 7) SCTP HMAC kconfig rework, from Neil Horman. 8) Fix SCTP jprobes function signature, otherwise things explode, from Daniel Borkmann. 9) Fix typo in ipv6-offload Makefile variable reference, from Simon Arlott. 10) Don't fail USBNET open just because remote wakeup isn't supported, from Oliver Neukum. 11) be2net driver bug fixes from Sathya Perla. 12) SOLOS PCI ATM driver bug fixes from Nathan Williams and David Woodhouse. 13) Fix MTU changing regression in 8139cp driver, from John Greene. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits) solos-pci: ensure all TX packets are aligned to 4 bytes solos-pci: add firmware upgrade support for new models solos-pci: remove superfluous debug output solos-pci: add GPIO support for newer versions on Geos board 8139cp: Prevent dev_close/cp_interrupt race on MTU change net: qmi_wwan: add ZTE MF880 drivers/net: Use of_match_ptr() macro in smsc911x.c drivers/net: Use of_match_ptr() macro in smc91x.c ipv6: addrconf.c: remove unnecessary "if" bridge: Correctly encode addresses when dumping mdb entries bridge: Do not unregister all PF_BRIDGE rtnl operations use generic usbnet_manage_power() usbnet: generic manage_power() usbnet: handle PM failure gracefully ksz884x: fix receive polling race condition qlcnic: update driver version qlcnic: fix unused variable warnings net: fec: forbid FEC_PTP on SoCs that do not support be2net: fix wrong frag_idx reported by RX CQ be2net: fix be_close() to ensure all events are ack'ed ...
2012-12-19Merge tags 'dt-for-linus', 'gpio-for-linus' and 'spi-for-linus' of ↵Linus Torvalds
git://git.secretlab.ca/git/linux-2.6 Pull devicetree, gpio and spi bugfixes from Grant Likely: "Device tree v3.8 bug fix: - Fixes an undefined struct device build error and a missing symbol export. GPIO device driver bug fixes: - gpio/mvebu-gpio: Make mvebu-gpio depend on OF_CONFIG - gpio/ich: Add missing spinlock init SPI device driver bug fixes: - Most of this is bug fixes to the core code and the sh-hspi and s3c64xx device drivers. - There is also a patch here to add DT support to the Atmel driver. This one should have been in the first round, but I missed it. It's a low risk change contained within a single driver and the Atmel maintainer has requested it." * tag 'dt-for-linus' of git://git.secretlab.ca/git/linux-2.6: of: define struct device in of_platform.h if !OF_DEVICE and !OF_ADDRESS of: Fix export of of_find_matching_node_and_match() * tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6: gpio/mvebu-gpio: Make mvebu-gpio depend on OF_CONFIG gpio/ich: Add missing spinlock init * tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6: spi/sh-hspi: fix return value check in hspi_probe(). spi: fix tegra SPI binding examples spi/atmel: add DT support of/spi: Fix SPI module loading by using proper "spi:" modalias prefixes. spi: Change FIFO flush operation and spi channel off spi: Keep chipselect assertion during one message
2012-12-19Merge tag 'for-linus-20121219' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD updates from David Woodhouse: - Various cleanups especially in NAND tests - Add support for NAND flash on BCMA bus - DT support for sh_flctl and denali NAND drivers - Kill obsolete/superceded drivers (fortunet, nomadik_nand) - Fix JFFS2 locking bug in ENOMEM failure path - New SPI flash chips, as usual - Support writing in 'reliable mode' for DiskOnChip G4 - Debugfs support in nandsim * tag 'for-linus-20121219' of git://git.infradead.org/linux-mtd: (96 commits) mtd: nand: typo in nand_id_has_period() comments mtd: nand/gpio: use io{read,write}*_rep accessors mtd: block2mtd: throttle writes by calling balance_dirty_pages_ratelimited. mtd: nand: gpmi: reset BCH earlier, too, to avoid NAND startup problems mtd: nand/docg4: fix and improve read of factory bbt mtd: nand/docg4: reserve bb marker area in ecclayout mtd: nand/docg4: add support for writing in reliable mode mtd: mxc_nand: reorder part_probes to let cmdline override other sources mtd: mxc_nand: fix unbalanced clk_disable() in error path mtd: nandsim: Introduce debugfs infrastructure mtd: physmap_of: error checking to prevent a NULL pointer dereference mtg: docg3: potential divide by zero in doc_write_oob() mtd: bcm47xxnflash: writing support mtd: tests/read: initialize buffer for whole next page mtd: at91: atmel_nand: return bit flips for the PMECC read_page() mtd: fix recovery after failed write-buffer operation in cfi_cmdset_0002.c mtd: nand: onfi need to be probed in 8 bits mode mtd: nand: add NAND_BUSWIDTH_AUTO to autodetect bus width mtd: nand: print flash size during detection mted: nand_wait_ready timeout fix ...
2012-12-19usbnet: generic manage_power()Oliver Neukum
Centralise common code for manage_power() in usbnet by making a generic simple implementation Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-19usbnet: handle PM failure gracefullyOliver Neukum
If a device fails to do remote wakeup, this is no reason to abort an open totally. This patch just continues without runtime PM. Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-19Merge tag 'for-3.8-rc1' of git://gitorious.org/linux-pwm/linux-pwmLinus Torvalds
Pull pwm changes from Thierry Reding: "A new driver has been added for the SPEAr platform and the TWL4030/6030 driver has been replaced by two drivers that control the regular PWMs and the PWM driven LEDs provided by the chips. The vt8500, tiecap, tiehrpwm, i.MX, LPC32xx and Samsung drivers have all been improved and the device tree bindings now support the PWM signal polarity." Fix up trivial conflicts due to __devinit/exit removal. * tag 'for-3.8-rc1' of git://gitorious.org/linux-pwm/linux-pwm: (21 commits) pwm: samsung: add missing s3c->pwm_id assignment pwm: lpc32xx: Set the chip base for dynamic allocation pwm: lpc32xx: Properly disable the clock on device removal pwm: lpc32xx: Fix the PWM polarity pwm: i.MX: eliminate build warning pwm: Export of_pwm_xlate_with_flags() pwm: Remove pwm-twl6030 driver pwm: New driver to support PWM driven LEDs on TWL4030/6030 series of PMICs pwm: New driver to support PWMs on TWL4030/6030 series of PMICs pwm: pwm-tiehrpwm: pinctrl support pwm: tiehrpwm: Add device-tree binding pwm: pwm-tiehrpwm: Adding TBCLK gating support. pwm: pwm-tiecap: pinctrl support pwm: tiecap: Add device-tree binding pwm: Add TI PWM subsystem driver pwm: Device tree support for PWM polarity pwm: vt8500: Ensure PWM clock is enabled during pwm_config pwm: vt8500: Fix build error pwm: spear: Staticize spear_pwm_config() pwm: Add SPEAr PWM chip driver support ...
2012-12-19of: define struct device in of_platform.h if !OF_DEVICE and !OF_ADDRESSJonas Gorski
Fixes the following warning: include/linux/of_platform.h:106:13: warning: 'struct device' declared inside parameter list [enabled by default] include/linux/of_platform.h:106:13: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] Signed-off-by: Jonas Gorski <jogo@openwrt.org> Signed-off-by: Rob Herring <robherring2@gmail.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-12-19Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module update from Rusty Russell: "Nothing all that exciting; a new module-from-fd syscall for those who want to verify the source of the module (ChromeOS) and/or use standard IMA on it or other security hooks." * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: MODSIGN: Fix kbuild output when using default extra_certificates MODSIGN: Avoid using .incbin in C source modules: don't hand 0 to vmalloc. module: Remove a extra null character at the top of module->strtab. ASN.1: Use the ASN1_LONG_TAG and ASN1_INDEFINITE_LENGTH constants ASN.1: Define indefinite length marker constant moduleparam: use __UNIQUE_ID() __UNIQUE_ID() MODSIGN: Add modules_sign make target powerpc: add finit_module syscall. ima: support new kernel module syscall add finit_module syscall to asm-generic ARM: add finit_module syscall to ARM security: introduce kernel_module_from_file hook module: add flags arg to sys_finit_module() module: add syscall to load module from fd
2012-12-19Merge tag 'byteswap-for-linus-20121219' of ↵Linus Torvalds
git://git.infradead.org/users/dwmw2/byteswap Pull preparatory gcc intrisics bswap patch from David Woodhouse: "This single patch is effectively a no-op for now. It enables architectures to opt in to using GCC's __builtin_bswapXX() intrinsics for byteswapping, and if we merge this now then the architecture maintainers can enable it for their arch during the next cycle without dependency issues. It's worth making it a par-arch opt-in, because although in *theory* the compiler should never do worse than hand-coded assembler (and of course it also ought to do a lot better on platforms like Atom and PowerPC which have load-and-swap or store-and-swap instructions), that isn't always the case. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46453 for example." * tag 'byteswap-for-linus-20121219' of git://git.infradead.org/users/dwmw2/byteswap: byteorder: allow arch to opt to use GCC intrinsics for byteswapping
2012-12-19blk: avoid divide-by-zero with zero discard granularityLinus Torvalds
Commit 8dd2cb7e880d ("block: discard granularity might not be power of 2") changed a couple of 'binary and' operations into modulus operations. Which turned the harmless case of a zero discard_granularity into a possible divide-by-zero. The code also had a much more subtle bug: it was doing the modulus of a value in bytes using 'sector_t'. That was always conceptually wrong, but didn't actually matter back when the code assumed a power-of-two granularity: we only looked at the low bits anyway. But with potentially arbitrary sector numbers, using a 'sector_t' to express bytes is very very wrong: depending on configuration it limits the starting offset of the device to just 32 bits, and any overflow would result in a wrong value if the modulus wasn't a power-of-two. So re-write the code to not only protect against the divide-by-zero, but to do the starting sector arithmetic in sectors, and using the proper types. [ For any mathematicians out there: it also looks monumentally stupid to do the 'modulo granularity' operation *twice*, never mind having a "+ granularity" in the second modulus op. But that's the easiest way to avoid negative values or overflow, and it is how the original code was done. ] Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: Doug Anderson <dianders@chromium.org> Cc: Neil Brown <neilb@suse.de> Cc: Shaohua Li <shli@fusionio.com> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linuxLinus Torvalds
Pull i2c-embedded changes from Wolfram Sang: - CBUS driver (an I2C variant) - continued rework of the omap driver - s3c2410 gets lots of fixes and gains pinctrl support - at91 gains DMA support - the GPIO muxer gains devicetree probing - typical fixes and additions all over * 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux: (45 commits) i2c: omap: Remove the OMAP_I2C_FLAG_RESET_REGS_POSTIDLE flag i2c: at91: add dma support i2c: at91: change struct members indentation i2c: at91: fix compilation warning i2c: mxs: Do not disable the I2C SMBus quick mode i2c: mxs: Handle i2c DMA failure properly i2c: s3c2410: Remove recently introduced performance overheads i2c: ocores: Move grlib set/get functions into #ifdef CONFIG_OF block i2c: s3c2410: Add fix for i2c suspend/resume i2c: s3c2410: Fix code to free gpios i2c: i2c-cbus-gpio: introduce driver i2c: ocores: Add support for the GRLIB port of the controller and use function pointers for getreg and setreg functions i2c: ocores: Add irq support for sparc i2c: omap: Move the remove constraint ARM: dts: cfa10049: Add the i2c muxer buses to the CFA-10049 i2c: s3c2410: do not special case HDMIPHY stuck bus detection i2c: s3c2410: use exponential back off while polling for bus idle i2c: s3c2410: do not generate STOP for QUIRK_HDMIPHY i2c: s3c2410: grab adapter lock while changing i2c clock i2c: s3c2410: Add support for pinctrl ...
2012-12-18Merge branch 'akpm' (more patches from Andrew)Linus Torvalds
Merge patches from Andrew Morton: "Most of the rest of MM, plus a few dribs and drabs. I still have quite a few irritating patches left around: ones with dubious testing results, lack of review, ones which should have gone via maintainer trees but the maintainers are slack, etc. I need to be more activist in getting these things wrapped up outside the merge window, but they're such a PITA." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (48 commits) mm/vmscan.c: avoid possible deadlock caused by too_many_isolated() vmscan: comment too_many_isolated() mm/kmemleak.c: remove obsolete simple_strtoul mm/memory_hotplug.c: improve comments mm/hugetlb: create hugetlb cgroup file in hugetlb_init mm/mprotect.c: coding-style cleanups Documentation: ABI: /sys/devices/system/node/ slub: drop mutex before deleting sysfs entry memcg: add comments clarifying aspects of cache attribute propagation kmem: add slab-specific documentation about the kmem controller slub: slub-specific propagation changes slab: propagate tunable values memcg: aggregate memcg cache values in slabinfo memcg/sl[au]b: shrink dead caches memcg/sl[au]b: track all the memcg children of a kmem_cache memcg: destroy memcg caches sl[au]b: allocate objects from memcg cache sl[au]b: always get the cache from its page in kmem_cache_free() memcg: skip memcg kmem allocations in specified code regions memcg: infrastructure to match an allocation to the right cache ...
2012-12-18mm/hugetlb: create hugetlb cgroup file in hugetlb_initJianguo Wu
Build kernel with CONFIG_HUGETLBFS=y,CONFIG_HUGETLB_PAGE=y and CONFIG_CGROUP_HUGETLB=y, then specify hugepagesz=xx boot option, system will fail to boot. This failure is caused by following code path: setup_hugepagesz hugetlb_add_hstate hugetlb_cgroup_file_init cgroup_add_cftypes kzalloc <--slab is *not available* yet For this path, slab is not available yet, so memory allocated will be failed, and cause WARN_ON() in hugetlb_cgroup_file_init(). So I move hugetlb_cgroup_file_init() into hugetlb_init(). [akpm@linux-foundation.org: tweak coding-style, remove pointless __init on inlined function] [akpm@linux-foundation.org: fix warning] Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg: add comments clarifying aspects of cache attribute propagationGlauber Costa
This patch clarifies two aspects of cache attribute propagation. First, the expected context for the for_each_memcg_cache macro in memcontrol.h. The usages already in the codebase are safe. In mm/slub.c, it is trivially safe because the lock is acquired right before the loop. In mm/slab.c, it is less so: the lock is acquired by an outer function a few steps back in the stack, so a VM_BUG_ON() is added to make sure it is indeed safe. A comment is also added to detail why we are returning the value of the parent cache and ignoring the children's when we propagate the attributes. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18slub: slub-specific propagation changesGlauber Costa
SLUB allows us to tune a particular cache behavior with sysfs-based tunables. When creating a new memcg cache copy, we'd like to preserve any tunables the parent cache already had. This can be done by tapping into the store attribute function provided by the allocator. We of course don't need to mess with read-only fields. Since the attributes can have multiple types and are stored internally by sysfs, the best strategy is to issue a ->show() in the root cache, and then ->store() in the memcg cache. The drawback of that, is that sysfs can allocate up to a page in buffering for show(), that we are likely not to need, but also can't guarantee. To avoid always allocating a page for that, we can update the caches at store time with the maximum attribute size ever stored to the root cache. We will then get a buffer big enough to hold it. The corolary to this, is that if no stores happened, nothing will be propagated. It can also happen that a root cache has its tunables updated during normal system operation. In this case, we will propagate the change to all caches that are already active. [akpm@linux-foundation.org: tweak code to avoid __maybe_unused] Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18slab: propagate tunable valuesGlauber Costa
SLAB allows us to tune a particular cache behavior with tunables. When creating a new memcg cache copy, we'd like to preserve any tunables the parent cache already had. This could be done by an explicit call to do_tune_cpucache() after the cache is created. But this is not very convenient now that the caches are created from common code, since this function is SLAB-specific. Another method of doing that is taking advantage of the fact that do_tune_cpucache() is always called from enable_cpucache(), which is called at cache initialization. We can just preset the values, and then things work as expected. It can also happen that a root cache has its tunables updated during normal system operation. In this case, we will propagate the change to all caches that are already active. This change will require us to move the assignment of root_cache in memcg_params a bit earlier. We need this to be already set - which memcg_kmem_register_cache will do - when we reach __kmem_cache_create() Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg: aggregate memcg cache values in slabinfoGlauber Costa
When we create caches in memcgs, we need to display their usage information somewhere. We'll adopt a scheme similar to /proc/meminfo, with aggregate totals shown in the global file, and per-group information stored in the group itself. For the time being, only reads are allowed in the per-group cache. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg/sl[au]b: track all the memcg children of a kmem_cacheGlauber Costa
This enables us to remove all the children of a kmem_cache being destroyed, if for example the kernel module it's being used in gets unloaded. Otherwise, the children will still point to the destroyed parent. Signed-off-by: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg: destroy memcg cachesGlauber Costa
Implement destruction of memcg caches. Right now, only caches where our reference counter is the last remaining are deleted. If there are any other reference counters around, we just leave the caches lying around until they go away. When that happens, a destruction function is called from the cache code. Caches are only destroyed in process context, so we queue them up for later processing in the general case. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18sl[au]b: allocate objects from memcg cacheGlauber Costa
We are able to match a cache allocation to a particular memcg. If the task doesn't change groups during the allocation itself - a rare event, this will give us a good picture about who is the first group to touch a cache page. This patch uses the now available infrastructure by calling memcg_kmem_get_cache() before all the cache allocations. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18sl[au]b: always get the cache from its page in kmem_cache_free()Glauber Costa
struct page already has this information. If we start chaining caches, this information will always be more trustworthy than whatever is passed into the function. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg: skip memcg kmem allocations in specified code regionsGlauber Costa
Create a mechanism that skip memcg allocations during certain pieces of our core code. It basically works in the same way as preempt_disable()/preempt_enable(): By marking a region under which all allocations will be accounted to the root memcg. We need this to prevent races in early cache creation, when we allocate data using caches that are not necessarily created already. Signed-off-by: Glauber Costa <glommer@parallels.com> yCc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg: infrastructure to match an allocation to the right cacheGlauber Costa
The page allocator is able to bind a page to a memcg when it is allocated. But for the caches, we'd like to have as many objects as possible in a page belonging to the same cache. This is done in this patch by calling memcg_kmem_get_cache in the beginning of every allocation function. This function is patched out by static branches when kernel memory controller is not being used. It assumes that the task allocating, which determines the memcg in the page allocator, belongs to the same cgroup throughout the whole process. Misaccounting can happen if the task calls memcg_kmem_get_cache() while belonging to a cgroup, and later on changes. This is considered acceptable, and should only happen upon task migration. Before the cache is created by the memcg core, there is also a possible imbalance: the task belongs to a memcg, but the cache being allocated from is the global cache, since the child cache is not yet guaranteed to be ready. This case is also fine, since in this case the GFP_KMEMCG will not be passed and the page allocator will not attempt any cgroup accounting. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg: allocate memory for memcg caches whenever a new memcg appearsGlauber Costa
Every cache that is considered a root cache (basically the "original" caches, tied to the root memcg/no-memcg) will have an array that should be large enough to store a cache pointer per each memcg in the system. Theoreticaly, this is as high as 1 << sizeof(css_id), which is currently in the 64k pointers range. Most of the time, we won't be using that much. What goes in this patch, is a simple scheme to dynamically allocate such an array, in order to minimize memory usage for memcg caches. Because we would also like to avoid allocations all the time, at least for now, the array will only grow. It will tend to be big enough to hold the maximum number of kmem-limited memcgs ever achieved. We'll allocate it to be a minimum of 64 kmem-limited memcgs. When we have more than that, we'll start doubling the size of this array every time the limit is reached. Because we are only considering kmem limited memcgs, a natural point for this to happen is when we write to the limit. At that point, we already have set_limit_mutex held, so that will become our natural synchronization mechanism. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18slab/slub: consider a memcg parameter in kmem_create_cacheGlauber Costa
Allow a memcg parameter to be passed during cache creation. When the slub allocator is being used, it will only merge caches that belong to the same memcg. We'll do this by scanning the global list, and then translating the cache to a memcg-specific cache Default function is created as a wrapper, passing NULL to the memcg version. We only merge caches that belong to the same memcg. A helper is provided, memcg_css_id: because slub needs a unique cache name for sysfs. Since this is visible, but not the canonical location for slab data, the cache name is not used, the css_id should suffice. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18slab/slub: struct memcg_paramsGlauber Costa
For the kmem slab controller, we need to record some extra information in the kmem_cache structure. Signed-off-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Suleiman Souhlal <suleiman@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18fork: protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombsGlauber Costa
Because those architectures will draw their stacks directly from the page allocator, rather than the slab cache, we can directly pass __GFP_KMEMCG flag, and issue the corresponding free_pages. This code path is taken when the architecture doesn't define CONFIG_ARCH_THREAD_INFO_ALLOCATOR (only ia64 seems to), and has THREAD_SIZE >= PAGE_SIZE. Luckily, most - if not all - of the remaining architectures fall in this category. This will guarantee that every stack page is accounted to the memcg the process currently lives on, and will have the allocations to fail if they go over limit. For the time being, I am defining a new variant of THREADINFO_GFP, not to mess with the other path. Once the slab is also tracked by memcg, we can get rid of that flag. Tested to successfully protect against :(){ :|:& };: Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Frederic Weisbecker <fweisbec@redhat.com> Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg: use static branches when code not in useGlauber Costa
We can use static branches to patch the code in or out when not used. Because the _ACTIVE bit on kmem_accounted is only set after the increment is done, we guarantee that the root memcg will always be selected for kmem charges until all call sites are patched (see memcg_kmem_enabled). This guarantees that no mischarges are applied. Static branch decrement happens when the last reference count from the kmem accounting in memcg dies. This will only happen when the charges drop down to 0. When that happens, we need to disable the static branch only on those memcgs that enabled it. To achieve this, we would be forced to complicate the code by keeping track of which memcgs were the ones that actually enabled limits, and which ones got it from its parents. It is a lot simpler just to do static_key_slow_inc() on every child that is accounted. Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18res_counter: return amount of charges after res_counter_uncharge()Glauber Costa
It is useful to know how many charges are still left after a call to res_counter_uncharge. While it is possible to issue a res_counter_read after uncharge, this can be racy. If we need, for instance, to take some action when the counters drop down to 0, only one of the callers should see it. This is the same semantics as the atomic variables in the kernel. Since the current return value is void, we don't need to worry about anything breaking due to this change: nobody relied on that, and only users appearing from now on will be checking this value. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18mm: allocate kernel pages to the right memcgGlauber Costa
When a process tries to allocate a page with the __GFP_KMEMCG flag, the page allocator will call the corresponding memcg functions to validate the allocation. Tasks in the root memcg can always proceed. To avoid adding markers to the page - and a kmem flag that would necessarily follow, as much as doing page_cgroup lookups for no reason, whoever is marking its allocations with __GFP_KMEMCG flag is responsible for telling the page allocator that this is such an allocation at free_pages() time. This is done by the invocation of __free_accounted_pages() and free_accounted_pages(). Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg: kmem controller infrastructureGlauber Costa
Introduce infrastructure for tracking kernel memory pages to a given memcg. This will happen whenever the caller includes the flag __GFP_KMEMCG flag, and the task belong to a memcg other than the root. In memcontrol.h those functions are wrapped in inline acessors. The idea is to later on, patch those with static branches, so we don't incur any overhead when no mem cgroups with limited kmem are being used. Users of this functionality shall interact with the memcg core code through the following functions: memcg_kmem_newpage_charge: will return true if the group can handle the allocation. At this point, struct page is not yet allocated. memcg_kmem_commit_charge: will either revert the charge, if struct page allocation failed, or embed memcg information into page_cgroup. memcg_kmem_uncharge_page: called at free time, will revert the charge. Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18mm: add a __GFP_KMEMCG flagGlauber Costa
This flag is used to indicate to the callees that this allocation is a kernel allocation in process context, and should be accounted to current's memcg. Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>