AgeCommit message (Collapse)Author
2012-10-12Merge branch 'task-placement-v2' into big-LITTLE-MP-v10big-LITTLE-MP-v10Viresh Kumar
This merge adds two fixup patches from tixy & morten on task-placement-v2 branch.
2012-10-12sched: Only down migrate low priority tasks if allowed by affinity maskMorten Rasmussen
Adds an extra check intersection of the task affinity mask and the slower hmp_domain cpumask before down migrating low priority tasks. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
2012-10-12ARM: sched: Avoid empty 'slow' HMP domainJon Medhurst
On homogeneous (non-heterogeneous) systems all CPUs will be declared 'fast' and the slow cpu list will be empty. In this situation we need to avoid adding an empty slow HMP domain otherwise the scheduler code will blow up when it attempts to move a task to the slow domain. Signed-off-by: Jon Medhurst <tixy@linaro.org>
2012-10-12Merge branches 'per-cpu-thread-hotplug-v3-fixed', 'task-placement-v2', ↵Viresh Kumar
'arm-asymmetric-support-v3-v3.6-rc1', 'rcu-hotplug-v1', 'arm-multi_pmu_v2', 'scheduler-misc-v1', 'hw-bkp-v7.1-debug-v1' and 'config-fragments' into big-LITTLE-MP-v10 Updates: ------- - Based on v3.6 - Stats: - Total Patches: 77 (V9 had incorrect count) - New Patches: 7 - task-placement-v2: sched: Enable HMP priority filter by default - arm-multi_pmu_v2 updated existing patches: http://permalink.gmane.org/gmane.linux.linaro.devel/13707 - hw-bkp-v7.1-debug-v1: new branch (1 patch) - Dropped Patches: 2 - branch cpu-hotplug-get_online_cpus-v1 removed as patches are already there in rcu-hotplug-v1 - Updated Patches: - per-task-load-average-v3-fixed updated with minor fixes. from: git://git.kernel.org/pub/scm/linux/kernel/git/pjt/sched.git commands used for merge: ----------------------- $ git checkout -b big-LITTLE-MP-v10 v3.6 $ git merge per-cpu-thread-hotplug-v3-fixed per-task-load-average-v3-fixed task-placement-v2 arm-asymmetric-support-v3-v3.6-rc1 rcu-hotplug-v1 arm-multi_pmu_v2 scheduler-misc-v1 hw-bkp-v7.1-debug-v1 config-fragments
2012-10-12ARM: hw_breakpoint: v7.1 self-hosted debug powerdown supportDietmar Eggemann
This patch introduces os save and restore mechanism for v7.1 debug and self-hosted debuggers. It enables the os to save DBGDSCR before powerdown and restore it when power is restored. The clearing of the os lock in the restore function kick-starts the debug logic again. The os save and restore routines are hooked into the CPU PM event notifier chain. CPU PM events are used to save and restore per-cpu context when a single CPU is preparing to enter or has exited a low power state. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
2012-10-11linaro/configs: Enable HMP priority filter by defaultMorten Rasmussen
This updates linaro config fragments to enable the HMP priority filter by default. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
2012-10-11sched: Enable HMP priority filter by defaultMorten Rasmussen
This updates the ARM Kconfig to enable the HMP priority filter by default. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
2012-10-09ARM: perf: save/restore pmu registers in pm notifierSudeep KarkadaNagesha
This adds core support for saving and restoring CPU PMU registers for suspend/resume support i.e. deeper C-states in cpuidle terms. This patch adds support only to ARMv7 PMU registers save/restore. It needs to be extended to xscale and ARMv6 if needed. Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
2012-10-09ARM: perf: remove spaces in CPU PMU namesSudeep KarkadaNagesha
The userspace perf tool provides options to specify PMU names from command line for the event. An example of pmu event syntax would be (<pmu_name>/<config>/<modifier>) However the parser in the perf tool breaks the tokens at spacesand fails to identify the PMU name with spaces correctly. This patch removes spaces in the ARMv7 CPU PMU names. Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
2012-10-09ARM: perf: set cpu affinity for the irqs correctlySudeep KarkadaNagesha
This patch sets the cpu affinity for the perf IRQs in the logical order within the cluster. However interupts are assumed to be specified in the same logical order within the cluster. Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
2012-10-09ARM: perf: set cpu affinity to support multiple PMUsSudeep KarkadaNagesha
In a system with multiple heterogeneous CPU PMUs and each PMUs can handle events on a subset of CPUs, probably belonging a the same cluster. This patch introduces a cpumask to track which CPUs each PMU supports. It also updates armpmu_event_init to reject cpu-specific events being initialised for unsupported CPUs. Since process-specific events can be initialised for all the CPU PMUs,armpmu_start/stop/add are modified to prevent from being added on unsupported CPUs. Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
2012-10-09ARM: perf: register CPU PMUs with idr typesSudeep KarkadaNagesha
In order to support multiple, heterogeneous CPU PMUs and distinguish them, they cannot be registered as PERF_TYPE_RAW type. Instead we can get perf core to allocate a new idr type id for each PMU. Userspace applications can refer sysfs entried to find a PMU's type, which can then be used in tracking events on individual PMUs. Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
2012-10-09ARM: perf: replace global CPU PMU pointer with per-cpu pointersSudeep KarkadaNagesha
A single global CPU PMU pointer is not useful in a system with multiple, heterogeneous CPU PMUs as we need to access the relevant PMU depending on the current CPU. This patch replaces the single global CPU PMU pointer with per-cpu pointers and changes the OProfile accessors to refer to the PMU affine to CPU0. Signed-off-by: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: perf: consistently use arm_pmu->name for PMU nameWill Deacon
Perf has three ways to name a PMU: either by passing an explicit char *, reading arm_pmu->name or accessing arm_pmu->pmu.name. Just use arm_pmu->name consistently in the ARM backend. Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: perf: return NOTIFY_DONE from cpu notifier when no available PMUWill Deacon
When attempting to reset the PMU state for either a NULL PMU or a PMU implementation without a reset function, return NOTIFY_DONE from the CPU notifier as we don't care about the hotplug event. Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: perf: register cpu_notifier at driver initMark Rutland
The current practice of registering the cpu hotplug notifier at PMU registration time won't be safe with multiple PMUs, as we'll repeatedly attempt to register the notifier. This has the unfortunate effect of silently corrupting the notifier list, leading to boot stalling. Instead, register the notifier at init time. Its sanity checks will prevent anything bad from happening if the notifier is called before we have any PMUs registered. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: perf: check ARMv7 counter validity on a per-pmu basisSudeep KarkadaNagesha
Multi-cluster ARMv7 systems may have CPU PMUs with different number of counters. This patch updates armv7_pmnc_counter_valid so that it takes a pmu argument and checks the counter validity against that. We also remove a number of redundant counter checks whether the current PMU is not easily retrievable. Signed-off-by: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: perf: consistently use struct perf_event in arm_pmu functionsSudeep KarkadaNagesha
The arm_pmu functions have wildly varied parameters which can often be derived from struct perf_event. This patch changes the arm_pmu function prototypes so that struct perf_event pointers are passed in preference to fields that can be derived from the event. Signed-off-by: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: perf: allocate CPU PMU dynamically at probe timeSudeep KarkadaNagesha
Supporting multiple, heterogeneous CPU PMUs requires us to allocate the arm_pmu structures dynamically as the devices are probed. This patch removes the static structure definitions for each CPU PMU type and instead passes pointers to the PMU-specific init functions. Signed-off-by: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: perf: add guest vs host discriminationMarc Zyngier
Add minimal guest support to perf, so it can distinguish whether the PMU interrupt was in the host or the guest, as well as collecting some very basic information (guest PC, user vs kernel mode). This is not feature complete though, as it doesn't support backtracing in the guest. Based on the x86 implementation, tested with KVM/ARM. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: kernel: provide cluster to logical cpu mask mapping APILorenzo Pieralisi
Some device drivers like PMU require to retrieve the logical cpu mask that corresponds to a given cluster id. This patch provides a hook in the topology code that, given an existing cluster id as input, initializes the corresponding cpumask passed as a pointer, reusing all existing topology information required by sched domains in the kernel. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2012-10-09ARM: ux500: Fix build error due to missing include of asm/pmu.h in cpu-db8500.caxel lin
Include asm/pmu.h to fix below build error: CC arch/arm/mach-ux500/cpu-db8500.o arch/arm/mach-ux500/cpu-db8500.c:118:8: error: variable 'db8500_pmu_platdata' has initializer but incomplete type arch/arm/mach-ux500/cpu-db8500.c:119:2: error: unknown field 'handle_irq' specified in initializer arch/arm/mach-ux500/cpu-db8500.c:119:2: warning: excess elements in struct initializer [enabled by default] arch/arm/mach-ux500/cpu-db8500.c:119:2: warning: (near initialization for 'db8500_pmu_platdata') [enabled by default] make[1]: *** [arch/arm/mach-ux500/cpu-db8500.o] Error 1 make: *** [arch/arm/mach-ux500] Error 2 Signed-off-by: Axel Lin <axel.lin@gmail.com>
2012-10-09ARM: perf: move irq registration into pmu implementationSudeep KarkadaNagesha
This patch moves the CPU-specific IRQ registration and parsing code into the CPU PMU backend. This is required because a PMU may have more than one interrupt, which in turn can be either PPI (per-cpu) or SPI (requiring strict affinity setting at the interrupt distributor). Signed-off-by: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com> [will: cosmetic edits and reworked interrupt dispatching] Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: perf: move CPU-specific PMU handling code into separate fileWill Deacon
This patch moves the CPU-specific PMU handling code out of perf_event.c and into perf_event_cpu.c. Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: perf: prepare for moving CPU PMU code into separate fileWill Deacon
The CPU PMU code is tightly coupled with generic ARM PMU handling code. This makes it cumbersome when trying to add support for other ARM PMUs (e.g. interconnect, L2 cache controller, bus) as the generic parts of the code are not readily reusable. This patch cleans up perf_event.c so that reusable code is exposed via header files to other potential PMU drivers. The CPU code is consistently named to identify it as such and also to prepare for moving it into a separate file. Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: perf: probe devicetree in preference to current CPUWill Deacon
The CPU PMU is probed using the current cpuid information as part of the early_initcall initialising the architecture perf backend. For architectures without NMI (such as ARM), this does not need to be performed early and can be deferred to the driver probe callback. This also allows us to probe the devicetree in preference to parsing the current cpuid, which may be invalid on a big.LITTLE multi-cluster system. This patch defers the PMU probing and uses the devicetree information when available. Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: perf: remove mysterious compiler barrierWill Deacon
There's a rather strange compiler barrier in the PMU disabling code which was presumably placed there by aliens. There's no valid reason for the barrier and one can only suspect that it's up to no good. This patch removes it before it has a chance to spread. Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: pmu: remove arm_pmu_type enumerationSudeep KarkadaNagesha
The arm_pmu_type enumeration was initially introduced to identify different PMU types in the system, the usual one being that on the CPU (ARM_PMU_DEVICE_CPU). With the removal of the PMU reservation code and the introduction of devicetree bindings for the CPU PMU, the enumeration is no longer required. This patch removes the enumeration and updates the various CPU PMU platform devices so that they no longer pass an .id field referring to identify the PMU type. Cc: Haojian Zhuang <haojian.zhuang@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Pawel Moll <pawel.moll@arm.com> Acked-by: Jon Hunter <jon-hunter@ti.com> Acked-by: Kukjin Kim <kgene.kim@samsung.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jiandong Zheng <jdzheng@broadcom.com> Signed-off-by: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com> [will: cosmetic edits and actual removal of the enum type] Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: pmu: remove unused reservation mechanismWill Deacon
The PMU reservation mechanism was originally intended to allow OProfile and perf-events to co-ordinate over access to the CPU PMU. Since then, OProfile for ARM has moved to using perf as its backend, so the reservation code is no longer used. This patch removes the reservation code for the CPU PMU on ARM. Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: perf: add devicetree bindings for 11MPcore, A5, A7 and A15 PMUsWill Deacon
This patch adds separate devicetree bindings for 11MPcore and Cortex-{A5,A7,A15} PMUs in preparation for improved devicetree parsing in the ARM perf-event CPU PMU driver. Cc: Grant Likely <grant.likely@secretlab.ca> Acked-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-09ARM: PMU: Add runtime PM SupportJon Hunter
Add runtime PM support to the ARM PMU driver so that devices such as OMAP supporting dynamic PM can use the platform->runtime_* hooks to initialise hardware at runtime. Without having these runtime PM hooks in place any configuration of the PMU hardware would be lost when low power states are entered and hence would prevent PMU from working. This change also replaces the PMU platform functions enable_irq and disable_irq added by Ming Lei with runtime_resume and runtime_suspend funtions. Ming had added the enable_irq and disable_irq functions as a method to configure the cross trigger interface on OMAP4 for routing the PMU interrupts. By adding runtime PM support, we can move the code called by enable_irq and disable_irq into the runtime PM callbacks runtime_resume and runtime_suspend. Cc: Ming Lei <ming.lei@canonical.com> Cc: Benoit Cousson <b-cousson@ti.com> Cc: Paul Walmsley <paul@pwsan.com> Cc: Kevin Hilman <khilman@ti.com> Signed-off-by: Jon Hunter <jon-hunter@ti.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-10-08sched: SCHED_HMP multi-domain task migration controlMorten Rasmussen
We need a way to prevent tasks that are migrating up and down the hmp_domains from migrating straight on through before the load has adapted to the new compute capacity of the CPU on the new hmp_domain. This patch adds a next up/down migration delay that prevents the task from doing another migration in the same direction until the delay has expired. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2012-10-08sched: Add HMP task migration ftrace eventMorten Rasmussen
Adds ftrace event for tracing task migrations using HMP optimized scheduling. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2012-10-08sched: Add ftrace events for entity load-trackingMorten Rasmussen
Adds ftrace events for key variables related to the entity load-tracking to help debugging scheduler behaviour. Allows tracing of load contribution and runqueue residency ratio for both entities and runqueues as well as entity CPU usage ratio. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2012-10-08ARM: sched: Setup SCHED_HMP domainsMorten Rasmussen
SCHED_HMP requires the different cpu types to be represented by an ordered list of hmp_domains. Each hmp_domain represents all cpus of a particular type using a cpumask. The list is platform specific and therefore must be generated by platform code by implementing arch_get_hmp_domains(). Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2012-10-08ARM: sched: Use device-tree to provide fast/slow CPU list for HMPMorten Rasmussen
We can't rely on Kconfig options to set the fast and slow CPU lists for HMP scheduling if we want a single kernel binary to support multiple devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2 big.LITTLE system), Fast Models, or even non big.LITTLE devices. This patch adds the function arch_get_fast_and_slow_cpus() to generate the lists at run-time by parsing the CPU nodes in device-tree; it assumes slow cores are A7s and everything else is fast. The function still supports the old Kconfig options as this is useful for testing the HMP scheduler on devices without big.LITTLE. This patch is reuse of a patch by Jon Medhurst <tixy@linaro.org> with a few bits left out. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2012-10-08ARM: Add HMP scheduling support for ARM architectureMorten Rasmussen
Adds Kconfig entries to enable HMP scheduling on ARM platforms. Currently, it disables CPU level sched_domain load-balacing in order to simplify things. This needs fixing in a later revision. HMP scheduling will do the load-balancing at this level instead. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2012-10-08sched: Introduce priority-based task migration filterMorten Rasmussen
Introduces a priority threshold which prevents low priority task from migrating to faster hmp_domains (cpus). This is useful for user-space software which assigns lower task priority to background task. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2012-10-08sched: Forced task migration on heterogeneous systemsMorten Rasmussen
This patch introduces forced task migration for moving suitable currently running tasks between hmp_domains. Task behaviour is likely to change over time. Tasks running in a less capable hmp_domain may change to become more demanding and should therefore be migrated up. They are unlikely go through the select_task_rq_fair() path anytime soon and therefore need special attention. This patch introduces a period check (SCHED_TICK) of the currently running task on all runqueues and sets up a forced migration using stop_machine_no_wait() if the task needs to be migrated. Ideally, this should not be implemented by polling all runqueues. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2012-10-08sched: Task placement for heterogeneous systems based on task load-trackingMorten Rasmussen
This patch introduces the basic SCHED_HMP infrastructure. Each class of cpus is represented by a hmp_domain and tasks will only be moved between these domains when their load profiles suggest it is beneficial. SCHED_HMP relies heavily on the task load-tracking introduced in Paul Turners fair group scheduling patch set: <https://lkml.org/lkml/2012/8/23/267> SCHED_HMP requires that the platform implements arch_get_hmp_domains() which should set up the platform specific list of hmp_domains. It is also assumed that the platform disables SD_LOAD_BALANCE for the appropriate sched_domains. Tasks placement takes place every time a task is to be inserted into a runqueue based on its load history. The task placement decision is based on load thresholds. There are no restrictions on the number of hmp_domains, however, multiple (>2) has not been tested and the up/down migration policy is rather simple. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2012-10-08sched: entity load-tracking load_avg_ratioMorten Rasmussen
This patch adds load_avg_ratio to each task. The load_avg_ratio is a variant of load_avg_contrib which is not scaled by the task priority. It is calculated like this: runnable_avg_sum * NICE_0_LOAD / (runnable_avg_period + 1). Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2012-10-08sched: introduce temporary FAIR_GROUP_SCHED dependency for load-trackingPaul Turner
While per-entity load-tracking is generally useful, beyond computing shares distribution, e.g. runnable based load-balance (in progress), governors, power-management, etc These facilities are not yet consumers of this data. This may be trivially reverted when the information is required; but avoid paying the overhead for calculations we will not use until then. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com>
2012-10-08sched: implement usage trackingPaul Turner
With the frame-work for runnable tracking now fully in place. Per-entity usage tracking is a simple and low-overhead addition. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com>
2012-10-08sched: make __update_entity_runnable_avg() fastPaul Turner
__update_entity_runnable_avg forms the core of maintaining an entity's runnable load average. In this function we charge the accumulated run-time since last update and handle appropriate decay. In some cases, e.g. a waking task, this time interval may be much larger than our period unit. Fortunately we can exploit some properties of our series to perform decay for a blocked update in constant time and account the contribution for a running update in essentially-constant* time. [*]: For any running entity they should be performing updates at the tick which gives us a soft limit of 1 jiffy between updates, and we can compute up to a 32 jiffy update in a single pass. C program to generate the above: #include <math.h> #include <stdio.h> #define N 32 #define WMULT_SHIFT 32 const long WMULT_CONST = ((1UL << N) - 1); double y; long runnable_avg_yN_inv[N]; void calc_mult_inv() { int i; double yn = 0; printf("inverses\n"); for (i = 0; i < N; i++) { yn = (double)WMULT_CONST * pow(y, i); runnable_avg_yN_inv[i] = yn; printf("%2d: 0x%8lx\n", i, runnable_avg_yN_inv[i]); } printf("\n"); } long mult_inv(long c, int n) { return (c * runnable_avg_yN_inv[n]) >> WMULT_SHIFT; } void calc_yn_sum(int n) { int i; double sum = 0, sum_fl = 0, diff = 0; /* * We take the floored sum to ensure the sum of partial sums is never * larger than the actual sum. */ printf("sum y^n\n"); printf(" %8s %8s %8s\n", "exact", "floor", "error"); for (i = 1; i <= n; i++) { sum = (y * sum + y * 1024); sum_fl = floor(y * sum_fl+ y * 1024); printf("%2d: %8.0f %8.0f %8.0f\n", i, sum, sum_fl, sum_fl - sum); } printf("\n"); } void calc_conv(long n) { long old_n; int i = -1; printf("convergence (LOAD_AVG_MAX, LOAD_AVG_MAX_N)\n"); do { old_n = n; n = mult_inv(n, 1) + 1024; i++; } while (n != old_n); printf("%d> %ld\n", i - 1, n); printf("\n"); } void main() { y = pow(0.5, 1/(double)N); calc_mult_inv(); calc_conv(1024); calc_yn_sum(N); } [ Compile with -lm ] Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com>
2012-10-08sched: update_cfs_shares at period edgePaul Turner
Now that our measurement intervals are small (~1ms) we can amortize the posting of update_shares() to be about each period overflow. This is a large cost saving for frequently switching tasks. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com>
2012-10-08sched: refactor update_shares_cpu() -> update_blocked_avgs()Paul Turner
Now that running entities maintain their own load-averages the work we must do in update_shares() is largely restricted to the periodic decay of blocked entities. This allows us to be a little less pessimistic regarding our occupancy on rq->lock and the associated rq->clock updates required. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com>
2012-10-08sched: replace update_shares weight distribution with per-entity computationPaul Turner
Now that the machinery in place is in place to compute contributed load in a bottom up fashion; replace the shares distribution code within update_shares() accordingly. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com>
2012-10-08sched: maintain runnable averages across throttled periodsPaul Turner
With bandwidth control tracked entities may cease execution according to user specified bandwidth limits. Charging this time as either throttled or blocked however, is incorrect and would falsely skew in either direction. What we actually want is for any throttled periods to be "invisible" to load-tracking as they are removed from the system for that interval and contribute normally otherwise. Do this by moderating the progression of time to omit any periods in which the entity belonged to a throttled hierarchy. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com>
2012-10-08sched: normalize tg load contributions against runnable timePaul Turner
Entities of equal weight should receive equitable distribution of cpu time. This is challenging in the case of a task_group's shares as execution may be occurring on multiple cpus simultaneously. To handle this we divide up the shares into weights proportionate with the load on each cfs_rq. This does not however, account for the fact that the sum of the parts may be less than one cpu and so we need to normalize: load(tg) = min(runnable_avg(tg), 1) * tg->shares Where runnable_avg is the aggregate time in which the task_group had runnable children. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com>.
2012-10-08sched: compute load contribution by a group entityPaul Turner
Unlike task entities who have a fixed weight, group entities instead own a fraction of their parenting task_group's shares as their contributed weight. Compute this fraction so that we can correctly account hierarchies and shared entity nodes. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com>