path: root/init.c
Age | Commit message | Author
2010-05-13 | Allow MAP_NORESERVE to be used for mappings | Mel Gorman

Since 2.6.27-rc1, the kernel makes reservations for mappings at mmap() time. This guarantees that a process that successfully calls mmap() will successfully fault all pages within that region. This is nice, reliable behaviour, but a program may want to create a very large sparse mapping. In that case, mmap() will fail even if the program knows that enough huge pages are available for the pages it will actually touch.

This patch introduces a --no-reserve switch that uses MAP_NORESERVE. mmap() will always succeed but the fault might not. Unfortunately, on older kernels, use of MAP_NORESERVE can trigger the OOM killer. Hence, this patch also checks the kernel version and only allows use of MAP_NORESERVE if it is safe to do so.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Eric B Munson <ebmunson@us.ibm.com>
Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
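
The gating described in this commit can be sketched roughly as follows. This is not the code from the patch: the function names and the `kernel_is_safe` parameter are hypothetical, and an anonymous mapping stands in for a file on hugetlbfs so that the sketch runs on any Linux system.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* exposes MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: choose the mmap() flags for a mapping.
 * MAP_NORESERVE is added only when the caller asked for it AND the
 * running kernel is known to handle it safely (how that is decided,
 * e.g. by a kernel version check, is outside this sketch).
 */
int mapping_flags(int want_noreserve, int kernel_is_safe)
{
    int flags = MAP_PRIVATE;

    if (want_noreserve && kernel_is_safe)
        flags |= MAP_NORESERVE;
    return flags;
}

/*
 * Hypothetical helper: create a sparse region. The anonymous mapping is
 * an assumption made to keep the example self-contained; the library
 * itself maps files that live on a hugetlbfs mount.
 */
void *map_sparse_region(size_t len, int want_noreserve, int kernel_is_safe)
{
    assert(len > 0);
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                mapping_flags(want_noreserve, kernel_is_safe) | MAP_ANONYMOUS,
                -1, 0);
}
```

With `want_noreserve` set on an unsafe kernel, the helper silently falls back to a reserved mapping. Reporting that fallback to the user would be an equally reasonable design.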
2009-05-13 | Move check for private reservations out of morecore setup | Eric B Munson

Prefaulting should be disabled when the kernel supports private reservations. Currently this check is part of the morecore_setup function and is only made if HUGETLB_MORECORE is 'yes'. This is a problem for users of get_hugepage_region because prefaulting will slow allocation down significantly and screw with NUMA placement policies where we want to do demand faulting.

This patch moves the check for private reservations into a separate function that is called during hugetlb_setup.

Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
2009-01-08 | Create environment parse function and value storage struct | Eric B Munson

hugetlb_setup_env will be called first during init. All environment variable parsing should happen in this function, with values being saved for later use.

Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
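
The pattern of parsing the environment once and storing the results can be sketched as below. The struct layout and function names are hypothetical and are not taken from the patch. Only the variable names HUGETLB_PATH and HUGETLB_MORECORE appear elsewhere in this log, and the sketch recognises only the value 'yes' for HUGETLB_MORECORE.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical storage struct: parsed once at init, read later. */
struct example_env {
    const char *path;   /* raw HUGETLB_PATH value, or NULL if unset */
    int morecore;       /* 1 if HUGETLB_MORECORE is exactly "yes" */
};

/* Pure parsing step, kept separate from getenv() so it is easy to test. */
void example_parse_env(const char *path, const char *morecore,
                       struct example_env *out)
{
    assert(out != NULL);
    out->path = path;
    out->morecore = (morecore != NULL && strcmp(morecore, "yes") == 0);
}

/* Would be called first during library initialisation. */
void example_setup_env(struct example_env *out)
{
    example_parse_env(getenv("HUGETLB_PATH"),
                      getenv("HUGETLB_MORECORE"), out);
}
```

Later setup steps would then read the struct instead of calling getenv() themselves, which keeps all parsing rules in one place.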
2008-10-21 | move to a new library local idiom | Andy Whitcroft

Currently we have three types of function:

  file local -- marked static in the normal way,
  library local -- external but prefixed with __lh_,
  library exported -- external and listed in the library.lds file.

While the library prefix works, it does not allow functions to trivially move from file local to library local, as all references to the function have to be modified to the new name.

This patch introduces a new idiom. When a function is intended to be library local, it is already necessary to declare that function in libhugetlbfs_internal.h. If we also add a single define for that function adding the __lh_ prefix (as below), then all other references, including the original definition, may use the original name unchanged, but the function will remain unexported:

    #define hpool_sizes __lh_hpool_sizes
    extern int hpool_sizes(struct hpage_pool *, int);

This patch converts all current library local functions to this new idiom.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
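
A single-file sketch of the idiom follows. The two header lines are the ones quoted in the commit message. The contents of `struct hpage_pool` and the body of `hpool_sizes` are placeholders invented for illustration, since neither is shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* --- Part that would live in libhugetlbfs_internal.h --- */

struct hpage_pool {
    unsigned long pagesize_kb;  /* placeholder field, assumed for this sketch */
};

/* Every later use of the plain name is rewritten to the __lh_ name. */
#define hpool_sizes __lh_hpool_sizes
extern int hpool_sizes(struct hpage_pool *, int);

/* --- Part that would live in a .c file --- */

/*
 * Written with the plain name, but the preprocessor turns this into a
 * definition of __lh_hpool_sizes, so that is the symbol the linker sees.
 * The body is a placeholder: it fills in one fake pool and returns 1.
 */
int hpool_sizes(struct hpage_pool *pools, int max)
{
    assert(pools != NULL);
    if (max < 1)
        return 0;
    pools[0].pagesize_kb = 2048;
    return 1;
}
```

Because the macro applies to the definition and to every caller alike, promoting a static function to library local only requires adding the two header lines and dropping `static`.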
2008-09-04 | Merge branch 'work/kernel-versions' into work/multiple-huge-page-sizes | Adam Litke
2008-09-04 | Merge branch 'work/multiple-mounts' into work/multiple-huge-page-sizes | Adam Litke
2008-08-14 | Basic support for multiple hugetlbfs mount points | Adam Litke

An upcoming release of the Linux kernel will support simultaneous use of multiple huge page sizes. Each size will be accessed through its own specially-mounted hugetlbfs filesystem. The first step in enabling libhugetlbfs to support multiple simultaneous page sizes is enabling the support of multiple simultaneous hugetlbfs mount points. This patch adds basic support for multiple mount points while preserving backwards-compatibility.

Mount points can be added via the HUGETLB_PATH environment variable, which has been extended in the normal way to allow multiple paths to be specified (using a colon separator). Mounts will also be discovered by reading /proc/mounts or /etc/mtab. Up to 10 mount points are allowed to co-exist, but only one mount per page size is allowed. If HUGETLB_PATH is specified, only mount points listed in that variable will be added. Otherwise, paths in /proc/mounts or /etc/mtab will be added in order of appearance. The first mount point of a given size is used and subsequent mounts of that page size are skipped.

For compatibility and ease of use, a default mount point is selected. When multiple mount points have been added, /proc/meminfo is read to determine the system's default huge page size, and the mount point having that size is selected as the default. If a mount point for the default page size cannot be found, the first mount point found becomes the default. The gethugepagesize() call has been modified to return the default huge page size as determined by the method just described.

Signed-off-by: Adam Litke <agl@us.ibm.com>
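
The rules in this commit message (colon-separated paths, at most 10 mounts, first mount per page size wins, fallback to the first mount as default) can be sketched as below. All type and function names are hypothetical, the path length limit is an assumption, and the page size of each mount is passed in by the caller instead of being queried from the filesystem.

```c
#include <assert.h>
#include <string.h>

#define MAX_MOUNTS 10   /* "Up to 10 mount points", per the commit message */
#define PATH_LEN   256  /* assumed limit for this sketch */

struct mount_entry { char path[PATH_LEN]; long page_size; };
struct mount_table { struct mount_entry e[MAX_MOUNTS]; int count; };

/* Split a colon-separated list; empty or over-long segments are skipped. */
int split_path_list(const char *list, char out[][PATH_LEN], int max)
{
    int n = 0;

    assert(list != NULL && out != NULL);
    while (*list != '\0' && n < max) {
        size_t len = strcspn(list, ":");
        if (len > 0 && len < PATH_LEN) {
            memcpy(out[n], list, len);
            out[n][len] = '\0';
            n++;
        }
        list += len;
        if (*list == ':')
            list++;
    }
    return n;
}

/* Returns 1 if added, 0 if skipped (table full or page size already seen). */
int add_mount(struct mount_table *t, const char *path, long page_size)
{
    assert(t != NULL && path != NULL);
    if (t->count >= MAX_MOUNTS || strlen(path) >= PATH_LEN)
        return 0;
    for (int i = 0; i < t->count; i++)
        if (t->e[i].page_size == page_size)
            return 0;   /* first mount of a given size wins */
    strcpy(t->e[t->count].path, path);
    t->e[t->count].page_size = page_size;
    t->count++;
    return 1;
}

/* Index of the default mount: matching size, else the first, else -1. */
int default_mount(const struct mount_table *t, long default_page_size)
{
    assert(t != NULL);
    for (int i = 0; i < t->count; i++)
        if (t->e[i].page_size == default_page_size)
            return i;
    return t->count > 0 ? 0 : -1;
}
```

In real use, the page size for each path would come from the mounted filesystem and the default size from /proc/meminfo. Both lookups are left out to keep the sketch independent of the host system.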
2008-08-14 | Be specific about what local symbols should not be exported | Mel Gorman

To override shmget(), it is necessary for the symbol to be unversioned. However, all unversioned symbols are given local scope to avoid internal functions being called accidentally. This patch marks the internal-only export functions clearly with the prefix __lh_ and then versions them to be only of local scope. Two unused functions are simply deleted.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Eric Munson <ebmunson@us.ibm.com>
2008-08-05 | [RFC] Use the kernel version number to identify kernel functionality V2 | Adam Litke

Historically, libhugetlbfs has relied on kernel features that either have been known to exist in all supported kernel versions or are easily detected. As of kernel version 2.6.27-rc1, a crucial new feature has been added that cannot be reliably detected: huge page mappings created with the MAP_PRIVATE flag will have huge pages reserved up-front. With private reservations in effect, it is safe to allow demand-faulting of the HUGETLB_MORECORE heap, which can lead to dramatic performance improvements on NUMA systems. This is only safe behavior in the presence of private reservations.

The only way to identify that a kernel has private reservations support is to examine the kernel version to see if it is more recent than when the feature appeared. I am well aware of the drawbacks of using the kernel version to affect library behavior, but I don't see any alternative. I would suggest that the kernel version should be used only in cases when there is no alternative.

How it works
============

Kernels are assumed to have a mandatory base version x.y.z (e.g. 2.6.17) and one optional modifier: a post version (stable tree x.y.z.q) or a pre version (x.y.z-{preN|rcN}). All other version appendices (such as -mmN) are ignored. The following ordering rules apply:

    x.y.z-rc(N) < x.y.z-rc(N+1) < x.y.z < x.y.z.(N) < x.y.z.(N+1)

When libhugetlbfs initializes, the running kernel version is probed using uname. A list of feature definitions is scanned and those with a minimum kernel version have that version compared to the running kernel. If the running kernel is found to be equal to or greater than the minimum required kernel version, a bit in a feature mask is set to indicate the presence of the feature. A feature can be checked for later by using a simple function that checks the bitmask.

Changes since V1 (Thanks Andy Whitcroft and Mel Gorman):
- Fixed feature_mask handling
- Readability improvements
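
The ordering rules and the feature mask can be sketched as follows. This is an illustrative reimplementation, not the code from the patch: every identifier is hypothetical, -preN and -rcN are treated as the same kind of modifier, and the single feature entry uses the 2.6.27-rc1 threshold stated above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct kernel_version {
    unsigned long major, minor, release;
    unsigned long post;  /* x.y.z.q stable number, 0 if absent */
    unsigned long pre;   /* -rcN / -preN number, 0 if absent */
};

static int parse_num(const char **s, unsigned long *out)
{
    char *end;

    if (!isdigit((unsigned char)**s))
        return -1;
    *out = strtoul(*s, &end, 10);
    *s = end;
    return 0;
}

/* Returns 0 on success, -1 if the mandatory x.y.z base is malformed. */
int parse_kernel_version(const char *s, struct kernel_version *v)
{
    assert(s != NULL && v != NULL);
    memset(v, 0, sizeof(*v));
    if (parse_num(&s, &v->major) || *s++ != '.')
        return -1;
    if (parse_num(&s, &v->minor) || *s++ != '.')
        return -1;
    if (parse_num(&s, &v->release))
        return -1;
    if (*s == '.') {                       /* stable tree: x.y.z.q */
        s++;
        if (parse_num(&s, &v->post))
            return -1;
    } else if (strncmp(s, "-rc", 3) == 0) {
        s += 3;
        parse_num(&s, &v->pre);            /* missing number leaves pre at 0 */
    } else if (strncmp(s, "-pre", 4) == 0) {
        s += 4;
        parse_num(&s, &v->pre);
    }
    return 0;                              /* other suffixes (-mmN) ignored */
}

static int cmp_ul(unsigned long a, unsigned long b)
{
    return (a > b) - (a < b);
}

/* Negative, zero or positive, like strcmp(). */
int kernel_version_cmp(const struct kernel_version *a,
                       const struct kernel_version *b)
{
    int r;

    if ((r = cmp_ul(a->major, b->major)) != 0) return r;
    if ((r = cmp_ul(a->minor, b->minor)) != 0) return r;
    if ((r = cmp_ul(a->release, b->release)) != 0) return r;
    if ((r = cmp_ul(a->post, b->post)) != 0) return r;
    /* pre == 0 means a final release, which sorts after every -rcN */
    if (a->pre == 0 || b->pre == 0)
        return (a->pre == 0) - (b->pre == 0);
    return cmp_ul(a->pre, b->pre);
}

enum { FEATURE_PRIVATE_RESV, FEATURE_COUNT };   /* hypothetical names */

static const char *const feature_min_version[FEATURE_COUNT] = {
    "2.6.27-rc1",                          /* private reservations */
};

/* Build the feature bitmask for a kernel release string. */
unsigned long probe_features(const char *running)
{
    struct kernel_version run, min;
    unsigned long mask = 0;

    if (parse_kernel_version(running, &run) != 0)
        return 0;
    for (int i = 0; i < FEATURE_COUNT; i++)
        if (parse_kernel_version(feature_min_version[i], &min) == 0 &&
            kernel_version_cmp(&run, &min) >= 0)
            mask |= 1UL << i;
    return mask;
}

int feature_present(unsigned long mask, int feature)
{
    return (int)((mask >> feature) & 1UL);
}
```

At initialisation, the string passed to `probe_features` would be the `release` field filled in by `uname()`. Taking it as a parameter keeps the comparison logic testable without depending on the host kernel.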
2008-04-15 | Skip elflink calls in setup_libhugetlbfs on IA64/sparc64 | Eric B Munson

Building on IA64 and sparc64 currently fails because elflink is not supported. This patch sets up a NO_ELFLINK define in the appropriate sections of the Makefile and a check in setup_libhugetlbfs that will skip the elflink calls on IA64 and sparc64.

Signed-off-by: Eric Munson <ebmunson@us.ibm.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
2008-02-28 | libhugetlbfs: consolidate to one constructor | Nishanth Aravamudan

Use one constructor to control the constructor order for libhugetlbfs. Currently, the constructors are run in the order their containing object files are linked in to libhugetlbfs.so. This is fragile as new features are added. Instead, have one constructor that calls the others (which are now no longer actually constructors).

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Andrew Hastings <abh@cray.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
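
The single-constructor arrangement might look roughly like this with the GCC/Clang `constructor` attribute. The step names and the logging helpers are hypothetical and exist only to make the call order observable. They do not reflect the actual setup functions in the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STEPS 4

/* Records the order in which setup steps ran (illustration only). */
static const char *init_log[MAX_STEPS];
static size_t init_count;

static void record(const char *step)
{
    assert(init_count < MAX_STEPS);
    init_log[init_count++] = step;
}

/* Former constructors, now plain functions (hypothetical names). */
static void example_setup_env(void)      { record("env"); }
static void example_setup_features(void) { record("features"); }
static void example_setup_morecore(void) { record("morecore"); }

/*
 * The only real constructor. It runs before main() and fixes the order
 * explicitly, so the result no longer depends on object file link order.
 */
static void __attribute__((constructor)) example_setup_library(void)
{
    example_setup_env();
    example_setup_features();
    example_setup_morecore();
}

/* Accessors so callers can inspect what ran. */
size_t init_steps_run(void) { return init_count; }

const char *init_step(size_t i)
{
    return i < init_count ? init_log[i] : NULL;
}
```

A new setup step is then added by inserting one call at the right position in `example_setup_library`, so any dependency between steps stays visible in a single function.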