Optimization/kernel#2
Open
anvol wants to merge 43 commits into
Open
Conversation
Signed-off-by: flar2 <asegaert@gmail.com>
the kernel's memcpy and memmove is very inefficient. But the glibc version is quite fast, in some cases it is 10 times faster than the kernel version. So I introduce some memory copy macros and functions of the glibc to improve the kernel version's performance. The strategy of the memory functions is: 1. Copy bytes until the destination pointer is aligned. 2. Copy words in unrolled loops. If the source and destination are not aligned in the same way, use word memory operations, but shift and merge two read words before writing. 3. Copy the few remaining bytes. Signed-off-by: Miao Xie <miaox*******> Conflicts: lib/Makefile Signed-off-by: flar2 <asegaert@gmail.com>
the performance of memcpy and memmove of the general version is very inefficient, this patch improved them. Signed-off-by: Miao Xie <miaox*******> Conflicts: lib/string.c Signed-off-by: flar2 <asegaert@gmail.com>
Optimize the current version of the shift-and-subtract (hardware)
algorithm, described by John von Newmann[1] and Guy L Steele.
Iterating 1,000,000 times, perf shows for the current version:
Performance counter stats for './sqrt-curr' (10 runs):
27.170996 task-clock # 0.979 CPUs utilized ( +- 3.19% )
3 context-switches # 0.103 K/sec ( +- 4.76% )
0 cpu-migrations # 0.004 K/sec ( +-100.00% )
104 page-faults # 0.004 M/sec ( +- 0.16% )
64,921,199 cycles # 2.389 GHz ( +- 0.03% )
28,967,789 stalled-cycles-frontend # 44.62% frontend cycles idle ( +- 0.18% )
<not supported> stalled-cycles-backend
104,502,623 instructions # 1.61 insns per cycle
# 0.28 stalled cycles per insn ( +- 0.00% )
34,088,368 branches # 1254.587 M/sec ( +- 0.00% )
4,901 branch-misses # 0.01% of all branches ( +- 1.32% )
0.027763015 seconds time elapsed ( +- 3.22% )
And for the new version:
Performance counter stats for './sqrt-new' (10 runs):
0.496869 task-clock # 0.519 CPUs utilized ( +- 2.38% )
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.403 K/sec ( +-100.00% )
104 page-faults # 0.209 M/sec ( +- 0.15% )
590,760 cycles # 1.189 GHz ( +- 2.35% )
395,053 stalled-cycles-frontend # 66.87% frontend cycles idle ( +- 3.67% )
<not supported> stalled-cycles-backend
398,963 instructions # 0.68 insns per cycle
# 0.99 stalled cycles per insn ( +- 0.39% )
70,228 branches # 141.341 M/sec ( +- 0.36% )
3,364 branch-misses # 4.79% of all branches ( +- 5.45% )
0.000957440 seconds time elapsed ( +- 2.42% )
Furthermore, this saves space in instruction text:
text data bss dec hex filename
111 0 0 111 6f lib/int_sqrt-baseline.o
89 0 0 89 59 lib/int_sqrt.o
[1] http://en.wikipedia.org/wiki/First_Draft_of_a_Report_on_the_EDVAC
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Jonathan Gonzalez <jgonzlez@linets.cl>
Tested-by: Jonathan Gonzalez <jgonzlez@linets.cl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: flar2 <asegaert@gmail.com>
Android goes through suspend/resume very often (every few seconds when on a busy wifi network with the screen off), and a significant portion of the energy used to go in and out of suspend is spent in the freezer. If a task has called freezer_do_not_count(), don't bother waking it up. If it happens to wake up later it will call freezer_count() and immediately enter the refrigerator. Combined with patches to convert freezable helpers to use freezer_do_not_count() and convert common sites where idle userspace tasks are blocked to use the freezable helpers, this reduces the time and energy required to suspend and resume. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: flar2 <asegaert@gmail.com>
Freezing tasks will wake up almost every userspace task from where it is blocking and force it to run until it hits a call to try_to_sleep(), generally on the exit path from the syscall it is blocking in. On resume each task will run again, usually restarting the syscall and running until it hits the same blocking call as it was originally blocked in. Convert the existing wait_event_freezable* wrappers to use freezer_do_not_count(). Combined with a previous patch, these tasks will not run during suspend or resume unless they wake up for another reason, in which case they will run until they hit the try_to_freeze() in freezer_count(), and then continue processing the wakeup after tasks are thawed. This results in a small change in behavior, previously a race between freezing and a normal wakeup would be won by the wakeup, now the task will freeze and then handle the wakeup after thawing. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: flar2 <asegaert@gmail.com>
Date Wed, 1 May 2013 18:35:02 -0700 Avoid waking up every thread sleeping in a binder call during suspend and resume by calling a freezable blocking call. Previous patches modified the freezer to avoid sending wakeups to threads that are blocked in freezable blocking calls. This call was selected to be converted to a freezable call because it doesn't hold any locks or release any resources when interrupted that might be needed by another freezing task or a kernel driver during suspend, and is a common site where idle userspace tasks are blocked. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: flar2 <asegaert@gmail.com>
Date Wed, 1 May 2013 18:35:03 -0700 Avoid waking up every thread sleeping in an epoll_wait call during suspend and resume by calling a freezable blocking call. Previous patches modified the freezer to avoid sending wakeups to threads that are blocked in freezable blocking calls. This call was selected to be converted to a freezable call because it doesn't hold any locks or release any resources when interrupted that might be needed by another freezing task or a kernel driver during suspend, and is a common site where idle userspace tasks are blocked. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: flar2 <asegaert@gmail.com>
Date Wed, 1 May 2013 18:35:04 -0700 Avoid waking up every thread sleeping in a select call during suspend and resume by calling a freezable blocking call. Previous patches modified the freezer to avoid sending wakeups to threads that are blocked in freezable blocking calls. This call was selected to be converted to a freezable call because it doesn't hold any locks or release any resources when interrupted that might be needed by another freezing task or a kernel driver during suspend, and is a common site where idle userspace tasks are blocked. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: flar2 <asegaert@gmail.com>
Date Wed, 1 May 2013 18:35:05 -0700 Avoid waking up every thread sleeping in a futex_wait call during suspend and resume by calling a freezable blocking call. Previous patches modified the freezer to avoid sending wakeups to threads that are blocked in freezable blocking calls. This call was selected to be converted to a freezable call because it doesn't hold any locks or release any resources when interrupted that might be needed by another freezing task or a kernel driver during suspend, and is a common site where idle userspace tasks are blocked. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: flar2 <asegaert@gmail.com>
Date Wed, 1 May 2013 18:35:06 -0700 Avoid waking up every thread sleeping in a nanosleep call during suspend and resume by calling a freezable blocking call. Previous patches modified the freezer to avoid sending wakeups to threads that are blocked in freezable blocking calls. This call was selected to be converted to a freezable call because it doesn't hold any locks or release any resources when interrupted that might be needed by another freezing task or a kernel driver during suspend, and is a common site where idle userspace tasks are blocked. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: flar2 <asegaert@gmail.com>
Date Wed, 1 May 2013 18:35:07 -0700 Avoid waking up every thread sleeping in a sigtimedwait call during suspend and resume by calling a freezable blocking call. Previous patches modified the freezer to avoid sending wakeups to threads that are blocked in freezable blocking calls. This call was selected to be converted to a freezable call because it doesn't hold any locks or release any resources when interrupted that might be needed by another freezing task or a kernel driver during suspend, and is a common site where idle userspace tasks are blocked. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: flar2 <asegaert@gmail.com>
Date Wed, 1 May 2013 18:35:08 -0700 Avoid waking up every thread sleeping in read call on an AF_UNIX socket during suspend and resume by calling a freezable blocking call. Previous patches modified the freezer to avoid sending wakeups to threads that are blocked in freezable blocking calls. This call was selected to be converted to a freezable call because it doesn't hold any locks or release any resources when interrupted that might be needed by another freezing task or a kernel driver during suspend, and is a common site where idle userspace tasks are blocked. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: flar2 <asegaert@gmail.com>
If we take the 2nd retry path in ext4_expand_extra_isize_ea, we potentionally return from the function without having freed these allocations. If we don't do the return, we over-write the previous allocation pointers, so we leak either way. Spotted with Coverity. [ Fixed by tytso to set is and bs to NULL after freeing these pointers, in case in the retry loop we later end up triggering an error causing a jump to cleanup, at which point we could have a double free bug. -- Ted ] Change-Id: I49b8ca41a6c6d44b563eb23306870258a3affd3b Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Cc: stable@vger.kernel.org Git-commit: 6e4ea8e33b2057b85d75175dd89b93f5e26de3bc Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Signed-off-by: Osvaldo Banuelos <osvaldob@codeaurora.org> Signed-off-by: flar2 <asegaert@gmail.com>
One testbox of mine (Intel Nehalem, 16-way) uses MWAIT for its idle routine, which apparently can break out of its idle loop rather frequently, with high frequency. In that case NO_HZ_FULL=y kernels show high ksoftirqd overhead and constant context switching, because tick_nohz_stop_sched_tick() will, if delta_jiffies == 0, mis-identify this as a timer event - activating the TIMER_SOFTIRQ, which wakes up ksoftirqd. Fix this by treating delta_jiffies == 0 the same way we treat other short wakeups, delta_jiffies == 1.
Signed-off-by: Lens-F <tnoah12@gmail.com> Signed-off-by: flar2 <asegaert@gmail.com>
Signed-off-by: flar2 <asegaert@gmail.com>
Signed-off-by: flar2 <asegaert@gmail.com>
Signed-off-by: flar2 <asegaert@gmail.com>
Signed-off-by: flar2 <asegaert@gmail.com>
Signed-off-by: flar2 <asegaert@gmail.com>
Dave Jones trinity syscall fuzzer exposed an issue in the deadlock detection code of rtmutex: http://lkml.kernel.org/r/20140429151655.GA14277@...hat.com That underlying issue has been fixed with a patch to the rtmutex code, but the futex code must not call into rtmutex in that case because - it can detect that issue early - it avoids a different and more complex fixup for backing out If the user space variable got manipulated to 0x80000000 which means no lock holder, but the waiters bit set and an active pi_state in the kernel is found we can figure out the recursive locking issue by looking at the pi_state owner. If that is the current task, then we can safely return -EDEADLK. The check should have been added in commit 59fa62451 (futex: Handle futex_pi OWNER_DIED take over correctly) already, but I did not see the above issue caused by user space manipulation back then. Signed-off-by: Thomas Gleixner <tglx@...utronix.de> Cc: Dave Jones <davej@...hat.com> Cc: Linus Torvalds <torvalds@...ux-foundation.org> Cc: Peter Zijlstra <peterz@...radead.org> Cc: Darren Hart <darren@...art.com> Cc: Davidlohr Bueso <davidlohr@...com> Cc: Steven Rostedt <rostedt@...dmis.org> Cc: Clark Williams <williams@...hat.com> Cc: Paul McKenney <paulmck@...ux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@...fujitsu.com> Cc: Roland McGrath <roland@...k.frob.com> Cc: Carlos ODonell <carlos@...hat.com> Cc: Jakub Jelinek <jakub@...hat.com> Cc: Michael Kerrisk <mtk.manpages@...il.com> Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de> Link: http://lkml.kernel.org/r/20140512201701.097349971@...utronix.de Signed-off-by: Thomas Gleixner <tglx@...utronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org> Change-Id: I245f77622443435cf36bbd157b98798c235acdb0 Signed-off-by: flar2 <asegaert@gmail.com>
…addr2 in futex_requeue(..., requeue_pi=1) If uaddr == uaddr2, then we have broken the rule of only requeueing from a non-pi futex to a pi futex with this call. If we attempt this, then dangling pointers may be left for rt_waiter resulting in an exploitable condition. This change brings futex_requeue() into line with futex_wait_requeue_pi() which performs the same check as per commit 6f7b0a2a5 (futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()) [ tglx: Compare the resulting keys as well, as uaddrs might be different depending on the mapping ] Fixes CVE-2014-3153. Reported-by: Pinkie Pie Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Change-Id: Iea1dfe1d877d63a374db2623dca6cea4e7a544b9 Signed-off-by: flar2 <asegaert@gmail.com>
We need to protect the atomic acquisition in the kernel against rogue user space which sets the user space futex to 0, so the kernel side acquisition succeeds while there is existing state in the kernel associated to the real owner. Verify whether the futex has waiters associated with kernel state. If it has, return -EINVAL. The state is corrupted already, so no point in cleaning it up. Subsequent calls will fail as well. Not our problem. [ tglx: Use futex_top_waiter() and explain why we do not need to try restoring the already corrupted user space state. ] Signed-off-by: Darren Hart <dvhart@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Change-Id: I6c19ae4a5e85fbc9c560723e672d5136c4786ab6 Signed-off-by: flar2 <asegaert@gmail.com>
If the owner died bit is set at futex_unlock_pi, we currently do not cleanup the user space futex. So the owner TID of the current owner (the unlocker) persists. That's observable inconsistant state, especially when the ownership of the pi state got transferred. Clean it up unconditionally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Change-Id: Ic84473a62bdf3e5702f46e04b266afa38e9ec40e Signed-off-by: flar2 <asegaert@gmail.com>
The current implementation of lookup_pi_state has ambigous handling of
the TID value 0 in the user space futex. We can get into the kernel
even if the TID value is 0, because either there is a stale waiters
bit or the owner died bit is set or we are called from the requeue_pi
path or from user space just for fun.
The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) are found for the user space
address. This can lead to state leakage and worse under some
circumstances.
Handle the cases explicit:
Waiter | pi_state | pi->owner | uTID | uODIED | ?
[1] NULL | --- | --- | 0 | 0/1 | Valid
[2] NULL | --- | --- | >0 | 0/1 | Valid
[3] Found | NULL | -- | Any | 0/1 | Invalid
[4] Found | Found | NULL | 0 | 1 | Valid
[5] Found | Found | NULL | >0 | 1 | Invalid
[6] Found | Found | task | 0 | 1 | Valid
[7] Found | Found | NULL | Any | 0 | Invalid
[8] Found | Found | task | ==taskTID | 0/1 | Valid
[9] Found | Found | task | 0 | 0 | Invalid
[10] Found | Found | task | !=taskTID | 0/1 | Invalid
[1] Indicates that the kernel can acquire the futex atomically. We
came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.
[2] Valid, if TID does not belong to a kernel thread. If no matching
thread is found then it indicates that the owner TID has died.
[3] Invalid. The waiter is queued on a non PI futex
[4] Valid state after exit_robust_list(), which sets the user space
value to FUTEX_WAITERS | FUTEX_OWNER_DIED.
[5] The user space value got manipulated between exit_robust_list()
and exit_pi_state_list()
[6] Valid state after exit_pi_state_list() which sets the new owner in
the pi_state but cannot access the user space value.
[7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.
[8] Owner and user space value match
[9] There is no transient state which sets the user space TID to 0
except exit_robust_list(), but this is indicated by the
FUTEX_OWNER_DIED bit. See [4]
[10] There is no transient state which leaves owner and user space
TID out of sync.
Backport to 3.13
conflicts: kernel/futex.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Change-Id: I967e3bb07b239941ea51af6b09d61d7b08138f40
Signed-off-by: flar2 <asegaert@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Various kernel patches & hacks from mainstream