6 years agoLinux 3.14.17 v3.14.17
Greg Kroah-Hartman [Thu, 14 Aug 2014 01:38:34 +0000 (09:38 +0800)]
Linux 3.14.17

6 years agoxfs: log vector rounding leaks log space
Dave Chinner [Mon, 19 May 2014 22:18:09 +0000 (08:18 +1000)]
xfs: log vector rounding leaks log space

commit 110dc24ad2ae4e9b94b08632fe1eb2fcdff83045 upstream.

The addition of direct formatting of log items into the CIL
linear buffer added alignment restrictions that the start of each
vector needed to be 64 bit aligned. Hence padding was added in
xlog_finish_iovec() to round up the vector length to ensure the next
vector started with the correct alignment.

This adds a small number of bytes to the size of
the linear buffer that is otherwise unused. The issue is that we
then use the linear buffer size to determine the log space used by
the log item, and this includes the unused space. Hence when we
account for space used by the log item, it's more than is actually
written into the iclogs, and hence we slowly leak this space.

This results on log hangs when reserving space, with threads getting
stuck with these stack traces:

Call Trace:
[<ffffffff81d15989>] schedule+0x29/0x70
[<ffffffff8150d3a2>] xlog_grant_head_wait+0xa2/0x1a0
[<ffffffff8150d55d>] xlog_grant_head_check+0xbd/0x140
[<ffffffff8150ee33>] xfs_log_reserve+0x103/0x220
[<ffffffff814b7f05>] xfs_trans_reserve+0x2f5/0x310

The 4 bytes is significant. Brain Foster did all the hard work in
tracking down a reproducable leak to inode chunk allocation (it went
away with the ikeep mount option). His rough numbers were that
creating 50,000 inodes leaked 11 log blocks. This turns out to be
roughly 800 inode chunks or 1600 inode cluster buffers. That
works out at roughly 4 bytes per cluster buffer logged, and at that
I started looking for a 4 byte leak in the buffer logging code.

What I found was that a struct xfs_buf_log_format structure for an
inode cluster buffer is 28 bytes in length. This gets rounded up to
32 bytes, but the vector length remains 28 bytes. Hence the CIL
ticket reservation is decremented by 32 bytes (via lv->lv_buf_len)
for that vector rather than 28 bytes which are written into the log.

The fix for this problem is to separately track the bytes used by
the log vectors in the item and use that instead of the buffer
length when accounting for the log space that will be used by the
formatted log item.

Again, thanks to Brian Foster for doing all the hard work and long
hours to isolate this leak and make finding the bug relatively

Signed-off-by: Dave Chinner <>
Reviewed-by: Christoph Hellwig <>
Reviewed-by: Brian Foster <>
Signed-off-by: Dave Chinner <>
Cc: Bill <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoarch/sparc/math-emu/math_32.c: drop stray break operator
Andrey Utkin [Mon, 4 Aug 2014 20:47:41 +0000 (23:47 +0300)]
arch/sparc/math-emu/math_32.c: drop stray break operator

[ Upstream commit 093758e3daede29cb4ce6aedb111becf9d4bfc57 ]

This commit is a guesswork, but it seems to make sense to drop this
break, as otherwise the following line is never executed and becomes
dead code. And that following line actually saves the result of
local calculation by the pointer given in function argument. So the
proposed change makes sense if this code in the whole makes sense (but I
am unable to analyze it in the whole).

Reported-by: David Binderman <>
Signed-off-by: Andrey Utkin <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: ldc_connect() should not return EINVAL when handshake is in progress.
Sowmini Varadhan [Fri, 1 Aug 2014 13:50:40 +0000 (09:50 -0400)]
sparc64: ldc_connect() should not return EINVAL when handshake is in progress.

[ Upstream commit 4ec1b01029b4facb651b8ef70bc20a4be4cebc63 ]

The LDC handshake could have been asynchronously triggered
after ldc_bind() enables the ldc_rx() receive interrupt-handler
(and thus intercepts incoming control packets)
and before vio_port_up() calls ldc_connect(). If that is the case,
ldc_connect() should return 0 and let the state-machine

Signed-off-by: Sowmini Varadhan <>
Acked-by: Karl Volz <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosunsab: Fix detection of BREAK on sunsab serial console
Christopher Alexander Tobias Schulze [Sun, 3 Aug 2014 14:01:53 +0000 (16:01 +0200)]
sunsab: Fix detection of BREAK on sunsab serial console

[ Upstream commit fe418231b195c205701c0cc550a03f6c9758fd9e ]

Fix detection of BREAK on sunsab serial console: BREAK detection was only
performed when there were also serial characters received simultaneously.
To handle all BREAKs correctly, the check for BREAK and the corresponding
call to uart_handle_break() must also be done if count == 0, therefore
duplicate this code fragment and pull it out of the loop over the received

Patch applies to 3.16-rc6.

Signed-off-by: Christopher Alexander Tobias Schulze <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agobbc-i2c: Fix BBC I2C envctrl on SunBlade 2000
Christopher Alexander Tobias Schulze [Sun, 3 Aug 2014 13:44:52 +0000 (15:44 +0200)]
bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000

[ Upstream commit 5cdceab3d5e02eb69ea0f5d8fa9181800baf6f77 ]

Fix regression in bbc i2c temperature and fan control on some Sun systems
that causes the driver to refuse to load due to the bbc_i2c_bussel resource not
being present on the (second) i2c bus where the temperature sensors and fan
control are located. (The check for the number of resources was removed when
the driver was ported to a pure OF driver in mid 2008.)

Signed-off-by: Christopher Alexander Tobias Schulze <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Guard against flushing openfirmware mappings.
David S. Miller [Tue, 5 Aug 2014 03:07:37 +0000 (20:07 -0700)]
sparc64: Guard against flushing openfirmware mappings.

[ Upstream commit 4ca9a23765da3260058db3431faf5b4efd8cf926 ]

Based almost entirely upon a patch by Christopher Alexander Tobias

In commit db64fe02258f1507e13fe5212a989922323685ce ("mm: rewrite vmap
layer") lazy VMAP tlb flushing was added to the vmalloc layer.  This
causes problems on sparc64.

Sparc64 has two VMAP mapped regions and they are not contiguous with
eachother.  First we have the malloc mapping area, then another
unrelated region, then the vmalloc region.

This "another unrelated region" is where the firmware is mapped.

If the lazy TLB flushing logic in the vmalloc code triggers after
we've had both a module unload and a vfree or similar, it will pass an
address range that goes from somewhere inside the malloc region to
somewhere inside the vmalloc region, and thus covering the
openfirmware area entirely.

The sparc64 kernel learns about openfirmware's dynamic mappings in
this region early in the boot, and then services TLB misses in this
area.  But openfirmware has some locked TLB entries which are not
mentioned in those dynamic mappings and we should thus not disturb

These huge lazy TLB flush ranges causes those openfirmware locked TLB
entries to be removed, resulting in all kinds of problems including
hard hangs and crashes during reboot/reset.

Besides causing problems like this, such huge TLB flush ranges are
also incredibly inefficient.  A plea has been made with the author of
the VMAP lazy TLB flushing code, but for now we'll put a safety guard
into our flush_tlb_kernel_range() implementation.

Since the implementation has become non-trivial, stop defining it as a
macro and instead make it a function in a C source file.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Do not insert non-valid PTEs into the TSB hash table.
David S. Miller [Mon, 4 Aug 2014 23:34:01 +0000 (16:34 -0700)]
sparc64: Do not insert non-valid PTEs into the TSB hash table.

[ Upstream commit 18f38132528c3e603c66ea464727b29e9bbcb91b ]

The assumption was that update_mmu_cache() (and the equivalent for PMDs) would
only be called when the PTE being installed will be accessible by the user.

This is not true for code paths originating from remove_migration_pte().

There are dire consequences for placing a non-valid PTE into the TSB.  The TLB
miss frramework assumes thatwhen a TSB entry matches we can just load it into
the TLB and return from the TLB miss trap.

So if a non-valid PTE is in there, we will deadlock taking the TLB miss over
and over, never satisfying the miss.

Just exit early from update_mmu_cache() and friends in this situation.

Based upon a report and patch from Christopher Alexander Tobias Schulze.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Add membar to Niagara2 memcpy code.
David S. Miller [Sat, 17 May 2014 18:28:05 +0000 (11:28 -0700)]
sparc64: Add membar to Niagara2 memcpy code.

[ Upstream commit 5aa4ecfd0ddb1e6dcd1c886e6c49677550f581aa ]

This is the prevent previous stores from overlapping the block stores
done by the memcpy loop.

Based upon a glibc patch by Jose E. Marchesi

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.
David S. Miller [Wed, 7 May 2014 21:07:32 +0000 (14:07 -0700)]
sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.

[ Upstream commit b18eb2d779240631a098626cb6841ee2dd34fda0 ]

Access to the TSB hash tables during TLB misses requires that there be
an atomic 128-bit quad load available so that we fetch a matching TAG
and DATA field at the same time.

On cpus prior to UltraSPARC-III only virtual address based quad loads
are available.  UltraSPARC-III and later provide physical address
based variants which are easier to use.

When we only have virtual address based quad loads available this
means that we have to lock the TSB into the TLB at a fixed virtual
address on each cpu when it runs that process.  We can't just access
the PAGE_OFFSET based aliased mapping of these TSBs because we cannot
take a recursive TLB miss inside of the TLB miss handler without
risking running out of hardware trap levels (some trap combinations
can be deep, such as those generated by register window spill and fill

Without huge pages it's working perfectly fine, but when the huge TSB
got added another chunk of fixed virtual address space was not
allocated for this second TSB mapping.

So we were mapping both the 8K and 4MB TSBs to the same exact virtual
address, causing multiple TLB matches which gives undefined behavior.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses.
David S. Miller [Wed, 7 May 2014 04:27:37 +0000 (21:27 -0700)]
sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses.

[ Upstream commit e5c460f46ae7ee94831cb55cb980f942aa9e5a85 ]

This was found using Dave Jone's trinity tool.

When a user process which is 32-bit performs a load or a store, the
cpu chops off the top 32-bits of the effective address before
translating it.

This is because we run 32-bit tasks with the PSTATE_AM (address
masking) bit set.

We can't run the kernel with that bit set, so when the kernel accesses
userspace no address masking occurs.

Since a 32-bit process will have no mappings in that region we will
properly fault, so we don't try to handle this using access_ok(),
which can safely just be a NOP on sparc64.

Real faults from 32-bit processes should never generate such addresses
so a bug check was added long ago, and it barks in the logs if this

But it also barks when a kernel user access causes this condition, and
that _can_ happen.  For example, if a pointer passed into a system call
is "0xfffffffc" and the kernel access 4 bytes offset from that pointer.

Just handle such faults normally via the exception entries.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill pte_ERROR().
David S. Miller [Tue, 29 Apr 2014 20:28:23 +0000 (13:28 -0700)]
sparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill pte_ERROR().

[ Upstream commit fe866433f843b080246ce729b5e6b27b5f5d9a58 ]

pte_ERROR() is not used anywhere, delete it.

For pgd_ERROR() and pmd_ERROR(), output something similar to x86, giving the address
of the pgd/pmd as well as it's value.

Also provide the caller, since these macros are invoked from pgd_clear_bad() and
pmd_clear_bad() which provides little context as to what high level operation was
occuring when the BAD state was detected.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Add basic validations to {pud,pmd}_bad().
David S. Miller [Tue, 29 Apr 2014 20:03:27 +0000 (13:03 -0700)]
sparc64: Add basic validations to {pud,pmd}_bad().

[ Upstream commit 26cf432551d749e7d581db33529507a711c6eaab ]

Instead of returning false we should at least check the most basic
things, otherwise page table corruptions will be very difficult to

PMD and PTE tables are of size PAGE_SIZE, so none of the sub-PAGE_SIZE
bits should be set.

We also complement this with a check that the physical address the
pud/pmd points to is valid memory.

PowerPC was used as a guide while implementating this.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Use 'ILOG2_4MB' instead of constant '22'.
David S. Miller [Sun, 4 May 2014 05:52:50 +0000 (22:52 -0700)]
sparc64: Use 'ILOG2_4MB' instead of constant '22'.

[ Upstream commit 0eef331a3d0ee970dcbebd1bd5fcb57ca33ece01 ]

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Fix range check in kern_addr_valid().
David S. Miller [Tue, 29 Apr 2014 19:58:03 +0000 (12:58 -0700)]
sparc64: Fix range check in kern_addr_valid().

[ Upstream commit ee73887e92a69ae0a5cda21c68ea75a27804c944 ]

In commit b2d438348024b75a1ee8b66b85d77f569a5dfed8 ("sparc64: Make
PAGE_OFFSET variable."), the MAX_PHYS_ADDRESS_BITS value was increased
(to 47).

This constant reference to '41UL' was missed.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Fix top-level fault handling bugs.
David S. Miller [Tue, 29 Apr 2014 06:52:11 +0000 (23:52 -0700)]
sparc64: Fix top-level fault handling bugs.

[ Upstream commit 70ffc6ebaead783ac8dafb1e87df0039bb043596 ]

Make get_user_insn() able to cope with huge PMDs.

Next, make do_fault_siginfo() more robust when get_user_insn() can't
actually fetch the instruction.  In particular, use the MMU announced
fault address when that happens, instead of calling
compute_effective_address() and computing garbage.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Handle 32-bit tasks properly in compute_effective_address().
David S. Miller [Tue, 29 Apr 2014 06:50:08 +0000 (23:50 -0700)]
sparc64: Handle 32-bit tasks properly in compute_effective_address().

[ Upstream commit d037d16372bbe4d580342bebbb8826821ad9edf0 ]

If we have a 32-bit task we must chop off the top 32-bits of the
64-bit value just as the cpu would.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Don't use _PAGE_PRESENT in pte_modify() mask.
David S. Miller [Tue, 29 Apr 2014 02:11:27 +0000 (19:11 -0700)]
sparc64: Don't use _PAGE_PRESENT in pte_modify() mask.

[ Upstream commit eaf85da82669b057f20c4e438dc2566b51a83af6 ]

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Fix hex values in comment above pte_modify().
David S. Miller [Mon, 28 Apr 2014 04:01:56 +0000 (21:01 -0700)]
sparc64: Fix hex values in comment above pte_modify().

[ Upstream commit c2e4e676adb40ea764af79d3e08be954e14a0f4c ]

When _PAGE_SPECIAL and _PAGE_PMD_HUGE were added to the mask, the
comment was not updated.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Fix bugs in get_user_pages_fast() wrt. THP.
David S. Miller [Fri, 25 Apr 2014 17:21:12 +0000 (10:21 -0700)]
sparc64: Fix bugs in get_user_pages_fast() wrt. THP.

[ Upstream commit 04df419de34104d8818b8c5cffaa062fa36d20ea ]

The large PMD path needs to check _PAGE_VALID not _PAGE_PRESENT, to
decide if it needs to bail and return 0.

pmd_large() should therefore just check _PAGE_PMD_HUGE.

Calls to gup_huge_pmd() are guarded with a check of pmd_large(), so we
just need to add a valid bit check.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Fix huge PMD invalidation.
David S. Miller [Thu, 24 Apr 2014 20:58:02 +0000 (13:58 -0700)]
sparc64: Fix huge PMD invalidation.

[ Upstream commit 51e5ef1bb7ab0e5fa7de4e802da5ab22fe35f0bf ]

On sparc64 "present" and "valid" are seperate PTE bits, this allows us to
naturally distinguish between the user explicitly asking for PROT_NONE
with mprotect() and other situations.

However we weren't handling this properly in the huge PMD paths.

First of all, the page table walker in the TSB miss path only checks
for _PAGE_PMD_HUGE.  So the generic pmdp_invalidate() would clear
_PAGE_PRESENT but the TLB miss paths would still load it into the TLB
as a valid huge PMD.

Fix this by clearing the valid bit in pmdp_invalidate(), and also
checking the valid bit in USER_PGTABLE_CHECK_PMD_HUGE using "brgez"
since _PAGE_VALID is bit 63 in both the sun4u and sun4v pte layouts.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Fix executable bit testing in set_pmd_at() paths.
David S. Miller [Mon, 21 Apr 2014 01:55:01 +0000 (21:55 -0400)]
sparc64: Fix executable bit testing in set_pmd_at() paths.

[ Upstream commit 5b1e94fa439a3227beefad58c28c17f68287a8e9 ]

This code was mistakenly using the exec bit from the PMD in all
cases, even when the PMD isn't a huge PMD.

If it's not a huge PMD, test the exec bit in the individual ptes down
in tlb_batch_pmd_scan().

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Make itc_sync_lock raw
Kirill Tkhai [Wed, 16 Apr 2014 20:45:24 +0000 (00:45 +0400)]
sparc64: Make itc_sync_lock raw

[ Upstream commit 49b6c01f4c1de3b5e5427ac5aba80f9f6d27837a ]

One more place where we must not be able
to be preempted or to be interrupted in RT.

Always actually disable interrupts during
synchronization cycle.

Signed-off-by: Kirill Tkhai <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosparc64: Fix argument sign extension for compat_sys_futex().
David S. Miller [Thu, 1 May 2014 02:37:48 +0000 (19:37 -0700)]
sparc64: Fix argument sign extension for compat_sys_futex().

[ Upstream commit aa3449ee9c87d9b7660dd1493248abcc57769e31 ]

Only the second argument, 'op', is signed.

Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosctp: fix possible seqlock seadlock in sctp_packet_transmit()
Eric Dumazet [Tue, 5 Aug 2014 14:49:52 +0000 (16:49 +0200)]
sctp: fix possible seqlock seadlock in sctp_packet_transmit()

[ Upstream commit 757efd32d5ce31f67193cc0e6a56e4dffcc42fb1 ]

Dave reported following splat, caused by improper use of
IP_INC_STATS_BH() in process context.

BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551
caller is __this_cpu_preempt_check+0x13/0x20
CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33
 ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207
 0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40
 ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3
Call Trace:
 [<ffffffff9e7ee207>] dump_stack+0x4e/0x7a
 [<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100
 [<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20
 [<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp]
 [<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp]
 [<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0
 [<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
 [<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp]
 [<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp]
 [<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp]
 [<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100
 [<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp]
 [<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp]
 [<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp]
 [<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160
 [<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30
 [<ffffffff9e73b624>] inet_sendmsg+0x104/0x220
 [<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220
 [<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0
 [<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0
 [<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0
 [<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0
 [<ffffffff9e0136b0>] ? syscall_trace_enter+0x250/0x330
 [<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10
 [<ffffffff9e7f9be4>] tracesys+0xdd/0xe2

This is a followup of commits f1d8cba61c3c4b ("inet: fix possible
seqlock deadlocks") and 7f88c6b23afbd315 ("ipv6: fix possible seqlock
deadlock in ip6_finish_output2")

Signed-off-by: Eric Dumazet <>
Cc: Hannes Frederic Sowa <>
Reported-by: Dave Jones <>
Acked-by: Neil Horman <>
Acked-by: Hannes Frederic Sowa <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agobatman-adv: Fix out-of-order fragmentation support
Sven Eckelmann [Mon, 26 May 2014 15:21:39 +0000 (17:21 +0200)]
batman-adv: Fix out-of-order fragmentation support

[ Upstream commit d9124268d84a836f14a6ead54ff9d8eee4c43be5 ]

batadv_frag_insert_packet was unable to handle out-of-order packets because it
dropped them directly. This is caused by the way the fragmentation lists is
checked for the correct place to insert a fragmentation entry.

The fragmentation code keeps the fragments in lists. The fragmentation entries
are kept in descending order of sequence number. The list is traversed and each
entry is compared with the new fragment. If the current entry has a smaller
sequence number than the new fragment then the new one has to be inserted
before the current entry. This ensures that the list is still in descending

An out-of-order packet with a smaller sequence number than all entries in the
list still has to be added to the end of the list. The used hlist has no
information about the last entry in the list inside hlist_head and thus the
last entry has to be calculated differently. Currently the code assumes that
the iterator variable of hlist_for_each_entry can be used for this purpose
after the hlist_for_each_entry finished. This is obviously wrong because the
iterator variable is always NULL when the list was completely traversed.

Instead the information about the last entry has to be stored in a different

This problem was introduced in 610bfc6bc99bc83680d190ebc69359a05fc7f605
("batman-adv: Receive fragmented packets and merge").

Signed-off-by: Sven Eckelmann <>
Signed-off-by: Marek Lindner <>
Signed-off-by: Antonio Quartulli <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoiovec: make sure the caller actually wants anything in memcpy_fromiovecend
Sasha Levin [Fri, 1 Aug 2014 03:00:35 +0000 (23:00 -0400)]
iovec: make sure the caller actually wants anything in memcpy_fromiovecend

[ Upstream commit 06ebb06d49486676272a3c030bfeef4bd969a8e6 ]

Check for cases when the caller requests 0 bytes instead of running off
and dereferencing potentially invalid iovecs.

Signed-off-by: Sasha Levin <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agonet: Correctly set segment mac_len in skb_segment().
Vlad Yasevich [Thu, 31 Jul 2014 14:33:06 +0000 (10:33 -0400)]
net: Correctly set segment mac_len in skb_segment().

[ Upstream commit fcdfe3a7fa4cb74391d42b6a26dc07c20dab1d82 ]

When performing segmentation, the mac_len value is copied right
out of the original skb.  However, this value is not always set correctly
(like when the packet is VLAN-tagged) and we'll end up copying a bad

One way to demonstrate this is to configure a VM which tags
packets internally and turn off VLAN acceleration on the forwarding
bridge port.  The packets show up corrupt like this:
16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q
(0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0,
        0x0000:  8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...|...d@.....d.
        0x0010:  0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0..
        0x0020:  29e3 01c9 f871 0000 0101 080a 000a e833)....q.........3
        0x0030:  000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp
        0x0040:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
        0x0050:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
        0x0060:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp

This also leads to awful throughput as GSO packets are dropped and
cause retransmissions.

The solution is to set the mac_len using the values already available
in then new skb.  We've already adjusted all of the header offset, so we
might as well correctly figure out the mac_len using skb_reset_mac_len().
After this change, packets are segmented correctly and performance
is restored.

CC: Eric Dumazet <>
Signed-off-by: Vlad Yasevich <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agomacvlan: Initialize vlan_features to turn on offload support.
Vlad Yasevich [Thu, 31 Jul 2014 14:30:25 +0000 (10:30 -0400)]
macvlan: Initialize vlan_features to turn on offload support.

[ Upstream commit 081e83a78db9b0ae1f5eabc2dedecc865f509b98 ]

Macvlan devices do not initialize vlan_features.  As a result,
any vlan devices configured on top of macvlans perform very poorly.
Initialize vlan_features based on the vlan features of the lower-level

Signed-off-by: Vlad Yasevich <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agonet: sctp: inherit auth_capable on INIT collisions
Daniel Borkmann [Tue, 22 Jul 2014 13:22:45 +0000 (15:22 +0200)]
net: sctp: inherit auth_capable on INIT collisions

[ Upstream commit 1be9a950c646c9092fb3618197f7b6bfb50e82aa ]

Jason reported an oops caused by SCTP on his ARM machine with
SCTP authentication enabled:

Internal error: Oops: 17 [#1] ARM
CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1
task: c6eefa40 ti: c6f52000 task.ti: c6f52000
PC is at sctp_auth_calculate_hmac+0xc4/0x10c
LR is at sg_init_table+0x20/0x38
pc : [<c024bb80>]    lr : [<c00f32dc>]    psr: 40000013
sp : c6f538e8  ip : 00000000  fp : c6f53924
r10: c6f50d80  r9 : 00000000  r8 : 00010000
r7 : 00000000  r6 : c7be4000  r5 : 00000000  r4 : c6f56254
r3 : c00c8170  r2 : 00000001  r1 : 00000008  r0 : c6f1e660
Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005397f  Table: 06f28000  DAC: 00000015
Process sctp-test (pid: 104, stack limit = 0xc6f521c0)
Stack: (0xc6f538e8 to 0xc6f54000)
[<c024babc>] (sctp_auth_calculate_hmac+0x0/0x10c) from [<c0249af8>] (sctp_packet_transmit+0x33c/0x5c8)
[<c02497bc>] (sctp_packet_transmit+0x0/0x5c8) from [<c023e96c>] (sctp_outq_flush+0x7fc/0x844)
[<c023e170>] (sctp_outq_flush+0x0/0x844) from [<c023ef78>] (sctp_outq_uncork+0x24/0x28)
[<c023ef54>] (sctp_outq_uncork+0x0/0x28) from [<c0234364>] (sctp_side_effects+0x1134/0x1220)
[<c0233230>] (sctp_side_effects+0x0/0x1220) from [<c02330b0>] (sctp_do_sm+0xac/0xd4)
[<c0233004>] (sctp_do_sm+0x0/0xd4) from [<c023675c>] (sctp_assoc_bh_rcv+0x118/0x160)
[<c0236644>] (sctp_assoc_bh_rcv+0x0/0x160) from [<c023d5bc>] (sctp_inq_push+0x6c/0x74)
[<c023d550>] (sctp_inq_push+0x0/0x74) from [<c024a6b0>] (sctp_rcv+0x7d8/0x888)

While we already had various kind of bugs in that area
ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if
we/peer is AUTH capable") and b14878ccb7fa ("net: sctp: cache
auth_enable per endpoint"), this one is a bit of a different

Giving a bit more background on why SCTP authentication is
needed can be found in RFC4895:

  SCTP uses 32-bit verification tags to protect itself against
  blind attackers. These values are not changed during the
  lifetime of an SCTP association.

  Looking at new SCTP extensions, there is the need to have a
  method of proving that an SCTP chunk(s) was really sent by
  the original peer that started the association and not by a
  malicious attacker.

To cause this bug, we're triggering an INIT collision between
peers; normal SCTP handshake where both sides intent to
authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO
parameters that are being negotiated among peers:

  ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
  <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------

RFC4895 says that each endpoint therefore knows its own random
number and the peer's random number *after* the association
has been established. The local and peer's random number along
with the shared key are then part of the secret used for
calculating the HMAC in the AUTH chunk.

Now, in our scenario, we have 2 threads with 1 non-blocking
SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY
and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling
sctp_bindx(3), listen(2) and connect(2) against each other,
thus the handshake looks similar to this, e.g.:

  ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
  <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
  <--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] -----------
  -------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] -------->

Since such collisions can also happen with verification tags,
the RFC4895 for AUTH rather vaguely says under section 6.1:

  In case of INIT collision, the rules governing the handling
  of this Random Number follow the same pattern as those for
  the Verification Tag, as explained in Section 5.2.4 of
  RFC 2960 [5]. Therefore, each endpoint knows its own Random
  Number and the peer's Random Number after the association
  has been established.

In RFC2960, section 5.2.4, we're eventually hitting Action B:

  B) In this case, both sides may be attempting to start an
     association at about the same time but the peer endpoint
     started its INIT after responding to the local endpoint's
     INIT. Thus it may have picked a new Verification Tag not
     being aware of the previous Tag it had sent this endpoint.
     The endpoint should stay in or enter the ESTABLISHED
     state but it MUST update its peer's Verification Tag from
     the State Cookie, stop any init or cookie timers that may
     running and send a COOKIE ACK.

In other words, the handling of the Random parameter is the
same as behavior for the Verification Tag as described in
Action B of section 5.2.4.

Looking at the code, we exactly hit the sctp_sf_do_dupcook_b()
case which triggers an SCTP_CMD_UPDATE_ASSOC command to the
side effect interpreter, and in fact it properly copies over
peer_{random, hmacs, chunks} parameters from the newly created
association to update the existing one.

Also, the old asoc_shared_key is being released and based on
the new params, sctp_auth_asoc_init_active_key() updated.
However, the issue observed in this case is that the previous
asoc->peer.auth_capable was 0, and has *not* been updated, so
that instead of creating a new secret, we're doing an early
return from the function sctp_auth_asoc_init_active_key()
leaving asoc->asoc_shared_key as NULL. However, we now have to
authenticate chunks from the updated chunk list (e.g. COOKIE-ACK).

That in fact causes the server side when responding with ...

  <------------------ AUTH; COOKIE-ACK -----------------

... to trigger a NULL pointer dereference, since in
sctp_packet_transmit(), it discovers that an AUTH chunk is
being queued for xmit, and thus it calls sctp_auth_calculate_hmac().

Since the asoc->active_key_id is still inherited from the
endpoint, and the same as encoded into the chunk, it uses
asoc->asoc_shared_key, which is still NULL, as an asoc_key
and dereferences it in ...

  crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len)

... causing an oops. All this happens because sctp_make_cookie_ack()
called with the *new* association has the peer.auth_capable=1
and therefore marks the chunk with auth=1 after checking
sctp_auth_send_cid(), but it is *actually* sent later on over
the then *updated* association's transport that didn't initialize
its shared key due to peer.auth_capable=0. Since control chunks
in that case are not sent by the temporary association which
are scheduled for deletion, they are issued for xmit via
SCTP_CMD_REPLY in the interpreter with the context of the
*updated* association. peer.auth_capable was 0 in the updated
association (which went from COOKIE_WAIT into ESTABLISHED state),
since all previous processing that performed sctp_process_init()
was being done on temporary associations, that we eventually
throw away each time.

The correct fix is to update to the new peer.auth_capable
value as well in the collision case via sctp_assoc_update(),
so that in case the collision migrated from 0 -> 1,
sctp_auth_asoc_init_active_key() can properly recalculate
the secret. This therefore fixes the observed server panic.

Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing")
Reported-by: Jason Gunthorpe <>
Signed-off-by: Daniel Borkmann <>
Tested-by: Jason Gunthorpe <>
Cc: Vlad Yasevich <>
Acked-by: Vlad Yasevich <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agobna: fix performance regression
Ivan Vecera [Tue, 29 Jul 2014 14:29:30 +0000 (16:29 +0200)]
bna: fix performance regression

[ Upstream commit c36c9d50cc6af5c5bfcc195f21b73f55520c15f9 ]

The recent commit "e29aa33 bna: Enable Multi Buffer RX" is causing
a performance regression. It does not properly update 'cmpl' pointer
at the end of the loop in NAPI handler bnad_cq_process(). The result is
only one packet / per NAPI-schedule is processed.

Signed-off-by: Ivan Vecera <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agotcp: Fix integer-overflow in TCP vegas
Christoph Paasch [Tue, 29 Jul 2014 11:40:57 +0000 (13:40 +0200)]
tcp: Fix integer-overflow in TCP vegas

[ Upstream commit 1f74e613ded11517db90b2bd57e9464d9e0fb161 ]

In vegas we do a multiplication of the cwnd and the rtt. This
may overflow and thus their result is stored in a u64. However, we first
need to cast the cwnd so that actually 64-bit arithmetic is done.

Then, we need to do do_div to allow this to be used on 32-bit arches.

Cc: Stephen Hemminger <>
Cc: Neal Cardwell <>
Cc: Eric Dumazet <>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Doug Leith <>
Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix)
Signed-off-by: Christoph Paasch <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agotcp: Fix integer-overflows in TCP veno
Christoph Paasch [Tue, 29 Jul 2014 10:07:27 +0000 (12:07 +0200)]
tcp: Fix integer-overflows in TCP veno

[ Upstream commit 45a07695bc64b3ab5d6d2215f9677e5b8c05a7d0 ]

In veno we do a multiplication of the cwnd and the rtt. This
may overflow and thus their result is stored in a u64. However, we first
need to cast the cwnd so that actually 64-bit arithmetic is done.

A first attempt at fixing 76f1017757aa0 ([TCP]: TCP Veno congestion
control) was made by 159131149c2 (tcp: Overflow bug in Vegas), but it
failed to add the required cast in tcp_veno_cong_avoid().

Fixes: 76f1017757aa0 ([TCP]: TCP Veno congestion control)
Signed-off-by: Christoph Paasch <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip"
Dmitry Popov [Mon, 28 Jul 2014 23:07:52 +0000 (03:07 +0400)]
ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip"

[ Upstream commit 95cb5745983c222867cc9ac593aebb2ad67d72c0 ]

Ipv4 tunnels created with "local any remote $ip" didn't work properly since
7d442fab0 (ipv4: Cache dst in tunnels). 99% of packets sent via those tunnels
had src addr = That was because only dst_entry was cached, although
fl4.saddr has to be cached too. Every time ip_tunnel_xmit used cached dst_entry
(tunnel_rtable_get returned non-NULL), fl4.saddr was initialized with
tnl_params->saddr (= 0 in our case), and wasn't changed until iptunnel_xmit().

This patch adds saddr to ip_tunnel->dst_cache, fixing this issue.

Reported-by: Sergey Popov <>
Signed-off-by: Dmitry Popov <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agonet: phy: re-apply PHY fixups during phy_register_device
Florian Fainelli [Mon, 28 Jul 2014 23:28:07 +0000 (16:28 -0700)]
net: phy: re-apply PHY fixups during phy_register_device

[ Upstream commit d92f5dec6325079c550889883af51db1b77d5623 ]

Commit 87aa9f9c61ad ("net: phy: consolidate PHY reset in phy_init_hw()")
moved the call to phy_scan_fixups() in phy_init_hw() after a software
reset is performed.

By the time phy_init_hw() is called in phy_device_register(), no driver
has been bound to this PHY yet, so all the checks in phy_init_hw()
against the PHY driver and the PHY driver's config_init function will
return 0. We will therefore never call phy_scan_fixups() as we should.

Fix this by calling phy_scan_fixups() and check for its return value to
restore the intended functionality.

This broke PHY drivers which do register an early PHY fixup callback to
intercept the PHY probing and do things like changing the 32-bits unique
PHY identifier when a pseudo-PHY address has been used, as well as
board-specific PHY fixups that need to be applied during driver probe

Reported-by: Hauke Merthens <>
Reported-by: Jonas Gorski <>
Signed-off-by: Florian Fainelli <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agonet: sendmsg: fix NULL pointer dereference
Andrey Ryabinin [Sat, 26 Jul 2014 17:26:58 +0000 (21:26 +0400)]
net: sendmsg: fix NULL pointer dereference

[ Upstream commit 40eea803c6b2cfaab092f053248cbeab3f368412 ]

Sasha's report:
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel with the KASAN patchset, I've stumbled on the following spew:
> [ 4448.949424] ==================================================================
> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
> [ 4448.952988] Read of size 2 by thread T19638:
> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
> [ 4448.956823]  ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
> [ 4448.958233]  ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
> [ 4448.959552]  0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
> [ 4448.961266] Call Trace:
> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
> [ 4448.970103] sock_sendmsg (net/socket.c:654)
> [ 4448.971584] ? might_fault (mm/memory.c:3741)
> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
> [ 4448.988929] ==================================================================

This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.

After this report there was no usual "Unable to handle kernel NULL pointer dereference"
and this gave me a clue that address 0 is mapped and contains valid socket address structure in it.

This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
(net: rework recvmsg handler msg_name and msg_namelen logic).
Commit message states that:
"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
 non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
 affect sendto as it would bail out earlier while trying to copy-in the
But in fact this affects sendto when address 0 is mapped and contains
socket address structure in it. In such case copy-in address will succeed,
verify_iovec() function will successfully exit with msg->msg_namelen > 0
and msg->msg_name == NULL.

This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.

Cc: Hannes Frederic Sowa <>
Cc: Eric Dumazet <>
Cc: <>
Reported-by: Sasha Levin <>
Signed-off-by: Andrey Ryabinin <>
Acked-by: Hannes Frederic Sowa <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoip: make IP identifiers less predictable
Eric Dumazet [Sat, 26 Jul 2014 06:58:10 +0000 (08:58 +0200)]
ip: make IP identifiers less predictable

[ Upstream commit 04ca6973f7c1a0d8537f2d9906a0cf8e69886d75 ]

In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways exploiting linux IP identifier generation to
infer whether two machines are exchanging packets.

With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.

This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.

Note that prandom_u32_max(1) returns 0, so if generator is used at most
once per jiffy, this patch inserts no hole in the ID suite and do not
increase collision probability.

This is jiffies based, so in the worst case (HZ=1000), the id can
rollover after ~65 seconds of idle time, which should be fine.

We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.

For IPv6, adds saddr into the hash as well, but not nexthdr.

If I ping the patched target, we can see ID are now hard to predict.

21:57:11.008086 IP (...)
    A > target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
    target > A: ICMP echo reply, seq 1, length 64

21:57:12.013133 IP (...)
    A > target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
    target > A: ICMP echo reply, seq 2, length 64

21:57:13.016580 IP (...)
    A > target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
    target > A: ICMP echo reply, seq 3, length 64

[1] TCP sessions uses a per flow ID generator not changed by this patch.

Signed-off-by: Eric Dumazet <>
Reported-by: Jeffrey Knockel <>
Reported-by: Jedidiah R. Crandall <>
Cc: Willy Tarreau <>
Cc: Hannes Frederic Sowa <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoinetpeer: get rid of ip_id_count
Eric Dumazet [Mon, 2 Jun 2014 12:26:03 +0000 (05:26 -0700)]
inetpeer: get rid of ip_id_count

[ Upstream commit 73f156a6e8c1074ac6327e0abd1169e95eb66463 ]

Ideally, we would need to generate IP ID using a per destination IP

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agobnx2x: fix crash during TSO tunneling
Dmitry Kravkov [Thu, 24 Jul 2014 15:54:47 +0000 (18:54 +0300)]
bnx2x: fix crash during TSO tunneling

[ Upstream commit fe26566d8a05151ba1dce75081f6270f73ec4ae1 ]

When TSO packet is transmitted additional BD w/o mapping is used
to describe the packed. The BD needs special handling in tx

kernel: Call Trace:
kernel: <IRQ>  [<ffffffff815e19ba>] dump_stack+0x19/0x1b
kernel: [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
kernel: [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80
kernel: [<ffffffff814a8c0d>] ? find_iova+0x4d/0x90
kernel: [<ffffffff814ab0e2>] intel_unmap_page.part.36+0x142/0x160
kernel: [<ffffffff814ad0e6>] intel_unmap_page+0x26/0x30
kernel: [<ffffffffa01f55d7>] bnx2x_free_tx_pkt+0x157/0x2b0 [bnx2x]
kernel: [<ffffffffa01f8dac>] bnx2x_tx_int+0xac/0x220 [bnx2x]
kernel: [<ffffffff8101a0d9>] ? read_tsc+0x9/0x20
kernel: [<ffffffffa01f8fdb>] bnx2x_poll+0xbb/0x3c0 [bnx2x]
kernel: [<ffffffff814d041a>] net_rx_action+0x15a/0x250
kernel: [<ffffffff81067047>] __do_softirq+0xf7/0x290
kernel: [<ffffffff815f3a5c>] call_softirq+0x1c/0x30
kernel: [<ffffffff81014d25>] do_softirq+0x55/0x90
kernel: [<ffffffff810673e5>] irq_exit+0x115/0x120
kernel: [<ffffffff815f4358>] do_IRQ+0x58/0xf0
kernel: [<ffffffff815e94ad>] common_interrupt+0x6d/0x6d
kernel: <EOI>  [<ffffffff810bbff7>] ? clockevents_notify+0x127/0x140
kernel: [<ffffffff814834df>] ? cpuidle_enter_state+0x4f/0xc0
kernel: [<ffffffff81483615>] cpuidle_idle_call+0xc5/0x200
kernel: [<ffffffff8101bc7e>] arch_cpu_idle+0xe/0x30
kernel: [<ffffffff810b4725>] cpu_startup_entry+0xf5/0x290
kernel: [<ffffffff815cfee1>] start_secondary+0x265/0x27b
kernel: ---[ end trace 11aa7726f18d7e80 ]---

Fixes: a848ade408b ("bnx2x: add CSUM and TSO support for encapsulation protocols")
Reported-by: Yulong Pei <>
Cc: Michal Schmidt <>
Signed-off-by: Dmitry Kravkov <>
Signed-off-by: David S. Miller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoxfrm: Fix installation of AH IPsec SAs
Tobias Brunner [Thu, 26 Jun 2014 13:12:45 +0000 (15:12 +0200)]
xfrm: Fix installation of AH IPsec SAs

[ Upstream commit a0e5ef53aac8e5049f9344857d8ec5237d31e58b ]

The SPI check introduced in ea9884b3acf3311c8a11db67bfab21773f6f82ba
was intended for IPComp SAs but actually prevented AH SAs from getting
installed (depending on the SPI).

Fixes: ea9884b3acf3 ("xfrm: check user specified spi for IPComp")
Cc: Fan Du <>
Signed-off-by: Tobias Brunner <>
Signed-off-by: Steffen Klassert <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoLinux 3.14.16 v3.14.16
Greg Kroah-Hartman [Thu, 7 Aug 2014 23:50:59 +0000 (16:50 -0700)]
Linux 3.14.16

6 years agox86/espfix/xen: Fix allocation of pages for paravirt page tables
Boris Ostrovsky [Wed, 9 Jul 2014 17:18:18 +0000 (13:18 -0400)]
x86/espfix/xen: Fix allocation of pages for paravirt page tables

commit 8762e5092828c4dc0f49da5a47a644c670df77f3 upstream.

init_espfix_ap() is currently off by one level when informing hypervisor
that allocated pages will be used for ministacks' page tables.

The most immediate effect of this on a PV guest is that if
'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor
will refuse to use it for a page table (which it shouldn't be anyway). This will
result in warnings by both Xen and Linux.

More importantly, a subsequent write to that page (again, by a PV guest) is
likely to result in fatal page fault.

Signed-off-by: Boris Ostrovsky <>
Reviewed-by: Konrad Rzeszutek Wilk <>
Signed-off-by: H. Peter Anvin <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agolib/btree.c: fix leak of whole btree nodes
Minfei Huang [Wed, 4 Jun 2014 23:11:53 +0000 (16:11 -0700)]
lib/btree.c: fix leak of whole btree nodes

commit c75b53af2f0043aff500af0a6f878497bef41bca upstream.

I use btree from 3.14-rc2 in my own module.  When the btree module is
removed, a warning arises:

 kmem_cache_destroy btree_node: Slab cache still has objects
 CPU: 13 PID: 9150 Comm: rmmod Tainted: GF          O 3.14.0-rc2 #1
 Hardware name: Inspur NF5270M3/NF5270M3, BIOS CHEETAH_2.1.3 09/10/2013
 Call Trace:
   btree_module_exit+0x10/0x12 [btree]

The cause is that it doesn't release the last btree node, when height = 1
and fill = 1.

[ remove unneeded test of NULL]
Signed-off-by: Minfei Huang <>
Cc: Joern Engel <>
Cc: Johannes Berg <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agonet/l2tp: don't fall back on UDP [get|set]sockopt
Sasha Levin [Tue, 15 Jul 2014 00:02:31 +0000 (17:02 -0700)]
net/l2tp: don't fall back on UDP [get|set]sockopt

commit 3cf521f7dc87c031617fd47e4b7aa2593c2f3daf upstream.

The l2tp [get|set]sockopt() code has fallen back to the UDP functions
for socket option levels != SOL_PPPOL2TP since day one, but that has
never actually worked, since the l2tp socket isn't an inet socket.

As David Miller points out:

  "If we wanted this to work, it'd have to look up the tunnel and then
   use tunnel->sk, but I wonder how useful that would be"

Since this can never have worked so nobody could possibly have depended
on that functionality, just remove the broken code and return -EINVAL.

Reported-by: Sasha Levin <>
Acked-by: James Chapman <>
Acked-by: David Miller <>
Cc: Phil Turnbull <>
Cc: Vegard Nossum <>
Cc: Willy Tarreau <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoxtensa: add fixup for double exception raised in window overflow
Max Filippov [Sat, 24 May 2014 17:48:28 +0000 (21:48 +0400)]
xtensa: add fixup for double exception raised in window overflow

commit 17290231df16eeee5dfc198dbf5ee4b419996dcd upstream.

There are two FIXMEs in the double exception handler 'for the extremely
unlikely case'. This case gets hit by gcc during kernel build once in
a few hours, resulting in an unrecoverable exception condition.

Provide missing fixup routine to handle this case. Double exception
literals now need 8 more bytes, add them to the linker script.

Also replace bbsi instructions with bbsi.l as we're branching depending
on 8th and 7th LSB-based bits of exception address.

This may be tested by adding the explicit DTLB invalidation to window
overflow handlers, like the following:

#    --- a/arch/xtensa/kernel/vectors.S
#    +++ b/arch/xtensa/kernel/vectors.S
#    @@ -592,6 +592,14 @@ ENDPROC(_WindowUnderflow4)
#     ENTRY_ALIGN64(_WindowOverflow8)
#     s32e a0, a9, -16
#    + bbsi.l a9, 31, 1f
#    + rsr a0, ccount
#    + bbsi.l a0, 4, 1f
#    + pdtlb a0, a9
#    + idtlb a0
#    + movi a0, 9
#    + idtlb a0
#    +1:
#     l32e    a0, a1, -12
#     s32e    a2, a9,  -8
#     s32e    a1, a9, -12

Signed-off-by: Max Filippov <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agox86/xen: no need to explicitly register an NMI callback
David Vrabel [Mon, 16 Jun 2014 11:07:00 +0000 (13:07 +0200)]
x86/xen: no need to explicitly register an NMI callback

commit ea9f9274bf4337ba7cbab241c780487651642d63 upstream.

Remove xen_enable_nmi() to fix a 64-bit guest crash when registering
the NMI callback on Xen 3.1 and earlier.

It's not needed since the NMI callback is set by a set_trap_table
hypercall (in xen_load_idt() or xen_write_idt_entry()).

It's also broken since it only set the current VCPU's callback.

Signed-off-by: David Vrabel <>
Reported-by: Vitaly Kuznetsov <>
Tested-by: Vitaly Kuznetsov <>
Cc: Steven Noonan <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agocpufreq: move policy kobj to policy->cpu at resume
Viresh Kumar [Thu, 17 Jul 2014 05:18:25 +0000 (10:48 +0530)]
cpufreq: move policy kobj to policy->cpu at resume

commit 92c14bd9477a20a83144f08c0ca25b0308bf0730 upstream.

This is only relevant to implementations with multiple clusters, where clusters
have separate clock lines but all CPUs within a cluster share it.

Consider a dual cluster platform with 2 cores per cluster. During suspend we
start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj
would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its
kobj as we want to retain permissions/values/etc.

Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev().
We will recover the old policy and update policy->cpu from 3 to 2 from

But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a
link for CPU2, but would try that for CPU3 while bringing it online. Which will
report errors as CPU3 already has kobj assigned to it.

This bug got introduced with commit 42f921a, which overlooked this scenario.

To fix this, lets move kobj to the new policy->cpu while bringing first CPU of a
cluster back. Also do a WARN_ON() if kobject_move failed, as we would reach here
only for the first CPU of a non-boot cluster. And we can't recover from this
situation, if kobject_move() fails.

Fixes: 42f921a6f10c (cpufreq: remove sysfs files for CPUs which failed to come back after resume)
Cc: 3.13+ <> # 3.13+
Reported-and-tested-by: Bu Yitian <>
Reported-by: Saravana Kannan <>
Reviewed-by: Srivatsa S. Bhat <>
Signed-off-by: Viresh Kumar <>
Signed-off-by: Rafael J. Wysocki <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoRevert "mac80211: move "bufferable MMPDU" check to fix AP mode scan"
Johannes Berg [Mon, 7 Jul 2014 10:01:11 +0000 (12:01 +0200)]
Revert "mac80211: move "bufferable MMPDU" check to fix AP mode scan"

commit 08b9939997df30e42a228e1ecb97f99e9c8ea84e upstream.

This reverts commit 277d916fc2e959c3f106904116bb4f7b1148d47a as it was
at least breaking iwlwifi by setting the IEEE80211_TX_CTL_NO_PS_BUFFER
flag in all kinds of interface modes, not only for AP mode where it is

To avoid reintroducing the original problem, explicitly check for probe
request frames in the multicast buffering code.

Fixes: 277d916fc2e9 ("mac80211: move "bufferable MMPDU" check to fix AP mode scan")
Signed-off-by: Johannes Berg <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agostaging: vt6655: Fix Warning on boot handle_irq_event_percpu.
Malcolm Priestley [Wed, 23 Jul 2014 20:35:11 +0000 (21:35 +0100)]
staging: vt6655: Fix Warning on boot handle_irq_event_percpu.

commit 6cff1f6ad4c615319c1a146b2aa0af1043c5e9f5 upstream.

WARNING: CPU: 0 PID: 929 at /home/apw/COD/linux/kernel/irq/handle.c:147 handle_irq_event_percpu+0x1d1/0x1e0()
irq 17 handler device_intr+0x0/0xa80 [vt6655_stage] enabled interrupts

Using spin_lock_irqsave appears to fix this.

Signed-off-by: Malcolm Priestley <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoARM: dts: dra7-evm: Make VDDA_1V8_PHY supply always on
Roger Quadros [Fri, 4 Jul 2014 09:55:43 +0000 (12:55 +0300)]
ARM: dts: dra7-evm: Make VDDA_1V8_PHY supply always on

commit e120fb459693bbc1ac3eabdd65c3659d7cfbfd2a upstream.

After clarification from the hardware team it was found that
this 1.8V PHY supply can't be switched OFF when SoC is Active.

Since the PHY IPs don't contain isolation logic built in the design to
allow the power rail to be switched off, there is a very high risk
of IP reliability and additional leakage paths which can result in
additional power consumption.

The only scenario where this rail can be switched off is part of Power on
reset sequencing, but it needs to be kept always-on during operation.

This patch is required for proper functionality of USB, SATA
and PCIe on DRA7-evm.

CC: Rajendra Nayak <>
CC: Tero Kristo <>
Signed-off-by: Roger Quadros <>
Signed-off-by: Tony Lindgren <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agopinctrl: dra: dt-bindings: Fix pull enable/disable
Nishanth Menon [Tue, 22 Jul 2014 15:39:54 +0000 (10:39 -0500)]
pinctrl: dra: dt-bindings: Fix pull enable/disable

commit 23d9cec07c589276561c13b180577c0b87930140 upstream.

The DRA74/72 control module pins have a weak pull up and pull down.
This is configured by bit offset 17. if BIT(17) is 1, a pull up is
selected, else a pull down is selected.

However, this pull resisstor is applied based on BIT(16) -
PULLUDENABLE - if BIT(18) is *0*, then pull as defined in BIT(17) is
applied, else no weak pulls are applied. We defined this in reverse.

Reference: Table 18-5 (Description of the pad configuration register
bits) in Technical Reference Manual Revision (DRA74x revision Q:
SPRUHI2Q Revised June 2014 and DRA72x revision F: SPRUHP2F - Revised
June 2014)

Fixes: 6e58b8f1daaf1a ("ARM: dts: DRA7: Add the dts files for dra7 SoC and dra7-evm board")
Signed-off-by: Nishanth Menon <>
Tested-by: Felipe Balbi <>
Acked-by: Felipe Balbi <>
Signed-off-by: Tony Lindgren <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agox86_64/entry/xen: Do not invoke espfix64 on Xen
Andy Lutomirski [Wed, 23 Jul 2014 15:34:11 +0000 (08:34 -0700)]
x86_64/entry/xen: Do not invoke espfix64 on Xen

commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 upstream.

This moves the espfix64 logic into native_iret.  To make this work,
it gets rid of the native patch for INTERRUPT_RETURN:
INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.

This changes the 16-bit SS behavior on Xen from OOPSing to leaking
some bits of the Xen hypervisor's RSP (I think).

[ hpa: this is a nonzero cost on native, but probably not enough to
  measure. Xen needs to fix this in their own code, probably doing
  something equivalent to espfix64. ]

Signed-off-by: Andy Lutomirski <>
Signed-off-by: H. Peter Anvin <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agox86, espfix: Make it possible to disable 16-bit support
H. Peter Anvin [Sun, 4 May 2014 17:36:22 +0000 (10:36 -0700)]
x86, espfix: Make it possible to disable 16-bit support

commit 34273f41d57ee8d854dcd2a1d754cbb546cb548f upstream.

Embedded systems, which may be very memory-size-sensitive, are
extremely unlikely to ever encounter any 16-bit software, so make it
a CONFIG_EXPERT option to turn off support for any 16-bit software

Signed-off-by: H. Peter Anvin <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agox86, espfix: Make espfix64 a Kconfig option, fix UML
H. Peter Anvin [Sun, 4 May 2014 17:00:49 +0000 (10:00 -0700)]
x86, espfix: Make espfix64 a Kconfig option, fix UML

commit 197725de65477bc8509b41388157c1a2283542bb upstream.

Make espfix64 a hidden Kconfig option.  This fixes the x86-64 UML
build which had broken due to the non-existence of init_espfix_bsp()
in UML: since UML uses its own Kconfig, this option does not appear in
the UML build.

This also makes it possible to make support for 16-bit segments a
configuration option, for the people who want to minimize the size of
the kernel.

Reported-by: Ingo Molnar <>
Signed-off-by: H. Peter Anvin <>
Cc: Richard Weinberger <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agox86, espfix: Fix broken header guard
H. Peter Anvin [Fri, 2 May 2014 18:33:51 +0000 (11:33 -0700)]
x86, espfix: Fix broken header guard

commit 20b68535cd27183ebd3651ff313afb2b97dac941 upstream.

Header guard is #ifndef, not #ifdef...

Reported-by: Fengguang Wu <>
Signed-off-by: H. Peter Anvin <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agox86, espfix: Move espfix definitions into a separate header file
H. Peter Anvin [Thu, 1 May 2014 21:12:23 +0000 (14:12 -0700)]
x86, espfix: Move espfix definitions into a separate header file

commit e1fe9ed8d2a4937510d0d60e20705035c2609aea upstream.

Sparse warns that the percpu variables aren't declared before they are
defined.  Rather than hacking around it, move espfix definitions into
a proper header file.

Reported-by: Fengguang Wu <>
Signed-off-by: H. Peter Anvin <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agox86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
H. Peter Anvin [Tue, 29 Apr 2014 23:46:09 +0000 (16:46 -0700)]
x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack

commit 3891a04aafd668686239349ea58f3314ea2af86b upstream.

The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer.  This
causes some 16-bit software to break, but it also leaks kernel state
to user space.  We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.

In checkin:

    b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.

This works around this by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart.  When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace.  The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF

(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)

Special thanks to:

- Andy Lutomirski, for the suggestion of using very small stack slots
  and copy (as opposed to map) the IRET frame there, and for the
  suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.

Reported-by: Brian Gerst <>
Signed-off-by: H. Peter Anvin <>
Cc: Konrad Rzeszutek Wilk <>
Cc: Borislav Petkov <>
Cc: Andrew Lutomriski <>
Cc: Linus Torvalds <>
Cc: Dirk Hohndel <>
Cc: Arjan van de Ven <>
Cc: comex <>
Cc: Alexander van Heukelum <>
Cc: Boris Ostrovsky <>
Cc: <> # consider after upstream merge
Signed-off-by: Greg Kroah-Hartman <>
6 years agoRevert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
H. Peter Anvin [Wed, 21 May 2014 17:22:59 +0000 (10:22 -0700)]
Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"

commit 7ed6fb9b5a5510e4ef78ab27419184741169978a upstream.

This reverts commit fa81511bb0bbb2b1aace3695ce869da9762624ff in
preparation of merging in the proper fix (espfix64).

Signed-off-by: H. Peter Anvin <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agotimer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
Jan Kara [Fri, 1 Aug 2014 10:20:02 +0000 (12:20 +0200)]
timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks

commit 504d58745c9ca28d33572e2d8a9990b43e06075d upstream.

clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:

[ INFO: possible circular locking dependency detected ]
3.15.0-rc8-06195-g939f04b #2 Not tainted
trinity-main/74 is trying to acquire lock:
 (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c

but task is already holding lock:
 (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (hrtimer_bases.lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
       [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
       [<81080792>] task_clock_event_start+0x3a/0x3f
       [<810807a4>] task_clock_event_add+0xd/0x14
       [<8108259a>] event_sched_in+0xb6/0x17a
       [<810826a2>] group_sched_in+0x44/0x122
       [<81082885>] ctx_sched_in.isra.67+0x105/0x11f
       [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
       [<81082bf6>] __perf_install_in_context+0x8b/0xa3
       [<8107eb8e>] remote_function+0x12/0x2a
       [<8105f5af>] smp_call_function_single+0x2d/0x53
       [<8107e17d>] task_function_call+0x30/0x36
       [<8107fb82>] perf_install_in_context+0x87/0xbb
       [<810852c9>] SYSC_perf_event_open+0x5c6/0x701
       [<810856f9>] SyS_perf_event_open+0x17/0x19
       [<8142f8ee>] syscall_call+0x7/0xb

-> #4 (&ctx->lock){......}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

-> #3 (&rq->lock){-.-.-.}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81040873>] __task_rq_lock+0x33/0x3a
       [<8104184c>] wake_up_new_task+0x25/0xc2
       [<8102474b>] do_fork+0x15c/0x2a0
       [<810248a9>] kernel_thread+0x1a/0x1f
       [<814232a2>] rest_init+0x1a/0x10e
       [<817af949>] start_kernel+0x303/0x308
       [<817af2ab>] i386_start_kernel+0x79/0x7d

-> #2 (&p->pi_lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<810413dd>] try_to_wake_up+0x1d/0xd6
       [<810414cd>] default_wake_function+0xb/0xd
       [<810461f3>] __wake_up_common+0x39/0x59
       [<81046346>] __wake_up+0x29/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #1 (&tty->write_wait){-.....}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<81046332>] __wake_up+0x15/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #0 (&port_lock_key){-.....}:
       [<8104a62d>] __lock_acquire+0x9ea/0xc6d
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<811c60be>] serial8250_console_write+0x8c/0x10c
       [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
       [<8104f5d5>] console_unlock+0x1d7/0x398
       [<8104fb70>] vprintk_emit+0x3da/0x3e4
       [<81425f76>] printk+0x17/0x19
       [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
       [<8105c548>] clockevents_program_event+0xe7/0xf3
       [<8105cc1c>] tick_program_event+0x1e/0x23
       [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
       [<8103c49e>] __remove_hrtimer+0x5b/0x79
       [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
       [<8103cb4b>] hrtimer_cancel+0xd/0x18
       [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [<81080705>] task_clock_event_stop+0x20/0x64
       [<81080756>] task_clock_event_del+0xd/0xf
       [<81081350>] event_sched_out+0xab/0x11e
       [<810813e0>] group_sched_out+0x1d/0x66
       [<81081682>] ctx_sched_out+0xaf/0xbf
       [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

other info that might help us debug this:

Chain exists of:
  &port_lock_key --> &ctx->lock --> hrtimer_bases.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----

 *** DEADLOCK ***

4 locks held by trinity-main/74:
 #0:  (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
 #1:  (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
 #2:  (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
 #3:  (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4

stack backtrace:
CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2
 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
Call Trace:
 [<81426f69>] dump_stack+0x16/0x18
 [<81425a99>] print_circular_bug+0x18f/0x19c
 [<8104a62d>] __lock_acquire+0x9ea/0xc6d
 [<8104a942>] lock_acquire+0x92/0x101
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c60be>] serial8250_console_write+0x8c/0x10c
 [<8104af87>] ? lock_release+0x191/0x223
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
 [<8104f5d5>] console_unlock+0x1d7/0x398
 [<8104fb70>] vprintk_emit+0x3da/0x3e4
 [<81425f76>] printk+0x17/0x19
 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
 [<8105cc1c>] tick_program_event+0x1e/0x23
 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
 [<8103c49e>] __remove_hrtimer+0x5b/0x79
 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
 [<8103cb4b>] hrtimer_cancel+0xd/0x18
 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
 [<81080705>] task_clock_event_stop+0x20/0x64
 [<81080756>] task_clock_event_del+0xd/0xf
 [<81081350>] event_sched_out+0xab/0x11e
 [<810813e0>] group_sched_out+0x1d/0x66
 [<81081682>] ctx_sched_out+0xaf/0xbf
 [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
 [<8104416d>] ? __dequeue_entity+0x23/0x27
 [<81044505>] ? pick_next_task_fair+0xb1/0x120
 [<8142cacc>] __schedule+0x4c6/0x4cb
 [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
 [<810475b0>] ? trace_hardirqs_off+0xb/0xd
 [<81056346>] ? rcu_irq_exit+0x64/0x77

Fix the problem by using printk_deferred() which does not call into the

Reported-by: Fengguang Wu <>
Signed-off-by: Jan Kara <>
Signed-off-by: Thomas Gleixner <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agosched_clock: Avoid corrupting hrtimer tree during suspend
Stephen Boyd [Thu, 24 Jul 2014 04:03:50 +0000 (21:03 -0700)]
sched_clock: Avoid corrupting hrtimer tree during suspend

commit f723aa1817dd8f4fe005aab52ba70c8ab0ef9457 upstream.

During suspend we call sched_clock_poll() to update the epoch and
accumulated time and reprogram the sched_clock_timer to fire
before the next wrap-around time. Unfortunately,
sched_clock_poll() doesn't restart the timer, instead it relies
on the hrtimer layer to do that and during suspend we aren't
calling that function from the hrtimer layer. Instead, we're
reprogramming the expires time while the hrtimer is enqueued,
which can cause the hrtimer tree to be corrupted. Furthermore, we
restart the timer during suspend but we update the epoch during
resume which seems counter-intuitive.

Let's fix this by saving the accumulated state and canceling the
timer during suspend. On resume we can update the epoch and
restart the timer similar to what we would do if we were starting
the clock for the first time.

Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
Signed-off-by: Stephen Boyd <>
Signed-off-by: John Stultz <>
Cc: Ingo Molnar <>
Signed-off-by: Thomas Gleixner <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoprintk: rename printk_sched to printk_deferred
John Stultz [Wed, 4 Jun 2014 23:11:40 +0000 (16:11 -0700)]
printk: rename printk_sched to printk_deferred

commit aac74dc495456412c4130a1167ce4beb6c1f0b38 upstream.

After learning we'll need some sort of deferred printk functionality in
the timekeeping core, Peter suggested we rename the printk_sched function
so it can be reused by needed subsystems.

This only changes the function name. No logic changes.

Signed-off-by: John Stultz <>
Reviewed-by: Steven Rostedt <>
Cc: Jan Kara <>
Cc: Peter Zijlstra <>
Cc: Jiri Bohac <>
Cc: Thomas Gleixner <>
Cc: Ingo Molnar <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agodm cache: fix race affecting dirty block count
Anssi Hannula [Fri, 1 Aug 2014 15:55:47 +0000 (11:55 -0400)]
dm cache: fix race affecting dirty block count

commit 44fa816bb778edbab6b6ddaaf24908dd6295937e upstream.

nr_dirty is updated without locking, causing it to drift so that it is
non-zero (either a small positive integer, or a very large one when an
underflow occurs) even when there are no actual dirty blocks.  This was
due to a race between the workqueue and map function accessing nr_dirty
in parallel without proper protection.

People were seeing under runs due to a race on increment/decrement of
nr_dirty, see:

Fix this by using an atomic_t for nr_dirty.

Signed-off-by: Anssi Hannula <>
Signed-off-by: Joe Thornber <>
Signed-off-by: Mike Snitzer <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agodm bufio: fully initialize shrinker
Greg Thelen [Thu, 31 Jul 2014 16:07:19 +0000 (09:07 -0700)]
dm bufio: fully initialize shrinker

commit d8c712ea471ce7a4fd1734ad2211adf8469ddddc upstream.

1d3d4437eae1 ("vmscan: per-node deferred work") added a flags field to
struct shrinker assuming that all shrinkers were zero filled.  The dm
bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data
in flags.  So far the only defined flags bit is SHRINKER_NUMA_AWARE.
But there are proposed patches which add other bits to shrinker.flags
(e.g. memcg awareness).

Rather than simply initializing the shrinker, this patch uses kzalloc()
when allocating the dm_bufio_client to ensure that the embedded shrinker
and any other similar structures are zeroed.

This fixes theoretical over aggressive shrinking of dm bufio objects.
If the uninitialized dm_bufio_client.shrinker.flags contains
SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for
each numa node rather than just once.  This has been broken since 3.12.

Signed-off-by: Greg Thelen <>
Acked-by: Mikulas Patocka <>
Signed-off-by: Mike Snitzer <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoiio: buffer: Fix demux table creation
Lars-Peter Clausen [Thu, 17 Jul 2014 15:59:00 +0000 (16:59 +0100)]
iio: buffer: Fix demux table creation

commit 61bd55ce1667809f022be88da77db17add90ea4e upstream.

When creating the demux table we need to iterate over the selected scan mask for
the buffer to get the samples which should be copied to destination buffer.
Right now the code uses the mask which contains all active channels, which means
the demux table contains entries which causes it to copy all the samples from
source to destination buffer one by one without doing any demuxing.

Signed-off-by: Lars-Peter Clausen <>
Signed-off-by: Jonathan Cameron <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoiio:bma180: Missing check for frequency fractional part
Peter Meerwald [Wed, 16 Jul 2014 18:32:00 +0000 (19:32 +0100)]
iio:bma180: Missing check for frequency fractional part

commit 9b2a4d35a6ceaf217be61ed8eb3c16986244f640 upstream.

val2 should be zero

This will make no difference for correct inputs but will reject
incorrect ones with a decimal part in the value written to the sysfs

Signed-off-by: Peter Meerwald <>
Cc: Oleksandr Kravchenko <>
Signed-off-by: Jonathan Cameron <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoiio:bma180: Fix scale factors to report correct acceleration units
Peter Meerwald [Wed, 16 Jul 2014 18:32:00 +0000 (19:32 +0100)]
iio:bma180: Fix scale factors to report correct acceleration units

commit 381676d5e86596b11e22a62f196e192df6091373 upstream.

The userspace interface for acceleration sensors is documented as using
m/s^2 units [Documentation/ABI/testing/sysfs-bus-iio]

The fullscale raw values for the BMA80 corresponds to -/+ 1, 1.5, 2, etc G
depending on the selected mode.

The scale table was converting to G rather than m/s^2.
Change the scaling table to match the documented interface.

See commit 71702e6e, iio: mma8452: Use correct acceleration units,
for a related fix.

Signed-off-by: Peter Meerwald <>
Cc: Oleksandr Kravchenko <>
Signed-off-by: Jonathan Cameron <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoACPI / PNP: Fix acpi_pnp_match()
Rafael J. Wysocki [Tue, 29 Jul 2014 22:23:09 +0000 (00:23 +0200)]
ACPI / PNP: Fix acpi_pnp_match()

commit b6328a07bd6b3d31b64f85864fe74f3b08c010ca upstream.

The acpi_pnp_match() function is used for finding the ACPI device
object that should be associated with the given PNP device.
Unfortunately, the check used by that function is not strict enough
and may cause success to be returned for a wrong ACPI device object.

To fix that, use the observation that the pointer to the ACPI
device object in question is already stored in the data field
in struct pnp_dev, so acpi_pnp_match() can simply use that
field to do its job.

This problem was uncovered in 3.14 by commit 202317a573b2 (ACPI / scan:
Add acpi_device objects for all device nodes in the namespace).

Fixes: 202317a573b2 (ACPI / scan: Add acpi_device objects for all device nodes in the namespace)
Reported-and-tested-by: Vinson Lee <>
Signed-off-by: Rafael J. Wysocki <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agostaging: vt6655: Fix disassociated messages every 10 seconds
Malcolm Priestley [Wed, 23 Jul 2014 20:35:12 +0000 (21:35 +0100)]
staging: vt6655: Fix disassociated messages every 10 seconds

commit 4aa0abed3a2a11b7d71ad560c1a3e7631c5a31cd upstream.

byReAssocCount is incremented every second resulting in
disassociated message being send every 10 seconds whether
connection or not.

byReAssocCount should only advance while eCommandState

Change existing scope to if condition.

Signed-off-by: Malcolm Priestley <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agomemcg: oom_notify use-after-free fix
Michal Hocko [Wed, 30 Jul 2014 23:08:33 +0000 (16:08 -0700)]
memcg: oom_notify use-after-free fix

commit 2bcf2e92c3918ce62ab4e934256e47e9a16d19c3 upstream.

Paul Furtado has reported the following GPF:

  general protection fault: 0000 [#1] SMP
  Modules linked in: ipv6 dm_mod xen_netfront coretemp hwmon x86_pkg_temp_thermal crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 microcode pcspkr ext4 jbd2 mbcache raid0 xen_blkfront
  CPU: 3 PID: 3062 Comm: java Not tainted 3.16.0-rc5 #1
  task: ffff8801cfe8f170 ti: ffff8801d2ec4000 task.ti: ffff8801d2ec4000
  RIP: e030:mem_cgroup_oom_synchronize+0x140/0x240
  RSP: e02b:ffff8801d2ec7d48  EFLAGS: 00010283
  RAX: 0000000000000001 RBX: ffff88009d633800 RCX: 000000000000000e
  RDX: fffffffffffffffe RSI: ffff88009d630200 RDI: ffff88009d630200
  RBP: ffff8801d2ec7da8 R08: 0000000000000012 R09: 00000000fffffffe
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff88009d633800
  R13: ffff8801d2ec7d48 R14: dead000000100100 R15: ffff88009d633a30
  FS:  00007f1748bb4700(0000) GS:ffff8801def80000(0000) knlGS:0000000000000000
  CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007f4110300308 CR3: 00000000c05f7000 CR4: 0000000000002660
  Call Trace:
  Code: 44 00 00 48 89 df e8 40 ca ff ff 48 85 c0 49 89 c4 74 35 4c 8b b0 30 02 00 00 4c 8d b8 30 02 00 00 4d 39 fe 74 1b 0f 1f 44 00 00 <49> 8b 7e 10 be 01 00 00 00 e8 42 d2 04 00 4d 8b 36 4d 39 fe 75
  RIP  mem_cgroup_oom_synchronize+0x140/0x240

Commit fb2a6fc56be6 ("mm: memcg: rework and document OOM waiting and
wakeup") has moved mem_cgroup_oom_notify outside of memcg_oom_lock
assuming it is protected by the hierarchical OOM-lock.

Although this is true for the notification part the protection doesn't
cover unregistration of event which can happen in parallel now so
mem_cgroup_oom_notify can see already unlinked and/or freed

Fix this by using memcg_oom_lock also in mem_cgroup_oom_notify.


Fixes: fb2a6fc56be6 (mm: memcg: rework and document OOM waiting and wakeup)
Signed-off-by: Michal Hocko <>
Reported-by: Paul Furtado <>
Tested-by: Paul Furtado <>
Acked-by: Johannes Weiner <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agomm, thp: do not allow thp faults to avoid cpuset restrictions
David Rientjes [Wed, 30 Jul 2014 23:08:24 +0000 (16:08 -0700)]
mm, thp: do not allow thp faults to avoid cpuset restrictions

commit b104a35d32025ca740539db2808aa3385d0f30eb upstream.

The page allocator relies on __GFP_WAIT to determine if ALLOC_CPUSET
should be set in allocflags.  ALLOC_CPUSET controls if a page allocation
should be restricted only to the set of allowed cpuset mems.

Transparent hugepages clears __GFP_WAIT when defrag is disabled to prevent
the fault path from using memory compaction or direct reclaim.  Thus, it
is unfairly able to allocate outside of its cpuset mems restriction as a

This patch ensures that ALLOC_CPUSET is only cleared when the gfp mask is
truly GFP_ATOMIC by verifying it is also not a thp allocation.

Signed-off-by: David Rientjes <>
Reported-by: Alex Thorlton <>
Tested-by: Alex Thorlton <>
Cc: Bob Liu <>
Cc: Dave Hansen <>
Cc: Hedi Berriche <>
Cc: Hugh Dickins <>
Cc: Johannes Weiner <>
Cc: Kirill A. Shutemov <>
Cc: Mel Gorman <>
Cc: Rik van Riel <>
Cc: Srivatsa S. Bhat <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agomm/page-writeback.c: fix divide by zero in bdi_dirty_limits()
Maxim Patlasov [Wed, 30 Jul 2014 23:08:21 +0000 (16:08 -0700)]
mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()

commit f6789593d5cea42a4ecb1cbeab6a23ade5ebbba7 upstream.

Under memory pressure, it is possible for dirty_thresh, calculated by
global_dirty_limits() in balance_dirty_pages(), to equal zero.  Then, if
strictlimit is true, bdi_dirty_limits() tries to resolve the proportion:

  bdi_bg_thresh : bdi_thresh = background_thresh : dirty_thresh

by dividing by zero.

Signed-off-by: Maxim Patlasov <>
Acked-by: Rik van Riel <>
Cc: Michal Hocko <>
Cc: KOSAKI Motohiro <>
Cc: Wu Fengguang <>
Cc: Johannes Weiner <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoscsi: handle flush errors properly
James Bottomley [Thu, 3 Jul 2014 17:17:34 +0000 (19:17 +0200)]
scsi: handle flush errors properly

commit 89fb4cd1f717a871ef79fa7debbe840e3225cd54 upstream.

Flush commands don't transfer data and thus need to be special cased
in the I/O completion handler so that we can propagate errors to
the block layer and filesystem.

Signed-off-by: James Bottomley <>
Reported-by: Steven Haber <>
Tested-by: Steven Haber <>
Reviewed-by: Martin K. Petersen <>
Signed-off-by: Christoph Hellwig <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agorapidio/tsi721_dma: fix failure to obtain transaction descriptor
Alexandre Bounine [Wed, 30 Jul 2014 23:08:26 +0000 (16:08 -0700)]
rapidio/tsi721_dma: fix failure to obtain transaction descriptor

commit 0193ed8225e1a79ed64632106ec3cc81798cb13c upstream.

This is a bug fix for the situation when function tsi721_desc_get() fails
to obtain a free transaction descriptor.

The bug usually results in a memory access crash dump when data transfer
scatter-gather list has more entries than size of hardware buffer
descriptors ring.  This fix ensures that error is properly returned to a
caller instead of an invalid entry.

This patch is applicable to kernel versions starting from v3.5.

Signed-off-by: Alexandre Bounine <>
Cc: Matt Porter <>
Cc: Andre van Herk <>
Cc: Stef van Os <>
Cc: Vinod Koul <>
Cc: Dan Williams <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agocfg80211: fix mic_failure tracing
Eliad Peller [Thu, 17 Jul 2014 12:00:56 +0000 (15:00 +0300)]
cfg80211: fix mic_failure tracing

commit 8c26d458394be44e135d1c6bd4557e1c4e1a0535 upstream.

tsc can be NULL (mac80211 currently always passes NULL),
resulting in NULL-dereference. check before copying it.

Signed-off-by: Eliad Peller <>
Signed-off-by: Emmanuel Grumbach <>
Signed-off-by: Johannes Berg <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoath9k: fix aggregation session lockup
Felix Fietkau [Wed, 23 Jul 2014 13:40:54 +0000 (15:40 +0200)]
ath9k: fix aggregation session lockup

commit c01fac1c77a00227f706a1654317023e3f4ac7f0 upstream.

If an aggregation session fails, frames still end up in the driver queue
with IEEE80211_TX_CTL_AMPDU set.
This causes tx for the affected station/tid to stall, since
ath_tx_get_tid_subframe returning packets to send.

Fix this by clearing IEEE80211_TX_CTL_AMPDU as long as no aggregation
session is running.

Reported-by: Antonio Quartulli <>
Signed-off-by: Felix Fietkau <>
Signed-off-by: John W. Linville <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout
Konstantin Khlebnikov [Fri, 25 Jul 2014 08:17:12 +0000 (09:17 +0100)]
ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout

commit 811a2407a3cf7bbd027fbe92d73416f17485a3d8 upstream.

On LPAE, each level 1 (pgd) page table entry maps 1GiB, and the level 2
(pmd) entries map 2MiB.

When the identity mapping is created on LPAE, the pgd pointers are copied
from the swapper_pg_dir.  If we find that we need to modify the contents
of a pmd, we allocate a new empty pmd table and insert it into the
appropriate 1GB slot, before then filling it with the identity mapping.

However, if the 1GB slot covers the kernel lowmem mappings, we obliterate
those mappings.

When replacing a PMD, first copy the old PMD contents to the new PMD, so
that we preserve the existing mappings, particularly the mappings of the
kernel itself.

[rewrote commit message and added code comment -- rmk]

Fixes: ae2de101739c ("ARM: LPAE: Add identity mapping support for the 3-level page table format")
Signed-off-by: Konstantin Khlebnikov <>
Signed-off-by: Russell King <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoARM: fix alignment of keystone page table fixup
Russell King [Tue, 29 Jul 2014 08:24:47 +0000 (09:24 +0100)]
ARM: fix alignment of keystone page table fixup

commit 823a19cd3b91b0729d7417f1848413846be61712 upstream.

If init_mm.brk is not section aligned, the LPAE fixup code will miss
updating the final PMD.  Fix this by aligning map_end.

Fixes: a77e0c7b2774 ("ARM: mm: Recreate kernel mappings in early_paging_init()")
Signed-off-by: Russell King <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoARM: dts: fix L2 address in Hi3620
Haojian Zhuang [Wed, 2 Apr 2014 13:31:50 +0000 (21:31 +0800)]
ARM: dts: fix L2 address in Hi3620

commit 28c9770bcbd2b6dbab99669825a2f8fa69e6d35b upstream.

Fix the address of L2 controler register in hi3620 SoC.
This has been wrong from the point that the file was merged
in v3.14.

Signed-off-by: Haojian Zhuang <>
Acked-by: Wei Xu <>
Signed-off-by: Arnd Bergmann <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agocrypto: af_alg - properly label AF_ALG socket
Milan Broz [Tue, 29 Jul 2014 18:41:09 +0000 (18:41 +0000)]
crypto: af_alg - properly label AF_ALG socket

commit 4c63f83c2c2e16a13ce274ee678e28246bd33645 upstream.

Th AF_ALG socket was missing a security label (e.g. SELinux)
which means that socket was in "unlabeled" state.

This was recently demonstrated in the cryptsetup package
(cryptsetup v1.6.5 and later.)

This patch clones the sock's label from the parent sock
and resolves the issue (similar to AF_BLUETOOTH protocol family).

Signed-off-by: Milan Broz <>
Acked-by: Paul Moore <>
Signed-off-by: Herbert Xu <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agocrypto: arm-aes - fix encryption of unaligned data
Mikulas Patocka [Fri, 25 Jul 2014 23:42:30 +0000 (19:42 -0400)]
crypto: arm-aes - fix encryption of unaligned data

commit f3c400ef473e00c680ea713a66196b05870b3710 upstream.

Fix the same alignment bug as in arm64 - we need to pass residue
unprocessed bytes as the last argument to blkcipher_walk_done.

Signed-off-by: Mikulas Patocka <>
Acked-by: Ard Biesheuvel <>
Signed-off-by: Herbert Xu <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoLinux 3.14.15 v3.14.15
Greg Kroah-Hartman [Thu, 31 Jul 2014 21:51:43 +0000 (14:51 -0700)]
Linux 3.14.15

6 years agoplatform_get_irq: Revert to platform_get_resource if of_irq_get fails
Guenter Roeck [Tue, 17 Jun 2014 22:51:02 +0000 (15:51 -0700)]
platform_get_irq: Revert to platform_get_resource if of_irq_get fails

commit aff008ad813c7cf3cfe7b532e7ba2c526c136f22 upstream.

Commits 9ec36ca (of/irq: do irq resolution in platform_get_irq)
and ad69674 (of/irq: do irq resolution in platform_get_irq_byname)
change the semantics of platform_get_irq and platform_get_irq_byname
to always rely on devicetree information if devicetree is enabled
and if a devicetree node is attached to the device. The functions
now return an error if the devicetree data does not include interrupt
information, even if the information is available as platform resource

This causes mfd client drivers to fail if the interrupt number is
passed via platform resources. Therefore, if of_irq_get fails, try
platform_get_resource as method of last resort. This restores the
original functionality for drivers depending on platform resources
to get irq information.

Cc: Russell King <>
Cc: Tony Lindgren <>
Cc: Grant Likely <>
Cc: Grygorii Strashko <>
Signed-off-by: Guenter Roeck <>
Acked-by: Rob Herring <>
[ Guenter Roeck: backported to 3.15 ]
Signed-off-by: Guenter Roeck <>
6 years agonl80211: move set_qos_map command into split state
Johannes Berg [Tue, 10 Jun 2014 12:06:25 +0000 (14:06 +0200)]
nl80211: move set_qos_map command into split state

commit 02df00eb0019e7d15a1fcddebe4d020226c1ccda upstream.

The non-split wiphy state shouldn't be increased in size
so move the new set_qos_map command into the split if

Fixes: fa9ffc745610 ("cfg80211: Add support for QoS mapping")
Reviewed-by: Emmanuel Grumbach <>
Signed-off-by: Johannes Berg <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agox86/efi: Include a .bss section within the PE/COFF headers
Michael Brown [Thu, 10 Jul 2014 11:26:20 +0000 (12:26 +0100)]
x86/efi: Include a .bss section within the PE/COFF headers

commit c7fb93ec51d462ec3540a729ba446663c26a0505 upstream.

The PE/COFF headers currently describe only the initialised-data
portions of the image, and result in no space being allocated for the
uninitialised-data portions.  Consequently, the EFI boot stub will end
up overwriting unexpected areas of memory, with unpredictable results.

Fix by including a .bss section in the PE/COFF headers (functionally
equivalent to the init_size field in the bzImage header).

Signed-off-by: Michael Brown <>
Cc: Thomas Bächler <>
Cc: Josh Boyer <>
Signed-off-by: Matt Fleming <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoFix gcc-4.9.0 miscompilation of load_balance() in scheduler
Linus Torvalds [Sat, 26 Jul 2014 21:52:01 +0000 (14:52 -0700)]
Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

commit 2062afb4f804afef61cbe62a30cac9a46e58e067 upstream.

Michel Dänzer and a couple of other people reported inexplicable random
oopses in the scheduler, and the cause turns out to be gcc mis-compiling
the load_balance() function when debugging is enabled.  The gcc bug
apparently goes back to gcc-4.5, but slight optimization changes means
that it now showed up as a problem in 4.9.0 and 4.9.1.

The instruction scheduling problem causes gcc to schedule a spill
operation to before the stack frame has been created, which in turn can
corrupt the spilled value if an interrupt comes in.  There may be other
effects of this bug too, but that's the code generation problem seen in
Michel's case.

This is fixed in current gcc HEAD, but the workaround as suggested by
Markus Trippelsdorf is pretty simple: use -fno-var-tracking-assignments
when compiling the kernel, which disables the gcc code that causes the
problem.  This can result in slightly worse debug information for
variable accesses, but that is infinitely preferable to actual code
generation problems.

Doing this unconditionally (not just for CONFIG_DEBUG_INFO) also allows
non-debug builds to verify that the debug build would be identical: we
can do

    export GCC_COMPARE_DEBUG=1

to make gcc internally verify that the result of the build is
independent of the "-g" flag (it will make the compiler build everything
twice, toggling the debug flag, and compare the results).

Without the "-fno-var-tracking-assignments" option, the build would fail
(even with 4.8.3 that didn't show the actual stack frame bug) with a gcc
compare failure.

See also gcc bugzilla:

Reported-by: Michel Dänzer <>
Suggested-by: Markus Trippelsdorf <>
Cc: Jakub Jelinek <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agomm: hugetlb: fix copy_hugetlb_page_range()
Naoya Horiguchi [Wed, 23 Jul 2014 21:00:19 +0000 (14:00 -0700)]
mm: hugetlb: fix copy_hugetlb_page_range()

commit 0253d634e0803a8376a0d88efee0bf523d8673f9 upstream.

Commit 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry") changed the order of
huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
in some workloads like hugepage-backed heap allocation via libhugetlbfs.
This patch fixes it.

The test program for the problem is shown below:

  $ cat heap.c
  #include <unistd.h>
  #include <stdlib.h>
  #include <string.h>

  #define HPS 0x200000

  int main() {
   int i;
   char *p = malloc(HPS);
   memset(p, '1', HPS);
   for (i = 0; i < 5; i++) {
   if (!fork()) {
   memset(p, '2', HPS);
   p = malloc(HPS);
   memset(p, '3', HPS);
   return 0;
   return 0;

  $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap

Fixes 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry"), so is applicable to -stable kernels which
include it.

Signed-off-by: Naoya Horiguchi <>
Reported-by: Guillaume Morin <>
Suggested-by: Guillaume Morin <>
Acked-by: Hugh Dickins <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agodrm/radeon: fix cut and paste issue for hawaii.
Jerome Glisse [Thu, 24 Jul 2014 20:34:17 +0000 (16:34 -0400)]
drm/radeon: fix cut and paste issue for hawaii.

commit 1b2c4869d8247f9e202fa8a73777c34adc62d409 upstream.

This is a halfway fix for hawaii acceleration. More fixes to come
but hopefully isolated to userspace.

Signed-off-by: Jérôme Glisse <>
Signed-off-by: Dave Airlie <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agodrm/radeon: fix irq ring buffer overflow handling
Christian König [Wed, 23 Jul 2014 07:47:58 +0000 (09:47 +0200)]
drm/radeon: fix irq ring buffer overflow handling

commit e8c214d22e76dd0ead38f97f8d2dc09aac70d651 upstream.

We must mask out the overflow bit as well, otherwise
the wptr will never match the rptr again and the interrupt
handler will loop forever.

Signed-off-by: Christian König <>
Signed-off-by: Alex Deucher <>
Reviewed-by: Michel Dänzer <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agox86_32, entry: Store badsys error code in %eax
Sven Wegener [Tue, 22 Jul 2014 08:26:06 +0000 (10:26 +0200)]
x86_32, entry: Store badsys error code in %eax

commit 8142b215501f8b291a108a202b3a053a265b03dd upstream.

Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
(CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
code, resulting in syscall() not returning proper errors for undefined
syscalls on CPUs supporting the sysenter feature.

The following code:

> int result = syscall(666);
> printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));

results in:

> result=666 errno=0 error=Success

Obviously, the syscall return value is the called syscall number, but it
should have been an ENOSYS error. When run under ptrace it behaves
correctly, which makes it hard to debug in the wild:

> result=-1 errno=38 error=Function not implemented

The %eax register is the return value register. For debugging via ptrace
the syscall entry code stores the complete register context on the
stack. The badsys handlers only store the ENOSYS error code in the
ptrace register set and do not set %eax like a regular syscall handler
would. The old resume_userspace call chain contains code that clobbers
%eax and it restores %eax from the ptrace registers afterwards. The same
goes for the ptrace-enabled call chain. When ptrace is not used, the
syscall return value is the passed-in syscall number from the untouched
%eax register.

Use %eax as the return value register in syscall_badsys and
sysenter_badsys, like a real syscall handler does, and have the caller
push the value onto the stack for ptrace access.

Signed-off-by: Sven Wegener <>
Reviewed-and-tested-by: Andy Lutomirski <>
Signed-off-by: H. Peter Anvin <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agofs: umount on symlink leaks mnt count
Vasily Averin [Mon, 21 Jul 2014 08:30:23 +0000 (12:30 +0400)]
fs: umount on symlink leaks mnt count

commit 295dc39d941dc2ae53d5c170365af4c9d5c16212 upstream.

Currently umount on symlink blocks following umount:

/vz is separate mount

# ls /vz/ -al | grep test
drwxr-xr-x.  2 root root       4096 Jul 19 01:14 testdir
lrwxrwxrwx.  1 root root         11 Jul 19 01:16 testlink -> /vz/testdir
# umount -l /vz/testlink
umount: /vz/testlink: not mounted (expected)

# lsof /vz
# umount /vz
umount: /vz: device is busy. (unexpected)

In this case mountpoint_last() gets an extra refcount on path->mnt

Signed-off-by: Vasily Averin <>
Acked-by: Ian Kent <>
Acked-by: Jeff Layton <>
Signed-off-by: Christoph Hellwig <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoparport: fix menu breakage
Randy Dunlap [Sat, 26 Jul 2014 00:07:47 +0000 (17:07 -0700)]
parport: fix menu breakage

commit edffe1b626b39bd7121691dfdecb548431003bbb upstream.

Do not split the PARPORT-related symbols with the new kconfig
symbol ARCH_MIGHT_HAVE_PC_PARPORT. The split was causing incorrect
display of these symbols -- they were not being displayed together
as they should be.

Fixes: d90c3eb31535 "Kconfig cleanup (PARPORT_PC dependencies)"

Signed-off-by: Randy Dunlap <>
Cc: Mark Salter <>
Cc: Ingo Molnar <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agohwmon: (smsc47m192) Fix temperature limit and vrm write operations
Guenter Roeck [Fri, 18 Jul 2014 14:31:18 +0000 (07:31 -0700)]
hwmon: (smsc47m192) Fix temperature limit and vrm write operations

commit 043572d5444116b9d9ad8ae763cf069e7accbc30 upstream.

Temperature limit clamps are applied after converting the temperature
from milli-degrees C to degrees C, so either the clamp limit needs
to be specified in degrees C, not milli-degrees C, or clamping must
happen before converting to degrees C. Use the latter method to avoid

vrm is an u8, so the written value needs to be limited to [0, 255].

Cc: Axel Lin <>
Signed-off-by: Guenter Roeck <>
Reviewed-by: Jean Delvare <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoparisc: Remove SA_RESTORER define
John David Anglin [Wed, 23 Jul 2014 23:44:12 +0000 (19:44 -0400)]
parisc: Remove SA_RESTORER define

commit 20dbea494543aefaace874cc3ec93a39b94b1ec4 upstream.

The sa_restorer field in struct sigaction is obsolete and no longer in
the parisc implementation.  However, the core code assumes the field is
present if SA_RESTORER is defined. So, the define needs to be removed.

Signed-off-by: John David Anglin <>
Signed-off-by: Helge Deller <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agocoredump: fix the setting of PF_DUMPCORE
Silesh C V [Wed, 23 Jul 2014 20:59:59 +0000 (13:59 -0700)]
coredump: fix the setting of PF_DUMPCORE

commit aed8adb7688d5744cb484226820163af31d2499a upstream.

Commit 079148b919d0 ("coredump: factor out the setting of PF_DUMPCORE")
cleaned up the setting of PF_DUMPCORE by removing it from all the
linux_binfmt->core_dump() and moving it to zap_threads().But this ended
up clearing all the previously set flags.  This causes issues during
core generation when tsk->flags is checked again (eg.  for PF_USED_MATH
to dump floating point registers).  Fix this.

Signed-off-by: Silesh C V <>
Acked-by: Oleg Nesterov <>
Cc: Mandeep Singh Baines <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoInput: fix defuzzing logic
Dmitry Torokhov [Sat, 19 Jul 2014 23:30:31 +0000 (16:30 -0700)]
Input: fix defuzzing logic

commit 50c5d36dab930b1f1b1e3348b8608aa8b9ee7610 upstream.

We attempt to remove noise from coordinates reported by devices in
input_handle_abs_event(), unfortunately, unless we were dropping the
event altogether, we were ignoring the adjusted value and were passing
on the original value instead.

Reviewed-by: Andrew de los Reyes <>
Reviewed-by: Benson Leung <>
Reviewed-by: David Herrmann <>
Reviewed-by: Henrik Rydberg <>
Signed-off-by: Dmitry Torokhov <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoInput: synaptics - add min/max quirk for pnp-id LEN2002 (Edge E531)
Hans de Goede [Tue, 15 Jul 2014 00:12:21 +0000 (17:12 -0700)]
Input: synaptics - add min/max quirk for pnp-id LEN2002 (Edge E531)

commit e76aed9da7189eeb41b9856552ce5721181e8e8d upstream.

Signed-off-by: Hans de Goede <>
Signed-off-by: Dmitry Torokhov <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoslab_common: fix the check for duplicate slab names
Mikulas Patocka [Tue, 4 Mar 2014 22:13:47 +0000 (17:13 -0500)]
slab_common: fix the check for duplicate slab names

commit 694617474e33b8603fc76e090ed7d09376514b1a upstream.

The patch 3e374919b314f20e2a04f641ebc1093d758f66a4 is supposed to fix the
problem where kmem_cache_create incorrectly reports duplicate cache name
and fails. The problem is described in the header of that patch.

However, the patch doesn't really fix the problem because of these

* the logic to test for debugging is reversed. It was intended to perform
  the check only if slub debugging is enabled (which implies that caches
  with the same parameters are not merged). Therefore, there should be
  #if !defined(CONFIG_SLUB) || defined(CONFIG_SLUB_DEBUG_ON)
  The current code has the condition reversed and performs the test if
  debugging is disabled.

* slub debugging may be enabled or disabled based on kernel command line,
  CONFIG_SLUB_DEBUG_ON is just the default settings. Therefore the test
  based on definition of CONFIG_SLUB_DEBUG_ON is unreliable.

This patch fixes the problem by removing the test
"!defined(CONFIG_SLUB_DEBUG_ON)". Therefore, duplicate names are never
checked if the SLUB allocator is used.

Note to stable kernel maintainers: when backporint this patch, please
backport also the patch 3e374919b314f20e2a04f641ebc1093d758f66a4.

Acked-by: David Rientjes <>
Acked-by: Christoph Lameter <>
Signed-off-by: Mikulas Patocka <>
Signed-off-by: Pekka Enberg <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agotracing: Fix wraparound problems in "uptime" trace clock
Tony Luck [Fri, 18 Jul 2014 18:43:01 +0000 (11:43 -0700)]
tracing: Fix wraparound problems in "uptime" trace clock

commit 58d4e21e50ff3cc57910a8abc20d7e14375d2f61 upstream.

The "uptime" trace clock added in:

    commit 8aacf017b065a805d27467843490c976835eb4a5
    tracing: Add "uptime" trace clock that uses jiffies

has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
        (u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds.  An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000

Avoid these problems by using jiffies_64 as our basis, and
not converting to nanoseconds (we do convert to clock_t because
user facing API must not be dependent on internal kernel
HZ values).

Fixes: 8aacf017b065 "tracing: Add "uptime" trace clock that uses jiffies"
Signed-off-by: Tony Luck <>
Signed-off-by: Steven Rostedt <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoblkcg: don't call into policy draining if root_blkg is already gone
Tejun Heo [Sat, 5 Jul 2014 22:43:21 +0000 (18:43 -0400)]
blkcg: don't call into policy draining if root_blkg is already gone

commit 0b462c89e31f7eb6789713437eb551833ee16ff3 upstream.

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL.  If someone else starts to drain
while the queue is in this state, the following oops happens.

  NULL pointer dereference at 0000000000000028
  IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  PGD e4a1067 PUD b773067 PMD 0
  Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
  CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
  RIP: 0010:[<ffffffff8144e944>]  [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
  R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
  FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
   ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
   ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
   ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
  Call Trace:
   [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60
   [<ffffffff81427641>] __blk_drain_queue+0x71/0x180
   [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0
   [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120
   [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50
   [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40
   [<ffffffff8142d476>] blk_release_queue+0x26/0xd0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff81427505>] blk_put_queue+0x15/0x20
   [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0
   [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0
   [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20
   [<ffffffff817930e2>] device_release+0x32/0xa0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff817934d7>] put_device+0x17/0x20
   [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0
   [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40
   [<ffffffff817d1257>] sdev_store_delete+0x27/0x30
   [<ffffffff81792ca8>] dev_attr_store+0x18/0x30
   [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50
   [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0
   [<ffffffff811f69bd>] SyS_write+0x4d/0xc0
   [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b

776687bce42b ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skippping calling into policy draining if all the blkgs are
already gone.

Signed-off-by: Tejun Heo <>
Reported-by: Shirish Pargaonkar <>
Reported-by: Sasha Levin <>
Reported-by: Jet Chen <>
Tested-by: Shirish Pargaonkar <>
Signed-off-by: Jens Axboe <>
Signed-off-by: Greg Kroah-Hartman <>
6 years agoahci: add support for the Promise FastTrak TX8660 SATA HBA (ahci mode)
Romain Degez [Fri, 11 Jul 2014 16:08:13 +0000 (18:08 +0200)]
ahci: add support for the Promise FastTrak TX8660 SATA HBA (ahci mode)

commit b32bfc06aefab61acc872dec3222624e6cd867ed upstream.

Add support of the Promise FastTrak TX8660 SATA HBA in ahci mode by
registering the board in the ahci_pci_tbl[].

Note: this HBA also provide a hardware RAID mode when activated in
BIOS but specific drivers from the manufacturer are required in this

Signed-off-by: Romain Degez <>
Tested-by: Romain Degez <>
Signed-off-by: Tejun Heo <>
Signed-off-by: Greg Kroah-Hartman <>