Skip to content

Conversation

@torrorosso
Copy link

gcc4.8 fixes

@M1cha
Copy link
Owner

M1cha commented Feb 18, 2014

u learned fast :)

M1cha pushed a commit that referenced this pull request Feb 19, 2014
24 core Intel box's first exposure to 3.0.12-rt30-rc3 didn't go well.

[   27.104159] i7300_idle: loaded v1.55
[   27.104192] BUG: scheduling while atomic: swapper/2/0/0x00000002
[   27.104309] Pid: 0, comm: swapper/2 Tainted: G           N  3.0.12-rt30-rc3-rt #1
[   27.104317] Call Trace:
[   27.104338]  [<ffffffff810046a5>] dump_trace+0x85/0x2e0
[   27.104372]  [<ffffffff8144eb00>] thread_return+0x12b/0x30b
[   27.104381]  [<ffffffff8144f1b9>] schedule+0x29/0xb0
[   27.104389]  [<ffffffff814506e5>] rt_spin_lock_slowlock+0xc5/0x240
[   27.104401]  [<ffffffffa01f818f>] i7300_idle_notifier+0x3f/0x360 [i7300_idle]
[   27.104415]  [<ffffffff814546c7>] notifier_call_chain+0x37/0x70
[   27.104426]  [<ffffffff81454748>] __atomic_notifier_call_chain+0x48/0x70
[   27.104439]  [<ffffffff81001a39>] cpu_idle+0x89/0xb0
[   27.104449] bad: scheduling from the idle thread!

This lock is taken from interrupt disabled context in the guts of
idle. Convert it to a raw_spinlock.

Signed-off-by: Mike Galbraith <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Andy Henroid <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
Currently request_irq() is called prior to fec_enet_init() and fec_ptp_init(),
which causes the following crash on a mx53qsb:

Unable to handle kernel NULL pointer dereference at virtual address 00000002
pgd = 80004000
[00000002] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0    Not tainted  (3.8.0-rc7-next-20130215+ #346)
PC is at fec_enet_interrupt+0xd0/0x348
LR is at fec_enet_interrupt+0xb8/0x348
pc : [<80372b7c>]    lr : [<80372b64>]    psr: 60000193
sp : df855c20  ip : df855c20  fp : df855c74
r10: 00000516  r9 : 1c000000  r8 : 00000000
r7 : 00000000  r6 : 00000000  r5 : 00000000  r4 : df9b7800
r3 : df9b7df4  r2 : 00000000  r1 : 00000000  r0 : df9b7d34

Ensure that such initialization functions are called prior to requesting the
interrupts, so that all necessary the data structures are in place when the
irqs occur.

Signed-off-by: Fabio Estevam <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
bd_mutex and lo_ctl_mutex can be held in different order.

Path #1:

blkdev_open
 blkdev_get
  __blkdev_get (hold bd_mutex)
   lo_open (hold lo_ctl_mutex)

Path #2:

blkdev_ioctl
 lo_ioctl (hold lo_ctl_mutex)
  lo_set_capacity (hold bd_mutex)

Lockdep does not report it, because path #2 actually holds a subclass of
lo_ctl_mutex.  This subclass seems creep into the code by mistake.  The
patch author actually just mentioned it in the changelog, see commit
f028f3b ("loop: fix circular locking in loop_clr_fd()"), also see:

	http://marc.info/?l=linux-kernel&m=123806169129727&w=2

Path #2 hold bd_mutex to call bd_set_size(), I've protected it
with i_mutex in a previous patch, so drop bd_mutex at this site.

Signed-off-by: Guo Chao <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Guo Chao <[email protected]>
Cc: M. Hindess <[email protected]>
Cc: Nikanth Karthikesan <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
usemap could also be allocated as compound pages.  Should also consider
compound pages when freeing memmap.

If we don't fix it, there could be problems when we free vmemmap
pagetables which are stored in compound pages.  The old pagetables will
not be freed properly, and when we add the memory again, no new
pagetable will be created.  And the old pagetable entry is used, than
the kernel will panic.

The call trace is like the following:

  BUG: unable to handle kernel paging request at ffffea0040000000
  IP: [<ffffffff816a483f>] sparse_add_one_section+0xef/0x166
  PGD 7ff7d4067 PUD 78e035067 PMD 78e11d067 PTE 0
  Oops: 0002 [#1] SMP
  Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg lpc_ich mfd_core i2c_i801 i2c_core i7core_edac edac_core ioatdma e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
  CPU 0
  Pid: 4, comm: kworker/0:0 Tainted: G        W 3.8.0-rc3-phy-hot-remove+ #3 FUJITSU-SV PRIMEQUEST 1800E/SB
  RIP: 0010:[<ffffffff816a483f>]  [<ffffffff816a483f>] sparse_add_one_section+0xef/0x166
  RSP: 0018:ffff8807bdcb35d8  EFLAGS: 00010006
  RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000200000
  RDX: ffff88078df01148 RSI: 0000000000000282 RDI: ffffea0040000000
  RBP: ffff8807bdcb3618 R08: 4cf05005b019467a R09: 0cd98fa09631467a
  R10: 0000000000000000 R11: 0000000000030e20 R12: 0000000000008000
  R13: ffffea0040000000 R14: ffff88078df66248 R15: ffff88078ea13b10
  FS:  0000000000000000(0000) GS:ffff8807c1a00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: ffffea0040000000 CR3: 0000000001c0c000 CR4: 00000000000007f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/0:0 (pid: 4, threadinfo ffff8807bdcb2000, task ffff8807bde18000)
  Call Trace:
    __add_pages+0x85/0x120
    arch_add_memory+0x71/0xf0
    add_memory+0xd6/0x1f0
    acpi_memory_device_add+0x170/0x20c
    acpi_device_probe+0x50/0x18a
    really_probe+0x6c/0x320
    driver_probe_device+0x47/0xa0
    __device_attach+0x53/0x60
    bus_for_each_drv+0x6c/0xa0
    device_attach+0xa8/0xc0
    bus_probe_device+0xb0/0xe0
    device_add+0x301/0x570
    device_register+0x1e/0x30
    acpi_device_register+0x1d8/0x27c
    acpi_add_single_object+0x1df/0x2b9
    acpi_bus_check_add+0x112/0x18f
    acpi_ns_walk_namespace+0x105/0x255
    acpi_walk_namespace+0xcf/0x118
    acpi_bus_scan+0x5b/0x7c
    acpi_bus_add+0x2a/0x2c
    container_notify_cb+0x112/0x1a9
    acpi_ev_notify_dispatch+0x46/0x61
    acpi_os_execute_deferred+0x27/0x34
    process_one_work+0x20e/0x5c0
    worker_thread+0x12e/0x370
    kthread+0xee/0x100
    ret_from_fork+0x7c/0xb0
  Code: 00 00 48 89 df 48 89 45 c8 e8 3e 71 b1 ff 48 89 c2 48 8b 75 c8 b8 ef ff ff ff f6 02 01 75 4b 49 63 cc 31 c0 4c 89 ef 48 c1 e1 06 <f3> aa 48 8b 02 48 83 c8 01 48 85 d2 48 89 02 74 29 a8 01 74 25
  RIP  [<ffffffff816a483f>] sparse_add_one_section+0xef/0x166
   RSP <ffff8807bdcb35d8>
  CR2: ffffea0040000000
  ---[ end trace e7f94e3a34c442d4 ]---
  Kernel panic - not syncing: Fatal exception

Signed-off-by: Wen Congyang <[email protected]>
Signed-off-by: Tang Chen <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jianguo Wu <[email protected]>
Cc: Kamezawa Hiroyuki <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
When a cpu is hotpluged, we call acpi_map_cpu2node() in
_acpi_map_lsapic() to store the cpu's node and apicid's node.  But we
don't clear the cpu's node in acpi_unmap_lsapic() when this cpu is
hotremoved.  If the node is also hotremoved, we will get the following
messages:

  kernel BUG at include/linux/gfp.h:329!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma e1000e i7core_edac edac_core sg acpi_memhotplug igb dca sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
  Pid: 3126, comm: init Not tainted 3.6.0-rc3-tangchen-hostbridge+ #13 FUJITSU-SV PRIMEQUEST 1800E/SB
  RIP: 0010:[<ffffffff811bc3fd>]  [<ffffffff811bc3fd>] allocate_slab+0x28d/0x300
  RSP: 0018:ffff88078a049cf8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000246
  RBP: ffff88078a049d38 R08: 00000000000040d0 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000b5f R12: 00000000000052d0
  R13: ffff8807c1417300 R14: 0000000000030038 R15: 0000000000000003
  FS:  00007fa9b1b44700(0000) GS:ffff8807c3800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007fa9b09acca0 CR3: 000000078b855000 CR4: 00000000000007e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process init (pid: 3126, threadinfo ffff88078a048000, task ffff8807bb6f2650)
  Call Trace:
    new_slab+0x30/0x1b0
    __slab_alloc+0x358/0x4c0
    kmem_cache_alloc_node_trace+0xb4/0x1e0
    alloc_fair_sched_group+0xd0/0x1b0
    sched_create_group+0x3e/0x110
    sched_autogroup_create_attach+0x4d/0x180
    sys_setsid+0xd4/0xf0
    system_call_fastpath+0x16/0x1b
  Code: 89 c4 e9 73 fe ff ff 31 c0 89 de 48 c7 c7 45 de 9e 81 44 89 45 c8 e8 22 05 4b 00 85 db 44 8b 45 c8 0f 89 4f ff ff ff 0f 0b eb fe <0f> 0b 90 eb fd 0f 0b eb fe 89 de 48 c7 c7 45 de 9e 81 31 c0 44
  RIP  [<ffffffff811bc3fd>] allocate_slab+0x28d/0x300
   RSP <ffff88078a049cf8>
  ---[ end trace adf84c90f3fea3e5 ]---

The reason is that the cpu's node is not NUMA_NO_NODE, we will call
alloc_pages_exact_node() to alloc memory on the node, but the node is
offlined.

If the node is onlined, we still need cpu's node.  For example: a task
on the cpu is sleeped when the cpu is hotremoved.  We will choose
another cpu to run this task when it is waked up.  If we know the cpu's
node, we will choose the cpu on the same node first.  So we should clear
cpu-to-node mapping when the node is offlined.

This patch only clears apicid-to-node mapping when the cpu is
hotremoved.

[[email protected]: fix section error]
Signed-off-by: Wen Congyang <[email protected]>
Signed-off-by: Tang Chen <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
option is not specified in the remount request.  A new policy can be
specified if mpol=M is given.

Before this patch remounting an mpol bound tmpfs without specifying
mpol= mount option in the remount request would set the filesystem's
mempolicy object to a freed mempolicy object.

To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
    # mkdir /tmp/x

    # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0

    # mount -o remount,size=200M nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
        # note ? garbage in mpol=... output above

    # dd if=/dev/zero of=/tmp/x/f count=1
        # panic here

Panic:
    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<          (null)>]           (null)
    [...]
    Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
    Call Trace:
      mpol_shared_policy_init+0xa5/0x160
      shmem_get_inode+0x209/0x270
      shmem_mknod+0x3e/0xf0
      shmem_create+0x18/0x20
      vfs_create+0xb5/0x130
      do_last+0x9a1/0xea0
      path_openat+0xb3/0x4d0
      do_filp_open+0x42/0xa0
      do_sys_open+0xfe/0x1e0
      compat_sys_open+0x1b/0x20
      cstar_dispatch+0x7/0x1f

Non-debug kernels will not crash immediately because referencing the
dangling mpol will not cause a fault.  Instead the filesystem will
reference a freed mempolicy object, which will cause unpredictable
behavior.

The problem boils down to a dropped mpol reference below if
shmem_parse_options() does not allocate a new mpol:

    config = *sbinfo
    shmem_parse_options(data, &config, true)
    mpol_put(sbinfo->mpol)
    sbinfo->mpol = config.mpol  /* BUG: saves unreferenced mpol */

This patch avoids the crash by not releasing the mempolicy if
shmem_parse_options() doesn't create a new mpol.

How far back does this issue go? I see it in both 2.6.36 and 3.3.  I did
not look back further.

Signed-off-by: Greg Thelen <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
Pass the directio request on pageio_init to clean up the API.

Percolate pg_dreq from original nfs_pageio_descriptor to the
pnfs_{read,write}_done_resend_to_mds and use it on respective
call to nfs_pageio_init_{read,write} on the newly created
nfs_pageio_descriptor.

Reproduced by command:
 mount -o vers=4.1 server:/ /mnt
 dd bs=128k count=8 if=/dev/zero of=/mnt/dd.out oflag=direct

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
PGD 34786067 PUD 34794067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: nfs_layout_nfsv41_files nfsv4 nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc btrfs zlib_deflate libcrc32c ipv6 autofs4
CPU 1
Pid: 259, comm: kworker/1:2 Not tainted 3.8.0-rc6 #2 Bochs Bochs
RIP: 0010:[<ffffffffa021a3a8>]  [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
RSP: 0018:ffff880038f8fa68  EFLAGS: 00010206
RAX: ffffffffa021a6a9 RBX: ffff880038f8fb48 RCX: 00000000000a0000
RDX: ffffffffa021e616 RSI: ffff8800385e9a40 RDI: 0000000000000028
RBP: ffff880038f8fa68 R08: ffffffff81ad6720 R09: ffff8800385e9510
R10: ffffffffa0228450 R11: ffff880038e87418 R12: ffff8800385e9a40
R13: ffff8800385e9a70 R14: ffff880038f8fb38 R15: ffffffffa0148878
FS:  0000000000000000(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 0000000034789000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/1:2 (pid: 259, threadinfo ffff880038f8e000, task ffff880038302480)
Stack:
 ffff880038f8fa78 ffffffffa021a6bf ffff880038f8fa88 ffffffffa021bb82
 ffff880038f8fae8 ffffffffa021f454 ffff880038f8fae8 ffffffff8109689d
 ffff880038f8fab8 ffffffff00000006 0000000000000000 ffff880038f8fb48
Call Trace:
 [<ffffffffa021a6bf>] nfs_direct_pgio_init+0x16/0x18 [nfs]
 [<ffffffffa021bb82>] nfs_pgheader_init+0x6a/0x6c [nfs]
 [<ffffffffa021f454>] nfs_generic_pg_writepages+0x51/0xf8 [nfs]
 [<ffffffff8109689d>] ? mark_held_locks+0x71/0x99
 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc]
 [<ffffffffa021bc25>] nfs_pageio_doio+0x1a/0x43 [nfs]
 [<ffffffffa021be7c>] nfs_pageio_complete+0x16/0x2c [nfs]
 [<ffffffffa02608be>] pnfs_write_done_resend_to_mds+0x95/0xc5 [nfsv4]
 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc]
 [<ffffffffa028e27f>] filelayout_reset_write+0x8c/0x99 [nfs_layout_nfsv41_files]
 [<ffffffffa028e5f9>] filelayout_write_done_cb+0x4d/0xc1 [nfs_layout_nfsv41_files]
 [<ffffffffa024587a>] nfs4_write_done+0x36/0x49 [nfsv4]
 [<ffffffffa021f996>] nfs_writeback_done+0x53/0x1cc [nfs]
 [<ffffffffa021fb1d>] nfs_writeback_done_common+0xe/0x10 [nfs]
 [<ffffffffa028e03d>] filelayout_write_call_done+0x28/0x2a [nfs_layout_nfsv41_files]
 [<ffffffffa01488a1>] rpc_exit_task+0x29/0x87 [sunrpc]
 [<ffffffffa014a0c9>] __rpc_execute+0x11d/0x3cc [sunrpc]
 [<ffffffff810969dc>] ? trace_hardirqs_on_caller+0x117/0x173
 [<ffffffffa014a39f>] rpc_async_schedule+0x27/0x32 [sunrpc]
 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc]
 [<ffffffff8105f8c1>] process_one_work+0x226/0x422
 [<ffffffff8105f7f4>] ? process_one_work+0x159/0x422
 [<ffffffff81094757>] ? lock_acquired+0x210/0x249
 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc]
 [<ffffffff810600d8>] worker_thread+0x126/0x1c4
 [<ffffffff8105ffb2>] ? manage_workers+0x240/0x240
 [<ffffffff81064ef8>] kthread+0xb1/0xb9
 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65
 [<ffffffff815206ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65
Code: 00 83 38 02 74 12 48 81 4b 50 00 00 01 00 c7 83 60 07 00 00 01 00 00 00 48 89 df e8 55 fe ff ff 5b 41 5c 5d c3 66 90 55 48 89 e5 <f0> ff 07 5d c3 55 48 89 e5 f0 ff 0f 0f 94 c0 84 c0 0f 95 c0 0f
RIP  [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
 RSP <ffff880038f8fa68>
CR2: 0000000000000028

Signed-off-by: Benny Halevy <[email protected]>
Cc: [email protected] [>= 3.6]
Signed-off-by: Trond Myklebust <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
…device"

This reverts commit eb6b9a8.

Above commit limits GSO capability of gre device to just TSO, but
software GRE-GSO is capable of handling all GSO capabilities.

This patch also fixes following panic which reverted commit introduced:-

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a2
IP: [<ffffffffa0680fd1>] ipgre_tunnel_bind_dev+0x161/0x1f0 [ip_gre]
PGD 42bc19067 PUD 42bca9067 PMD 0
Oops: 0000 [#1] SMP
Pid: 2636, comm: ip Tainted: GF            3.8.0+ #83 Dell Inc. PowerEdge R620/0KCKR5
RIP: 0010:[<ffffffffa0680fd1>]  [<ffffffffa0680fd1>] ipgre_tunnel_bind_dev+0x161/0x1f0 [ip_gre]
RSP: 0018:ffff88042bfcb708  EFLAGS: 00010246
RAX: 00000000000005b6 RBX: ffff88042d2fa000 RCX: 0000000000000044
RDX: 0000000000000018 RSI: 0000000000000078 RDI: 0000000000000060
RBP: ffff88042bfcb748 R08: 0000000000000018 R09: 000000000000000c
R10: 0000000000000020 R11: 000000000101010a R12: ffff88042d2fa800
R13: 0000000000000000 R14: ffff88042d2fa800 R15: ffff88042cd7f650
FS:  00007fa784f55700(0000) GS:ffff88043fd20000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a2 CR3: 000000042d8b9000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 2636, threadinfo ffff88042bfca000, task ffff88042d142a80)
Stack:
 0000000100000000 002f000000000000 0a01010100000000 000000000b010101
 ffff88042d2fa800 ffff88042d2fa000 ffff88042bfcb858 ffff88042f418c00
 ffff88042bfcb798 ffffffffa068199a ffff88042bfcb798 ffff88042d2fa830
Call Trace:
 [<ffffffffa068199a>] ipgre_newlink+0xca/0x160 [ip_gre]
 [<ffffffff8143b692>] rtnl_newlink+0x532/0x5f0
 [<ffffffff8143b2fc>] ? rtnl_newlink+0x19c/0x5f0
 [<ffffffff81438978>] rtnetlink_rcv_msg+0x2c8/0x340
 [<ffffffff814386b0>] ? rtnetlink_rcv+0x40/0x40
 [<ffffffff814560f9>] netlink_rcv_skb+0xa9/0xd0
 [<ffffffff81438695>] rtnetlink_rcv+0x25/0x40
 [<ffffffff81455ddc>] netlink_unicast+0x1ac/0x230
 [<ffffffff81456a45>] netlink_sendmsg+0x265/0x380
 [<ffffffff814138c0>] sock_sendmsg+0xb0/0xe0
 [<ffffffff8141141e>] ? move_addr_to_kernel+0x4e/0x90
 [<ffffffff81420445>] ? verify_iovec+0x85/0xf0
 [<ffffffff81414ffd>] __sys_sendmsg+0x3fd/0x420
 [<ffffffff8114b701>] ? handle_mm_fault+0x251/0x3b0
 [<ffffffff8114f39f>] ? vma_link+0xcf/0xe0
 [<ffffffff81415239>] sys_sendmsg+0x49/0x90
 [<ffffffff814ffd19>] system_call_fastpath+0x16/0x1b

CC: Dmitry Kravkov <[email protected]>
Signed-off-by: Pravin B Shelar <[email protected]>
Acked-by: Dmitry Kravkov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
This fixes an oops where a LAYOUTGET is in still in the rpciod queue,
but the requesting processes has been killed.  Without this, killing
the process does the final pnfs_put_layout_hdr() and sets NFS_I(inode)->layout
to NULL while the LAYOUTGET rpc task still references it.

Example oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
IP: [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4]
PGD 7365b067 PUD 7365d067 PMD 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: nfs_layout_nfsv41_files nfsv4 auth_rpcgss nfs lockd sunrpc ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ip6table_filter ip6_tables ppdev e1000 i2c_piix4 i2c_core shpchp parport_pc parport crc32c_intel aesni_intel xts aes_x86_64 lrw gf128mul ablk_helper cryptd mptspi scsi_transport_spi mptscsih mptbase floppy autofs4
CPU 0
Pid: 27, comm: kworker/0:1 Not tainted 3.8.0-dros_cthon2013+ #4 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffffa01bd586>]  [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4]
RSP: 0018:ffff88007b0c1c88  EFLAGS: 00010246
RAX: ffff88006ed36678 RBX: 0000000000000000 RCX: 0000000ea877e3bc
RDX: ffff88007a729da8 RSI: 0000000000000000 RDI: ffff88007a72b958
RBP: ffff88007b0c1ca8 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007a72b958
R13: ffff88007a729da8 R14: 0000000000000000 R15: ffffffffa011077e
FS:  0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000080 CR3: 00000000735f8000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:1 (pid: 27, threadinfo ffff88007b0c0000, task ffff88007c2fa0c0)
Stack:
 ffff88006fc05388 ffff88007a72b908 ffff88007b240900 ffff88006fc05388
 ffff88007b0c1cd8 ffffffffa01a2170 ffff88007b240900 ffff88007b240900
 ffff88007b240970 ffffffffa011077e ffff88007b0c1ce8 ffffffffa0110791
Call Trace:
 [<ffffffffa01a2170>] nfs4_layoutget_prepare+0x7b/0x92 [nfsv4]
 [<ffffffffa011077e>] ? __rpc_atrun+0x15/0x15 [sunrpc]
 [<ffffffffa0110791>] rpc_prepare_task+0x13/0x15 [sunrpc]

Reported-by: Tigran Mkrtchyan <[email protected]>
Signed-off-by: Weston Andros Adamson <[email protected]>
Cc: [email protected]
Signed-off-by: Trond Myklebust <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
Kjell Braden reported this oops:

[  833.211970] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  833.212816] IP: [<          (null)>]           (null)
[  833.213280] PGD 1b9b2067 PUD e9f7067 PMD 0
[  833.213874] Oops: 0010 [#1] SMP
[  833.214344] CPU 0
[  833.214458] Modules linked in: des_generic md4 nls_utf8 cifs vboxvideo drm snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq bnep rfcomm snd_timer bluetooth snd_seq_device ppdev snd vboxguest parport_pc joydev mac_hid soundcore snd_page_alloc psmouse i2c_piix4 serio_raw lp parport usbhid hid e1000
[  833.215629]
[  833.215629] Pid: 1752, comm: mount.cifs Not tainted 3.0.0-rc7-bisectcifs-fec11dd9a0+ #18 innotek GmbH VirtualBox/VirtualBox
[  833.215629] RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
[  833.215629] RSP: 0018:ffff8800119c9c50  EFLAGS: 00010282
[  833.215629] RAX: ffffffffa02186c0 RBX: ffff88000c427780 RCX: 0000000000000000
[  833.215629] RDX: 0000000000000000 RSI: ffff88000c427780 RDI: ffff88000c4362e8
[  833.215629] RBP: ffff8800119c9c88 R08: ffff88001fc15e30 R09: 00000000d69515c7
[  833.215629] R10: ffffffffa0201972 R11: ffff88000e8f6a28 R12: ffff88000c4362e8
[  833.215629] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88001181aaa6
[  833.215629] FS:  00007f2986171700(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[  833.215629] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  833.215629] CR2: 0000000000000000 CR3: 000000001b982000 CR4: 00000000000006f0
[  833.215629] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  833.215629] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  833.215629] Process mount.cifs (pid: 1752, threadinfo ffff8800119c8000, task ffff88001c1c16f0)
[  833.215629] Stack:
[  833.215629]  ffffffff8116a9b5 ffff8800119c9c88 ffffffff81178075 0000000000000286
[  833.215629]  0000000000000000 ffff88000c4276c0 ffff8800119c9ce8 ffff8800119c9cc8
[  833.215629]  ffffffff8116b06e ffff88001bc6fc00 ffff88000c4276c0 ffff88000c4276c0
[  833.215629] Call Trace:
[  833.215629]  [<ffffffff8116a9b5>] ? d_alloc_and_lookup+0x45/0x90
[  833.215629]  [<ffffffff81178075>] ? d_lookup+0x35/0x60
[  833.215629]  [<ffffffff8116b06e>] __lookup_hash.part.14+0x9e/0xc0
[  833.215629]  [<ffffffff8116b1d6>] lookup_one_len+0x146/0x1e0
[  833.215629]  [<ffffffff815e4f7e>] ? _raw_spin_lock+0xe/0x20
[  833.215629]  [<ffffffffa01eef0d>] cifs_do_mount+0x26d/0x500 [cifs]
[  833.215629]  [<ffffffff81163bd3>] mount_fs+0x43/0x1b0
[  833.215629]  [<ffffffff8117d41a>] vfs_kern_mount+0x6a/0xd0
[  833.215629]  [<ffffffff8117e584>] do_kern_mount+0x54/0x110
[  833.215629]  [<ffffffff8117fdc2>] do_mount+0x262/0x840
[  833.215629]  [<ffffffff81108a0e>] ? __get_free_pages+0xe/0x50
[  833.215629]  [<ffffffff8117f9ca>] ? copy_mount_options+0x3a/0x180
[  833.215629]  [<ffffffff8118075d>] sys_mount+0x8d/0xe0
[  833.215629]  [<ffffffff815ece82>] system_call_fastpath+0x16/0x1b
[  833.215629] Code:  Bad RIP value.
[  833.215629] RIP  [<          (null)>]           (null)
[  833.215629]  RSP <ffff8800119c9c50>
[  833.215629] CR2: 0000000000000000
[  833.238525] ---[ end trace ec00758b8d44f529 ]---

When walking down the path on the server, it's possible to hit a
symlink. The path walking code assumes that the caller will handle that
situation properly, but cifs_get_root() isn't set up for it. This patch
prevents the oops by simply returning an error.

A better solution would be to try and chase the symlinks here, but that's
fairly complicated to handle.

Fixes:

    https://bugzilla.kernel.org/show_bug.cgi?id=53221

Reported-and-tested-by: Kjell Braden <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Steve French <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
Benny Halevy reported the following oops when testing RHEL6:

<7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>PGD 81448a067 PUD 831632067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled
<4>CPU 6
<4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6  /ProLiant DL170e G6
<4>RIP: 0010:[<ffffffffa02a52c5>]  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>RSP: 0018:ffff88081458bb98  EFLAGS: 00010292
<4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003
<4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760
<4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008
<4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780
<4>FS:  00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040)
<4>Stack:
<4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745
<4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea
<4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760
<4>Call Trace:
<4> [<ffffffff81182745>] __fput+0xf5/0x210
<4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs]
<4> [<ffffffff81182885>] fput+0x25/0x30
<4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360
<4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60
<4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90
<4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs]
<4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs]
<4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160
<4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0
<4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50
<4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0
<4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480
<4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160
<4> [<ffffffff8117de79>] do_sys_open+0x69/0x140
<4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80
<4> [<ffffffff8117df90>] sys_open+0x20/0x30
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31
<1>RIP  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4> RSP <ffff88081458bb98>
<4>CR2: 0000000000000000

I think this is ultimately due to a bug on the server. The client had
previously found a directory dentry. It then later tried to do an atomic
open on a new (regular file) dentry. The attributes it got back had the
same filehandle as the previously found directory inode. It then tried
to put the filp because it failed the aops tests for O_DIRECT opens, and
oopsed here because the ctx was still NULL.

Obviously the root cause here is a server issue, but we can take steps
to mitigate this on the client. When nfs_fhget is called, we always know
what type of inode it is. In the event that there's a broken or
malicious server on the other end of the wire, the client can end up
crashing because the wrong ops are set on it.

Have nfs_find_actor check that the inode type is correct after checking
the fileid. The fileid check should rarely ever match, so it should only
rarely ever get to this check. In the case where we have a broken
server, we may see two different inodes with the same i_ino, but the
client should be able to cope with them without crashing.

This should fix the oops reported here:

    https://bugzilla.redhat.com/show_bug.cgi?id=913660

Reported-by: Benny Halevy <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
While adding and removing a lot of disks disks and partitions this
sometimes shows up:

  WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
  Hardware name:
  sysfs: cannot create duplicate filename '/dev/block/259:751'
  Modules linked in: raid1 autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc scsi_tgt garp stp llc sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 dm_mirror dm_region_hash dm_log power_meter microcode dcdbas serio_raw amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core k10temp bnx2 sg ixgbe dca mdio ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi pata_atiixp ahci mptsas mptscsih mptbase scsi_transport_sas dm_multipath dm_mod [last unloaded: scsi_wait_scan]
  Pid: 44103, comm: async/16 Not tainted 2.6.32-195.el6.x86_64 #1
  Call Trace:
    warn_slowpath_common+0x87/0xc0
    warn_slowpath_fmt+0x46/0x50
    sysfs_add_one+0xc9/0x130
    sysfs_do_create_link+0x12b/0x170
    sysfs_create_link+0x13/0x20
    device_add+0x317/0x650
    idr_get_new+0x13/0x50
    add_partition+0x21c/0x390
    rescan_partitions+0x32b/0x470
    sd_open+0x81/0x1f0 [sd_mod]
    __blkdev_get+0x1b6/0x3c0
    blkdev_get+0x10/0x20
    register_disk+0x155/0x170
    add_disk+0xa6/0x160
    sd_probe_async+0x13b/0x210 [sd_mod]
    add_wait_queue+0x46/0x60
    async_thread+0x102/0x250
    default_wake_function+0x0/0x20
    async_thread+0x0/0x250
    kthread+0x96/0xa0
    child_rip+0xa/0x20
    kthread+0x0/0xa0
    child_rip+0x0/0x20

This most likely happens because dev_t is freed while the number is
still used and idr_get_new() is not protected on every use.  The fix
adds a mutex where it wasn't before and moves the dev_t free function so
it is called after device del.

Signed-off-by: Tomas Henzl <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
This patch fixes a regression introduced in v3.8, which causes oops
like this when dm-multipath is used:

general protection fault: 0000 [#1] SMP
RIP: 0010:[<ffffffff810fe754>]  [<ffffffff810fe754>] mempool_free+0x24/0xb0
Call Trace:
  <IRQ>
  [<ffffffff81187417>] bio_put+0x97/0xc0
  [<ffffffffa02247a5>] end_clone_bio+0x35/0x90 [dm_mod]
  [<ffffffff81185efd>] bio_endio+0x1d/0x30
  [<ffffffff811f03a3>] req_bio_endio.isra.51+0xa3/0xe0
  [<ffffffff811f2f68>] blk_update_request+0x118/0x520
  [<ffffffff811f3397>] blk_update_bidi_request+0x27/0xa0
  [<ffffffff811f343c>] blk_end_bidi_request+0x2c/0x80
  [<ffffffff811f34d0>] blk_end_request+0x10/0x20
  [<ffffffffa000b32b>] scsi_io_completion+0xfb/0x6c0 [scsi_mod]
  [<ffffffffa000107d>] scsi_finish_command+0xbd/0x120 [scsi_mod]
  [<ffffffffa000b12f>] scsi_softirq_done+0x13f/0x160 [scsi_mod]
  [<ffffffff811f9fd0>] blk_done_softirq+0x80/0xa0
  [<ffffffff81044551>] __do_softirq+0xf1/0x250
  [<ffffffff8142ee8c>] call_softirq+0x1c/0x30
  [<ffffffff8100420d>] do_softirq+0x8d/0xc0
  [<ffffffff81044885>] irq_exit+0xd5/0xe0
  [<ffffffff8142f3e3>] do_IRQ+0x63/0xe0
  [<ffffffff814257af>] common_interrupt+0x6f/0x6f
  <EOI>
  [<ffffffffa021737c>] srp_queuecommand+0x8c/0xcb0 [ib_srp]
  [<ffffffffa0002f18>] scsi_dispatch_cmd+0x148/0x310 [scsi_mod]
  [<ffffffffa000a38e>] scsi_request_fn+0x31e/0x520 [scsi_mod]
  [<ffffffff811f1e57>] __blk_run_queue+0x37/0x50
  [<ffffffff811f1f69>] blk_delay_work+0x29/0x40
  [<ffffffff81059003>] process_one_work+0x1c3/0x5c0
  [<ffffffff8105b22e>] worker_thread+0x15e/0x440
  [<ffffffff8106164b>] kthread+0xdb/0xe0
  [<ffffffff8142db9c>] ret_from_fork+0x7c/0xb0

The regression was introduced by the change
c0820cf "dm: introduce per_bio_data", where dm started to replace
bioset during table replacement.
For bio-based dm, it is good because clone bios do not exist during the
table replacement.
For request-based dm, however, (not-yet-mapped) clone bios may stay in
request queue and survive during the table replacement.
So freeing the old bioset could cause the oops in bio_put().

Since the size of front_pad may change only with bio-based dm,
it is not necessary to replace bioset for request-based dm.

Reported-by: Bart Van Assche <[email protected]>
Tested-by: Bart Van Assche <[email protected]>
Signed-off-by: Jun'ichi Nomura <[email protected]>
Acked-by: Mikulas Patocka <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
Cc: <[email protected]>
Signed-off-by: Alasdair G Kergon <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
Tim found:

  WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
  Hardware name: S2600CP
  sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
  smpboot: Booting Node   1, Processors  #1
  Modules linked in:
  Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
  Call Trace:
    set_cpu_sibling_map+0x279/0x449
    start_secondary+0x11d/0x1e5

Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d1955 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")

It turns out movable_map has some problems, and it breaks several things

1. numa_init is called several times, NOT just for srat. so those
	nodes_clear(numa_nodes_parsed)
	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
   can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
   and make fall back path working.

2. simply split acpi_numa_init to early_parse_srat.
   a. that early_parse_srat is NOT called for ia64, so you break ia64.
   b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
	     set_apicid_to_node(i, NUMA_NO_NODE)
     still left in numa_init. So it will just clear result from early_parse_srat.
     it should be moved before that....
   c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
       early before override from INITRD is settled.

3. that patch TITLE is total misleading, there is NO x86 in the title,
   but it changes critical x86 code. It caused x86 guys did not
   pay attention to find the problem early. Those patches really should
   be routed via tip/x86/mm.

4. after that commit, following range can not use movable ram:
  a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
  b. initrd... it will be freed after booting, so it could be on movable...
  c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
	anymore.
  d. init_mem_mapping: can not put page table high anymore.
  e. initmem_init: vmemmap can not be high local node anymore. That is
     not good.

If node is hotplugable, the mem related range like page table and
vmemmap could be on the that node without problem and should be on that
node.

We have workaround patch that could fix some problems, but some can not
be fixed.

So just remove that offending commit and related ones including:

 f7210e6 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
    protect movablecore_map in memblock_overlaps_region().")

 01a178a ("acpi, memory-hotplug: support getting hotplug info from
    SRAT")

 27168d3 ("acpi, memory-hotplug: extend movablemem_map ranges to
    the end of node")

 e8d1955 ("acpi, memory-hotplug: parse SRAT before memblock is
    ready")

 fb06bc8 ("page_alloc: bootmem limit with movablecore_map")

 42f47e2 ("page_alloc: make movablemem_map have higher priority")

 6981ec3 ("page_alloc: introduce zone_movable_limit[] to keep
    movable limit for nodes")

 34b71f1 ("page_alloc: add movable_memmap kernel parameter")

 4d59a75 ("x86: get pg_data_t's memory from other node")

Later we should have patches that will make sure kernel put page table
and vmemmap on local node ram instead of push them down to node0.  Also
need to find way to put other kernel used ram to local node ram.

Reported-by: Tim Gardner <[email protected]>
Reported-by: Don Morris <[email protected]>
Bisected-by: Don Morris <[email protected]>
Tested-by: Don Morris <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Thomas Renninger <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Tang Chen <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
If start_this_handle() failed handle will be initialized
to ERR_PTR() and can not be dereferenced.

paging request at fffffffffffffff6
IP: [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290
PGD 200e067 PUD 200f067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
CPU 0 journal commit I/O error

Pid: 2694, comm: fio Not tainted 3.8.0-rc3+ #79                  /DQ67SW
RIP: 0010:[<ffffffff813c073f>]  [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290
RSP: 0018:ffff880233b8ba58  EFLAGS: 00010292
RAX: 00000000ffffffe2 RBX: ffffffffffffffe2 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82128f48
RBP: ffff880233b8ba98 R08: 0000000000000000 R09: ffff88021440a6e0

Signed-off-by: Dmitry Monakhov <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
Dave Jones <[email protected]> writes:
> Just hit this on Linus' current tree.
>
> [   89.621770] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
> [   89.623111] IP: [<ffffffff810784b0>] commit_creds+0x250/0x2f0
> [   89.624062] PGD 122bfd067 PUD 122bfe067 PMD 0
> [   89.624901] Oops: 0000 [#1] PREEMPT SMP
> [   89.625678] Modules linked in: caif_socket caif netrom bridge hidp 8021q garp stp mrp rose llc2 af_rxrpc phonet af_key binfmt_misc bnep l2tp_ppp can_bcm l2tp_core pppoe pppox can_raw scsi_transport_iscsi ppp_generic slhc nfnetlink can ipt_ULOG ax25 decnet irda nfc rds x25 crc_ccitt appletalk atm ipx p8023 psnap p8022 llc lockd sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables btusb bluetooth snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm vhost_net snd_page_alloc snd_timer tun macvtap usb_debug snd rfkill microcode macvlan edac_core pcspkr serio_raw kvm_amd soundcore kvm r8169 mii
> [   89.637846] CPU 2
> [   89.638175] Pid: 782, comm: trinity-main Not tainted 3.8.0+ #63 Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H
> [   89.639850] RIP: 0010:[<ffffffff810784b0>]  [<ffffffff810784b0>] commit_creds+0x250/0x2f0
> [   89.641161] RSP: 0018:ffff880115657eb8  EFLAGS: 00010207
> [   89.641984] RAX: 00000000000003e8 RBX: ffff88012688b000 RCX: 0000000000000000
> [   89.643069] RDX: 0000000000000000 RSI: ffffffff81c32960 RDI: ffff880105839600
> [   89.644167] RBP: ffff880115657ed8 R08: 0000000000000000 R09: 0000000000000000
> [   89.645254] R10: 0000000000000001 R11: 0000000000000246 R12: ffff880105839600
> [   89.646340] R13: ffff88011beea490 R14: ffff88011beea490 R15: 0000000000000000
> [   89.647431] FS:  00007f3ac063b740(0000) GS:ffff88012b200000(0000) knlGS:0000000000000000
> [   89.648660] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   89.649548] CR2: 00000000000000c8 CR3: 0000000122bfc000 CR4: 00000000000007e0
> [   89.650635] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   89.651723] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [   89.652812] Process trinity-main (pid: 782, threadinfo ffff880115656000, task ffff88011beea490)
> [   89.654128] Stack:
> [   89.654433]  0000000000000000 ffff8801058396a0 ffff880105839600 ffff88011beeaa78
> [   89.655769]  ffff880115657ef8 ffffffff812c7d9b ffffffff82079be0 0000000000000000
> [   89.657073]  ffff880115657f28 ffffffff8106c665 0000000000000002 ffff880115657f58
> [   89.658399] Call Trace:
> [   89.658822]  [<ffffffff812c7d9b>] key_change_session_keyring+0xfb/0x140
> [   89.659845]  [<ffffffff8106c665>] task_work_run+0xa5/0xd0
> [   89.660698]  [<ffffffff81002911>] do_notify_resume+0x71/0xb0
> [   89.661581]  [<ffffffff816c9a4a>] int_signal+0x12/0x17
> [   89.662385] Code: 24 90 00 00 00 48 8b b3 90 00 00 00 49 8b 4c 24 40 48 39 f2 75 08 e9 83 00 00 00 48 89 ca 48 81 fa 60 29 c3 81 0f 84 41 fe ff ff <48> 8b 8a c8 00 00 00 48 39 ce 75 e4 3b 82 d0 00 00 00 0f 84 4b
> [   89.667778] RIP  [<ffffffff810784b0>] commit_creds+0x250/0x2f0
> [   89.668733]  RSP <ffff880115657eb8>
> [   89.669301] CR2: 00000000000000c8
>
> My fastest trinity induced oops yet!
>
>
> Appears to be..
>
>                 if ((set_ns == subset_ns->parent)  &&
>      850:       48 8b 8a c8 00 00 00    mov    0xc8(%rdx),%rcx
>
> from the inlined cred_cap_issubset

By historical accident we have been reading trying to set new->user_ns
from new->user_ns.  Which is totally silly as new->user_ns is NULL (as
is every other field in new except session_keyring at that point).

The intent is clearly to copy all of the fields from old to new so copy
old->user_ns into  into new->user_ns.

Cc: [email protected]
Reported-by: Dave Jones <[email protected]>
Tested-by: Dave Jones <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
While shuting down a HVM guest with pci devices passed through we
get this:

pciback 0000:04:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100002)
------------[ cut here ]------------
WARNING: at drivers/pci/pci.c:1397 pci_disable_device+0x88/0xa0()
Hardware name: MS-7640
Device pciback
disabling already-disabled device
Modules linked in:
Pid: 53, comm: xenwatch Not tainted 3.9.0-rc1-20130304a+ #1
Call Trace:
 [<ffffffff8106994a>] warn_slowpath_common+0x7a/0xc0
 [<ffffffff81069a31>] warn_slowpath_fmt+0x41/0x50
 [<ffffffff813cf288>] pci_disable_device+0x88/0xa0
 [<ffffffff814554a7>] xen_pcibk_reset_device+0x37/0xd0
 [<ffffffff81454b6f>] ? pcistub_put_pci_dev+0x6f/0x120
 [<ffffffff81454b8d>] pcistub_put_pci_dev+0x8d/0x120
 [<ffffffff814582a9>] __xen_pcibk_release_devices+0x59/0xa0

This fixes the bug.

CC: [email protected]
Reported-and-Tested-by: Sander Eikelenboom <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
Dave reported following crash :

general protection fault: 0000 [#1] SMP
CPU 2
Pid: 25407, comm: qemu-kvm Not tainted 3.7.9-205.fc18.x86_64 #1 Hewlett-Packard HP Z400 Workstation/0B4Ch
RIP: 0010:[<ffffffffa0399bd5>]  [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
RSP: 0018:ffff880276913d78  EFLAGS: 00010206
RAX: 50626b6b7876376c RBX: ffff88026e530d68 RCX: ffff88028d158e00
RDX: ffff88026d0d5470 RSI: 0000000000000011 RDI: 0000000000000002
RBP: ffff880276913d88 R08: 0000000000000000 R09: ffff880295002900
R10: 0000000000000000 R11: 0000000000000003 R12: ffffffff81ca3b40
R13: ffffffff8151a8e0 R14: ffff880270875000 R15: 0000000000000002
FS:  00007ff3bce38a00(0000) GS:ffff88029fc40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fd1430bd000 CR3: 000000027042b000 CR4: 00000000000027e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-kvm (pid: 25407, threadinfo ffff880276912000, task ffff88028c369720)
Stack:
 ffff880156f59100 ffff880156f59100 ffff880276913d98 ffffffff815534f7
 ffff880276913db8 ffffffff8151a74b ffff880270875000 ffff880156f59100
 ffff880276913dd8 ffffffff8151a5a6 ffff880276913dd8 ffff88026d0d5470
Call Trace:
 [<ffffffff815534f7>] nf_conntrack_destroy+0x17/0x20
 [<ffffffff8151a74b>] skb_release_head_state+0x7b/0x100
 [<ffffffff8151a5a6>] __kfree_skb+0x16/0xa0
 [<ffffffff8151a666>] kfree_skb+0x36/0xa0
 [<ffffffff8151a8e0>] skb_queue_purge+0x20/0x40
 [<ffffffffa02205f7>] __tun_detach+0x117/0x140 [tun]
 [<ffffffffa022184c>] tun_chr_close+0x3c/0xd0 [tun]
 [<ffffffff8119669c>] __fput+0xec/0x240
 [<ffffffff811967fe>] ____fput+0xe/0x10
 [<ffffffff8107eb27>] task_work_run+0xa7/0xe0
 [<ffffffff810149e1>] do_notify_resume+0x71/0xb0
 [<ffffffff81640152>] int_signal+0x12/0x17
Code: 00 00 04 48 89 e5 41 54 53 48 89 fb 4c 8b a7 e8 00 00 00 0f 85 de 00 00 00 0f b6 73 3e 0f b7 7b 2a e8 10 40 00 00 48 85 c0 74 0e <48> 8b 40 28 48 85 c0 74 05 48 89 df ff d0 48 c7 c7 08 6a 3a a0
RIP  [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
 RSP <ffff880276913d78>

This is because tun_net_xmit() needs to call nf_reset()
before queuing skb into receive_queue

Reported-by: Dave Jones <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
The following script will produce a kernel oops:

    sudo ip netns add v
    sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo
    sudo ip netns exec v ip link set lo up
    sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo
    sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo
    sudo ip netns exec v ip link set vxlan0 up
    sudo ip netns del v

where inspect by gdb:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 107]
    0xffffffffa0289e33 in ?? ()
    (gdb) bt
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    #1  vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087
    #2  0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299
    #3  0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335
    #4  0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851
    #5  0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752
    #6  0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170
    #7  0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302
    #8  0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157
    #9  0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276
    #10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168
    #11 <signal handler called>
    #12 0x0000000000000000 in ?? ()
    #13 0x0000000000000000 in ?? ()
    (gdb) fr 0
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    533		struct sock *sk = vn->sock->sk;
    (gdb) l
    528	static int vxlan_leave_group(struct net_device *dev)
    529	{
    530		struct vxlan_dev *vxlan = netdev_priv(dev);
    531		struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
    532		int err = 0;
    533		struct sock *sk = vn->sock->sk;
    534		struct ip_mreqn mreq = {
    535			.imr_multiaddr.s_addr	= vxlan->gaddr,
    536			.imr_ifindex		= vxlan->link,
    537		};
    (gdb) p vn->sock
    $4 = (struct socket *) 0x0

The kernel calls `vxlan_exit_net` when deleting the netns before shutting down
vxlan interfaces. Later the removal of all vxlan interfaces, where `vn->sock`
is already gone causes the oops. so we should manually shutdown all interfaces
before deleting `vn->sock` as the patch does.

Signed-off-by: Zang MingJie <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
ca22e56 (driver-core: implement 'sysdev' functionality for regular
devices and buses) has introduced bus_register macro with a static
key to distinguish different subsys mutex classes.

This however doesn't work for different subsys which use a common
registering function. One example is subsys_system_register (and
mce_device and cpu_device).

In the end this leads to the following lockdep splat:
[  207.271924] ======================================================
[  207.271932] [ INFO: possible circular locking dependency detected ]
[  207.271942] 3.9.0-rc1-0.7-default+ #34 Not tainted
[  207.271948] -------------------------------------------------------
[  207.271957] bash/10493 is trying to acquire lock:
[  207.271963]  (subsys mutex){+.+.+.}, at: [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0
[  207.271987]
[  207.271987] but task is already holding lock:
[  207.271995]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81046ccf>] cpu_hotplug_begin+0x2f/0x60
[  207.272012]
[  207.272012] which lock already depends on the new lock.
[  207.272012]
[  207.272023]
[  207.272023] the existing dependency chain (in reverse order) is:
[  207.272033]
[  207.272033] -> #4 (cpu_hotplug.lock){+.+.+.}:
[  207.272044]        [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[  207.272056]        [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360
[  207.272069]        [<ffffffff81046ba9>] get_online_cpus+0x29/0x40
[  207.272082]        [<ffffffff81185210>] drain_all_stock+0x30/0x150
[  207.272094]        [<ffffffff811853da>] mem_cgroup_reclaim+0xaa/0xe0
[  207.272104]        [<ffffffff8118775e>] __mem_cgroup_try_charge+0x51e/0xcf0
[  207.272114]        [<ffffffff81188486>] mem_cgroup_charge_common+0x36/0x60
[  207.272125]        [<ffffffff811884da>] mem_cgroup_newpage_charge+0x2a/0x30
[  207.272135]        [<ffffffff81150531>] do_wp_page+0x231/0x830
[  207.272147]        [<ffffffff8115151e>] handle_pte_fault+0x19e/0x8d0
[  207.272157]        [<ffffffff81151da8>] handle_mm_fault+0x158/0x1e0
[  207.272166]        [<ffffffff814b6153>] do_page_fault+0x2a3/0x4e0
[  207.272178]        [<ffffffff814b2578>] page_fault+0x28/0x30
[  207.272189]
[  207.272189] -> #3 (&mm->mmap_sem){++++++}:
[  207.272199]        [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[  207.272208]        [<ffffffff8114c5ad>] might_fault+0x6d/0x90
[  207.272218]        [<ffffffff811a11e3>] filldir64+0xb3/0x120
[  207.272229]        [<ffffffffa013fc19>] call_filldir+0x89/0x130 [ext3]
[  207.272248]        [<ffffffffa0140377>] ext3_readdir+0x6b7/0x7e0 [ext3]
[  207.272263]        [<ffffffff811a1519>] vfs_readdir+0xa9/0xc0
[  207.272273]        [<ffffffff811a15cb>] sys_getdents64+0x9b/0x110
[  207.272284]        [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
[  207.272296]
[  207.272296] -> #2 (&type->i_mutex_dir_key#3){+.+.+.}:
[  207.272309]        [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[  207.272319]        [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360
[  207.272329]        [<ffffffff8119c254>] link_path_walk+0x6f4/0x9a0
[  207.272339]        [<ffffffff8119e7fa>] path_openat+0xba/0x470
[  207.272349]        [<ffffffff8119ecf8>] do_filp_open+0x48/0xa0
[  207.272358]        [<ffffffff8118d81c>] file_open_name+0xdc/0x110
[  207.272369]        [<ffffffff8118d885>] filp_open+0x35/0x40
[  207.272378]        [<ffffffff8135c76e>] _request_firmware+0x52e/0xb20
[  207.272389]        [<ffffffff8135cdd6>] request_firmware+0x16/0x20
[  207.272399]        [<ffffffffa03bdb91>] request_microcode_fw+0x61/0xd0 [microcode]
[  207.272416]        [<ffffffffa03bd554>] microcode_init_cpu+0x104/0x150 [microcode]
[  207.272431]        [<ffffffffa03bd61c>] mc_device_add+0x7c/0xb0 [microcode]
[  207.272444]        [<ffffffff8134a419>] subsys_interface_register+0xc9/0x100
[  207.272457]        [<ffffffffa04fc0f4>] 0xffffffffa04fc0f4
[  207.272472]        [<ffffffff81000202>] do_one_initcall+0x42/0x180
[  207.272485]        [<ffffffff810bbeff>] load_module+0x19df/0x1b70
[  207.272499]        [<ffffffff810bc376>] sys_init_module+0xe6/0x130
[  207.272511]        [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
[  207.272523]
[  207.272523] -> #1 (umhelper_sem){++++.+}:
[  207.272537]        [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[  207.272548]        [<ffffffff814ae9c4>] down_read+0x34/0x50
[  207.272559]        [<ffffffff81062bff>] usermodehelper_read_trylock+0x4f/0x100
[  207.272575]        [<ffffffff8135c7dd>] _request_firmware+0x59d/0xb20
[  207.272587]        [<ffffffff8135cdd6>] request_firmware+0x16/0x20
[  207.272599]        [<ffffffffa03bdb91>] request_microcode_fw+0x61/0xd0 [microcode]
[  207.272613]        [<ffffffffa03bd554>] microcode_init_cpu+0x104/0x150 [microcode]
[  207.272627]        [<ffffffffa03bd61c>] mc_device_add+0x7c/0xb0 [microcode]
[  207.272641]        [<ffffffff8134a419>] subsys_interface_register+0xc9/0x100
[  207.272654]        [<ffffffffa04fc0f4>] 0xffffffffa04fc0f4
[  207.272666]        [<ffffffff81000202>] do_one_initcall+0x42/0x180
[  207.272678]        [<ffffffff810bbeff>] load_module+0x19df/0x1b70
[  207.272690]        [<ffffffff810bc376>] sys_init_module+0xe6/0x130
[  207.272702]        [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
[  207.272715]
[  207.272715] -> #0 (subsys mutex){+.+.+.}:
[  207.272729]        [<ffffffff810ae002>] __lock_acquire+0x13b2/0x15f0
[  207.272740]        [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[  207.272751]        [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360
[  207.272763]        [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0
[  207.272775]        [<ffffffff81349114>] device_del+0x134/0x1f0
[  207.272786]        [<ffffffff813491f2>] device_unregister+0x22/0x60
[  207.272798]        [<ffffffff814a24ea>] mce_cpu_callback+0x15e/0x1ad
[  207.272812]        [<ffffffff814b6402>] notifier_call_chain+0x72/0x130
[  207.272824]        [<ffffffff81073d6e>] __raw_notifier_call_chain+0xe/0x10
[  207.272839]        [<ffffffff81498f76>] _cpu_down+0x1d6/0x350
[  207.272851]        [<ffffffff81499130>] cpu_down+0x40/0x60
[  207.272862]        [<ffffffff8149cc55>] store_online+0x75/0xe0
[  207.272874]        [<ffffffff813474a0>] dev_attr_store+0x20/0x30
[  207.272886]        [<ffffffff812090d9>] sysfs_write_file+0xd9/0x150
[  207.272900]        [<ffffffff8118e10b>] vfs_write+0xcb/0x130
[  207.272911]        [<ffffffff8118e924>] sys_write+0x64/0xa0
[  207.272923]        [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
[  207.272936]
[  207.272936] other info that might help us debug this:
[  207.272936]
[  207.272952] Chain exists of:
[  207.272952]   subsys mutex --> &mm->mmap_sem --> cpu_hotplug.lock
[  207.272952]
[  207.272973]  Possible unsafe locking scenario:
[  207.272973]
[  207.272984]        CPU0                    CPU1
[  207.272992]        ----                    ----
[  207.273000]   lock(cpu_hotplug.lock);
[  207.273009]                                lock(&mm->mmap_sem);
[  207.273020]                                lock(cpu_hotplug.lock);
[  207.273031]   lock(subsys mutex);
[  207.273040]
[  207.273040]  *** DEADLOCK ***
[  207.273040]
[  207.273055] 5 locks held by bash/10493:
[  207.273062]  #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff81209049>] sysfs_write_file+0x49/0x150
[  207.273080]  #1:  (s_active#150){.+.+.+}, at: [<ffffffff812090c2>] sysfs_write_file+0xc2/0x150
[  207.273099]  #2:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81027557>] cpu_hotplug_driver_lock+0x17/0x20
[  207.273121]  #3:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8149911c>] cpu_down+0x2c/0x60
[  207.273140]  #4:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81046ccf>] cpu_hotplug_begin+0x2f/0x60
[  207.273158]
[  207.273158] stack backtrace:
[  207.273170] Pid: 10493, comm: bash Not tainted 3.9.0-rc1-0.7-default+ #34
[  207.273180] Call Trace:
[  207.273192]  [<ffffffff810ab373>] print_circular_bug+0x223/0x310
[  207.273204]  [<ffffffff810ae002>] __lock_acquire+0x13b2/0x15f0
[  207.273216]  [<ffffffff812086b0>] ? sysfs_hash_and_remove+0x60/0xc0
[  207.273227]  [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[  207.273239]  [<ffffffff8134af27>] ? bus_remove_device+0x37/0x1c0
[  207.273251]  [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360
[  207.273263]  [<ffffffff8134af27>] ? bus_remove_device+0x37/0x1c0
[  207.273274]  [<ffffffff812086b0>] ? sysfs_hash_and_remove+0x60/0xc0
[  207.273286]  [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0
[  207.273298]  [<ffffffff81349114>] device_del+0x134/0x1f0
[  207.273309]  [<ffffffff813491f2>] device_unregister+0x22/0x60
[  207.273321]  [<ffffffff814a24ea>] mce_cpu_callback+0x15e/0x1ad
[  207.273332]  [<ffffffff814b6402>] notifier_call_chain+0x72/0x130
[  207.273344]  [<ffffffff81073d6e>] __raw_notifier_call_chain+0xe/0x10
[  207.273356]  [<ffffffff81498f76>] _cpu_down+0x1d6/0x350
[  207.273368]  [<ffffffff81027557>] ? cpu_hotplug_driver_lock+0x17/0x20
[  207.273380]  [<ffffffff81499130>] cpu_down+0x40/0x60
[  207.273391]  [<ffffffff8149cc55>] store_online+0x75/0xe0
[  207.273402]  [<ffffffff813474a0>] dev_attr_store+0x20/0x30
[  207.273413]  [<ffffffff812090d9>] sysfs_write_file+0xd9/0x150
[  207.273425]  [<ffffffff8118e10b>] vfs_write+0xcb/0x130
[  207.273436]  [<ffffffff8118e924>] sys_write+0x64/0xa0
[  207.273447]  [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b

Which reports a false possitive deadlock because it sees:
1) load_module -> subsys_interface_register -> mc_deveice_add (*) -> subsys->p->mutex -> link_path_walk -> lookup_slow -> i_mutex
2) sys_write -> _cpu_down -> cpu_hotplug_begin -> cpu_hotplug.lock -> mce_cpu_callback -> mce_device_remove(**) -> device_unregister -> bus_remove_device -> subsys mutex
3) vfs_readdir -> i_mutex -> filldir64 -> might_fault -> might_lock_read(mmap_sem) -> page_fault -> mmap_sem -> drain_all_stock -> cpu_hotplug.lock

but
1) takes cpu_subsys subsys (*) but 2) takes mce_device subsys (**) so
the deadlock is not possible AFAICS.

The fix is quite simple. We can pull the key inside bus_type structure
because they are defined per device so the pointer will be unique as
well. bus_register doesn't need to be a macro anymore so change it
to the inline. We could get rid of __bus_register as there is no other
caller but maybe somebody will want to use a different key so keep it
around for now.

Reported-by: Li Zefan <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
With deferred setup for SCO, it is possible that userspace closes the
socket when it is in the BT_CONNECT2 state, after the Connect Request is
received but before the Accept Synchonous Connection is sent.

If this happens the following crash was observed, when the connection is
terminated:

[  +0.000003] hci_sync_conn_complete_evt: hci0 status 0x10
[  +0.000005] sco_connect_cfm: hcon ffff88003d1bd800 bdaddr 40:98:4e:32:d7:39 status 16
[  +0.000003] sco_conn_del: hcon ffff88003d1bd800 conn ffff88003cc8e300, err 110
[  +0.000015] BUG: unable to handle kernel NULL pointer dereference at 0000000000000199
[  +0.000906] IP: [<ffffffff810620dd>] __lock_acquire+0xed/0xe82
[  +0.000000] PGD 3d21f067 PUD 3d291067 PMD 0
[  +0.000000] Oops: 0002 [#1] SMP
[  +0.000000] Modules linked in: rfcomm bnep btusb bluetooth
[  +0.000000] CPU 0
[  +0.000000] Pid: 1481, comm: kworker/u:2H Not tainted 3.9.0-rc1-25019-gad82cdd #1 Bochs Bochs
[  +0.000000] RIP: 0010:[<ffffffff810620dd>]  [<ffffffff810620dd>] __lock_acquire+0xed/0xe82
[  +0.000000] RSP: 0018:ffff88003c3c19d8  EFLAGS: 00010002
[  +0.000000] RAX: 0000000000000001 RBX: 0000000000000246 RCX: 0000000000000000
[  +0.000000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003d1be868
[  +0.000000] RBP: ffff88003c3c1a98 R08: 0000000000000002 R09: 0000000000000000
[  +0.000000] R10: ffff88003d1be868 R11: ffff88003e20b000 R12: 0000000000000002
[  +0.000000] R13: ffff88003aaa8000 R14: 000000000000006e R15: ffff88003d1be850
[  +0.000000] FS:  0000000000000000(0000) GS:ffff88003e200000(0000) knlGS:0000000000000000
[  +0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  +0.000000] CR2: 0000000000000199 CR3: 000000003c1cb000 CR4: 00000000000006b0
[  +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  +0.000000] Process kworker/u:2H (pid: 1481, threadinfo ffff88003c3c0000, task ffff88003aaa8000)
[  +0.000000] Stack:
[  +0.000000]  ffffffff81b16342 0000000000000000 0000000000000000 ffff88003d1be868
[  +0.000000]  ffffffff00000000 00018c0c7863e367 000000003c3c1a28 ffffffff8101efbd
[  +0.000000]  0000000000000000 ffff88003e3d2400 ffff88003c3c1a38 ffffffff81007c7a
[  +0.000000] Call Trace:
[  +0.000000]  [<ffffffff8101efbd>] ? kvm_clock_read+0x34/0x3b
[  +0.000000]  [<ffffffff81007c7a>] ? paravirt_sched_clock+0x9/0xd
[  +0.000000]  [<ffffffff81007fd4>] ? sched_clock+0x9/0xb
[  +0.000000]  [<ffffffff8104fd7a>] ? sched_clock_local+0x12/0x75
[  +0.000000]  [<ffffffff810632d1>] lock_acquire+0x93/0xb1
[  +0.000000]  [<ffffffffa0022339>] ? spin_lock+0x9/0xb [bluetooth]
[  +0.000000]  [<ffffffff8105f3d8>] ? lock_release_holdtime.part.22+0x4e/0x55
[  +0.000000]  [<ffffffff814f6038>] _raw_spin_lock+0x40/0x74
[  +0.000000]  [<ffffffffa0022339>] ? spin_lock+0x9/0xb [bluetooth]
[  +0.000000]  [<ffffffff814f6936>] ? _raw_spin_unlock+0x23/0x36
[  +0.000000]  [<ffffffffa0022339>] spin_lock+0x9/0xb [bluetooth]
[  +0.000000]  [<ffffffffa00230cc>] sco_conn_del+0x76/0xbb [bluetooth]
[  +0.000000]  [<ffffffffa002391d>] sco_connect_cfm+0x2da/0x2e9 [bluetooth]
[  +0.000000]  [<ffffffffa000862a>] hci_proto_connect_cfm+0x38/0x65 [bluetooth]
[  +0.000000]  [<ffffffffa0008d30>] hci_sync_conn_complete_evt.isra.79+0x11a/0x13e [bluetooth]
[  +0.000000]  [<ffffffffa000cd96>] hci_event_packet+0x153b/0x239d [bluetooth]
[  +0.000000]  [<ffffffff814f68ff>] ? _raw_spin_unlock_irqrestore+0x48/0x5c
[  +0.000000]  [<ffffffffa00025f6>] hci_rx_work+0xf3/0x2e3 [bluetooth]
[  +0.000000]  [<ffffffff8103efed>] process_one_work+0x1dc/0x30b
[  +0.000000]  [<ffffffff8103ef83>] ? process_one_work+0x172/0x30b
[  +0.000000]  [<ffffffff8103e07f>] ? spin_lock_irq+0x9/0xb
[  +0.000000]  [<ffffffff8103fc8d>] worker_thread+0x123/0x1d2
[  +0.000000]  [<ffffffff8103fb6a>] ? manage_workers+0x240/0x240
[  +0.000000]  [<ffffffff81044211>] kthread+0x9d/0xa5
[  +0.000000]  [<ffffffff81044174>] ? __kthread_parkme+0x60/0x60
[  +0.000000]  [<ffffffff814f75bc>] ret_from_fork+0x7c/0xb0
[  +0.000000]  [<ffffffff81044174>] ? __kthread_parkme+0x60/0x60
[  +0.000000] Code: d7 44 89 8d 50 ff ff ff 4c 89 95 58 ff ff ff e8 44 fc ff ff 44 8b 8d 50 ff ff ff 48 85 c0 4c 8b 95 58 ff ff ff 0f 84 7a 04 00 00 <f0> ff 80 98 01 00 00 83 3d 25 41 a7 00 00 45 8b b5 e8 05 00 00
[  +0.000000] RIP  [<ffffffff810620dd>] __lock_acquire+0xed/0xe82
[  +0.000000]  RSP <ffff88003c3c19d8>
[  +0.000000] CR2: 0000000000000199
[  +0.000000] ---[ end trace e73cd3b52352dd34 ]---

Cc: [email protected] [3.8]
Signed-off-by: Vinicius Costa Gomes <[email protected]>
Tested-by: Frederic Dalleau <[email protected]>
Signed-off-by: Gustavo Padovan <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
We need to hand down parallel build options like the internal make
--jobserver-fds one so that parallel builds can also happen when
building perf from the toplevel directory.

Make it so #1!

Signed-off-by: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
The get_node_page_ra tries to:
1. grab or read a target node page for the given nid,
2. then, call ra_node_page to read other adjacent node pages in advance.

So, when we try to read a target node page by #1, we should submit bio with
READ_SYNC instead of READA.
And, in #2, READA should be used.

Signed-off-by: Jaegeuk Kim <[email protected]>
Reviewed-by: Namjae Jeon <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
Credit distribution stats is currently implemented
only for SDIO. This fixes a crash in debugfs for
USB interface.

BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<f91c2048>] read_file_credit_dist_stats+0x38/0x330 [ath6kl_core]
*pde = b62bd067
Oops: 0000 [#1] SMP

EIP: 0060:[<f91c2048>] EFLAGS: 00210246 CPU: 0
EIP is at read_file_credit_dist_stats+0x38/0x330 [ath6kl_core]
EAX: 00000000 EBX: e6f7a9c0 ECX: e7b148b8 EDX: 00000000
ESI: 000000c8 EDI: e7b14000 EBP: e6e09f64 ESP: e6e09f30
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process cat (pid: 4058, ti=e6e08000 task=e50cf230 task.ti=e6e08000)
Stack:
00008000 00000000 e6e09f64 c1132d3c 00004e71 e50cf230 00008000 089e4000
e7b148b8 00000000 e6f7a9c0 00008000 089e4000 e6e09f8c c11331fc e6e09f98
00000001 e6e09f7c f91c2010 e6e09fac e6f7a9c0 089e4877 089e4000 e6e09fac

	Call Trace:
	[<c1132d3c>] ? rw_verify_area+0x6c/0x120
	[<c11331fc>] vfs_read+0x8c/0x160
	[<f91c2010>] ? read_file_war_stats+0x130/0x130 [ath6kl_core]
	[<c113330d>] sys_read+0x3d/0x70
	[<c15755b4>] syscall_call+0x7/0xb
	[<c1570000>] ? fill_powernow_table_pstate+0x127/0x127

Cc: Ryan Hsu <[email protected]>
Signed-off-by: Mohammed Shafi Shajakhan <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
M1cha pushed a commit that referenced this pull request Feb 19, 2014
Commit 84c1754 (ext4: move work from io_end to inode) triggered a
regression when running xfstest #270 when the file system is mounted
with dioread_nolock.

The problem is that after ext4_evict_inode() calls ext4_ioend_wait(),
this guarantees that last io_end structure has been freed, but it does
not guarantee that the workqueue structure, which was moved into the
inode by commit 84c1754, is actually finished.  Once
ext4_flush_completed_IO() calls ext4_free_io_end() on CPU #1, this
will allow ext4_ioend_wait() to return on CPU #2, at which point the
evict_inode() codepath can race against the workqueue code on CPU #1
accessing EXT4_I(inode)->i_unwritten_work to find the next item of
work to do.

Fix this by calling cancel_work_sync() in ext4_ioend_wait(), which
will be renamed ext4_ioend_shutdown(), since it is only used by
ext4_evict_inode().  Also, move the call to ext4_ioend_shutdown()
until after truncate_inode_pages() and filemap_write_and_wait() are
called, to make sure all dirty pages have been written back and
flushed from the page cache first.

BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e
*pdpt = 0000000030bc3001 *pde = 0000000000000000 
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in:
Pid: 6, comm: kworker/u:0 Not tainted 3.8.0-rc3-00013-g84c1754-dirty #91 Bochs Bochs
EIP: 0060:[<c01dda6a>] EFLAGS: 00010046 CPU: 0
EIP is at cwq_activate_delayed_work+0x3b/0x7e
EAX: 00000000 EBX: 00000000 ECX: f505fe54 EDX: 00000000
ESI: ed5b697c EDI: 00000006 EBP: f64b7e8c ESP: f64b7e84
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: 00000000 CR3: 30bc2000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process kworker/u:0 (pid: 6, ti=f64b6000 task=f64b4160 task.ti=f64b6000)
Stack:
 f505fe00 00000006 f64b7e9c c01de3d7 f6435540 00000003 f64b7efc c01def1d
 f6435540 00000002 00000000 0000008a c16d0808 c040a10b c16d07d8 c16d08b0
 f505fe00 c16d0780 00000000 00000000 ee153df4 c1ce4a30 c17d0e30 00000000
Call Trace:
 [<c01de3d7>] cwq_dec_nr_in_flight+0x71/0xfb
 [<c01def1d>] process_one_work+0x5d8/0x637
 [<c040a10b>] ? ext4_end_bio+0x300/0x300
 [<c01e3105>] worker_thread+0x249/0x3ef
 [<c01ea317>] kthread+0xd8/0xeb
 [<c01e2ebc>] ? manage_workers+0x4bb/0x4bb
 [<c023a370>] ? trace_hardirqs_on+0x27/0x37
 [<c0f1b4b7>] ret_from_kernel_thread+0x1b/0x28
 [<c01ea23f>] ? __init_kthread_worker+0x71/0x71
Code: 01 83 15 ac ff 6c c1 00 31 db 89 c6 8b 00 a8 04 74 12 89 c3 30 db 83 05 b0 ff 6c c1 01 83 15 b4 ff 6c c1 00 89 f0 e8 42 ff ff ff <8b> 13 89 f0 83 05 b8 ff 6c c1
 6c c1 00 31 c9 83
EIP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e SS:ESP 0068:f64b7e84
CR2: 0000000000000000
---[ end trace a1923229da53d8a4 ]---

Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
M1cha pushed a commit that referenced this pull request Sep 27, 2015
When midi function is created, 'id' attribute is initialized with
SNDRV_DEFAULT_STR1, which is NULL pointer. Trying to read this attribute
before filling it ends up with segmentation fault.

This commit fix this issue by preventing null pointer dereference. Now
f_midi_opts_id_show() returns empty string when id is a null pointer.

Reproduction path:

$ mkdir functions/midi.0
$ cat functions/midi.0/id

[   53.130132] Unable to handle kernel NULL pointer dereference at
virtual address 00000000
[   53.132630] pgd = ec6cc000
[   53.135308] [00000000] *pgd=6b759831, *pte=00000000, *ppte=00000000
[   53.141530] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
[   53.146904] Modules linked in: usb_f_midi snd_rawmidi libcomposite
[   53.153071] CPU: 1 PID: 2936 Comm: cat Not tainted
3.19.0-00041-gcf4b216 #7
[   53.160010] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   53.166088] task: ee234c80 ti: ec764000 task.ti: ec764000
[   53.171482] PC is at strlcpy+0x8/0x60
[   53.175128] LR is at f_midi_opts_id_show+0x28/0x3c [usb_f_midi]
[   53.181019] pc : [<c0222a9c>]    lr : [<bf01bed0>]    psr: 60000053
[   53.181019] sp : ec765ef8  ip : 00000141  fp : 00000000
[   53.192474] r10: 00019000  r9 : ed7546c0  r8 : 00010000
[   53.197682] r7 : ec765f80  r6 : eb46a000  r5 : eb46a000  r4 :
ed754734
[   53.204192] r3 : ee234c80  r2 : 00001000  r1 : 00000000  r0 :
eb46a000
[   53.210704] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM
Segment user
[   53.217907] Control: 10c5387d  Table: 6c6cc04a  DAC: 00000015
[   53.223636] Process cat (pid: 2936, stack limit = 0xec764238)
[   53.229364] Stack: (0xec765ef8 to 0xec766000)
[   53.233706] 5ee0:
ed754734 ed7546c0
[   53.241866] 5f00: eb46a000 bf01bed0 eb753b80 bf01cc44 eb753b98
bf01b0a4 bf01b08c c0125dd0
[   53.250025] 5f20: 00002f19 00000000 ec432e00 bf01cce8 c0530c00
00019000 00010000 ec765f80
[   53.258184] 5f40: 00010000 ec764000 00019000 c00cc4ac ec432e00
c00cc55c 00000017 000081a4
[   53.266343] 5f60: 00000001 00000000 00000000 ec432e00 ec432e00
00010000 00019000 c00cc620
[   53.274502] 5f80: 00000000 00000000 00000000 00010000 ffff1000
00019000 00000003 c000e9a8
[   53.282662] 5fa0: 00000000 c000e7e0 00010000 ffff1000 00000003
00019000 00010000 00019000
[   53.290821] 5fc0: 00010000 ffff1000 00019000 00000003 7fffe000
00000001 00000000 00000000
[   53.298980] 5fe0: 00000000 be8c68d4 0000b995 b6f0e3e6 40000070
00000003 00000000 00000000
[   53.307157] [<c0222a9c>] (strlcpy) from [<bf01bed0>]
(f_midi_opts_id_show+0x28/0x3c [usb_f_midi])
[   53.316006] [<bf01bed0>] (f_midi_opts_id_show [usb_f_midi]) from
[<bf01b0a4>] (f_midi_opts_attr_show+0x18/0x24 )
[   53.327209] [<bf01b0a4>] (f_midi_opts_attr_show [usb_f_midi]) from
[<c0125dd0>] (configfs_read_file+0x9c/0xec)
[   53.337180] [<c0125dd0>] (configfs_read_file) from [<c00cc4ac>]
(__vfs_read+0x18/0x4c)
[   53.345073] [<c00cc4ac>] (__vfs_read) from [<c00cc55c>]
(vfs_read+0x7c/0x100)
[   53.352190] [<c00cc55c>] (vfs_read) from [<c00cc620>]
(SyS_read+0x40/0x8c)
[   53.359056] [<c00cc620>] (SyS_read) from [<c000e7e0>]
(ret_fast_syscall+0x0/0x34)
[   53.366513] Code: ebffe3d3 e8bd8008 e92d4070 e1a05000 (e5d14000)
[   53.372641] ---[ end trace e4f53a4e233d98d0 ]---

BACKPORT FROM MAINLINE KERNEL

Signed-off-by: Pawel Szewczyk <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
Change-Id: I7cb7fef0d8d8336d7801a99941824be2bf04f256
M1cha pushed a commit that referenced this pull request Sep 27, 2015
xt_socket_get[4|6]_sk() do not always increment sock refcount, which
causes confusion in xt_qtaguid module which is not aware of this fact
and drops the reference whether it should have or not. Fix it by
changing xt_socket_get[4|6]_sk() to always increment recount of returned
sock.

This should fix the following crash:

[  111.319523] BUG: failure at
/mnt/host/source/src/third_party/kernel/v3.18/net/ipv4/inet_timewait_sock.c:90/__inet_twsk_kill()!
[  111.331192] Kernel panic - not syncing: BUG!
[  111.335468] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U  W
3.18.0-06867-g268df91 #1
[  111.343810] Hardware name: Google Tegra210 Smaug Rev 1+ (DT)
[  111.349463] Call trace:
[  111.351917] [<ffffffc000207288>] dump_backtrace+0x0/0x10c
[  111.357314] [<ffffffc0002073a4>] show_stack+0x10/0x1c
[  111.362367] [<ffffffc000a82d1c>] dump_stack+0x74/0x94
[  111.367414] [<ffffffc000a81824>] panic+0xec/0x238
[  111.372116] [<ffffffc000981648>] __inet_twsk_kill+0xd0/0xf8
[  111.377684] [<ffffffc0009817b0>] inet_twdr_do_twkill_work+0x64/0xd0
[  111.383946] [<ffffffc000981a5c>] inet_twdr_hangman+0x2c/0xa4
[  111.389602] [<ffffffc000271cf0>] call_timer_fn+0xac/0x160
[  111.394995] [<ffffffc00027250c>] run_timer_softirq+0x23c/0x274
[  111.400824] [<ffffffc000220a68>] __do_softirq+0x1a4/0x330
[  111.406218] [<ffffffc000220e94>] irq_exit+0x70/0xd0
[  111.411093] [<ffffffc000264e00>] __handle_domain_irq+0x84/0xa8
[  111.416922] [<ffffffc0002003ec>] gic_handle_irq+0x4c/0x80

b/22476945

Originally reviewed at:
https://chromium-review.googlesource.com/#/c/297414/

Signed-off-by: Dmitry Torokhov <[email protected]>
M1cha pushed a commit that referenced this pull request Sep 30, 2015
…ssion()

commit 3dc91d4 upstream.

While running stress tests on adding and deleting ftrace instances I hit
this bug:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: selinux_inode_permission+0x85/0x160
  PGD 63681067 PUD 7ddbe067 PMD 0
  Oops: 0000 [#1] PREEMPT
  CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20
  Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
  task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000
  RIP: 0010:[<ffffffff812d8bc5>]  [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160
  RSP: 0018:ffff88007ddb1c48  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840
  RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000
  RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54
  R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000
  R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000
  FS:  00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
  CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0
  Call Trace:
    security_inode_permission+0x1c/0x30
    __inode_permission+0x41/0xa0
    inode_permission+0x18/0x50
    link_path_walk+0x66/0x920
    path_openat+0xa6/0x6c0
    do_filp_open+0x43/0xa0
    do_sys_open+0x146/0x240
    SyS_open+0x1e/0x20
    system_call_fastpath+0x16/0x1b
  Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff
  RIP  selinux_inode_permission+0x85/0x160
  CR2: 0000000000000020

Investigating, I found that the inode->i_security was NULL, and the
dereference of it caused the oops.

in selinux_inode_permission():

	isec = inode->i_security;

	rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);

Note, the crash came from stressing the deletion and reading of debugfs
files.  I was not able to recreate this via normal files.  But I'm not
sure they are safe.  It may just be that the race window is much harder
to hit.

What seems to have happened (and what I have traced), is the file is
being opened at the same time the file or directory is being deleted.
As the dentry and inode locks are not held during the path walk, nor is
the inodes ref counts being incremented, there is nothing saving these
structures from being discarded except for an rcu_read_lock().

The rcu_read_lock() protects against freeing of the inode, but it does
not protect freeing of the inode_security_struct.  Now if the freeing of
the i_security happens with a call_rcu(), and the i_security field of
the inode is not changed (it gets freed as the inode gets freed) then
there will be no issue here.  (Linus Torvalds suggested not setting the
field to NULL such that we do not need to check if it is NULL in the
permission check).

Note, this is a hack, but it fixes the problem at hand.  A real fix is
to restructure the destroy_inode() to call all the destructor handlers
from the RCU callback.  But that is a major job to do, and requires a
lot of work.  For now, we just band-aid this bug with this fix (it
works), and work on a more maintainable solution in the future.

Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]

Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
M1cha pushed a commit that referenced this pull request Sep 30, 2015
Setting an empty security context (length=0) on a file will
lead to incorrectly dereferencing the type and other fields
of the security context structure, yielding a kernel BUG.
As a zero-length security context is never valid, just reject
all such security contexts whether coming from userspace
via setxattr or coming from the filesystem upon a getxattr
request by SELinux.

Setting a security context value (empty or otherwise) unknown to
SELinux in the first place is only possible for a root process
(CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
if the corresponding SELinux mac_admin permission is also granted
to the domain by policy.  In Fedora policies, this is only allowed for
specific domains such as livecd for setting down security contexts
that are not defined in the build host policy.

[On Android, this can only be set by root/CAP_MAC_ADMIN processes,
and if running SELinux in enforcing mode, only if mac_admin permission
is granted in policy.  In Android 4.4, this would only be allowed for
root/CAP_MAC_ADMIN processes that are also in unconfined domains. In current
AOSP master, mac_admin is not allowed for any domains except the recovery
console which has a legitimate need for it.  The other potential vector
is mounting a maliciously crafted filesystem for which SELinux fetches
xattrs (e.g. an ext4 filesystem on a SDcard).  However, the end result is
only a local denial-of-service (DOS) due to kernel BUG.  This fix is
queued for 3.14.]

Reproducer:
su
setenforce 0
touch foo
setfattr -n security.selinux foo

Caveat:
Relabeling or removing foo after doing the above may not be possible
without booting with SELinux disabled.  Any subsequent access to foo
after doing the above will also trigger the BUG.

BUG output from Matthew Thode:
[  473.893141] ------------[ cut here ]------------
[  473.962110] kernel BUG at security/selinux/ss/services.c:654!
[  473.995314] invalid opcode: 0000 [#6] SMP
[  474.027196] Modules linked in:
[  474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G      D   I
3.13.0-grsec #1
[  474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
07/29/10
[  474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
ffff8805f50cd488
[  474.183707] RIP: 0010:[<ffffffff814681c7>]  [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[  474.219954] RSP: 0018:ffff8805c0ac3c38  EFLAGS: 00010246
[  474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
0000000000000100
[  474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
ffff8805e8aaa000
[  474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
0000000000000006
[  474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
0000000000000006
[  474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
0000000000000000
[  474.453816] FS:  00007f2e75220800(0000) GS:ffff88061fc00000(0000)
knlGS:0000000000000000
[  474.489254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
00000000000207f0
[  474.556058] Stack:
[  474.584325]  ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
ffff8805f1190a40
[  474.618913]  ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
ffff8805e8aac860
[  474.653955]  ffff8805c0ac3cb8 000700068113833a ffff880606c75060
ffff8805c0ac3d94
[  474.690461] Call Trace:
[  474.723779]  [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
[  474.778049]  [<ffffffff81468824>] security_compute_av+0xf4/0x20b
[  474.811398]  [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
[  474.843813]  [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
[  474.875694]  [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
[  474.907370]  [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
[  474.938726]  [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
[  474.970036]  [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
[  475.000618]  [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
[  475.030402]  [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
[  475.061097]  [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
[  475.094595]  [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
[  475.148405]  [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
[  475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
[  475.255884] RIP  [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[  475.296120]  RSP <ffff8805c0ac3c38>
[  475.328734] ---[ end trace f076482e9d754adc ]---

[sds:  commit message edited to note Android implications and
to generate a unique Change-Id for gerrit]

Change-Id: I4d5389f0cfa72b5f59dada45081fa47e03805413
Reported-by:  Matthew Thode <[email protected]>
Signed-off-by: Stephen Smalley <[email protected]>
Cc: [email protected]
Signed-off-by: Paul Moore <[email protected]>
M1cha pushed a commit that referenced this pull request Sep 30, 2015
There are a couple of seq_files which use the single_open() interface.
This interface requires that the whole output must fit into a single
buffer.

E.g.  for /proc/stat allocation failures have been observed because an
order-4 memory allocation failed due to memory fragmentation.  In such
situations reading /proc/stat is not possible anymore.

Therefore change the seq_file code to fallback to vmalloc allocations
which will usually result in a couple of order-0 allocations and hence
also work if memory is fragmented.

For reference a call trace where reading from /proc/stat failed:

  sadc: page allocation failure: order:4, mode:0x1040d0
  CPU: 1 PID: 192063 Comm: sadc Not tainted 3.10.0-123.el7.s390x #1
  [...]
  Call Trace:
    show_stack+0x6c/0xe8
    warn_alloc_failed+0xd6/0x138
    __alloc_pages_nodemask+0x9da/0xb68
    __get_free_pages+0x2e/0x58
    kmalloc_order_trace+0x44/0xc0
    stat_open+0x5a/0xd8
    proc_reg_open+0x8a/0x140
    do_dentry_open+0x1bc/0x2c8
    finish_open+0x46/0x60
    do_last+0x382/0x10d0
    path_openat+0xc8/0x4f8
    do_filp_open+0x46/0xa8
    do_sys_open+0x114/0x1f0
    sysc_tracego+0x14/0x1a

Signed-off-by: Heiko Carstens <[email protected]>
Tested-by: David Rientjes <[email protected]>
Cc: Ian Kent <[email protected]>
Cc: Hendrik Brueckner <[email protected]>
Cc: Thorsten Diehl <[email protected]>
Cc: Andrea Righi <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Stefan Bader <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Conflicts:
	fs/seq_file.c

Change-Id: I009080dd017b020ffd5e812e5b472bdb8349217a
M1cha pushed a commit that referenced this pull request Sep 30, 2015
An issue was observed when a userspace task exits.
The page which hits error here is the zero page.
In binder mmap, the whole of vma is not mapped.
On a task crash, when debuggerd reads the binder regions,
the unmapped areas fall to do_anonymous_page in handle_pte_fault,
due to the absence of a vm_fault handler. This results in
zero page being mapped. Later in zap_pte_range, vm_normal_page
returns zero page in the case of VM_MIXEDMAP and it results in the
error.

BUG: Bad page map in process mediaserver  pte:9dff379f pmd:9bfbd831
page:c0ed8e60 count:1 mapcount:-1 mapping:  (null) index:0x0
page flags: 0x404(referenced|reserved)
addr:40c3f000 vm_flags:10220051 anon_vma:  (null) mapping:d9fe0764 index:fd
vma->vm_ops->fault:   (null)
vma->vm_file->f_op->mmap: binder_mmap+0x0/0x274
CPU: 0 PID: 1463 Comm: mediaserver Tainted: G        W    3.10.17+ #1
[<c001549c>] (unwind_backtrace+0x0/0x11c) from [<c001200c>] (show_stack+0x10/0x14)
[<c001200c>] (show_stack+0x10/0x14) from [<c0103d78>] (print_bad_pte+0x158/0x190)
[<c0103d78>] (print_bad_pte+0x158/0x190) from [<c01055f0>] (unmap_single_vma+0x2e4/0x598)
[<c01055f0>] (unmap_single_vma+0x2e4/0x598) from [<c010618c>] (unmap_vmas+0x34/0x50)
[<c010618c>] (unmap_vmas+0x34/0x50) from [<c010a9e4>] (exit_mmap+0xc8/0x1e8)
[<c010a9e4>] (exit_mmap+0xc8/0x1e8) from [<c00520f0>] (mmput+0x54/0xd0)
[<c00520f0>] (mmput+0x54/0xd0) from [<c005972c>] (do_exit+0x360/0x990)
[<c005972c>] (do_exit+0x360/0x990) from [<c0059ef0>] (do_group_exit+0x84/0xc0)
[<c0059ef0>] (do_group_exit+0x84/0xc0) from [<c0066de0>] (get_signal_to_deliver+0x4d4/0x548)
[<c0066de0>] (get_signal_to_deliver+0x4d4/0x548) from [<c0011500>] (do_signal+0xa8/0x3b8)

Add a vm_fault handler which returns VM_FAULT_SIGBUS, and prevents the
wrong fallback to do_anonymous_page.

Change-Id: I43c227e489f74f4907f199caf99f571b61883064
Signed-off-by: Vinayak Menon <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
M1cha pushed a commit that referenced this pull request Sep 30, 2015
While disabling ConfigFS Android gadget, android_disconnect() calls
kill_all_hid_devices(), if CONFIG_USB_CONFIGFS_F_ACC is enabled, to free
the registered HIDs without checking whether the USB accessory device
really exist or not. If USB accessory device doesn't exist then we run into
following kernel panic:
----8<----
[  136.724761] Unable to handle kernel NULL pointer dereference at virtual address 00000064
[  136.724809] pgd = c0204000
[  136.731924] [00000064] *pgd=00000000
[  136.737830] Internal error: Oops: 5 [#1] SMP ARM
[  136.738108] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc4-00400-gf75300e-dirty #76
[  136.742788] task: c0fb19d8 ti: c0fa4000 task.ti: c0fa4000
[  136.750890] PC is at _raw_spin_lock_irqsave+0x24/0x60
[  136.756246] LR is at kill_all_hid_devices+0x24/0x114
---->8----

This patch adds a test to check if USB Accessory device exists before freeing HIDs.

Change-Id: Ie229feaf0de3f4f7a151fcaa9a994e34e15ff73b
Signed-off-by: Amit Pundir <[email protected]>
(cherry picked from commit 32a71bce154cb89a549b9b7d28e8cf03b889d849)
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 8, 2016
ARM has some private syscalls (for example, set_tls(2)) which lie
outside the range of NR_syscalls.  If any of these are called while
syscall tracing is being performed, out-of-bounds array access will
occur in the ftrace and perf sys_{enter,exit} handlers.

 # trace-cmd record -e raw_syscalls:* true && trace-cmd report
 ...
 true-653   [000]   384.675777: sys_enter:            NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
 true-653   [000]   384.675812: sys_exit:             NR 192 = 1995915264
 true-653   [000]   384.675971: sys_enter:            NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
 true-653   [000]   384.675988: sys_exit:             NR 983045 = 0
 ...

 # trace-cmd record -e syscalls:* true
 [   17.289329] Unable to handle kernel paging request at virtual address aaaaaace
 [   17.289590] pgd = 9e71c000
 [   17.289696] [aaaaaace] *pgd=00000000
 [   17.289985] Internal error: Oops: 5 [M1cha#1] PREEMPT SMP ARM
 [   17.290169] Modules linked in:
 [   17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
 [   17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
 [   17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
 [   17.290866] LR is at syscall_trace_enter+0x124/0x184

Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.

Commit cd0980f "tracing: Check invalid syscall nr while tracing syscalls"
added the check for less than zero, but it should have also checked
for greater than NR_syscalls.

Link: http://lkml.kernel.org/p/[email protected]

Fixes: cd0980f "tracing: Check invalid syscall nr while tracing syscalls"
Cc: [email protected] # 2.6.33+
Signed-off-by: Rabin Vincent <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>

Conflicts:
	kernel/trace/trace_syscalls.c

Change-Id: I512142f8f1e1b2a8dc063209666dbce9737377e7
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 8, 2016
If a user key gets negatively instantiated, an error code is cached in the
payload area.  A negatively instantiated key may be then be positively
instantiated by updating it with valid data.  However, the ->update key
type method must be aware that the error code may be there.

The following may be used to trigger the bug in the user key type:

    keyctl request2 user user "" @U
    keyctl add user user "a" @U

which manifests itself as:

	BUG: unable to handle kernel paging request at 00000000ffffff8a
	IP: [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046
	PGD 7cc30067 PUD 0
	Oops: 0002 [M1cha#1] SMP
	Modules linked in:
	CPU: 3 PID: 2644 Comm: a.out Not tainted 4.3.0+ #49
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
	task: ffff88003ddea700 ti: ffff88003dd88000 task.ti: ffff88003dd88000
	RIP: 0010:[<ffffffff810a376f>]  [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280
	 [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046
	RSP: 0018:ffff88003dd8bdb0  EFLAGS: 00010246
	RAX: 00000000ffffff82 RBX: 0000000000000000 RCX: 0000000000000001
	RDX: ffffffff81e3fe40 RSI: 0000000000000000 RDI: 00000000ffffff82
	RBP: ffff88003dd8bde0 R08: ffff88007d2d2da0 R09: 0000000000000000
	R10: 0000000000000000 R11: ffff88003e8073c0 R12: 00000000ffffff82
	R13: ffff88003dd8be68 R14: ffff88007d027600 R15: ffff88003ddea700
	FS:  0000000000b92880(0063) GS:ffff88007fd00000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
	CR2: 00000000ffffff8a CR3: 000000007cc5f000 CR4: 00000000000006e0
	Stack:
	 ffff88003dd8bdf0 ffffffff81160a8a 0000000000000000 00000000ffffff82
	 ffff88003dd8be68 ffff88007d027600 ffff88003dd8bdf0 ffffffff810a39e5
	 ffff88003dd8be20 ffffffff812a31ab ffff88007d027600 ffff88007d027620
	Call Trace:
	 [<ffffffff810a39e5>] kfree_call_rcu+0x15/0x20 kernel/rcu/tree.c:3136
	 [<ffffffff812a31ab>] user_update+0x8b/0xb0 security/keys/user_defined.c:129
	 [<     inline     >] __key_update security/keys/key.c:730
	 [<ffffffff8129e5c1>] key_create_or_update+0x291/0x440 security/keys/key.c:908
	 [<     inline     >] SYSC_add_key security/keys/keyctl.c:125
	 [<ffffffff8129fc21>] SyS_add_key+0x101/0x1e0 security/keys/keyctl.c:60
	 [<ffffffff8185f617>] entry_SYSCALL_64_fastpath+0x12/0x6a arch/x86/entry/entry_64.S:185

Note the error code (-ENOKEY) in EDX.

A similar bug can be tripped by:

    keyctl request2 trusted user "" @U
    keyctl add trusted user "a" @U

This should also affect encrypted keys - but that has to be correctly
parameterised or it will fail with EINVAL before getting to the bit that
will crashes.

Change-Id: I171d566f431c56208e1fe279f466d2d399a9ac7c
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: David Howells <[email protected]>
Acked-by: Mimi Zohar <[email protected]>
Signed-off-by: James Morris <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 8, 2016
Below Kernel panic is observed due to race condition, where
sock_has_perm called in a thread and is trying to access sksec->sid
without checking sksec. Just before that, sk->sk_security was set
to NULL by selinux_sk_free_security through sk_free in other thread.

31704.949269:   <3> IPv4: Attempt to release alive inet socket dd81b200
31704.959049:   <1> Unable to handle kernel NULL pointer dereference at \
                        virtual address 00000000
31704.983562:   <1> pgd = c6b74000
31704.985248:   <1> [00000000] *pgd=00000000
31704.996591:   <0> Internal error: Oops: 5 [M1cha#1] PREEMPT SMP ARM
31705.001016:   <6> Modules linked in: adsprpc [last unloaded: wlan]
31705.006659:   <6> CPU: 1    Tainted: G           O  \
                        (3.4.0-g837ab9b-00003-g6bcd9c6 M1cha#1)
31705.014042:   <6> PC is at sock_has_perm+0x58/0xd4
31705.018292:   <6> LR is at sock_has_perm+0x58/0xd4
31705.022546:   <6> pc : [<c0341e8c>]    lr : [<c0341e8c>]    \
                                                  psr: 60000013
31705.022549:   <6> sp : dda27f00  ip : 00000000  fp : 5f36fc84
31705.034002:   <6> r10: 00004000  r9 : 0000009d  r8 : e8c2b700
31705.039211:   <6> r7 : dda27f24  r6 : dd81b200  r5 : 00000000  \
                                                  r4 : 00000000
31705.045721:   <6> r3 : 00000000  r2 : dda27ef8  r1 : 00000000  \
                                                  r0 : dda27f54
31705.052232:   <6> Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM \
                        Segment user
31705.059349:   <6> Control: 10c5787d  Table: 10d7406a  DAC: 00000015
.
.
.
.
31705.697816:   <6> [<c0341e8c>] (sock_has_perm+0x58/0xd4) from \
                    [<c033ed10>] (security_socket_getsockopt+0x14/0x1c)
31705.707534:   <6> [<c033ed10>] (security_socket_getsockopt+0x14/0x1c) \
                           from [<c0745c18>] (sys_getsockopt+0x34/0xa8)
31705.717343:   <6> [<c0745c18>] (sys_getsockopt+0x34/0xa8) from \
                    [<c0106140>] (ret_fast_syscall+0x0/0x30)
31705.726193:   <0> Code: e59832e8 e5933058 e5939004 ebfac736 (e5953000)
31705.732635:   <4> ---[ end trace 22889004dafd87bd ]---

Change-Id: I79c3fb525f35ea2494d53788788cd71a38a32d6b
Signed-off-by: Satya Durga Srinivasu Prabhala <[email protected]>
Signed-off-by: Osvaldo Banuelos <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 8, 2016
An issue was observed when a userspace task exits.
The page which hits error here is the zero page.
In binder mmap, the whole of vma is not mapped.
On a task crash, when debuggerd reads the binder regions,
the unmapped areas fall to do_anonymous_page in handle_pte_fault,
due to the absence of a vm_fault handler. This results in
zero page being mapped. Later in zap_pte_range, vm_normal_page
returns zero page in the case of VM_MIXEDMAP and it results in the
error.

BUG: Bad page map in process mediaserver  pte:9dff379f pmd:9bfbd831
page:c0ed8e60 count:1 mapcount:-1 mapping:  (null) index:0x0
page flags: 0x404(referenced|reserved)
addr:40c3f000 vm_flags:10220051 anon_vma:  (null) mapping:d9fe0764 index:fd
vma->vm_ops->fault:   (null)
vma->vm_file->f_op->mmap: binder_mmap+0x0/0x274
CPU: 0 PID: 1463 Comm: mediaserver Tainted: G        W    3.10.17+ M1cha#1
[<c001549c>] (unwind_backtrace+0x0/0x11c) from [<c001200c>] (show_stack+0x10/0x14)
[<c001200c>] (show_stack+0x10/0x14) from [<c0103d78>] (print_bad_pte+0x158/0x190)
[<c0103d78>] (print_bad_pte+0x158/0x190) from [<c01055f0>] (unmap_single_vma+0x2e4/0x598)
[<c01055f0>] (unmap_single_vma+0x2e4/0x598) from [<c010618c>] (unmap_vmas+0x34/0x50)
[<c010618c>] (unmap_vmas+0x34/0x50) from [<c010a9e4>] (exit_mmap+0xc8/0x1e8)
[<c010a9e4>] (exit_mmap+0xc8/0x1e8) from [<c00520f0>] (mmput+0x54/0xd0)
[<c00520f0>] (mmput+0x54/0xd0) from [<c005972c>] (do_exit+0x360/0x990)
[<c005972c>] (do_exit+0x360/0x990) from [<c0059ef0>] (do_group_exit+0x84/0xc0)
[<c0059ef0>] (do_group_exit+0x84/0xc0) from [<c0066de0>] (get_signal_to_deliver+0x4d4/0x548)
[<c0066de0>] (get_signal_to_deliver+0x4d4/0x548) from [<c0011500>] (do_signal+0xa8/0x3b8)

Add a vm_fault handler which returns VM_FAULT_SIGBUS, and prevents the
wrong fallback to do_anonymous_page.

CRs-Fixed: 673147
Change-Id: I43730a51b6c819538b46c5e4dc5c96c8a384098d
Signed-off-by: Vinayak Menon <[email protected]>
Patch-mainline: linux-arm-kernel @ 06/02/14, 18:17
Signed-off-by: Vignesh Radhakrishnan <[email protected]>
Signed-off-by: Subbaraman Narayanamurthy <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 20, 2016
commit 086ba77 upstream.

ARM has some private syscalls (for example, set_tls(2)) which lie
outside the range of NR_syscalls.  If any of these are called while
syscall tracing is being performed, out-of-bounds array access will
occur in the ftrace and perf sys_{enter,exit} handlers.

 # trace-cmd record -e raw_syscalls:* true && trace-cmd report
 ...
 true-653   [000]   384.675777: sys_enter:            NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
 true-653   [000]   384.675812: sys_exit:             NR 192 = 1995915264
 true-653   [000]   384.675971: sys_enter:            NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
 true-653   [000]   384.675988: sys_exit:             NR 983045 = 0
 ...

 # trace-cmd record -e syscalls:* true
 [   17.289329] Unable to handle kernel paging request at virtual address aaaaaace
 [   17.289590] pgd = 9e71c000
 [   17.289696] [aaaaaace] *pgd=00000000
 [   17.289985] Internal error: Oops: 5 [M1cha#1] PREEMPT SMP ARM
 [   17.290169] Modules linked in:
 [   17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
 [   17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
 [   17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
 [   17.290866] LR is at syscall_trace_enter+0x124/0x184

Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.

Commit cd0980f "tracing: Check invalid syscall nr while tracing syscalls"
added the check for less than zero, but it should have also checked
for greater than NR_syscalls.

Link: http://lkml.kernel.org/p/[email protected]

Fixes: cd0980f "tracing: Check invalid syscall nr while tracing syscalls"
Signed-off-by: Rabin Vincent <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 20, 2016
commit 42d64e1 upstream.

The SELinux/NetLabel glue code has a locking bug that affects systems
with NetLabel enabled, see the kernel error message below.  This patch
corrects this problem by converting the bottom half socket lock to a
more conventional, and correct for this call-path, lock_sock() call.

 ===============================
 [ INFO: suspicious RCU usage. ]
 3.11.0-rc3+ #19 Not tainted
 -------------------------------
 net/ipv4/cipso_ipv4.c:1928 suspicious rcu_dereference_protected() usage!

 other info that might help us debug this:

 rcu_scheduler_active = 1, debug_locks = 0
 2 locks held by ping/731:
  #0:  (slock-AF_INET/1){+.-...}, at: [...] selinux_netlbl_socket_connect
  M1cha#1:  (rcu_read_lock){.+.+..}, at: [<...>] netlbl_conn_setattr

 stack backtrace:
 CPU: 1 PID: 731 Comm: ping Not tainted 3.11.0-rc3+ #19
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000001 ffff88006f659d28 ffffffff81726b6a ffff88003732c500
  ffff88006f659d58 ffffffff810e4457 ffff88006b845a00 0000000000000000
  000000000000000c ffff880075aa2f50 ffff88006f659d90 ffffffff8169bec7
 Call Trace:
  [<ffffffff81726b6a>] dump_stack+0x54/0x74
  [<ffffffff810e4457>] lockdep_rcu_suspicious+0xe7/0x120
  [<ffffffff8169bec7>] cipso_v4_sock_setattr+0x187/0x1a0
  [<ffffffff8170f317>] netlbl_conn_setattr+0x187/0x190
  [<ffffffff8170f195>] ? netlbl_conn_setattr+0x5/0x190
  [<ffffffff8131ac9e>] selinux_netlbl_socket_connect+0xae/0xc0
  [<ffffffff81303025>] selinux_socket_connect+0x135/0x170
  [<ffffffff8119d127>] ? might_fault+0x57/0xb0
  [<ffffffff812fb146>] security_socket_connect+0x16/0x20
  [<ffffffff815d3ad3>] SYSC_connect+0x73/0x130
  [<ffffffff81739a85>] ? sysret_check+0x22/0x5d
  [<ffffffff810e5e2d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
  [<ffffffff81373d4e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff815d52be>] SyS_connect+0xe/0x10
  [<ffffffff81739a59>] system_call_fastpath+0x16/0x1b

Signed-off-by: Paul Moore <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 20, 2016
…ernor

For the sync_freq feature currently we check pcpu->policy->cur frequency
for each online cpu.  But for a CPU that isn't using interactive governor
or for an offline CPU, pcpu->policy can be null or an invalid value.
This patch tries to avoid that scenario by using pcpu->target_freq
instead of policy->cur to get the frequency of an online CPU.

Kernel crash without this patch:
[   20.132373] Unable to handle kernel NULL pointer dereference at virtual address 00000028
[   20.132375] pgd = c34f34c0
[   20.132377] pgd = ef6f2440
[   20.132383] [00000028] *pgd=00000000
[   20.132385]
[   20.132388] [00000028] *pgd=2e98f003, *pmd=00000000
[   20.132390] Internal error: Oops: 205 [M1cha#1] PREEMPT SMP ARM
[   20.132394] Modules linked in:
[   20.132398] CPU: 0 PID: 1560 Comm: chown Tainted: G        W    3.10.0-perf-gb12057b-00001-ga2c6c16-dirty #7
[   20.132401] task: ef9af300 ti: ee49c000 task.ti: ee49c000
[   20.132411] PC is at cpufreq_interactive_timer+0x10c/0x650
[   20.132415] LR is at cpufreq_interactive_timer+0x128/0x650
<snip>
[   20.133002] [<c07eb204>] (cpufreq_interactive_timer+0x10c/0x650) from [<c02804d8>] (call_timer_fn+0x80/0x198)
[   20.133012] [<c02804d8>] (call_timer_fn+0x80/0x198) from [<c0280acc>] (run_timer_softirq+0x1f8/0x270)
[   20.133019] [<c0280acc>] (run_timer_softirq+0x1f8/0x270) from [<c0279e20>] (__do_softirq+0x12c/0x2d4)
[   20.133025] [<c0279e20>] (__do_softirq+0x12c/0x2d4) from [<c027a2d4>] (irq_exit+0x74/0xc8)
[   20.133034] [<c027a2d4>] (irq_exit+0x74/0xc8) from [<c0206a00>] (handle_IRQ+0x68/0x8c)
[   20.133041] [<c0206a00>] (handle_IRQ+0x68/0x8c) from [<c02004b8>] (gic_handle_irq+0x3c/0x60)
[   20.133051] [<c02004b8>] (gic_handle_irq+0x3c/0x60) from [<c0ac6900>] (__irq_svc+0x40/0x70)
<snip>

Change-Id: I4adf0b35cd94004e06ea649c672988db45c54710
Signed-off-by: Vijay Ganti <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 20, 2016
commit 6d1cff2a885850b78b40c34777b46cf5da5d1050 upstream.

We hit use after free on dereferncing pointer to task_smack struct in
smk_of_task() called from smack_task_to_inode().

task_security() macro uses task_cred_xxx() to get pointer to the task_smack.
task_cred_xxx() could be used only for non-pointer members of task's
credentials. It cannot be used for pointer members since what they point
to may disapper after dropping RCU read lock.

Mainly task_security() used this way:
	smk_of_task(task_security(p))

Intead of this introduce function smk_of_task_struct() which
takes task_struct as argument and returns pointer to smk_known struct
and do this under RCU read lock.
Bogus task_security() macro is not used anymore, so remove it.

KASan's report for this:

	AddressSanitizer: use after free in smack_task_to_inode+0x50/0x70 at addr c4635600
	=============================================================================
	BUG kmalloc-64 (Tainted: PO): kasan error
	-----------------------------------------------------------------------------

	Disabling lock debugging due to kernel taint
	INFO: Allocated in new_task_smack+0x44/0xd8 age=39 cpu=0 pid=1866
		kmem_cache_alloc_trace+0x88/0x1bc
		new_task_smack+0x44/0xd8
		smack_cred_prepare+0x48/0x21c
		security_prepare_creds+0x44/0x4c
		prepare_creds+0xdc/0x110
		smack_setprocattr+0x104/0x150
		security_setprocattr+0x4c/0x54
		proc_pid_attr_write+0x12c/0x194
		vfs_write+0x1b0/0x370
		SyS_write+0x5c/0x94
		ret_fast_syscall+0x0/0x48
	INFO: Freed in smack_cred_free+0xc4/0xd0 age=27 cpu=0 pid=1564
		kfree+0x270/0x290
		smack_cred_free+0xc4/0xd0
		security_cred_free+0x34/0x3c
		put_cred_rcu+0x58/0xcc
		rcu_process_callbacks+0x738/0x998
		__do_softirq+0x264/0x4cc
		do_softirq+0x94/0xf4
		irq_exit+0xbc/0x120
		handle_IRQ+0x104/0x134
		gic_handle_irq+0x70/0xac
		__irq_svc+0x44/0x78
		_raw_spin_unlock+0x18/0x48
		sync_inodes_sb+0x17c/0x1d8
		sync_filesystem+0xac/0xfc
		vdfs_file_fsync+0x90/0xc0
		vfs_fsync_range+0x74/0x7c
	INFO: Slab 0xd3b23f50 objects=32 used=31 fp=0xc4635600 flags=0x4080
	INFO: Object 0xc4635600 @offset=5632 fp=0x  (null)

	Bytes b4 c46355f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
	Object c4635600: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
	Object c4635610: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
	Object c4635620: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
	Object c4635630: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
	Redzone c4635640: bb bb bb bb                                      ....
	Padding c46356e8: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
	Padding c46356f8: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
	CPU: 5 PID: 834 Comm: launchpad_prelo Tainted: PBO 3.10.30 M1cha#1
	Backtrace:
	[<c00233a4>] (dump_backtrace+0x0/0x158) from [<c0023dec>] (show_stack+0x20/0x24)
	 r7:c4634010 r6:d3b23f50 r5:c4635600 r4:d1002140
	[<c0023dcc>] (show_stack+0x0/0x24) from [<c06d6d7c>] (dump_stack+0x20/0x28)
	[<c06d6d5c>] (dump_stack+0x0/0x28) from [<c01c1d50>] (print_trailer+0x124/0x144)
	[<c01c1c2c>] (print_trailer+0x0/0x144) from [<c01c1e88>] (object_err+0x3c/0x44)
	 r7:c4635600 r6:d1002140 r5:d3b23f50 r4:c4635600
	[<c01c1e4c>] (object_err+0x0/0x44) from [<c01cac18>] (kasan_report_error+0x2b8/0x538)
	 r6:d1002140 r5:d3b23f50 r4:c6429cf8 r3:c09e1aa7
	[<c01ca960>] (kasan_report_error+0x0/0x538) from [<c01c9430>] (__asan_load4+0xd4/0xf8)
	[<c01c935c>] (__asan_load4+0x0/0xf8) from [<c031e168>] (smack_task_to_inode+0x50/0x70)
	 r5:c4635600 r4:ca9da000
	[<c031e118>] (smack_task_to_inode+0x0/0x70) from [<c031af64>] (security_task_to_inode+0x3c/0x44)
	 r5:cca25e80 r4:c0ba9780
	[<c031af28>] (security_task_to_inode+0x0/0x44) from [<c023d614>] (pid_revalidate+0x124/0x178)
	 r6:00000000 r5:cca25e80 r4:cbabe3c0 r3:00008124
	[<c023d4f0>] (pid_revalidate+0x0/0x178) from [<c01db98c>] (lookup_fast+0x35c/0x43y4)
	 r9:c6429efc r8:00000101 r7:c079d940 r6:c6429e90 r5:c6429ed8 r4:c83c4148
	[<c01db630>] (lookup_fast+0x0/0x434) from [<c01deec8>] (do_last.isra.24+0x1c0/0x1108)
	[<c01ded08>] (do_last.isra.24+0x0/0x1108) from [<c01dff04>] (path_openat.isra.25+0xf4/0x648)
	[<c01dfe10>] (path_openat.isra.25+0x0/0x648) from [<c01e1458>] (do_filp_open+0x3c/0x88)
	[<c01e141c>] (do_filp_open+0x0/0x88) from [<c01ccb28>] (do_sys_open+0xf0/0x198)
	 r7:00000001 r6:c0ea2180 r5:0000000b r4:00000000
	[<c01cca38>] (do_sys_open+0x0/0x198) from [<c01ccc00>] (SyS_open+0x30/0x34)
	[<c01ccbd0>] (SyS_open+0x0/0x34) from [<c001db80>] (ret_fast_syscall+0x0/0x48)
	Read of size 4 by thread T834:
	Memory state around the buggy address:
	 c4635380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
	 c4635400: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
	 c4635480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
	 c4635500: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
	 c4635580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
	>c4635600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
	           ^
	 c4635680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
	 c4635700: 00 00 00 00 04 fc fc fc fc fc fc fc fc fc fc fc
	 c4635780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
	 c4635800: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
	 c4635880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
	==================================================================

Signed-off-by: Andrey Ryabinin <[email protected]>
[lizf: Backported to 3.4:
 - smk_of_task() returns char* instead of smack_known *
 - replace task_security() with smk_of_task() with smk_of_task_struct()
   manually]
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 20, 2016
commit a28b2a47edcd0cb7c051b445f71a426000394606 upstream.

Passing zeroed drm_radeon_cs struct to DRM_IOCTL_RADEON_CS produces the
following oops.

Fix by always calling INIT_LIST_HEAD() to avoid the crash in list_sort().

----------------------------------

 #include <stdint.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <sys/ioctl.h>
 #include <drm/radeon_drm.h>

 static const struct drm_radeon_cs cs;

 int main(int argc, char **argv)
 {
         return ioctl(open(argv[1], O_RDWR), DRM_IOCTL_RADEON_CS, &cs);
 }

----------------------------------

[ttrantal@test2 ~]$ ./main /dev/dri/card0
[   46.904650] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   46.905022] IP: [<ffffffff814d6df2>] list_sort+0x42/0x240
[   46.905022] PGD 68f29067 PUD 688b5067 PMD 0
[   46.905022] Oops: 0002 [M1cha#1] SMP
[   46.905022] CPU: 0 PID: 2413 Comm: main Not tainted 4.0.0-rc1+ #58
[   46.905022] Hardware name: Hewlett-Packard HP Compaq dc5750 Small Form Factor/0A64h, BIOS 786E3 v02.10 01/25/2007
[   46.905022] task: ffff880058e2bcc0 ti: ffff880058e64000 task.ti: ffff880058e64000
[   46.905022] RIP: 0010:[<ffffffff814d6df2>]  [<ffffffff814d6df2>] list_sort+0x42/0x240
[   46.905022] RSP: 0018:ffff880058e67998  EFLAGS: 00010246
[   46.905022] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   46.905022] RDX: ffffffff81644410 RSI: ffff880058e67b40 RDI: ffff880058e67a58
[   46.905022] RBP: ffff880058e67a88 R08: 0000000000000000 R09: 0000000000000000
[   46.905022] R10: ffff880058e2bcc0 R11: ffffffff828e6ca0 R12: ffffffff81644410
[   46.905022] R13: ffff8800694b8018 R14: 0000000000000000 R15: ffff880058e679b0
[   46.905022] FS:  00007fdc65a65700(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
[   46.905022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   46.905022] CR2: 0000000000000000 CR3: 0000000058dd9000 CR4: 00000000000006f0
[   46.905022] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   46.905022] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[   46.905022] Stack:
[   46.905022]  ffff880058e67b40 ffff880058e2bcc0 ffff880058e67a78 0000000000000000
[   46.905022]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   46.905022]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   46.905022] Call Trace:
[   46.905022]  [<ffffffff81644a65>] radeon_cs_parser_fini+0x195/0x220
[   46.905022]  [<ffffffff81645069>] radeon_cs_ioctl+0xa9/0x960
[   46.905022]  [<ffffffff815e1f7c>] drm_ioctl+0x19c/0x640
[   46.905022]  [<ffffffff810f8fdd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[   46.905022]  [<ffffffff810f90ad>] ? trace_hardirqs_on+0xd/0x10
[   46.905022]  [<ffffffff8160c066>] radeon_drm_ioctl+0x46/0x80
[   46.905022]  [<ffffffff81211868>] do_vfs_ioctl+0x318/0x570
[   46.905022]  [<ffffffff81462ef6>] ? selinux_file_ioctl+0x56/0x110
[   46.905022]  [<ffffffff81211b41>] SyS_ioctl+0x81/0xa0
[   46.905022]  [<ffffffff81dc6312>] system_call_fastpath+0x12/0x17
[   46.905022] Code: 48 89 b5 10 ff ff ff 0f 84 03 01 00 00 4c 8d bd 28 ff ff
ff 31 c0 48 89 fb b9 15 00 00 00 49 89 d4 4c 89 ff f3 48 ab 48 8b 46 08 <48> c7
00 00 00 00 00 48 8b 0e 48 85 c9 0f 84 7d 00 00 00 c7 85
[   46.905022] RIP  [<ffffffff814d6df2>] list_sort+0x42/0x240
[   46.905022]  RSP <ffff880058e67998>
[   46.905022] CR2: 0000000000000000
[   47.149253] ---[ end trace 09576b4e8b2c20b8 ]---

Reviewed-by: Christian König <[email protected]>
Signed-off-by: Tommi Rantala <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 20, 2016
commit 6302ce4d80aa82b3fdb5c5cd68e7268037091b47 upstream.

This crash was reported:

[  366.947370] sd 3:0:1:0: [sdb] Spinning up disk....
[  368.804046] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  368.804072] IP: [<ffffffff81358457>] __mutex_lock_common.isra.7+0x9c/0x15b
[  368.804098] PGD 0
[  368.804114] Oops: 0002 [M1cha#1] SMP
[  368.804143] CPU 1
[  368.804151] Modules linked in: sg netconsole s3g(PO) uinput joydev hid_multitouch usbhid hid snd_hda_codec_via cpufreq_userspace cpufreq_powersave cpufreq_stats uhci_hcd cpufreq_conservative snd_hda_intel snd_hda_codec snd_hwdep snd_pcm sdhci_pci snd_page_alloc sdhci snd_timer snd psmouse evdev serio_raw pcspkr soundcore xhci_hcd shpchp s3g_drm(O) mvsas mmc_core ahci libahci drm i2c_core acpi_cpufreq mperf video processor button thermal_sys dm_dmirror exfat_fs exfat_core dm_zcache dm_mod padlock_aes aes_generic padlock_sha iscsi_target_mod target_core_mod configfs sswipe libsas libata scsi_transport_sas picdev via_cputemp hwmon_vid fuse parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2 sd_mod crc_t10dif usb_storage scsi_mod ehci_hcd usbcore usb_common
[  368.804749]
[  368.804764] Pid: 392, comm: kworker/u:3 Tainted: P        W  O 3.4.87-logicube-ng.22 M1cha#1 To be filled by O.E.M. To be filled by O.E.M./EPIA-M920
[  368.804802] RIP: 0010:[<ffffffff81358457>]  [<ffffffff81358457>] __mutex_lock_common.isra.7+0x9c/0x15b
[  368.804827] RSP: 0018:ffff880117001cc0  EFLAGS: 00010246
[  368.804842] RAX: 0000000000000000 RBX: ffff8801185030d0 RCX: ffff88008edcb420
[  368.804857] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff8801185030d4
[  368.804873] RBP: ffff8801181531c0 R08: 0000000000000020 R09: 00000000fffffffe
[  368.804885] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801185030d4
[  368.804899] R13: 0000000000000002 R14: ffff880117001fd8 R15: ffff8801185030d8
[  368.804916] FS:  0000000000000000(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
[  368.804931] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  368.804946] CR2: 0000000000000000 CR3: 000000000160b000 CR4: 00000000000006e0
[  368.804962] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.804978] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  368.804995] Process kworker/u:3 (pid: 392, threadinfo ffff880117000000, task ffff8801181531c0)
[  368.805009] Stack:
[  368.805017]  ffff8801185030d8 0000000000000000 ffffffff8161ddf0 ffffffff81056f7c
[  368.805062]  000000000000b503 ffff8801185030d0 ffff880118503000 0000000000000000
[  368.805100]  ffff8801185030d0 ffff8801188b8000 ffff88008edcb420 ffffffff813583ac
[  368.805135] Call Trace:
[  368.805153]  [<ffffffff81056f7c>] ? up+0xb/0x33
[  368.805168]  [<ffffffff813583ac>] ? mutex_lock+0x16/0x25
[  368.805194]  [<ffffffffa018c414>] ? smp_execute_task+0x4e/0x222 [libsas]
[  368.805217]  [<ffffffffa018ce1c>] ? sas_find_bcast_dev+0x3c/0x15d [libsas]
[  368.805240]  [<ffffffffa018ce4f>] ? sas_find_bcast_dev+0x6f/0x15d [libsas]
[  368.805264]  [<ffffffffa018e989>] ? sas_ex_revalidate_domain+0x37/0x2ec [libsas]
[  368.805280]  [<ffffffff81355a2a>] ? printk+0x43/0x48
[  368.805296]  [<ffffffff81359a65>] ? _raw_spin_unlock_irqrestore+0xc/0xd
[  368.805318]  [<ffffffffa018b767>] ? sas_revalidate_domain+0x85/0xb6 [libsas]
[  368.805336]  [<ffffffff8104e5d9>] ? process_one_work+0x151/0x27c
[  368.805351]  [<ffffffff8104f6cd>] ? worker_thread+0xbb/0x152
[  368.805366]  [<ffffffff8104f612>] ? manage_workers.isra.29+0x163/0x163
[  368.805382]  [<ffffffff81052c4e>] ? kthread+0x79/0x81
[  368.805399]  [<ffffffff8135fea4>] ? kernel_thread_helper+0x4/0x10
[  368.805416]  [<ffffffff81052bd5>] ? kthread_flush_work_fn+0x9/0x9
[  368.805431]  [<ffffffff8135fea0>] ? gs_change+0x13/0x13
[  368.805442] Code: 83 7d 30 63 7e 04 f3 90 eb ab 4c 8d 63 04 4c 8d 7b 08 4c 89 e7 e8 fa 15 00 00 48 8b 43 10 4c 89 3c 24 48 89 63 10 48 89 44 24 08 <48> 89 20 83 c8 ff 48 89 6c 24 10 87 03 ff c8 74 35 4d 89 ee 41
[  368.805851] RIP  [<ffffffff81358457>] __mutex_lock_common.isra.7+0x9c/0x15b
[  368.805877]  RSP <ffff880117001cc0>
[  368.805886] CR2: 0000000000000000
[  368.805899] ---[ end trace b720682065d8f4cc ]---

It's directly caused by 89d3cf6 [SCSI] libsas: add mutex for SMP task
execution, but shows a deeper cause: expander functions expect to be able to
cast to and treat domain devices as expanders.  The correct fix is to only do
expander discover when we know we've got an expander device to avoid wrongly
casting a non-expander device.

Reported-by: Praveen Murali <[email protected]>
Tested-by: Praveen Murali <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 20, 2016
An issue was observed when a userspace task exits.
The page which hits error here is the zero page.
In binder mmap, the whole of vma is not mapped.
On a task crash, when debuggerd reads the binder regions,
the unmapped areas fall to do_anonymous_page in handle_pte_fault,
due to the absence of a vm_fault handler. This results in
zero page being mapped. Later in zap_pte_range, vm_normal_page
returns zero page in the case of VM_MIXEDMAP and it results in the
error.

BUG: Bad page map in process mediaserver  pte:9dff379f pmd:9bfbd831
page:c0ed8e60 count:1 mapcount:-1 mapping:  (null) index:0x0
page flags: 0x404(referenced|reserved)
addr:40c3f000 vm_flags:10220051 anon_vma:  (null) mapping:d9fe0764 index:fd
vma->vm_ops->fault:   (null)
vma->vm_file->f_op->mmap: binder_mmap+0x0/0x274
CPU: 0 PID: 1463 Comm: mediaserver Tainted: G        W    3.10.17+ M1cha#1
[<c001549c>] (unwind_backtrace+0x0/0x11c) from [<c001200c>] (show_stack+0x10/0x14)
[<c001200c>] (show_stack+0x10/0x14) from [<c0103d78>] (print_bad_pte+0x158/0x190)
[<c0103d78>] (print_bad_pte+0x158/0x190) from [<c01055f0>] (unmap_single_vma+0x2e4/0x598)
[<c01055f0>] (unmap_single_vma+0x2e4/0x598) from [<c010618c>] (unmap_vmas+0x34/0x50)
[<c010618c>] (unmap_vmas+0x34/0x50) from [<c010a9e4>] (exit_mmap+0xc8/0x1e8)
[<c010a9e4>] (exit_mmap+0xc8/0x1e8) from [<c00520f0>] (mmput+0x54/0xd0)
[<c00520f0>] (mmput+0x54/0xd0) from [<c005972c>] (do_exit+0x360/0x990)
[<c005972c>] (do_exit+0x360/0x990) from [<c0059ef0>] (do_group_exit+0x84/0xc0)
[<c0059ef0>] (do_group_exit+0x84/0xc0) from [<c0066de0>] (get_signal_to_deliver+0x4d4/0x548)
[<c0066de0>] (get_signal_to_deliver+0x4d4/0x548) from [<c0011500>] (do_signal+0xa8/0x3b8)

Add a vm_fault handler which returns VM_FAULT_SIGBUS, and prevents the
wrong fallback to do_anonymous_page.

CRs-Fixed: 673147
Change-Id: I43730a51b6c819538b46c5e4dc5c96c8a384098d
Signed-off-by: Vinayak Menon <[email protected]>
Patch-mainline: linux-arm-kernel @ 06/02/14, 18:17
Signed-off-by: Vignesh Radhakrishnan <[email protected]>
Signed-off-by: Subbaraman Narayanamurthy <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 20, 2016
While stressing the CPU hotplug path, sometimes we hit a problem
as shown below.

[57056.416774] ------------[ cut here ]------------
[57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
[57056.489245] Code: e594a000 eb085236 e15a0000 0a000000 (e7f001f2)
[57056.489259] ------------[ cut here ]------------
[57056.492840] kernel BUG at kernel/kernel/smpboot.c:134!
[57056.513236] Internal error: Oops - BUG: 0 [M1cha#1] PREEMPT SMP ARM
[57056.519055] Modules linked in: wlan(O) mhi(O)
[57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: G        W  O 3.10.0-g3677c61-00008-g180c060 M1cha#1
[57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
[57056.537991] PC is at smpboot_thread_fn+0x124/0x218
[57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
[57056.547528] pc : [<c01931e8>]    lr : [<c01931e0>]    psr: 200f0013
[57056.547528] sp : f0e79f30  ip : 00000000  fp : 00000000
[57056.558983] r10: 00000001  r9 : 00000000  r8 : f0e78000
[57056.564192] r7 : 00000001  r6 : c1195758  r5 : f0e78000  r4 : f0e5fd00
[57056.570701] r3 : 00000001  r2 : f0e79f20  r1 : 00000000  r0 : 00000000

This issue was always seen in the context of "ksoftirqd". It seems to
be happening because of a potential race condition in __kthread_parkme
where just after completing the parked completion, before the
ksoftirqd task has been scheduled again, it can go into running state.

Fix this by waiting for the task state to parked after waiting the
parked completion.

CRs-Fixed: 659674
Change-Id: If3f0e9b706eeb5d30d5a32f84378d35bb03fe794
Signed-off-by: Subbaraman Narayanamurthy <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request May 20, 2016
On ARMv7 CPUs that cache first level page table entries (like the
Cortex-A15), using a reserved ASID while changing the TTBR or flushing
the TLB is unsafe.

This is because the CPU may cache the first level entry as the result of
a speculative memory access while the reserved ASID is assigned. After
the process owning the page tables dies, the memory will be reallocated
and may be written with junk values which can be interpreted as global,
valid PTEs by the processor. This will result in the TLB being populated
with bogus global entries.

This patch avoids the use of a reserved context ID in the v7 switch_mm
and ASID rollover code by temporarily using the swapper_pg_dir pointed
at by TTBR1, which contains only global entries that are not tagged
with ASIDs.

Reviewed-by: Frank Rowand <[email protected]>
Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
[[email protected]: add LPAE support]
Signed-off-by: Catalin Marinas <[email protected]>

Change-Id: I2dd82501dac5ee402765aaa0ffb3f7f577a603c9

ARM: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW on ASID-capable CPUs

Since the ASIDs must be unique to an mm across all the CPUs in a system,
the __new_context() function needs to broadcast a context reset event to
all the CPUs during ASID allocation if a roll-over occurred. Such IPIs
cannot be issued with interrupts disabled and ARM had to define
__ARCH_WANT_INTERRUPTS_ON_CTXSW.

This patch changes the check_context() function to
check_and_switch_context() called from switch_mm(). In case of
ASID-capable CPUs (ARMv6 onwards), if a new ASID is needed and the
interrupts are disabled, it defers the __new_context() and
cpu_switch_mm() calls to the post-lock switch hook where the interrupts
are enabled. Setting the reserved TTBR0 was also moved to
check_and_switch_context() from cpu_v7_switch_mm().

Reviewed-by: Will Deacon <[email protected]>
Tested-by: Will Deacon <[email protected]>
Reviewed-by: Frank Rowand <[email protected]>
Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

Conflicts:
	arch/arm/mm/proc-v7-2level.S

Change-Id: I48e39730c49ff8d30ce566e8a03cf54557869a52

ARM: Remove current_mm per-cpu variable

The current_mm variable was used to store the new mm between the
switch_mm() and switch_to() calls where an IPI to reset the context
could have set the wrong mm. Since the interrupts are disabled during
context switch, there is no need for this variable, current->active_mm
already points to the current mm when interrupts are re-enabled.

Reviewed-by: Will Deacon <[email protected]>
Tested-by: Will Deacon <[email protected]>
Reviewed-by: Frank Rowand <[email protected]>
Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

ARM: 7502/1: contextidr: avoid using bfi instruction during notifier

The bfi instruction is not available on ARMv6, so instead use an and/orr
sequence in the contextidr_notifier. This gets rid of the assembler
error:

  Assembler messages:
  Error: selected processor does not support ARM mode `bfi r3,r2,#0,#8'

Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>

Conflicts:
	arch/arm/mm/context.c

Change-Id: Id64c0145d0dcd1cfe9e2aba59f86c08ec5fbf649

ARM: mm: remove IPI broadcasting on ASID rollover

ASIDs are allocated to MMU contexts based on a rolling counter. This
means that after 255 allocations we must invalidate all existing ASIDs
via an expensive IPI mechanism to synchronise all of the online CPUs and
ensure that all tasks execute with an ASID from the new generation.

This patch changes the rollover behaviour so that we rely instead on the
hardware broadcasting of the TLB invalidation to avoid the IPI calls.
This works by keeping track of the active ASID on each core, which is
then reserved in the case of a rollover so that currently scheduled
tasks can continue to run. For cores without hardware TLB broadcasting,
we keep track of pending flushes in a cpumask, so cores can flush their
local TLB before scheduling a new mm.

Reviewed-by: Catalin Marinas <[email protected]>
Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

Conflicts:
	arch/arm/mm/context.c

Change-Id: I58990400aaaaef35319f7b3fb2f84fe7e46cb581

ARM: mm: avoid taking ASID spinlock on fastpath

When scheduling a new mm, we take a spinlock so that we can:

  1. Safely allocate a new ASID, if required
  2. Update our active_asids field without worrying about parallel
     updates to reserved_asids
  3. Ensure that we flush our local TLB, if required

However, this has the nasty affect of serialising context-switch across
all CPUs in the system. The usual (fast) case is where the next mm has
a valid ASID for the current generation. In such a scenario, we can
avoid taking the lock and instead use atomic64_xchg to update the
active_asids variable for the current CPU. If a rollover occurs on
another CPU (which would take the lock), when copying the active_asids
into the reserved_asids another atomic64_xchg is used to replace each
active_asids with 0. The fast path can then detect this case and fall
back to spinning on the lock.

Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

ARM: mm: use bitmap operations when allocating new ASIDs

When allocating a new ASID, we must take care not to re-assign a
reserved ASID-value to a new mm. This requires us to check each
candidate ASID against those currently reserved by other cores before
assigning a new ASID to the current mm.

This patch improves the ASID allocation algorithm by using a
bitmap-based approach. Rather than iterating over the reserved ASID
array for each candidate ASID, we simply find the first zero bit,
ensuring that those indices corresponding to reserved ASIDs are set
when flushing during a rollover event.

Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

ARM: 7649/1: mm: mm->context.id fix for big-endian

Since the new ASID code in b5466f8
("ARM: mm: remove IPI broadcasting on ASID rollover") was changed to
use 64bit operations it has broken the BE operation due to an issue
with the MM code accessing sub-fields of mm->context.id.

When running in BE mode we see the values in mm->context.id are stored
with the highest value first, so the LDR in the arch/arm/mm/proc-macros.S
reads the wrong part of this field. To resolve this, change the LDR in
the mmid macro to load from +4.

Acked-by: Will Deacon <[email protected]>
Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 7658/1: mm: fix race updating mm->context.id on ASID rollover

If a thread triggers an ASID rollover, other threads of the same process
must be made to wait until the mm->context.id for the shared mm_struct
has been updated to new generation and associated book-keeping (e.g.
TLB invalidation) has ben performed.

However, there is a *tiny* window where both mm->context.id and the
relevant active_asids entry are updated to the new generation, but the
TLB flush has not been performed, which could allow another thread to
return to userspace with a dirty TLB, potentially leading to data
corruption. In reality this will never occur because one CPU would need
to perform a context-switch in the time it takes another to do a couple
of atomic test/set operations but we should plug the race anyway.

This patch moves the active_asids update until after the potential TLB
flush on context-switch.

Cc: <[email protected]> # 3.8
Reviewed-by: Catalin Marinas <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 7659/1: mm: make mm->context.id an atomic64_t variable

mm->context.id is updated under asid_lock when a new ASID is allocated
to an mm_struct. However, it is also read without the lock when a task
is being scheduled and checking whether or not the current ASID
generation is up-to-date.

If two threads of the same process are being scheduled in parallel and
the bottom bits of the generation in their mm->context.id match the
current generation (that is, the mm_struct has not been used for ~2^24
rollovers) then the non-atomic, lockless access to mm->context.id may
yield the incorrect ASID.

This patch fixes this issue by making mm->context.id and atomic64_t,
ensuring that the generation is always read consistently. For code that
only requires access to the ASID bits (e.g. TLB flushing by mm), then
the value is accessed directly, which GCC converts to an ldrb.

Cc: <[email protected]> # 3.8
Reviewed-by: Catalin Marinas <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>

Conflicts:
	arch/arm/include/asm/mmu.h

Change-Id: I682895d6357a91ecc439c8543fa94f1aecbfcb4c

ARM: 7661/1: mm: perform explicit branch predictor maintenance when required

The ARM ARM requires branch predictor maintenance if, for a given ASID,
the instructions at a specific virtual address appear to change.

From the kernel's point of view, that means:

	- Changing the kernel's view of memory (e.g. switching to the
	  identity map)
	- ASID rollover (since ASIDs will be re-allocated to new tasks)

This patch adds explicit branch predictor maintenance when either of the
two conditions above are met.

Reviewed-by: Catalin Marinas <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 7684/1: errata: Workaround for Cortex-A15 erratum 798181 (TLBI/DSB operations)

On Cortex-A15 (r0p0..r3p2) the TLBI/DSB are not adequately shooting down
all use of the old entries. This patch implements the erratum workaround
which consists of:

1. Dummy TLBIMVAIS and DSB on the CPU doing the TLBI operation.
2. Send IPI to the CPUs that are running the same mm (and ASID) as the
   one being invalidated (or all the online CPUs for global pages).
3. CPU receiving the IPI executes a DMB and CLREX (part of the exception
   return code already).

Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Russell King <[email protected]>

Conflicts:
	arch/arm/Kconfig

Change-Id: I4513d042301a1faad817b83434280462cc176df1

msm: rtb: Log the context id in the rtb

Store the context id in the register trace buffer.
The process id can be derived from the context id.
This gives a general idea about what process was last
running when the RTB stopped.

Change-Id: I2fb8934d008b8cf3666f1df2652846c15faca776
Signed-off-by: Laura Abbott <[email protected]>
(cherry picked from commit 445eb9a)

Conflicts:

	arch/arm/mach-msm/include/mach/msm_rtb.h

ARM: 7767/1: let the ASID allocator handle suspended animation

commit ae120d9 upstream.

When a CPU is running a process, the ASID for that process is
held in a per-CPU variable (the "active ASIDs" array). When
the ASID allocator handles a rollover, it copies the active
ASIDs into a "reserved ASIDs" array to ensure that a process
currently running on another CPU will continue to run unaffected.
The active array is zero-ed to indicate that a rollover occurred.

Because of this mechanism, a reserved ASID is only remembered for
a single rollover. A subsequent rollover will completely refill
the reserved ASIDs array.

In a severely oversubscribed environment where a CPU can be
prevented from running for extended periods of time (think virtual
machines), the above has a horrible side effect:

[P{a} denotes process P running with ASID a]

	CPU-0		CPU-1

	A{x}				[active = <x 0>]

	[suspended]	runs B{y}	[active = <x y>]

					[rollover:
					 active = <0 0>
					 reserved = <x y>]

			runs B{y}	[active = <0 y>
					 reserved = <x y>]

					[rollover:
					 active = <0 0>
					 reserved = <0 y>]

			runs C{x}	[active = <0 x>]

	[resumes]

	runs A{x}

At that stage, both A and C have the same ASID, with deadly
consequences.

The fix is to preserve reserved ASIDs across rollovers if
the CPU doesn't have an active ASID when the rollover occurs.

Acked-by: Will Deacon <[email protected]>
Acked-by: Catalin Carinas <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

ARM: 7768/1: prevent risks of out-of-bound access in ASID allocator

commit b8e4a47 upstream.

On a CPU that never ran anything, both the active and reserved ASID
fields are set to zero. In this case the ASID_TO_IDX() macro will
return -1, which is not a very useful value to index a bitmap.

Instead of trying to offset the ASID so that ASID M1cha#1 is actually
bit 0 in the asid_map bitmap, just always ignore bit 0 and start
the search from bit 1. This makes the code a bit more readable,
and without risk of OoB access.

Acked-by: Will Deacon <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Reported-by: Catalin Marinas <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

ARM: 7703/1: Disable preemption in broadcast_tlb*_a15_erratum()

Commit 93dc688 (ARM: 7684/1: errata: Workaround for Cortex-A15 erratum
798181 (TLBI/DSB operations)) introduces calls to smp_processor_id() and
smp_call_function_many() with preemption enabled. This patch disables
preemption and also optimises the smp_processor_id() call in
broadcast_tlb_mm_a15_erratum(). The broadcast_tlb_a15_erratum() function
is changed to use smp_call_function() which disables preemption.

Signed-off-by: Catalin Marinas <[email protected]>
Reported-by: Geoff Levand <[email protected]>
Reported-by: Nicolas Pitre <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 7769/1: Cortex-A15: fix erratum 798181 implementation

commit 0d0752b upstream.

Looking into the active_asids array is not enough, as we also need
to look into the reserved_asids array (they both represent processes
that are currently running).

Also, not holding the ASID allocator lock is racy, as another CPU
could schedule that process and trigger a rollover, making the erratum
workaround miss an IPI.

Exposing this outside of context.c is a little ugly on the side, so
let's define a new entry point that the erratum workaround can call
to obtain the cpumask.

Acked-by: Will Deacon <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

arm: mm: Clean ASID patchset

Change-Id: Id5b0cc6d5300d293e33baf6603453bde0df6d6d8

android/lowmemorykiller: Ignore tasks with freed mm

A killed task can stay in the task list long after its
memory has been returned to the system, therefore
ignore any tasks whose mm struct has been freed.

Change-Id: I76394b203b4ab2312437c839976f0ecb7b6dde4e
CRs-fixed: 450383
Signed-off-by: Liam Mark <[email protected]>

Conflicts:
	arch/arm/kernel/smp.c
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request Jun 9, 2016
…_open

commit e7ac6c6666bec0a354758a1298d3231e4a635362 upstream.

Two SLES11 SP3 servers encountered similar crashes simultaneously
following some kind of SAN/tape target issue:

...
qla2xxx [0000:81:00.0]-801c:3: Abort command issued nexus=3:0:2 --  1 2002.
qla2xxx [0000:81:00.0]-801c:3: Abort command issued nexus=3:0:2 --  1 2002.
qla2xxx [0000:81:00.0]-8009:3: DEVICE RESET ISSUED nexus=3:0:2 cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-800c:3: do_reset failed for cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-800f:3: DEVICE RESET FAILED: Task management failed nexus=3:0:2 cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-8009:3: TARGET RESET ISSUED nexus=3:0:2 cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-800c:3: do_reset failed for cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-800f:3: TARGET RESET FAILED: Task management failed nexus=3:0:2 cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-8012:3: BUS RESET ISSUED nexus=3:0:2.
qla2xxx [0000:81:00.0]-802b:3: BUS RESET SUCCEEDED nexus=3:0:2.
qla2xxx [0000:81:00.0]-505f:3: Link is operational (8 Gbps).
qla2xxx [0000:81:00.0]-8018:3: ADAPTER RESET ISSUED nexus=3:0:2.
qla2xxx [0000:81:00.0]-00af:3: Performing ISP error recovery - ha=ffff88bf04d18000.
 rport-3:0-0: blocked FC remote port time out: removing target and saving binding
qla2xxx [0000:81:00.0]-505f:3: Link is operational (8 Gbps).
qla2xxx [0000:81:00.0]-8017:3: ADAPTER RESET SUCCEEDED nexus=3:0:2.
 rport-2:0-0: blocked FC remote port time out: removing target and saving binding
sg_rq_end_io: device detached
BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
IP: [<ffffffff8133b268>] __pm_runtime_idle+0x28/0x90
PGD 7e6586f067 PUD 7e5af06067 PMD 0 [1739975.390354] Oops: 0002 [M1cha#1] SMP
CPU 0
...
Supported: No, Proprietary modules are loaded [1739975.390463]
Pid: 27965, comm: ABCD Tainted: PF           X 3.0.101-0.29-default M1cha#1 HP ProLiant DL580 Gen8
RIP: 0010:[<ffffffff8133b268>]  [<ffffffff8133b268>] __pm_runtime_idle+0x28/0x90
RSP: 0018:ffff8839dc1e7c68  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff883f0592fc00 RCX: 0000000000000090
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000138
RBP: 0000000000000138 R08: 0000000000000010 R09: ffffffff81bd39d0
R10: 00000000000009c0 R11: ffffffff81025790 R12: 0000000000000001
R13: ffff883022212b80 R14: 0000000000000004 R15: ffff883022212b80
FS:  00007f8e54560720(0000) GS:ffff88407f800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000002a8 CR3: 0000007e6ced6000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ABCD (pid: 27965, threadinfo ffff8839dc1e6000, task ffff883592e0c640)
Stack:
 ffff883f0592fc00 00000000fffffffa 0000000000000001 ffff883022212b80
 ffff883eff772400 ffffffffa03fa309 0000000000000000 0000000000000000
 ffffffffa04003a0 ffff883f063196c0 ffff887f0379a930 ffffffff8115ea1e
Call Trace:
 [<ffffffffa03fa309>] st_open+0x129/0x240 [st]
 [<ffffffff8115ea1e>] chrdev_open+0x13e/0x200
 [<ffffffff811588a8>] __dentry_open+0x198/0x310
 [<ffffffff81167d74>] do_last+0x1f4/0x800
 [<ffffffff81168fe9>] path_openat+0xd9/0x420
 [<ffffffff8116946c>] do_filp_open+0x4c/0xc0
 [<ffffffff8115a00f>] do_sys_open+0x17f/0x250
 [<ffffffff81468d92>] system_call_fastpath+0x16/0x1b
 [<00007f8e4f617fd0>] 0x7f8e4f617fcf
Code: eb d3 90 48 83 ec 28 40 f6 c6 04 48 89 6c 24 08 4c 89 74 24 20 48 89 fd 48 89 1c 24 4c 89 64 24 10 41 89 f6 4c 89 6c 24 18 74 11 <f0> ff 8f 70 01 00 00 0f 94 c0 45 31 ed 84 c0 74 2b 4c 8d a5 a0
RIP  [<ffffffff8133b268>] __pm_runtime_idle+0x28/0x90
 RSP <ffff8839dc1e7c68>
CR2: 00000000000002a8

Analysis reveals the cause of the crash to be due to STp->device
being NULL. The pointer was NULLed via scsi_tape_put(STp) when it
calls scsi_tape_release(). In st_open() we jump to err_out after
scsi_block_when_processing_errors() completes and returns the
device as offline (sdev_state was SDEV_DEL):

1180 /* Open the device. Needs to take the BKL only because of incrementing the SCSI host
1181    module count. */
1182 static int st_open(struct inode *inode, struct file *filp)
1183 {
1184         int i, retval = (-EIO);
1185         int resumed = 0;
1186         struct scsi_tape *STp;
1187         struct st_partstat *STps;
1188         int dev = TAPE_NR(inode);
1189         char *name;
...
1217         if (scsi_autopm_get_device(STp->device) < 0) {
1218                 retval = -EIO;
1219                 goto err_out;
1220         }
1221         resumed = 1;
1222         if (!scsi_block_when_processing_errors(STp->device)) {
1223                 retval = (-ENXIO);
1224                 goto err_out;
1225         }
...
1264  err_out:
1265         normalize_buffer(STp->buffer);
1266         spin_lock(&st_use_lock);
1267         STp->in_use = 0;
1268         spin_unlock(&st_use_lock);
1269         scsi_tape_put(STp); <-- STp->device = 0 after this
1270         if (resumed)
1271                 scsi_autopm_put_device(STp->device);
1272         return retval;

The ref count for the struct scsi_tape had already been reduced
to 1 when the .remove method of the st module had been called.
The kref_put() in scsi_tape_put() caused scsi_tape_release()
to be called:

0266 static void scsi_tape_put(struct scsi_tape *STp)
0267 {
0268         struct scsi_device *sdev = STp->device;
0269
0270         mutex_lock(&st_ref_mutex);
0271         kref_put(&STp->kref, scsi_tape_release); <-- calls this
0272         scsi_device_put(sdev);
0273         mutex_unlock(&st_ref_mutex);
0274 }

In scsi_tape_release() the struct scsi_device in the struct
scsi_tape gets set to NULL:

4273 static void scsi_tape_release(struct kref *kref)
4274 {
4275         struct scsi_tape *tpnt = to_scsi_tape(kref);
4276         struct gendisk *disk = tpnt->disk;
4277
4278         tpnt->device = NULL; <<<---- where the dev is nulled
4279
4280         if (tpnt->buffer) {
4281                 normalize_buffer(tpnt->buffer);
4282                 kfree(tpnt->buffer->reserved_pages);
4283                 kfree(tpnt->buffer);
4284         }
4285
4286         disk->private_data = NULL;
4287         put_disk(disk);
4288         kfree(tpnt);
4289         return;
4290 }

Although the problem was reported on SLES11.3 the problem appears
in linux-next as well.

The crash is fixed by reordering the code so we no longer access
the struct scsi_tape after the kref_put() is done on it in st_open().

Signed-off-by: Shane Seymour <[email protected]>
Signed-off-by: Darren Lavender <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Acked-by: Kai Mäkisara <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request Jun 9, 2016
…hore set exits

commit 602b8593d2b4138c10e922eeaafe306f6b51817b upstream.

The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).

For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:

   Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
   000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
   010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
   Prev obj: start=ffff88003b45c180, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
   Next obj: start=ffff88003b45c200, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
   BUG: spinlock wrong CPU on CPU#2, test/18028
   general protection fault: 0000 [M1cha#1] SMP
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ M1cha#1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: spin_dump+0x53/0xc0
   Call Trace:
     spin_bug+0x30/0x40
     do_raw_spin_unlock+0x71/0xa0
     _raw_spin_unlock+0xe/0x10
     freeary+0x82/0x2a0
     ? _raw_spin_lock+0xe/0x10
     semctl_down.clone.0+0xce/0x160
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     SyS_semctl+0x236/0x2c0
     ? syscall_trace_leave+0xde/0x130
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
   RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
    RSP <ffff88003750fd68>
   ---[ end trace 783ebb76612867a0 ]---
   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ M1cha#1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: native_read_tsc+0x0/0x20
   Call Trace:
     ? delay_tsc+0x40/0x70
     __delay+0xf/0x20
     do_raw_spin_lock+0x96/0x140
     _raw_spin_lock+0xe/0x10
     sem_lock_and_putref+0x11/0x70
     SYSC_semtimedop+0x7bf/0x960
     ? handle_mm_fault+0xbf6/0x1880
     ? dequeue_task_fair+0x79/0x4a0
     ? __do_page_fault+0x19a/0x430
     ? kfree_debugcheck+0x16/0x40
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     ? do_audit_syscall_entry+0x66/0x70
     ? syscall_trace_enter_phase1+0x139/0x160
     SyS_semtimedop+0xe/0x10
     SyS_semop+0x10/0x20
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
   Kernel panic - not syncing: softlockup: hung tasks

I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).

The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).

After the patch I'm unable to reproduce the problem using the test case
[1].

[1] Test case used below:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <sys/wait.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>
    #include <errno.h>

    #define NSEM 1
    #define NSET 5

    int sid[NSET];

    void thread()
    {
            struct sembuf op;
            int s;
            uid_t pid = getuid();

            s = rand() % NSET;
            op.sem_num = pid % NSEM;
            op.sem_op = 1;
            op.sem_flg = SEM_UNDO;

            semop(sid[s], &op, 1);
            exit(EXIT_SUCCESS);
    }

    void create_set()
    {
            int i, j;
            pid_t p;
            union {
                    int val;
                    struct semid_ds *buf;
                    unsigned short int *array;
                    struct seminfo *__buf;
            } un;

            /* Create and initialize semaphore set */
            for (i = 0; i < NSET; i++) {
                    sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                    if (sid[i] < 0) {
                            perror("semget");
                            exit(EXIT_FAILURE);
                    }
            }
            un.val = 0;
            for (i = 0; i < NSET; i++) {
                    for (j = 0; j < NSEM; j++) {
                            if (semctl(sid[i], j, SETVAL, un) < 0)
                                    perror("semctl");
                    }
            }

            /* Launch threads that operate on semaphore set */
            for (i = 0; i < NSEM * NSET * NSET; i++) {
                    p = fork();
                    if (p < 0)
                            perror("fork");
                    if (p == 0)
                            thread();
            }

            /* Free semaphore set */
            for (i = 0; i < NSET; i++) {
                    if (semctl(sid[i], NSEM, IPC_RMID))
                            perror("IPC_RMID");
            }

            /* Wait for forked processes to exit */
            while (wait(NULL)) {
                    if (errno == ECHILD)
                            break;
            };
    }

    int main(int argc, char **argv)
    {
            pid_t p;

            srand(time(NULL));

            while (1) {
                    p = fork();
                    if (p < 0) {
                            perror("fork");
                            exit(EXIT_FAILURE);
                    }
                    if (p == 0) {
                            create_set();
                            goto end;
                    }

                    /* Wait for forked processes to exit */
                    while (wait(NULL)) {
                            if (errno == ECHILD)
                                    break;
                    };
            }
    end:
            return 0;
    }

[[email protected]: use normal comment layout]
Signed-off-by: Herton R. Krzesinski <[email protected]>
Acked-by: Manfred Spraul <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Rafael Aquini <[email protected]>
CC: Aristeu Rozanski <[email protected]>
Cc: David Jeffery <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Signed-off-by: Linus Torvalds <[email protected]>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request Jun 9, 2016
commit ba51b6be38c122f7dab40965b4397aaf6188a464 upstream.

Hit the following splat testing VRF change for ipsec:

[  113.475692] ===============================
[  113.476194] [ INFO: suspicious RCU usage. ]
[  113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED Not tainted
[  113.477545] -------------------------------
[  113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 Illegal context switch in RCU read-side critical section!
[  113.479288]
[  113.479288] other info that might help us debug this:
[  113.479288]
[  113.480207]
[  113.480207] rcu_scheduler_active = 1, debug_locks = 1
[  113.480931] 2 locks held by setkey/6829:
[  113.481371]  #0:  (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff814e9887>] pfkey_sendmsg+0xfb/0x213
[  113.482509]  M1cha#1:  (rcu_read_lock){......}, at: [<ffffffff814e767f>] rcu_read_lock+0x0/0x6e
[  113.483509]
[  113.483509] stack backtrace:
[  113.484041] CPU: 0 PID: 6829 Comm: setkey Not tainted 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED
[  113.485422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[  113.486845]  0000000000000001 ffff88001d4c7a98 ffffffff81518af2 ffffffff81086962
[  113.487732]  ffff88001d538480 ffff88001d4c7ac8 ffffffff8107ae75 ffffffff8180a154
[  113.488628]  0000000000000b30 0000000000000000 00000000000000d0 ffff88001d4c7ad8
[  113.489525] Call Trace:
[  113.489813]  [<ffffffff81518af2>] dump_stack+0x4c/0x65
[  113.490389]  [<ffffffff81086962>] ? console_unlock+0x3d6/0x405
[  113.491039]  [<ffffffff8107ae75>] lockdep_rcu_suspicious+0xfa/0x103
[  113.491735]  [<ffffffff81064032>] rcu_preempt_sleep_check+0x45/0x47
[  113.492442]  [<ffffffff8106404d>] ___might_sleep+0x19/0x1c8
[  113.493077]  [<ffffffff81064268>] __might_sleep+0x6c/0x82
[  113.493681]  [<ffffffff81133190>] cache_alloc_debugcheck_before.isra.50+0x1d/0x24
[  113.494508]  [<ffffffff81134876>] kmem_cache_alloc+0x31/0x18f
[  113.495149]  [<ffffffff814012b5>] skb_clone+0x64/0x80
[  113.495712]  [<ffffffff814e6f71>] pfkey_broadcast_one+0x3d/0xff
[  113.496380]  [<ffffffff814e7b84>] pfkey_broadcast+0xb5/0x11e
[  113.497024]  [<ffffffff814e82d1>] pfkey_register+0x191/0x1b1
[  113.497653]  [<ffffffff814e9770>] pfkey_process+0x162/0x17e
[  113.498274]  [<ffffffff814e9895>] pfkey_sendmsg+0x109/0x213

In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes
the RCU lock.

Since pfkey_broadcast takes the RCU lock the allocation argument is
pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock.
The one call outside of rcu can be done with GFP_KERNEL.

Fixes: 7f6b9db ("af_key: locking change")
Signed-off-by: David Ahern <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request Jun 9, 2016
commit 74e98eb085889b0d2d4908f59f6e00026063014f upstream.

There was no verification that an underlying transport exists when creating
a connection, this would cause dereferencing a NULL ptr.

It might happen on sockets that weren't properly bound before attempting to
send a message, which will cause a NULL ptr deref:

[135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [M1cha#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[135546.051270] Modules linked in:
[135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
[135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
[135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
[135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
[135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
[135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
[135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
[135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
[135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
[135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
[135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
[135546.064723] Stack:
[135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
[135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
[135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
[135546.068629] Call Trace:
[135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
[135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
[135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
[135546.071981] rds_sendmsg (net/rds/send.c:1058)
[135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
[135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
[135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
[135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
[135546.076349] ? __might_fault (mm/memory.c:3795)
[135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
[135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
[135546.078856] SYSC_sendto (net/socket.c:1657)
[135546.079596] ? SYSC_connect (net/socket.c:1628)
[135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
[135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
[135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
[135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
[135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1

Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request Jun 9, 2016
commit 34f5b0066435ffb793049b84fafd29fa195bcf90 upstream.

If we didn't call ATMARP_MKIP before ATMARP_ENCAP the VCC descriptor is
non-existant and we'll end up dereferencing a NULL ptr:

[1033173.491930] kasan: GPF could be caused by NULL-ptr deref or user memory accessirq event stamp: 123386
[1033173.493678] general protection fault: 0000 [M1cha#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[1033173.493689] Modules linked in:
[1033173.493697] CPU: 9 PID: 23815 Comm: trinity-c64 Not tainted 4.2.0-next-20150911-sasha-00043-g353d875-dirty #2545
[1033173.493706] task: ffff8800630c4000 ti: ffff880063110000 task.ti: ffff880063110000
[1033173.493823] RIP: clip_ioctl (net/atm/clip.c:320 net/atm/clip.c:689)
[1033173.493826] RSP: 0018:ffff880063117a88  EFLAGS: 00010203
[1033173.493828] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 000000000000000c
[1033173.493830] RDX: 0000000000000002 RSI: ffffffffb3f10720 RDI: 0000000000000014
[1033173.493832] RBP: ffff880063117b80 R08: ffff88047574d9a4 R09: 0000000000000000
[1033173.493834] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1000c622f53
[1033173.493836] R13: ffff8800cb905500 R14: ffff8808d6da2000 R15: 00000000fffffdfd
[1033173.493840] FS:  00007fa56b92d700(0000) GS:ffff880478000000(0000) knlGS:0000000000000000
[1033173.493843] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[1033173.493845] CR2: 0000000000000000 CR3: 00000000630e8000 CR4: 00000000000006a0
[1033173.493855] Stack:
[1033173.493862]  ffffffffb0b60444 000000000000eaea 0000000041b58ab3 ffffffffb3c3ce32
[1033173.493867]  ffffffffb0b6f3e0 ffffffffb0b60444 ffffffffb5ea2e50 1ffff1000c622f5e
[1033173.493873]  ffff8800630c4cd8 00000000000ee09a ffffffffb3ec4888 ffffffffb5ea2de8
[1033173.493874] Call Trace:
[1033173.494108] do_vcc_ioctl (net/atm/ioctl.c:170)
[1033173.494113] vcc_ioctl (net/atm/ioctl.c:189)
[1033173.494116] svc_ioctl (net/atm/svc.c:605)
[1033173.494200] sock_do_ioctl (net/socket.c:874)
[1033173.494204] sock_ioctl (net/socket.c:958)
[1033173.494244] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
[1033173.494290] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
[1033173.494295] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
[1033173.494362] Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 50 09 00 00 49 8b 9e 60 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 14 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 14 09 00
All code

========
   0:   fa                      cli
   1:   48 c1 ea 03             shr    $0x3,%rdx
   5:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)
   9:   0f 85 50 09 00 00       jne    0x95f
   f:   49 8b 9e 60 06 00 00    mov    0x660(%r14),%rbx
  16:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  1d:   fc ff df
  20:   48 8d 7b 14             lea    0x14(%rbx),%rdi
  24:   48 89 fa                mov    %rdi,%rdx
  27:   48 c1 ea 03             shr    $0x3,%rdx
  2b:*  0f b6 04 02             movzbl (%rdx,%rax,1),%eax               <-- trapping instruction
  2f:   48 89 fa                mov    %rdi,%rdx
  32:   83 e2 07                and    $0x7,%edx
  35:   38 d0                   cmp    %dl,%al
  37:   7f 08                   jg     0x41
  39:   84 c0                   test   %al,%al
  3b:   0f 85 14 09 00 00       jne    0x955

Code starting with the faulting instruction
===========================================
   0:   0f b6 04 02             movzbl (%rdx,%rax,1),%eax
   4:   48 89 fa                mov    %rdi,%rdx
   7:   83 e2 07                and    $0x7,%edx
   a:   38 d0                   cmp    %dl,%al
   c:   7f 08                   jg     0x16
   e:   84 c0                   test   %al,%al
  10:   0f 85 14 09 00 00       jne    0x92a
[1033173.494366] RIP clip_ioctl (net/atm/clip.c:320 net/atm/clip.c:689)
[1033173.494368]  RSP <ffff880063117a88>

Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request Jun 9, 2016
commit 8832317f662c06f5c06e638f57bfe89a71c9b266 upstream.

Currently we do not validate rtas.entry before calling enter_rtas(). This
leads to a kernel oops when user space calls rtas system call on a powernv
platform (see below). This patch adds code to validate rtas.entry before
making enter_rtas() call.

  Oops: Exception in kernel mode, sig: 4 [M1cha#1]
  SMP NR_CPUS=1024 NUMA PowerNV
  task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
  NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
  REGS: c0000007e1a7b920 TRAP: 0e40   Not tainted  (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
  MSR: 1000000000081000 <HV,ME>  CR: 00000000  XER: 00000000
  CFAR: c000000000009c0c SOFTE: 0
  NIP [0000000000000000]           (null)
  LR [0000000000009c14] 0x9c14
  Call Trace:
  [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
  [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
  [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98

Fixes: 55190f8 ("powerpc: Add skeleton PowerNV platform")
Reported-by: NAGESWARA R. SASTRY <[email protected]>
Signed-off-by: Vasant Hegde <[email protected]>
[mpe: Reword change log, trim oops, and add stable + fixes]
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request Jun 9, 2016
commit eddd3826a1a0190e5235703d1e666affa4d13b96 upstream.

Dmitry Vyukov reported the following using trinity and the memory
error detector AddressSanitizer
(https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).

[ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
address ffff88002e280000
[ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
[ 124.578633] Accessed by thread T10915:
[ 124.579295] inlined in describe_heap_address
./arch/x86/mm/asan/report.c:164
[ 124.579295] #0 ffffffff810dd277 in asan_report_error
./arch/x86/mm/asan/report.c:278
[ 124.580137] M1cha#1 ffffffff810dc6a0 in asan_check_region
./arch/x86/mm/asan/asan.c:37
[ 124.581050] M1cha#2 ffffffff810dd423 in __tsan_read8 ??:0
[ 124.581893] #3 ffffffff8107c093 in get_wchan
./arch/x86/kernel/process_64.c:444

The address checks in the 64bit implementation of get_wchan() are
wrong in several ways:

 - The lower bound of the stack is not the start of the stack
   page. It's the start of the stack page plus sizeof (struct
   thread_info)

 - The upper bound must be:

       top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).

   The 2 * sizeof(unsigned long) is required because the stack pointer
   points at the frame pointer. The layout on the stack is: ... IP FP
   ... IP FP. So we need to make sure that both IP and FP are in the
   bounds.

Fix the bound checks and get rid of the mix of numeric constants, u64
and unsigned long. Making all unsigned long allows us to use the same
function for 32bit as well.

Use READ_ONCE() when accessing the stack. This does not prevent a
concurrent wakeup of the task and the stack changing, but at least it
avoids TOCTOU.

Also check task state at the end of the loop. Again that does not
prevent concurrent changes, but it avoids walking for nothing.

Add proper comments while at it.

Reported-by: Dmitry Vyukov <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Based-on-patch-from: Wolfram Gloger <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Kostya Serebryany <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: kasan-dev <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Wolfram Gloger <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
[lizf: Backported to 3.4:
 - s/READ_ONCE/ACCESS_ONCE
 - remove TOP_OF_KERNEL_STACK_PADDING]
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request Jun 9, 2016
…fy a fault

commit 2f84a8990ebbe235c59716896e017c6b2ca1200f upstream.

SunDong reported the following on

  https://bugzilla.kernel.org/show_bug.cgi?id=103841

	I think I find a linux bug, I have the test cases is constructed. I
	can stable recurring problems in fedora22(4.0.4) kernel version,
	arch for x86_64.  I construct transparent huge page, when the parent
	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
	huge page area, it has the opportunity to lead to huge page copy on
	write failure, and then it will munmap the child corresponding mmap
	area, but then the child mmap area with VM_MAYSHARE attributes, child
	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
	functions (vma - > vm_flags & VM_MAYSHARE).

There were a number of problems with the report (e.g.  it's hugetlbfs that
triggers this, not transparent huge pages) but it was fundamentally
correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
looks like this

	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
	 pgoff 0 file ffff88106bdb9800 private_data           (null)
	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
	 ------------
	 kernel BUG at mm/hugetlb.c:462!
	 SMP
	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default M1cha#1
	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
	 set_vma_resv_flags+0x2d/0x30

The VM_BUG_ON is correct because private and shared mappings have
different reservation accounting but the warning clearly shows that the
VMA is shared.

When a private COW fails to allocate a new page then only the process
that created the VMA gets the page -- all the children unmap the page.
If the children access that data in the future then they get killed.

The problem is that the same file is mapped shared and private.  During
the COW, the allocation fails, the VMAs are traversed to unmap the other
private pages but a shared VMA is found and the bug is triggered.  This
patch identifies such VMAs and skips them.

Signed-off-by: Mel Gorman <[email protected]>
Reported-by: SunDong <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: David Rientjes <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request Jun 9, 2016
commit e81107d4c6bd098878af9796b24edc8d4a9524fd upstream.

My colleague ran into a program stall on a x86_64 server, where
n_tty_read() was waiting for data even if there was data in the buffer
in the pty.  kernel stack for the stuck process looks like below.
 #0 [ffff88303d107b58] __schedule at ffffffff815c4b20
 M1cha#1 [ffff88303d107bd0] schedule at ffffffff815c513e
 M1cha#2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
 #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
 #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
 #5 [ffff88303d107dd0] tty_read at ffffffff81368013
 #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
 #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
 #8 [ffff88303d107f00] sys_read at ffffffff811a4306
 #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7

There seems to be two problems causing this issue.

First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
updates ldata->commit_head using smp_store_release() and then checks
the wait queue using waitqueue_active().  However, since there is no
memory barrier, __receive_buf() could return without calling
wake_up_interactive_poll(), and at the same time, n_tty_read() could
start to wait in wait_woken() as in the following chart.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
if (waitqueue_active(&tty->read_wait))
/* Memory operations issued after the
   RELEASE may be completed before the
   RELEASE operation has completed */
                                        add_wait_queue(&tty->read_wait, &wait);
                                        ...
                                        if (!input_available_p(tty, 0)) {
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

The second problem is that n_tty_read() also lacks a memory barrier
call and could also cause __receive_buf() to return without calling
wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken()
as in the chart below.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
                                        spin_lock_irqsave(&q->lock, flags);
                                        /* from add_wait_queue() */
                                        ...
                                        if (!input_available_p(tty, 0)) {
                                        /* Memory operations issued after the
                                           RELEASE may be completed before the
                                           RELEASE operation has completed */
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
if (waitqueue_active(&tty->read_wait))
                                        __add_wait_queue(q, wait);
                                        spin_unlock_irqrestore(&q->lock,flags);
                                        /* from add_wait_queue() */
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

There are also other places in drivers/tty/n_tty.c which have similar
calls to waitqueue_active(), so instead of adding many memory barrier
calls, this patch simply removes the call to waitqueue_active(),
leaving just wake_up*() behind.

This fixes both problems because, even though the memory access before
or after the spinlocks in both wake_up*() and add_wait_queue() can
sneak into the critical section, it cannot go past it and the critical
section assures that they will be serialized (please see "INTER-CPU
ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
better explanation).  Moreover, the resulting code is much simpler.

Latency measurement using a ping-pong test over a pty doesn't show any
visible performance drop.

Signed-off-by: Kosuke Tatsukawa <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[lizf: Backported to 3.4:
 - adjust context
 - s/wake_up_interruptible_poll/wake_up_interruptible/
 - drop changes to __receive_buf() and n_tty_set_termios()]
Signed-off-by: Zefan Li <[email protected]>
sndnvaps pushed a commit to multirom-aries/android_kernel_xiaomi_aries that referenced this pull request Sep 30, 2016
Once we failed to merge inline data into inode page during flushing inline
inode, we will skip invoking inode_dec_dirty_pages, which makes dirty page
count incorrect, result in panic in ->evict_inode, Fix it.

------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:336!
invalid opcode: 0000 [M1cha#1] PREEMPT SMP
CPU: 3 PID: 10004 Comm: umount Tainted: G           O    4.6.0-rc5+ #17
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: f0c33000 ti: c5212000 task.ti: c5212000
EIP: 0060:[<f89aacb5>] EFLAGS: 00010202 CPU: 3
EIP is at f2fs_evict_inode+0x85/0x490 [f2fs]
EAX: 00000001 EBX: c4529ea0 ECX: 00000001 EDX: 00000000
ESI: c0131000 EDI: f89dd0a0 EBP: c5213e9c ESP: c5213e7
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: b75878c0 CR3: 1a36a700 CR4: 000406f0
Stack:
 c4529ea0 c4529ef4 c5213e8c c176d45c c4529ef4 00000000 c4529ea0 c4529fac
 f89dd0a0 c5213eb0 c1204a68 c5213ed8 c452a2b4 c6680930 c5213ec0 c1204b64
 c6680d44 c6680620 c5213eec c120588d ee84b000 ee84b5c0 c5214000 ee84b5e0
Call Trace:
 [<c176d45c>] ? _raw_spin_unlock+0x2c/0x50
 [<c1204a68>] evict+0xa8/0x170
 [<c1204b64>] dispose_list+0x34/0x50
 [<c120588d>] evict_inodes+0x10d/0x130
 [<c11ea941>] generic_shutdown_super+0x41/0xe0
 [<c1185190>] ? unregister_shrinker+0x40/0x50
 [<c1185190>] ? unregister_shrinker+0x40/0x50
 [<c11eac52>] kill_block_super+0x22/0x70
 [<f89af23e>] kill_f2fs_super+0x1e/0x20 [f2fs]
 [<c11eae1d>] deactivate_locked_super+0x3d/0x70
 [<c11eb383>] deactivate_super+0x43/0x60
 [<c1208ec9>] cleanup_mnt+0x39/0x80
 [<c1208f50>] __cleanup_mnt+0x10/0x20
 [<c107d091>] task_work_run+0x71/0x90
 [<c105725a>] exit_to_usermode_loop+0x72/0x9e
 [<c1001c7c>] do_fast_syscall_32+0x19c/0x1c0
 [<c176dd48>] sysenter_past_esp+0x45/0x74
EIP: [<f89aacb5>] f2fs_evict_inode+0x85/0x490 [f2fs] SS:ESP 0068:c5213e78
---[ end trace d30536330b7fdc58 ]---

Change-Id: Iad209ae94828e8e38955459d1ea9573c9e11ede6
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants