Re: Cherry-picked commit messages of interest from the 5.4 merge window

Subject: Re: Cherry-picked commit messages of interest from the 5.4 merge window
From: Alex Pilon <alp+l3go [ at ] alexpilon [ dot ] ca>
Date: Tue, 1 Oct 2019 09:52:47 -0400

On Tue, Oct 01, 2019 at 08:47:07AM -0400, Alex Pilon wrote:
> Richard suggested I just post this here in advance of the meeting. Mix
> of merge commits from Linus and the commits proper. […] More of these
> on my work laptop. Many of them somewhat relevant at work too.

Attached.

Plenty more detailed commits turn up if you run git log v5.3.. --merges
--author='Linus Torvalds', such as some sched, perf, and RCU ones. I only
highlighted the funny or otherwise interesting ones.

commit 110ea1d833ad291272d52e0a40a06157a3d9ba17
Author: Alexander Schremmer <alex [ at ] alexanderweb [ dot ] de>
Date:   Thu Aug 22 13:48:33 2019 +0200

    platform/x86: thinkpad_acpi: Add ThinkPad PrivacyGuard

    This feature is found optionally in T480s, T490, T490s.

    The feature is called lcdshadow and visible via
    /proc/acpi/ibm/lcdshadow.

    The ACPI methods \_SB.PCI0.LPCB.EC.HKEY.{GSSS,SSSS,TSSS,CSSS} are
    available in these machines. They get, set, toggle or change the state
    apparently.

    The patch was tested on a 5.0 series kernel on a T480s.

commit e86c2c8b9380440bbe761b8e2f63ab6b04a45ac2
Author: Brendan Shanks <bshanks [ at ] codeweavers [ dot ] com>
Date:   Thu Sep 5 16:22:21 2019 -0700

    x86/umip: Add emulation (spoofing) for UMIP covered instructions in 64-bit processes as well

    Add emulation (spoofing) of the SGDT, SIDT, and SMSW instructions for 64-bit
    processes.

    Wine users have encountered a number of 64-bit Windows games that use
    these instructions (particularly SGDT), and were crashing when run on
    UMIP-enabled systems.

commit e0d60a1e68a3fbf42cdf3546004e00230d9048ba
Merge: 22331f895298 6365b842aae4
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Mon Sep 16 19:06:29 2019 -0700

    Merge branch 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull x86 entry updates from Ingo Molnar:
     "This contains x32 and compat syscall improvements, the biggest one of
      which splits x32 syscalls into their own table, which allows new
      syscalls to share the x32 and x86-64 number - which turns the
      512-547 special syscall numbers range into a legacy wart that won't be
      extended going forward"

commit f240652b6032b48ad7fa35c5e701cc4c8d697c0b
Author: Dave Hansen <dave [ dot ] hansen [ at ] linux [ dot ] intel [ dot ] com>
Date:   Fri Jul 5 10:53:21 2019 -0700

    x86/mpx: Remove MPX APIs

    MPX is being removed from the kernel due to a lack of support in the
    toolchain going forward (gcc).

    The first step is to remove the userspace-visible ABIs so that applications
    will stop using it.  The most visible ones are the enable/disable prctl()s.
    Remove them first.

    This is the most minimal and least invasive change needed to ensure that
    apps stop using MPX with new kernels.

commit 7e67a859997aad47727aff9c5a32e160da079ce3
Merge: 772c1d06bd40 563c4f85f9f0
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Mon Sep 16 17:25:49 2019 -0700

    Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull scheduler updates from Ingo Molnar:

     […]

     - Add another series of patches that brings the -rt (PREEMPT_RT) tree
       closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
       into a new CONFIG_PREEMPTION category that will allow the eventual
       introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
       to go though.

     […]

     - Improve load-balancing on AMD EPYC systems.

commit eb92692b2544d3f415887dbbc98499843dfe568b
Author: Quentin Perret <quentin [ dot ] perret [ at ] arm [ dot ] com>
Date:   Thu Sep 12 11:44:04 2019 +0200

    sched/fair: Speed-up energy-aware wake-ups
    
    EAS computes the energy impact of migrating a waking task when deciding
    on which CPU it should run. However, the current approach is known to
    have a high algorithmic complexity, which can result in prohibitively
    high wake-up latencies on systems with complex energy models, such as
    systems with per-CPU DVFS. On such systems, the algorithm complexity is
    in O(n^2) (ignoring the cost of searching for performance states in the
    EM) with 'n' the number of CPUs.
    
    To address this, re-factor the EAS wake-up path to compute the energy
    'delta' (with and without the task) on a per-performance domain basis,
    rather than system-wide, which brings the complexity down to O(n).
    
    No functional changes intended.
    
    Test results
    ~~~~~~~~~~~~
    
    * Setup: Tested on a Google Pixel 3, with a Snapdragon 845 (4+4 CPUs,
      A55/A75). Base kernel is 5.3-rc5 + Pixel3 specific patches. Android
      userspace, no graphics.
    
    * Test case:  Run a periodic rt-app task, with 16ms period, ramping down
      from 70% to 10%, in 5% steps of 500 ms each (json avail. at [1]).
      Frequencies of all CPUs are pinned to max (using scaling_min_freq
      CPUFreq sysfs entries) to reduce variability. The time to run
      select_task_rq_fair() is measured using the function profiler
      (/sys/kernel/debug/tracing/trace_stat/function*). See the test script
      for more details [2].
    
    Test 1:
    
    I hacked the DT to 'fake' per-CPU DVFS. That is, we end up with one
    CPUFreq policy per CPU (8 policies in total). Since all frequencies are
    pinned to max for the test, this should have no impact on the actual
    frequency selection, but it does in the EAS calculation.
    
          +---------------------------+----------------------------------+
          | Without patch             | With patch                       |
    +-----+-----+----------+----------+-----+-----------------+----------+
    | CPU | Hit | Avg (us) | s^2 (us) | Hit | Avg (us)        | s^2 (us) |
    |-----+-----+----------+----------+-----+-----------------+----------+
    |  0  | 274 | 38.303   | 1750.239 | 401 | 14.126 (-63.1%) | 146.625  |
    |  1  | 197 | 49.529   | 1695.852 | 314 | 16.135 (-67.4%) | 167.525  |
    |  2  | 142 | 34.296   | 1758.665 | 302 | 14.133 (-58.8%) | 130.071  |
    |  3  | 172 | 31.734   | 1490.975 | 641 | 14.637 (-53.9%) | 139.189  |
    |  4  | 316 | 7.834    | 178.217  | 425 | 5.413  (-30.9%) | 20.803   |
    |  5  | 447 | 8.424    | 144.638  | 556 | 5.929  (-29.6%) | 27.301   |
    |  6  | 581 | 14.886   | 346.793  | 456 | 5.711  (-61.6%) | 23.124   |
    |  7  | 456 | 10.005   | 211.187  | 997 | 4.708  (-52.9%) | 21.144   |
    +-----+-----+----------+----------+-----+-----------------+----------+
                 * Hit, Avg and s^2 are as reported by the function profiler
    
    Test 2:
    I also ran the same test with a normal DT, with 2 CPUFreq policies, to
    see if this causes regressions in the most common case.
    
          +---------------------------+----------------------------------+
          | Without patch             | With patch                       |
    +-----+-----+----------+----------+-----+-----------------+----------+
    | CPU | Hit | Avg (us) | s^2 (us) | Hit | Avg (us)        | s^2 (us) |
    |-----+-----+----------+----------+-----+-----------------+----------+
    |  0  | 345 | 22.184   | 215.321  | 580 | 18.635 (-16.0%) | 146.892  |
    |  1  | 358 | 18.597   | 200.596  | 438 | 12.934 (-30.5%) | 104.604  |
    |  2  | 359 | 25.566   | 200.217  | 397 | 10.826 (-57.7%) | 74.021   |
    |  3  | 362 | 16.881   | 200.291  | 718 | 11.455 (-32.1%) | 102.280  |
    |  4  | 457 | 3.822    | 9.895    | 757 | 4.616  (+20.8%) | 13.369   |
    |  5  | 344 | 4.301    | 7.121    | 594 | 5.320  (+23.7%) | 18.798   |
    |  6  | 472 | 4.326    | 7.849    | 464 | 5.648  (+30.6%) | 22.022   |
    |  7  | 331 | 4.630    | 13.937   | 408 | 5.299  (+14.4%) | 18.273   |
    +-----+-----+----------+----------+-----+-----------------+----------+
                 * Hit, Avg and s^2 are as reported by the function profiler
    
    In addition to these two tests, I also ran 50 iterations of the Lisa
    EAS functional test suite [3] with this patch applied on Arm Juno r0,
    Arm Juno r2, Arm TC2 and Hikey960, and could not see any regressions
    (all EAS functional tests are passing).
    
     [1] https://paste.debian.net/1100055/
     [2] https://paste.debian.net/1100057/
     [3] https://github.com/ARM-software/lisa/blob/master/lisa/tests/scheduler/eas_behaviour.py
    
    Signed-off-by: Quentin Perret <quentin [ dot ] perret [ at ] arm [ dot ] com>
    Cc: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Cc: Peter Zijlstra <peterz [ at ] infradead [ dot ] org>
    Cc: Thomas Gleixner <tglx [ at ] linutronix [ dot ] de>
    Cc: dietmar [ dot ] eggemann [ at ] arm [ dot ] com
    Cc: juri [ dot ] lelli [ at ] redhat [ dot ] com
    Cc: morten [ dot ] rasmussen [ at ] arm [ dot ] com
    Cc: qais [ dot ] yousef [ at ] arm [ dot ] com
    Cc: qperret [ at ] qperret [ dot ] net
    Cc: rjw [ at ] rjwysocki [ dot ] net
    Cc: tkjos [ at ] google [ dot ] com
    Cc: valentin [ dot ] schneider [ at ] arm [ dot ] com
    Cc: vincent [ dot ] guittot [ at ] linaro [ dot ] org
    Link: https://lkml.kernel.org/r/20190912094404 [ dot ] 13802-1-qperret [ at ] qperret [ dot ] net
    Signed-off-by: Ingo Molnar <mingo [ at ] kernel [ dot ] org>

End of an era.

commit cf07cb1ff4ea008abf06c95878c700cf1dd65c3e
Author: Christoph Hellwig <hch [ at ] lst [ dot ] de>
Date:   Tue Aug 13 09:25:01 2019 +0200

    ia64: remove support for the SGI SN2 platform

    The SGI SN2 (early Altix) is a very non-standard IA64 platform that was
    at the very high end of even IA64 hardware, and has been discontinued
    a long time ago.  Remove it because there are no upstream users left, and it
    has magic hooks all over the kernel.

commit e77fafe9afb53b7f4d8176c5cd5c10c43a905bc8
Merge: 52a5525214d0 e376897f424a
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Mon Sep 16 14:31:40 2019 -0700

    Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
    
    Pull arm64 updates from Will Deacon:
     "Although there isn't tonnes of code in terms of line count, there are
      a fair few headline features which I've noted both in the tag and also
      in the merge commits when I pulled everything together.
    
      The part I'm most pleased with is that we had 35 contributors this
      time around, which feels like a big jump from the usual small group of
      core arm64 arch developers. Hopefully they all enjoyed it so much that
      they'll continue to contribute, but we'll see.
    
      It's probably worth highlighting that we've pulled in a branch from
      the risc-v folks which moves our CPU topology code out to where it can
      be shared with others.
    
      Summary:
    
       - 52-bit virtual addressing in the kernel
    
       - New ABI to allow tagged user pointers to be dereferenced by
         syscalls
    
       - Early RNG seeding by the bootloader
    
       […]
    
       - Fix TLB invalidation in light of recent architectural
         clarifications
    
       […]
    
       - Relaxation of implicit I/O memory barriers
    
       - Build with RELR relocations when toolchain supports them
    
       - Numerous cleanups and non-critical fixes"

commit c17112a5c413f20188da276c138484e7127cdc84
Merge: 4d856f72c10e 821cc7b0b205
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Mon Sep 16 09:28:19 2019 -0700

    Merge tag 'core-process-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
    
    Pull pidfd/waitid updates from Christian Brauner:
     "This contains two features and various tests.
    
      First, it adds support for waiting on process through pidfds by adding
      the P_PIDFD type to the waitid() syscall. This completes the basic
      functionality of the pidfd api (cf. [1]). In the meantime we also have
      a new addition to the userspace projects that make use of the pidfd
      api. The qt project was nice enough to send a mail pointing out that
      they have a pr up to switch to the pidfd api (cf. [2]).
    
      Second, this tag contains an extension to the waitid() syscall to make
      it possible to wait on the current process group in a race free manner
      (even though the actual problem is very unlikely) by specifying 0
      together with the P_PGID type. This extension traces back to a
      discussion on the glibc development mailing list.
    
      There are also a range of tests for the features above. Additionally,
      the test-suite which detected the pidfd-polling race we fixed in [3]
      is included in this tag"
    
    [1] https://lwn.net/Articles/794707/
    [2] https://codereview.qt-project.org/c/qt/qtbase/+/108456
    [3] commit b191d6491be6 ("pidfd: fix a poll race when setting exit_state")
    
    * tag 'core-process-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
      waitid: Add support for waiting for the current process group
      tests: add pidfd poll tests
      tests: move common definitions and functions into pidfd.h
      pidfd: add pidfd_wait tests
      pidfd: add P_PIDFD to waitid()
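
For anyone who has not played with pidfds yet, here is a rough sketch of what
waiting through one looks like from userspace. It is Linux-only and assumes a
5.4 or later kernel. The function name reap_via_pidfd() is made up for
illustration, and the fallback #defines (P_PIDFD as 3, pidfd_open as syscall
434) are only there for older libc headers; 434 is not the number on every
architecture.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef P_PIDFD
#define P_PIDFD 3            /* assumption: value from the kernel uapi header */
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434   /* assumption: number used on most architectures */
#endif

/*
 * Hypothetical helper: fork a child that exits with `code`, then reap
 * it through a pidfd. Returns the child's exit status, or -1 if the
 * pidfd path is unavailable (the child is then reaped with waitpid()).
 */
int reap_via_pidfd(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);         /* child: exit immediately */

    /* pidfd_open(pid, 0): get a file descriptor referring to the child. */
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    siginfo_t info = {0};

    if (pidfd < 0 || waitid(P_PIDFD, (id_t)pidfd, &info, WEXITED) < 0) {
        if (pidfd >= 0)
            close(pidfd);
        waitpid(pid, NULL, 0);   /* old kernel: reap the traditional way */
        return -1;
    }
    close(pidfd);
    return info.si_code == CLD_EXITED ? info.si_status : -1;
}
```

The point of the file descriptor is that it keeps referring to the same
process, so the wait cannot hit a recycled PID.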

commit e444d51b14c4795074f485c79debd234931f0e49
Merge: c6b48dad92ae 1dce2df3ee06
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Wed Sep 18 10:50:47 2019 -0700

    Merge tag 'tty-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

    Pull tty/serial driver updates from Greg KH:
     "Even in this age, people are still making new serial port silicon,
      why...

commit e6874fc29410fabfdbc8c12b467f41a16cbcfd2b
Merge: e444d51b14c4 3fb73eddba10
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Wed Sep 18 11:05:34 2019 -0700

    Merge tag 'staging-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

    Pull staging and IIO driver updates from Greg KH:
     "Here is the big staging/iio driver update for 5.4-rc1.

      Lots of churn here, with a few driver/filesystems moving out of
      staging finally:

         - erofs moved out of staging

         - greybus core code moved out of staging

      Along with that, a new filesystem has been added:

         - exfat

      to provide support for those devices requiring that filesystem (i.e.
      transfer devices to/from windows systems or printers)

commit c6b48dad92aedaa9bdc013ee495cb5b1bbdf1f11
Merge: 1f7d290a7275 fb9617edf6c0
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Wed Sep 18 10:33:46 2019 -0700

    Merge tag 'usb-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

    Pull USB updates from Greg KH:
     "Here is the big set of USB patches for 5.4-rc1.

      Two major chunks of code are moving out of the tree and into the
      staging directory, uwb and wusb (wireless USB support), because there
      are no devices that actually use this protocol anymore, and what we
      have today probably doesn't work at all given that the maintainers
      left many many years ago. So move it to staging where it will be
      removed in a few releases if no one screams.

commit e6bc9de714972cac34daa1dc1567ee48a47a9342
Merge: b6c0d3577246 dc617f29dbe5
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Wed Sep 18 17:35:20 2019 -0700

    Merge tag 'vfs-5.4-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

    Pull swap access updates from Darrick Wong:
     "Prohibit writing to active swap files and swap partitions.

      There's no non-malicious use case for allowing userspace to scribble
      on storage that the kernel thinks it owns"

commit f60c55a94e1d127186566f06294f2dadd966e9b4
Merge: 734d1ed83e1f 95ae251fe828
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Wed Sep 18 16:59:14 2019 -0700

    Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
    
    Pull fs-verity support from Eric Biggers:
     "fs-verity is a filesystem feature that provides Merkle tree based
      hashing (similar to dm-verity) for individual readonly files, mainly
      for the purpose of efficient authenticity verification.
    
      This pull request includes:
    
       (a) The fs/verity/ support layer and documentation.
    
       (b) fs-verity support for ext4 and f2fs.
    
      Compared to the original fs-verity patchset from last year, the UAPI
      to enable fs-verity on a file has been greatly simplified. Lots of
      other things were cleaned up too.
    
      fs-verity is planned to be used by two different projects on Android;
      most of the userspace code is in place already. Another userspace tool
      ("fsverity-utils"), and xfstests, are also available. e2fsprogs and
      f2fs-tools already have fs-verity support. Other people have shown
      interest in using fs-verity too.
    
      I've tested this on ext4 and f2fs with xfstests, both the existing
      tests and the new fs-verity tests. This has also been in linux-next
      since July 30 with no reported issues except a couple minor ones I
      found myself and folded in fixes for.
    
      Ted and I will be co-maintaining fs-verity"
    
    * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
      f2fs: add fs-verity support
      ext4: update on-disk format documentation for fs-verity
      ext4: add fs-verity read support
      ext4: add basic fs-verity support
      fs-verity: support builtin file signatures
      fs-verity: add SHA-512 support
      fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
      fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
      fs-verity: add data verification hooks for ->readpages()
      fs-verity: add the hook for file ->setattr()
      fs-verity: add the hook for file ->open()
      fs-verity: add inode and superblock fields
      fs-verity: add Kconfig and the helper functions for hashing
      fs: uapi: define verity bit for FS_IOC_GETFLAGS
      fs-verity: add UAPI header
      fs-verity: add MAINTAINERS file entry
      fs-verity: add a documentation file

commit 734d1ed83e1f9b7bafb650033fb87c657858cf5b
Merge: d013cc800a2a 0642ea2409f3
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Wed Sep 18 16:08:52 2019 -0700

    Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
    
    Pull fscrypt updates from Eric Biggers:
     "This is a large update to fs/crypto/ which includes:
    
       - Add ioctls that add/remove encryption keys to/from a
         filesystem-level keyring.
    
         These fix user-reported issues where e.g. an encrypted home
         directory can break NetworkManager, sshd, Docker, etc. because they
         don't get access to the needed keyring. These ioctls also provide a
         way to lock encrypted directories that doesn't use the
         vm.drop_caches sysctl, so is faster, more reliable, and doesn't
         always need root.
    
       - Add a new encryption policy version ("v2") which switches to a more
         standard, secure, and flexible key derivation function, and starts
         verifying that the correct key was supplied before using it.
    
         The key derivation improvement is needed for its own sake as well
         as for ongoing feature work for which the current way is too
         inflexible.
    
      Work is in progress to update both Android and the 'fscrypt' userspace
      tool to use both these features. (Working patches are available and
      just need to be reviewed+merged.) Chrome OS will likely use them too.
    
      This has also been tested on ext4, f2fs, and ubifs with xfstests --
      both the existing encryption tests, and the new tests for this. This
      has also been in linux-next since Aug 16 with no reported issues. I'm
      also using an fscrypt v2-encrypted home directory on my personal
      desktop"

commit 40144e49ff84c3bd6bd091b58115257670be8803
Author: Jan Kara <jack [ at ] suse [ dot ] cz>
Date:   Thu Aug 29 09:04:12 2019 -0700

    xfs: Fix stale data exposure when readahead races with hole punch

    Hole punching currently evicts pages from page cache and then goes on to
    remove blocks from the inode. This happens under both XFS_IOLOCK_EXCL
    and XFS_MMAPLOCK_EXCL which provides appropriate serialization with
    racing reads or page faults. However there is currently nothing that
    prevents readahead triggered by fadvise() or madvise() from racing with
    the hole punch and instantiating page cache page after hole punching has
    evicted page cache in xfs_flush_unmap_range() but before it has removed
    blocks from the inode. This page cache page will map a block that is
    about to be freed, which can lead to returning stale data to userspace or
    even filesystem corruption.

    Fix the problem by protecting handling of readahead requests by
    XFS_IOLOCK_SHARED similarly as we protect reads.

    CC: stable [ at ] vger [ dot ] kernel [ dot ] org
    Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxjQNmxqmtA_VbYW0Su9rKRk2zobJmahcyeaEVOFKVQ5dw@mai>
    Reported-by: Amir Goldstein <amir73il [ at ] gmail [ dot ] com>
    Signed-off-by: Jan Kara <jack [ at ] suse [ dot ] cz>
    Reviewed-by: Darrick J. Wong <darrick [ dot ] wong [ at ] oracle [ dot ] com>
    Signed-off-by: Darrick J. Wong <darrick [ dot ] wong [ at ] oracle [ dot ] com>

commit ddbca70cc45c0ac97ff6d9529e45f10b8ae73ad4
Author: Dave Chinner <dchinner [ at ] redhat [ dot ] com>
Date:   Thu Aug 29 09:04:10 2019 -0700

    xfs: allocate xattr buffer on demand
    
    When doing file lookups and checking for permissions, we end up in
    xfs_get_acl() to see if there are any ACLs on the inode. This
    requires an xattr lookup, and to do that we have to supply a buffer
    large enough to hold a maximum sized xattr.
    
    On workloads where we are accessing a wide range of cache cold files
    under memory pressure (e.g. NFS fileservers) we end up spending a
    lot of time allocating the buffer. The buffer is 64k in length, so
    is a contiguous multi-page allocation, and if that then fails we
    fall back to vmalloc(). Hence the allocation here is /expensive/
    when we are looking up hundreds of thousands of files a second.
    
    Initial numbers from a bpf trace show average time in xfs_get_acl()
    is ~32us, with ~19us of that in the memory allocation. Note these
    are average times, so they are going to be affected by the worst
    case allocations more than the common fast case...
    
    To avoid this, we could just do a "null"  lookup to see if the ACL
    xattr exists and then only do the allocation if it exists. This,
    however, optimises the path for the "no ACL present" case at the
    expense of the "acl present" case. i.e. we can halve the time in
    xfs_get_acl() for the no acl case (i.e down to ~10-15us), but that
    then increases the ACL case by 30% (i.e. up to 40-45us).
    
    To solve this and speed up both cases, drive the xattr buffer
    allocation into the attribute code once we know what the actual
    xattr length is. For the no-xattr case, we avoid the allocation
    completely, speeding up that case. For the common ACL case, we'll
    end up with a fast heap allocation (because it'll be smaller than a
    page), and only for the rarer "we have a remote xattr" will we have
    a multi-page allocation occur. Hence the common ACL case will be
    much faster, too.
    
    Signed-off-by: Dave Chinner <dchinner [ at ] redhat [ dot ] com>
    Reviewed-by: Christoph Hellwig <hch [ at ] lst [ dot ] de>
    Reviewed-by: Darrick J. Wong <darrick [ dot ] wong [ at ] oracle [ dot ] com>
    Signed-off-by: Darrick J. Wong <darrick [ dot ] wong [ at ] oracle [ dot ] com>

commit 756c6f0f7efe8759ff6dda35d220e2e753e2b0e3
Author: Dave Chinner <dchinner [ at ] redhat [ dot ] com>
Date:   Thu Aug 29 09:04:08 2019 -0700

    xfs: reverse search directory freespace indexes
    
    When a directory is growing rapidly, new blocks tend to get added at
    the end of the directory. These end up at the end of the freespace
    index, and when the directory gets large finding these new
    freespaces gets expensive. The code does a linear search across the
    freespace index from the first block in the directory to the last,
    hence meaning the newly added space is the last index searched.
    
    Instead, do a reverse order index search, starting from the last
    block and index in the freespace index. This makes most lookups for
    free space on rapidly growing directories O(1) instead of O(N), but
    should not have any impact on random insert workloads because the
    average search length is the same regardless of which end of the
    array we start at.
    
    The result is a major improvement in large directory grow rates:
    
                    create time(sec) / rate (files/s)
     File count     vanilla             Prev commit         Patched
      10k         0.41 / 24.3k         0.42 / 23.8k       0.41 / 24.3k
      20k         0.74 / 27.0k         0.76 / 26.3k       0.75 / 26.7k
     100k         3.81 / 26.4k         3.47 / 28.8k       3.27 / 30.6k
     200k         8.58 / 23.3k         7.19 / 27.8k       6.71 / 29.8k
       1M        85.69 / 11.7k        48.53 / 20.6k      37.67 / 26.5k
       2M       280.31 /  7.1k       130.14 / 15.3k      79.55 / 25.2k
      10M      3913.26 /  2.5k                          552.89 / 18.1k
    
    Signed-off-by: Dave Chinner <dchinner [ at ] redhat [ dot ] com>
    Reviewed-by: Christoph Hellwig <hch [ at ] lst [ dot ] de>
    Reviewed-by: Darrick J. Wong <darrick [ dot ] wong [ at ] oracle [ dot ] com>
    Signed-off-by: Darrick J. Wong <darrick [ dot ] wong [ at ] oracle [ dot ] com>
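
To make the O(1) versus O(N) point concrete, here is a toy model in C. It is
not the XFS code: the flat array of per-block free space and both function
names are invented for illustration. It only counts how many index entries
each search direction has to look at when the free space sits in the newest
block.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model only: space[i] is the largest free region (in bytes) in
 * directory data block i. Names are illustrative, not taken from XFS.
 */

/* Scan from the first block. Returns the index found, or -1, and
 * stores the number of entries examined in *probes. */
long find_space_forward(const int *space, size_t n, int want, size_t *probes)
{
    assert(probes != NULL);
    *probes = 0;
    for (size_t i = 0; i < n; i++) {
        (*probes)++;
        if (space[i] >= want)
            return (long)i;
    }
    return -1;
}

/* Same search, but starting from the last block. */
long find_space_reverse(const int *space, size_t n, int want, size_t *probes)
{
    assert(probes != NULL);
    *probes = 0;
    for (size_t i = n; i-- > 0; ) {
        (*probes)++;
        if (space[i] >= want)
            return (long)i;
    }
    return -1;
}
```

With 1000 full blocks and free space only in the last one, the forward scan
examines all 1000 entries while the reverse scan stops after the first.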

commit 610125ab1e4b1b48dcffe74d9d82b0606bf1b923
Author: Dave Chinner <dchinner [ at ] redhat [ dot ] com>
Date:   Thu Aug 29 09:04:07 2019 -0700

    xfs: speed up directory bestfree block scanning

    When running a "create millions inodes in a directory" test
    recently, I noticed we were spending a huge amount of time
    converting freespace block headers from disk format to in-memory
    format:

    […]

commit f8f9ee479439c1be9e33c4404912a2a112c46200
Author: Dave Chinner <dchinner [ at ] redhat [ dot ] com>
Date:   Mon Aug 26 12:08:39 2019 -0700

    xfs: add kmem_alloc_io()
    
    Memory we use to submit for IO needs strict alignment to the
    underlying driver constraints. Worst case, this is 512 bytes. Given
    that all allocations for IO are always a power of 2 multiple of 512
    bytes, the kernel heap provides natural alignment for objects of
    these sizes and that suffices.
    
    Until, of course, memory debugging of some kind is turned on (e.g.
    red zones, poisoning, KASAN) and then the alignment of the heap
    objects is thrown out the window. Then we get weird IO errors and
    data corruption problems because drivers don't validate alignment
    and do the wrong thing when passed unaligned memory buffers in bios.
    
    To fix this, introduce kmem_alloc_io(), which will guarantee at least
    512 byte alignment of buffers for IO, even if memory debugging
    options are turned on. It is assumed that the minimum allocation
    size will be 512 bytes, and that sizes will be power of 2 multiples
    of 512 bytes.
    
    Use this everywhere we allocate buffers for IO.
    
    This no longer fails with log recovery errors when KASAN is enabled
    due to the brd driver not handling unaligned memory buffers:
    
    # mkfs.xfs -f /dev/ram0 ; mount /dev/ram0 /mnt/test
    
    Signed-off-by: Dave Chinner <dchinner [ at ] redhat [ dot ] com>
    Reviewed-by: Christoph Hellwig <hch [ at ] lst [ dot ] de>
    Reviewed-by: Darrick J. Wong <darrick [ dot ] wong [ at ] oracle [ dot ] com>
    Signed-off-by: Darrick J. Wong <darrick [ dot ] wong [ at ] oracle [ dot ] com>
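
The commit message does not say how kmem_alloc_io() gets its guarantee, so
the following is only a userspace analogy under one assumed strategy: try the
ordinary heap, check the alignment, and fall back to an explicitly aligned
allocation if the check fails. io_aligned() and alloc_io_buf() are
hypothetical names, and aligned_alloc() needs a C11 libc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define IO_ALIGN 512u   /* worst-case alignment named in the commit */

/* True if p sits on an IO_ALIGN boundary. */
int io_aligned(const void *p)
{
    return ((uintptr_t)p & (IO_ALIGN - 1)) == 0;
}

/*
 * Hypothetical userspace analogue: use the plain heap when it happens
 * to return an aligned buffer, otherwise ask for alignment explicitly.
 * size must be a non-zero multiple of IO_ALIGN. Release with free().
 */
void *alloc_io_buf(size_t size)
{
    assert(size != 0 && size % IO_ALIGN == 0);

    void *p = malloc(size);
    if (p && io_aligned(p))
        return p;

    free(p);                               /* free(NULL) is a no-op */
    return aligned_alloc(IO_ALIGN, size);  /* C11 */
}
```

The alignment check is the part drivers were skipping, which is why a heap
that stops being naturally aligned under KASAN or red zones went unnoticed
until the IO errors showed up.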

commit e2079e93f562c7f7a030eb7642017ee5eabaaa10
Author: Nathan Chancellor <natechancellor [ at ] gmail [ dot ] com>
Date:   Mon Aug 26 17:41:55 2019 -0700

    kbuild: Do not enable -Wimplicit-fallthrough for clang for now
    
    This functionally reverts commit bfd77145f35c ("Makefile: Convert
    -Wimplicit-fallthrough=3 to just -Wimplicit-fallthrough for clang").
    
    clang enabled support for -Wimplicit-fallthrough in C in r369414 [1],
    which causes a lot of warnings when building the kernel for two reasons:
    
    1. Clang does not support the /* fall through */ comments. There seems
       to be a general consensus in the LLVM community that this is not
       something they want to support. Joe Perches wrote a script to convert
       all of the comments to a "fallthrough" keyword that will be added to
       compiler_attributes.h [2] [3], which catches the vast majority of the
       comments. There doesn't appear to be any consensus in the kernel
       community when to do this conversion.
    
    2. Clang and GCC disagree about falling through to final case statements
       with no content or cases that simply break:
    
       https://godbolt.org/z/c8csDu
    
       This difference contributes at least 50 warnings in an allyesconfig
       build for x86, not considering other architectures. This difference
       will need to be discussed to see which compiler is right [4] [5].
    
    [1]: https://github.com/llvm/llvm-project/commit/1e0affb6e564b7361b0aadb38805f26deff4ecfc
    [2]: https://lore.kernel.org/lkml/61ddbb86d5e68a15e24ccb06d9b399bbf5ce2da7 [ dot ] camel [ at ] perches [ dot ] com/
    [3]: https://lore.kernel.org/lkml/1d2830aadbe9d8151728a7df5b88528fc72a0095 [ dot ] 1564549413 [ dot ] git [ dot ] joe [ at ] perches [ dot ] com/
    [4]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91432
    [5]: https://github.com/ClangBuiltLinux/linux/issues/636
    
    Given these two problems need discussion and coordination, do not enable
    -Wimplicit-fallthrough with clang right now. Add a comment to explain
    what is going on as well. This commit should be reverted once these two
    issues are fully fleshed out and resolved.
    
    Suggested-by: Masahiro Yamada <yamada [ dot ] masahiro [ at ] socionext [ dot ] com>
    Signed-off-by: Nathan Chancellor <natechancellor [ at ] gmail [ dot ] com>
    Acked-by: Miguel Ojeda <miguel [ dot ] ojeda [ dot ] sandonis [ at ] gmail [ dot ] com>
    Acked-by: Nick Desaulniers <ndesaulniers [ at ] google [ dot ] com>
    Acked-by: Gustavo A. R. Silva <gustavo [ at ] embeddedor [ dot ] com>
    Signed-off-by: Masahiro Yamada <yamada [ dot ] masahiro [ at ] socionext [ dot ] com>
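
For context, this is roughly what the comment-versus-keyword difference looks
like in code. It is a sketch, assuming the "fallthrough" keyword is a macro
over the compiler's fallthrough statement attribute; the macro definition and
the fields_for() function here are illustrative, not copied from
compiler_attributes.h.

```c
#include <assert.h>

/*
 * Assumption: an illustrative definition. Use the statement attribute
 * where the compiler has it, otherwise expand to an empty statement.
 */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)
#endif

/* Toy example: kind 2 does its own work and then kind 1's work too. */
int fields_for(int kind)
{
    int n = 0;

    assert(kind >= 0);
    switch (kind) {
    case 2:
        n += 1;
        fallthrough;    /* deliberate; a comment here would not satisfy clang */
    case 1:
        n += 1;
        break;
    default:
        break;
    }
    return n;
}
```

Unlike a comment, the attribute is part of the syntax tree, so both compilers
can check it the same way.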

commit aec256d0ecd561036f188dbc8fa7924c47a9edfd
Author: Joao Moreno <mail [ at ] joaomoreno [ dot ] com>
Date:   Tue Sep 3 16:46:32 2019 +0200

    HID: apple: Fix stuck function keys when using FN

    This fixes an issue in which key down events for function keys would be
    repeatedly emitted even after the user has raised the physical key. For
    example, the driver fails to emit the F5 key up event when going through
    the following steps:
    - fnmode=1: hold FN, hold F5, release FN, release F5
    - fnmode=2: hold F5, hold FN, release F5, release FN

    The repeated F5 key down events can be easily verified using xev.

commit 1b5fb415442eb3ec946d48afe8c87b0f2fd42d7c
Merge: 5825a95fe925 21ab8580b383
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Mon Sep 23 11:39:56 2019 -0700

    Merge tag 'safesetid-bugfix-5.4' of git://github.com/micah-morton/linux

    Pull SafeSetID fix from Micah Morton:
     "Jann Horn sent some patches to fix some bugs in SafeSetID for 5.3.
      After he had done his testing there were a couple small code tweaks
      that went in and caused this bug.

      From what I can see SafeSetID is broken in 5.3 and crashes the kernel
      every time during initialization if you try to use it. I came across
      this bug when backporting Jann's changes for 5.3 to older kernels
      (4.14 and 4.19). I've tested on a Chrome OS device with those kernels
      and verified that this change fixes things.

      It doesn't seem super useful to have this bake in linux-next, since it
      is completely broken in 5.3 and nobody noticed"

    * tag 'safesetid-bugfix-5.4' of git://github.com/micah-morton/linux:
      LSM: SafeSetID: Stop releasing uninitialized ruleset

Ouch. Harsh.

Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Mon Sep 23 11:21:04 2019 -0700

    Merge tag 'selinux-pr-20190917' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

    Pull selinux updates from Paul Moore:

     - Add LSM hooks, and SELinux access control hooks, for dnotify,
       fanotify, and inotify watches. This has been discussed with both the
       LSM and fs/notify folks and everybody is good with these new hooks.

     […]

     - Improve our network object labeling cache so that we always return
       the object's label, even when under memory pressure. Previously we
       would return an error if we couldn't allocate a new cache entry, now
       we always return the label even if we can't create a new cache entry
       for it.

     […]

commit 99cb0dbd47a15d395bf3faa78dc122bc5efe3fc0
Author: Song Liu <songliubraving [ at ] fb [ dot ] com>
Date:   Mon Sep 23 15:38:00 2019 -0700

    mm,thp: add read-only THP support for (non-shmem) FS

    This patch is (hopefully) the first step to enable THP for non-shmem
    filesystems.

    This patch enables an application to put part of its text sections to THP
    via madvise, for example:

        madvise((void *)0x600000, 0x200000, MADV_HUGEPAGE);

    We tried to reuse the logic for THP on tmpfs.

    Currently, write is not supported for non-shmem THP.  khugepaged will only
    process vma with VM_DENYWRITE.  sys_mmap() ignores VM_DENYWRITE requests
    (see ksys_mmap_pgoff).  The only way to create vma with VM_DENYWRITE is
    execve().  This requirement limits non-shmem THP to text sections.

    The next patch will handle writes, which would only happen when all
    the vmas with VM_DENYWRITE are unmapped.

    An EXPERIMENTAL config, READ_ONLY_THP_FOR_FS, is added to gate this
    feature.

commit 1c6c15971e4709953f75082a5d44212536b1c2b7
Author: Hillf Danton <hdanton [ at ] sina [ dot ] com>
Date:   Mon Sep 23 15:37:26 2019 -0700

    mm, reclaim: make should_continue_reclaim perform dryrun detection

    Patch series "address hugetlb page allocation stalls", v2.

    Allocation of hugetlb pages via sysctl or procfs can stall for minutes or
    hours.  A simple example on a two node system with 8GB of memory is as
    follows:

    echo 4096 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
    echo 4096 > /proc/sys/vm/nr_hugepages

    Obviously, both allocation attempts will fall short of their 8GB goal.
    However, one or both of these commands may stall and not be interruptible.
    The issues were initially discussed in mail thread [1] and RFC code at
    [2].

    This series addresses the issues causing the stalls.  There are two
    distinct fixes, a cleanup, and an optimization.  The reclaim patch by
    Hillf and compaction patch by Vlastimil address corner cases in their
    respective areas.  hugetlb page allocation could stall due to either of
    these issues.  Vlastimil added a cleanup patch after Hillf's
    modifications.  The hugetlb patch by Mike is an optimization suggested
    during the debug and development process.

    [1] http://lkml.kernel.org/r/d38a095e-dc39-7e82-bb76-2c9247929f07 [ at ] oracle [ dot ] com
    [2] http://lkml.kernel.org/r/20190724175014 [ dot ] 9935-1-mike [ dot ] kravetz [ at ] oracle [ dot ] com

    This patch (of 4):

    Address the issue of should_continue_reclaim returning true too often for
    __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.  This was
    observed during hugetlb page allocation causing stalls for minutes or
    hours.

    We can stop reclaiming pages if compaction reports it can make progress.
    There might be side-effects for other high-order allocations that would
    potentially benefit from reclaiming more before compaction so that they
    would be faster and less likely to stall.  However, the consequences of
    premature/over-reclaim are considered worse.

    We can also bail out of reclaiming pages if we know that there are not
    enough inactive lru pages left to satisfy the costly allocation.

    We can give up reclaiming pages too if we see dryrun occur, with the
    certainty of plenty of inactive pages.  IOW with dryrun detected, we are
    sure we have reclaimed as many pages as we could.
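
For the record, the 8GB goal in that series description is just 4096 pages of 2048kB each. A quick sanity check of the arithmetic (the helper name is mine, not anything from the kernel):

```python
def hugepage_request_bytes(nr_hugepages: int, page_size_kb: int) -> int:
    """Memory needed to satisfy an nr_hugepages request, in bytes."""
    return nr_hugepages * page_size_kb * 1024


GIB = 1024 ** 3

# 4096 pages of 2048kB, as in both echo commands in the commit message.
request = hugepage_request_bytes(4096, 2048)
print(request / GIB)  # 8.0
```

On a machine with 8GB in total, either request asks for every byte of RAM, so neither can be met in full; the bug was that the kernel could take minutes or hours to give up.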

commit 70cb6d2677905121bfc7fdf5babfd8444218edd9
Author: Edward Chron <echron [ at ] arista [ dot ] com>
Date:   Mon Sep 23 15:37:11 2019 -0700

    mm/oom: add oom_score_adj and pgtables to Killed process message
    
    For an OOM event: print oom_score_adj value for the OOM Killed process to
    document what the oom score adjust value was at the time the process was
    OOM Killed.  The adjustment value can be set by user code and it affects
    the resulting oom_score so it is used to influence kill process selection.
    
    When eligible tasks are not printed (sysctl oom_dump_tasks = 0) printing
    this value is the only documentation of the value for the process being
    killed.  Having this value on the Killed process message is useful to
    document if a misconfiguration occurred or to confirm that the
    oom_score_adj configuration applies as expected.
    
    An example which illustrates both misconfiguration and validation that the
    oom_score_adj was applied as expected is:
    
    Aug 14 23:00:02 testserver kernel: Out of memory: Killed process 2692
     (systemd-udevd) total-vm:1056800kB, anon-rss:1052760kB, file-rss:4kB,
     shmem-rss:0kB pgtables:22kB oom_score_adj:1000
    
    The systemd-udevd is a critical system application that should have an
    oom_score_adj of -1000.  It was misconfigured to have an adjustment of 1000
    making it a highly favored OOM kill target process.  The output documents
    both the misconfiguration and the fact that the process was correctly
    targeted by OOM due to the misconfiguration.  This can be quite helpful for
    triage and problem determination.
    
    The addition of the pgtables_bytes shows page table usage by the process
    and is a useful measure of the memory size of the process.
    
    Link: http://lkml.kernel.org/r/20190822173157 [ dot ] 1569-1-echron [ at ] arista [ dot ] com
    Signed-off-by: Edward Chron <echron [ at ] arista [ dot ] com>
    Acked-by: Michal Hocko <mhocko [ at ] suse [ dot ] com>
    Acked-by: David Rientjes <rientjes [ at ] google [ dot ] com>
    Signed-off-by: Andrew Morton <akpm [ at ] linux-foundation [ dot ] org>
    Signed-off-by: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
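
If you want to alert on this from your logs, here is a minimal sketch of pulling the new fields back out. It assumes the line looks exactly like the one sample in the commit message (joined onto a single line); the regex and the function name are my own, and other kernel versions may format the message differently.

```python
import re

# Matches the fields shown in the sample message from the commit. The format
# is inferred from that one sample only.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+)\s+\((?P<comm>[^)]+)\)"
    r".*?pgtables:(?P<pgtables_kb>\d+)kB"
    r"\s+oom_score_adj:(?P<oom_score_adj>-?\d+)"
)


def parse_killed_line(line: str):
    """Return pid, comm, pgtables (kB) and oom_score_adj, or None if no match."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "comm": m["comm"],
        "pgtables_kb": int(m["pgtables_kb"]),
        "oom_score_adj": int(m["oom_score_adj"]),
    }


sample = (
    "Aug 14 23:00:02 testserver kernel: Out of memory: Killed process 2692 "
    "(systemd-udevd) total-vm:1056800kB, anon-rss:1052760kB, file-rss:4kB, "
    "shmem-rss:0kB pgtables:22kB oom_score_adj:1000"
)
info = parse_killed_line(sample)
print(info)
```

A check like `info["oom_score_adj"] > 0` on a process you meant to protect would have flagged the systemd-udevd misconfiguration described in the commit.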

commit 1ba6fc9af35bf97c84567d9b3eeb26629d1e3af0
Author: Johannes Weiner <hannes [ at ] cmpxchg [ dot ] org>
Date:   Mon Sep 23 15:35:01 2019 -0700

    mm: vmscan: do not share cgroup iteration between reclaimers
    
    One of our services observed a high rate of cgroup OOM kills in the
    presence of large amounts of clean cache.  Debugging showed that the
    culprit is the shared cgroup iteration in page reclaim.
    
    Under high allocation concurrency, multiple threads enter reclaim at the
    same time.  Fearing overreclaim when we first switched from the single
    global LRU to cgrouped LRU lists, we introduced a shared iteration state
    for reclaim invocations - whether 1 or 20 reclaimers are active
    concurrently, we only walk the cgroup tree once: the 1st reclaimer
    reclaims the first cgroup, the second the second one etc.  With more
    reclaimers than cgroups, we start another walk from the top.
    
    This sounded reasonable at the time, but the problem is that reclaim
    concurrency doesn't scale with allocation concurrency.  As reclaim
    concurrency increases, the amount of memory individual reclaimers get to
    scan gets smaller and smaller.  Individual reclaimers may only see one
    cgroup per cycle, and that may not have much reclaimable memory.  We see
    individual reclaimers declare OOM when there is plenty of reclaimable
    memory available in cgroups they didn't visit.
    
    This patch does away with the shared iterator, and every reclaimer is
    allowed to scan the full cgroup tree and see all of reclaimable memory,
    just like it would on a non-cgrouped system.  This way, when OOM is
    declared, we know that the reclaimer actually had a chance.
    
    To still maintain fairness in reclaim pressure, disallow cgroup reclaim
    from bailing out of the tree walk early.  Kswapd and regular direct
    reclaim already don't bail, so it's not clear why limit reclaim would have
    to, especially since it only walks subtrees to begin with.
    
    This change completely eliminates the OOM kills on our service, while
    showing no signs of overreclaim - no increased scan rates, %sys time, or
    abrupt free memory spikes.  I tested across 100 machines that have 64G of
    RAM and host about 300 cgroups each.
    
    [ It's possible overreclaim never was a *practical* issue to begin
      with - it was simply a concern we had on the mailing lists at the
      time, with no real data to back it up. But we have also added more
      bail-out conditions deeper inside reclaim (e.g. the proportional
      exit in shrink_node_memcg) since. Regardless, now we have data that
      suggests full walks are more reliable and scale just fine. ]
    
    Link: http://lkml.kernel.org/r/20190812192316 [ dot ] 13615-1-hannes [ at ] cmpxchg [ dot ] org
    Signed-off-by: Johannes Weiner <hannes [ at ] cmpxchg [ dot ] org>
    Reviewed-by: Roman Gushchin <guro [ at ] fb [ dot ] com>
    Acked-by: Michal Hocko <mhocko [ at ] suse [ dot ] com>
    Cc: Vladimir Davydov <vdavydov [ dot ] dev [ at ] gmail [ dot ] com>
    Signed-off-by: Andrew Morton <akpm [ at ] linux-foundation [ dot ] org>
    Signed-off-by: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
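
To convince myself why the shared iterator hurts, I wrote a toy model. It is nothing like the real vmscan code: cgroups are a flat list rather than a tree, nothing is actually reclaimed, and all names and numbers are made up. It only shows how much reclaimable memory each reclaimer gets to see under the two schemes.

```python
from itertools import cycle


def shared_walk(cgroups, n_reclaimers):
    """Old scheme (toy): one shared cursor, each reclaimer sees one cgroup."""
    cursor = cycle(range(len(cgroups)))
    return [cgroups[next(cursor)] for _ in range(n_reclaimers)]


def full_walk(cgroups, n_reclaimers):
    """New scheme (toy): every reclaimer walks all cgroups."""
    return [sum(cgroups) for _ in range(n_reclaimers)]


def would_oom(visible_pages, needed):
    """A reclaimer gives up if what it saw cannot cover its need."""
    return [seen < needed for seen in visible_pages]


# Hypothetical numbers: reclaimable pages per cgroup, nearly all in one group.
cgroups = [0, 0, 5000, 0]
needed = 32

print(would_oom(shared_walk(cgroups, 6), needed))
print(would_oom(full_walk(cgroups, 6), needed))
```

With the shared cursor, five of the six reclaimers land on an empty cgroup and would declare OOM despite 5000 reclaimable pages sitting next door; with full walks none of them do. That matches the symptom Johannes describes.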

There were also some bfq commits that didn't claim to be a big deal and whose impact I cannot evaluate.

Interesting…

commit 0183eb8bb59d45f26ec4fc73aaa416067fe6c0be
Author: Jean Delvare <jdelvare [ at ] suse [ dot ] de>
Date:   Fri Aug 2 14:55:26 2019 +0200

    i2c: piix4: Add ACPI support

    Enable the i2c-piix4 SMBus controller driver to enumerate I2C slave
    devices using ACPI. It builds on the related I2C mux device work
    in commit 8eb5c87a92c0 ("i2c: add ACPI support for I2C mux ports")

    In the i2c-piix4 driver the adapters are enumerated as:
     Main SMBus adapter Port 0, Port 2, ..., aux port (i.e., ASF adapter)

commit 97f9a3c4eee55b0178b518ae7114a6a53372913d (HEAD -> master, origin/master)
Merge: 1eb80d6ffb17 dc925a36060e
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Sun Sep 29 19:52:52 2019 -0700

    Merge tag 'char-misc-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

    Pull Documentation/process update from Greg KH:
     "Here are two small Documentation/process/embargoed-hardware-issues.rst
      file updates that missed my previous char/misc pull request.

      The first one adds an Intel representative for the process, and the
      second one cleans up the text a bit more when it comes to how the
      disclosure rules work, as it was a bit confusing to some companies"

    * tag 'char-misc-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
      Documentation/process: Clarify disclosure rules
      Documentation/process: Volunteer as the ambassador for Intel

commit dc925a36060e8cef050a9d05c64dae1c30dc5027
Author: Thomas Gleixner <tglx [ at ] linutronix [ dot ] de>
Date:   Wed Sep 25 10:29:49 2019 +0200

    Documentation/process: Clarify disclosure rules
    
    The role of the contact list provided by the disclosing party and how it
    affects the disclosure process and the ability to include experts into
    the development process is not really well explained.
    
    Neither is it entirely clear when the disclosing party will be informed
    about the fact that a developer who is not covered by an employer NDA needs
    to be brought in and disclosed.
    
    Explain the role of the contact list and the information policy along with
    an eventual conflict resolution better.
    
    Reported-by: Dave Hansen <dave [ dot ] hansen [ at ] linux [ dot ] intel [ dot ] com>
    Signed-off-by: Thomas Gleixner <tglx [ at ] linutronix [ dot ] de>
    Acked-by: Dave Hansen <dave [ dot ] hansen [ at ] linux [ dot ] intel [ dot ] com>
    Link: https://lore.kernel.org/r/alpine [ dot ] DEB [ dot ] 2 [ dot ] 21 [ dot ] 1909251028390 [ dot ] 10825 [ at ] nanos [ dot ] tec [ dot ] linutronix [ dot ] de
    Signed-off-by: Greg Kroah-Hartman <gregkh [ at ] linuxfoundation [ dot ] org>

diff --git a/Documentation/process/embargoed-hardware-issues.rst b/Documentation/process/embargoed-hardware-issues.rst
index e57b9f39c69f..a3c3349046c4 100644
--- a/Documentation/process/embargoed-hardware-issues.rst
+++ b/Documentation/process/embargoed-hardware-issues.rst
@@ -143,6 +143,20 @@ via their employer, they cannot enter individual non-disclosure agreements
 in their role as Linux kernel developers. They will, however, agree to
 adhere to this documented process and the Memorandum of Understanding.
 
+The disclosing party should provide a list of contacts for all other
+entities who have already been, or should be, informed about the issue.
+This serves several purposes:
+
+ - The list of disclosed entities allows communication accross the
+   industry, e.g. other OS vendors, HW vendors, etc.
+
+ - The disclosed entities can be contacted to name experts who should
+   participate in the mitigation development.
+
+ - If an expert which is required to handle an issue is employed by an
+   listed entity or member of an listed entity, then the response teams can
+   request the disclosure of that expert from that entity. This ensures
+   that the expert is also part of the entity's response team.
 
 Disclosure
 """"""""""
@@ -158,10 +172,7 @@ Mitigation development
 """"""""""""""""""""""
 
 The initial response team sets up an encrypted mailing-list or repurposes
-an existing one if appropriate. The disclosing party should provide a list
-of contacts for all other parties who have already been, or should be,
-informed about the issue. The response team contacts these parties so they
-can name experts who should be subscribed to the mailing-list.
+an existing one if appropriate.
 
 Using a mailing-list is close to the normal Linux development process and
 has been successfully used in developing mitigations for various hardware
@@ -175,9 +186,24 @@ development branch against the mainline kernel and backport branches for
 stable kernel versions as necessary.
 
 The initial response team will identify further experts from the Linux
-kernel developer community as needed and inform the disclosing party about
-their participation. Bringing in experts can happen at any time of the
-development process and often needs to be handled in a timely manner.
+kernel developer community as needed. Bringing in experts can happen at any
+time of the development process and needs to be handled in a timely manner.
+
+If an expert is employed by or member of an entity on the disclosure list
+provided by the disclosing party, then participation will be requested from
+the relevant entity.
+
+If not, then the disclosing party will be informed about the experts
+participation. The experts are covered by the Memorandum of Understanding
+and the disclosing party is requested to acknowledge the participation. In
+case that the disclosing party has a compelling reason to object, then this
+objection has to be raised within five work days and resolved with the
+incident team immediately. If the disclosing party does not react within
+five work days this is taken as silent acknowledgement.
+
+After acknowledgement or resolution of an objection the expert is disclosed
+by the incident team and brought into the development process.
+
 
 Coordinated release
 """""""""""""""""""

commit 3f2dc2798b81531fd93a3b9b7c39da47ec689e55
Merge: a3c0e7b1fe1f 02f03c4206c1
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Sun Sep 29 19:25:39 2019 -0700

    Merge branch 'entropy'
    
    Merge active entropy generation updates.
    
    This is admittedly partly "for discussion".  We need to have a way
    forward for the boot time deadlocks where user space ends up waiting for
    more entropy, but no entropy is forthcoming because the system is
    entirely idle just waiting for something to happen.
    
    While this was triggered by what is arguably a user space bug with
    GDM/gnome-session asking for secure randomness during early boot, when
    they didn't even need any such truly secure thing, the issue ends up
    being that our "getrandom()" interface is prone to that kind of
    confusion, because people don't think very hard about whether they want
    to block for sufficient amounts of entropy.
    
    The approach here-in is to decide to not just passively wait for entropy
    to happen, but to start actively collecting it if it is missing.  This
    is not necessarily always possible, but if the architecture has a CPU
    cycle counter, there is a fair amount of noise in the exact timings of
    reasonably complex loads.
    
    We may end up tweaking the load and the entropy estimates, but this
    should be at least a reasonable starting point.
    
    As part of this, we also revert the revert of the ext4 IO pattern
    improvement that ended up triggering the reported lack of external
    entropy.
    
    * getrandom() active entropy waiting:
      Revert "Revert "ext4: make __ext4_get_inode_loc plug""
      random: try to actively add entropy rather than passively wait for it

commit 02f03c4206c1b2a7451d3b3546f86c9c783eac13
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Sun Sep 29 17:59:23 2019 -0700

    Revert "Revert "ext4: make __ext4_get_inode_loc plug""
    
    This reverts commit 72dbcf72156641fde4d8ea401e977341bfd35a05.
    
    Instead of waiting forever for entropy that may just not happen, we now
    try to actively generate entropy when required, and are thus hopefully
    avoiding the problem that caused the nice ext4 IO pattern fix to be
    reverted.
    
    So revert the revert.
    
    Cc: Ahmed S. Darwish <darwish [ dot ] 07 [ at ] gmail [ dot ] com>
    Cc: Ted Ts'o <tytso [ at ] mit [ dot ] edu>
    Cc: Willy Tarreau <w [ at ] 1wt [ dot ] eu>
    Cc: Alexander E. Patrakov <patrakov [ at ] gmail [ dot ] com>
    Signed-off-by: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>

commit 50ee7529ec4500c88f8664560770a7a1b65db72b
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Sat Sep 28 16:53:52 2019 -0700

    random: try to actively add entropy rather than passively wait for it
    
    For 5.3 we had to revert a nice ext4 IO pattern improvement, because it
    caused a bootup regression due to lack of entropy at bootup together
    with arguably broken user space that was asking for secure random
    numbers when it really didn't need to.
    
    See commit 72dbcf721566 (Revert "ext4: make __ext4_get_inode_loc plug").
    
    This aims to solve the issue by actively generating entropy noise using
    the CPU cycle counter when waiting for the random number generator to
    initialize.  This only works when you have a high-frequency time stamp
    counter available, but that's the case on all modern x86 CPU's, and on
    most other modern CPU's too.
    
    What we do is to generate jitter entropy from the CPU cycle counter
    under a somewhat complex load: calling the scheduler while also
    guaranteeing a certain amount of timing noise by also triggering a
    timer.
    
    I'm sure we can tweak this, and that people will want to look at other
    alternatives, but there's been a number of papers written on jitter
    entropy, and this should really be fairly conservative by crediting one
    bit of entropy for every timer-induced jump in the cycle counter.  Not
    because the timer itself would be all that unpredictable, but because
    the interaction between the timer and the loop is going to be.
    
    Even if (and perhaps particularly if) the timer actually happens on
    another CPU, the cacheline interaction between the loop that reads the
    cycle counter and the timer itself firing is going to add perturbations
    to the cycle counter values that get mixed into the entropy pool.
    
    As Thomas pointed out, with a modern out-of-order CPU, even quite simple
    loops show a fair amount of hard-to-predict timing variability even in
    the absence of external interrupts.  But this tries to take that further
    by actually having a fairly complex interaction.
    
    This is not going to solve the entropy issue for architectures that have
    no CPU cycle counter, but it's not clear how (and if) that is solvable,
    and the hardware in question is largely starting to be irrelevant.  And
    by doing this we can at least avoid some of the even more contentious
    approaches (like making the entropy waiting time out in order to avoid
    the possibly unbounded waiting).
    
    Cc: Ahmed Darwish <darwish [ dot ] 07 [ at ] gmail [ dot ] com>
    Cc: Thomas Gleixner <tglx [ at ] linutronix [ dot ] de>
    Cc: Theodore Ts'o <tytso [ at ] mit [ dot ] edu>
    Cc: Nicholas Mc Guire <hofrat [ at ] opentech [ dot ] at>
    Cc: Andy Lutomirski <luto [ at ] kernel [ dot ] org>
    Cc: Kees Cook <keescook [ at ] chromium [ dot ] org>
    Cc: Willy Tarreau <w [ at ] 1wt [ dot ] eu>
    Cc: Alexander E. Patrakov <patrakov [ at ] gmail [ dot ] com>
    Cc: Lennart Poettering <mzxreary [ at ] 0pointer [ dot ] de>
    Signed-off-by: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
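
A loose user-space analogy of the jitter idea, for anyone who wants to poke at it: read a high-resolution counter around scheduler yields and mix the deltas into a hash. This is my own sketch, not the kernel's algorithm. It has no timer, does no entropy crediting, and makes no claim about how unpredictable the output is. For real randomness use os.urandom() or getrandom().

```python
import hashlib
import time


def jitter_bytes(rounds: int = 256) -> bytes:
    """Mix timing deltas around scheduler yields into a SHA-256 digest.

    Illustration only: this is not the kernel's algorithm and makes no
    claim about how much entropy the output actually contains.
    """
    pool = hashlib.sha256()
    last = time.perf_counter_ns()
    for _ in range(rounds):
        time.sleep(0)  # yield the CPU, roughly "call the scheduler"
        now = time.perf_counter_ns()
        delta = now - last  # perf_counter_ns is monotonic, so delta >= 0
        pool.update(delta.to_bytes(8, "little"))
        last = now
    return pool.digest()


print(jitter_bytes().hex())
```

Printing the individual deltas instead of hashing them is a quick way to see the timing variability the commit message talks about on your own machine.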

If only I could order commits by their message length.

Saw RGB's patches go in. Do explain if it's a fundamental or important change.

Usual Spectre:

    commit 223cea6a4f0552b86fb25e3b8bbd00469816cd7a (HEAD -> master, origin/master)
    Merge: 2f0f6503e375 993773d11d45
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Mon Jul 8 12:23:00 2019 -0700

        Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

        Pull x86 pti updates from Thomas Gleixner:
         "The speculative paranoia department delivers a few more plugs for
          possible (probably theoretical) spectre/mds leaks"

        * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
          x86/tls: Fix possible spectre-v1 in do_get_thread_area()
          x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()
          x86/speculation/mds: Eliminate leaks by trace_hardirqs_on()

Fun description:

    commit 2f0f6503e37551eb8d8d5e4d27c78d28a30fed5a
    Merge: 13324c42c140 e44252f4fe79
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Mon Jul 8 12:16:40 2019 -0700

        Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

        Pull x86 timer updates from Thomas Gleixner:
         "A rather large series consolidating the HPET code, which was triggered
          by the attempt to bolt HPET NMI watchdog support on to the existing
          maze with the usual duct tape and super glue approach.

          This mainly removes two separate partially redundant storage layers
          and consolidates them into a single one which provides a consistent
          view of the different HPET channels and their usage and allows to
          integrate HPET NMI watchdog support (if it turns out to be feasible)
          in a non intrusive way"

          The maximum time a MWAIT can halt in userspace is controlled by the
          kernel and can be adjusted by the sysadmin.

Spinlocks in userspace, manually? Why? Thought this was what futex was for:

    commit 13324c42c1401ad838208ee1e98f3821fce1cd86
    Merge: ab2486a9ee32 049331f277fe
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Mon Jul 8 11:59:59 2019 -0700

        Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

        Pull x86 CPU feature updates from Thomas Gleixner:
         "Updates for x86 CPU features:

           - Support for UMWAIT/UMONITOR, which allows to use MWAIT and MONITOR
             instructions in user space to save power e.g. in HPC workloads
             which spin wait on synchronization points.

First new one in a while?

    - Support for the new x86 vendor Zhaoxin who develops processors
      based on the VIA Centaur technology.

Bluntness:

    - The addition and late revert of the FSGSBASE support. The revert
      was required as it turned out that the code still has hard to
      diagnose issues. Yet another engineering trainwreck...

But I disabled it entirely on mine…

    commit 0d37dde70655be73575d011be1bffaf0e3b16ea9
    Merge: 0902d5011cfa 7f0a5e075583
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Mon Jul 8 11:42:09 2019 -0700

        Merge branch 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

        Pull x86 vsyscall updates from Thomas Gleixner:
         "Further hardening of the legacy vsyscall by providing support for
          execute only mode and switching the default to it.

          This prevents a certain class of attacks which rely on the vsyscall
          page being accessible at a fixed address in the canonical kernel
          address space"

Okay, but what does this mean for me?

    commit 0902d5011cfaabd6a09326299ef77e1c8735fb89
    Merge: 927ba67a63c7 f8a8fe61fec8
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Mon Jul 8 11:22:57 2019 -0700

        Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

        Pull x86 apic updates from Thomas Gleixner:
         "Updates for the x86 APIC interrupt handling and APIC timer:

           - Fix a long standing issue with spurious interrupts which was caused
             by the big vector management rework a few years ago. Robert Hodaszi
             provided finally enough debug data and an excellent initial failure
             analysis which allowed to understand the underlying issues.

Who cares? We're all stuck on x86 anyway!

    commit 927ba67a63c72ee87d655e30183d1576c3717d3e
    Merge: 2a1ccd31420a 9176ab1b8480
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Mon Jul 8 11:06:29 2019 -0700

        Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

        Pull timer updates from Thomas Gleixner:
         "The timer and timekeeping department delivers:

          Core:

           - […]

             This gets rid of the unnecessary different copies of the same code
             and brings all architectures on the same level of VDSO
             functionality.

Hey Ben, is this supposed to compete with AMD's Ryzen/Threadripper?

    commit 222a21d29521d144f3dd7a0bc4d4020e448f0126
    Merge: 8faef7125d02 eb876fbc248e
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Mon Jul 8 18:28:44 2019 -0700

        Merge branch 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

        Pull x86 topology updates from Ingo Molnar:
         "Implement multi-die topology support on Intel CPUs and expose the die
          topology to user-space tooling, by Len Brown, Kan Liang and Zhang Rui.

          These changes should have no effect on the kernel's existing
          understanding of topologies, i.e. there should be no behavioral impact
          on cache, NUMA, scheduler, perf and other topologies and overall
          system performance"

Holy shit!

    commit e1928328699a582a540b105e5f4c160832a7fdcb
    Merge: 46f1ec23a469 9156e545765e
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Mon Jul 8 16:12:03 2019 -0700

        Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

        Pull locking updates from Ingo Molnar:
         "The main changes in this cycle are:

           - rwsem scalability improvements, phase #2, by Waiman Long, which are
             rather impressive:

               "On a 2-socket 40-core 80-thread Skylake system with 40 reader
                and writer locking threads, the min/mean/max locking operations
                done in a 5-second testing window before the patchset were:

                 40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
                 40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255

                After the patchset, they became:

                 40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
                 40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"

             There's a lot of changes to the locking implementation that makes
             it similar to qrwlock, including owner handoff for more fair
             locking.


Oh, wait, microbenchmark!

             Another microbenchmark shows how across the spectrum the
             improvements are:

               "With a locking microbenchmark running on 5.1 based kernel, the
                total locking rates (in kops/s) on a 2-socket Skylake system
                with equal numbers of readers and writers (mixed) before and
                after this patchset were:

                # of Threads   Before Patch      After Patch
                ------------   ------------      -----------
                     2            2,618             4,193
                     4            1,202             3,726
                     8              802             3,622
                    16              729             3,359
                    32              319             2,826
                    64              102             2,744"

             The changes are extensive and the patch-set has been through
             several iterations addressing various locking workloads. There
             might be more regressions, but unless they are pathological I
             believe we want to use this new implementation as the baseline
             going forward.
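
To put the mixed reader/writer table in perspective, here are the before/after ratios. The numbers are copied straight from the table; the helper is mine.

```python
# (threads, kops/s before, kops/s after), copied from the quoted table.
RATES = [
    (2, 2618, 4193),
    (4, 1202, 3726),
    (8, 802, 3622),
    (16, 729, 3359),
    (32, 319, 2826),
    (64, 102, 2744),
]


def speedups(rates):
    """Map thread count to after/before ratio, rounded to one decimal."""
    return {threads: round(after / before, 1) for threads, before, after in rates}


for threads, ratio in speedups(RATES).items():
    print(f"{threads:>2} threads: {ratio}x")
```

That works out to roughly 1.6x at 2 threads and about 27x at 64. Before the patchset the rate collapses as threads are added; afterwards it degrades far more gently.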

But does this matter to you guys, as programmers?

       - atomic64_t cross-arch type cleanups by Mark Rutland: over the last
         ~10 years of atomic64_t existence the various types used by the
         APIs only had to be self-consistent within each architecture -
         which means they became wildly inconsistent across architectures.
         Mark puts an end to this by reworking all the atomic64
         implementations to use 's64' as the base type for atomic64_t, and
         to ensure that this type is consistently used for parameters and
         return values in the API, avoiding further problems in this area.

Does this IOMMU stuff matter?

    commit 6b04014f3f151ed62878327813859e76e8e23d78
    Merge: c6b6cebbc597 d95c3885865b
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Tue Jul 9 09:21:02 2019 -0700

        Merge tag 'iommu-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

        Pull iommu updates from Joerg Roedel:

         - Make the dma-iommu code more generic so that it can be used outside
           of the ARM context with other IOMMU drivers. Goal is to make use of
           it on x86 too.

         - Generic IOMMU domain support for the Intel VT-d driver. This driver
           now makes more use of common IOMMU code to allocate default domains
           for the devices it handles.

         - An IOMMU fault reporting API to userspace. With that the IOMMU fault
           handling can be done in user-space, for example to forward the faults
           to a VM.

         - Better handling for reserved regions requested by the firmware. These
           can be 'relaxed' now, meaning that those don't prevent a device being
           attached to a VM.

Kernel people have higher standards:

    commit e9a83bd2322035ed9d7dcf35753d3f984d76c6a5 (HEAD -> master, origin/master)
    Merge: 7011b7e1b702 454f96f2b738
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Tue Jul 9 12:34:26 2019 -0700

        Merge tag 'docs-5.3' of git://git.lwn.net/linux

        Pull Documentation updates from Jonathan Corbet:
         "It's been a relatively busy cycle for docs:

           - […]

           - A new document on how to use merges and rebases in kernel repos,
             and one on Spectre vulnerabilities.

           - Various improvements to the build system, including automatic
             markup of function() references because some people, for reasons I
             will never understand, were of the opinion that
             :c:func:``function()`` is unattractive and not fun to type.

Ewwwwwwwwwwwwwwwww:

    commit b7d5c9239855f99762e8a547bea03a436e8a12e8
    Merge: 608745f12462 8ff80fbe7e98
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Tue Jul 9 11:35:38 2019 -0700

        Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

        Pull x86 boot updates from Thomas Gleixner:
         "Assorted updates to kexec/kdump:

           - Proper kexec support for 4/5-level paging and jumping from a
             5-level to a 4-level paging kernel.

You didn't do this the first time? Not impressed.

    commit 5450e8a316a64cddcbc15f90733ebc78aa736545
    Merge: 29cd581b5949 172bb24a4f48
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Wed Jul 10 22:17:21 2019 -0700

        Merge tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
        
        Pull pidfd updates from Christian Brauner:
         "This adds two main features.
        
           - First, it adds polling support for pidfds. This allows process
             managers to know when a (non-parent) process dies in a race-free
             way.
        
             The notification mechanism used follows the same logic that is
             currently used when the parent of a task is notified of a child's
             death. With this patchset it is possible to put pidfds in an
             {e}poll loop and get reliable notifications for process (i.e.
             thread-group) exit.
        
           - The second feature complements the first one by making it possible
             to retrieve pollable pidfds for processes that were not created
             using CLONE_PIDFD.
        
             A lot of processes get created with traditional PID-based calls
             such as fork() or clone() (without CLONE_PIDFD). For these
             processes a caller can currently not create a pollable pidfd. This
             is a problem for Android's low memory killer (LMK) and service
             managers such as systemd.
        
          Both patchsets are accompanied by selftests.
        
          It's perhaps worth noting that the work done so far and the work done
          in this branch for pidfd_open() and polling support do already see
          some adoption:
        
           - Android is in the process of backporting this work to all their LTS
             kernels [1]
        
           - Service managers make use of pidfd_send_signal but will need to
             wait until we enable waiting on pidfds for full adoption.
        
           - And projects I maintain make use of both pidfd_send_signal and
             CLONE_PIDFD [2] and will use polling support and pidfd_open() too"
        
        [1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22
            https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22
            https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22
        
        [2] https://github.com/lxc/lxc/blob/aab6e3eb73c343231cdde775db938994fc6f2803/src/lxc/start.c#L1753
        
        * tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
          tests: add pidfd_open() tests
          arch: wire-up pidfd_open()
          pid: add pidfd_open()
          pidfd: add polling selftests
          pidfd: add polling support


    commit b53b0b9d9a613c418057f6cb921c2f40a6f78c24
    Author: Joel Fernandes (Google) <joel [ at ] joelfernandes [ dot ] org>
    Date:   Tue Apr 30 12:21:53 2019 -0400

        pidfd: add polling support
        
        This patch adds polling support to pidfd.
        
        Android low memory killer (LMK) needs to know when a process dies once
        it is sent the kill signal. It does so by checking for the existence of
        /proc/pid which is both racy and slow. For example, if a PID is reused
        between when LMK sends a kill signal and when it checks for existence of
        the PID, the wrong process is now possibly checked for existence.
        Using the polling support, LMK will be able to get notified when a process
        exits in a race-free and fast way, and it allows the LMK to do other things
        (such as polling on other fds) while awaiting the process being killed
        to die.
        
        For notification to polling processes, we follow the same existing
        mechanism in the kernel used when the parent of the task group is to be
        notified of a child's death (do_notify_parent). This is precisely when the
        tasks waiting on a poll of pidfd are also awakened in this patch.
        
        We have decided to include the waitqueue in struct pid for the following
        reasons:
        1. The wait queue has to survive for the lifetime of the poll. Including
           it in task_struct would not be an option in this case because the task can
           be reaped and destroyed before the poll returns.
        
        2. Including the waitqueue in struct pid means that during
           de_thread(), the new thread group leader automatically gets the new
           waitqueue/pid even though its task_struct is different.
        
        Appropriate test cases are added in the second patch to provide coverage of
        all the cases the patch is handling.
        
        Cc: Andy Lutomirski <luto [ at ] amacapital [ dot ] net>
        Cc: Steven Rostedt <rostedt [ at ] goodmis [ dot ] org>
        Cc: Daniel Colascione <dancol [ at ] google [ dot ] com>
        Cc: Jann Horn <jannh [ at ] google [ dot ] com>
        Cc: Tim Murray <timmurray [ at ] google [ dot ] com>
        Cc: Jonathan Kowalski <bl0pbl33p [ at ] gmail [ dot ] com>
        Cc: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
        Cc: Al Viro <viro [ at ] zeniv [ dot ] linux [ dot ] org [ dot ] uk>
        Cc: Kees Cook <keescook [ at ] chromium [ dot ] org>
        Cc: David Howells <dhowells [ at ] redhat [ dot ] com>
        Cc: Oleg Nesterov <oleg [ at ] redhat [ dot ] com>
        Cc: kernel-team [ at ] android [ dot ] com
        Reviewed-by: Oleg Nesterov <oleg [ at ] redhat [ dot ] com>
        Co-developed-by: Daniel Colascione <dancol [ at ] google [ dot ] com>
        Signed-off-by: Daniel Colascione <dancol [ at ] google [ dot ] com>
        Signed-off-by: Joel Fernandes (Google) <joel [ at ] joelfernandes [ dot ] org>
        Signed-off-by: Christian Brauner <christian [ at ] brauner [ dot ] io>

This feature comes too late for Ben:

    commit d2b6b4c832f7e3067709e8d4970b7b82b44419ac
    Merge: 0248a8be6d21 b78fa45d4edb
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Wed Jul 10 21:22:43 2019 -0700

        Merge tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux

        Pull nfsd updates from Bruce Fields:
         "Highlights:

           - Add a new /proc/fs/nfsd/clients/ directory which exposes some
             long-requested information about NFSv4 clients (like open files)
             and allows forced revocation of client state.

It's not OCFS2 like Ben asked, but GFS is being updated. AFS is still being updated AF.

Still funny, this casefold feature:

    commit 3ae72562ad917df36a1b1247d749240e3b4865db
    Author: Gabriel Krisman Bertazi <krisman [ at ] collabora [ dot ] com>
    Date:   Wed Jun 19 23:45:09 2019 -0400

        ext4: optimize case-insensitive lookups

        Temporarily cache a casefolded version of the file name under lookup in
        ext4_filename, to avoid repeatedly casefolding it.  I got up to 30%
        speedup on lookups of large directories (>100k entries), depending on
        the length of the string under lookup.

        Signed-off-by: Gabriel Krisman Bertazi <krisman [ at ] collabora [ dot ] com>
        Signed-off-by: Theodore Ts'o <tytso [ at ] mit [ dot ] edu>

Scary. Is my data okay!?!?!??!?

    commit 40f06c799539739a08a56be8a096f56aeed05731
    Merge: a47f5c56b2eb fe0da9c09b2d
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Wed Jul 10 20:32:37 2019 -0700

        Merge tag 'copy-file-range-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

        Pull copy_file_range updates from Darrick Wong:
         "This fixes numerous parameter checking problems and inconsistent
          behaviors in the new(ish) copy_file_range system call.

          Now the system call will actually check its range parameters
          correctly; refuse to copy into files for which the caller does not
          have sufficient privileges; update mtime and strip setuid like file
          writes are supposed to do; and allows copying up to the EOF of the
          source file instead of failing the call like we used to.
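
A minimal userspace sketch of calling copy_file_range() in a loop, assuming
glibc 2.27 or later for the wrapper. copy_whole_file() is a hypothetical
helper name. The loop handles short copies and stops at EOF rather than
assuming one call moves everything.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the contents of src_fd (from its current file
 * position) into dst_fd. Returns the number of bytes copied, or -1 on error.
 */
static ssize_t copy_whole_file(int src_fd, int dst_fd)
{
    struct stat st;
    if (fstat(src_fd, &st) < 0)
        return -1;

    off_t remaining = st.st_size;
    ssize_t total = 0;

    while (remaining > 0) {
        /* NULL offsets: use and advance the file positions of both fds. */
        ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL,
                                    (size_t)remaining, 0);
        if (n < 0)
            return -1;      /* errno says why, e.g. EXDEV or EINVAL */
        if (n == 0)
            break;          /* reached EOF before st_size bytes were copied */
        remaining -= n;
        total += n;
    }
    return total;
}
```

A caller that gets -1 would normally fall back to a plain read/write loop,
since the syscall is not supported on every filesystem combination.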

LOL, still fixing ext2:

    commit 682f7c5c465d7ac4107e51dbf2a847a026b384e8
    Merge: e6983afd9254 fa33cdbf3ece
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Wed Jul 10 20:27:07 2019 -0700

        Merge tag 'for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

        Pull ext2, udf and quota updates from Jan Kara:

         - some ext2 fixes and cleanups

clone3:

    commit 8f6ccf6159aed1f04c6d179f61f6fb2691261e84
    Merge: 5450e8a316a6 d68dbb0c9ac8
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Thu Jul 11 10:09:44 2019 -0700

        Merge tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
        
        Pull clone3 system call from Christian Brauner:
         "This adds the clone3 syscall which is an extensible successor to clone
          after we snagged the last flag with CLONE_PIDFD during the 5.2 merge
          window for clone(). It cleanly supports all of the flags from clone()
          and thus all legacy workloads.
        
          There are few user visible differences between clone3 and clone.
          First, CLONE_DETACHED will cause EINVAL with clone3 so we can reuse
          this flag. Second, the CSIGNAL flag is deprecated and will cause
          EINVAL to be reported. It is superseded by a dedicated "exit_signal"
          argument in struct clone_args thus freeing up even more flags. And
          third, clone3 gives CLONE_PIDFD a dedicated return argument in struct
          clone_args instead of abusing CLONE_PARENT_SETTID's parent_tidptr
          argument.
        
          The clone3 uapi is designed to be easy to handle on 32- and 64 bit:
        
            /* uapi */
            struct clone_args {
                    __aligned_u64 flags;
                    __aligned_u64 pidfd;
                    __aligned_u64 child_tid;
                    __aligned_u64 parent_tid;
                    __aligned_u64 exit_signal;
                    __aligned_u64 stack;
                    __aligned_u64 stack_size;
                    __aligned_u64 tls;
            };
        
          and a separate kernel struct is used that uses proper kernel typing:
        
            /* kernel internal */
            struct kernel_clone_args {
                    u64 flags;
                    int __user *pidfd;
                    int __user *child_tid;
                    int __user *parent_tid;
                    int exit_signal;
                    unsigned long stack;
                    unsigned long stack_size;
                    unsigned long tls;
            };
        
          The system call comes with a size argument which enables the kernel to
          detect what version of clone_args userspace is passing in. clone3
          validates that any additional bytes a given kernel does not know about
          are set to zero and that the size never exceeds a page.
        
          A nice feature is that this patchset allowed us to cleanup and
          simplify various core kernel codepaths in kernel/fork.c by making the
          internal _do_fork() function take struct kernel_clone_args even for
          legacy clone().
        
          This patch also unblocks the time namespace patchset which wants to
          introduce a new CLONE_TIMENS flag.
        
          Note, that clone3 has only been wired up for x86{_32,64}, arm{64}, and
          xtensa. These were the architectures that did not require special
          massaging.
        
          Other architectures treat fork-like system calls individually and
          after some back and forth neither Arnd nor I felt confident that we
          dared to add clone3 unconditionally to all architectures. We agreed to
          leave this up to individual architecture maintainers. This is why
          there's an additional patch that introduces __ARCH_WANT_SYS_CLONE3
          which any architecture can set once it has implemented support for
          clone3. The patch also adds a cond_syscall(clone3) for architectures
          such as nios2 or h8300 that generate their syscall table by simply
          including asm-generic/unistd.h. The hope is to get rid of
          __ARCH_WANT_SYS_CLONE3 and cond_syscall() rather soon"
        
        * tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
          arch: handle arches who do not yet define clone3
          arch: wire-up clone3() syscall
          fork: add clone3

    commit 7f192e3cd316ba58c88dfa26796cf77789dd9872
    Author: Christian Brauner <christian [ at ] brauner [ dot ] io>
    Date:   Sat May 25 11:36:41 2019 +0200

        fork: add clone3
        
        This adds the clone3 system call.
        
        As mentioned several times already (cf. [7], [8]) here's the promised
        patchset for clone3().
        
        We recently merged the CLONE_PIDFD patchset (cf. [1]). It took the last
        free flag from clone().
        
        Independent of the CLONE_PIDFD patchset a time namespace has been discussed
        at Linux Plumber Conference last year and has been sent out and reviewed
        (cf. [5]). It is expected that it will go upstream in the not too distant
        future. However, it relies on the addition of the CLONE_NEWTIME flag to
        clone(). The only other good candidate - CLONE_DETACHED - is currently not
        recyclable as we have identified at least two large or widely used
        codebases that currently pass this flag (cf. [2], [3], and [4]). Given that
        CLONE_PIDFD grabbed the last clone() flag the time namespace is effectively
        blocked. clone3() has the advantage that it will unblock this patchset
        again. In general, clone3() is extensible and allows for the implementation
        of new features.
        
        The idea is to keep clone3() very simple and close to the original clone(),
        specifically, to keep on supporting old clone()-based workloads.
        We know there have been various creative proposals how a new process
        creation syscall or even api is supposed to look like. Some people even
        going so far as to argue that the traditional fork()+exec() split should be
        abandoned in favor of an in-kernel version of spawn(). Independent of
        whether or not we personally think spawn() is a good idea this patchset has
        nothing to do with this and does not want to.
        One stance we take is that there's no real good alternative to
        clone()+exec() and we need and want to support this model going forward;
        independent of spawn().
        The following requirements guided clone3():
        - bump the number of available flags
        - move arguments that are currently passed as separate arguments
          in clone() into a dedicated struct clone_args
          - choose a struct layout that is easy to handle on 32 and on 64 bit
          - choose a struct layout that is extensible
          - give new flags that currently need to abuse another flag's dedicated
            return argument in clone() their own dedicated return argument
            (e.g. CLONE_PIDFD)
          - use a separate kernel internal struct kernel_clone_args that is
            properly typed according to current kernel conventions in fork.c and is
            different from  the uapi struct clone_args
        - port _do_fork() to use kernel_clone_args so that all process creation
          syscalls such as fork(), vfork(), clone(), and clone3() behave identical
          (Arnd suggested, that we can probably also port do_fork() itself in a
           separate patchset.)
        - ease of transition for userspace from clone() to clone3()
          This very much means that we do *not* remove functionality that userspace
          currently relies on as the latter is a good way of creating a syscall
          that won't be adopted.
        - do not try to be clever or complex: keep clone3() as dumb as possible
        
        In accordance with Linus' suggestions (cf. [11]), clone3() has the following
        signature:
        
        /* uapi */
        struct clone_args {
                __aligned_u64 flags;
                __aligned_u64 pidfd;
                __aligned_u64 child_tid;
                __aligned_u64 parent_tid;
                __aligned_u64 exit_signal;
                __aligned_u64 stack;
                __aligned_u64 stack_size;
                __aligned_u64 tls;
        };
        
        /* kernel internal */
        struct kernel_clone_args {
                u64 flags;
                int __user *pidfd;
                int __user *child_tid;
                int __user *parent_tid;
                int exit_signal;
                unsigned long stack;
                unsigned long stack_size;
                unsigned long tls;
        };
        
        long sys_clone3(struct clone_args __user *uargs, size_t size)
        
        clone3() cleanly supports all of the supported flags from clone() and thus
        all legacy workloads.
        The advantage of sticking close to the old clone() is the low cost for
        userspace to switch to this new api. Quite a lot of userspace apis (e.g.
        pthreads) are based on the clone() syscall. With the new clone3() syscall
        supporting all of the old workloads and opening up the ability to add new
        features should make switching to it for userspace more appealing. In
        essence, glibc can just write a simple wrapper to switch from clone() to
        clone3().
        
        There has been some interest in this patchset already. We have received a
        patch from the CRIU corner for clone3() that would set the PID/TID of a
        restored process without /proc/sys/kernel/ns_last_pid to eliminate a race.
        
        /* User visible differences to legacy clone() */
        - CLONE_DETACHED will cause EINVAL with clone3()
        - CSIGNAL is deprecated
          It is superseded by a dedicated "exit_signal" argument in struct
          clone_args freeing up space for additional flags.
          This is based on a suggestion from Andrei and Linus (cf. [9] and [10])
        
        /* References */
        [1]: b3e5838252665ee4cfa76b82bdf1198dca81e5be
        [2]: https://dxr.mozilla.org/mozilla-central/source/security/sandbox/linux/SandboxFilter.cpp#343
        [3]: https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_create.c#n233
        [4]: https://sources.debian.org/src/blcr/0.8.5-2.3/cr_module/cr_dump_self.c/?hl=740#L740
        [5]: https://lore.kernel.org/lkml/20190425161416 [ dot ] 26600-1-dima [ at ] arista [ dot ] com/
        [6]: https://lore.kernel.org/lkml/20190425161416 [ dot ] 26600-2-dima [ at ] arista [ dot ] com/
        [7]: https://lore.kernel.org/lkml/CAHrFyr5HxpGXA2YrKza-oB-GGwJCqwPfyhD-Y5wbktWZdt0sGQ [ at ] mail [ dot ] gmail [ dot ] com/
        [8]: https://lore.kernel.org/lkml/20190524102756 [ dot ] qjsjxukuq2f4t6bo [ at ] brauner [ dot ] io/
        [9]: https://lore.kernel.org/lkml/20190529222414 [ dot ] GA6492 [ at ] gmail [ dot ] com/
        [10]: https://lore.kernel.org/lkml/CAHk-=whQP-Ykxi=zSYaV9iXsHsENa+2fdj-zYKwyeyed63Lsfw [ at ] mail [ dot ] gmail [ dot ] com/
        [11]: https://lore.kernel.org/lkml/CAHk-=wieuV4hGwznPsX-8E0G2FKhx3NjZ9X3dTKh5zKd+iqOBw [ at ] mail [ dot ] gmail [ dot ] com/
        
        Suggested-by: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
        Signed-off-by: Christian Brauner <christian [ at ] brauner [ dot ] io>
        Acked-by: Arnd Bergmann <arnd [ at ] arndb [ dot ] de>
        Acked-by: Serge Hallyn <serge [ at ] hallyn [ dot ] com>
        Cc: Kees Cook <keescook [ at ] chromium [ dot ] org>
        Cc: Pavel Emelyanov <xemul [ at ] virtuozzo [ dot ] com>
        Cc: Jann Horn <jannh [ at ] google [ dot ] com>
        Cc: David Howells <dhowells [ at ] redhat [ dot ] com>
        Cc: Andrew Morton <akpm [ at ] linux-foundation [ dot ] org>
        Cc: Oleg Nesterov <oleg [ at ] redhat [ dot ] com>
        Cc: Adrian Reber <adrian [ at ] lisas [ dot ] de>
        Cc: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
        Cc: Andrei Vagin <avagin [ at ] gmail [ dot ] com>
        Cc: Al Viro <viro [ at ] zeniv [ dot ] linux [ dot ] org [ dot ] uk>
        Cc: Florian Weimer <fweimer [ at ] redhat [ dot ] com>
        Cc: linux-api [ at ] vger [ dot ] kernel [ dot ] org
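
To see what the struct-plus-size calling convention looks like from
userspace, here is a minimal sketch that uses clone3() as a plain fork().
Assumptions: a 5.3 or later kernel on a wired-up architecture, and 435 as the
syscall number if your headers lack SYS_clone3. The struct is a local mirror
of the uapi layout quoted above, deliberately named clone_args_v0 so it does
not clash with <linux/sched.h>. spawn_and_reap() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: 435 on architectures using the unified syscall table. */
#ifndef SYS_clone3
#define SYS_clone3 435
#endif

typedef uint64_t aligned_u64 __attribute__((aligned(8)));

/* Local mirror of the v5.3 uapi struct clone_args (64 bytes). */
struct clone_args_v0 {
    aligned_u64 flags;
    aligned_u64 pidfd;
    aligned_u64 child_tid;
    aligned_u64 parent_tid;
    aligned_u64 exit_signal;
    aligned_u64 stack;
    aligned_u64 stack_size;
    aligned_u64 tls;
};

/*
 * Hypothetical helper: create a child with clone3(), have it exit with
 * `code`, and reap it. Returns the child's exit status, or -1 on error
 * (for example ENOSYS on kernels before 5.3).
 */
static int spawn_and_reap(int code)
{
    struct clone_args_v0 args;
    memset(&args, 0, sizeof(args));   /* unknown or unused fields must be zero */
    args.exit_signal = SIGCHLD;       /* dedicated field instead of CSIGNAL bits */

    /* The size argument tells the kernel which struct version we pass. */
    pid_t pid = (pid_t)syscall(SYS_clone3, &args, sizeof(args));
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                  /* child: no flags, no new stack, like fork() */

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Passing a non-zero flags value, a stack, or a pidfd pointer works the same
way: set the field and leave the rest zeroed.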

commit 70e6e1b971e46f5c1c2d72217ba62401a2edc22b
Merge: 07ab9d5bc53d a50a3f4b6a31
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Sat Jul 20 10:33:44 2019 -0700

    Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull CONFIG_PREEMPT_RT stub config from Thomas Gleixner:
     "The real-time preemption patch set exists for almost 15 years now and
      while the vast majority of infrastructure and enhancements have found
      their way into the mainline kernel, the final integration of RT is
      still missing.

      Over the course of the last few years, we have worked on reducing the
      intrusiveness of the RT patches by refactoring kernel infrastructure
      to be more real-time friendly. Almost all of these changes were
      beneficial to the mainline kernel on their own, so there was no
      objection to integrate them.

      Though except for the still ongoing printk refactoring, the remaining
      changes which are required to make RT a first class mainline citizen
      are no longer arguable as immediately beneficial for the mainline
      kernel. Most of them are either reordering code flows or adding RT
      specific functionality.

      But this now has hit a wall and turned into a classic hen and egg
      problem:

         Maintainers are rightfully wary vs. these changes as they make only
         sense if the final integration of RT into the mainline kernel takes
         place.

      Adding CONFIG_PREEMPT_RT aims to solve this as a clear sign that RT
      will be fully integrated into the mainline kernel. The final
      integration of the missing bits and pieces will be of course done with
      the same careful approach as we have used in the past.

      While I'm aware that you are not entirely enthusiastic about that, I
      think that RT should receive the same treatment as any other widely
      used out of tree functionality, which we have accepted into mainline
      over the years.

      RT has become the de-facto standard real-time enhancement and is
      shipped by enterprise, embedded and community distros. It's in use
      throughout a wide range of industries: telecommunications, industrial
      automation, professional audio, medical devices, data acquisition,
      automotive - just to name a few major use cases.

      RT development is backed by a Linuxfoundation project which is
      supported by major stakeholders of this technology. The funding will
      continue over the actual inclusion into mainline to make sure that the
      functionality is neither introducing regressions, regressing itself,
      nor becomes subject to bitrot. There is also a lively user community
      around RT as well, so contrary to the grim situation 5 years ago, it's
      a healthy project.

      As RT is still a good vehicle to exercise rarely used code paths and
      to detect hard to trigger issues, you could at least view it as a QA
      tool if nothing else"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT

Purism Librem5 devkit now has a DT in kernel

    10) Use promisc for unsupported number of filters, from Justin Chen.


commit 933a90bf4f3505f8ec83bda21a3c7d70d7c2b426
Merge: 5f4fc6d440d7 037f11b4752f
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Fri Jul 19 10:42:02 2019 -0700

    Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

    Pull vfs mount updates from Al Viro:
     "The first part of mount updates.

      Convert filesystems to use the new mount API"

Features:

 - Allow NFS client to set up multiple TCP connections to the server
   using a new 'nconnect=X' mount option. Queue length is used to
   balance load.

 - Enhance statistics reporting to report on all transports when using
   multiple connections.

 - Speed up SUNRPC by removing bh-safe spinlocks

 - Add a mechanism to allow NFSv4 to request that containers set a
   unique per-host identifier for when the hostname is not set.

 - Ensure NFSv4 updates the lease_time after a clientid update

SMB 3.1.1 GCM instead of CCM

commit a29a0a467e2c02fe4287c2d4eff86c9eb6beff0c
Merge: bed38c3e2dca d7852fbd0f04
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Thu Jul 25 08:36:29 2019 -0700

    Merge branch 'access-creds'

    The access() (and faccessat()) credentials change can cause an
    unnecessary load on the RCU machinery because every access() call ends
    up freeing the temporary access credential using RCU.

    This isn't really noticeable on small machines, but if you have hundreds
    of cores you can cause huge slowdowns due to RCU storms.

    It's easy to avoid: the temporary access credentials aren't actually
    normally accessed using RCU at all, so we can avoid the whole issue by
    just marking them as such.

    * access-creds:
      access: avoid the RCU grace period for the temporary subjective credentials

commit d7852fbd0f0423937fa287a598bfde188bb68c22
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Thu Jul 11 09:54:40 2019 -0700

    access: avoid the RCU grace period for the temporary subjective credentials

    It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
    work because it installs a temporary credential that gets allocated and
    freed for each system call.

    The allocation and freeing overhead is mostly benign, but because
    credentials can be accessed under the RCU read lock, the freeing
    involves a RCU grace period.

    Which is not a huge deal normally, but if you have a lot of access()
    calls, this causes a fair amount of secondary damage: instead of having
    nice alloc/free patterns that hit in hot per-CPU slab caches, you have
    all those delayed free's, and on big machines with hundreds of cores,
    the RCU overhead can end up being enormous.

    But it turns out that all of this is entirely unnecessary.  Exactly
    because access() only installs the credential as the thread-local
    subjective credential, the temporary cred pointer doesn't actually need
    to be RCU free'd at all.  Once we're done using it, we can just free it
    synchronously and avoid all the RCU overhead.

    So add a 'non_rcu' flag to 'struct cred', which can be set by users that
    know they only use it in non-RCU context (there are other potential
    users for this).  We can make it a union with the rcu freeing list head
    that we need for the RCU case, so this doesn't need any extra storage.

    Note that this also makes 'get_current_cred()' clear the new non_rcu
    flag, in case we have filesystems that take a long-term reference to the
    cred and then expect the RCU delayed freeing afterwards.  It's not
    entirely clear that this is required, but it makes for clear semantics:
    the subjective cred remains non-RCU as long as you only access it
    synchronously using the thread-local accessors, but you _can_ use it as
    a generic cred if you want to.

    It is possible that we should just remove the whole RCU markings for
    ->cred entirely.  Only ->real_cred is really supposed to be accessed
    through RCU, and the long-term cred copies that nfs uses might want to
    explicitly re-enable RCU freeing if required, rather than have
    get_current_cred() do it implicitly.

    But this is a "minimal semantic changes" change for the immediate
    problem.

    Acked-by: Peter Zijlstra (Intel) <peterz [ at ] infradead [ dot ] org>
    Acked-by: Eric Dumazet <edumazet [ at ] google [ dot ] com>
    Acked-by: Paul E. McKenney <paulmck [ at ] linux [ dot ] ibm [ dot ] com>
    Cc: Oleg Nesterov <oleg [ at ] redhat [ dot ] com>
    Cc: Jan Glauber <jglauber [ at ] marvell [ dot ] com>
    Cc: Jiri Kosina <jikos [ at ] kernel [ dot ] org>
    Cc: Jayachandran Chandrasekharan Nair <jnair [ at ] marvell [ dot ] com>
    Cc: Greg KH <greg [ at ] kroah [ dot ] com>
    Cc: Kees Cook <keescook [ at ] chromium [ dot ] org>
    Cc: David Howells <dhowells [ at ] redhat [ dot ] com>
    Cc: Miklos Szeredi <miklos [ at ] szeredi [ dot ] hu>
    Cc: Al Viro <viro [ at ] zeniv [ dot ] linux [ dot ] org [ dot ] uk>
    Signed-off-by: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>

On the other hand, I love PG commit messages:

    On the other hand, it emerges that FreeBSD and possibly other packagers
    are so wedded to backwards compatibility that they hack the IANA data
    to keep the old spelling --- and not just that old spelling, but even
    older spellings that IANA used back in the stone age.  This caused the
    filter logic to fail to suppress "Factory" at all on such platforms,
    though the formatting problem is definitely real in that case.

commit 0432a0a066b05361b6d4d26522233c3c76c9e5da
Merge: af42e7450f4b 33a58980ff3c
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Sat Aug 3 10:51:29 2019 -0700

    Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull vdso timer fixes from Thomas Gleixner:
     "A series of commits to deal with the regression caused by the generic
      VDSO implementation.

      The usage of clock_gettime64() for 32bit compat fallback syscalls
      caused seccomp filters to kill innocent processes because they only
      allow clock_gettime().

      Handle the compat syscalls with clock_gettime() as before, which is
      not a functional problem for the VDSO as the legacy compat application
      interface is not y2038 safe anyway. It's just extra fallback code
      which needs to be implemented on every architecture.

      It's opt in for now so that it does not break the compile of already
      converted architectures in linux-next. Once these are fixed, the
      #ifdeffery goes away.

      So much for trying to be smart and reuse code..."

I thought we had stack leak plugins ages ago?

    commit 2e616d9f9ce8d469db4cd0a019cdc2ff3feab577
    Author: Darrick J. Wong <darrick [ dot ] wong [ at ] oracle [ dot ] com>
    Date:   Sun Jul 28 21:12:32 2019 -0700

        xfs: fix stack contents leakage in the v1 inumber ioctls

        Explicitly initialize the onstack structures to zero so we don't leak
        kernel memory into userspace when converting the in-core inumbers
        structure to the v1 inogrp ioctl structure.  Add a comment about why we
        have to use memset to ensure that the padding holes in the structures
        are set to zero.
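
For anyone wondering why the commit insists on memset: assigning the fields
one by one (or using an initializer) is not guaranteed to clear the padding
bytes between them, so whatever was on the stack can ride along when the
struct is copied out. A rough sketch of the pattern follows. The structs and
the fill_user_rec() helper are made up for illustration and are not the real
xfs inumbers/inogrp definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-core record (NOT the real xfs structure). */
struct incore_rec {
    uint64_t start;
    uint32_t count;
    uint64_t mask;
};

/*
 * Hypothetical userspace-visible layout.  On common 64-bit ABIs the
 * 32-bit 'count' is followed by 4 bytes of padding so that 'mask'
 * stays 8-byte aligned.
 */
struct user_rec {
    uint64_t start;
    uint32_t count;
    uint64_t mask;
};

/*
 * Convert the in-core form to the user-visible form.  The memset comes
 * first so that the padding hole is zero before the named fields are
 * filled in; the field assignments alone would leave it untouched.
 */
void fill_user_rec(struct user_rec *out, const struct incore_rec *in)
{
    memset(out, 0, sizeof(*out));
    out->start = in->start;
    out->count = in->count;
    out->mask = in->mask;
}
```

In the kernel the filled structure would then go through copy_to_user(),
which copies sizeof(struct) bytes, padding included.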

Woof. Shots fired.

    commit 0e31225f99e077d0b8c7f8577aab39e766e2477b
    Merge: 4f1a6ef1df6f 9c8c9c7cdb4c
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Fri Aug 2 18:53:51 2019 -0700

        Merge tag 'drm-fixes-2019-08-02-1' of git://anongit.freedesktop.org/drm/drm

        Pull more drm fixes from Daniel Vetter:
         "Dave sends his pull, everyone realizes they've been asleep at the
          wheel and hits send on their own pulls :-/

          Normally I'd just ignore these all because w/e for me and Dave. But
          this time around the latecomers also included drm-intel-fixes, which
          failed to send out a -fixes pull thus far for this release (screwed up
          vacation coverage, despite that 2/3 maintainers were around ... they
          all look appropriately guilty), and that really is overdue to get
          landed.

When Linus comments on your commits:

    commit 6e6d05360b80f196ed07061327f03346b204abea
    Merge: 10e5ddd71fb3 e82f04ec6ba9
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Fri Aug 2 14:46:33 2019 -0700

        Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

        Pull SCSI fixes from James Bottomley:
         "Seven fixes to four drivers with no core changes.

          The mpt3sas one is theoretical until we get a CPU that goes up to 64
          bits physical, the qla2xxx one fixes an oops in a driver
          initialization error leg and the others are mostly cosmetic"

        [ The fcoe patches may be worth highlighting - they may be "just"
          cleanups, but they simplify and fix the odd fc_rport_priv structure
          handling rules so that the new gcc-9 warnings about memset crossing
          structure boundaries are gone.

          The old code was hard for humans to understand too, and really
          confused the compiler sanity checks  - Linus ]

Good old GCC.

People still hand-roll growable arrays, as Vim does.

    commit 7086751c5e4eb3cfee0b98df0d3cedc8bff47d35
    Author: Jan Edmund Lazo <jan [ dot ] lazo [ at ] mail [ dot ] utoronto [ dot ] ca>
    Date:   Mon Aug 5 17:42:41 2019 -0400

        vim-patch:8.1.1439: ga_grow(): 1.5x growth rate #10699

        Problem:    Json_encode() is very slow for large results.
        Solution:   In the growarray use a growth of at least 50%. (Ken Takata,
                    closes vim/vim#4461)
        https://github.com/vim/vim/commit/c47ed44be76a520ded90913099771999c8a79eeb

    diff --git a/src/nvim/garray.c b/src/nvim/garray.c
    index 74fd9d89c..1cfc2b617 100644
    --- a/src/nvim/garray.c
    +++ b/src/nvim/garray.c
    @@ -89,6 +89,14 @@ void ga_grow(garray_T *gap, int n)
       if (n < gap->ga_growsize) {
         n = gap->ga_growsize;
       }
    +
    +  // A linear growth is very inefficient when the array grows big.  This
    +  // is a compromise between allocating memory that won't be used and too
    +  // many copy operations. A factor of 1.5 seems reasonable.
    +  if (n < gap->ga_len / 2) {
    +    n = gap->ga_len / 2;
    +  }
    +
       int new_maxlen = gap->ga_len + n;

       size_t new_size = (size_t)gap->ga_itemsize * (size_t)new_maxlen;
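
To see what the extra clamp buys, here is a small standalone sketch of the
same arithmetic. grow_amount() and count_reallocs() are names I made up; they
only mirror the two clamps in the diff above and leave out the rest of
ga_grow() (the early return when there is already room, the actual realloc).

```c
/*
 * Hypothetical helper mirroring the patched clamps: grow by at least
 * 'growsize' items, and by at least half the current length.
 */
int grow_amount(int len, int growsize, int n)
{
    if (n < growsize) {
        n = growsize;
    }
    if (n < len / 2) {
        n = len / 2;
    }
    return n;
}

/*
 * Count how many times the array has to be regrown while appending
 * 'target' items one at a time.  With use_factor == 0 only the old
 * fixed-step behaviour is modelled; with use_factor != 0 the 1.5x
 * clamp from the patch is applied as well.
 */
int count_reallocs(int target, int growsize, int use_factor)
{
    int len = 0;
    int maxlen = 0;
    int reallocs = 0;

    while (len < target) {
        if (len == maxlen) {
            /* Passing len = 0 disables the len / 2 clamp. */
            int n = grow_amount(use_factor ? len : 0, growsize, 1);
            maxlen = len + n;
            reallocs++;
        }
        len++;
    }
    return reallocs;
}
```

With a fixed step the number of regrowths is linear in the final length (one
per growsize items); with the factor it is roughly logarithmic, which is
presumably why json_encode() on large results got faster.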

commit 4368c4bc9d36821690d6bb2e743d5a075b6ddb55
Merge: 0eb0ce0a78e1 4c92057661a3
Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
Date:   Tue Aug 6 11:22:22 2019 -0700

    Merge branch 'x86/grand-schemozzle' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull pti updates from Thomas Gleixner:
     "The performance deterioration departement is not proud at all to
      present yet another set of speculation fences to mitigate the next
      chapter in the 'what could possibly go wrong' story.

      The new vulnerability belongs to the Spectre class and affects GS
      based data accesses and has therefore been dubbed 'Grand Schemozzle'
      for secret communication purposes. It's officially listed as
      CVE-2019-1125.

      Conditional branches in the entry paths which contain a SWAPGS
      instruction (interrupts and exceptions) can be mis-speculated which
      results in speculative accesses with a wrong GS base.

      This can happen on entry from user mode through a mis-speculated
      branch which takes the entry from kernel mode path and therefore does
      not execute the SWAPGS instruction. The following speculative accesses
      are done with user GS base.

      On entry from kernel mode the mis-speculated branch executes the
      SWAPGS instruction in the entry from user mode path which has the same
      effect that the following GS based accesses are done with user GS
      base.

      If there is a disclosure gadget available in these code paths the
      mis-speculated data access can be leaked through the usual side
      channels.

      The entry from user mode issue affects all CPUs which have speculative
      execution. The entry from kernel mode issue affects only Intel CPUs
      which can speculate through SWAPGS. On CPUs from other vendors SWAPGS
      has semantics which prevent that.

      SMAP migitates both problems but only when the CPU is not affected by
      the Meltdown vulnerability.

      The mitigation is to issue LFENCE instructions in the entry from
      kernel mode path for all affected CPUs and on the affected Intel CPUs
      also in the entry from user mode path unless PTI is enabled because
      the CR3 write is serializing.

      The fences are as usual enabled conditionally and can be completely
      disabled on the kernel command line. The Spectre V1 documentation is
      updated accordingly.

      A big "Thank You!" goes to Josh for doing the heavy lifting for this
      round of hardware misfeature 'repair'. Of course also "Thank You!" to
      everybody else who contributed in one way or the other"

    * 'x86/grand-schemozzle' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      Documentation: Add swapgs description to the Spectre v1 documentation
      x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
      x86/entry/64: Use JMP instead of JMPQ
      x86/speculation: Enable Spectre v1 swapgs mitigations
      x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations

Been seeing these clang things become more popular over the years:

    commit 23df57afe8eebff6ece05a815934f2f70a851e0a
    Merge: bf1881cf484d ed4289e8b488
    Author: Linus Torvalds <torvalds [ at ] linux-foundation [ dot ] org>
    Date:   Sat Aug 10 10:17:19 2019 -0700

        Merge tag 'powerpc-5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

        Pull powerpc fix from Michael Ellerman:
         "Just one fix, a revert of a commit that was meant to be a minor
          improvement to some inline asm, but ended up having no real benefit
          with GCC and broke booting 32-bit machines when using Clang.

          Thanks to: Arnd Bergmann, Christophe Leroy, Nathan Chancellor, Nick
          Desaulniers, Segher Boessenkool"

        * tag 'powerpc-5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
          Revert "powerpc: slightly improve cache helpers"


    Secure RPC: Disable insecure Kerberos encryption types (SUNRPC_DISABLE_INSECURE_ENCTYPES) [N/y/?] (NEW) ?

CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES:

Choose Y here to disable the use of deprecated encryption types
with the Kerberos version 5 GSS-API mechanism (RFC 1964). The
deprecated encryption types include DES-CBC-MD5, DES-CBC-CRC,
and DES-CBC-MD4. These types were deprecated by RFC 6649 because
they were found to be insecure.

N is the default because many sites have deployed KDCs and
keytabs that contain only these deprecated encryption types.
Choosing Y prevents the use of known-insecure encryption types
but might result in compatibility problems.
