zfs 1.6.0rc1 and Photos


Re: zfs 1.6.0rc1 and Photos

Postby lundman » Mon Jan 30, 2017 4:41 pm

You have a thread near the bottom which is heavy in dedup code, calling ddt_get_dedup_histogram() from spa_sync; this is slowing down your txg_sync and the zfs_rmnode calls above it. It doesn't look dead, just very bogged down.

A similar ZOL issue is https://github.com/zfsonlinux/zfs/issues/5400, which suggests the dataset might be close to its quota/space limits.
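
To rule that out quickly, something like this should show it (I'm guessing the dataset name here; substitute your own):

Code: Select all
bash-3.2$ zfs get quota,refquota,available,used ztank/Pictures
bash-3.2$ zpool list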

Re: zfs 1.6.0rc1 and Photos

Postby rottegift » Mon Jan 30, 2017 6:21 pm

lundman wrote:You have a thread near the bottom which is heavy in dedup code, calling ddt_get_dedup_histogram()


In particular in spindump3-switchingfaces-... see:

Thread 0x123b 1000 samples (1-1000) priority 81 (base 81) cpu time 9.965

which is spending a lot of time in deduplication code (ddt_get_dedup_histogram() as lundman mentions, and also ddt_sync()), so you are writing to a dedup!=off dataset or zvol.

It's also one of only two kernel_task threads with cpu time > 1 second, and one of only five threads on your whole system that racked up > 1 cpu second during your spindump run (and two of those are directly related to spindump).

The other busy kernel thread is doing txg_sync and waiting on the deduplication thread, so essentially your deduplication activity is stalling all your I/O on that pool while waiting for synchronous reads (which are all effectively random I/O, and so waiting for your disks to seek).

Unlike lundman, I think this is the fundamental source of your problem. The changes from upstream between 1.5 and 1.6 that do better scheduling of writes in the normal case are letting you issue more write() calls to a dedup!=off dataset than before, so worsening the head-of-line blocking whenever a deduplication table entry needs to be read in (synchronously, during txg_sync). Also, you were probably memory-constrained in the amount of dirty data you could produce under 1.5's spl, but now you can use the whole of kstat.zfs.darwin.tunable.zfs_dirty_data_max. If that's about 1.5G, then with 128k records you're looking at ~12000 DDT consultations per txg, and when your ARC is cold that means more than 12000 random I/Os issued for the DDT entries and the metadata pointing to them, all done while syncing the transaction group (txg) out to disk. This will head-of-line block async reads (probably causing Photos.app to show the pizza wheel), and leave you with only up to dirty_data_max worth of async writes before all writes to that pool block.
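
To sanity-check that arithmetic yourself (the 1.5G figure is just my assumption; read the real value from the tunable first):

Code: Select all
bash-3.2$ sysctl kstat.zfs.darwin.tunable.zfs_dirty_data_max
bash-3.2$ # 1.5 GiB of dirty data divided into 128 KiB records:
bash-3.2$ echo $(( 1536 * 1024 * 1024 / (128 * 1024) ))
12288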

Are you saving photos to a dataset where dedup is anything but "off"? Or is this another dataset on the same pool doing concurrent writing?
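
You can check every dataset and zvol on the pool in one go; anything not "off" in the VALUE column is a suspect:

Code: Select all
bash-3.2$ zfs get -r dedup ztank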

Photos won't deduplicate well, and you have a spinning-rust pool, so this seems like a recipe for bad behaviour.

In fact, friends don't let friends use deduplication *at all*, and certainly not on anything that doesn't achieve >> 2.0x deduplication (zpool get dedupratio), and even then not on anything where random LBA-to-LBA seek times are more than a microsecond or so (spinning disks are ~ 10 milliseconds).

Moreover, you have lots of free space according to your zpool iostat -v output, so you should consider turning deduplication off and doing a local zfs send/recv (with the -e and -c flags) from each deduplicated dataset to a non-deduplicated replacement. This will shrink your DDT, which you will want, because there are situations in which the entire DDT has to be traversed at import time, while your ARC is cold. All sync tasks on all pools will block during a long import that has to do that. (Additionally, scrubs traverse the DDT first, and the random I/Os will interfere with normal I/O on the pool for the first hours of the scrub.)
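
A minimal sketch of such a migration (dataset names here are made up; verify the copy before destroying anything):

Code: Select all
bash-3.2$ sudo zfs snapshot ztank/dedup_ds@migrate
bash-3.2$ sudo zfs send -ec ztank/dedup_ds@migrate | sudo zfs recv ztank/dedup_ds_new
bash-3.2$ # after verifying the new dataset:
bash-3.2$ sudo zfs destroy -r ztank/dedup_ds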

ETA: you can have more dirty data in 1.6 by design. (Defensively, you could try setting the tunable very low, but that will hurt performance on all pools and datasets; it's a global value, although it controls the maximum amount of dirty data *per* pool.)
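
For example (illustrative value; I'm assuming the tunable is writable at runtime via sysctl):

Code: Select all
bash-3.2$ # cap dirty data at 256 MiB globally (affects every pool)
bash-3.2$ sudo sysctl -w kstat.zfs.darwin.tunable.zfs_dirty_data_max=268435456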

Re: zfs 1.6.0rc1 and Photos

Postby roemer » Tue Jan 31, 2017 6:38 am

rottegift wrote:
lundman wrote:You have a thread near the bottom which is heavy in dedup code, calling ddt_get_dedup_histogram()


In particular in spindump3-switchingfaces-... see:
Thread 0x123b 1000 samples (1-1000) priority 81 (base 81) cpu time 9.965
which is spending a lot of time in deduplication code (ddt_get_dedup_histogram() as lundman mentions, and also ddt_sync()), so you are writing to a dedup!=off dataset or zvol.
...
Are you saving photos to a dataset where dedup is anything but "off"? Or is this another dataset on the same pool doing concurrent writing?


Thanks to both of you for your analysis.

@lundman: I think I still have 1.6 TB available in my pool and no quota set. Why would it be running out of space?

@rottegift: I do not intend to have dedup on at all, because I am aware that it would be a resource hog. But perhaps I misconfigured my pool?
(I am not aware of concurrent writes to the same pool, except perhaps for Photos itself plus its background processes. How can I check?)

Code: Select all
bash-3.2$ zfs get all ztank/Pictures
NAME            PROPERTY               VALUE                   SOURCE
ztank/Pictures  type                   filesystem              -
ztank/Pictures  creation               Mon Apr 14 22:56 2014   -
ztank/Pictures  used                   407G                    -
ztank/Pictures  available              1.60T                   -
ztank/Pictures  referenced             359G                    -
ztank/Pictures  compressratio          1.00x                   -
ztank/Pictures  mounted                yes                     -
ztank/Pictures  quota                  none                    default
ztank/Pictures  reservation            none                    default
ztank/Pictures  recordsize             128K                    default
ztank/Pictures  mountpoint             /Users/Shared/Pictures  local
ztank/Pictures  sharenfs               off                     default
ztank/Pictures  checksum               on                      default
ztank/Pictures  compression            off                     default
ztank/Pictures  atime                  off                     inherited from ztank
ztank/Pictures  devices                on                      default
ztank/Pictures  exec                   on                      default
ztank/Pictures  setuid                 on                      default
ztank/Pictures  readonly               off                     default
ztank/Pictures  zoned                  off                     default
ztank/Pictures  snapdir                hidden                  default
ztank/Pictures  aclmode                passthrough             default
ztank/Pictures  aclinherit             restricted              default
ztank/Pictures  canmount               on                      default
ztank/Pictures  xattr                  on                      default
ztank/Pictures  copies                 1                       default
ztank/Pictures  version                5                       -
ztank/Pictures  utf8only               on                      -
ztank/Pictures  normalization          formD                   -
ztank/Pictures  casesensitivity        insensitive             -
ztank/Pictures  vscan                  off                     default
ztank/Pictures  nbmand                 off                     default
ztank/Pictures  sharesmb               off                     default
ztank/Pictures  refquota               none                    default
ztank/Pictures  refreservation         none                    default
ztank/Pictures  primarycache           all                     default
ztank/Pictures  secondarycache         all                     default
ztank/Pictures  usedbysnapshots        48.0G                   -
ztank/Pictures  usedbydataset          359G                    -
ztank/Pictures  usedbychildren         0                       -
ztank/Pictures  usedbyrefreservation   0                       -
ztank/Pictures  logbias                latency                 default
ztank/Pictures  dedup                  off                     default
ztank/Pictures  mlslabel               none                    default
ztank/Pictures  sync                   standard                default
ztank/Pictures  refcompressratio       1.00x                   -
ztank/Pictures  written                123M                    -
ztank/Pictures  logicalused            398G                    -
ztank/Pictures  logicalreferenced      354G                    -
ztank/Pictures  filesystem_limit       none                    default
ztank/Pictures  snapshot_limit         none                    default
ztank/Pictures  filesystem_count       none                    default
ztank/Pictures  snapshot_count         none                    default
ztank/Pictures  snapdev                hidden                  default
ztank/Pictures  com.apple.browse       on                      default
ztank/Pictures  com.apple.ignoreowner  off                     default
ztank/Pictures  com.apple.mimic_hfs    on                      inherited from ztank
ztank/Pictures  shareafp               off                     default
ztank/Pictures  redundant_metadata     all                     default
ztank/Pictures  overlay                off                     default


Zpool features:
Code: Select all
bash-3.2$ zpool get all
NAME   PROPERTY                       VALUE                          SOURCE
ztank  size                           10.9T                          -
ztank  capacity                       66%                            -
ztank  altroot                        -                              default
ztank  health                         ONLINE                         -
ztank  guid                           14676218948438013388           default
ztank  version                        -                              default
ztank  bootfs                         -                              default
ztank  delegation                     on                             default
ztank  autoreplace                    off                            default
ztank  cachefile                      -                              default
ztank  failmode                       continue                       local
ztank  listsnapshots                  off                            default
ztank  autoexpand                     off                            default
ztank  dedupditto                     0                              default
ztank  dedupratio                     1.00x                          -
ztank  free                           3.63T                          -
ztank  allocated                      7.24T                          -
ztank  readonly                       off                            -
ztank  ashift                         12                             local
ztank  comment                        -                              default
ztank  expandsize                     -                              -
ztank  freeing                        0                              default
ztank  fragmentation                  7%                             -
ztank  leaked                         0                              default
ztank  feature@async_destroy          enabled                        local
ztank  feature@empty_bpobj            active                         local
ztank  feature@lz4_compress           active                         local
ztank  feature@multi_vdev_crash_dump  disabled                       local
ztank  feature@spacemap_histogram     active                         local
ztank  feature@enabled_txg            active                         local
ztank  feature@hole_birth             active                         local
ztank  feature@extensible_dataset     enabled                        local
ztank  feature@embedded_data          active                         local
ztank  feature@bookmarks              enabled                        local
ztank  feature@filesystem_limits      disabled                       local
ztank  feature@large_blocks           disabled                       local
ztank  feature@sha512                 disabled                       local
ztank  feature@skein                  disabled                       local
ztank  feature@edonr                  disabled                       local


Seems OK (apart from the fact that I haven't yet upgraded the pool to the latest feature set).
What am I missing here?

Re: zfs 1.6.0rc1 and Photos

Postby rottegift » Tue Jan 31, 2017 12:23 pm

Is ztank your only imported pool? If so, please run zdb -DDD ztank and paste the results. If there is more than one pool, run zdb -DDD for each pool.

Re: zfs 1.6.0rc1 and Photos

Postby roemer » Tue Jan 31, 2017 4:31 pm

rottegift wrote:Is ztank your only imported pool? If so, please run zdb -DDD ztank and paste the results. If there is more than one pool, run zdb -DDD for each pool.


There's only one pool imported, and there is no unusual CPU activity at the moment.
All seems fine. Shall I run this again once I get into another stress-test situation?

Code: Select all
bash-3.2$ zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
ztank  10.9T  7.24T  3.63T         -     7%    66%  1.00x  ONLINE  -

bash-3.2$ sudo zdb -DDD ztank
All DDTs are empty

bash-3.2$ sudo zdb -DDD ztank/Pictures
Dataset ztank/Pictures [ZPL], ID 102, cr_txg 9392, 359G, 1466587 objects

Re: zfs 1.6.0rc1 and Photos

Postby rottegift » Tue Jan 31, 2017 9:25 pm

zdb -DDD only does useful things on a pool, not on a dataset (there's a man page if you're dying to know).

This is really strange: why is your kernel entering ddt_sync() at all?
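
If you want to catch it in the act, a DTrace one-liner along these lines might show which path enters it (untested sketch; assumes SIP lets you run dtrace and that the fbt provider can see the zfs kext's ddt_sync symbol):

Code: Select all
bash-3.2$ sudo dtrace -n 'fbt::ddt_sync:entry { @[stack()] = count(); } tick-30s { exit(0); }'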

We'll have a think and get back to you.

Re: zfs 1.6.0rc1 and Photos

Postby roemer » Wed Feb 01, 2017 1:23 am

rottegift wrote:... Are you saving photos to a dataset where dedup is anything but "off"?

As far as I can tell (cf. my previous posts), dedup is indeed off.

rottegift wrote:Or is this another dataset on the same pool doing concurrent writing ?


I cannot completely rule this out, as the photo library is on a file server that shares multiple datasets, all of which are stored on the same ZFS pool.
Up to the change to Photos 2.x and O3X 1.6.0rc1, I had no issues with this setup.

There is one additional piece of information which might be of interest:
I had used this pool shortly beforehand to temporarily back up the contents of a 2 TB external HDD, so that I could safely do some maintenance on that drive.
During this operation the pool was almost completely full (something like a few GB still free).
Once finished, I deleted that temporary backup again. The delete was very fast and seemed to free all that data again.
But could it be that there is some 'lazy' data cleanup going on, which I now trigger every now and then with many writes to my Pictures dataset?
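
If that were the case, I guess the deferred frees would show up in the pool's freeing property (it was 0 in my zpool get all above, but I can re-check while the stalls are happening):

Code: Select all
bash-3.2$ zpool get freeing ztank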
