Sunday, August 24, 2025

ZFS Manageability

In this blog post, we will focus on ZFS from a manageability perspective. We will cover the following topics: ZFS storage status monitoring, physical disk monitoring with S.M.A.R.T., ZFS capacity monitoring, ZFS performance monitoring, and periodic e-mail notifications.

ZFS Storage Status Monitoring

To check a brief status of all ZFS pools, use the command zpool status -x

 root@bhyve01:~ # zpool status -x  
 all pools are healthy  

For a more verbose pool status, use the command zpool status

 root@bhyve01:~ # zpool status   
  pool: OS-DATA  
  state: ONLINE  
 config:  
      NAME     STATE   READ WRITE CKSUM  
      OS-DATA  ONLINE     0     0     0  
       da0     ONLINE     0     0     0  
 errors: No known data errors  
  pool: STORAGE-DATA  
  state: ONLINE  
 config:  
      NAME         STATE   READ WRITE CKSUM  
      STORAGE-DATA ONLINE     0     0     0  
       raidz2-0    ONLINE     0     0     0  
        da2        ONLINE     0     0     0  
        da3        ONLINE     0     0     0  
        da4        ONLINE     0     0     0  
        da5        ONLINE     0     0     0  
        da6        ONLINE     0     0     0  
        da7        ONLINE     0     0     0  
      logs       
       nda1p1      ONLINE     0     0     0  
      cache  
       nda1p2      ONLINE     0     0     0  
 errors: No known data errors  
 root@bhyve01:~ #   
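
Note that zpool status only reports problems ZFS has already encountered during normal I/O. To proactively verify all data and parity, you can start a scrub and then watch its progress in the status output. A minimal example using the pool from the layout above:

# Start a scrub of the STORAGE-DATA pool and check its progress
zpool scrub STORAGE-DATA
zpool status STORAGE-DATA | grep scan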

Physical Disk Monitoring

First of all, you should know all your disk devices. Disk devices can be listed with the command camcontrol devlist

 root@bhyve01:~ # camcontrol devlist  
 <SEAGATE ST9146852SS HT64>       at scbus0 target 0 lun 0 (pass0,da0)  
 <SEAGATE ST9146853SS YS09>       at scbus0 target 1 lun 0 (pass1,da1)  
 <ATA ST9500620NS AA0E>           at scbus0 target 2 lun 0 (pass2,da2)  
 <ATA ST9500620NS AA0E>           at scbus0 target 3 lun 0 (pass3,da3)  
 <ATA ST9500620NS AA0E>           at scbus0 target 4 lun 0 (pass4,da4)  
 <ATA ST9500620NS AA09>           at scbus0 target 5 lun 0 (pass5,da5)  
 <ATA ST9500620NS AA0E>           at scbus0 target 6 lun 0 (pass6,da6)  
 <ATA ST9500620NS AA09>           at scbus0 target 7 lun 0 (pass7,da7)  
 <INTEL SSDPEKNW512G8 002C>       at scbus1 target 0 lun 1 (pass8,nda0)  
 <KINGSTON SNVS1000GB S8442101>   at scbus2 target 0 lun 1 (pass9,nda1)  
 <SanDisk Ultra 1.00>             at scbus3 target 0 lun 0 (pass10,da8)  
 root@bhyve01:~ #   
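
Two other useful base-system commands are geom disk list, which prints details (media size, sector size, rotation rate) for every disk, and nvmecontrol devlist, which lists NVMe controllers and their namespaces:

# Detailed information about all disks known to GEOM
geom disk list
# List NVMe controllers and their namespaces
nvmecontrol devlist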

In this section, we’ll work with example commands and outputs, so it’s important to first understand the disk layout. The layout of the disks in one of my homelab servers is illustrated in the figure below.

Disk Layout

S.M.A.R.T 

To monitor the status of particular disks we should leverage S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology), which is an industry standard for monitoring physical disks. The tools can be installed with the command pkg install smartmontools

smartmontools contains two applications:

  • smartctl - Control and Monitor Utility for SMART Disks  
  • smartd - SMART Disk Monitoring Daemon

SMART Disk Monitoring Daemon

The SMART Disk Monitoring Daemon must be enabled with the command sysrc smartd_enable="YES"

smartd must have a configuration file, otherwise the service will not start. We can simply use the default configuration by copying the sample file ... cp /usr/local/etc/smartd.conf.sample /usr/local/etc/smartd.conf ... and starting the service.

 root@bhyve01:~ # cp /usr/local/etc/smartd.conf.sample /usr/local/etc/smartd.conf  
 root@bhyve01:~ # service smartd start  
 Starting smartd.  
 root@bhyve01:~ # service smartd status  
 smartd is running as pid 4010.  
 root@bhyve01:~ #   
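
The sample configuration monitors all detected disks via the DEVICESCAN directive. If you prefer an explicit, minimal configuration, the following sketch monitors everything, schedules a short self-test every day at 02:00 and a long self-test every Saturday at 03:00, and mails warnings to root (the schedule and recipient are my assumptions, adjust them to your environment):

# /usr/local/etc/smartd.conf - minimal example
DEVICESCAN -a -s (S/../.././02|L/../../6/03) -m root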

SMARTCTL

For manually checking the status of a particular disk, we can use the smartctl utility.

Let's check /dev/da0, which in my case is a 146 GB SAS disk used for the OS-DATA pool.

 root@bhyve01:~ # smartctl -a /dev/da0  
 smartctl 7.5 2025-04-30 r5714 [FreeBSD 14.3-RELEASE amd64] (local build)  
 Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org  
 === START OF INFORMATION SECTION ===  
 Vendor:              SEAGATE  
 Product:             ST9146852SS  
 Revision:            HT64  
 Compliance:          SPC-3  
 User Capacity:       146,815,733,760 bytes [146 GB]  
 Logical block size:  512 bytes  
 Rotation Rate:       15000 rpm  
 Form Factor:         2.5 inches  
 Logical Unit id:     0x5000c500395f4c03  
 Serial number:       6TB1ADGK  
 Device type:         disk  
 Transport protocol:  SAS (SPL-4)  
 Local Time is:       Sun Aug 24 08:34:21 2025 UTC  
 SMART support is:    Available - device has SMART capability.  
 SMART support is:    Enabled  
 Temperature Warning: Disabled or Not Supported
 
 === START OF READ SMART DATA SECTION ===  
 SMART Health Status: OK  
 
 Current Drive Temperature:   28 C  
 Drive Trip Temperature:      68 C  
 
 Accumulated power on time, hours:minutes 8649:27  
 Elements in grown defect list: 9  
 
 Vendor (Seagate Cache) information  
  Blocks sent to initiator = 1764496082  
  Blocks received from initiator = 3668687841  
  Blocks read from cache and sent to initiator = 701693283  
  Number of read and write commands whose size <= segment size = 3075002927  
  Number of read and write commands whose size > segment size = 37  
 
 Vendor (Seagate/Hitachi) factory information  
  number of hours powered up = 8649.45  
  number of minutes until next internal SMART test = 21  
 
 Error counter log:  
            Errors Corrected by      Total  Correction   Gigabytes  Total  
                ECC     rereads/  errors  algorithm   processed  uncorrected  
       fast | delayed  rewrites corrected invocations  [10^9 bytes] errors  
 read:  2358881651    0     0 2358881651  2358881651    4445.103      0  
 write:     0    0     0     0     0   246574.111      0  
 verify: 1366420009    0     0 1366420009  1366420009   73660.356      0  
 
 Non-medium error count:    12  
 
 SMART Self-test log  
 Num Test             Status       segment LifeTime LBA_first_err [SK ASC ASQ]  
     Description                    number  (hours)  
 # 1 Background long  Completed         16    1                 - [-   -    -]  
 # 2 Background long  Completed         16    0                 - [-   -    -]  
 # 3 Background short Completed         16    0                 - [-   -    -]  
 
 Long (extended) Self-test duration: 1680 seconds [28.0 minutes]  
 root@bhyve01:~ #   
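
The entries in the SMART Self-test log above come from self-tests that were started manually. You can trigger them yourself with smartctl -t; the test runs in the background inside the drive and the result later appears in the self-test log. A sketch:

# Start a short (electrical/mechanical) self-test on da0
smartctl -t short /dev/da0
# Start a long self-test (full surface scan, ~28 minutes on this disk)
smartctl -t long /dev/da0
# Review the results later
smartctl -l selftest /dev/da0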

Let's check /dev/nda0, which is an NVMe disk used in my case for ZFS caches (SLOG/write cache and L2ARC/read cache). Note that it is necessary to specify the disk type with the option -d nvme

 root@bhyve01:~ # smartctl -a -d nvme /dev/nvme0  
 smartctl 7.5 2025-04-30 r5714 [FreeBSD 14.3-RELEASE amd64] (local build)  
 Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org  
 
 === START OF INFORMATION SECTION ===  
 Model Number:                      INTEL SSDPEKNW512G8  
 Serial Number:                     PHNH94220BKH512A  
 Firmware Version:                  002C  
 PCI Vendor/Subsystem ID:           0x8086  
 IEEE OUI Identifier:               0x5cd2e4  
 Controller ID:                     1  
 NVMe Version:                      1.3  
 Number of Namespaces:              1  
 Namespace 1 Size/Capacity:         512,110,190,592 [512 GB]  
 Namespace 1 Formatted LBA Size:    512  
 Local Time is:                     Sun Aug 24 09:01:01 2025 UTC  
 Firmware Updates (0x14):           2 Slots, no Reset required  
 Optional Admin Commands (0x0017):  Security Format Frmw_DL Self_Test  
 Optional NVM Commands (0x005f):    Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp  
 Log Page Attributes (0x0f):        S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg  
 Maximum Data Transfer Size:        32 Pages  
 Warning Comp. Temp. Threshold:     77 Celsius  
 Critical Comp. Temp. Threshold:    80 Celsius  
 
 Supported Power States  
 St Op    Max  Active   Idle  RL RT WL WT Ent_Lat Ex_Lat  
  0 +    3.50W      -      -   0  0  0  0       0      0  
  1 +    2.70W      -      -   1  1  1  1       0      0  
  2 +    2.00W      -      -   2  2  2  2       0      0  
  3 -  0.0250W      -      -   3  3  3  3    5000   5000  
  4 -  0.0040W      -      -   4  4  4  4    5000   9000  
 
 Supported LBA Sizes (NSID 0x1)  
 Id Fmt Data Metadt Rel_Perf  
  0 +    512      0        0  
 
 === START OF SMART DATA SECTION ===  
 SMART overall-health self-assessment test result: PASSED  
 
 SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)  
 Critical Warning:                  0x00  
 Temperature:                       36 Celsius  
 Available Spare:                   100%  
 Available Spare Threshold:         10%  
 Percentage Used:                   10%  
 Data Units Read:                   21,855,633 [11.1 TB]  
 Data Units Written:                62,654,859 [32.0 TB]  
 Host Read Commands:                783,452,025  
 Host Write Commands:               3,944,291,927  
 Controller Busy Time:              83,884  
 Power Cycles:                      114  
 Power On Hours:                    13,298  
 Unsafe Shutdowns:                  36  
 Media and Data Integrity Errors:   0  
 Error Information Log Entries:     0  
 Warning Comp. Temperature Time:    0  
 Critical Comp. Temperature Time:   0  
 Thermal Temp. 1 Transition Count:  1007  
 Thermal Temp. 1 Total Time:        6850  
 
 Error Information (NVMe Log 0x01, 16 of 256 entries)  
 No Errors Logged
 
 Self-test Log (NVMe Log 0x06, NSID 0xffffffff)  
 Self-test status: No self-test in progress  
 No Self-tests Logged  
 root@bhyve01:~ #   

Let's check /dev/da2, which in my case is a 500 GB NL-SAS (SATA) disk used for the STORAGE-DATA pool.

 root@bhyve01:~ # smartctl -a /dev/da2   
 smartctl 7.5 2025-04-30 r5714 [FreeBSD 14.3-RELEASE amd64] (local build)  
 Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org  

 === START OF INFORMATION SECTION ===  
 Model Family:     Seagate Constellation.2 (SATA)  
 Device Model:     ST9500620NS  
 Serial Number:    9XF1XTW3  
 LU WWN Device Id: 5 000c50 04e891313  
 Add. Product Id:  DELL(tm)  
 Firmware Version: AA0E  
 User Capacity:    500,107,862,016 bytes [500 GB]  
 Sector Size:      512 bytes logical/physical  
 Rotation Rate:    7200 rpm  
 Form Factor:      2.5 inches  
 Device is:        In smartctl database 7.5/5706  
 ATA Version is:   ATA8-ACS T13/1699-D revision 4  
 SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)  
 Local Time is:    Sun Aug 24 09:25:20 2025 UTC  
 SMART support is: Available - device has SMART capability.  
 SMART support is: Enabled  

 === START OF READ SMART DATA SECTION ===  
 SMART overall-health self-assessment test result: PASSED  

 General SMART Values:  
 Offline data collection status: (0x82)     Offline data collection activity  
                                            was completed without error.  
                                            Auto Offline Data Collection: Enabled.  
 Self-test execution status:     (   0)     The previous self-test routine completed  
                                            without error or no self-test has ever   
                                            been run.  
 Total time to complete Offline   
 data collection:                (  90)     seconds.  
 Offline data collection  
 capabilities:                   (0x7b)     SMART execute Offline immediate.  
                                            Auto Offline data collection on/off support.  
                                            Suspend Offline collection upon new  
                                            command.  
                                            Offline surface scan supported.  
                                            Self-test supported.  
                                            Conveyance Self-test supported.  
                                            Selective Self-test supported.  
 SMART capabilities:           (0x0003)     Saves SMART data before entering  
                                            power-saving mode.  
                                            Supports SMART auto save timer.  
 Error logging capability:       (0x01)     Error logging supported.  
                                            General Purpose Logging supported.  
 Short self-test routine   
 recommended polling time:        (  2)     minutes.  
 Extended self-test routine  
 recommended polling time:       ( 102)     minutes.  
 Conveyance self-test routine  
 recommended polling time:       (   3)     minutes.  
 SCT capabilities:             (0x10bd)     SCT Status supported.  
                                            SCT Error Recovery Control supported.  
                                            SCT Feature Control supported.  
                                            SCT Data Table supported.  
 
 SMART Attributes Data Structure revision number: 10  
 Vendor Specific SMART Attributes with Thresholds:  
 ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE  
  1  Raw_Read_Error_Rate     0x000f  083   063   044    Pre-fail Always      -       229836741  
  3  Spin_Up_Time            0x0003  096   096   000    Pre-fail Always      -       0  
  4  Start_Stop_Count        0x0032  100   100   020    Old_age  Always      -       258  
  5  Reallocated_Sector_Ct   0x0033  100   100   036    Pre-fail Always      -       0  
  7  Seek_Error_Rate         0x000f  091   060   030    Pre-fail Always      -       1558590529  
  9  Power_On_Hours          0x0032  095   011   000    Old_age  Always      -       4916  
  10 Spin_Retry_Count        0x0013  100   100   097    Pre-fail Always      -       0  
  12 Power_Cycle_Count       0x0032  100   100   020    Old_age  Always      -       255  
 184 End-to-End_Error        0x0032  100   100   099    Old_age  Always      -       0  
 187 Reported_Uncorrect      0x0032  100   100   000    Old_age  Always      -       0  
 188 Command_Timeout         0x0032  100   100   000    Old_age  Always      -       0  
 189 High_Fly_Writes         0x003a  100   100   000    Old_age  Always      -       0  
 190 Airflow_Temperature_Cel 0x0022  077   061   045    Old_age  Always      -       23 (Min/Max 23/32)  
 191 G-Sense_Error_Rate      0x0032  100   100   000    Old_age  Always      -       0  
 192 Power-Off_Retract_Count 0x0032  100   100   000    Old_age  Always      -       249  
 193 Load_Cycle_Count        0x0032  098   098   000    Old_age  Always      -       5851  
 194 Temperature_Celsius     0x0022  023   040   000    Old_age  Always      -       23 (0 7 0 0 0)  
 195 Hardware_ECC_Recovered  0x001a  119   099   000    Old_age  Always      -       229836741  
 197 Current_Pending_Sector  0x0012  100   100   000    Old_age  Always      -       0  
 198 Offline_Uncorrectable   0x0010  100   100   000    Old_age  Offline     -       0  
 199 UDMA_CRC_Error_Count    0x003e  200   200   000    Old_age  Always      -       0  
 240 Head_Flying_Hours       0x0000  100   253   000    Old_age  Offline     -       82322 (191 120 0)  
 241 Total_LBAs_Written      0x0000  100   253   000    Old_age  Offline     -       995382041  
 242 Total_LBAs_Read         0x0000  100   253   000    Old_age  Offline     -       722691417  
 
 SMART Error Log Version: 1  
 No Errors Logged  
 
 SMART Self-test log structure revision number 1  
 Num Test_Description  Status               Remaining LifeTime(hours) LBA_of_first_error  
 # 1 Short offline     Completed without error    00%        3        -  
 # 2 Extended offline  Completed without error    00%        3        -  
 # 3 Short offline     Completed without error    00%        1        -  
 
 SMART Selective self-test log data structure revision number 1  
  SPAN  MIN_LBA MAX_LBA  CURRENT_TEST_STATUS  
     1        0       0  Not_testing  
     2        0       0  Not_testing  
     3        0       0  Not_testing  
     4        0       0  Not_testing  
     5        0       0  Not_testing  
 Selective self-test flags (0x0):  
  After scanning selected spans, do NOT read-scan remainder of disk.  
 If Selective self-test is pending on power-up, resume after 0 minute delay.  
 
 The above only provides legacy SMART information - try 'smartctl -x' for more  
 root@bhyve01:~ #   

Let's check /dev/da8, which in my case is a 16 GB USB disk used for the UEFI boot loader and the FreeBSD root file system.

 root@bhyve01:~ # smartctl -a -d scsi -T permissive /dev/da8   
 smartctl 7.5 2025-04-30 r5714 [FreeBSD 14.3-RELEASE amd64] (local build)  
 Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org  
 
 === START OF INFORMATION SECTION ===  
 Vendor:              SanDisk  
 Product:             Ultra  
 Revision:            1.00  
 Compliance:          SPC-4  
 User Capacity:       15,376,318,464 bytes [15.3 GB]  
 Logical block size:  512 bytes  
 Device type:         disk  
 Local Time is:       Sun Aug 24 09:43:49 2025 UTC  
 SMART support is:    Available - device has SMART capability.  
 SMART support is:    Enabled  
 Temperature Warning: Disabled or Not Supported  
 
 === START OF READ SMART DATA SECTION ===  
 SMART Health Status: OK  
 Current Drive Temperature:   0 C  
 Drive Trip Temperature:      0 C  
 
 Error Counter logging not supported  
 
 Device does not support Self Test logging  
 root@bhyve01:~ #   

As you can see, there isn’t much information available for consumer-grade USB disks. We only get basic disk details and a simple binary SMART health status - either OK or Not OK.

The ports collection also provides a small helper that generates SSD health reports on top of smartmontools:

ssd_report-smartmontools-0.4   SSD health report

For centralized monitoring, you can leverage the Prometheus exporter for smartctl:

smartctl_exporter-0.14.0       Prometheus metrics exporter for smartctl 

ZFS Capacity Monitoring

You can display ZFS pool capacity using the zpool get capacity command.

 root@bhyve01:~ # zpool get capacity  
 NAME         PROPERTY VALUE SOURCE  
 OS-DATA      capacity 0%    -  
 STORAGE-DATA capacity 43%   -  
 root@bhyve01:~ #   

To view the capacity of ZFS datasets and their mount points, use the zfs list command.

 root@bhyve01:~ # zfs list  
 NAME                          USED AVAIL REFER MOUNTPOINT  
 OS-DATA                      76.2M  132G   24K /OS-DATA  
 OS-DATA/home                 31.5K  132G 31.5K /home  
 OS-DATA/tmp                    24K  132G   24K /tmp  
 OS-DATA/var                  73.6M  132G 73.6M /var  
 STORAGE-DATA                  800G  997G 44.0K /STORAGE-DATA  
 STORAGE-DATA/bhyve-datastore  800G  997G  800G /STORAGE-DATA/bhyve-datastore  
 root@bhyve01:~ #   

On the STORAGE-DATA pool, 800 GB is used and 997 GB is available, which corresponds to the roughly 43% used capacity that zpool get capacity reports for that pool.
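
If you want to be warned before a pool fills up, the capacity property is easy to check from a small shell script. Below is a minimal sketch; the 80% threshold and mailing to root are assumptions, and the e-mail delivery relies on the mail setup described later in this post:

#!/bin/sh
# Warn by e-mail when any ZFS pool is more than 80% full
THRESHOLD=80
zpool list -H -o name,capacity | while read pool cap; do
  cap=${cap%\%}                       # strip the trailing '%'
  if [ "$cap" -gt "$THRESHOLD" ]; then
    echo "ZFS pool ${pool} is ${cap}% full" | \
      mail -s "ZFS capacity warning on $(hostname)" root
  fi
done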

ZFS Performance Monitoring

Storage performance can be measured at various points or layers. In the next sections we will cover:

  • ZPOOL Layer : zpool iostat monitors I/O statistics at the pool and vdev layer
  • ZFS Layer : zfs-stats and zfs-mon summarize ZFS internal statistics (ARC, L2ARC, prefetch) published via sysctl
  • GEOM (Physical Disk) Layer : GEOM is the primary storage framework in FreeBSD, providing a modular and extensible way to manage disk I/O requests. The gstat utility monitors I/O statistics at the physical disk layer.
  • TOP (Process) Layer : top is a widely used command-line utility in FreeBSD that provides a real-time, dynamic view of the running system from the process point of view.

ZPOOL Layer (zpool iostat)

We can monitor ZFS performance with zpool iostat. This is the most essential tool for monitoring ZFS I/O performance. It reports real-time statistics for ZFS pools and devices. Below is an example of zpool iostat monitoring with a refresh interval of 5 seconds.

 root@bhyve01:/STORAGE-DATA/bhyve-datastore # zpool iostat 5  
               capacity    operations   bandwidth   
 pool         alloc  free  read write  read write  
 ------------ ----- ----- ----- ----- ----- -----  
 OS-DATA      74.7M  136G     0     0    58 2.81K  
 STORAGE-DATA 1.02T 1.70T     4   208  140K 15.4M  
 ------------ ----- ----- ----- ----- ----- -----  
 OS-DATA      74.9M  136G     0    14     0 45.1K  
 STORAGE-DATA 1.03T 1.69T     0 2.31K     0  528M  
 ------------ ----- ----- ----- ----- ----- -----  
 ^C  
 root@bhyve01:/STORAGE-DATA/bhyve-datastore #   

zpool iostat -v shows statistics for individual vdevs and the underlying physical disks. Below is an example of zpool iostat -v monitoring with a refresh interval of 5 seconds.

 root@bhyve01:/STORAGE-DATA/bhyve-datastore # zpool iostat -v 5  
               capacity    operations   bandwidth   
 pool         alloc  free  read write  read write  
 ------------ ----- ----- ----- ----- ----- -----  
 OS-DATA      74.8M  136G     0     0    58 2.83K  
  da0         74.8M  136G     0     0    58 2.83K  
 ------------ ----- ----- ----- ----- ----- -----  
 STORAGE-DATA 1.12T 1.60T     4   225  139K 16.6M  
  raidz2-0    1.12T 1.60T     4   223  139K 16.3M  
    da2           -     -     0    37 22.2K 2.72M  
    da3           -     -     0    37 22.3K 2.72M  
    da4           -     -     0    36 25.6K 2.72M  
    da5           -     -     0    37 25.4K 2.72M  
    da6           -     -     0    37 21.9K 2.72M  
    da7           -     -     0    36 22.0K 2.72M  
 logs             -     -     -     -     -     -  
  nda1p1       128K 39.5G     0     2     5  282K  
 cache            -     -     -     -     -     -  
  nda1p2       457G  343G    14    48 1.76M 6.07M  
 ------------ ----- ----- ----- ----- ----- -----  
 ^C  
 root@bhyve01:/STORAGE-DATA/bhyve-datastore #  
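
Besides operations and bandwidth, OpenZFS can also report latency. When chasing slow I/O, the -l option adds per-vdev wait-time columns (the exact column set depends on the OpenZFS version):

# Per-vdev latency statistics, refreshed every 5 seconds
zpool iostat -v -l 5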

ZFS Layer (zfs-stats, zfs-mon)

zfs-stats 

Various ZFS-related internal statistics are published via the sysctl interface. We can leverage the zfs-stats package, which summarizes those statistics in a more human-readable way and groups them together logically.

Install the zfs-stats package with the following command ...

pkg install zfs-stats 

... and use zfs-stats -A to show the ARC summary statistics.

 root@bhyve01:~ # zfs-stats -A  
 ------------------------------------------------------------------------  
 ZFS Subsystem Report                        Sun Aug 24 13:51:34 2025  
 ------------------------------------------------------------------------  
 ARC Summary: (HEALTHY)  
      Memory Throttle Count:                 0  
 ARC Misc:  
      Deleted:                               765.00     k  
      Mutex Misses:                          232  
      Evict Skips:                           3  
 ARC Size:                        89.99%     114.20     GiB  
      Target Size: (Adaptive)     89.95%     114.15     GiB  
      Min Size (Hard Limit):      3.15%      4.00       GiB  
      Max Size (High Water):      31:1       126.90     GiB  
      Compressed Data Size:                  109.77     GiB  
      Decompressed Data Size:                110.34     GiB  
      Compression Factor:                    1.01  
 ARC Size Breakdown:    
      Recently Used Cache Size:   38.89%     44.41      GiB  
      Frequently Used Cache Size: 61.11%     69.79      GiB  
 ARC Hash Breakdown:  
      Elements Max:                          4.62       m  
      Elements Current:          100.00%     4.62       m  
      Collisions:                            2.02       m  
      Chain Max:                             6  
      Chains:                                532.27     k  
 ------------------------------------------------------------------------  
 root@bhyve01:~ #   

The output from zfs-stats -A provides a detailed report on the ZFS Adaptive Replacement Cache (ARC), which is ZFS's primary in-memory cache. The output shows the health, size, and efficiency of the ARC, which is crucial for ZFS performance. 

Let's break down the provided output ...

  • ARC Summary
    • ARC Summary: (HEALTHY):
      • This line indicates that the ARC is operating normally and is not under memory pressure.
    • Memory Throttle Count (0)
      • This is the number of times the ARC had to reduce its size aggressively to free up memory for other processes. In other words, it is the number of times that the ZFS ARC has had to reduce its memory usage because of demands elsewhere in the system.
      • A value of 0 is excellent and indicates no memory contention.
      • In case of memory contention, you might consider setting the maximum size of the ARC (vfs.zfs.arc_max) to a value that lets ZFS coexist better with your other workloads (see the sketch below this breakdown).
  • ARC Size and Performance
    • ARC Size (114.20 GiB)
      • This shows the current size of the ARC (114.20 GiB) and what percentage it is of its maximum possible size (89.99%). 
      • Based on this information, the maximum ARC size is 126.9 GiB (114.2 / 89.99 * 100).
    • Target Size
      • The Adaptive target size is the ideal size the ARC is aiming for based on current memory usage. It shows the ARC is near its target (114.15 GiB), which is a sign of stability.
    • Min/Max Size (4 GiB / 126.9 GiB)
      • These are the hard limits for the ARC's size. The ARC will not shrink below the Min Size (4 GiB) and will not grow beyond the Max Size (126.9 GiB). The 31:1 shown next to Max Size is the ratio of the maximum ARC size to the minimum ARC size (126.9 / 4 ≈ 31).
  • Compression and Cache Breakdown
    • Compressed Data Size (109.77 GiB)
      • The amount of data stored in the ARC after being compressed. This shows that ZFS is effectively compressing data before it is cached.
    • Decompressed Data Size (110.34 GiB)
      • The size of the data stored in the ARC if it were not compressed.
    • Compression Factor (1.01)
      • The ratio of the decompressed size to the compressed size (1.01). This value shows very little compression is occurring, which may be because the data is already in a non-compressible format (e.g., JPEG images, video files). A factor of 2.0 would indicate data is being compressed to half its original size.
    • Recently Used Cache Size and Frequently Used Cache Size
      • These metrics are key to the ARC's efficiency. They represent the two lists that form the ARC: the most-recently-used (MRU) and most-frequently-used (MFU) lists. The output shows a good balance, with the frequently used list containing more data (61.11%), which is expected for a well-tuned cache.
  • ARC Hash Breakdown
    • Elements Max (4.62 m) / Current (4.62 m)
      • The number of entries in the ARC's hash table. The output shows the current number of elements is at its maximum, which is normal for a full cache.
    • Collisions (2.02 m)
      • The number of times a hash key points to a "bucket" that already contains an entry.
    • Chain Max (6)
      • The maximum length of a hash collision chain.
    • Chains (532.27 k)
      • The total number of chains. The values for collisions and chains are normal and do not indicate a performance issue.

In summary, the output above indicates the ZFS ARC is running very well on this system. It is healthy, operating at its target size, and effectively using a large portion of the available memory for caching data. The compression factor is low, but this is a function of the data itself, not a sign of a problem with the ARC. 
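
As mentioned in the breakdown above, if you ever see a non-zero Memory Throttle Count or need to leave more RAM to other workloads, you can cap the ARC with vfs.zfs.arc_max. A minimal sketch, assuming a 64 GiB cap (pick a value that fits your system):

# Persistent setting, applied at boot (value is in bytes; 64 GiB shown)
echo 'vfs.zfs.arc_max="68719476736"' >> /boot/loader.conf
# On recent FreeBSD/OpenZFS the limit can usually also be changed at runtime
sysctl vfs.zfs.arc_max=68719476736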

zfs-mon

The sysutils/zfs-stats package also includes a second tool, zfs-mon, which looks at how a subset of the kstats change over time. This can provide useful insight into how requests are being broken down and how the various caching layers in ZFS are being used. The stats break down the performance of the ARC, L2ARC, the file system prefetch, and the device prefetching code. They also distinguish data from metadata operations. By default, ZFS limits the amount of cache available for metadata to 25% of the maximum ARC size. If the total storage capacity is very large, and most operations touch only file metadata rather than file content, increasing the fraction of the ARC that can be used for metadata can actually improve performance; otherwise the ARC may be three quarters full of content that will not be referenced again before it is replaced.

 root@bhyve01:~ # zfs-mon -a  
 ZFS real-time cache activity monitor  
 Seconds elapsed: 19  
 
 Cache hits and misses:  
                                   1s   10s   60s   tot  
                      ARC hits: 13933  7076 16866 16866  
                    ARC misses:     0     0     0     0  
          ARC demand data hits:  8932  5549 11277 11277  
        ARC demand data misses:     0     0     0     0  
      ARC demand metadata hits:  4999  1527  5588  5588  
    ARC demand metadata misses:     0     0     0     0  
        ARC prefetch data hits:     0     0     0     0  
      ARC prefetch data misses:     0     0     0     0  
    ARC prefetch metadata hits:     0     0     0     0  
  ARC prefetch metadata misses:     0     0     0     0  
                    L2ARC hits:     0     0     0     0  
                  L2ARC misses:     0     0     0     0  
                   ZFETCH hits:     0     0     1     1  
                 ZFETCH misses:  7433  4611  9378  9378  
 
 Cache efficiency percentage:  
                           10s    60s    tot  
                   ARC: 100.00 100.00 100.00  
       ARC demand data: 100.00 100.00 100.00  
   ARC demand metadata: 100.00 100.00 100.00  
     ARC prefetch data:   0.00   0.00   0.00  
 ARC prefetch metadata:   0.00   0.00   0.00  
                 L2ARC:   0.00   0.00   0.00  
                ZFETCH:   0.00   0.01   0.01  
 ^C  
 root@bhyve01:~ #  

The above output from zfs-mon provides a real-time snapshot of the ZFS Adaptive Replacement Cache (ARC) and L2ARC performance. The analysis shows that the ARC is performing with extremely high efficiency, while a significant number of ZFETCH misses indicates that ZFS is prefetching data that isn't being used. This is because I was running synthetic benchmarks; real storage workloads normally benefit from prefetching more than synthetic ones.

Let's break down the provided output ...

  • Cache Hits and Misses
    • This section is the core of the analysis, showing the number of I/O requests that are being served directly from cache (hits) versus those that require a read from the underlying storage (misses). The columns 1s, 10s, 60s, and tot represent the counts over the last 1, 10, and 60 seconds, and the total since zfs-mon started, respectively.
    • ARC hits and misses 
      • The key takeaway here is that there are thousands of ARC hits and zero ARC misses. This is an excellent result that indicates the working data set is small enough to fit entirely within the system's RAM. All data requests are being served from the fast ARC cache, avoiding slow disk I/O.
    • ARC demand data hits/misses 
      • This refers to data that was requested by a process. The output shows thousands of hits and zero misses, confirming that all user-requested data is being served from the ARC.
    • ARC demand metadata hits/misses
      • This is for filesystem metadata (e.g., file permissions, directory structures). The thousands of hits and zero misses show that the metadata is also fully cached, ensuring extremely fast file system operations.
    • ARC prefetch hits/misses
      • Prefetching is when ZFS proactively loads data into the cache that it anticipates a process will need. The prefetch hits are zero, while there are thousands of ZFETCH misses. This is the most significant observation in this output. It is caused by the synthetic storage workload (fio benchmark).
  • ZFETCH Hits and Misses
    • ZFETCH misses
      • ZFETCH is the prefetch mechanism in ZFS. The high number of ZFETCH misses (9378 in total) means that ZFS is prefetching data that is not being used. This can be inefficient as it consumes system resources (CPU and I/O) to load data that is never requested by an application.
    • ZFETCH hits
      • The single ZFETCH hit (1) suggests that only one of these thousands of prefetch operations was actually useful.
  • Cache Efficiency Percentage
    • This table confirms the findings from the hit/miss analysis.
    • ARC, ARC demand data, ARC demand metadata
      • The 100% efficiency across all these categories reinforces that the ARC is satisfying all data and metadata requests.
    • ARC prefetch and L2ARC
      • The 0% efficiency is a direct result of the lack of prefetch hits and L2ARC hits. The L2ARC (the cache device nda1p2 in this setup) is not being hit because all data is already being served from the primary ARC in RAM.
    • ZFETCH
      • The extremely low efficiency of 0.01% is the clearest indicator of the high number of useless prefetch misses.

In our example scenario (fio benchmark), the ZFS cache is performing exceptionally well for its primary function (demand hits), serving all requests from the fast in-memory ARC. However, there is a clear inefficiency in the prefetching mechanism (ZFETCH), where ZFS is pre-loading a lot of data that is never used. That's because we used a synthetic storage workload (fio benchmark) rather than a real storage workload, which typically benefits from the prefetching mechanism.
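
For completeness, the synthetic workload mentioned above was generated with fio. The exact job parameters are not shown in this post, so the following is only an illustrative sketch of a similar 8-job random-write benchmark against the bhyve datastore:

# Hypothetical fio job resembling the workload seen in the outputs above
fio --name=randwrite --directory=/STORAGE-DATA/bhyve-datastore \
    --rw=randwrite --bs=4k --size=4g --numjobs=8 \
    --ioengine=posixaio --time_based --runtime=300 --group_reporting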

ZFS KSTATS

ZFS presents an impressive number of stats and counters via the kstat interface. On FreeBSD, these are currently exposed via the kstat.zfs sysctl MIBs.

One of the advantages of ZFS is the ARC (Adaptive Replacement Cache), which provides better cache
hit ratios than a standard LRU (Least Recently Used) cache. 

Looking at the various stats about the ARC can provide insight into what is happening with a system.

  • kstat.zfs.misc.arcstats.c_max
    • The target maximum size of the ARC.
  • kstat.zfs.misc.arcstats.c_min
    • The target minimum size of the ARC. The ARC will not shrink below this size, although it can be adjusted with the vfs.zfs.arc_min sysctl.
  • kstat.zfs.misc.arcstats.size
    • The current size of the ARC; if this is less than the maximum, your system has either not had enough activity to fill the ARC, or memory pressure from other processes has caused the ARC to shrink.
  • kstat.zfs.misc.arcstats.c
    • The current target size of the ARC. If the current size of the ARC is less than this value, the ARC will try to grow.
  • kstat.zfs.misc.arcstats.arc_meta_used
    • The amount of the ARC used to store metadata rather than user data. If this value has reached vfs.zfs.arc_meta_limit (which defaults to 25% of vfs.zfs.arc_max), then consider raising or lowering the fraction of the ARC used for metadata.

Caching more metadata will increase the speed of directory scans and other operations, at the cost of
decreasing the amount of user data that can be cached.

Below is the output of the above stats on my system ...

 root@bhyve01:~ # sysctl kstat.zfs.misc.arcstats.c_max  
 kstat.zfs.misc.arcstats.c_max: 136263184384  
 root@bhyve01:~ # sysctl kstat.zfs.misc.arcstats.c_min  
 kstat.zfs.misc.arcstats.c_min: 4291778944  
 root@bhyve01:~ # sysctl kstat.zfs.misc.arcstats.size  
 kstat.zfs.misc.arcstats.size: 122459869072  
 root@bhyve01:~ # sysctl kstat.zfs.misc.arcstats.c  
 kstat.zfs.misc.arcstats.c: 122493951746  
 root@bhyve01:~ # sysctl kstat.zfs.misc.arcstats.arc_meta_used  
 kstat.zfs.misc.arcstats.arc_meta_used: 1771825944  
 root@bhyve01:~ #   

Analysis of the above output

  •  kstat.zfs.misc.arcstats.c_max: 136,263,184,384 bytes (~126.9 GB)
    • This is the maximum size the ARC is allowed to grow to. This value is derived from the system's total physical memory (RAM).
  • kstat.zfs.misc.arcstats.c_min: 4,291,778,944 bytes (~4 GB)
    • This is the minimum size the ARC will shrink to under memory pressure. ZFS will always keep at least this much RAM dedicated to the ARC.
  • kstat.zfs.misc.arcstats.size: 122,459,869,072 bytes (~114 GB)
    • This is the current size of the ARC. The fact that it's close to c_max indicates that the system has a large working set of data and metadata that fits comfortably within the cache, and there's no significant memory pressure forcing the ARC to shrink.
  • kstat.zfs.misc.arcstats.c: 122,493,951,746 bytes (~114 GB)
    • This is the target size of the ARC, which is the size the ARC is aiming for based on the current workload. As this value is very close to the current size, it confirms that the ARC is stable and operating efficiently at its target.
  • kstat.zfs.misc.arcstats.arc_meta_used: 1,771,825,944 bytes (~1.65 GB)
    • This is the amount of ARC memory being used to cache metadata (e.g., file names, permissions, and directory structures). The fact that over 1.6 GB is dedicated to metadata shows that a significant portion of the ARC is used for fast file system lookups, which is a key strength of ZFS.

In summary, the output confirms that the ARC is healthy, operating near its maximum capacity, and effectively caching a large amount of both data and metadata for optimal performance.
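
The same kstats are easy to consume from scripts. A small sketch that prints the current ARC size as a percentage of the configured maximum:

# Current ARC size relative to c_max
sysctl -n kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max | \
  xargs | awk '{ printf "ARC is using %.1f%% of its maximum size\n", $1 / $2 * 100 }'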

GEOM Layer

We can also monitor physical disks with the gstat utility, which prints statistics about GEOM providers (disks).

gstat -p -I 3s

 dT: 3.026s w: 3.000s  
  L(q) ops/s  r/s  kBps  ms/r   w/s  kBps   ms/w    %busy Name  
   0   0      0    0     0.000  0    0      0.000   0.0 | nda0  
   0   2417   0    0     0.000  2417 308363 0.710   19.4| nda1  
   0   0      0    0     0.000  0    0      0.000    0.0| da0  
   0   0      0    0     0.000  0    0      0.000    0.0| da1  
   17  105    0    0     0.000  105  103148 118.9  100.4| da2  
   19  109    0    0     0.000  109  107737 134.1  100.5| da3  
   10  108    0    0     0.000  108  105611 176.7   99.9| da4  
   13  112    0    0     0.000  112  102640 165.6   99.7| da5  
   19  110    0    0     0.000  110  105136 164.5  100.2| da6  
   12  106    0    0     0.000  106  99394  179.9   99.8| da7  
   0   0      0    0     0.000  0    0      0.000    0.0| da8  

In the example output above, we can see that the NL-SAS disks are each handling ~100 IOPS, while the NVMe log/cache device (nda1) is handling ~2,400 IOPS (308 MB/s).
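
If the list of providers is long, gstat can filter it with a regular expression. For example, to watch only the RAIDZ2 member disks from the layout above:

# Show only da2-da7: physical providers, 3-second interval, name filter
gstat -p -I 3s -f '^da[2-7]$'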

TOP

One of the fastest ways to figure out which application (process) is causing all of the I/O is to use top. On FreeBSD, top has a -m flag to change the mode. In I/O mode, instead of tracking applications by CPU and memory usage, it tracks reads, writes, and other I/O operations. This can help determine which application is consuming the I/O resources. The command top -m io can be used to see how many I/Os are coming from various processes. The option -o read sorts the process list by the amount of data processes have read from the disk, and -o write sorts it by the amount of data they have written to the disk.

Output from command top -m io -o read is depicted below.

 last pid: 4139; load averages: 0.17, 0.35, 0.37                    up 0+13:17:11 13:30:04  
 37 processes: 1 running, 36 sleeping  
 CPU: 0.2% user, 0.0% nice, 7.8% system, 0.3% interrupt, 91.7% idle  
 Mem: 4092K Active, 75M Inact, 29M Laundry, 120G Wired, 223M Buf, 4520M Free  
 ARC: 114G Total, 66G MFU, 46G MRU, 1913M Anon, 585M Header, 13M Other  
    108G Compressed, 109G Uncompressed, 1.01:1 Ratio  
 Swap: 130G Total, 130G Free  
  PID USERNAME   VCSW IVCSW  READ WRITE FAULT TOTAL PERCENT COMMAND  
  4133 root     1425    12  1425   358     0  1783  12.21% fio  
  4135 root     1425    20  1425   426     0  1851  12.68% fio  
  4134 root     1424    17  1424   384     0  1808  12.38% fio  
  4131 root     1424    11  1424   386     0  1810  12.40% fio  
  4132 root     1424    14  1424   381     0  1805  12.36% fio  
  4130 root     1422    17  1422   415     0  1837  12.58% fio  
  4137 root     1421    23  1421   407     0  1828  12.52% fio  
  4136 root     1419    34  1418   462     0  1880  12.87% fio  
  2128 root        0     0     0     0     0     0   0.00% getty  
  2129 root        0     0     0     0     0     0   0.00% getty  
  2130 root        0     0     0     0     0     0   0.00% getty  
  2131 root        0     0     0     0     0     0   0.00% getty  
  2132 root        0     0     0     0     0     0   0.00% getty  
  2133 root        0     0     0     0     0     0   0.00% getty  
  2134 root        0     0     0     0     0     0   0.00% getty  
  2135 root        0     0     0     0     0     0   0.00% getty  
  3164 root        0     0     0     0     0     0   0.00% sshd-session  
  3167 dpasek      4     0     0     0     0     0   0.00% sshd-session  
  3168 dpasek      0     0     0     0     0     0   0.00% sh  
  3178 dpasek      0     0     0     0     0     0   0.00% su  
  3179 root        0     0     0     0     0     0   0.00% sh  
  3238 root        0     0     0     0     0     0   0.00% sshd-session  
  3241 dpasek      2     0     0     0     0     0   0.00% sshd-session  
  3242 dpasek      0     0     0     0     0     0   0.00% sh  
 root@bhyve01:~ # top -m io -o read  

In the output above, we see 8 fio processes each issuing ~1,420 reads and ~400 writes during the default top interval of 2 seconds. To calculate IOPS (I/Os per second), simply divide these values by 2. Alternatively, if you want top to show IOPS directly, you can switch to a 1-second interval with the command top -m io -s 1.

Periodic and E-mail notifications 

In FreeBSD, periodic is a framework for running system maintenance scripts at different intervals (daily, weekly, and monthly). It is highly configurable through the /etc/periodic.conf file, which allows you to enable or disable specific scripts and customize their behavior, including sending notifications.

To use periodic for notifications, you typically configure two main things:

  •     Which scripts should send notifications.
  •     Where to send those notifications (e.g., to which email address). 

Configure e-mail

We will use the DragonFly Mail Agent (DMA). DMA is a small Mail Transport Agent (MTA) designed for home and office use. It accepts mail from locally installed Mail User Agents (MUAs) and delivers it either locally or to a remote destination. Remote delivery includes several features like TLS/SSL support and SMTP authentication. DMA is part of the FreeBSD base system.

To enable DMA, edit /etc/mail/mailer.conf and replace all lines referring to another MTA with the following (the base-system DMA binary lives in /usr/libexec):

sendmail    /usr/libexec/dma
send-mail   /usr/libexec/dma
mailq       /usr/libexec/dma

Disable Sendmail in the FreeBSD system and enable DMA:

sysrc sendmail_enable="NONE"
sysrc dma_enable="YES" 

If you want anything in your queue to be flushed on boot or before shutdown, add the following to rc.conf as well:

sysrc dma_flushq_enable="YES" 

cat >> /etc/dma/dma.conf << EOF
 
SMARTHOST smtp.uw.cz
PORT 587
AUTHPATH /etc/dma/auth.conf
SECURETRANSFER
STARTTLS 
MASQUERADE david.pasek@uw.cz 
EOF
 
cat >> /etc/dma/auth.conf << EOF
david.pasek@uw.cz|smtp.uw.cz:[Password]
EOF
 
# Set permissions so only root can read 
chmod 600 /etc/dma/auth.conf

After dma configuration, test e-mail sending. 

Test e-mail sending

Send test e-mail from console ...

echo "Hello from FreeBSD dma!" | mail -s "Test DMA Gmail" you@example.com

Check /var/log/maillog if something fails.

The Basics of periodic.conf

The system defaults live in /etc/defaults/periodic.conf, and you should not edit that file directly. Instead, create a new file named /etc/periodic.conf.local to store your custom settings. This ensures your changes are not overwritten during a system update.

Configure Email Notifications

The periodic system uses a mail server (like sendmail or postfix) to send reports. To receive these reports via email, you must configure the following variables in /etc/periodic.conf.local:

  •     daily_show_info: Set to "YES" to include informative messages in the daily report.
  •     daily_output: Set the email address where you want to receive the daily report.
  •     daily_show_badconfig: Set to "YES" to show warnings for bad configurations.

Here's an example for /etc/periodic.conf.local:

#
# Configure daily reports
#
daily_show_info="YES"
daily_output="root"
daily_show_badconfig="YES"

#
# Configure weekly reports
#
weekly_show_info="YES"
weekly_output="root"
weekly_show_badconfig="YES"

#
# Configure monthly reports
#
monthly_show_info="YES"
monthly_output="root"
monthly_show_badconfig="YES"

With this configuration, all periodic e-mails go to the user root.

Enable a periodic Script

If you want a daily security report, add the following configuration to /etc/periodic.conf.local:

#
# Enable the security report
#
daily_security_output="root"
daily_security_show_info="YES"
daily_security_show_rc_info="YES"


What does the above configuration do?

It checks for system vulnerabilities and package updates. 

  • daily_security_output: Specifies the email address for the security report. This can be different from the main daily report.
  • daily_security_show_info: Includes informational messages in the security report.
  • daily_security_show_rc_info: Includes information about the system's rc configuration.

The smartd_periodic Script (for S.M.A.R.T. notifications)

The smartmontools package we installed previously includes a periodic script that can be used to report S.M.A.R.T. information. This script, named smartd_periodic, is an ideal way to get notifications about disk health.

Enable the script by creating or editing /etc/periodic.conf.local:

# 
# Enable S.M.A.R.T. periodic checks 
# 
daily_smartd_periodic_enable="YES" 
daily_smartd_periodic_output="root@example.com" 
daily_smartd_periodic_flags="-H -l error -l selftest -a"

What does the above configuration do?

  • daily_smartd_periodic_enable="YES": This is the critical line that activates the smartd_periodic script.
  • daily_smartd_periodic_output: Where the report will be sent.
  • daily_smartd_periodic_flags: These flags are passed to the smartctl command that the script runs.
    • -H: Check the overall S.M.A.R.T. health status.
    • -l error: Show the S.M.A.R.T. error log.
    • -l selftest: Show the results of the S.M.A.R.T. self-tests.
    • -a: Show all S.M.A.R.T. attributes.

We should specify which drives to check. The smartd_periodic script uses a configuration file to determine which devices to check. By default, this is the smartd.conf we created earlier (/usr/local/etc/smartd.conf), but the periodic script may use a different configuration.

You have two options. The first option is to set

daily_smartd_periodic_disks="/dev/nda0 /dev/nda1"

Or you can simply point it to the default smartd.conf file:

 daily_smartd_periodic_conf="/usr/local/etc/smartd.conf"

Verification

To test your setup and ensure the reports are generated, you can manually run the periodic scripts with the following command.

# Run the daily scripts manually
periodic daily


This will run all the daily tasks and send the output to the configured email address. You can check your /var/mail/root file or the mailbox of the configured user.

Important: For email notifications to work, you must have a functioning mail transport agent (we configured DMA above; sendmail is the traditional default in the base system) and network connectivity.

Conclusion

ZFS on FreeBSD is a very good volume manager and file system with enterprise storage features. Use S.M.A.R.T. to monitor physical disk health and the ZFS tools (zpool, zfs, zfs-stats, zfs-mon) for ZFS-related information.

Sources

[1] Allan Jude. Monitoring ZFS: https://freebsdfoundation.org/wp-content/uploads/2017/12/Monitoring-ZFS.pdf

[2] RoboNuggie. SmartmonTools on FreeBSD 14.0: https://youtu.be/sIt-iFX9gss?si=02zkilNeZuzWhzIz

[3] FreeBSD Forum. ZFS Health and Status Monitoring: https://forums.freebsd.org/threads/zfs-health-and-status-monitoring.48376/
