In this blog post, we will focus on ZFS from a manageability perspective. We will cover the following topics:
- ZFS Storage Status Monitoring
- Physical Disk Monitoring
- ZFS Capacity Monitoring
- ZFS Performance Monitoring
- Periodic and E-mail notifications
ZFS Storage Status Monitoring
To check a brief ZFS pool status, use the command zpool status -x
root@bhyve01:~ #
zpool status -x
all pools are healthy
For a more verbose pool status, use the command zpool status
root@bhyve01:~ #
zpool status
  pool: OS-DATA
 state: ONLINE
config:

        NAME          STATE     READ WRITE CKSUM
        OS-DATA       ONLINE       0     0     0
          da0         ONLINE       0     0     0

errors: No known data errors

  pool: STORAGE-DATA
 state: ONLINE
config:

        NAME          STATE     READ WRITE CKSUM
        STORAGE-DATA  ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            da2       ONLINE       0     0     0
            da3       ONLINE       0     0     0
            da4       ONLINE       0     0     0
            da5       ONLINE       0     0     0
            da6       ONLINE       0     0     0
            da7       ONLINE       0     0     0
        logs
          nda1p1      ONLINE       0     0     0
        cache
          nda1p2      ONLINE       0     0     0

errors: No known data errors
root@bhyve01:~ #
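Because the brief form prints a single well-known line ("all pools are healthy") when everything is fine, it is easy to script an alert around it. Below is a minimal sketch; the check_zpool_health helper and the mail recipient are illustrative, not part of any standard tooling:

```shell
#!/bin/sh
# check_zpool_health: print "OK" when the given "zpool status -x" output
# reports that all pools are healthy, otherwise print an alert line.
check_zpool_health() {
    if [ "$1" = "all pools are healthy" ]; then
        echo "OK"
    else
        echo "ALERT: $1"
    fi
}

# From cron you could wire it up like this (illustrative):
# check_zpool_health "$(zpool status -x)" | mail -s "ZFS health" root
check_zpool_health "all pools are healthy"
```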
Physical Disk Monitoring
First of all, you should know all your disk devices. Disk devices can be listed with the command camcontrol devlist
root@bhyve01:~ #
camcontrol devlist
<SEAGATE ST9146852SS HT64>         at scbus0 target 0 lun 0 (pass0,da0)
<SEAGATE ST9146853SS YS09>         at scbus0 target 1 lun 0 (pass1,da1)
<ATA ST9500620NS AA0E>             at scbus0 target 2 lun 0 (pass2,da2)
<ATA ST9500620NS AA0E>             at scbus0 target 3 lun 0 (pass3,da3)
<ATA ST9500620NS AA0E>             at scbus0 target 4 lun 0 (pass4,da4)
<ATA ST9500620NS AA09>             at scbus0 target 5 lun 0 (pass5,da5)
<ATA ST9500620NS AA0E>             at scbus0 target 6 lun 0 (pass6,da6)
<ATA ST9500620NS AA09>             at scbus0 target 7 lun 0 (pass7,da7)
<INTEL SSDPEKNW512G8 002C>         at scbus1 target 0 lun 1 (pass8,nda0)
<KINGSTON SNVS1000GB S8442101>     at scbus2 target 0 lun 1 (pass9,nda1)
<SanDisk Ultra 1.00>               at scbus3 target 0 lun 0 (pass10,da8)
root@bhyve01:~ #
In this section, we’ll work with example commands and outputs, so it’s important to first understand the disk layout. The layout of the disks in one of my homelab servers is illustrated in the figure below.
[Figure: Disk Layout]
S.M.A.R.T
To monitor the status of individual disks, we can leverage S.M.A.R.T (Self-Monitoring, Analysis and Reporting Technology), an industry standard for monitoring physical disks. The tools can be installed with the command pkg install smartmontools
smartmontools contains two applications:
- smartctl - Control and Monitor Utility for SMART Disks
- smartd - SMART Disk Monitoring Daemon
SMART Disk Monitoring Daemon
The SMART Disk Monitoring Daemon must be enabled with the command sysrc smartd_enable="YES"
smartd requires a configuration file; otherwise the service will not start. We can simply use the default configuration with the following copy command ... cp /usr/local/etc/smartd.conf.sample /usr/local/etc/smartd.conf ... and start the service.
root@bhyve01:~ #
cp /usr/local/etc/smartd.conf.sample /usr/local/etc/smartd.conf
root@bhyve01:~ #
service smartd start
Starting smartd.
root@bhyve01:~ #
service smartd status
smartd is running as pid 4010.
root@bhyve01:~ #
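The sample configuration relies on the DEVICESCAN directive, which probes all devices with default settings. If you prefer explicit control, smartd.conf accepts one directive line per device. The schedule below (short self-test daily at 02:00, long self-test on Saturday at 03:00) is just an example to adapt, not a recommendation from this post:

```
# /usr/local/etc/smartd.conf -- explicit per-device example
# -a : monitor all SMART attributes and health status
# -m root : e-mail warnings to root
# -s (S/../.././02|L/../../6/03) : short test daily at 02:00, long test Sat 03:00
/dev/da0 -a -m root -s (S/../.././02|L/../../6/03)
/dev/da2 -a -m root -s (S/../.././02|L/../../6/03)
```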
SMARTCTL
For manually checking the status of a particular disk, we can use the smartctl utility.
Let's check /dev/da0, which in my case is a 146 GB SAS disk used for the OS-DATA zpool.
root@bhyve01:~ #
smartctl -a /dev/da0
smartctl 7.5 2025-04-30 r5714 [FreeBSD 14.3-RELEASE amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST9146852SS
Revision:             HT64
Compliance:           SPC-3
User Capacity:        146,815,733,760 bytes [146 GB]
Logical block size:   512 bytes
Rotation Rate:        15000 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c500395f4c03
Serial number:        6TB1ADGK
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Sun Aug 24 08:34:21 2025 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     28 C
Drive Trip Temperature:        68 C

Accumulated power on time, hours:minutes 8649:27
Elements in grown defect list: 9

Vendor (Seagate Cache) information
  Blocks sent to initiator = 1764496082
  Blocks received from initiator = 3668687841
  Blocks read from cache and sent to initiator = 701693283
  Number of read and write commands whose size <= segment size = 3075002927
  Number of read and write commands whose size > segment size = 37

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 8649.45
  number of minutes until next internal SMART test = 21

Error counter log:
            Errors Corrected by   Total    Correction    Gigabytes   Total
            ECC    rereads/       errors   algorithm     processed   uncorrected
            fast | delayed rewrites corrected invocations [10^9 bytes] errors
read:   2358881651     0     0  2358881651  2358881651    4445.103     0
write:           0     0     0           0           0  246574.111     0
verify: 1366420009     0     0  1366420009  1366420009   73660.356     0

Non-medium error count:       12

SMART Self-test log
Num  Test              Status       segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                    number   (hours)
# 1  Background long   Completed        16        1    - [-   -    -]
# 2  Background long   Completed        16        0    - [-   -    -]
# 3  Background short  Completed        16        0    - [-   -    -]

Long (extended) Self-test duration: 1680 seconds [28.0 minutes]
root@bhyve01:~ #
Let's check /dev/nda0, an NVMe disk used in my case for ZFS caches (SLOG/write-cache, L2ARC/read-cache). Note that smartctl addresses the NVMe controller device (/dev/nvme0) and that it is necessary to specify the type of disk with the option -d nvme
root@bhyve01:~ #
smartctl -a -d nvme /dev/nvme0
smartctl 7.5 2025-04-30 r5714 [FreeBSD 14.3-RELEASE amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPEKNW512G8
Serial Number:                      PHNH94220BKH512A
Firmware Version:                   002C
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Sun Aug 24 09:01:01 2025 UTC
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     77 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.50W       -        -    0  0  0  0        0       0
 1 +     2.70W       -        -    1  1  1  1        0       0
 2 +     2.00W       -        -    2  2  2  2        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000    5000
 4 -   0.0040W       -        -    4  4  4  4     5000    9000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        36 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    10%
Data Units Read:                    21,855,633 [11.1 TB]
Data Units Written:                 62,654,859 [32.0 TB]
Host Read Commands:                 783,452,025
Host Write Commands:                3,944,291,927
Controller Busy Time:               83,884
Power Cycles:                       114
Power On Hours:                     13,298
Unsafe Shutdowns:                   36
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Thermal Temp. 1 Transition Count:   1007
Thermal Temp. 1 Total Time:         6850

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
No Self-tests Logged

root@bhyve01:~ #
Let's check /dev/da2, which in my case is a 500 GB NL-SAS (SATA) disk used for the STORAGE-DATA zpool.
root@bhyve01:~ #
smartctl -a /dev/da2
smartctl 7.5 2025-04-30 r5714 [FreeBSD 14.3-RELEASE amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Constellation.2 (SATA)
Device Model:     ST9500620NS
Serial Number:    9XF1XTW3
LU WWN Device Id: 5 000c50 04e891313
Add. Product Id:  DELL(tm)
Firmware Version: AA0E
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database 7.5/5706
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Aug 24 09:25:20 2025 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (  90) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 102) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x10bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   063   044    Pre-fail  Always       -       229836741
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       258
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   091   060   030    Pre-fail  Always       -       1558590529
  9 Power_On_Hours          0x0032   095   011   000    Old_age   Always       -       4916
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       255
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   077   061   045    Old_age   Always       -       23 (Min/Max 23/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       249
193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       5851
194 Temperature_Celsius     0x0022   023   040   000    Old_age   Always       -       23 (0 7 0 0 0)
195 Hardware_ECC_Recovered  0x001a   119   099   000    Old_age   Always       -       229836741
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       82322 (191 120 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       995382041
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       722691417

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         3         -
# 2  Extended offline    Completed without error       00%         3         -
# 3  Short offline       Completed without error       00%         1         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

root@bhyve01:~ #
Let's check /dev/da8, which in my case is a 16 GB USB disk used for the UEFI boot loader and the FreeBSD OS root file system.
root@bhyve01:~ #
smartctl -a -d scsi -T permissive /dev/da8
smartctl 7.5 2025-04-30 r5714 [FreeBSD 14.3-RELEASE amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SanDisk
Product:              Ultra
Revision:             1.00
Compliance:           SPC-4
User Capacity:        15,376,318,464 bytes [15.3 GB]
Logical block size:   512 bytes
Device type:          disk
Local Time is:        Sun Aug 24 09:43:49 2025 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Error Counter logging not supported

Device does not support Self Test logging
root@bhyve01:~ #
As you can see, there isn’t much information available for consumer-grade USB disks. We only get basic disk details and a simple binary SMART health status - either OK or Not OK.
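The per-disk health verdicts shown above can be collected in a single loop. Below is a minimal sketch; the smart_health_of helper is my own illustrative function that just extracts the verdict line from smartctl output (SAS disks report "SMART Health Status", while ATA/NVMe disks report the "overall-health self-assessment test result"):

```shell
#!/bin/sh
# smart_health_of: read "smartctl -H"-style output on stdin and print only
# the health verdict (e.g. "OK" or "PASSED").
smart_health_of() {
    awk -F': *' '/SMART Health Status|self-assessment test result/ { print $2 }'
}

# Real usage against the disks from "camcontrol devlist" (illustrative):
# for d in da0 da2 da3 da4 da5 da6 da7; do
#     printf '%s: %s\n' "$d" "$(smartctl -H /dev/$d | smart_health_of)"
# done
echo "SMART Health Status: OK" | smart_health_of
```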
There are also ready-made packages built on top of smartmontools:
- ssd_report-smartmontools-0.4 - SSD health report
- smartctl_exporter-0.14.0 - Prometheus metrics exporter for smartctl, which you can leverage for centralized monitoring
ZFS Capacity Monitoring
You can display ZFS pool capacity using the zpool get capacity command.
root@bhyve01:~ #
zpool get capacity
NAME          PROPERTY  VALUE  SOURCE
OS-DATA       capacity  0%     -
STORAGE-DATA  capacity  43%    -
root@bhyve01:~ #
To view the capacity of ZFS datasets and their mount points, use the zfs list command.
root@bhyve01:~ #
zfs list
NAME                           USED  AVAIL  REFER  MOUNTPOINT
OS-DATA                       76.2M   132G    24K  /OS-DATA
OS-DATA/home                  31.5K   132G  31.5K  /home
OS-DATA/tmp                     24K   132G    24K  /tmp
OS-DATA/var                   73.6M   132G  73.6M  /var
STORAGE-DATA                   800G   997G  44.0K  /STORAGE-DATA
STORAGE-DATA/bhyve-datastore   800G   997G   800G  /STORAGE-DATA/bhyve-datastore
root@bhyve01:~ #
On the STORAGE-DATA ZFS pool, 800 GB is used and 997 GB is available. This matches the zpool get capacity output, where the 43% reported for the STORAGE-DATA pool represents the used capacity.
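Capacity is a natural candidate for automated alerting. The sketch below parses zpool list output and reports pools above a threshold; the zpool_capacity_alert function name and the thresholds are my own arbitrary choices:

```shell
#!/bin/sh
# zpool_capacity_alert: read "zpool list -H -o name,capacity" style input on
# stdin and print every pool whose used capacity exceeds the given percentage.
zpool_capacity_alert() {
    awk -v t="$1" '{ gsub(/%/, "", $2); if ($2 + 0 > t) print $1, $2 "%" }'
}

# Real usage (e.g. from cron):
# zpool list -H -o name,capacity | zpool_capacity_alert 80
# Demo with the values from this post and a 40% threshold:
printf 'OS-DATA\t0%%\nSTORAGE-DATA\t43%%\n' | zpool_capacity_alert 40
```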
ZFS Performance Monitoring
Storage performance can be measured at various points, or layers. In the next sections we will cover:
- ZPOOL Layer : zpool iostat monitors I/O statistics at the pool layer
- ZFS Layer : zfs-stats and zfs-mon report ZFS internal statistics such as ARC, L2ARC, and prefetch efficiency
- GEOM (Physical Disk) Layer : GEOM is the primary storage framework in FreeBSD, providing a modular and extensible way to manage disk I/O requests. The gstat utility monitors I/O statistics at the physical disk layer.
- TOP (Process) Layer : top is a widely used command-line utility in FreeBSD that provides a real-time, dynamic view of a running system from the process point of view.
ZPOOL Layer (zpool iostat)
We can monitor ZFS performance with zpool iostat. This is the most essential tool for monitoring ZFS I/O performance. It reports real-time statistics for ZFS pools and devices. Below is an example of zpool iostat monitoring with a refresh every 5 seconds.
root@bhyve01:/STORAGE-DATA/bhyve-datastore #
zpool iostat 5
              capacity     operations     bandwidth
pool         alloc   free   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
OS-DATA       74.7M   136G      0      0     58  2.81K
STORAGE-DATA  1.02T  1.70T      4    208   140K  15.4M
------------  -----  -----  -----  -----  -----  -----
OS-DATA       74.9M   136G      0     14      0  45.1K
STORAGE-DATA  1.03T  1.69T      0  2.31K      0   528M
------------  -----  -----  -----  -----  -----  -----
^C
root@bhyve01:/STORAGE-DATA/bhyve-datastore #
zpool iostat -v shows statistics for individual vdevs and the underlying physical disks. Below is an example of zpool iostat -v monitoring with a refresh every 5 seconds.
root@bhyve01:/STORAGE-DATA/bhyve-datastore #
zpool iostat -v 5
              capacity     operations     bandwidth
pool         alloc   free   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
OS-DATA       74.8M   136G      0      0     58  2.83K
  da0         74.8M   136G      0      0     58  2.83K
------------  -----  -----  -----  -----  -----  -----
STORAGE-DATA  1.12T  1.60T      4    225   139K  16.6M
  raidz2-0    1.12T  1.60T      4    223   139K  16.3M
    da2           -      -      0     37  22.2K  2.72M
    da3           -      -      0     37  22.3K  2.72M
    da4           -      -      0     36  25.6K  2.72M
    da5           -      -      0     37  25.4K  2.72M
    da6           -      -      0     37  21.9K  2.72M
    da7           -      -      0     36  22.0K  2.72M
logs              -      -      -      -      -      -
  nda1p1       128K  39.5G      0      2      5   282K
cache             -      -      -      -      -      -
  nda1p2       457G   343G     14     48  1.76M  6.07M
------------  -----  -----  -----  -----  -----  -----
^C
root@bhyve01:/STORAGE-DATA/bhyve-datastore #
ZFS Layer (zfs-stats, zfs-mon)
zfs-stats
Various ZFS-related internal statistics are published via the sysctl interface. We can leverage the zfs-stats package, which summarizes those statistics in a more human-readable way and groups them together logically.
Install the zfs-stats package with the following command ...
pkg install zfs-stats
... and use zfs-stats -A to show ZFS related internal statistics.
root@bhyve01:~ #
zfs-stats -A
------------------------------------------------------------------------
ZFS Subsystem Report                            Sun Aug 24 13:51:34 2025
------------------------------------------------------------------------
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                765.00  k
        Mutex Misses:                           232
        Evict Skips:                            3

ARC Size:                               89.99%  114.20  GiB
        Target Size: (Adaptive)         89.95%  114.15  GiB
        Min Size (Hard Limit):          3.15%   4.00    GiB
        Max Size (High Water):          31:1    126.90  GiB
        Compressed Data Size:                   109.77  GiB
        Decompressed Data Size:                 110.34  GiB
        Compression Factor:                     1.01

ARC Size Breakdown:
        Recently Used Cache Size:       38.89%  44.41   GiB
        Frequently Used Cache Size:     61.11%  69.79   GiB

ARC Hash Breakdown:
        Elements Max:                           4.62    m
        Elements Current:               100.00% 4.62    m
        Collisions:                             2.02    m
        Chain Max:                              6
        Chains:                                 532.27  k

------------------------------------------------------------------------
root@bhyve01:~ #
The output from zfs-stats -A provides a detailed report on the ZFS Adaptive Replacement Cache (ARC), which is ZFS's primary in-memory cache. The output shows the health, size, and efficiency of the ARC, which is crucial for ZFS performance.
Let's break down the provided output ...
- ARC Summary
- ARC Summary: (HEALTHY):
- This line indicates that the ARC is operating normally and is not under memory pressure.
- Memory Throttle Count (0)
- This is the number of times the ARC had to reduce its size aggressively to free up memory for other processes. In other words, it is the number of times that the ZFS ARC has had to reduce its memory usage because of demands elsewhere in the system.
- A value of 0 is excellent and indicates no memory contention.
- In case of memory contention, you might consider setting the maximum size of the ARC (vfs.zfs.arc_max) to a value that makes ZFS coexist with your other workloads better.
- ARC Size and Performance
- ARC Size (114.20 GiB)
- This shows the current size of the ARC (114.20 GiB) and what percentage it is of its maximum possible size (89.99%).
- Based on this information, the maximum ARC size is 126.9 GiB (114.2 / 89.99 * 100)
- Target Size
- The Adaptive target size is the ideal size the ARC is aiming for based on current memory usage. It shows the ARC is near its target (114.15 GiB), which is a sign of stability.
- Min/Max Size (126.9 GiB)
- These are the hard limits for the ARC's size. The ARC will not shrink below the Min Size (4 GiB) and will not grow beyond the Max Size (126.9 GiB). The ratio 31:1 is the default ratio of the maximum ARC size to the minimum ARC size.
- Compression and Cache Breakdown
- Compressed Data Size (109.77 GiB)
- The amount of data stored in the ARC after being compressed. This shows that ZFS is effectively compressing data before it is cached.
- Decompressed Data Size (110.34 GiB)
- The size of the data stored in the ARC if it were not compressed.
- Compression Factor (1.01)
- The ratio of the decompressed size to the compressed size (1.01). This value shows very little compression is occurring, which may be because the data is already in a non-compressible format (e.g., JPEG images, video files). A factor of 2.0 would indicate data is being compressed to half its original size.
- Recently Used Cache Size and Frequently Used Cache Size
- These metrics are key to the ARC's efficiency. They represent the two lists (most recently used and most frequently used) that together form the ARC. The output shows a good balance, with the frequently used list containing more data (61.11%), which is expected for a well-tuned cache.
- ARC Hash Breakdown
- Elements Max (4.62 m) / Current (4.62 m)
- The number of entries in the ARC's hash table. The output shows the current number of elements is at its maximum, which is normal for a full cache.
- Collisions (2.02 m)
- The number of times a hash key points to a "bucket" that already contains an entry.
- Chain Max (6)
- The maximum length of a hash collision chain.
- Chains (532.27 k)
- The total number of chains. The values for collisions and chains are normal and do not indicate a performance issue.
In summary, the output above indicates the ZFS ARC is running very well on this system. It is healthy, operating at its target size, and effectively using a large portion of the available memory for caching data. The compression factor is low, but this is a function of the data itself, not a sign of a problem with the ARC.
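The 126.9 GiB maximum quoted above can be double-checked with simple arithmetic from the reported current size and its percentage of the maximum:

```shell
# Maximum ARC size derived from the zfs-stats report above:
# current size 114.20 GiB is 89.99% of the maximum.
awk 'BEGIN { printf "%.1f GiB\n", 114.20 / 89.99 * 100 }'
# prints "126.9 GiB"
```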
zfs-mon
The sysutils/zfs-stats package also includes a second tool, zfs-mon, which looks at how a subset of the kstats are changing over time. This can provide useful insight into how the requests are being broken down, and how the various caching layers in ZFS are being used. The stats break down the performance of the ARC, L2ARC, the filesystem prefetch, and the device prefetching code. It also breaks down data vs metadata operations. By default, ZFS limits the amount of cache available for metadata to 25% of the max ARC size. If the total storage capacity is very large, and most operations impact only the metadata of the files, not the content, increasing the amount of the ARC that can be used for metadata can actually increase performance, since otherwise the ARC may be 3/4s full of content that will not be referenced again before it is replaced with other content.
root@bhyve01:~ #
zfs-mon -a
ZFS real-time cache activity monitor
Seconds elapsed: 19

Cache hits and misses:
                                  1s     10s     60s     tot
ARC hits:                      13933    7076   16866   16866
ARC misses:                        0       0       0       0
ARC demand data hits:           8932    5549   11277   11277
ARC demand data misses:            0       0       0       0
ARC demand metadata hits:       4999    1527    5588    5588
ARC demand metadata misses:        0       0       0       0
ARC prefetch data hits:            0       0       0       0
ARC prefetch data misses:          0       0       0       0
ARC prefetch metadata hits:        0       0       0       0
ARC prefetch metadata misses:      0       0       0       0
L2ARC hits:                        0       0       0       0
L2ARC misses:                      0       0       0       0
ZFETCH hits:                       0       0       1       1
ZFETCH misses:                  7433    4611    9378    9378

Cache efficiency percentage:
                          10s     60s     tot
ARC:                   100.00  100.00  100.00
ARC demand data:       100.00  100.00  100.00
ARC demand metadata:   100.00  100.00  100.00
ARC prefetch data:       0.00    0.00    0.00
ARC prefetch metadata:   0.00    0.00    0.00
L2ARC:                   0.00    0.00    0.00
ZFETCH:                  0.00    0.01    0.01
^C
root@bhyve01:~ #
The above output from zfs-mon provides a real-time snapshot of the ZFS Adaptive Replacement Cache (ARC) and L2ARC performance. The analysis shows that the ARC is performing with extremely high efficiency, while a significant number of ZFETCH misses indicate that ZFS is prefetching data that isn't being used. This is because I run synthetic benchmarks; real storage workloads normally benefit from prefetching more than synthetic ones.
Let's break down the provided output ...
- Cache Hits and Misses
- This section is the core of the analysis, showing the number of I/O requests that are being served directly from cache (hits) versus those that require a read from the underlying storage (misses). The columns 1s, 10s, 60s, and tot represent the counts over the last 1, 10, and 60 seconds, and the total since zfs-mon started, respectively.
- ARC hits and misses
- The key takeaway here is that there are thousands of ARC hits and zero ARC misses. This is an excellent result that indicates the working data set is small enough to fit entirely within the system's RAM. All data requests are being served from the fast ARC cache, avoiding slow disk I/O.
- ARC demand data hits/misses
- This refers to data that was requested by a process. The output shows thousands of hits and zero misses, confirming that all user-requested data is being served from the ARC.
- ARC demand metadata hits/misses
- This is for filesystem metadata (e.g., file permissions, directory structures). The thousands of hits and zero misses show that the metadata is also fully cached, ensuring extremely fast file system operations.
- ARC prefetch hits/misses
- Prefetching is when ZFS proactively loads data into the cache that it anticipates a process will need. The prefetch hits are zero, while there are thousands of ZFETCH misses. This is the most significant observation in this output. It is caused by synthetic storage workload (fio benchmark).
- ZFETCH Hits and Misses
- ZFETCH misses
- ZFETCH is the prefetch mechanism in ZFS. The high number of ZFETCH misses (9378 in total) means that ZFS is prefetching data that is not being used. This can be inefficient as it consumes system resources (CPU and I/O) to load data that is never requested by an application.
- ZFETCH hits
- The single ZFETCH hit (1) suggests that only one of these thousands of prefetch operations was actually useful.
- Cache Efficiency Percentage
- This table confirms the findings from the hit/miss analysis.
- ARC, ARC demand data, ARC demand metadata
- The 100% efficiency across all these categories reinforces that the ARC is satisfying all data and metadata requests.
- ARC prefetch and L2ARC
- The 0% efficiency is a direct result of the lack of prefetch hits and L2ARC hits. The L2ARC (the nda1p2 cache device in this setup) is not used because all data is already being served from the primary ARC in RAM.
- ZFETCH
- The extremely low efficiency of 0.01% is the clearest indicator of the high number of useless prefetch misses.
In our example scenario (fio benchmark), the ZFS cache is performing exceptionally well for its primary function (demand hits), serving all requests from the fast in-memory ARC. However, there is a clear inefficiency in the prefetching mechanism (ZFETCH), where ZFS is pre-loading a lot of data that is never used. That's because we used a synthetic storage workload (fio benchmark) rather than a real storage workload, which typically leverages the prefetching mechanism.
ZFS KSTATS
ZFS presents an impressive number of stats and counters via the kstat interface. On FreeBSD, these are currently exposed via the kstat.zfs sysctl MIBs.
One of the advantages of ZFS is the ARC (Adaptive Replacement Cache), which provides better cache hit ratios than a standard LRU (Least Recently Used) cache.
Looking at the various stats about the ARC can provide insight into what is happening with a system.
- kstat.zfs.misc.arcstats.c_max
- The target maximum size of the ARC.
- kstat.zfs.misc.arcstats.c_min
- The target minimum size of the ARC. The ARC will not shrink below this size, although it can be adjusted with the vfs.zfs.arc_min sysctl.
- kstat.zfs.misc.arcstats.size
- The current size of the ARC; if this is less than the maximum, your system has either not had enough activity to fill the ARC, or memory pressure from other processes has caused the ARC to shrink.
- kstat.zfs.misc.arcstats.c
- The current target size of the ARC. If the current size of the ARC is less than this value, the ARC will try to grow.
- kstat.zfs.misc.arcstats.arc_meta_used
- The amount of the ARC used to store metadata rather than user data. If this value has reached vfs.zfs.arc_meta_limit (which defaults to 25% of vfs.zfs.arc_max), then consider raising or lowering the fraction of the ARC used for metadata.
Caching more metadata will increase the speed of directory scans and other operations, at the cost of decreasing the amount of user data that can be cached.
Below is output of above stats from my system ...
root@bhyve01:~ #
sysctl kstat.zfs.misc.arcstats.c_max
kstat.zfs.misc.arcstats.c_max: 136263184384
root@bhyve01:~ #
sysctl kstat.zfs.misc.arcstats.c_min
kstat.zfs.misc.arcstats.c_min: 4291778944
root@bhyve01:~ #
sysctl kstat.zfs.misc.arcstats.size
kstat.zfs.misc.arcstats.size: 122459869072
root@bhyve01:~ #
sysctl kstat.zfs.misc.arcstats.c
kstat.zfs.misc.arcstats.c: 122493951746
root@bhyve01:~ #
sysctl kstat.zfs.misc.arcstats.arc_meta_used
kstat.zfs.misc.arcstats.arc_meta_used: 1771825944
root@bhyve01:~ #
Analysis of the above output
- kstat.zfs.misc.arcstats.c_max: 136,263,184,384 bytes (~126.9 GB)
- This is the maximum size the ARC is allowed to grow to. This value is derived from the system's total physical memory (RAM).
- kstat.zfs.misc.arcstats.c_min: 4,291,778,944 bytes (~4 GB)
- This is the minimum size the ARC will shrink to under memory pressure. ZFS will always keep at least this much RAM dedicated to the ARC.
- kstat.zfs.misc.arcstats.size: 122,459,869,072 bytes (~114 GB)
- This is the current size of the ARC. The fact that it's close to c_max indicates that the system has a large working set of data and metadata that fits comfortably within the cache, and there's no significant memory pressure forcing the ARC to shrink.
- kstat.zfs.misc.arcstats.c: 122,493,951,746 bytes (~114 GB)
- This is the target size of the ARC, which is the size the ARC is aiming for based on the current workload. As this value is very close to the current size, it confirms that the ARC is stable and operating efficiently at its target.
- kstat.zfs.misc.arcstats.arc_meta_used: 1,771,825,944 bytes (~1.65 GB)
- This is the amount of ARC memory being used to cache metadata (e.g., file names, permissions, and directory structures). The fact that over 1.6 GB is dedicated to metadata shows that a significant portion of the ARC is used for fast file system lookups, which is a key strength of ZFS.
In summary, the output confirms that your ARC is healthy, operating near its maximum capacity, and effectively caching a large amount of both data and metadata for optimal performance.
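The raw byte counters are easier to compare with the zfs-stats report after converting them to GiB:

```shell
# Convert the arcstats byte counters shown above to GiB (1 GiB = 2^30 bytes).
# The results (~126.9 GiB and ~114 GiB) line up with the zfs-stats report.
awk 'BEGIN {
    printf "c_max: %.1f GiB\n", 136263184384 / (1024 ^ 3)
    printf "size:  %.1f GiB\n", 122459869072 / (1024 ^ 3)
}'
```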
GEOM Layer
We can also monitor physical disks with the gstat utility, which prints statistics about GEOM disks.
gstat -p -I 3s
dT: 3.026s w: 3.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
    0      0      0      0    0.000      0      0    0.000    0.0| nda0
    0   2417      0      0    0.000   2417 308363    0.710   19.4| nda1
    0      0      0      0    0.000      0      0    0.000    0.0| da0
    0      0      0      0    0.000      0      0    0.000    0.0| da1
   17    105      0      0    0.000    105 103148    118.9  100.4| da2
   19    109      0      0    0.000    109 107737    134.1  100.5| da3
   10    108      0      0    0.000    108 105611    176.7   99.9| da4
   13    112      0      0    0.000    112 102640    165.6   99.7| da5
   19    110      0      0    0.000    110 105136    164.5  100.2| da6
   12    106      0      0    0.000    106  99394    179.9   99.8| da7
    0      0      0      0    0.000      0      0    0.000    0.0| da8
In the example output above, we can see that the NL-SAS disks are each handling ~100 write IOPS and that the NVMe disk hosting the SLOG and L2ARC (nda1) is handling ~2400 write IOPS (308 MB/s).
TOP
One of the fastest ways to figure out which application (process) is causing all of the I/O is to use top. On FreeBSD, top has a -m flag to change the mode. In I/O mode, instead of tracking applications by CPU and memory usage, it tracks reads, writes, and other I/O operations. This can help determine which application is consuming all of the I/O resources. The command top -m io shows how many I/Os are coming from various processes. The option -o read sorts the process list by the amount of data processes have read from the disk, and -o write sorts it by the amount of data they have written to the disk.
Output from command top -m io -o read is depicted below.
root@bhyve01:~ #
top -m io -o read
last pid:  4139;  load averages:  0.17,  0.35,  0.37    up 0+13:17:11  13:30:04
37 processes:  1 running, 36 sleeping
CPU:  0.2% user,  0.0% nice,  7.8% system,  0.3% interrupt, 91.7% idle
Mem: 4092K Active, 75M Inact, 29M Laundry, 120G Wired, 223M Buf, 4520M Free
ARC: 114G Total, 66G MFU, 46G MRU, 1913M Anon, 585M Header, 13M Other
     108G Compressed, 109G Uncompressed, 1.01:1 Ratio
Swap: 130G Total, 130G Free

  PID USERNAME     VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
 4133 root         1425     12   1425    358      0   1783  12.21% fio
 4135 root         1425     20   1425    426      0   1851  12.68% fio
 4134 root         1424     17   1424    384      0   1808  12.38% fio
 4131 root         1424     11   1424    386      0   1810  12.40% fio
 4132 root         1424     14   1424    381      0   1805  12.36% fio
 4130 root         1422     17   1422    415      0   1837  12.58% fio
 4137 root         1421     23   1421    407      0   1828  12.52% fio
 4136 root         1419     34   1418    462      0   1880  12.87% fio
 2128 root            0      0      0      0      0      0   0.00% getty
 2129 root            0      0      0      0      0      0   0.00% getty
 2130 root            0      0      0      0      0      0   0.00% getty
 2131 root            0      0      0      0      0      0   0.00% getty
 2132 root            0      0      0      0      0      0   0.00% getty
 2133 root            0      0      0      0      0      0   0.00% getty
 2134 root            0      0      0      0      0      0   0.00% getty
 2135 root            0      0      0      0      0      0   0.00% getty
 3164 root            0      0      0      0      0      0   0.00% sshd-session
 3167 dpasek          4      0      0      0      0      0   0.00% sshd-session
 3168 dpasek          0      0      0      0      0      0   0.00% sh
 3178 dpasek          0      0      0      0      0      0   0.00% su
 3179 root            0      0      0      0      0      0   0.00% sh
 3238 root            0      0      0      0      0      0   0.00% sshd-session
 3241 dpasek          2      0      0      0      0      0   0.00% sshd-session
 3242 dpasek          0      0      0      0      0      0   0.00% sh
root@bhyve01:~ #
In the output above, we see 8 fio processes, each performing ~1,420 reads and ~400 writes during the default top interval of 2 seconds. To calculate IOPS (I/Os per second), simply divide these values by 2. Alternatively, if you want top to show IOPS directly, switch to a 1-second interval with the command top -m io -s 1.
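If you prefer per-second rates without mental arithmetic, a small awk filter can do the division for you. This is only a sketch: the sample line below stands in for real top -m io output (columns PID, USERNAME, VCSW, IVCSW, READ, WRITE, FAULT, TOTAL, PERCENT, COMMAND), and the divisor of 2 matches top's default refresh interval.

```shell
# Divide the READ ($5) and WRITE ($6) columns of `top -m io` by the
# 2-second sampling interval to get IOPS. Canned sample input; on a real
# system, pipe `top -b -m io -o read` through the same awk program.
printf '4133 root 1425 12 1425 358 0 1783 12.21%% fio\n' |
awk '{ printf "%s %s read_iops=%.1f write_iops=%.1f\n", $1, $10, $5/2, $6/2 }'
# prints: 4133 fio read_iops=712.5 write_iops=179.0
```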
Periodic and E-mail notifications
In FreeBSD, periodic is a framework for running system maintenance scripts at different intervals (daily, weekly, and monthly). It is highly configurable through the /etc/periodic.conf file, which allows you to enable or disable specific scripts and customize their behavior, including sending notifications.
To use periodic for notifications, you typically configure two main things:
- Which scripts should send notifications.
- Where to send those notifications (e.g., to which email address).
Configure e-mail
Use the DragonFly Mail Agent (DMA). DMA is a small Mail Transport Agent (MTA) designed for home and office use. It accepts mail from locally installed Mail User Agents (MUAs) and delivers it either locally or to a remote destination. Remote delivery includes features such as TLS/SSL support and SMTP authentication. DMA is part of the FreeBSD base system.
To enable DMA, edit /etc/mail/mailer.conf and replace all lines referring to another MTA with the following (the base-system DMA binary lives in /usr/libexec; if you installed DMA from ports or packages instead, use /usr/local/libexec/dma):
sendmail /usr/libexec/dma
send-mail /usr/libexec/dma
mailq /usr/libexec/dma
Disable Sendmail in the FreeBSD system and enable DMA:
sysrc sendmail_enable="NONE"
sysrc dma_enable="YES"
If you want anything in your queue to be flushed on boot or before shutdown, add the following to rc.conf as well:
sysrc dma_flushq_enable="YES"
Next, configure DMA itself in /etc/dma/dma.conf (or /usr/local/etc/dma/dma.conf for the ports version). The example below relays mail through the provider's submission port (587) with STARTTLS and SMTP authentication. A SMARTHOST line naming your provider's SMTP server (e.g. smtp.gmail.com for Gmail) is also required for remote delivery:
SMARTHOST smtp.gmail.com
PORT 587
AUTHPATH /usr/local/etc/dma/auth.conf
STARTTLS
MASQUERADE david.pasek@uw.cz
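For authenticated submission, DMA reads credentials from the file named by AUTHPATH, one entry per line in its user|server:password format. The fragment below is a hypothetical example; the server and password are placeholders you must replace with your own:

```
# /usr/local/etc/dma/auth.conf -- one "user|server:password" entry per line.
# Placeholder credentials; keep this file chmod 600, because the
# password is stored in plaintext.
david.pasek@uw.cz|smtp.gmail.com:your-app-password
```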
Test e-mail sending
After the DMA configuration, send a test e-mail from the console:
echo "Hello from FreeBSD dma!" | mail -s "Test DMA Gmail" you@example.com
Check /var/log/maillog if delivery fails.
The Basics of periodic.conf
The main configuration file is /etc/periodic.conf. You should not edit this file directly. Instead, create a new file named /etc/periodic.conf.local to store your custom settings. This ensures your changes are not overwritten during a system update.
Configure Email Notifications
The periodic system delivers its reports through the system's MTA (the DMA we configured above). To receive the reports via e-mail, configure the following variables in /etc/periodic.conf.local:
- daily_show_info: Set to "YES" to include informative messages in the daily report.
- daily_output: Set the e-mail address (or a file path) where the daily report should be sent.
- daily_show_badconfig: Set to "YES" to show warnings about bad configurations.
Here's an example for /etc/periodic.conf.local:
#
# Configure daily reports
#
daily_show_info="YES"
daily_output="root"
daily_show_badconfig="YES"
#
# Enable the security report
#
daily_security_output="root"
daily_security_show_info="YES"
daily_security_show_rc_info="YES"
#
# Configure weekly reports
#
weekly_show_info="YES"
weekly_output="root"
weekly_show_badconfig="YES"
#
# Configure monthly reports
#
monthly_show_info="YES"
monthly_output="root"
monthly_show_badconfig="YES"
With the configuration above, all periodic e-mails go to the user root.
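The base system's stock periodic scripts already include ZFS-specific checks, so you can fold pool health into the same daily report. Both variables below are defined in /etc/defaults/periodic.conf; add them to /etc/periodic.conf.local:

```
# Include `zpool status -x` output in the daily report
daily_status_zfs_enable="YES"
# Periodically scrub all pools (by default when the last scrub is older
# than daily_scrub_zfs_default_threshold days)
daily_scrub_zfs_enable="YES"
```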
Enable a periodic Script
If you want a daily security report, add the following configuration to /etc/periodic.conf.local:
#
# Enable the security report
#
daily_security_output="root"
daily_security_show_info="YES"
daily_security_show_rc_info="YES"
What does the above configuration do? It enables the daily security run, which checks for known vulnerabilities in installed packages and reports security-relevant system changes.
- daily_security_output: Specifies the email address for the security report. This can be different from the main daily report.
- daily_security_show_info: Includes informational messages in the security report.
- daily_security_show_rc_info: Includes information about the system's rc configuration.
The smartd_periodic Script (for S.M.A.R.T. notifications)
The smartmontools package we installed previously includes a periodic script that can be used to report S.M.A.R.T. information. This script, named smartd_periodic, is an ideal way to get notifications about disk health.
Enable the script by creating or editing /etc/periodic.conf.local:
daily_smartd_periodic_enable="YES"
daily_smartd_periodic_output="root"
daily_smartd_periodic_flags="-H -l error -l selftest -a"
What does the above configuration do?
- daily_smartd_periodic_enable="YES": This is the critical line that activates the smartd_periodic script.
- daily_smartd_periodic_output: Where the report will be sent.
- daily_smartd_periodic_flags: These flags are passed to the smartctl command that the script runs.
- -H: Check the overall S.M.A.R.T. health status.
- -l error: Show the S.M.A.R.T. error log.
- -l selftest: Show the results of the S.M.A.R.T. self-tests.
- -a: Show all S.M.A.R.T. attributes.
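To see what such a report is based on, you can run the same health check by hand. The snippet below is only a sketch: the canned echo stands in for real smartctl -H output (on this article's hardware you would pipe smartctl -H /dev/nda0 instead), and the awk filter reduces the verdict to OK/ALERT.

```shell
# Reduce `smartctl -H` output to a one-word verdict. Canned sample input;
# replace the echo with `smartctl -H /dev/nda0` on a real system.
echo 'SMART overall-health self-assessment test result: PASSED' |
awk -F': ' '/overall-health/ { v = ($2 == "PASSED") ? "OK" : "ALERT"; print v }'
# prints: OK
```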
We should also specify which drives to check. The smartd_periodic script uses a configuration to determine which devices to examine. You have two options. The first is to list the devices explicitly:
daily_smartd_periodic_disks="/dev/nda0 /dev/nda1"
Alternatively, you can point the script at the smartd configuration file:
daily_smartd_periodic_conf="/usr/local/etc/smartd.conf"
Verification
To test your setup and ensure the reports are generated, you can run the periodic scripts manually with the following command:
# Run the daily scripts manually
periodic daily
This will run all the daily tasks and send the output to the configured email address. You can check your /var/mail/root file or the mailbox of the configured user.
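Outside of the daily run, you can script the same pool check ad hoc. This is a sketch: the zpool output is faked with a variable so the logic is visible standalone; on a real system you would use status=$(zpool status -x), and could run the script from cron for more frequent alerting.

```shell
# Mail an alert only when `zpool status -x` reports a problem.
# Faked status for illustration; real use: status=$(zpool status -x)
status="all pools are healthy"
if [ "$status" = "all pools are healthy" ]; then
  echo "no alert needed"
else
  echo "$status" | mail -s "ZFS ALERT on $(hostname)" root
fi
# prints: no alert needed
```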
Important: For e-mail notifications to work, you must have a functioning MTA (we configured DMA above; sendmail is the historical default in the base system) and network connectivity.
Conclusion
FreeBSD's ZFS is a very good volume manager and file system with enterprise storage features. Use S.M.A.R.T. to monitor physical disk health and the ZFS tools (zpool, zfs) for ZFS-related information.
Sources
[1] Allan Jude. Monitoring ZFS: https://freebsdfoundation.org/wp-content/uploads/2017/12/Monitoring-ZFS.pdf
[2] RoboNuggie. SmartmonTools on FreeBSD 14.0: https://youtu.be/sIt-iFX9gss?si=02zkilNeZuzWhzIz
[3] FreeBSD Forum. ZFS Health and Status Monitoring: https://forums.freebsd.org/threads/zfs-health-and-status-monitoring.48376/