Issue
The Symptoms
- File copies fail for large files, typically files larger than 10 GB. The failure is not known to occur with small files.
- The problem occurs mostly when the system is busy, that is, when system load is unusually high.
- As Share Disk usage increases, the problem becomes more likely.
- The problem is unpredictable.
- A significant drop in transfer speed precedes the failure. The file copy then fails.
- Windows Explorer reports the error in the pop-up alert pictured here:
The Root Cause
This issue might be caused by a defect in the GFS2 file system. A clustered file system uses a file locking mechanism to coordinate access from multiple nodes at the same time. However, GFS2 file locking has known issues in the following cases:
- listing entries in a folder while it is being written to
- writing and deleting large files
- supporting large Share Disk (file system) capacities. Note that each Share Disk on the Vess R2000 has its own file system.
- allocating memory resources while writing large files (files larger than about 10 GB)
2.3 Block Allocation Issues
This section provides a summary of issues related to block allocation in GFS2 file systems. Even though applications that only write data typically do not care how or where a block is allocated, a little knowledge about how block allocation works can help you optimize performance.

2.3.1 Leave Free Space in the File System
When a GFS2 file system is nearly full, the block allocator starts to have a difficult time finding space for new blocks to be allocated. As a result, blocks given out by the allocator tend to be squeezed into the end of a resource group or in tiny slices where file fragmentation is much more likely. This file fragmentation can cause performance problems. In addition, when a GFS2 is nearly full, the GFS2 block allocator spends more time searching through multiple resource groups, and that adds lock contention that would not necessarily be there on a file system that has ample free space. This also can cause performance problems.
For these reasons, it is recommended that you not run a file system that is more than 85 percent full, although this figure may vary depending on workload.

2.3.3 Preallocate, If Possible
If files are preallocated, block allocations can be avoided altogether and the file system can run more efficiently. Newer versions of GFS2 include the fallocate(1) system call, which you can use to preallocate blocks of data.
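The 85 percent guideline and the preallocation advice quoted above can be illustrated with a minimal Python sketch. It is not part of the Vess R2000 firmware or tooling. The function names and the threshold constant are hypothetical, and `os.posix_fallocate` (available on Unix only) is used here as a stand-in for the preallocation mechanism the excerpt mentions. It assumes the script runs on a host where the file system is mounted locally.

```python
import os
import shutil

# Assumption: the 85 percent figure from the excerpt; it may vary with workload.
FULL_THRESHOLD = 0.85


def usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the file system holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def is_too_full(path, threshold=FULL_THRESHOLD):
    """Return True when the file system holding path is fuller than threshold."""
    return usage_fraction(path) > threshold


def preallocate(path, size_bytes):
    """Reserve size_bytes for path up front so later writes need no new blocks."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # Unix only: ask the file system to allocate the whole range now.
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)
```

A caller might check `is_too_full(mount_point)` before starting a large write and call `preallocate(destination, expected_size)` when the final file size is known in advance.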
Validation
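To replicate the problem, create several large Share Disks and fill them to 80% of capacity or more. The rate of occurrence rises as the usage percentage rises. In the result tables, each mark is the result of one 30 GB copy (CP 30GB): 1 indicates a copy that completed and X indicates a copy that failed.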
1. ≒ 80%
Size   Usage   CP 30GB results (up to 7 copies, in order)
8.0T   80%     1 1 1
8.0T   79%     1 1 1 1 1
7.0T   80%     1 1 1 1 1 1
23T    79%     1 1 1 1 1 1
23T    74%     X X X
2. ≒ 85%
Size   Usage   CP 30GB results (up to 7 copies, in order)
8.0T   84%     1 1 1
8.0T   84%     1 1 1 1 1
7.0T   85%     1 1 1 1 1 1
23T    83%     1 1 1 1 1
23T    84%     X X X X
3. ≒ 90%
Size   Usage   CP 30GB results (up to 7 copies, in order)
8.0T   89%     X X X
8.0T   89%     1 1 1 1 1
7.0T   91%     1 1 1 1 1 1
23T    88%     1 1 1 1 1
23T    89%     X X X X
4. ≒ 95%
Size   Usage   CP 30GB results (up to 7 copies, in order)
8.0T   95%     X 1 X
8.0T   95%     X X 1 X X X
7.0T   95%     1 1 1 1 1 1 1
23T    94%     X X 1 X X
23T    95%     X X X
- The rate of occurrence increases with file size (for files over 10 GB).
- The rate of occurrence increases as available capacity decreases.
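The validation procedure can be approximated with a short script. The following is a minimal Python sketch, not the tool used to produce the tables: the function name and file naming are hypothetical, and it assumes the Share Disk is reachable as an ordinary directory path (for example a mapped drive or mount point). It records "1" for a copy that completes and "X" for a copy that raises an error, on the assumption that the marks in the tables carry that meaning.

```python
import shutil
from pathlib import Path


def run_copy_test(source, target_dir, copies):
    """Copy source into target_dir `copies` times; return one mark per attempt."""
    source = Path(source)
    results = []
    for attempt in range(copies):
        destination = Path(target_dir) / f"copy_{attempt}_{source.name}"
        try:
            shutil.copyfile(source, destination)
            results.append("1")  # copy completed
        except OSError:
            results.append("X")  # copy failed
    return results
```

Running the function against a 30 GB source file and a Share Disk filled to a known usage level yields one row of marks comparable to a row in the tables.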
Comparison Test on XFS
To help determine if the problem is caused by a defect in the GFS2 file system, the same test was run using the XFS file system. The results of the comparison test are listed in the tables below.
1. ≒ 80%
Size   Usage   CP 30GB results (up to 6 copies, in order)
8.0T   79%     1 1 1
8.0T   79%     1 1 1 1 1
7.0T   79%     1 1 1 1 1 1
23T    78%     1 1 1 1 1
23T    78%     1 1 1
2. ≒ 85%
Size   Usage   CP 30GB results (up to 6 copies, in order)
8.0T   84%     1 1 1
8.0T   85%     1 1 1 1 1
7.0T   86%     1 1 1 1 1 1
23T    84%     1 1 1 1 1
23T    84%     1 1 1
3. ≒ 90%
Size   Usage   CP 30GB results (up to 6 copies, in order)
8.0T   90%     1 1 1
8.0T   90%     1 1 1 1 1
7.0T   90%     1 1 1 1 1 1
23T    90%     1 1 1 1 1
23T    89%     1 1 1
4. ≒ 95%
Size   Usage   CP 30GB results (up to 6 copies, in order)
8.0T   96%     1 1 1
8.0T   97%     1 1 1 1 1
7.0T   98%     1 1 1 1 1 1
23T    95%     1 1 1 1 1
23T    95%     1 1 1
Short Term Workaround
- Keep at least 15% of the Share Disk capacity free.
- Avoid copying more than 3 large (> 10 GB) files at the same time.
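Where copies are scripted rather than done by hand, the two workaround rules above can be enforced in code. The following is a minimal Python sketch under stated assumptions: the function names and constants are hypothetical, the Share Disk is assumed to be reachable as an ordinary directory path, and for simplicity the limit of 3 applies to all copies in the batch, not only to files over 10 GB.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MIN_FREE_FRACTION = 0.15   # keep at least 15% of capacity free
MAX_PARALLEL_COPIES = 3    # never run more than 3 copies at once


def has_enough_free_space(target_dir, incoming_bytes,
                          min_free_fraction=MIN_FREE_FRACTION):
    """Return True if the free-space rule still holds after the copy."""
    usage = shutil.disk_usage(target_dir)
    free_after = usage.free - incoming_bytes
    return free_after >= min_free_fraction * usage.total


def copy_files(sources, target_dir, min_free_fraction=MIN_FREE_FRACTION):
    """Copy sources into target_dir, at most MAX_PARALLEL_COPIES at a time."""
    sources = [Path(s) for s in sources]
    target = Path(target_dir)
    incoming = sum(s.stat().st_size for s in sources)
    if not has_enough_free_space(target, incoming, min_free_fraction):
        raise RuntimeError("copy would leave too little free space")
    # The pool size caps the number of simultaneous copies.
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_COPIES) as pool:
        futures = [pool.submit(shutil.copyfile, s, target / s.name)
                   for s in sources]
        return [future.result() for future in futures]
```

The free-space check runs once before the batch starts, so it does not account for other clients writing to the same Share Disk during the copy.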
Preferred Solution
- Use the XFS file system in place of GFS2, starting with the next FW version (SR2.4).
- Migrate data from the GFS2 file system to the XFS file system if needed. Steps:
1. Upgrade the FW to SR2.4.
2. Create a new Share Disk.
3. Configure Share Disk Clone from the old Share Disk to the new one.
4. Delete the old Share Disk to free its capacity.
5. Repeat steps 2 to 4 for each remaining Share Disk.