path: root/Documentation
diff options
authorLinus Torvalds <>2010-08-10 11:26:52 -0700
committerLinus Torvalds <>2010-08-10 11:26:52 -0700
commit5f248c9c251c60af3403902b26e08de43964ea0b (patch)
tree6d3328e72a7e4015a64017eb30be18095c6a3c64 /Documentation
parentf6cec0ae58c17522a7bc4e2f39dae19f199ab534 (diff)
parentdca332528bc69e05f67161e1ed59929633d5e63d (diff)
Merge branch 'for-linus' of git://
* 'for-linus' of git:// (96 commits) no need for list_for_each_entry_safe()/resetting with superblock list Fix sget() race with failing mount vfs: don't hold s_umount over close_bdev_exclusive() call sysv: do not mark superblock dirty on remount sysv: do not mark superblock dirty on mount btrfs: remove junk sb_dirt change BFS: clean up the superblock usage AFFS: wait for sb synchronization when needed AFFS: clean up dirty flag usage cifs: truncate fallout mbcache: fix shrinker function return value mbcache: Remove unused features add f_flags to struct statfs(64) pass a struct path to vfs_statfs update VFS documentation for method changes. All filesystems that need invalidate_inode_buffers() are doing that explicitly convert remaining ->clear_inode() to ->evict_inode() Make ->drop_inode() just return whether inode needs to be dropped fs/inode.c:clear_inode() is gone fs/inode.c:evict() doesn't care about delete vs. non-delete paths now ... Fix up trivial conflicts in fs/nilfs2/super.c
Diffstat (limited to 'Documentation')
2 files changed, 57 insertions, 10 deletions
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 96d4293607ec..bbcc15651a21 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -92,8 +92,8 @@ prototypes:
void (*destroy_inode)(struct inode *);
void (*dirty_inode) (struct inode *);
int (*write_inode) (struct inode *, int);
- void (*drop_inode) (struct inode *);
- void (*delete_inode) (struct inode *);
+ int (*drop_inode) (struct inode *);
+ void (*evict_inode) (struct inode *);
void (*put_super) (struct super_block *);
void (*write_super) (struct super_block *);
int (*sync_fs)(struct super_block *sb, int wait);
@@ -101,14 +101,13 @@ prototypes:
int (*unfreeze_fs) (struct super_block *);
int (*statfs) (struct dentry *, struct kstatfs *);
int (*remount_fs) (struct super_block *, int *, char *);
- void (*clear_inode) (struct inode *);
void (*umount_begin) (struct super_block *);
int (*show_options)(struct seq_file *, struct vfsmount *);
ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
locking rules:
- All may block.
+ All may block [not true, see below]
None have BKL
@@ -116,22 +115,25 @@ destroy_inode:
dirty_inode: (must not sleep)
drop_inode: !!!inode_lock!!!
put_super: write
write_super: read
sync_fs: read
freeze_fs: read
unfreeze_fs: read
-statfs: no
-remount_fs: maybe (see below)
+statfs: maybe(read) (see below)
+remount_fs: write
umount_begin: no
show_options: no (namespace_sem)
quota_read: no (see below)
quota_write: no (see below)
-->remount_fs() will have the s_umount exclusive lock if it's already mounted.
-When called from get_sb_single, it does NOT have the s_umount lock.
+->statfs() has s_umount (shared) when called by ustat(2) (native or
+compat), but that's an accident of bad API; s_umount is used to pin
+the superblock down when we only have dev_t given us by userland to
+identify the superblock. Everything else (statfs(), fstatfs(), etc.)
+doesn't hold it when calling ->statfs() - superblock is pinned down
+by resolving the pathname passed to syscall.
->quota_read() and ->quota_write() functions are both guaranteed to
be the only ones operating on the quota file by the quota code (via
dqio_sem) (unless an admin really wants to screw up something and
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index a7e9746ee7ea..b12c89538680 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -273,3 +273,48 @@ it's safe to remove it. If you don't need it, remove it.
deliberate; as soon as struct block_device * is propagated in a reasonable
way by that code fixing will become trivial; until then nothing can be
+ block truncatation on error exit from ->write_begin, and ->direct_IO
+moved from generic methods (block_write_begin, cont_write_begin,
+nobh_write_begin, blockdev_direct_IO*) to callers. Take a look at
+ext2_write_failed and callers for an example.
+ ->truncate is going away. The whole truncate sequence needs to be
+implemented in ->setattr, which is now mandatory for filesystems
+implementing on-disk size changes. Start with a copy of the old inode_setattr
+and vmtruncate, and the reorder the vmtruncate + foofs_vmtruncate sequence to
+be in order of zeroing blocks using block_truncate_page or similar helpers,
+size update and on finally on-disk truncation which should not fail.
+inode_change_ok now includes the size checks for ATTR_SIZE and must be called
+in the beginning of ->setattr unconditionally.
+ ->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
+be used instead. It gets called whenever the inode is evicted, whether it has
+remaining links or not. Caller does *not* evict the pagecache or inode-associated
+metadata buffers; getting rid of those is responsibility of method, as it had
+been for ->delete_inode().
+ ->drop_inode() returns int now; it's called on final iput() with inode_lock
+held and it returns true if filesystems wants the inode to be dropped. As before,
+generic_drop_inode() is still the default and it's been updated appropriately.
+generic_delete_inode() is also alive and it consists simply of return 1. Note that
+all actual eviction work is done by caller after ->drop_inode() returns.
+ clear_inode() is gone; use end_writeback() instead. As before, it must
+be called exactly once on each call of ->evict_inode() (as it used to be for
+each call of ->delete_inode()). Unlike before, if you are using inode-associated
+metadata buffers (i.e. mark_buffer_dirty_inode()), it's your responsibility to
+call invalidate_inode_buffers() before end_writeback().
+ No async writeback (and thus no calls of ->write_inode()) will happen
+after end_writeback() returns, so actions that should not overlap with ->write_inode()
+(e.g. freeing on-disk inode if i_nlink is 0) ought to be done after that call.
+ NOTE: checking i_nlink in the beginning of ->write_inode() and bailing out
+if it's zero is not *and* *never* *had* *been* enough. Final unlink() and iput()
+may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
+free the on-disk inode, you may end up doing that while ->write_inode() is writing
+to it.