MinFS

MinFS is a simple, unix-like filesystem built for Zircon.

It currently supports files up to 4 GB in size.

Using MinFS

Host Device (QEMU Only)

  • Create a disk image that stores MinFS

    # (Linux)
    $ truncate --size=16G blk.bin
    # (Mac)
    $ mkfile -n 16g blk.bin
    
  • Execute the run zircon script on your platform with the '--' to pass arguments directly to QEMU and then use '-hda' to point to the file. If you wish to attach additional devices, you can supply them with '-hdb', '-hdc, and so on.

    fx set bringup.x64
    fx build
    fx qemu -- -hda blk.bin
    

Target Device (QEMU and Real Hardware)

BE CAREFUL NOT TO FORMAT THE WRONG DEVICE. If in doubt, only run the following commands through QEMU. The lsblk command can be used to see more information about the devices accessible from Zircon.

  • Within zircon, lsblk can be used to list the block devices currently on the system. On this example system below, /dev/class/block/000 is a raw block device.

    > lsblk
    ID  DEV      DRV      SIZE TYPE           LABEL
    000 block    block     16G
    
  • Let's add a GPT to this block device.

    > gpt init /dev/class/block/000
    ...
    > lsblk
    ID  DEV      DRV      SIZE TYPE           LABEL
    002 block    block     16G
    
  • Now that we have a GPT on this device, let's check what we can do with it. (NOTE: after manipulating the gpt, the device number may change. Use lsblk to keep track of how to refer to the block device).

    > gpt dump /dev/class/block/002
    blocksize=512 blocks=33554432
    Partition table is valid
    GPT contains usable blocks from 34 to 33554398 (inclusive)
    Total: 0 partitions
    
  • gpt dump tells us some important info: it tells us (1) How big blocks are, and (2) which blocks we can actually use. Let's fill part of the disk with a MinFS filesystem.

    > gpt add 34 20000000 minfs /dev/class/block/002
    
  • Within Zircon, format the partition as MinFS. Using lsblk you should see a block device, which is the whole disk, and a slightly smaller device, which is the partition. In the above output, the partition is device 003, and would have the path /dev/class/block/003

    > mkfs <PARTITION_PATH> minfs
    
  • If you want the device to be mounted automatically on reboot, use the GPT tool to set its type. As we did above, you must use lsblk again to locate the entry for the disk. We want to edit the type of the zero-th partition. Here we use the keyword 'fuchsia-data' to set the type GUID, but if you wanted to use an arbitrary GUID you would supply it where 'fuchsia-data' is used.

    > gpt edit 0 type fuchsia-data <DEVICE_PATH>
    
  • On any future boots, the partition will be mounted automatically at /data.

  • If you don't want the partition to be mounted automatically, you can update the visibility (or GUID) of the partition, and simply mount it manually.

    > mount <PARTITION_PATH> /data
    
  • Any files written to /data (the mount point for this GUID) will persist across boots. To test this, try making a file on the new MinFS volume, rebooting, and observing it still exists.

    > touch /data/foobar
    > dm reboot
    > ls /data
    
  • To find out which block device/file system is mounted at each subdirectory under a given path, use the following command:

    > df <PATH>
    

Minfs operations

The following section describes what IOs are performed to complete a simple end user operation like read()/write().

Assumptions

  • No operation, read or write, is cached or batched. Each of these operations are like calling with sync and direct io set.
  • For rename: The destination file does not exist. Rename can delete a file if the destination of the rename operation is a valid file. This assumption keeps the math simple.
  • The "Write" operation issues a single data block write to a previously unaccessed portion of the vnode.
  • The "Overwrite" operation issues a single data block write to a portion of the block that has previously been allocated from an earlier "Write" operation.

Keys to the columns.

  1. OPERATION: The action requested by a client of the filesystem.
  2. BLOCK TYPE: Each fileystem operation results in accessing one or more types of blocks.
    • Data: Contains user data and directory entries.
    • Indirect: Indirect block in file block map tree
    • Dindirect: Double indirect block in file block map tree.
    • Inode table: Inode table block that holds one or more inodes.
    • Inode bitmap: Contains an array of bits each representing free/used state of inodes.
    • Data bitmap: Contains an array of bits each representing free/used state of data blocks.
    • Superblock: Contains data describing layout and state of the filesystem.
  3. IO TYPE: What type, read/write, of IO access it is.
  4. JOURNALED: Whether the IO will be journaled. Reads are not journaled but some of the writes are journaled.
  5. CONDITIONALLY ACCESSED: Depending on the OPERATION's input parameter and state of the filesystem, some blocks are conditionally accessed.
    • No: IO is always performed.
    • Yes: Filesystem state and input parameters decide whether this IO is needed or not.
  6. READ COUNT: Number of filesystem blocks read.
  7. WRITE COUNT (IGNORING JOURNAL): Number of filesystem blocks written. Writing to journal or journaling overhead are not counted towards this number.
  8. WRITE COUNT (WITH JOURNAL): Number of filesystem blocks written to journal and then to the final location. This does not include the blocks journal writes to maintain the journal state.

A row <operation> Total, like "Create Total", gives the total number of blocks read/written. For operations involving journaling, the journal writes two more blocks, journal entry header and commit block, per operation. The number under write count for Total is the sum of WRITE COUNT (WITH JOURNALING) and journaling overhead, which is 2 blocks per operation.

Superblock, Inode table, Inode bitmap, Data bitmap, and a part of Journal are cached in memory while starting(mount/fsck) the filesystem. So, Read IOs are never issued for those BLOCK TYPES.

OPERATION BLOCK TYPE IO TYPE JOURNALED CONDITIONALLY ACCESSED READ COUNT WRITE COUNT(IGNORING JOURNAL) WRITE COUNT(WITH JOURNAL) COMMENTS
Lookup/Open Data Read No No >=1 0 0 If the directory is large, multiple blocks are read.
Indirect Read No Yes >=0 0 0 Lookup can be served by direct blocks. So indirect is optional.
DIndirect Read No Yes >=0 0 0 Lookup can be served by direct blocks. So dindirect is optional.
Lookup/Open Total >=1 0 0
Create Data Read No No >=1 0 0 Create involves lookup first for name collisions.
Indirect Read No Yes >=0 0 0
DIndirect Read No Yes >=0 0 0
Data Write Yes No 0 >=1 >=2
Indirect Write Yes Yes 0 >=0 >=0
DIndirect Write Yes Yes 0 >=0 >=0
Inode table Write Yes No 0 1 2 Inode for the new file.
Inode bitmap Write Yes No 0 1 2 Mark inode as allocated.
Data bitmap Write Yes No 0 >=0 >=0 Directory may grow to contain new directory entry.
Superblock Write Yes No 0 1 2 Among other things, allocated inode number changes.
Create Total >=1 >=4 >=10 Includes 2 blocks for journal entry.
Rename Data Read No No >=1 0 0 Rename involves a lookup in source directory.
Indirect Read No Yes >=0 0 0
DIndirect Read No Yes >=0 0 0
Data Write Yes No 0 >=1 >=2 Source directory entry.
Indirect Write Yes Yes 0 >=0 >=0
DIndirect Write Yes Yes 0 >=0 >=0
Inode table Write Yes No 0 1 2 To update source directory inode.
Data Read No No >=0 0 0 Rename involves a lookup in source directory.
Indirect Read No Yes >=0 0 0
DIndirect Read No Yes >=0 0 0
Data Write Yes Yes 0 >=0 >=0 Writing destination directory entry.
Indirect Write Yes Yes 0 >=0 >=0
DIndirect Write Yes Yes 0 >=0 >=0
Inode table Write Yes Yes 0 1 2 To update destination directory inode.
Inode table Write Yes No 0 1 2 Renamed file’s mtime.
Data bitmap Write Yes No 0 >=0 >=0 In case we allocated data, indirect or Dindirect block(s).
Superblock Write Yes No 0 1 2
Rename Total >=1 >=5 >=12 Includes 2 blocks for journal entry.
Read Data Read No No >=1 0 0
Indirect Read No Yes >=0 0 0
DIndirect Read No Yes >=0 0 0
Read Total >=1 0 0
Write Indirect Read No Yes >=0 0 0 Even if the write is not overwriting, we may share (D)indirect block with existing data. Leading to read modify write.
DIndirect Read No Yes >=0 0 0
Data Write No No 0 1 1
Indirect Write Yes Yes 0 >=0 >=0
DIndirect Write Yes Yes 0 >=0 >=0
Inode table Write Yes No 0 1 2 Inode's mtime update.
Data bitmap Write Yes No 0 1 2 For the allocated block.
Superblock Write Yes No 0 1 2 Change in number of allocated blocks.
Write Total >=0 >=4 >=9 Includes 2 blocks for journal entry.
Overwrite Data Read No Yes >=0 0 0 Read modify write.
Indirect Read No Yes >=0 0 0
DIndirect Read No Yes >=0 0 0
Data Write No No 0 1 1
Indirect Write Yes Yes 0 >=0 >=0
DIndirect Write Yes Yes 0 >=0 >=0
Inode table Write Yes No 0 1 2
Data bitmap Write Yes No 0 1 2 Write new allocation.
Data bitmap Write Yes No 0 >=0 >=0 Free old block. This block bit may belong to allocated block bitmap.
Superblock Write Yes No 0 1 2
Overwrite Total >=0 >=4 >=9 Includes 2 blocks for journal entry.