NAME
9P - Simple Distributed File System
DESCRIPTION
9P is a protocol that implements a distributed file system. It provides primitives to manage (create, read, write and delete) sets of files remotely. These files do not necessarily need to be stored on a disk; they may, for example, be synthesised on demand from external sources.
A client transmits requests (T-messages) to a server, which returns replies (R-messages) to the client. The combined act of transmitting a request of a particular type and receiving its reply is called a transaction of that type.
Each message consists of a sequence of bytes, mostly grouped into unsigned integer fields of one, two, four or eight bytes transmitted in little-endian order (least significant byte first). Data items of larger or variable length are represented by a two-byte field specifying the length, followed by the actual data. The only exception to this rule is the qid, a thirteen-byte object that is sent as-is.
Text strings are represented by a two-byte count of bytes followed by the sequence of Unicode code points encoded in UTF-8. Text strings in 9P are not NUL-terminated: the NUL character is illegal in all text strings and is thus excluded from paths, user names and so on.
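The integer and string rules above can be sketched in C as follows. put_le, get_le and put_str are hypothetical helper names, and put_str assumes its argument is already valid UTF-8 (it does not check):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the low n bytes (n = 1, 2, 4 or 8) of v at p, least significant first. */
static void put_le(unsigned char *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Read an n-byte little-endian unsigned integer from p. */
static uint64_t get_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Encode the C string s as length[2] string[length], without the trailing
   NUL.  A C string cannot contain an embedded NUL, so the "no NUL" rule
   holds by construction; s is assumed to be valid UTF-8 already.
   Returns the bytes written, or 0 if s is too long or does not fit in cap. */
static size_t put_str(unsigned char *p, size_t cap, const char *s)
{
    size_t len = strlen(s);
    if (len > 0xFFFF || len + 2 > cap)
        return 0;
    put_le(p, len, 2);
    memcpy(p + 2, s, len);
    return len + 2;
}
```

Since even an empty string occupies two bytes on the wire, a return value of 0 from put_str can only mean failure.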
Fields are hereafter denoted as
type[1] tag[2] fid[4]
to indicate that type is one byte long, tag two and fid four. Strings are denoted as name[s] and are sent on the wire as
length[2] string[length]
A qid, described later, is a 13-byte value that is sent on the wire as
type[1] version[4] path[8]
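Because a qid is encoded field by field, a C implementation should not copy an in-memory struct to the wire directly: the struct has padding and the host may be big-endian. A sketch, with a hypothetical struct qid mirroring the three fields above:

```c
#include <stdint.h>

/* In-memory form of a qid: type[1] version[4] path[8]. */
struct qid {
    uint8_t  type;
    uint32_t version;
    uint64_t path;
};

static void put_le(unsigned char *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

static uint64_t get_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Serialise q into exactly 13 bytes at p, one field at a time. */
static void put_qid(unsigned char *p, const struct qid *q)
{
    put_le(p, q->type, 1);
    put_le(p + 1, q->version, 4);
    put_le(p + 5, q->path, 8);
}

/* Inverse of put_qid. */
static void get_qid(const unsigned char *p, struct qid *q)
{
    q->type    = (uint8_t)get_le(p, 1);
    q->version = (uint32_t)get_le(p + 1, 4);
    q->path    = get_le(p + 5, 8);
}
```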
MESSAGE STRUCTURE
Every message has a header with the following fields:
len[4] type[1] tag[2]
where len is the overall length of the message in bytes, including len itself; type is one byte indicating the type of the message; and tag is a number chosen by the client that uniquely identifies the request among its outstanding requests. An optional body follows, whose structure depends on the type of the message.
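Reading a message therefore starts by decoding these seven bytes. The sketch below parses the header and rejects impossible lengths; the check against msize assumes the maximum message size has already been negotiated with version, and struct hdr and parse_hdr are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

struct hdr {
    uint32_t len;   /* total message length, including these 4 bytes */
    uint8_t  type;  /* message type */
    uint16_t tag;   /* client-chosen request identifier */
};

enum { HDRSZ = 7 };  /* len[4] + type[1] + tag[2] */

/* Parse the header from the first n bytes of buf.  Returns 0 on success,
   or -1 if fewer than HDRSZ bytes are available or len is impossible
   (smaller than the header itself or larger than msize). */
static int parse_hdr(const unsigned char *buf, size_t n, uint32_t msize,
                     struct hdr *h)
{
    if (n < HDRSZ)
        return -1;
    h->len  = (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
              (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
    h->type = buf[4];
    h->tag  = (uint16_t)(buf[5] | buf[6] << 8);
    if (h->len < HDRSZ || h->len > msize)
        return -1;
    return 0;
}
```

A reader would typically read 4 bytes to learn len, then the remaining len - 4 bytes, and only then hand the whole message to the type-specific decoder.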
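The message types are as follows. For each type, the fields of the request (T-message) are listed before those of the response (R-message); the common header is omitted for brevity.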
version
- Negotiate the version and maximum message size.
request: msize[4] version[s]
response: msize[4] version[s]
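A Tversion message built from these fields, following the rules described below, might look like this sketch. The numeric type code 100 is taken from Plan 9's <fcall.h> rather than from this page, "9P2000" is only an example version string, and build_tversion and negotiate_msize are hypothetical helpers:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { Tversion = 100 };        /* Plan 9 <fcall.h> numbering (assumption) */
#define NOTAG ((uint16_t)~0)    /* 0xFFFF */

static void put_le(unsigned char *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Build len[4] type[1] tag[2] msize[4] version[s] into buf.
   Returns the message length, or 0 if it does not fit in cap bytes. */
static size_t build_tversion(unsigned char *buf, size_t cap, uint32_t msize,
                             const char *version)
{
    size_t vlen = strlen(version);
    size_t len = 4 + 1 + 2 + 4 + 2 + vlen;
    if (vlen > 0xFFFF || len > cap)
        return 0;
    put_le(buf, len, 4);
    buf[4] = Tversion;
    put_le(buf + 5, NOTAG, 2);      /* version always uses NOTAG */
    put_le(buf + 7, msize, 4);
    put_le(buf + 11, vlen, 2);
    memcpy(buf + 13, version, vlen);
    return len;
}

/* Server side: never reply with an msize larger than the client proposed. */
static uint32_t negotiate_msize(uint32_t proposed, uint32_t server_max)
{
    return proposed < server_max ? proposed : server_max;
}
```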
The version request must be the first message sent, and the client cannot issue further requests until it receives the Rversion reply. Its tag should be NOTAG ((uint16_t)~0, i.e. 0xFFFF). The client proposes an msize (the maximum size of a message) and the protocol version to use; the server replies with an msize smaller than or equal to the one proposed by the client. The version string must always begin with the two characters “9P”. If the server does not understand the version requested by the client, it should reply with an Rversion whose version string is “unknown”, not with an Rerror.
attach
- Populate the namespace
request: fid[4] afid[4] uname[s] aname[s]
response: qid[13]
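An attach without authentication passes the NOFID value, described below, as afid. A sketch of building the request, assuming the Plan 9 <fcall.h> type code 104 for Tattach; build_tattach is a hypothetical helper and "glenda" in the usage is only an example user name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { Tattach = 104 };         /* Plan 9 <fcall.h> numbering (assumption) */
#define NOFID ((uint32_t)~0)    /* "no authentication fid" */

static void put_le(unsigned char *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Append length[2] string[length]; the caller has checked the size. */
static size_t put_str(unsigned char *p, const char *s)
{
    size_t n = strlen(s);
    put_le(p, n, 2);
    memcpy(p + 2, s, n);
    return 2 + n;
}

/* Build len[4] type[1] tag[2] fid[4] afid[4] uname[s] aname[s].
   Returns the length, or 0 if it does not fit in cap bytes. */
static size_t build_tattach(unsigned char *buf, size_t cap, uint16_t tag,
                            uint32_t fid, uint32_t afid,
                            const char *uname, const char *aname)
{
    size_t ul = strlen(uname), al = strlen(aname);
    size_t len = 4 + 1 + 2 + 4 + 4 + (2 + ul) + (2 + al);
    if (ul > 0xFFFF || al > 0xFFFF || len > cap)
        return 0;
    unsigned char *p = buf + 4;
    *p++ = Tattach;
    put_le(p, tag, 2);  p += 2;
    put_le(p, fid, 4);  p += 4;
    put_le(p, afid, 4); p += 4;
    p += put_str(p, uname);
    p += put_str(p, aname);
    put_le(buf, (uint64_t)(p - buf), 4);   /* len includes itself */
    return (size_t)(p - buf);
}
```

For example, build_tattach(buf, sizeof buf, 1, 0, NOFID, "glenda", "") attaches fid 0 to the default tree without authentication.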
The attach message binds the given fid to the root of the file tree identified by aname. uname identifies the user, and afid specifies a fid previously established by an auth message, or the special NOFID value (defined as (uint32_t)~0) if authentication is not required.
clunk
- Close fids.
request: fid[4]
response: ⟨empty⟩
Once a fid has been clunked (closed), it becomes “free” and the same value can be reused in subsequent walk or attach requests. The file itself is not removed unless it was opened with the ORCLOSE flag.
error
- Return an error string.
request: ⟨none⟩
response: ename[s]
The Rerror message is used to return an error string describing the failure of a request. The tag indicates the failed request.
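A client receiving an Rerror can match it to the failed request through the tag. A sketch, assuming Plan 9's type code 107 for Rerror; parse_rerror is a hypothetical helper that copies ename into a NUL-terminated C string, which is safe because 9P strings never contain NUL:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { Rerror = 107 };  /* Plan 9 <fcall.h> numbering (assumption) */

/* Given a complete message msg of n bytes, copy the ename of an Rerror
   into out (NUL-terminated, truncated to outsz - 1 bytes) and return the
   tag of the failed request, or -1 if msg is not a well-formed Rerror. */
static long parse_rerror(const unsigned char *msg, size_t n,
                         char *out, size_t outsz)
{
    if (n < 9 || msg[4] != Rerror || outsz == 0)
        return -1;
    size_t slen = (size_t)msg[7] | (size_t)msg[8] << 8;
    if (9 + slen > n)
        return -1;                          /* ename runs past the message */
    size_t k = slen < outsz - 1 ? slen : outsz - 1;
    memcpy(out, msg + 9, k);
    out[k] = '\0';
    return (long)(msg[5] | msg[6] << 8);    /* tag of the failed request */
}
```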
Note that there is no Terror request, since errors are only ever reported by the server, and that a server cannot reply to a Tversion or a Tflush with an Rerror.
flush
- Abort an ongoing operation.
request: oldtag[2]
response: ⟨empty⟩
Given the asynchronous nature of the protocol, the server may respond to the pending request before responding to the Tflush, and a client may send multiple Tflush messages for the same operation. The client must wait until it receives the corresponding Rflush before reusing oldtag for subsequent messages. If a response for oldtag is received before the Rflush, the client must assume that the operation completed and honor that response, since it may reflect a change of state (a fid allocated, a file created, ...). If no response is received before the Rflush, the transaction is considered successfully cancelled. Note that the tag of the Tflush request and of the corresponding Rflush is not oldtag but a new tag value.
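The rules above mean a client has to track, per tag, whether a flushed request was answered before its Rflush arrived. A minimal sketch of that bookkeeping, assuming only a single Tflush per request and leaving the actual message I/O out; struct req, the state names and the on_* functions are all hypothetical:

```c
#include <stdbool.h>

/* Client-side state of one tag. */
enum req_state {
    REQ_FREE,       /* tag may be reused */
    REQ_PENDING,    /* request sent, no reply yet */
    REQ_FLUSHING,   /* Tflush sent, neither reply nor Rflush seen */
    REQ_ANSWERED    /* reply arrived while flushing; awaiting Rflush */
};

struct req {
    enum req_state state;
    bool cancelled;   /* true if the flush won: no reply was received */
};

/* The client sends a Tflush whose oldtag is this request's tag. */
static void on_flush_sent(struct req *r)
{
    if (r->state == REQ_PENDING)
        r->state = REQ_FLUSHING;
}

/* The reply for the original tag arrives.  It must be honoured even if
   a Tflush is outstanding, but the tag stays reserved until the Rflush. */
static void on_reply(struct req *r)
{
    if (r->state == REQ_FLUSHING)
        r->state = REQ_ANSWERED;
    else if (r->state == REQ_PENDING)
        r->state = REQ_FREE;
}

/* The Rflush answering our Tflush arrives: oldtag may now be reused. */
static void on_rflush(struct req *r)
{
    if (r->state == REQ_FLUSHING)
        r->cancelled = true;
    if (r->state == REQ_FLUSHING || r->state == REQ_ANSWERED)
        r->state = REQ_FREE;
}
```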
walk
- Traverse a file tree.
request: fid[4] newfid[4] nwname[2] nwname*(wname[s])
response: nwqid[2] nwqid*(qid[13])
The nwname components are walked in order starting from fid (which must refer to a directory) and, if the walk succeeds, newfid is associated with the file reached.
fid and newfid may be equal, in which case the fid is “mutated”; otherwise newfid must be unused. As a special case, a walk of zero components duplicates the fid.
If the first element cannot be walked for any reason, an Rerror is returned. Otherwise, an Rwalk is returned carrying one qid for each file successfully visited by the walk. A client can thus detect a partial walk when the returned nwqid is less than the nwname of the request. newfid is affected only when the walk succeeds completely. At most 16 components can be walked per request.
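A client resolving a longer path has to split it into components and, in one possible approach, issue one walk per group of at most 16, starting each subsequent walk from the newfid of the previous one. The sketch below shows the splitting, the number of walks needed, and the partial-walk test on the Rwalk counts; split_path, walk_batches and walk_complete are hypothetical helpers, not part of any library:

```c
#include <stddef.h>

enum { MAXWELEM = 16 };  /* maximum components per walk, as stated above */

/* Split path in place into components, skipping the empty ones produced
   by leading, trailing or repeated '/'.  Stores at most max pointers in
   elems and returns how many were stored. */
static size_t split_path(char *path, char **elems, size_t max)
{
    size_t n = 0;
    char *p = path;
    while (*p != '\0' && n < max) {
        if (*p == '/') {            /* skip separators */
            p++;
            continue;
        }
        elems[n++] = p;
        while (*p != '\0' && *p != '/')
            p++;
        if (*p == '/')
            *p++ = '\0';            /* terminate this component */
    }
    return n;
}

/* Number of Twalk messages needed for n components; zero components
   still take one walk (which duplicates the fid). */
static size_t walk_batches(size_t n)
{
    return n == 0 ? 1 : (n + MAXWELEM - 1) / MAXWELEM;
}

/* An Rwalk is complete only if it returned one qid per requested name. */
static int walk_complete(unsigned nwname, unsigned nwqid)
{
    return nwqid == nwname;
}
```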
open
- Prepare a fid for I/O.
request: fid[4] mode[1]
response: qid[13] iounit[4]
mode determines the type of I/O:
- 0x00 (OREAD) - Open the file for reading.
- 0x01 (OWRITE) - Open the file for writing.
- 0x02 (ORDWR) - Open the file for both reading and writing.
- 0x03 (OEXEC) - Open the file for execution.
Additionally, the following flags can be or'ed into mode:
- 0x10 (OTRUNC) - Truncate the file before opening it.
- 0x40 (ORCLOSE) - Remove the file upon clunk.
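The returned iounit may be zero; if it is not, it is the maximum number of bytes that can be read or written in a single message without breaking the transfer into multiple 9P messages.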
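Composing the mode byte is a bitwise or of one access mode with the optional flags. A sketch using the values listed above; open_mode and access_of are hypothetical helpers, and rejecting any flag other than OTRUNC and ORCLOSE is a simplification of this sketch, not a protocol rule:

```c
#include <stdint.h>

/* Access modes and flags as listed above. */
enum {
    OREAD   = 0x00,
    OWRITE  = 0x01,
    ORDWR   = 0x02,
    OEXEC   = 0x03,
    OTRUNC  = 0x10,
    ORCLOSE = 0x40,
};

/* Compose the mode[1] byte of an open request.  Returns -1 for an unknown
   access mode or for flags other than OTRUNC and ORCLOSE. */
static int open_mode(int access, int flags)
{
    if (access < OREAD || access > OEXEC)
        return -1;
    if (flags & ~(OTRUNC | ORCLOSE))
        return -1;
    return access | flags;
}

/* The four access modes fit in the low two bits, below both flags. */
static int access_of(uint8_t mode)
{
    return mode & 0x03;
}
```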
create
- Create a file
request: fid[4] name[s] perm[4] mode[1]
response: qid[13] iounit[4]
The call creates a file named name in the directory identified by fid, with permissions perm, and then opens it with mode; on success, fid refers to the new file.
It is illegal to use an already opened fid or to attempt to create the “.” or “..” entries.
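A client can reject names that create forbids before sending the request. A sketch; valid_create_name is hypothetical, and its rejection of empty names and of names containing “/” is an extra assumption of this sketch, based on create making a single directory entry:

```c
#include <stdbool.h>
#include <string.h>

/* True if name may be sent in a create request: not "." or "..", and
   (assumption) neither empty nor containing a '/' separator. */
static bool valid_create_name(const char *name)
{
    if (name[0] == '\0' || strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return false;
    return strchr(name, '/') == NULL;
}
```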
read
- Read data at offset
request: fid[4] offset[8] count[4]
response: count[4] data[count]
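Since a read may return fewer bytes than requested, reading a whole file means advancing offset by each returned count until a read returns zero. A sketch, assuming a hypothetical tread_fn callback that performs one Tread/Rread round trip on an opened fid (error handling omitted) and a non-zero per-message limit; memfile and mem_tread are an in-memory stand-in server used only for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport: one Tread at offset for count bytes; copies the
   Rread data into buf and returns its count. */
typedef uint32_t (*tread_fn)(void *ctx, uint64_t offset, uint32_t count,
                             unsigned char *buf);

/* Read from offset 0 into out until end-of-file or until cap bytes are
   read, asking for at most iounit bytes per message. */
static size_t read_all(tread_fn rd, void *ctx, uint32_t iounit,
                       unsigned char *out, size_t cap)
{
    size_t off = 0;
    if (iounit == 0)
        return 0;               /* caller must supply a real limit */
    while (off < cap) {
        size_t want = cap - off;
        if (want > iounit)
            want = iounit;
        uint32_t got = rd(ctx, off, (uint32_t)want, out + off);
        if (got == 0 || got > want)
            break;              /* end of file, or a bogus reply */
        off += got;             /* a short read is not an error */
    }
    return off;
}

/* Stand-in server returning at most 3 bytes per read. */
struct memfile { const unsigned char *data; size_t len; };

static uint32_t mem_tread(void *ctx, uint64_t offset, uint32_t count,
                          unsigned char *buf)
{
    const struct memfile *f = ctx;
    if (offset >= f->len)
        return 0;
    size_t n = f->len - (size_t)offset;
    if (n > count)
        n = count;
    if (n > 3)
        n = 3;
    memcpy(buf, f->data + offset, n);
    return (uint32_t)n;
}
```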
fid must have been prepared for I/O with a previous open or create call. The returned count may be less than the amount requested, and is zero at end-of-file. Directories are read as a stream of stat structures, as described in stat; for a directory, the offset of each read must be zero or the offset of the previous read on that directory plus the number of bytes it returned. Thus, it is not possible to seek within a directory except to rewind it.
write
- Write data at offset
request: fid[4] offset[8] count[4] data[count]
response: count[4]
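Because the returned count may be smaller than the count sent, a client writing a buffer has to loop on the remainder. A sketch, assuming a hypothetical twrite_fn callback that performs one Twrite/Rwrite round trip on an opened fid (error handling omitted) and a non-zero per-message limit; memsink and mem_twrite are an in-memory stand-in server used only for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport: one Twrite of count bytes at offset; returns the
   count from the Rwrite. */
typedef uint32_t (*twrite_fn)(void *ctx, uint64_t offset,
                              const unsigned char *data, uint32_t count);

/* Write len bytes at offset, at most iounit bytes per message, retrying
   the remainder after short writes.  Returns the bytes written. */
static size_t write_all(twrite_fn wr, void *ctx, uint64_t offset,
                        uint32_t iounit, const unsigned char *data, size_t len)
{
    size_t done = 0;
    if (iounit == 0)
        return 0;                 /* caller must supply a real limit */
    while (done < len) {
        size_t chunk = len - done;
        if (chunk > iounit)
            chunk = iounit;
        uint32_t n = wr(ctx, offset + done, data + done, (uint32_t)chunk);
        if (n == 0 || n > chunk)
            break;                /* no progress or bogus reply: give up */
        done += n;
    }
    return done;
}

/* Stand-in server accepting at most 4 bytes per write. */
struct memsink { unsigned char buf[64]; size_t len; };

static uint32_t mem_twrite(void *ctx, uint64_t offset,
                           const unsigned char *data, uint32_t count)
{
    struct memsink *s = ctx;
    if (offset >= sizeof s->buf)
        return 0;
    size_t n = count < 4 ? count : 4;
    if (n > sizeof s->buf - (size_t)offset)
        n = sizeof s->buf - (size_t)offset;
    memcpy(s->buf + offset, data, n);
    if ((size_t)offset + n > s->len)
        s->len = (size_t)offset + n;
    return (uint32_t)n;
}
```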
fid must have been prepared for I/O with a previous open or create call. The returned count is the amount of data actually written and may be less than the amount requested.
stat
- Get file status
request: fid[4]
response: stat[n]
The stat structure consists of the following fields:
- size[2] - total byte count of the following data
- type[2] - for kernel use
- dev[4] - for kernel use
- qid[13] - unique identifier of the file on the server
- mode[4] - permissions and flags
- atime[4] - last access time
- mtime[4] - last modification time
- length[8] - length of the file in bytes
- name[s] - file name (must be “/” if the file is the root directory of the server)
- uid[s] - owner name
- gid[s] - group name
- muid[s] - name of the user who last modified the file
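The fixed-size fields place the length of name at byte offset 41 of an entry and its bytes at offset 43, so a parser can pick fields out by offset and walk a directory read buffer entry by entry using size. A sketch that extracts only mode, length and name; struct dirent9, parse_stat and count_entries are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t get_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* A few fields of one stat entry.  name points into the buffer and is
   not NUL-terminated, exactly as on the wire. */
struct dirent9 {
    uint32_t mode;
    uint64_t length;
    const unsigned char *name;
    uint16_t namelen;
};

enum {
    STAT_FIXED = 2 + 4 + 13 + 4 + 4 + 4 + 8,  /* type..length: 39 bytes */
    STAT_MIN   = STAT_FIXED + 4 * 2           /* plus four empty strings */
};

/* Parse the entry at p with n bytes available.  Returns the bytes the
   entry occupies (size[2] plus size), or 0 if truncated or malformed. */
static size_t parse_stat(const unsigned char *p, size_t n, struct dirent9 *d)
{
    if (n < 2)
        return 0;
    size_t size = (size_t)get_le(p, 2);
    if (size < STAT_MIN || size + 2 > n)
        return 0;
    d->mode    = (uint32_t)get_le(p + 21, 4);  /* after size, type, dev, qid */
    d->length  = get_le(p + 33, 8);            /* after mode, atime, mtime */
    d->namelen = (uint16_t)get_le(p + 41, 2);
    if (43 + (size_t)d->namelen > size + 2)
        return 0;
    d->name = p + 43;
    return size + 2;
}

/* Count the entries in a buffer returned by a directory read, which
   contains whole entries back to back. */
static size_t count_entries(const unsigned char *buf, size_t n)
{
    size_t count = 0, off = 0;
    struct dirent9 d;
    while (off < n) {
        size_t used = parse_stat(buf + off, n - off, &d);
        if (used == 0)
            break;
        off += used;
        count++;
    }
    return count;
}
```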
Note that the size field is always present, even in the wstat call. While it may be considered redundant, it is kept to simplify parsing the stat entries of a directory.
wstat
- Change file attributes
request: fid[4] stat[n]
response: ⟨empty⟩
fid must have been prepared for writing with a previous open or create call. The stat structure is the same as described in stat, and reflects the changes the client wishes to make to the file identified by fid. To leave a field unchanged, use an empty string for text fields or the maximum value for integer fields. For example, to leave the permissions unchanged, set mode to 0xFFFFFFFF, i.e. (uint32_t)-1.
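A wstat that changes only the permissions can start from a “don't touch” entry, with every integer field all ones and every string empty (similar in spirit to Plan 9's nulldir function), and then set mode. A sketch; wstat_chmod and NULLSTAT_LEN are hypothetical names, and newmode is assumed to already carry any non-permission bits of the file's current mode:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* size[2] + 39 bytes of fixed fields + four empty strings = 49 bytes. */
enum { NULLSTAT_LEN = 2 + 39 + 4 * 2 };

static void put_le(unsigned char *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Write into buf (at least NULLSTAT_LEN bytes) a stat entry that changes
   only mode: all other integer fields, qid included, are all ones, and
   every string is empty. */
static size_t wstat_chmod(unsigned char *buf, uint32_t newmode)
{
    memset(buf, 0xFF, NULLSTAT_LEN);    /* "don't touch" integers */
    put_le(buf, NULLSTAT_LEN - 2, 2);   /* size does not count itself */
    put_le(buf + 21, newmode, 4);       /* mode follows size, type, dev, qid */
    memset(buf + 41, 0, 4 * 2);         /* name, uid, gid, muid: empty */
    return NULLSTAT_LEN;
}
```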
remove
- Remove and clunk fid
request: fid[4]
response: ⟨empty⟩
After a remove call, the fid is clunked even if an error is returned.