2020-03-04 16:55:40 +01:00
|
|
|
|
TAR(5) BSD File Formats Manual TAR(5)
|
|
|
|
|
|
|
|
|
|
NAME
|
2021-12-09 12:22:14 +01:00
|
|
|
|
tar — format of tape archive files
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
DESCRIPTION
|
|
|
|
|
The tar archive format collects any number of files, directories, and
|
|
|
|
|
other file system objects (symbolic links, device nodes, etc.) into a
|
|
|
|
|
single stream of bytes. The format was originally designed to be used
|
|
|
|
|
with tape drives that operate with fixed-size blocks, but is widely used
|
|
|
|
|
as a general packaging mechanism.
|
|
|
|
|
|
|
|
|
|
General Format
|
|
|
|
|
A tar archive consists of a series of 512-byte records. Each file system
|
|
|
|
|
object requires a header record which stores basic metadata (pathname,
|
|
|
|
|
owner, permissions, etc.) and zero or more records containing any file
|
2021-12-09 12:22:14 +01:00
|
|
|
|
data. The end of the archive is indicated by two records consisting en‐
|
|
|
|
|
tirely of zero bytes.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
For compatibility with tape drives that use fixed block sizes, programs
|
|
|
|
|
that read or write tar files always read or write a fixed number of
|
2021-12-09 12:22:14 +01:00
|
|
|
|
records with each I/O operation. These “blocks” are always a multiple of
|
|
|
|
|
the record size. The maximum block size supported by early implementa‐
|
|
|
|
|
tions was 10240 bytes or 20 records. This is still the default for most
|
|
|
|
|
implementations although block sizes of 1MiB (2048 records) or larger are
|
|
|
|
|
commonly used with modern high-speed tape drives. (Note: the terms
|
|
|
|
|
“block” and “record” here are not entirely standard; this document fol‐
|
|
|
|
|
lows the convention established by John Gilmore in documenting pdtar.)
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
Old-Style Archive Format
|
|
|
|
|
The original tar archive format has been extended many times to include
|
|
|
|
|
additional information that various implementors found necessary. This
|
|
|
|
|
section describes the variant implemented by the tar command included in
|
|
|
|
|
Version 7 AT&T UNIX, which seems to be the earliest widely-used version
|
|
|
|
|
of the tar program.
|
|
|
|
|
|
|
|
|
|
The header record for an old-style tar archive consists of the following:
|
|
|
|
|
|
|
|
|
|
struct header_old_tar {
|
|
|
|
|
char name[100];
|
|
|
|
|
char mode[8];
|
|
|
|
|
char uid[8];
|
|
|
|
|
char gid[8];
|
|
|
|
|
char size[12];
|
|
|
|
|
char mtime[12];
|
|
|
|
|
char checksum[8];
|
|
|
|
|
char linkflag[1];
|
|
|
|
|
char linkname[100];
|
|
|
|
|
char pad[255];
|
|
|
|
|
};
|
|
|
|
|
All unused bytes in the header record are filled with nulls.
|
|
|
|
|
|
2021-12-09 12:22:14 +01:00
|
|
|
|
name Pathname, stored as a null-terminated string. Early tar imple‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
mentations only stored regular files (including hardlinks to
|
|
|
|
|
those files). One common early convention used a trailing "/"
|
2021-12-09 12:22:14 +01:00
|
|
|
|
character to indicate a directory name, allowing directory per‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
missions and owner information to be archived and restored.
|
|
|
|
|
|
|
|
|
|
mode File mode, stored as an octal number in ASCII.
|
|
|
|
|
|
|
|
|
|
uid, gid
|
|
|
|
|
User id and group id of owner, as octal numbers in ASCII.
|
|
|
|
|
|
|
|
|
|
size Size of file, as octal number in ASCII. For regular files only,
|
|
|
|
|
this indicates the amount of data that follows the header. In
|
|
|
|
|
particular, this field was ignored by early tar implementations
|
|
|
|
|
when extracting hardlinks. Modern writers should always store a
|
|
|
|
|
zero length for hardlink entries.
|
|
|
|
|
|
2021-12-09 12:22:14 +01:00
|
|
|
|
mtime Modification time of file, as an octal number in ASCII. This in‐
|
|
|
|
|
dicates the number of seconds since the start of the epoch,
|
2020-03-04 16:55:40 +01:00
|
|
|
|
00:00:00 UTC January 1, 1970. Note that negative values should
|
|
|
|
|
be avoided here, as they are handled inconsistently.
|
|
|
|
|
|
|
|
|
|
checksum
|
|
|
|
|
Header checksum, stored as an octal number in ASCII. To compute
|
|
|
|
|
the checksum, set the checksum field to all spaces, then sum all
|
|
|
|
|
bytes in the header using unsigned arithmetic. This field should
|
|
|
|
|
be stored as six octal digits followed by a null and a space
|
|
|
|
|
character. Note that many early implementations of tar used
|
2021-12-09 12:22:14 +01:00
|
|
|
|
signed arithmetic for the checksum field, which can cause inter‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
operability problems when transferring archives between systems.
|
|
|
|
|
Modern robust readers compute the checksum both ways and accept
|
|
|
|
|
the header if either computation matches.
|
|
|
|
|
|
|
|
|
|
linkflag, linkname
|
|
|
|
|
In order to preserve hardlinks and conserve tape, a file with
|
|
|
|
|
multiple links is only written to the archive the first time it
|
|
|
|
|
is encountered. The next time it is encountered, the linkflag is
|
2021-12-09 12:22:14 +01:00
|
|
|
|
set to an ASCII ‘1’ and the linkname field holds the first name
|
2020-03-04 16:55:40 +01:00
|
|
|
|
under which this file appears. (Note that regular files have a
|
|
|
|
|
null value in the linkflag field.)
|
|
|
|
|
|
|
|
|
|
Early tar implementations varied in how they terminated these fields.
|
|
|
|
|
The tar command in Version 7 AT&T UNIX used the following conventions
|
|
|
|
|
(this is also documented in early BSD manpages): the pathname must be
|
|
|
|
|
null-terminated; the mode, uid, and gid fields must end in a space and a
|
|
|
|
|
null byte; the size and mtime fields must end in a space; the checksum is
|
2021-12-09 12:22:14 +01:00
|
|
|
|
terminated by a null and a space. Early implementations filled the nu‐
|
|
|
|
|
meric fields with leading spaces. This seems to have been common prac‐
|
|
|
|
|
tice until the IEEE Std 1003.1-1988 (“POSIX.1”) standard was released.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
For best portability, modern implementations should fill the numeric
|
|
|
|
|
fields with leading zeros.
|
|
|
|
|
|
|
|
|
|
Pre-POSIX Archives
|
2021-12-09 12:22:14 +01:00
|
|
|
|
An early draft of IEEE Std 1003.1-1988 (“POSIX.1”) served as the basis
|
2020-03-04 16:55:40 +01:00
|
|
|
|
for John Gilmore's pdtar program and many system implementations from the
|
|
|
|
|
late 1980s and early 1990s. These archives generally follow the POSIX
|
|
|
|
|
ustar format described below with the following variations:
|
2021-12-09 12:22:14 +01:00
|
|
|
|
• The magic value consists of the five characters “ustar” followed
|
|
|
|
|
by a space. The version field contains a space character fol‐
|
|
|
|
|
lowed by a null.
|
|
|
|
|
• The numeric fields are generally filled with leading spaces (not
|
2020-03-04 16:55:40 +01:00
|
|
|
|
leading zeros as recommended in the final standard).
|
2021-12-09 12:22:14 +01:00
|
|
|
|
• The prefix field is often not used, limiting pathnames to the 100
|
2020-03-04 16:55:40 +01:00
|
|
|
|
characters of old-style archives.
|
|
|
|
|
|
|
|
|
|
POSIX ustar Archives
|
2021-12-09 12:22:14 +01:00
|
|
|
|
IEEE Std 1003.1-1988 (“POSIX.1”) defined a standard tar file format to be
|
|
|
|
|
read and written by compliant implementations of tar(1). This format is
|
|
|
|
|
often called the “ustar” format, after the magic value used in the
|
|
|
|
|
header. (The name is an acronym for “Unix Standard TAR”.) It extends
|
2020-03-04 16:55:40 +01:00
|
|
|
|
the historic format with new fields:
|
|
|
|
|
|
|
|
|
|
struct header_posix_ustar {
|
|
|
|
|
char name[100];
|
|
|
|
|
char mode[8];
|
|
|
|
|
char uid[8];
|
|
|
|
|
char gid[8];
|
|
|
|
|
char size[12];
|
|
|
|
|
char mtime[12];
|
|
|
|
|
char checksum[8];
|
|
|
|
|
char typeflag[1];
|
|
|
|
|
char linkname[100];
|
|
|
|
|
char magic[6];
|
|
|
|
|
char version[2];
|
|
|
|
|
char uname[32];
|
|
|
|
|
char gname[32];
|
|
|
|
|
char devmajor[8];
|
|
|
|
|
char devminor[8];
|
|
|
|
|
char prefix[155];
|
|
|
|
|
char pad[12];
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
typeflag
|
|
|
|
|
Type of entry. POSIX extended the earlier linkflag field with
|
|
|
|
|
several new type values:
|
2021-12-09 12:22:14 +01:00
|
|
|
|
“0” Regular file. NUL should be treated as a synonym, for
|
2020-03-04 16:55:40 +01:00
|
|
|
|
compatibility purposes.
|
2021-12-09 12:22:14 +01:00
|
|
|
|
“1” Hard link.
|
|
|
|
|
“2” Symbolic link.
|
|
|
|
|
“3” Character device node.
|
|
|
|
|
“4” Block device node.
|
|
|
|
|
“5” Directory.
|
|
|
|
|
“6” FIFO node.
|
|
|
|
|
“7” Reserved.
|
|
|
|
|
Other A POSIX-compliant implementation must treat any unrecog‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
nized typeflag value as a regular file. In particular,
|
2021-12-09 12:22:14 +01:00
|
|
|
|
writers should ensure that all entries have a valid file‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
name so that they can be restored by readers that do not
|
|
|
|
|
support the corresponding extension. Uppercase letters
|
|
|
|
|
"A" through "Z" are reserved for custom extensions. Note
|
|
|
|
|
that sockets and whiteout entries are not archivable.
|
2021-12-09 12:22:14 +01:00
|
|
|
|
It is worth noting that the size field, in particular, has dif‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
ferent meanings depending on the type. For regular files, of
|
|
|
|
|
course, it indicates the amount of data following the header.
|
|
|
|
|
For directories, it may be used to indicate the total size of all
|
2021-12-09 12:22:14 +01:00
|
|
|
|
files in the directory, for use by operating systems that pre-al‐
|
|
|
|
|
locate directory space. For all other types, it should be set to
|
|
|
|
|
zero by writers and ignored by readers.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
2021-12-09 12:22:14 +01:00
|
|
|
|
magic Contains the magic value “ustar” followed by a NUL byte to indi‐
|
|
|
|
|
cate that this is a POSIX standard archive. Full compliance re‐
|
|
|
|
|
quires the uname and gname fields be properly set.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
version
|
2021-12-09 12:22:14 +01:00
|
|
|
|
Version. This should be “00” (two copies of the ASCII digit
|
2020-03-04 16:55:40 +01:00
|
|
|
|
zero) for POSIX standard archives.
|
|
|
|
|
|
|
|
|
|
uname, gname
|
|
|
|
|
User and group names, as null-terminated ASCII strings. These
|
|
|
|
|
should be used in preference to the uid/gid values when they are
|
|
|
|
|
set and the corresponding names exist on the system.
|
|
|
|
|
|
|
|
|
|
devmajor, devminor
|
2021-12-09 12:22:14 +01:00
|
|
|
|
Major and minor numbers for character device or block device en‐
|
|
|
|
|
try.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
name, prefix
|
|
|
|
|
If the pathname is too long to fit in the 100 bytes provided by
|
|
|
|
|
the standard format, it can be split at any / character with the
|
|
|
|
|
first portion going into the prefix field. If the prefix field
|
|
|
|
|
is not empty, the reader will prepend the prefix value and a /
|
|
|
|
|
character to the regular name field to obtain the full pathname.
|
|
|
|
|
The standard does not require a trailing / character on directory
|
2021-12-09 12:22:14 +01:00
|
|
|
|
names, though most implementations still include this for compat‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
ibility reasons.
|
|
|
|
|
|
|
|
|
|
Note that all unused bytes must be set to NUL.
|
|
|
|
|
|
2021-12-09 12:22:14 +01:00
|
|
|
|
Field termination is specified slightly differently by POSIX than by pre‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
vious implementations. The magic, uname, and gname fields must have a
|
|
|
|
|
trailing NUL. The pathname, linkname, and prefix fields must have a
|
|
|
|
|
trailing NUL unless they fill the entire field. (In particular, it is
|
|
|
|
|
possible to store a 256-character pathname if it happens to have a / as
|
|
|
|
|
the 156th character.) POSIX requires numeric fields to be zero-padded in
|
|
|
|
|
the front, and requires them to be terminated with either space or NUL
|
|
|
|
|
characters.
|
|
|
|
|
|
2021-12-09 12:22:14 +01:00
|
|
|
|
Currently, most tar implementations comply with the ustar format, occa‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
sionally extending it by adding new fields to the blank area at the end
|
|
|
|
|
of the header record.
|
|
|
|
|
|
|
|
|
|
Numeric Extensions
|
|
|
|
|
There have been several attempts to extend the range of sizes or times
|
|
|
|
|
supported by modifying how numbers are stored in the header.
|
|
|
|
|
|
|
|
|
|
One obvious extension to increase the size of files is to eliminate the
|
|
|
|
|
terminating characters from the various numeric fields. For example, the
|
|
|
|
|
standard only allows the size field to contain 11 octal digits, reserving
|
|
|
|
|
the twelfth byte for a trailing NUL character. Allowing 12 octal digits
|
|
|
|
|
allows file sizes up to 64 GB.
|
|
|
|
|
|
2021-12-09 12:22:14 +01:00
|
|
|
|
Another extension, utilized by GNU tar, star, and other newer tar imple‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
mentations, permits binary numbers in the standard numeric fields. This
|
|
|
|
|
is flagged by setting the high bit of the first byte. The remainder of
|
|
|
|
|
the field is treated as a signed twos-complement value. This permits
|
|
|
|
|
95-bit values for the length and time fields and 63-bit values for the
|
|
|
|
|
uid, gid, and device numbers. In particular, this provides a consistent
|
|
|
|
|
way to handle negative time values. GNU tar supports this extension for
|
2021-12-09 12:22:14 +01:00
|
|
|
|
the length, mtime, ctime, and atime fields. Joerg Schilling's star pro‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
gram and the libarchive library support this extension for all numeric
|
|
|
|
|
fields. Note that this extension is largely obsoleted by the extended
|
|
|
|
|
attribute record provided by the pax interchange format.
|
|
|
|
|
|
|
|
|
|
Another early GNU extension allowed base-64 values rather than octal.
|
2021-12-09 12:22:14 +01:00
|
|
|
|
This extension was short-lived and is no longer supported by any imple‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
mentation.
|
|
|
|
|
|
|
|
|
|
Pax Interchange Format
|
|
|
|
|
There are many attributes that cannot be portably stored in a POSIX ustar
|
2021-12-09 12:22:14 +01:00
|
|
|
|
archive. IEEE Std 1003.1-2001 (“POSIX.1”) defined a “pax interchange
|
|
|
|
|
format” that uses two new types of entries to hold text-formatted meta‐
|
|
|
|
|
data that applies to following entries. Note that a pax interchange for‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
mat archive is a ustar archive in every respect. The new data is stored
|
2021-12-09 12:22:14 +01:00
|
|
|
|
in ustar-compatible archive entries that use the “x” or “g” typeflag. In
|
|
|
|
|
particular, older implementations that do not fully support these exten‐
|
|
|
|
|
sions will extract the metadata into regular files, where the metadata
|
|
|
|
|
can be examined as necessary.
|
|
|
|
|
|
|
|
|
|
An entry in a pax interchange format archive consists of one or two stan‐
|
|
|
|
|
dard ustar entries, each with its own header and data. The first op‐
|
|
|
|
|
tional entry stores the extended attributes for the following entry.
|
|
|
|
|
This optional first entry has an "x" typeflag and a size field that indi‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
cates the total size of the extended attributes. The extended attributes
|
|
|
|
|
themselves are stored as a series of text-format lines encoded in the
|
|
|
|
|
portable UTF-8 encoding. Each line consists of a decimal number, a
|
|
|
|
|
space, a key string, an equals sign, a value string, and a new line. The
|
|
|
|
|
decimal number indicates the length of the entire line, including the
|
|
|
|
|
initial length field and the trailing newline. An example of such a
|
|
|
|
|
field is:
|
|
|
|
|
25 ctime=1084839148.1212\n
|
|
|
|
|
Keys in all lowercase are standard keys. Vendors can add their own keys
|
|
|
|
|
by prefixing them with an all uppercase vendor name and a period. Note
|
2021-12-09 12:22:14 +01:00
|
|
|
|
that, unlike the historic header, numeric values are stored using deci‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
mal, not octal. A description of some common keys follows:
|
|
|
|
|
|
|
|
|
|
atime, ctime, mtime
|
|
|
|
|
File access, inode change, and modification times. These fields
|
|
|
|
|
can be negative or include a decimal point and a fractional
|
|
|
|
|
value.
|
|
|
|
|
|
|
|
|
|
hdrcharset
|
|
|
|
|
The character set used by the pax extension values. By default,
|
|
|
|
|
all textual values in the pax extended attributes are assumed to
|
|
|
|
|
be in UTF-8, including pathnames, user names, and group names.
|
|
|
|
|
In some cases, it is not possible to translate local conventions
|
|
|
|
|
into UTF-8. If this key is present and the value is the six-
|
2021-12-09 12:22:14 +01:00
|
|
|
|
character ASCII string “BINARY”, then all textual values are as‐
|
|
|
|
|
sumed to be in a platform-dependent multi-byte encoding. Note
|
|
|
|
|
that there are only two valid values for this key: “BINARY” or
|
|
|
|
|
“ISO-IR 10646 2000 UTF-8”. No other values are permitted by the
|
|
|
|
|
standard, and the latter value should generally not be used as it
|
|
|
|
|
is the default when this key is not specified. In particular,
|
|
|
|
|
this flag should not be used as a general mechanism to allow
|
2020-03-04 16:55:40 +01:00
|
|
|
|
filenames to be stored in arbitrary encodings.
|
|
|
|
|
|
|
|
|
|
uname, uid, gname, gid
|
|
|
|
|
User name, group name, and numeric UID and GID values. The user
|
|
|
|
|
name and group name stored here are encoded in UTF8 and can thus
|
|
|
|
|
include non-ASCII characters. The UID and GID fields can be of
|
|
|
|
|
arbitrary length.
|
|
|
|
|
|
|
|
|
|
linkpath
|
|
|
|
|
The full path of the linked-to file. Note that this is encoded
|
|
|
|
|
in UTF8 and can thus include non-ASCII characters.
|
|
|
|
|
|
|
|
|
|
path The full pathname of the entry. Note that this is encoded in
|
|
|
|
|
UTF8 and can thus include non-ASCII characters.
|
|
|
|
|
|
|
|
|
|
realtime.*, security.*
|
2021-12-09 12:22:14 +01:00
|
|
|
|
These keys are reserved and may be used for future standardiza‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
tion.
|
|
|
|
|
|
|
|
|
|
size The size of the file. Note that there is no length limit on this
|
|
|
|
|
field, allowing conforming archives to store files much larger
|
|
|
|
|
than the historic 8GB limit.
|
|
|
|
|
|
|
|
|
|
SCHILY.*
|
2021-12-09 12:22:14 +01:00
|
|
|
|
Vendor-specific attributes used by Joerg Schilling's star imple‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
mentation.
|
|
|
|
|
|
|
|
|
|
SCHILY.acl.access, SCHILY.acl.default, SCHILY.acl.ace
|
|
|
|
|
Stores the access, default and NFSv4 ACLs as textual strings in a
|
|
|
|
|
format that is an extension of the format specified by POSIX.1e
|
|
|
|
|
draft 17. In particular, each user or group access specification
|
|
|
|
|
can include an additional colon-separated field with the numeric
|
|
|
|
|
UID or GID. This allows ACLs to be restored on systems that may
|
|
|
|
|
not have complete user or group information available (such as
|
|
|
|
|
when NIS/YP or LDAP services are temporarily unavailable).
|
|
|
|
|
|
|
|
|
|
SCHILY.devminor, SCHILY.devmajor
|
|
|
|
|
The full minor and major numbers for device nodes.
|
|
|
|
|
|
|
|
|
|
SCHILY.fflags
|
|
|
|
|
The file flags.
|
|
|
|
|
|
|
|
|
|
SCHILY.realsize
|
|
|
|
|
The full size of the file on disk. XXX explain? XXX
|
|
|
|
|
|
|
|
|
|
SCHILY.dev, SCHILY.ino, SCHILY.nlinks
|
|
|
|
|
The device number, inode number, and link count for the entry.
|
|
|
|
|
In particular, note that a pax interchange format archive using
|
|
|
|
|
Joerg Schilling's SCHILY.* extensions can store all of the data
|
|
|
|
|
from struct stat.
|
|
|
|
|
|
|
|
|
|
LIBARCHIVE.*
|
|
|
|
|
Vendor-specific attributes used by the libarchive library and
|
|
|
|
|
programs that use it.
|
|
|
|
|
|
|
|
|
|
LIBARCHIVE.creationtime
|
|
|
|
|
The time when the file was created. (This should not be confused
|
2021-12-09 12:22:14 +01:00
|
|
|
|
with the POSIX “ctime” attribute, which refers to the time when
|
2020-03-04 16:55:40 +01:00
|
|
|
|
the file metadata was last changed.)
|
|
|
|
|
|
|
|
|
|
LIBARCHIVE.xattr.namespace.key
|
|
|
|
|
Libarchive stores POSIX.1e-style extended attributes using keys
|
2021-12-09 12:22:14 +01:00
|
|
|
|
of this form. The key value is URL-encoded: All non-ASCII char‐
|
|
|
|
|
acters and the two special characters “=” and “%” are encoded as
|
|
|
|
|
“%” followed by two uppercase hexadecimal digits. The value of
|
|
|
|
|
this key is the extended attribute value encoded in base 64. XXX
|
|
|
|
|
Detail the base-64 format here XXX
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
VENDOR.*
|
|
|
|
|
XXX document other vendor-specific extensions XXX
|
|
|
|
|
|
|
|
|
|
Any values stored in an extended attribute override the corresponding
|
2021-12-09 12:22:14 +01:00
|
|
|
|
values in the regular tar header. Note that compliant readers should ig‐
|
|
|
|
|
nore the regular fields when they are overridden. This is important, as
|
|
|
|
|
existing archivers are known to store non-compliant values in the stan‐
|
|
|
|
|
dard header fields in this situation. There are no limits on length for
|
|
|
|
|
any of these fields. In particular, numeric fields can be arbitrarily
|
|
|
|
|
large. All text fields are encoded in UTF8. Compliant writers should
|
|
|
|
|
store only portable 7-bit ASCII characters in the standard ustar header
|
|
|
|
|
and use extended attributes whenever a text value contains non-ASCII
|
|
|
|
|
characters.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
In addition to the x entry described above, the pax interchange format
|
2021-12-09 12:22:14 +01:00
|
|
|
|
also supports a g entry. The g entry is identical in format, but speci‐
|
|
|
|
|
fies attributes that serve as defaults for all subsequent archive en‐
|
|
|
|
|
tries. The g entry is not widely used.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
Besides the new x and g entries, the pax interchange format has a few
|
|
|
|
|
other minor variations from the earlier ustar format. The most troubling
|
|
|
|
|
one is that hardlinks are permitted to have data following them. This
|
|
|
|
|
allows readers to restore any hardlink to a file without having to rewind
|
|
|
|
|
the archive to find an earlier entry. However, it creates complications
|
|
|
|
|
for robust readers, as it is no longer clear whether or not they should
|
|
|
|
|
ignore the size field for hardlink entries.
|
|
|
|
|
|
|
|
|
|
GNU Tar Archives
|
2021-12-09 12:22:14 +01:00
|
|
|
|
The GNU tar program started with a pre-POSIX format similar to that de‐
|
|
|
|
|
scribed earlier and has extended it using several different mechanisms:
|
2020-03-04 16:55:40 +01:00
|
|
|
|
It added new fields to the empty space in the header (some of which was
|
|
|
|
|
later used by POSIX for conflicting purposes); it allowed the header to
|
2021-12-09 12:22:14 +01:00
|
|
|
|
be continued over multiple records; and it defined new entries that mod‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
ify following entries (similar in principle to the x entry described
|
|
|
|
|
above, but each GNU special entry is single-purpose, unlike the general-
|
2021-12-09 12:22:14 +01:00
|
|
|
|
purpose x entry). As a result, GNU tar archives are not POSIX compati‐
|
|
|
|
|
ble, although more lenient POSIX-compliant readers can successfully ex‐
|
|
|
|
|
tract most GNU tar archives.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
struct header_gnu_tar {
|
|
|
|
|
char name[100];
|
|
|
|
|
char mode[8];
|
|
|
|
|
char uid[8];
|
|
|
|
|
char gid[8];
|
|
|
|
|
char size[12];
|
|
|
|
|
char mtime[12];
|
|
|
|
|
char checksum[8];
|
|
|
|
|
char typeflag[1];
|
|
|
|
|
char linkname[100];
|
|
|
|
|
char magic[6];
|
|
|
|
|
char version[2];
|
|
|
|
|
char uname[32];
|
|
|
|
|
char gname[32];
|
|
|
|
|
char devmajor[8];
|
|
|
|
|
char devminor[8];
|
|
|
|
|
char atime[12];
|
|
|
|
|
char ctime[12];
|
|
|
|
|
char offset[12];
|
|
|
|
|
char longnames[4];
|
|
|
|
|
char unused[1];
|
|
|
|
|
struct {
|
|
|
|
|
char offset[12];
|
|
|
|
|
char numbytes[12];
|
|
|
|
|
} sparse[4];
|
|
|
|
|
char isextended[1];
|
|
|
|
|
char realsize[12];
|
|
|
|
|
char pad[17];
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
typeflag
|
|
|
|
|
GNU tar uses the following special entry types, in addition to
|
|
|
|
|
those defined by POSIX:
|
|
|
|
|
|
|
|
|
|
7 GNU tar treats type "7" records identically to type "0"
|
|
|
|
|
records, except on one obscure RTOS where they are used
|
|
|
|
|
to indicate the pre-allocation of a contiguous file on
|
|
|
|
|
disk.
|
|
|
|
|
|
2021-12-09 12:22:14 +01:00
|
|
|
|
D This indicates a directory entry. Unlike the POSIX-stan‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
dard "5" typeflag, the header is followed by data records
|
|
|
|
|
listing the names of files in this directory. Each name
|
|
|
|
|
is preceded by an ASCII "Y" if the file is stored in this
|
|
|
|
|
archive or "N" if the file is not stored in this archive.
|
|
|
|
|
Each name is terminated with a null, and an extra null
|
2021-12-09 12:22:14 +01:00
|
|
|
|
marks the end of the name list. The purpose of this en‐
|
|
|
|
|
try is to support incremental backups; a program restor‐
|
|
|
|
|
ing from such an archive may wish to delete files on disk
|
|
|
|
|
that did not exist in the directory when the archive was
|
|
|
|
|
made.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
Note that the "D" typeflag specifically violates POSIX,
|
|
|
|
|
which requires that unrecognized typeflags be restored as
|
|
|
|
|
normal files. In this case, restoring the "D" entry as a
|
|
|
|
|
file could interfere with subsequent creation of the
|
|
|
|
|
like-named directory.
|
|
|
|
|
|
2021-12-09 12:22:14 +01:00
|
|
|
|
K The data for this entry is a long linkname for the fol‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
lowing regular entry.
|
|
|
|
|
|
2021-12-09 12:22:14 +01:00
|
|
|
|
L The data for this entry is a long pathname for the fol‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
lowing regular entry.
|
|
|
|
|
|
|
|
|
|
M This is a continuation of the last file on the previous
|
|
|
|
|
volume. GNU multi-volume archives guarantee that each
|
|
|
|
|
volume begins with a valid entry header. To ensure this,
|
|
|
|
|
a file may be split, with part stored at the end of one
|
2021-12-09 12:22:14 +01:00
|
|
|
|
volume, and part stored at the beginning of the next vol‐
|
|
|
|
|
ume. The "M" typeflag indicates that this entry contin‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
ues an existing file. Such entries can only occur as the
|
|
|
|
|
first or second entry in an archive (the latter only if
|
2021-12-09 12:22:14 +01:00
|
|
|
|
the first entry is a volume label). The size field spec‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
ifies the size of this entry. The offset field at bytes
|
2021-12-09 12:22:14 +01:00
|
|
|
|
369-380 specifies the offset where this file fragment be‐
|
|
|
|
|
gins. The realsize field specifies the total size of the
|
|
|
|
|
file (which must equal size plus offset). When extract‐
|
|
|
|
|
ing, GNU tar checks that the header file name is the one
|
|
|
|
|
it is expecting, that the header offset is in the correct
|
|
|
|
|
sequence, and that the sum of offset and size is equal to
|
|
|
|
|
realsize.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
N Type "N" records are no longer generated by GNU tar.
|
|
|
|
|
They contained a list of files to be renamed or symlinked
|
|
|
|
|
after extraction; this was originally used to support
|
2021-12-09 12:22:14 +01:00
|
|
|
|
long names. The contents of this record are a text de‐
|
|
|
|
|
scription of the operations to be done, in the form
|
|
|
|
|
“Rename %s to %s\n” or “Symlink %s to %s\n”; in either
|
|
|
|
|
case, both filenames are escaped using K&R C syntax. Due
|
|
|
|
|
to security concerns, "N" records are now generally ig‐
|
|
|
|
|
nored when reading archives.
|
|
|
|
|
|
|
|
|
|
S This is a “sparse” regular file. Sparse files are stored
|
|
|
|
|
as a series of fragments. The header contains a list of
|
|
|
|
|
fragment offset/length pairs. If more than four such en‐
|
|
|
|
|
tries are required, the header is extended as necessary
|
|
|
|
|
with “extra” header extensions (an older format that is
|
|
|
|
|
no longer used), or “sparse” extensions.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
V The name field should be interpreted as a tape/volume
|
|
|
|
|
header name. This entry should generally be ignored on
|
|
|
|
|
extraction.
|
|
|
|
|
|
2021-12-09 12:22:14 +01:00
|
|
|
|
magic The magic field holds the five characters “ustar” followed by a
|
2020-03-04 16:55:40 +01:00
|
|
|
|
space. Note that POSIX ustar archives have a trailing null.
|
|
|
|
|
|
|
|
|
|
version
|
|
|
|
|
The version field holds a space character followed by a null.
|
|
|
|
|
Note that POSIX ustar archives use two copies of the ASCII digit
|
2021-12-09 12:22:14 +01:00
|
|
|
|
“0”.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
atime, ctime
|
|
|
|
|
The time the file was last accessed and the time of last change
|
|
|
|
|
of file information, stored in octal as with mtime.
|
|
|
|
|
|
|
|
|
|
longnames
|
|
|
|
|
This field is apparently no longer used.
|
|
|
|
|
|
|
|
|
|
Sparse offset / numbytes
|
|
|
|
|
Each such structure specifies a single fragment of a sparse file.
|
|
|
|
|
The two fields store values as octal numbers. The fragments are
|
2021-12-09 12:22:14 +01:00
|
|
|
|
each padded to a multiple of 512 bytes in the archive. On ex‐
|
|
|
|
|
traction, the list of fragments is collected from the header (in‐
|
|
|
|
|
cluding any extension headers), and the data is then read and
|
2020-03-04 16:55:40 +01:00
|
|
|
|
written to the file at appropriate offsets.
|
|
|
|
|
|
|
|
|
|
isextended
|
2021-12-09 12:22:14 +01:00
|
|
|
|
If this is set to non-zero, the header will be followed by addi‐
|
|
|
|
|
tional “sparse header” records. Each such record contains infor‐
|
|
|
|
|
mation about as many as 21 additional sparse blocks as shown
|
2020-03-04 16:55:40 +01:00
|
|
|
|
here:
|
|
|
|
|
|
|
|
|
|
struct gnu_sparse_header {
|
|
|
|
|
struct {
|
|
|
|
|
char offset[12];
|
|
|
|
|
char numbytes[12];
|
|
|
|
|
} sparse[21];
|
|
|
|
|
char isextended[1];
|
|
|
|
|
char padding[7];
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
realsize
|
|
|
|
|
A binary representation of the file's complete size, with a much
|
|
|
|
|
larger range than the POSIX file size. In particular, with M
|
|
|
|
|
type files, the current entry is only a portion of the file. In
|
|
|
|
|
that case, the POSIX size field will indicate the size of this
|
|
|
|
|
entry; the realsize field will indicate the total size of the
|
|
|
|
|
file.
|
|
|
|
|
|
|
|
|
|
GNU tar pax archives
|
|
|
|
|
GNU tar 1.14 (XXX check this XXX) and later will write pax interchange
|
|
|
|
|
format archives when you specify the --posix flag. This format follows
|
2021-12-09 12:22:14 +01:00
|
|
|
|
the pax interchange format closely, using some SCHILY tags and introduc‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
ing new keywords to store sparse file information. There have been three
|
2021-12-09 12:22:14 +01:00
|
|
|
|
iterations of the sparse file support, referred to as “0.0”, “0.1”, and
|
|
|
|
|
“1.0”.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
GNU.sparse.numblocks, GNU.sparse.offset, GNU.sparse.numbytes,
|
|
|
|
|
GNU.sparse.size
|
2021-12-09 12:22:14 +01:00
|
|
|
|
The “0.0” format used an initial GNU.sparse.numblocks attribute
|
2020-03-04 16:55:40 +01:00
|
|
|
|
to indicate the number of blocks in the file, a pair of
|
|
|
|
|
GNU.sparse.offset and GNU.sparse.numbytes to indicate the offset
|
|
|
|
|
and size of each block, and a single GNU.sparse.size to indicate
|
|
|
|
|
the full size of the file. This is not the same as the size in
|
|
|
|
|
the tar header because the latter value does not include the size
|
|
|
|
|
of any holes. This format required that the order of attributes
|
|
|
|
|
be preserved and relied on readers accepting multiple appearances
|
|
|
|
|
of the same attribute names, which is not officially permitted by
|
|
|
|
|
the standards.
|
|
|
|
|
|
|
|
|
|
GNU.sparse.map
|
2021-12-09 12:22:14 +01:00
|
|
|
|
The “0.1” format used a single attribute that stored a comma-sep‐
|
|
|
|
|
arated list of decimal numbers. Each pair of numbers indicated
|
|
|
|
|
the offset and size, respectively, of a block of data. This does
|
|
|
|
|
not work well if the archive is extracted by an archiver that
|
|
|
|
|
does not recognize this extension, since many pax implementations
|
|
|
|
|
simply discard unrecognized attributes.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
GNU.sparse.major, GNU.sparse.minor, GNU.sparse.name, GNU.sparse.realsize
|
2021-12-09 12:22:14 +01:00
|
|
|
|
The “1.0” format stores the sparse block map in one or more
|
2020-03-04 16:55:40 +01:00
|
|
|
|
512-byte blocks prepended to the file data in the entry body.
|
|
|
|
|
The pax attributes indicate the existence of this map (via the
|
|
|
|
|
GNU.sparse.major and GNU.sparse.minor fields) and the full size
|
|
|
|
|
of the file. The GNU.sparse.name holds the true name of the
|
|
|
|
|
file. To avoid confusion, the name stored in the regular tar
|
2021-12-09 12:22:14 +01:00
|
|
|
|
header is a modified name so that extraction errors will be ap‐
|
|
|
|
|
parent to users.
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
Solaris Tar
|
|
|
|
|
XXX More Details Needed XXX
|
|
|
|
|
|
2021-12-09 12:22:14 +01:00
|
|
|
|
Solaris tar (beginning with SunOS XXX 5.7 ?? XXX) supports an “extended”
|
|
|
|
|
format that is fundamentally similar to pax interchange format, with the
|
|
|
|
|
following differences:
|
|
|
|
|
• Extended attributes are stored in an entry whose type is X, not
|
2020-03-04 16:55:40 +01:00
|
|
|
|
x, as used by pax interchange format. The detailed format of
|
2021-12-09 12:22:14 +01:00
|
|
|
|
this entry appears to be the same as detailed above for the x en‐
|
|
|
|
|
try.
|
|
|
|
|
• An additional A header is used to store an ACL for the following
|
|
|
|
|
regular entry. The body of this entry contains a seven-digit oc‐
|
|
|
|
|
tal number followed by a zero byte, followed by the textual ACL
|
2020-03-04 16:55:40 +01:00
|
|
|
|
description. The octal value is the number of ACL entries plus a
|
|
|
|
|
constant that indicates the ACL type: 01000000 for POSIX.1e ACLs
|
|
|
|
|
and 03000000 for NFSv4 ACLs.
|
|
|
|
|
|
|
|
|
|
AIX Tar
|
|
|
|
|
XXX More details needed XXX
|
|
|
|
|
|
|
|
|
|
AIX Tar uses a ustar-formatted header with the type A for storing coded
|
|
|
|
|
ACL information. Unlike the Solaris format, AIX tar writes this header
|
|
|
|
|
after the regular file body to which it applies. The pathname in this
|
|
|
|
|
header is either NFS4 or AIXC to indicate the type of ACL stored. The
|
|
|
|
|
actual ACL is stored in platform-specific binary format.
|
|
|
|
|
|
|
|
|
|
Mac OS X Tar
|
|
|
|
|
The tar distributed with Apple's Mac OS X stores most regular files as
|
|
|
|
|
two separate files in the tar archive. The two files have the same name
|
2021-12-09 12:22:14 +01:00
|
|
|
|
except that the first one has “._” prepended to the last path element.
|
|
|
|
|
This special file stores an AppleDouble-encoded binary blob with addi‐
|
|
|
|
|
tional metadata about the second file, including ACL, extended at‐
|
|
|
|
|
tributes, and resources. To recreate the original file on disk, each
|
2020-03-04 16:55:40 +01:00
|
|
|
|
separate file can be extracted and the Mac OS X copyfile() function can
|
|
|
|
|
be used to unpack the separate metadata file and apply it to th regular
|
2021-12-09 12:22:14 +01:00
|
|
|
|
file. Conversely, the same function provides a “pack” option to encode
|
2020-03-04 16:55:40 +01:00
|
|
|
|
the extended metadata from a file into a separate file whose contents can
|
|
|
|
|
then be put into a tar archive.
|
|
|
|
|
|
2021-12-09 12:22:14 +01:00
|
|
|
|
Note that the Apple extended attributes interact badly with long file‐
|
2020-03-04 16:55:40 +01:00
|
|
|
|
names. Since each file is stored with the full name, a separate set of
|
|
|
|
|
extensions needs to be included in the archive for each one, doubling the
|
|
|
|
|
overhead required for files with long names.
|
|
|
|
|
|
|
|
|
|
Summary of tar type codes
|
|
|
|
|
The following list is a condensed summary of the type codes used in tar
|
|
|
|
|
header records generated by different tar implementations. More details
|
|
|
|
|
about specific implementations can be found above:
|
|
|
|
|
NUL Early tar programs stored a zero byte for regular files.
|
|
|
|
|
0 POSIX standard type code for a regular file.
|
|
|
|
|
1 POSIX standard type code for a hard link description.
|
|
|
|
|
2 POSIX standard type code for a symbolic link description.
|
|
|
|
|
3 POSIX standard type code for a character device node.
|
|
|
|
|
4 POSIX standard type code for a block device node.
|
|
|
|
|
5 POSIX standard type code for a directory.
|
|
|
|
|
6 POSIX standard type code for a FIFO.
|
|
|
|
|
7 POSIX reserved.
|
|
|
|
|
7 GNU tar used for pre-allocated files on some systems.
|
|
|
|
|
A Solaris tar ACL description stored prior to a regular file header.
|
|
|
|
|
A AIX tar ACL description stored after the file body.
|
|
|
|
|
D GNU tar directory dump.
|
|
|
|
|
K GNU tar long linkname for the following header.
|
|
|
|
|
L GNU tar long pathname for the following header.
|
|
|
|
|
M GNU tar multivolume marker, indicating the file is a continuation of
|
|
|
|
|
a file from the previous volume.
|
|
|
|
|
N GNU tar long filename support. Deprecated.
|
|
|
|
|
S GNU tar sparse regular file.
|
|
|
|
|
V GNU tar tape/volume header name.
|
|
|
|
|
X Solaris tar general-purpose extension header.
|
|
|
|
|
g POSIX pax interchange format global extensions.
|
|
|
|
|
x POSIX pax interchange format per-file extensions.
|
|
|
|
|
|
|
|
|
|
SEE ALSO
|
|
|
|
|
ar(1), pax(1), tar(1)
|
|
|
|
|
|
|
|
|
|
STANDARDS
|
|
|
|
|
The tar utility is no longer a part of POSIX or the Single Unix Standard.
|
2021-12-09 12:22:14 +01:00
|
|
|
|
It last appeared in Version 2 of the Single UNIX Specification (“SUSv2”).
|
|
|
|
|
It has been supplanted in subsequent standards by pax(1). The ustar for‐
|
|
|
|
|
mat is currently part of the specification for the pax(1) utility. The
|
|
|
|
|
pax interchange file format is new with IEEE Std 1003.1-2001 (“POSIX.1”).
|
2020-03-04 16:55:40 +01:00
|
|
|
|
|
|
|
|
|
HISTORY
|
|
|
|
|
A tar command appeared in Seventh Edition Unix, which was released in
|
|
|
|
|
January, 1979. It replaced the tp program from Fourth Edition Unix which
|
|
|
|
|
in turn replaced the tap program from First Edition Unix. John Gilmore's
|
|
|
|
|
pdtar public-domain implementation (circa 1987) was highly influential
|
|
|
|
|
and formed the basis of GNU tar (circa 1988). Joerg Shilling's star
|
|
|
|
|
archiver is another open-source (CDDL) archiver (originally developed
|
|
|
|
|
circa 1985) which features complete support for pax interchange format.
|
|
|
|
|
|
|
|
|
|
This documentation was written as part of the libarchive and bsdtar
|
|
|
|
|
project by Tim Kientzle <kientzle@FreeBSD.org>.
|
|
|
|
|
|
|
|
|
|
BSD December 27, 2016 BSD
|