mirror of
https://github.com/cmclark00/retro-imager.git
synced 2025-05-19 16:35:20 +01:00
- Update bunlded libarchive version used on Windows/Mac - Enable requested zstd support while we are at it. Closes #211
335 lines
19 KiB
Text
335 lines
19 KiB
Text
LIBARCHIVE-FORMATS(5) BSD File Formats Manual LIBARCHIVE-FORMATS(5)
|
||
|
||
NAME
|
||
libarchive-formats — archive formats supported by the libarchive library
|
||
|
||
DESCRIPTION
|
||
The libarchive(3) library reads and writes a variety of streaming archive
|
||
formats. Generally speaking, all of these archive formats consist of a
|
||
series of “entries”. Each entry stores a single file system object, such
|
||
as a file, directory, or symbolic link.
|
||
|
||
The following provides a brief description of each format supported by
|
||
libarchive, with some information about recognized extensions or limita‐
|
||
tions of the current library support. Note that just because a format is
|
||
supported by libarchive does not imply that a program that uses
|
||
libarchive will support that format. Applications that use libarchive
|
||
specify which formats they wish to support, though many programs do use
|
||
libarchive convenience functions to enable all supported formats.
|
||
|
||
Tar Formats
|
||
The libarchive(3) library can read most tar archives. It can write
|
||
POSIX-standard “ustar” and “pax interchange” formats as well as v7 tar
|
||
format and a subset of the legacy GNU tar format.
|
||
|
||
All tar formats store each entry in one or more 512-byte records. The
|
||
first record is used for file metadata, including filename, timestamp,
|
||
and mode information, and the file data is stored in subsequent records.
|
||
Later variants have extended this by either appropriating undefined areas
|
||
of the header record, extending the header to multiple records, or by
|
||
storing special entries that modify the interpretation of subsequent en‐
|
||
tries.
|
||
|
||
gnutar The libarchive(3) library can read most GNU-format tar archives.
|
||
It currently supports the most popular GNU extensions, including
|
||
modern long filename and linkname support, as well as atime and
|
||
ctime data. The libarchive library does not support multi-volume
|
||
archives, nor the old GNU long filename format. It can read GNU
|
||
sparse file entries, including the new POSIX-based formats.
|
||
|
||
The libarchive(3) library can write GNU tar format, including
|
||
long filename and linkname support, as well as atime and ctime
|
||
data.
|
||
|
||
pax The libarchive(3) library can read and write POSIX-compliant pax
|
||
interchange format archives. Pax interchange format archives are
|
||
an extension of the older ustar format that adds a separate entry
|
||
with additional attributes stored as key/value pairs immediately
|
||
before each regular entry. The presence of these additional en‐
|
||
tries is the only difference between pax interchange format and
|
||
the older ustar format. The extended attributes are of unlimited
|
||
length and are stored as UTF-8 Unicode strings. Keywords defined
|
||
in the standard are in all lowercase; vendors are allowed to de‐
|
||
fine custom keys by preceding them with the vendor name in all
|
||
uppercase. When writing pax archives, libarchive uses many of
|
||
the SCHILY keys defined by Joerg Schilling's “star” archiver and
|
||
a few LIBARCHIVE keys. The libarchive library can read most of
|
||
the SCHILY keys and most of the GNU keys introduced by GNU tar.
|
||
It silently ignores any keywords that it does not understand.
|
||
|
||
The pax interchange format converts filenames to Unicode and
|
||
stores them using the UTF-8 encoding. Prior to libarchive 3.0,
|
||
libarchive erroneously assumed that the system wide-character
|
||
routines natively supported Unicode. This caused it to mis-han‐
|
||
dle non-ASCII filenames on systems that did not satisfy this as‐
|
||
sumption.
|
||
|
||
restricted pax
|
||
The libarchive library can also write pax archives in which it
|
||
attempts to suppress the extended attributes entry whenever pos‐
|
||
sible. The result will be identical to a ustar archive unless
|
||
the extended attributes entry is required to store a long file
|
||
name, long linkname, extended ACL, file flags, or if any of the
|
||
standard ustar data (user name, group name, UID, GID, etc) cannot
|
||
be fully represented in the ustar header. In all cases, the re‐
|
||
sult can be dearchived by any program that can read POSIX-compli‐
|
||
ant pax interchange format archives. Programs that correctly
|
||
read ustar format (see below) will also be able to read this for‐
|
||
mat; any extended attributes will be extracted as separate files
|
||
stored in PaxHeader directories.
|
||
|
||
ustar The libarchive library can both read and write this format. This
|
||
format has the following limitations:
|
||
• Device major and minor numbers are limited to 21 bits. Nodes
|
||
with larger numbers will not be added to the archive.
|
||
• Path names in the archive are limited to 255 bytes. (Shorter
|
||
if there is no / character in exactly the right place.)
|
||
• Symbolic links and hard links are stored in the archive with
|
||
the name of the referenced file. This name is limited to 100
|
||
bytes.
|
||
• Extended attributes, file flags, and other extended security
|
||
information cannot be stored.
|
||
• Archive entries are limited to 8 gigabytes in size.
|
||
Note that the pax interchange format has none of these restric‐
|
||
tions. The ustar format is old and widely supported. It is rec‐
|
||
ommended when compatibility is the primary concern.
|
||
|
||
v7 The libarchive library can read and write the legacy v7 tar for‐
|
||
mat. This format has the following limitations:
|
||
• Only regular files, directories, and symbolic links can be
|
||
archived. Block and character device nodes, FIFOs, and sock‐
|
||
ets cannot be archived.
|
||
• Path names in the archive are limited to 100 bytes.
|
||
• Symbolic links and hard links are stored in the archive with
|
||
the name of the referenced file. This name is limited to 100
|
||
bytes.
|
||
• User and group information are stored as numeric IDs; there
|
||
is no provision for storing user or group names.
|
||
• Extended attributes, file flags, and other extended security
|
||
information cannot be stored.
|
||
• Archive entries are limited to 8 gigabytes in size.
|
||
Generally, users should prefer the ustar format for portability
|
||
as the v7 tar format is both less useful and less portable.
|
||
|
||
The libarchive library also reads a variety of commonly-used extensions
|
||
to the basic tar format. These extensions are recognized automatically
|
||
whenever they appear.
|
||
|
||
Numeric extensions.
|
||
The POSIX standards require fixed-length numeric fields to be
|
||
written with some character position reserved for terminators.
|
||
Libarchive allows these fields to be written without terminator
|
||
characters. This extends the allowable range; in particular, us‐
|
||
tar archives with this extension can support entries up to 64 gi‐
|
||
gabytes in size. Libarchive also recognizes base-256 values in
|
||
most numeric fields. This essentially removes all limitations on
|
||
file size, modification time, and device numbers.
|
||
|
||
Solaris extensions
|
||
Libarchive recognizes ACL and extended attribute records written
|
||
by Solaris tar.
|
||
|
||
The first tar program appeared in Seventh Edition Unix in 1979. The
|
||
first official standard for the tar file format was the “ustar” (Unix
|
||
Standard Tar) format defined by POSIX in 1988. POSIX.1-2001 extended the
|
||
ustar format to create the “pax interchange” format.
|
||
|
||
Cpio Formats
|
||
The libarchive library can read and write a number of common cpio vari‐
|
||
ants. A cpio archive stores each entry as a fixed-size header followed
|
||
by a variable-length filename and variable-length data. Unlike the tar
|
||
format, the cpio format does only minimal padding of the header or file
|
||
data. There are several cpio variants, which differ primarily in how
|
||
they store the initial header: some store the values as octal or hexadec‐
|
||
imal numbers in ASCII, others as binary values of varying byte order and
|
||
length.
|
||
|
||
binary The libarchive library transparently reads both big-endian and
|
||
little-endian variants of the the two binary cpio formats; the
|
||
original one from PWB/UNIX, and the later, more widely used,
|
||
variant. This format used 32-bit binary values for file size and
|
||
mtime, and 16-bit binary values for the other fields. The for‐
|
||
mats support only the file types present in UNIX at the time of
|
||
their creation. File sizes are limited to 24 bits in the PWB
|
||
format, because of the limits of the file system, and to 31 bits
|
||
in the newer binary format, where signed 32 bit longs were used.
|
||
|
||
odc This is the POSIX standardized format, which is officially known
|
||
as the “cpio interchange format” or the “octet-oriented cpio
|
||
archive format” and sometimes unofficially referred to as the
|
||
“old character format”. This format stores the header contents
|
||
as octal values in ASCII. It is standard, portable, and immune
|
||
from byte-order confusion. File sizes and mtime are limited to
|
||
33 bits (8GB file size), other fields are limited to 18 bits.
|
||
|
||
SVR4/newc
|
||
The libarchive library can read both CRC and non-CRC variants of
|
||
this format. The SVR4 format uses eight-digit hexadecimal values
|
||
for all header fields. This limits file size to 4GB, and also
|
||
limits the mtime and other fields to 32 bits. The SVR4 format
|
||
can optionally include a CRC of the file contents, although
|
||
libarchive does not currently verify this CRC.
|
||
|
||
Cpio first appeared in PWB/UNIX 1.0, which was released within AT&T in
|
||
1977. PWB/UNIX 1.0 formed the basis of System III Unix, released outside
|
||
of AT&T in 1981. This makes cpio older than tar, although cpio was not
|
||
included in Version 7 AT&T Unix. As a result, the tar command became
|
||
much better known in universities and research groups that used Version
|
||
7. The combination of the find and cpio utilities provided very precise
|
||
control over file selection. Unfortunately, the format has many limita‐
|
||
tions that make it unsuitable for widespread use. Only the POSIX format
|
||
permits files over 4GB, and its 18-bit limit for most other fields makes
|
||
it unsuitable for modern systems. In addition, cpio formats only store
|
||
numeric UID/GID values (not usernames and group names), which can make it
|
||
very difficult to correctly transfer archives across systems with dissim‐
|
||
ilar user numbering.
|
||
|
||
Shar Formats
|
||
A “shell archive” is a shell script that, when executed on a POSIX-com‐
|
||
pliant system, will recreate a collection of file system objects. The
|
||
libarchive library can write two different kinds of shar archives:
|
||
|
||
shar The traditional shar format uses a limited set of POSIX commands,
|
||
including echo(1), mkdir(1), and sed(1). It is suitable for
|
||
portably archiving small collections of plain text files. How‐
|
||
ever, it is not generally well-suited for large archives (many
|
||
implementations of sh(1) have limits on the size of a script) nor
|
||
should it be used with non-text files.
|
||
|
||
shardump
|
||
This format is similar to shar but encodes files using
|
||
uuencode(1) so that the result will be a plain text file regard‐
|
||
less of the file contents. It also includes additional shell
|
||
commands that attempt to reproduce as many file attributes as
|
||
possible, including owner, mode, and flags. The additional com‐
|
||
mands used to restore file attributes make shardump archives less
|
||
portable than plain shar archives.
|
||
|
||
ISO9660 format
|
||
Libarchive can read and extract from files containing ISO9660-compliant
|
||
CDROM images. In many cases, this can remove the need to burn a physical
|
||
CDROM just in order to read the files contained in an ISO9660 image. It
|
||
also avoids security and complexity issues that come with virtual mounts
|
||
and loopback devices. Libarchive supports the most common Rockridge ex‐
|
||
tensions and has partial support for Joliet extensions. If both exten‐
|
||
sions are present, the Joliet extensions will be used and the Rockridge
|
||
extensions will be ignored. In particular, this can create problems with
|
||
hardlinks and symlinks, which are supported by Rockridge but not by
|
||
Joliet.
|
||
|
||
Libarchive reads ISO9660 images using a streaming strategy. This allows
|
||
it to read compressed images directly (decompressing on the fly) and al‐
|
||
lows it to read images directly from network sockets, pipes, and other
|
||
non-seekable data sources. This strategy works well for optimized
|
||
ISO9660 images created by many popular programs. Such programs collect
|
||
all directory information at the beginning of the ISO9660 image so it can
|
||
be read from a physical disk with a minimum of seeking. However, not all
|
||
ISO9660 images can be read in this fashion.
|
||
|
||
Libarchive can also write ISO9660 images. Such images are fully opti‐
|
||
mized with the directory information preceding all file data. This is
|
||
done by storing all file data to a temporary file while collecting direc‐
|
||
tory information in memory. When the image is finished, libarchive
|
||
writes out the directory structure followed by the file data. The loca‐
|
||
tion used for the temporary file can be changed by the usual environment
|
||
variables.
|
||
|
||
Zip format
|
||
Libarchive can read and write zip format archives that have uncompressed
|
||
entries and entries compressed with the “deflate” algorithm. Other zip
|
||
compression algorithms are not supported. It can extract jar archives,
|
||
archives that use Zip64 extensions and self-extracting zip archives.
|
||
Libarchive can use either of two different strategies for reading Zip ar‐
|
||
chives: a streaming strategy which is fast and can handle extremely large
|
||
archives, and a seeking strategy which can correctly process self-ex‐
|
||
tracting Zip archives and archives with deleted members or other in-place
|
||
modifications.
|
||
|
||
The streaming reader processes Zip archives as they are read. It can
|
||
read archives of arbitrary size from tape or network sockets, and can de‐
|
||
code Zip archives that have been separately compressed or encoded. How‐
|
||
ever, self-extracting Zip archives and archives with certain types of
|
||
modifications cannot be correctly handled. Such archives require that
|
||
the reader first process the Central Directory, which is ordinarily lo‐
|
||
cated at the end of a Zip archive and is thus inaccessible to the stream‐
|
||
ing reader. If the program using libarchive has enabled seek support,
|
||
then libarchive will use this to processes the central directory first.
|
||
|
||
In particular, the seeking reader must be used to correctly handle self-
|
||
extracting archives. Such archives consist of a program followed by a
|
||
regular Zip archive. The streaming reader cannot parse the initial pro‐
|
||
gram portion, but the seeking reader starts by reading the Central Direc‐
|
||
tory from the end of the archive. Similarly, Zip archives that have been
|
||
modified in-place can have deleted entries or other garbage data that can
|
||
only be accurately detected by first reading the Central Directory.
|
||
|
||
Archive (library) file format
|
||
The Unix archive format (commonly created by the ar(1) archiver) is a
|
||
general-purpose format which is used almost exclusively for object files
|
||
to be read by the link editor ld(1). The ar format has never been stan‐
|
||
dardised. There are two common variants: the GNU format derived from
|
||
SVR4, and the BSD format, which first appeared in 4.4BSD. The two differ
|
||
primarily in their handling of filenames longer than 15 characters: the
|
||
GNU/SVR4 variant writes a filename table at the beginning of the archive;
|
||
the BSD format stores each long filename in an extension area adjacent to
|
||
the entry. Libarchive can read both extensions, including archives that
|
||
may include both types of long filenames. Programs using libarchive can
|
||
write GNU/SVR4 format if they provide an entry called // containing a
|
||
filename table to be written into the archive before any of the entries.
|
||
Any entries whose names are not in the filename table will be written us‐
|
||
ing BSD-style long filenames. This can cause problems for programs such
|
||
as GNU ld that do not support the BSD-style long filenames.
|
||
|
||
mtree
|
||
Libarchive can read and write files in mtree(5) format. This format is
|
||
not a true archive format, but rather a textual description of a file hi‐
|
||
erarchy in which each line specifies the name of a file and provides spe‐
|
||
cific metadata about that file. Libarchive can read all of the keywords
|
||
supported by both the NetBSD and FreeBSD versions of mtree(8), although
|
||
many of the keywords cannot currently be stored in an archive_entry ob‐
|
||
ject. When writing, libarchive supports use of the
|
||
archive_write_set_options(3) interface to specify which keywords should
|
||
be included in the output. If libarchive was compiled with access to
|
||
suitable cryptographic libraries (such as the OpenSSL libraries), it can
|
||
compute hash entries such as sha512 or md5 from file data being written
|
||
to the mtree writer.
|
||
|
||
When reading an mtree file, libarchive will locate the corresponding
|
||
files on disk using the contents keyword if present or the regular file‐
|
||
name. If it can locate and open the file on disk, it will use that to
|
||
fill in any metadata that is missing from the mtree file and will read
|
||
the file contents and return those to the program using libarchive. If
|
||
it cannot locate and open the file on disk, libarchive will return an er‐
|
||
ror for any attempt to read the entry body.
|
||
|
||
7-Zip
|
||
Libarchive can read and write 7-Zip format archives. TODO: Need more in‐
|
||
formation
|
||
|
||
CAB
|
||
Libarchive can read Microsoft Cabinet ( “CAB”) format archives. TODO:
|
||
Need more information.
|
||
|
||
LHA
|
||
TODO: Information about libarchive's LHA support
|
||
|
||
RAR
|
||
Libarchive has limited support for reading RAR format archives. Cur‐
|
||
rently, libarchive can read RARv3 format archives which have been either
|
||
created uncompressed, or compressed using any of the compression methods
|
||
supported by the RARv3 format. Libarchive can also read self-extracting
|
||
RAR archives.
|
||
|
||
Warc
|
||
Libarchive can read and write “web archives”. TODO: Need more informa‐
|
||
tion
|
||
|
||
XAR
|
||
Libarchive can read and write the XAR format used by many Apple tools.
|
||
TODO: Need more information
|
||
|
||
SEE ALSO
|
||
ar(1), cpio(1), mkisofs(1), shar(1), tar(1), zip(1), zlib(3), cpio(5),
|
||
mtree(5), tar(5)
|
||
|
||
BSD December 27, 2016 BSD
|