author Tomas Bzatek <tbzatek@redhat.com> 2023-12-17 16:55:58 +0100
committer Tomas Bzatek <tbzatek@redhat.com> 2023-12-17 16:55:58 +0100
commit b22a4476a66a913a07d5e80334c0400a9b162206
tree d896eb5f6f9212b5ef424219c45571ce5f152cc0 /libarchive/libarchive-2.8.0/doc/text/libarchive_internals.3.txt
parent 7592788feb1a8cb79b85e6a9911a206a5d55896d
libarchive: Remove in-tree libarchive package
Libarchive has become a standard package in most distributions; there is no need to carry the sources along here.
Diffstat (limited to 'libarchive/libarchive-2.8.0/doc/text/libarchive_internals.3.txt')
-rw-r--r-- libarchive/libarchive-2.8.0/doc/text/libarchive_internals.3.txt 248
1 files changed, 0 insertions, 248 deletions
diff --git a/libarchive/libarchive-2.8.0/doc/text/libarchive_internals.3.txt b/libarchive/libarchive-2.8.0/doc/text/libarchive_internals.3.txt
deleted file mode 100644
index e5e65bd..0000000
--- a/libarchive/libarchive-2.8.0/doc/text/libarchive_internals.3.txt
+++ /dev/null
@@ -1,248 +0,0 @@
-LIBARCHIVE(3) FreeBSD Library Functions Manual LIBARCHIVE(3)
-
-NAME
- libarchive_internals -- description of libarchive internal interfaces
-
-OVERVIEW
- The libarchive library provides a flexible interface for reading and
- writing streaming archive files such as tar and cpio. Internally, it
- follows a modular layered design that should make it easy to add new ar-
- chive and compression formats.
-
-GENERAL ARCHITECTURE
- Externally, libarchive exposes most operations through an opaque,
- object-style interface. The archive_entry(3) objects store information
- about a single filesystem object. The rest of the library provides
- facilities to write archive_entry(3) objects to archive files, read them
- from archive files, and write them to disk. (There are plans to add a
- facility to read archive_entry(3) objects from disk as well.)
-
- The read and write APIs each have four layers: a public API layer, a for-
- mat layer that understands the archive file format, a compression layer,
- and an I/O layer. The I/O layer is completely exposed to clients, who can
- replace it entirely with their own functions.
-
- In order to provide as much consistency as possible for clients, some
- public functions are virtualized. Eventually, it should be possible for
- clients to open an archive or disk writer, and then use a single set of
- code to select and write entries, regardless of the target.
-
-READ ARCHITECTURE
- From the outside, clients use the archive_read(3) API to manipulate an
- archive object to read entries and bodies from an archive stream. Inter-
- nally, the archive object is cast to an archive_read object, which holds
- all read-specific data. The API has four layers: The lowest layer is the
- I/O layer. This layer can be overridden by clients, but most clients use
- the packaged I/O callbacks provided, for example, by
- archive_read_open_memory(3) and archive_read_open_fd(3). The compression
- layer calls the I/O layer to read bytes and decompresses them for
- the format layer. The format layer unpacks a stream of uncompressed
- bytes and creates archive_entry objects from the incoming data. The API
- layer tracks overall state (for example, it prevents clients from reading
- data before reading a header) and invokes the format and compression
- layer operations through registered function pointers. In particular,
- the API layer drives the format-detection process: When opening the ar-
- chive, it reads an initial block of data and offers it to each registered
- compression handler. The one with the highest bid is initialized with
- the first block. Similarly, the format handlers are polled to see which
- handler is the best for each archive. (Prior to 2.4.0, the format bid-
- ders were invoked for each entry, but this design hindered error recov-
- ery.)
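The bid-and-select step described above can be sketched in standalone C. The struct bidder type, select_bidder(), and the three example bidders are hypothetical illustrations, not libarchive's internal API; the sketch only shows the rule that the highest positive bid wins and that a zero bid means the handler is never chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler record: a name plus a bid function that inspects,
 * but never consumes, the first block read from the client. */
struct bidder {
    const char *name;
    int (*bid)(const unsigned char *block, size_t len);
};

/* Return the index of the handler with the highest positive bid, or -1
 * if every handler bid zero (nobody recognizes the data). */
static int select_bidder(const struct bidder *bidders, size_t count,
                         const unsigned char *block, size_t len)
{
    int best = -1;
    int best_bid = 0;
    for (size_t i = 0; i < count; i++) {
        int b = bidders[i].bid(block, len);
        if (b > best_bid) {        /* strictly higher bid takes over */
            best_bid = b;
            best = (int)i;
        }
    }
    return best;
}

/* Example bidders: each checks a magic number and bids the bits verified. */
static int bid_gzip(const unsigned char *p, size_t len)
{
    if (len < 2 || p[0] != 0x1f || p[1] != 0x8b)
        return 0;
    return 16;
}

static int bid_bzip2(const unsigned char *p, size_t len)
{
    if (len < 3 || memcmp(p, "BZh", 3) != 0)
        return 0;
    return 24;
}

/* Hypothetical pass-through handler: accepts anything with a weak bid. */
static int bid_none(const unsigned char *p, size_t len)
{
    (void)p;
    (void)len;
    return 1;
}
```

Because the pass-through handler always bids 1, it wins only when no real decompressor recognizes the block.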
-
- I/O Layer and Client Callbacks
- The read API goes to some lengths to be nice to clients. As a result,
- there are few restrictions on the behavior of the client callbacks.
-
- The client read callback is expected to provide a block of data on each
- call. A zero-length return does indicate end of file, but otherwise
- blocks may be as small as one byte or as large as the entire file. In
- particular, blocks may be of different sizes.
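A minimal sketch of a client read callback that deliberately returns blocks of varying size, as a network client might. It is modeled loosely on the shape of libarchive's read callback but drops the struct archive * argument so that it compiles without libarchive; struct mem_client, mem_read(), and the 1/7/64-byte block pattern are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client state: an in-memory "file" handed out in uneven
 * pieces. */
struct mem_client {
    const unsigned char *data;
    size_t size;
    size_t pos;
    unsigned calls;
};

/* Read callback sketch: point *buff at the next block and return its
 * length. Block sizes deliberately vary; a return of 0 signals EOF. */
static ptrdiff_t mem_read(void *client_data, const void **buff)
{
    static const size_t pattern[] = {1, 7, 64};
    struct mem_client *c = client_data;
    size_t want = pattern[c->calls++ % 3];
    size_t left = c->size - c->pos;
    size_t n = want < left ? want : left;

    *buff = c->data + c->pos;
    c->pos += n;
    assert(c->pos <= c->size);   /* never hand out bytes past the end */
    return (ptrdiff_t)n;
}
```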
-
- The client skip callback returns the number of bytes actually skipped,
- which may be much smaller than the skip requested. The only requirement
- is that the skip not be larger. In particular, clients are allowed to
- return zero for any skip that they don't want to handle. The skip call-
- back must never be invoked with a negative value.
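A sketch of a skip callback that follows these rules, assuming a hypothetical struct skip_client that records whether the underlying source can seek. It never skips more than requested, clamps at end of input, and declines (returns zero) when the source cannot seek. The int64_t return type is an assumption chosen for portability; check the actual callback typedef in the libarchive headers you build against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client state for the skip sketch. */
struct skip_client {
    size_t size;     /* total bytes in the source */
    size_t pos;      /* current read position */
    int seekable;    /* 0 models a pipe or socket */
};

/* Skip callback sketch: returns the number of bytes actually skipped,
 * which may be less than requested but never more. */
static int64_t client_skip(void *client_data, int64_t request)
{
    struct skip_client *c = client_data;
    assert(request >= 0);        /* never invoked with a negative value */
    if (!c->seekable)
        return 0;                /* decline: caller must read past the data */
    size_t left = c->size - c->pos;
    size_t n = (uint64_t)request < left ? (size_t)request : left;
    c->pos += n;
    return (int64_t)n;
}
```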
-
- Keep in mind that not all clients are reading from disk: clients reading
- from networks may provide different-sized blocks on every request and
- cannot skip at all; advanced clients may use mmap(2) to read the entire
- file into memory at once and return the entire file to libarchive as a
- single block; other clients may begin asynchronous I/O operations for the
- next block on each request.
-
- Decompression Layer
- The decompression layer not only handles decompression, it also buffers
- data so that the format handlers see a much nicer I/O model. The
- decompression API is a two-stage peek/consume model. A read_ahead request
- specifies a minimum read amount; the decompression layer must provide a
- pointer to at least that much data. If more data is immediately avail-
- able, it should return more: the format layer handles bulk data reads by
- asking for a minimum of one byte and then copying as much data as is
- available.
-
- A subsequent call to the consume() function advances the read pointer.
- Note that data returned from a read_ahead() call is guaranteed to remain
- in place until the next call to read_ahead(). Intervening calls to
- consume() should not cause the data to move.
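The peek/consume contract can be illustrated with a small buffer sketch. Everything here (struct source, struct lookahead, LA_BUF_MAX, and these read_ahead() and consume() functions) is hypothetical and standalone. It demonstrates only the two guarantees above: read_ahead() returns at least the requested minimum plus anything already buffered, and buffered bytes are relocated only inside read_ahead(), never inside consume().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LA_BUF_MAX 4096   /* hypothetical upper bound on look-ahead */

/* Hypothetical upstream: an in-memory source delivered in small chunks. */
struct source {
    const unsigned char *data;
    size_t size, pos, chunk;   /* chunk must be nonzero */
};

/* Peek/consume buffer. Bytes in buf[start..end) are not yet consumed. */
struct lookahead {
    struct source *src;
    unsigned char buf[LA_BUF_MAX];
    size_t start, end;
};

/* Return a pointer to at least `min` bytes (more if already buffered),
 * storing the available count in *avail; NULL if `min` can't be met. */
static const void *read_ahead(struct lookahead *la, size_t min, size_t *avail)
{
    if (min > LA_BUF_MAX) {
        *avail = la->end - la->start;
        return NULL;                  /* larger than this sketch can buffer */
    }
    if (la->end - la->start < min) {
        /* Compact: slide unconsumed bytes to the front. This is the only
         * place buffered data ever moves. */
        memmove(la->buf, la->buf + la->start, la->end - la->start);
        la->end -= la->start;
        la->start = 0;
        assert(la->src->chunk > 0);
        /* Refill until the minimum is met or the input ends. */
        while (la->end < min && la->src->pos < la->src->size) {
            size_t n = la->src->size - la->src->pos;
            if (n > la->src->chunk)
                n = la->src->chunk;
            if (n > LA_BUF_MAX - la->end)
                n = LA_BUF_MAX - la->end;
            memcpy(la->buf + la->end, la->src->data + la->src->pos, n);
            la->src->pos += n;
            la->end += n;
        }
    }
    *avail = la->end - la->start;
    return *avail >= min ? la->buf + la->start : NULL;
}

/* Advance past `n` bytes previously returned by read_ahead(). */
static void consume(struct lookahead *la, size_t n)
{
    assert(n <= la->end - la->start);
    la->start += n;                   /* pointer bump only; no data moves */
}
```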
-
- Skip requests must always be handled exactly. Decompression handlers
- that cannot seek forward should not register a skip handler; the API
- layer fills in a generic skip handler that reads and discards data.
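A generic read-and-discard skip of this kind can be written with nothing but read_ahead() and consume(). The struct stream_ops interface and the toy in-memory stream are assumptions for illustration, not libarchive's internal types; the key property is that each consume() is clamped to the bytes still owed, so the skip is exact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical peek/consume interface offered by a decompressor. */
struct stream_ops {
    const void *(*read_ahead)(void *ctx, size_t min, size_t *avail);
    void (*consume)(void *ctx, size_t n);
};

/* Generic skip: read and discard exactly `request` bytes. Returns the
 * number skipped, which is smaller only if the data ends first. */
static int64_t generic_skip(const struct stream_ops *ops, void *ctx,
                            int64_t request)
{
    int64_t skipped = 0;
    assert(request >= 0);
    while (skipped < request) {
        size_t avail;
        if (ops->read_ahead(ctx, 1, &avail) == NULL || avail == 0)
            break;                            /* end of data */
        uint64_t left = (uint64_t)(request - skipped);
        size_t n = left < avail ? (size_t)left : avail;
        ops->consume(ctx, n);                 /* never overshoots */
        skipped += (int64_t)n;
    }
    return skipped;
}

/* Toy in-memory stream that hands out at most `block` bytes per peek. */
struct mem_stream {
    const unsigned char *data;
    size_t size, pos, block;
};

static const void *mem_read_ahead(void *ctx, size_t min, size_t *avail)
{
    struct mem_stream *m = ctx;
    size_t left = m->size - m->pos;
    *avail = left < m->block ? left : m->block;
    return *avail >= min ? m->data + m->pos : NULL;
}

static void mem_consume(void *ctx, size_t n)
{
    struct mem_stream *m = ctx;
    m->pos += n;
}

static const struct stream_ops mem_ops = {mem_read_ahead, mem_consume};
```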
-
- A decompression handler has a specific lifecycle:
- Registration/Configuration
- When the client invokes the public support function, the decom-
- pression handler invokes the internal
- __archive_read_register_compression() function to provide bid and
- initialization functions. This function returns NULL on error or
- else a pointer to a struct decompressor_t. This structure con-
- tains a void * config slot that can be used for storing any cus-
- tomization information.
- Bid The bid function is invoked with a pointer and size of a block of
- data. The decompressor can access its config data through the
- decompressor element of the archive_read object. The bid func-
- tion is otherwise stateless. In particular, it must not perform
- any I/O operations.
-
- The value returned by the bid function indicates its suitability
- for handling this data stream. A bid of zero will ensure that
- this decompressor is never invoked. Return zero if magic number
- checks fail. Otherwise, your initial implementation should
- return the number of bits actually checked. For example, if you
- verify two full bytes and three bits of another byte, bid 19.
- Note that the initial block may be very short; be careful to only
- inspect the data you are given. (The current decompressors
- require two bytes for correct bidding.)
- Initialize
- The winning bidder will have its init function called. This
- function should initialize the remaining slots of the struct
- decompressor_t object pointed to by the decompressor element of
- the archive_read object. In particular, it should allocate any
- working data it needs in the data slot of that structure. The
- init function is called with the block of data that was used for
- tasting. At this point, the decompressor is responsible for all
- I/O requests to the client callbacks. The decompressor is free
- to read more data as and when necessary.
- Satisfy I/O requests
- The format handler will invoke the read_ahead, consume, and skip
- functions as needed.
- Finish The finish method is called only once when the archive is closed.
- It should release anything stored in the data and config slots of
- the decompressor object. It should not invoke the client close
- callback.
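As a worked example of the bits-checked bidding rule, here is a hypothetical gzip bidder. The magic bytes 0x1f 0x8b, compression method 8 (deflate), and the three reserved flag bits come from RFC 1952; the function name and signature are invented. Checking all four bytes yields 16 + 8 + 3 = 27, and each check is guarded so the bidder never reads past the block it was given.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical gzip bidder: returns the number of bits it verified,
 * or 0 if any check fails. Inspects only the `len` bytes provided. */
static int gzip_bid(const unsigned char *p, size_t len)
{
    int bits = 0;

    if (len < 2)
        return 0;              /* too short to check the magic number */
    if (p[0] != 0x1f || p[1] != 0x8b)
        return 0;              /* magic number check failed */
    bits += 16;

    if (len >= 3) {
        if (p[2] != 8)
            return 0;          /* compression method is not deflate */
        bits += 8;
    }
    if (len >= 4) {
        if (p[3] & 0xe0)
            return 0;          /* reserved FLG bits 5-7 must be zero */
        bits += 3;
    }
    return bits;
}
```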
-
- Format Layer
- The read formats have a similar lifecycle to the decompression handlers:
- Registration
- Allocate your private data and initialize your pointers.
- Bid Formats bid by invoking the read_ahead() decompression method but
- not calling the consume() method. This allows each bidder to
- look ahead in the input stream. Bidders should not look further
- ahead than necessary, as long look aheads put pressure on the
- decompression layer to buffer lots of data. Most formats only
- require a few hundred bytes of look ahead; look aheads of a few
- kilobytes are reasonable. (The ISO9660 reader sometimes looks
- ahead by 48k, which should be considered an upper limit.)
- Read header
- The header read is usually the most complex part of any format.
- There are a few strategies worth mentioning: For formats such as
- tar or cpio, reading and parsing the header is straightforward
- since headers alternate with data. For formats that store all
- header data at the beginning of the file, the first header read
- request may have to read all headers into memory and store that
- data, sorted by the location of the file data. Subsequent header
- read requests will skip forward to the beginning of the file data
- and return the corresponding header.
- Read Data
- The read data interface supports sparse files; this requires that
- each call return a block of data specifying the file offset and
- size. This may require you to carefully track the location so
- that you can return accurate file offsets for each read. Remem-
- ber that the decompressor will return as much data as it has.
- Generally, you will want to request one byte, examine the return
- value to see how much data is available, and possibly trim that
- to the amount you can use. You should invoke consume for each
- block just before you return it.
- Skip All Data
- The skip data call should skip over all file data and trailing
- padding. This is called automatically by the API layer just
- before each header read. It is also called in response to the
- client calling the public archive_read_data_skip() function.
- Cleanup
- On cleanup, the format should release all of its allocated mem-
- ory.
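The Read Data advice (request one byte, trim to what the entry still owes, consume just before returning, report an accurate offset) can be sketched as follows. The struct input stand-in, peek(), consume(), struct entry_state, and read_data() are all hypothetical. This sketch handles a dense entry, so the offset simply advances; a sparse-file reader would instead jump the offset forward across each hole.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the decompression layer's peek/consume API. */
struct input {
    const unsigned char *data;
    size_t size, pos, block;   /* block: most bytes handed out per peek */
};

static const void *peek(struct input *in, size_t min, size_t *avail)
{
    size_t left = in->size - in->pos;
    *avail = left < in->block ? left : in->block;
    return *avail >= min ? in->data + in->pos : NULL;
}

static void consume(struct input *in, size_t n)
{
    assert(n <= in->size - in->pos);
    in->pos += n;
}

/* Hypothetical per-entry state kept by a format reader. */
struct entry_state {
    int64_t remaining;   /* body bytes not yet returned */
    int64_t offset;      /* file offset of the next byte to return */
};

/* Return 1 with the next block, its size, and its file offset;
 * 0 at end of entry; -1 if the input ends early (truncated archive). */
static int read_data(struct input *in, struct entry_state *e,
                     const void **buf, size_t *size, int64_t *offset)
{
    size_t avail;
    if (e->remaining == 0)
        return 0;
    *buf = peek(in, 1, &avail);        /* ask for one byte, take what's there */
    if (*buf == NULL)
        return -1;
    if ((int64_t)avail > e->remaining)
        avail = (size_t)e->remaining;  /* trim: leave padding/next header */
    *size = avail;
    *offset = e->offset;
    e->offset += (int64_t)avail;
    e->remaining -= (int64_t)avail;
    consume(in, avail);                /* consume just before returning */
    return 1;
}
```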
-
- API Layer
- XXX to do XXX
-
-WRITE ARCHITECTURE
- The write API has a similar set of four layers: an API layer, a format
- layer, a compression layer, and an I/O layer. The registration here is
- much simpler because only one format and one compression can be regis-
- tered at a time.
-
- I/O Layer and Client Callbacks
- XXX To be written XXX
-
- Compression Layer
- XXX To be written XXX
-
- Format Layer
- XXX To be written XXX
-
- API Layer
- XXX To be written XXX
-
-WRITE_DISK ARCHITECTURE
- The write_disk API is intended to look just like the write API to
- clients. Since it does not handle multiple formats or compression, it is
- not layered internally.
-
-GENERAL SERVICES
- The archive_read, archive_write, and archive_write_disk objects all con-
- tain an initial archive object which provides common support for a set of
- standard services. (Recall that ANSI/ISO C90 guarantees that you can
- cast freely between a pointer to a structure and a pointer to the first
- member of that structure.) The archive object has a magic value that
- indicates which API this object is associated with, slots for storing
- error information, and function pointers for virtualized API functions.
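A minimal sketch of this shared-header pattern, using invented names throughout (struct base, struct reader, MAGIC_READ, generic_close()); libarchive's real structures carry more fields and different constants. It shows the first-member cast in both directions and a public function that checks the magic value before dispatching through a table of function pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical magic values, one per API. */
#define MAGIC_READ  0x1111u
#define MAGIC_WRITE 0x2222u

struct base;

/* Virtualized operations, filled in differently by each API. */
struct base_vtable {
    int (*close)(struct base *);
};

/* Common header shared by every handle type. */
struct base {
    unsigned magic;                     /* which API owns this object */
    char error_string[64];              /* last error message */
    const struct base_vtable *vtable;
};

/* A read handle: the base object must be the first member. */
struct reader {
    struct base base;
    int entries_read;                   /* read-specific state */
};

static int reader_close(struct base *b)
{
    /* Casting back is valid because base is the first member. */
    struct reader *r = (struct reader *)b;
    assert(b->magic == MAGIC_READ);
    return r->entries_read;
}

static const struct base_vtable reader_vtable = {reader_close};

static void set_error(struct base *b, const char *msg)
{
    strncpy(b->error_string, msg, sizeof b->error_string - 1);
    b->error_string[sizeof b->error_string - 1] = '\0';
}

/* Public, virtualized entry point: validate, then dispatch. */
static int generic_close(struct base *b)
{
    if (b->magic != MAGIC_READ && b->magic != MAGIC_WRITE) {
        set_error(b, "bad magic");
        return -1;
    }
    return b->vtable->close(b);
}
```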
-
-MISCELLANEOUS NOTES
- Connecting existing archiving libraries into libarchive is generally
- quite difficult. In particular, many existing libraries strongly assume
- that you are reading from a file; they seek forwards and backwards as
- necessary to locate various pieces of information. In contrast,
- libarchive never seeks backwards in its input, which sometimes requires
- very different approaches.
-
- For example, libarchive's ISO9660 support operates very differently from
- most ISO9660 readers. The libarchive support utilizes a work-queue
- design that keeps a list of known entries sorted by their location in the
- input. Whenever libarchive's ISO9660 implementation is asked for the
- next header, it checks this list to find the next item on the disk.
- Directories are parsed when they are encountered, and new items are added to
- the list. This design relies heavily on the ISO9660 image being opti-
- mized so that directories always occur earlier on the disk than the files
- they describe.
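The work-queue idea can be sketched as a list kept sorted by disk offset. The struct work_queue type, queue_add(), queue_next(), and the fixed QUEUE_MAX capacity are assumptions for illustration, not libarchive's ISO9660 code. Items that lie behind the current position are rejected, reflecting the forward-only constraint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_MAX 64   /* hypothetical fixed capacity for the sketch */

/* A pending item: where its data or directory record lives on disk. */
struct pending {
    uint64_t offset;
    int id;
};

/* Work queue kept sorted by ascending offset. */
struct work_queue {
    struct pending items[QUEUE_MAX];
    size_t count;
    uint64_t position;   /* current input position; only moves forward */
};

/* Insert an item in offset order. Returns -1 if the queue is full or the
 * item lies behind the current position (that would need a backward seek). */
static int queue_add(struct work_queue *q, uint64_t offset, int id)
{
    size_t i;
    if (q->count == QUEUE_MAX || offset < q->position)
        return -1;
    for (i = q->count; i > 0 && q->items[i - 1].offset > offset; i--)
        q->items[i] = q->items[i - 1];   /* shift larger offsets up */
    q->items[i].offset = offset;
    q->items[i].id = id;
    q->count++;
    return 0;
}

/* Remove the nearest item ahead, advance the position, return its id;
 * -1 if the queue is empty. */
static int queue_next(struct work_queue *q)
{
    struct pending first;
    size_t i;
    if (q->count == 0)
        return -1;
    first = q->items[0];
    for (i = 1; i < q->count; i++)
        q->items[i - 1] = q->items[i];
    q->count--;
    assert(first.offset >= q->position);   /* forward-only reading */
    q->position = first.offset;
    return first.id;
}
```

A production version would more likely use a heap than shift an array; the array keeps the sketch short.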
-
- Depending on the specific format, such approaches may not be possible.
- The ZIP format specification, for example, allows archivers to store key
- information only at the end of the file. In theory, it is possible to
- create ZIP archives that cannot be read without seeking. Fortunately,
- such archives are very rare, and libarchive can read most ZIP archives,
- though it cannot always extract as much information as a dedicated ZIP
- program.
-
-SEE ALSO
- archive(3), archive_entry(3), archive_read(3), archive_write(3),
- archive_write_disk(3)
-
-HISTORY
- The libarchive library first appeared in FreeBSD 5.3.
-
-AUTHORS
- The libarchive library was written by Tim Kientzle <kientzle@acm.org>.
-
-BUGS
-FreeBSD 8.0 April 16, 2007 FreeBSD 8.0