Diffstat (limited to 'libarchive/libarchive-2.7.1/doc/text/libarchive_internals.3.txt')
| -rw-r--r-- | libarchive/libarchive-2.7.1/doc/text/libarchive_internals.3.txt | 248 |
1 files changed, 0 insertions, 248 deletions
diff --git a/libarchive/libarchive-2.7.1/doc/text/libarchive_internals.3.txt b/libarchive/libarchive-2.7.1/doc/text/libarchive_internals.3.txt
deleted file mode 100644
index e5e65bd..0000000
--- a/libarchive/libarchive-2.7.1/doc/text/libarchive_internals.3.txt
+++ /dev/null
@@ -1,248 +0,0 @@
-LIBARCHIVE(3)          FreeBSD Library Functions Manual          LIBARCHIVE(3)
-
-NAME
-     libarchive_internals -- description of libarchive internal interfaces
-
-OVERVIEW
-     The libarchive library provides a flexible interface for reading and
-     writing streaming archive files such as tar and cpio. Internally, it
-     follows a modular layered design that should make it easy to add new ar-
-     chive and compression formats.
-
-GENERAL ARCHITECTURE
-     Externally, libarchive exposes most operations through an opaque, object-
-     style interface. The archive_entry(1) objects store information about a
-     single filesystem object. The rest of the library provides facilities to
-     write archive_entry(1) objects to archive files, read them from archive
-     files, and write them to disk. (There are plans to add a facility to
-     read archive_entry(1) objects from disk as well.)
-
-     The read and write APIs each have four layers: a public API layer, a for-
-     mat layer that understands the archive file format, a compression layer,
-     and an I/O layer. The I/O layer is completely exposed to clients who can
-     replace it entirely with their own functions.
-
-     In order to provide as much consistency as possible for clients, some
-     public functions are virtualized. Eventually, it should be possible for
-     clients to open an archive or disk writer, and then use a single set of
-     code to select and write entries, regardless of the target.
-
-READ ARCHITECTURE
-     From the outside, clients use the archive_read(3) API to manipulate an
-     archive object to read entries and bodies from an archive stream. Inter-
-     nally, the archive object is cast to an archive_read object, which holds
-     all read-specific data. The API has four layers: The lowest layer is the
-     I/O layer. This layer can be overridden by clients, but most clients use
-     the packaged I/O callbacks provided, for example, by
-     archive_read_open_memory(3), and archive_read_open_fd(3). The compres-
-     sion layer calls the I/O layer to read bytes and decompresses them for
-     the format layer. The format layer unpacks a stream of uncompressed
-     bytes and creates archive_entry objects from the incoming data. The API
-     layer tracks overall state (for example, it prevents clients from reading
-     data before reading a header) and invokes the format and compression
-     layer operations through registered function pointers. In particular,
-     the API layer drives the format-detection process: When opening the ar-
-     chive, it reads an initial block of data and offers it to each registered
-     compression handler. The one with the highest bid is initialized with
-     the first block. Similarly, the format handlers are polled to see which
-     handler is the best for each archive. (Prior to 2.4.0, the format bid-
-     ders were invoked for each entry, but this design hindered error recov-
-     ery.)
-
-   I/O Layer and Client Callbacks
-     The read API goes to some lengths to be nice to clients. As a result,
-     there are few restrictions on the behavior of the client callbacks.
-
-     The client read callback is expected to provide a block of data on each
-     call. A zero-length return does indicate end of file, but otherwise
-     blocks may be as small as one byte or as large as the entire file. In
-     particular, blocks may be of different sizes.
-
-     The client skip callback returns the number of bytes actually skipped,
-     which may be much smaller than the skip requested. The only requirement
-     is that the skip not be larger. In particular, clients are allowed to
-     return zero for any skip that they don't want to handle. The skip call-
-     back must never be invoked with a negative value.
-
-     Keep in mind that not all clients are reading from disk: clients reading
-     from networks may provide different-sized blocks on every request and
-     cannot skip at all; advanced clients may use mmap(2) to read the entire
-     file into memory at once and return the entire file to libarchive as a
-     single block; other clients may begin asynchronous I/O operations for the
-     next block on each request.
-
-   Decompresssion Layer
-     The decompression layer not only handles decompression, it also buffers
-     data so that the format handlers see a much nicer I/O model. The decom-
-     pression API is a two stage peek/consume model. A read_ahead request
-     specifies a minimum read amount; the decompression layer must provide a
-     pointer to at least that much data. If more data is immediately avail-
-     able, it should return more: the format layer handles bulk data reads by
-     asking for a minimum of one byte and then copying as much data as is
-     available.
-
-     A subsequent call to the consume() function advances the read pointer.
-     Note that data returned from a read_ahead() call is guaranteed to remain
-     in place until the next call to read_ahead(). Intervening calls to
-     consume() should not cause the data to move.
-
-     Skip requests must always be handled exactly. Decompression handlers
-     that cannot seek forward should not register a skip handler; the API
-     layer fills in a generic skip handler that reads and discards data.
-
-     A decompression handler has a specific lifecycle:
-     Registration/Configuration
-             When the client invokes the public support function, the decom-
-             pression handler invokes the internal
-             __archive_read_register_compression() function to provide bid and
-             initialization functions. This function returns NULL on error or
-             else a pointer to a struct decompressor_t. This structure con-
-             tains a void * config slot that can be used for storing any cus-
-             tomization information.
-     Bid     The bid function is invoked with a pointer and size of a block of
-             data. The decompressor can access its config data through the
-             decompressor element of the archive_read object. The bid func-
-             tion is otherwise stateless. In particular, it must not perform
-             any I/O operations.
-
-             The value returned by the bid function indicates its suitability
-             for handling this data stream. A bid of zero will ensure that
-             this decompressor is never invoked. Return zero if magic number
-             checks fail. Otherwise, your initial implementation should
-             return the number of bits actually checked. For example, if you
-             verify two full bytes and three bits of another byte, bid 19.
-             Note that the initial block may be very short; be careful to only
-             inspect the data you are given. (The current decompressors
-             require two bytes for correct bidding.)
-     Initialize
-             The winning bidder will have its init function called. This
-             function should initialize the remaining slots of the struct
-             decompressor_t object pointed to by the decompressor element of
-             the archive_read object. In particular, it should allocate any
-             working data it needs in the data slot of that structure. The
-             init function is called with the block of data that was used for
-             tasting. At this point, the decompressor is responsible for all
-             I/O requests to the client callbacks. The decompressor is free
-             to read more data as and when necessary.
-     Satisfy I/O requests
-             The format handler will invoke the read_ahead, consume, and skip
-             functions as needed.
-     Finish  The finish method is called only once when the archive is closed.
-             It should release anything stored in the data and config slots of
-             the decompressor object. It should not invoke the client close
-             callback.
-
-   Format Layer
-     The read formats have a similar lifecycle to the decompression handlers:
-     Registration
-             Allocate your private data and initialize your pointers.
-     Bid     Formats bid by invoking the read_ahead() decompression method but
-             not calling the consume() method. This allows each bidder to
-             look ahead in the input stream. Bidders should not look further
-             ahead than necessary, as long look aheads put pressure on the
-             decompression layer to buffer lots of data. Most formats only
-             require a few hundred bytes of look ahead; look aheads of a few
-             kilobytes are reasonable. (The ISO9660 reader sometimes looks
-             ahead by 48k, which should be considered an upper limit.)
-     Read header
-             The header read is usually the most complex part of any format.
-             There are a few strategies worth mentioning: For formats such as
-             tar or cpio, reading and parsing the header is straightforward
-             since headers alternate with data. For formats that store all
-             header data at the beginning of the file, the first header read
-             request may have to read all headers into memory and store that
-             data, sorted by the location of the file data. Subsequent header
-             read requests will skip forward to the beginning of the file data
-             and return the corresponding header.
-     Read Data
-             The read data interface supports sparse files; this requires that
-             each call return a block of data specifying the file offset and
-             size. This may require you to carefully track the location so
-             that you can return accurate file offsets for each read. Remem-
-             ber that the decompressor will return as much data as it has.
-             Generally, you will want to request one byte, examine the return
-             value to see how much data is available, and possibly trim that
-             to the amount you can use. You should invoke consume for each
-             block just before you return it.
-     Skip All Data
-             The skip data call should skip over all file data and trailing
-             padding. This is called automatically by the API layer just
-             before each header read. It is also called in response to the
-             client calling the public data_skip() function.
-     Cleanup
-             On cleanup, the format should release all of its allocated mem-
-             ory.
-
-   API Layer
-     XXX to do XXX
-
-WRITE ARCHITECTURE
-     The write API has a similar set of four layers: an API layer, a format
-     layer, a compression layer, and an I/O layer. The registration here is
-     much simpler because only one format and one compression can be regis-
-     tered at a time.
-
-   I/O Layer and Client Callbacks
-     XXX To be written XXX
-
-   Compression Layer
-     XXX To be written XXX
-
-   Format Layer
-     XXX To be written XXX
-
-   API Layer
-     XXX To be written XXX
-
-WRITE_DISK ARCHITECTURE
-     The write_disk API is intended to look just like the write API to
-     clients. Since it does not handle multiple formats or compression, it is
-     not layered internally.
-
-GENERAL SERVICES
-     The archive_read, archive_write, and archive_write_disk objects all con-
-     tain an initial archive object which provides common support for a set of
-     standard services. (Recall that ANSI/ISO C90 guarantees that you can
-     cast freely between a pointer to a structure and a pointer to the first
-     element of that structure.) The archive object has a magic value that
-     indicates which API this object is associated with, slots for storing
-     error information, and function pointers for virtualized API functions.
-
-MISCELLANEOUS NOTES
-     Connecting existing archiving libraries into libarchive is generally
-     quite difficult. In particular, many existing libraries strongly assume
-     that you are reading from a file; they seek forwards and backwards as
-     necessary to locate various pieces of information. In contrast,
-     libarchive never seeks backwards in its input, which sometimes requires
-     very different approaches.
-
-     For example, libarchive's ISO9660 support operates very differently from
-     most ISO9660 readers. The libarchive support utilizes a work-queue
-     design that keeps a list of known entries sorted by their location in the
-     input. Whenever libarchive's ISO9660 implementation is asked for the
-     next header, checks this list to find the next item on the disk. Direc-
-     tories are parsed when they are encountered and new items are added to
-     the list. This design relies heavily on the ISO9660 image being opti-
-     mized so that directories always occur earlier on the disk than the files
-     they describe.
-
-     Depending on the specific format, such approaches may not be possible.
-     The ZIP format specification, for example, allows archivers to store key
-     information only at the end of the file. In theory, it is possible to
-     create ZIP archives that cannot be read without seeking. Fortunately,
-     such archives are very rare, and libarchive can read most ZIP archives,
-     though it cannot always extract as much information as a dedicated ZIP
-     program.
-
-SEE ALSO
-     archive(3), archive_entry(3), archive_read(3), archive_write(3),
-     archive_write_disk(3)
-
-HISTORY
-     The libarchive library first appeared in FreeBSD 5.3.
-
-AUTHORS
-     The libarchive library was written by Tim Kientzle <kientzle@acm.org>.
-
-BUGS
-FreeBSD 8.0                    April 16, 2007                    FreeBSD 8.0
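The two-stage peek/consume model and the bit-count bidding rule described in the deleted manual page can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `struct peek_buf`, `peek_read_ahead`, `peek_consume`, `gzip_like_bid`, and `read_bulk` are hypothetical names, not libarchive's internal API, and the buffer is assumed to hold an in-memory, already decompressed stream. The only outside fact relied on is the two-byte gzip magic number 0x1f 0x8b.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a decompression-layer buffer
 * (not libarchive's real structure). */
struct peek_buf {
    const unsigned char *data; /* the whole in-memory stream */
    size_t len;                /* total number of bytes in data */
    size_t pos;                /* current read pointer */
};

/* Peek: return a pointer to at least `min` bytes without advancing.
 * *avail receives everything currently available, which may exceed
 * `min`. Returns NULL when fewer than `min` bytes remain. */
static const unsigned char *
peek_read_ahead(struct peek_buf *b, size_t min, size_t *avail)
{
    size_t remaining = b->len - b->pos;

    *avail = remaining;
    if (remaining < min)
        return NULL;
    return b->data + b->pos;
}

/* Consume: advance the read pointer past bytes already peeked.
 * The data itself does not move. */
static size_t
peek_consume(struct peek_buf *b, size_t n)
{
    assert(n <= b->len - b->pos); /* never consume more than was offered */
    b->pos += n;
    return n;
}

/* Bid: stateless, performs no I/O, inspects only the bytes it is given.
 * Returns the number of bits verified (16 for two magic bytes), or 0
 * when the check fails or the block is too short. */
static int
gzip_like_bid(const unsigned char *block, size_t size)
{
    if (size < 2)
        return 0;
    if (block[0] != 0x1f || block[1] != 0x8b)
        return 0;
    return 16;
}

/* Bulk read as the format layer would do it: ask for a minimum of one
 * byte, trim what is available to the caller's capacity, copy, then
 * consume exactly what was copied. */
static size_t
read_bulk(struct peek_buf *b, unsigned char *out, size_t cap)
{
    size_t avail;
    const unsigned char *p = peek_read_ahead(b, 1, &avail);

    if (p == NULL)
        return 0;
    if (avail > cap)
        avail = cap;
    memcpy(out, p, avail);
    peek_consume(b, avail);
    return avail;
}
```

In this sketch a bidder calls `peek_read_ahead` and never `peek_consume`, so repeated peeks see the same bytes. Only the winning reader consumes.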
