Version: | 1.0.3 |
Type: | Package |
Title: | Compression and Decompression |
Author: | Semjon Geist [aut, cre] |
License: | MIT + file LICENSE |
URL: | https://github.com/sgeist-ionos/R-zlib |
BugReports: | https://github.com/sgeist-ionos/R-zlib/issues |
Description: | The 'zlib' package for R aims to offer an R-based equivalent of 'Python's' built-in 'zlib' module for data compression and decompression. This package provides a suite of functions for working with 'zlib' compression, including utilities for compressing and decompressing data streams, manipulating compressed files, and working with 'gzip', 'zlib', and 'deflate' formats. |
Depends: | R (≥ 3.6.0) |
Imports: | Rcpp |
Suggests: | testthat (≥ 3.0.0) |
LinkingTo: | Rcpp |
Encoding: | UTF-8 |
Language: | en-US |
RoxygenNote: | 7.2.3 |
Config/testthat/edition: | 3 |
NeedsCompilation: | yes |
Packaged: | 2023-10-18 14:40:04 UTC; sgeist |
Maintainer: | Semjon Geist <mail@semjon-geist.de> |
Repository: | CRAN |
Date/Publication: | 2023-10-18 20:50:02 UTC |
Single-step compression of raw data
Description
Compresses the provided raw data in a single step.
Usage
compress(
data,
level = -1,
method = zlib$DEFLATED,
wbits = zlib$MAX_WBITS,
memLevel = zlib$DEF_MEM_LEVEL,
strategy = zlib$Z_DEFAULT_STRATEGY,
zdict = NULL
)
Arguments
data |
Raw data to be compressed. |
level |
Compression level, default is -1. |
method |
Compression method, default is zlib$DEFLATED. |
wbits |
Window bits, default is zlib$MAX_WBITS. |
memLevel |
Memory level, default is zlib$DEF_MEM_LEVEL. |
strategy |
Compression strategy, default is zlib$Z_DEFAULT_STRATEGY. |
zdict |
Optional predefined compression dictionary as a raw vector. |
Details
The compress function simplifies the compression process by encapsulating
the creation of a compression object, compressing the data, and flushing the buffer,
all within a single call. This is particularly useful when the user wants to
compress data quickly without dealing with compression objects and buffer
management. The function uses the compressobj function to handle the underlying
compression mechanics.
Value
A raw vector containing the compressed data.
Examples
compressed_data <- compress(charToRaw("some data"))
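The Details above describe compress() as a wrapper around three steps: creating a compression object, compressing, and flushing. The following sketch writes those steps out with compressobj() so the two paths can be compared. It assumes the zlib package is installed; the variable names are illustrative only.

```r
library(zlib)

payload <- charToRaw("some data")

# One call: compress() creates the object, compresses, and flushes.
one_step <- compress(payload)

# The same three steps written out with a compression object.
compressor <- compressobj()
multi_step <- c(compressor$compress(payload), compressor$flush())

# Both results should decompress back to the original bytes.
restored_one <- decompress(one_step)
restored_multi <- decompress(multi_step)
```

The single-step form is the convenient choice when all data is already in memory; the object form is the one to use when data arrives in pieces.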
Compress a Chunk of Data
Description
Compresses a given chunk of raw binary data using a pre-existing compressor object.
Usage
compress_chunk(compressorPtr, input_chunk)
Arguments
compressorPtr |
An external pointer to an existing compressor object.
This object is usually initialized by calling a different function like create_compressor(). |
input_chunk |
A raw vector containing the uncompressed data that needs to be compressed. |
Details
This function is primarily designed for use with a compressor object created by create_compressor().
It takes a chunk of raw data and compresses it, returning a raw vector of the compressed data.
Value
A raw vector containing the compressed data.
Examples
# Create a new compressor object for gzip output -> wbits = 31 (15 + 16)
zlib_compressor <- create_compressor(wbits=31)
compressed_data <- compress_chunk(zlib_compressor, charToRaw("Hello, World"))
compressed_data <- c(compressed_data, flush_compressor_buffer(zlib_compressor))
decompressed_data <- memDecompress(compressed_data, type = "gzip")
cat(rawToChar(decompressed_data))
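A compressor object is most useful when the input is fed in several pieces. The sketch below, which assumes the zlib package is installed, splits a raw vector into chunks, compresses each one with compress_chunk(), and finishes the stream with flush_compressor_buffer(). The chunk size and variable names are illustrative assumptions.

```r
library(zlib)

# Illustrative input: a repetitive string, so it compresses well.
input <- charToRaw(paste(rep("Hello, World", 500), collapse = " "))
chunk_size <- 1024  # hypothetical chunk size

# wbits = 15 (the default) produces a zlib-wrapped stream.
compressor <- create_compressor(wbits = 15L)
pieces <- list()
for (start in seq(1, length(input), by = chunk_size)) {
  end <- min(start + chunk_size - 1, length(input))
  pieces[[length(pieces) + 1]] <- compress_chunk(compressor, input[start:end])
}
# Flushing finishes the stream; without it the output is incomplete.
pieces[[length(pieces) + 1]] <- flush_compressor_buffer(compressor)
compressed <- do.call(c, pieces)

# Base R's memDecompress() reads zlib-wrapped data with type = "gzip".
restored <- memDecompress(compressed, type = "gzip")
```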
Create a Compression Object
Description
compressobj initializes a new compression object with specified parameters
and methods. The function makes use of publicEval to manage scope and encapsulation.
Usage
compressobj(
level = -1,
method = zlib$DEFLATED,
wbits = zlib$MAX_WBITS,
memLevel = zlib$DEF_MEM_LEVEL,
strategy = zlib$Z_DEFAULT_STRATEGY,
zdict = NULL
)
Arguments
level |
Compression level, default is -1. |
method |
Compression method, default is zlib$DEFLATED. |
wbits |
Window bits, default is zlib$MAX_WBITS. |
memLevel |
Memory level, default is zlib$DEF_MEM_LEVEL. |
strategy |
Compression strategy, default is zlib$Z_DEFAULT_STRATEGY. |
zdict |
Optional predefined compression dictionary as a raw vector. |
Value
Returns an environment containing the public methods compress and flush.
Methods
- compress(data): Compresses a chunk of data.
- flush(): Flushes the compression buffer.
Examples
compressor <- compressobj(level = 6)
compressed_data <- compressor$compress(charToRaw("some data"))
compressed_data <- c(compressed_data, compressor$flush())
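The level argument trades speed against output size. As a rough sketch, assuming the zlib package is installed, the following compares the fastest and the strongest level on the same input. The helper compress_at_level() is hypothetical and not part of the package.

```r
library(zlib)

# Hypothetical helper (not part of the package): compress `data` at a given level.
compress_at_level <- function(data, level) {
  compressor <- compressobj(level = level)
  c(compressor$compress(data), compressor$flush())
}

input <- charToRaw(paste(rep("some data, repeated many times", 200), collapse = "\n"))

fast <- compress_at_level(input, zlib$Z_BEST_SPEED)         # level 1
small <- compress_at_level(input, zlib$Z_BEST_COMPRESSION)  # level 9

# Compare sizes; the exact numbers depend on the input.
sizes <- c(original = length(input), fast = length(fast), small = length(small))
print(sizes)
```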
Create a new compressor object
Description
Initialize a new compressor object for zlib-based compression with specified settings.
Usage
create_compressor(
level = -1L,
method = 8L,
wbits = 15L,
memLevel = 8L,
strategy = 0L,
zdict = NULL
)
Arguments
level |
Compression level, integer between 0 and 9, or -1 for default. |
method |
Compression method. |
wbits |
Window size bits. |
memLevel |
Memory level for internal compression state. |
strategy |
Compression strategy. |
zdict |
Optional predefined compression dictionary as a raw vector. |
Value
A SEXP pointer to the new compressor object.
Examples
compressor <- create_compressor(level = 6, memLevel = 8)
Create a new decompressor object
Description
Initialize a new decompressor object for zlib-based decompression.
Usage
create_decompressor(wbits = 0L)
Arguments
wbits |
The window size bits parameter. Default is 0. |
Value
A SEXP pointer to the new decompressor object.
Examples
decompressor <- create_decompressor()
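The wbits argument mirrors the windowBits parameter of the underlying zlib C library, and the value selects the container format as well as the window size. The sketch below summarizes the ranges that the zlib manual documents for decompression. The helper describe_wbits() is hypothetical and not part of the package; it uses base R only.

```r
# Hypothetical helper, not part of the package: describe how the zlib C
# library interprets a windowBits value when decompressing.
describe_wbits <- function(wbits) {
  if (wbits == 0) {
    "zlib format, window size taken from the stream header"
  } else if (wbits >= 8 && wbits <= 15) {
    "zlib format"
  } else if (wbits >= -15 && wbits <= -8) {
    "raw deflate (no header or trailer)"
  } else if (wbits >= 24 && wbits <= 31) {
    "gzip format"
  } else if (wbits >= 40 && wbits <= 47) {
    "zlib or gzip, detected automatically"
  } else {
    "not a valid windowBits value"
  }
}

describe_wbits(0)        # the default of create_decompressor()
describe_wbits(15)       # MAX_WBITS
describe_wbits(15 + 16)  # MAX_WBITS + 16, as used for gzip in this manual
```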
Single-step decompression of raw data
Description
Decompresses the provided compressed raw data in a single step.
Usage
decompress(data, wbits = 0)
Arguments
data |
Compressed raw data to be decompressed. |
wbits |
The window size bits parameter. Default is 0. |
Details
The decompress function offers a streamlined approach to decompressing
raw data. By abstracting the creation of a decompression object, decompressing
the data, and flushing the buffer into one function call, it provides a hassle-free
way to retrieve original data from its compressed form. This function is designed
to work seamlessly with data compressed using the compress function or
any other zlib-based compression method.
Value
A raw vector containing the decompressed data.
Examples
original_data <- charToRaw("some data")
compressed_data <- compress(original_data)
decompressed_data <- decompress(compressed_data)
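The same round trip also works for gzip-wrapped data when both calls receive a matching wbits value. The following is a minimal sketch, assuming the zlib package is installed; it also checks for the two magic bytes 1f 8b with which every gzip stream begins.

```r
library(zlib)

original_data <- charToRaw("some data")

# wbits = MAX_WBITS + 16 (31) requests a gzip wrapper instead of a zlib one.
gzip_wbits <- zlib$MAX_WBITS + 16
compressed_data <- compress(original_data, wbits = gzip_wbits)

# A gzip stream starts with the magic bytes 1f 8b.
starts_with_gzip_magic <- identical(compressed_data[1:2], as.raw(c(0x1f, 0x8b)))

# Decompress with the matching wbits value.
decompressed_data <- decompress(compressed_data, wbits = gzip_wbits)
```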
Decompress a chunk of data
Description
Perform chunk-wise decompression on a given raw vector using a decompressor object.
Usage
decompress_chunk(decompressorPtr, input_chunk)
Arguments
decompressorPtr |
An external pointer to an initialized decompressor object. |
input_chunk |
A raw vector containing the compressed data chunk. |
Value
A raw vector containing the decompressed data.
Examples
rawToChar(decompress_chunk(create_decompressor(), memCompress(charToRaw("Hello, World"))))
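When the compressed input arrives in pieces, the same decompressor object can be reused for every piece. The sketch below, assuming the zlib package is installed, feeds small slices of a compressed raw vector to decompress_chunk() and collects the remaining output with flush_decompressor_buffer(). The chunk size is an arbitrary illustrative value.

```r
library(zlib)

original <- charToRaw(paste(rep("Hello, World", 500), collapse = " "))
compressed <- memCompress(original, type = "gzip")  # base R, zlib-wrapped output

decompressor <- create_decompressor()
chunk_size <- 16  # deliberately small so that several chunks are needed
pieces <- list()
for (start in seq(1, length(compressed), by = chunk_size)) {
  end <- min(start + chunk_size - 1, length(compressed))
  pieces[[length(pieces) + 1]] <- decompress_chunk(decompressor, compressed[start:end])
}
# Collect whatever output is still pending.
pieces[[length(pieces) + 1]] <- flush_decompressor_buffer(decompressor)
restored <- do.call(c, pieces)
```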
Create a new decompressor object
Description
Initializes a new decompressor object for zlib-based decompression.
Usage
decompressobj(wbits = 0)
Arguments
wbits |
The window size bits parameter. Default is 0. |
Details
The returned decompressor object has methods for performing chunk-wise decompression on compressed data using the zlib library.
Value
A decompressor object with methods for decompression.
Methods
- decompress(data): Decompresses a chunk of data.
- flush(): Flushes the decompression buffer.
Examples
compressor <- zlib$compressobj(zlib$Z_DEFAULT_COMPRESSION, zlib$DEFLATED, zlib$MAX_WBITS + 16)
compressed_data <- compressor$compress(charToRaw("some data"))
compressed_data <- c(compressed_data, compressor$flush())
decompressor <- decompressobj(zlib$MAX_WBITS + 16)
decompressed_data <- c(decompressor$decompress(compressed_data), decompressor$flush())
Flush the internal buffer of the compressor object.
Description
This function flushes the internal buffer according to the specified mode.
Usage
flush_compressor_buffer(compressorPtr, mode = 4L)
Arguments
compressorPtr |
A SEXP pointer to an existing compressor object. |
mode |
A compression flush mode. Default is Z_FINISH. Available modes are Z_NO_FLUSH, Z_PARTIAL_FLUSH, Z_SYNC_FLUSH, Z_FULL_FLUSH, Z_BLOCK, and Z_FINISH. |
Value
A raw vector containing the flushed output.
Examples
compressor <- create_compressor()
# ... (some compression actions)
flushed_data <- flush_compressor_buffer(compressor)
Flush the internal buffer of the decompressor object.
Description
This function processes all pending input and returns the remaining uncompressed output. The function uses the provided initial buffer size and dynamically expands it as necessary to ensure all remaining data is decompressed. After calling this function, the decompress_chunk() method cannot be called again on the same object.
Usage
flush_decompressor_buffer(decompressorPtr, length = 256L)
Arguments
decompressorPtr |
A SEXP pointer to an existing decompressor object. |
length |
An optional parameter that sets the initial size of the output buffer. Default is 256. |
Value
A raw vector containing the remaining uncompressed output.
Examples
decompressor <- create_decompressor()
# ... (some decompression actions)
flushed_data <- flush_decompressor_buffer(decompressor)
Evaluate Expression with Public and Private Environments
Description
publicEval creates an environment hierarchy consisting of
public, self, and private environments. The expression expr is
evaluated within these nested environments, allowing for controlled
variable scope and encapsulation.
Usage
publicEval(expr, parentEnv = parent.frame(), name = NULL)
Arguments
expr |
An expression to evaluate within the constructed environment hierarchy. |
parentEnv |
The parent environment for the new 'public' environment. Default is the parent frame. |
name |
Optional name attribute to set for the public environment. |
Value
Returns an invisible reference to the public environment.
Environments
Public: Variables in this environment are externally accessible.
Self: Inherits from Public and also contains Private and Public as children.
Private: Variables are encapsulated and are not externally accessible.
Examples
publicEnv <- publicEval({
private$hidden_var <- "I am hidden"
public_var <- "I am public"
}, parentEnv = parent.frame(), name = "MyEnvironment")
print(exists("public_var", envir = publicEnv)) # Should return TRUE
print(exists("hidden_var", envir = publicEnv)) # Should return FALSE
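Functions defined inside the expression can keep state in the private environment while remaining callable from outside. The following is a sketch of that pattern, assuming the zlib package is installed and that closures created in expr can see private in the same way the expression itself can. The counter object and its function names are purely illustrative.

```r
library(zlib)

# Illustrative counter: `count` lives in the private environment,
# while the two functions are defined in the public one.
counter <- publicEval({
  private$count <- 0
  increment <- function() {
    private$count <- private$count + 1
    invisible(private$count)
  }
  value <- function() private$count
}, name = "Counter")

counter$increment()
counter$increment()
current <- counter$value()
```

This mirrors how compressobj is described above: the returned environment exposes methods, and the internal state stays out of direct reach.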
Validate if a File is a Valid Gzip File
Description
This function takes a file path as input and checks if it's a valid gzip-compressed file.
It reads the file in chunks and tries to decompress it using the zlib library.
If any step fails, the function returns FALSE. Otherwise, it returns TRUE.
Usage
validate_gzip_file(file_path)
Arguments
file_path |
A string representing the path of the file to validate. |
Value
A boolean value indicating whether the file is a valid gzip file.
TRUE if the file is valid, FALSE otherwise.
Examples
validate_gzip_file("path/to/your/file.gz")
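The placeholder path above does not point to a real file. A self-contained sketch, assuming the zlib package is installed, can create its own test files instead: one written through base R's gzfile() connection and one plain-text file that only carries a .gz extension. The expectations in the comments follow from the Description above.

```r
library(zlib)

# Write a small gzip file with base R's gzfile() connection.
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "wb")
writeBin(charToRaw("some data"), con)
close(con)

# Write a plain-text file that merely carries a .gz extension.
fake_path <- tempfile(fileext = ".gz")
writeLines("this is not compressed", fake_path)

real_is_valid <- validate_gzip_file(gz_path)    # a real gzip file
fake_is_valid <- validate_gzip_file(fake_path)  # plain text, should be rejected
```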
zlib
Description
What My Package Offers
This package provides several key features:
- Robustness:
Built to handle even corrupted or incomplete gzip data efficiently without causing system failures.
- Demonstration:
compressed_data <- memCompress(charToRaw(paste0(rep("This is an example string. It contains more than just 'hello, world!'", 1000), collapse = ", ")))
decompressor <- zlib$decompressobj(zlib$MAX_WBITS)
rawToChar(c(decompressor$decompress(compressed_data[1:300]), decompressor$flush())) # Still working
- Compliance:
Strict adherence to the GZIP File Format Specification, ensuring compatibility across systems.
- Demonstration:
compressor <- zlib$compressobj(zlib$Z_DEFAULT_COMPRESSION, zlib$DEFLATED, zlib$MAX_WBITS + 16)
c(compressor$compress(charToRaw("Hello World")), compressor$flush()) # Correct 31 wbits (or custom wbits you provide)
# [1] 1f 8b 08 00 00 00 00 00 00 03 f3 48 cd c9 c9 57 08 cf 2f ca 49 01 00 56 b1 17 4a 0b 00 00 00
- Flexibility:
Ability to manage Gzip streams from REST APIs without the need for temporary files or other workarounds.
- Demonstration:
# Byte-Range Request and decompression in chunks
# Initialize the decompressor
decompressor <- zlib$decompressobj(zlib$MAX_WBITS + 16)
# Define the URL and initial byte ranges
url <- "https://example.com/api/data.gz"
range_start <- 0
range_increment <- 5000 # Adjust based on desired chunk size
# Placeholder for the decompressed content
decompressed_content <- character(0)
# Loop to make multiple requests and decompress chunk by chunk
for (i in 1:5) { # Adjust the loop count based on the number of chunks you want to retrieve
  range_end <- range_start + range_increment
  # Make a byte-range request
  response <- httr::GET(url, httr::add_headers(`Range` = paste0("bytes=", range_start, "-", range_end)))
  # Check if the request was successful
  if (httr::http_type(response) != "application/octet-stream" ||
      httr::http_status(response)$category != "Success") {
    stop("Failed to retrieve data.")
  }
  # Decompress the received chunk
  compressed_data <- httr::content(response, "raw")
  decompressed_chunk <- decompressor$decompress(compressed_data)
  decompressed_content <- c(decompressed_content, rawToChar(decompressed_chunk))
  # Update the byte range for the next request
  range_start <- range_end + 1
}
# Flush the decompressor after all chunks have been processed
final_data <- decompressor$flush()
decompressed_content <- c(decompressed_content, rawToChar(final_data))
In summary, while R’s built-in methods could someday catch up in functionality, the zlib package for now fills an important gap by providing a more robust and flexible way to handle compression and decompression tasks.
Usage
.onLoad(libname, pkgname)
Details
The following 'zlib' environment is generated by the .onLoad behavior for R packages.
The .onLoad function is automatically called when the package is loaded using
library() or require(). It initializes an environment that can be reached from
anywhere and is unique (i.e. cannot be overwritten), and defines a variety of
constants and methods related to the zlib compression library.
Specifically, the function assigns a new environment named "zlib" containing
constants such as DEFLATED, DEF_BUF_SIZE, MAX_WBITS,
and various flush and compression strategies like Z_FINISH,
Z_BEST_COMPRESSION, etc.
Value
No return value, called for side effect. An environment containing the zlib constants created onLoad.
Methods
- compressobj(...): Create a compression object.
- decompressobj(...): Create a decompression object.
- compress(data, ...): Compress data in a single step.
- decompress(data, ...): Decompress data in a single step.
Constants
- DEFLATED: The compression method, set to 8.
- DEF_BUF_SIZE: The default buffer size, set to 16384.
- DEF_MEM_LEVEL: Default memory level, set to 8.
- MAX_WBITS: Maximum size of the history buffer, set to 15.
- Z_BEST_COMPRESSION: Best compression level, set to 9.
- Z_BEST_SPEED: Best speed for compression, set to 1.
- Z_BLOCK: Block compression mode, set to 5.
- Z_DEFAULT_COMPRESSION: Default compression level, set to -1.
- Z_DEFAULT_STRATEGY: Default compression strategy, set to 0.
- Z_FILTERED: Filtered compression mode, set to 1.
- Z_FINISH: Finish compression mode, set to 4.
- Z_FULL_FLUSH: Full flush mode, set to 3.
- Z_HUFFMAN_ONLY: Huffman-only compression mode, set to 2.
- Z_NO_COMPRESSION: No compression, set to 0.
- Z_NO_FLUSH: No flush mode, set to 0.
- Z_PARTIAL_FLUSH: Partial flush mode, set to 1.
- Z_RLE: Run-length encoding compression mode, set to 3.
- Z_SYNC_FLUSH: Synchronized flush mode, set to 2.
- Z_TREES: Tree block compression mode, set to 6.
See Also
publicEval() for the method used to set up the public environment.
zlib_constants() for the method used to set up the constants in the environment.
https://www.zlib.net/manual.html#Constants
Examples
# Load the package
library(zlib)
# Create a temporary file
temp_file <- tempfile(fileext = ".txt")
# Generate example data and write to the temp file
example_data <- "This is an example string. It contains more than just 'hello, world!'"
writeBin(charToRaw(example_data), temp_file)
# Read data from the temp file into a raw vector
file_con <- file(temp_file, "rb")
raw_data <- readBin(file_con, "raw", file.info(temp_file)$size)
close(file_con)
# Create a Compressor object gzip
compressor <- zlib$compressobj(zlib$Z_DEFAULT_COMPRESSION, zlib$DEFLATED, zlib$MAX_WBITS + 16)
# Initialize variables for chunked compression
chunk_size <- 1024
compressed_data <- raw(0)
# Compress the data in chunks
for (i in seq(1, length(raw_data), by = chunk_size)) {
chunk <- raw_data[i:min(i + chunk_size - 1, length(raw_data))]
compressed_chunk <- compressor$compress(chunk)
compressed_data <- c(compressed_data, compressed_chunk)
}
# Flush the compressor buffer
compressed_data <- c(compressed_data, compressor$flush())
# Create a Decompressor object for gzip
decompressor <- zlib$decompressobj(zlib$MAX_WBITS + 16)
# Initialize variable for decompressed data
decompressed_data <- raw(0)
# Decompress the data in chunks
for (i in seq(1, length(compressed_data), by = chunk_size)) {
chunk <- compressed_data[i:min(i + chunk_size - 1, length(compressed_data))]
decompressed_chunk <- decompressor$decompress(chunk)
decompressed_data <- c(decompressed_data, decompressed_chunk)
}
# Flush the decompressor buffer
decompressed_data <- c(decompressed_data, decompressor$flush())
# Compress / Decompress data in a single step
original_data <- charToRaw("some data")
compressed_data <- zlib$compress(original_data,
zlib$Z_DEFAULT_COMPRESSION,
zlib$DEFLATED,
zlib$MAX_WBITS + 16)
decompressed_data <- zlib$decompress(compressed_data, zlib$MAX_WBITS + 16)
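Because wbits = MAX_WBITS + 16 produces gzip-formatted output, the bytes can be written straight to a .gz file and read back with base R. The following sketch assumes the zlib package is installed; the file path and the sample text are illustrative.

```r
library(zlib)

text <- "line one\nline two\n"
gz_bytes <- zlib$compress(charToRaw(text),
                          zlib$Z_DEFAULT_COMPRESSION,
                          zlib$DEFLATED,
                          zlib$MAX_WBITS + 16)

# Write the gzip bytes to a temporary file.
gz_path <- tempfile(fileext = ".gz")
writeBin(gz_bytes, gz_path)

# Read the file back through base R's gzfile() connection.
con <- gzfile(gz_path, "rt")
lines_read <- readLines(con)
close(con)
```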
Retrieve zlib Constants
Description
This function returns a list of constants from the zlib C library.
Usage
zlib_constants()
Details
The constants are defined as follows:
- DEFLATED: The compression method, set to 8.
- DEF_BUF_SIZE: The default buffer size, set to 16384.
- DEF_MEM_LEVEL: Default memory level, set to 8.
- MAX_WBITS: Maximum size of the history buffer, set to 15.
- Z_BEST_COMPRESSION: Best compression level, set to 9.
- Z_BEST_SPEED: Best speed for compression, set to 1.
- Z_BLOCK: Block compression mode, set to 5.
- Z_DEFAULT_COMPRESSION: Default compression level, set to -1.
- Z_DEFAULT_STRATEGY: Default compression strategy, set to 0.
- Z_FILTERED: Filtered compression mode, set to 1.
- Z_FINISH: Finish compression mode, set to 4.
- Z_FULL_FLUSH: Full flush mode, set to 3.
- Z_HUFFMAN_ONLY: Huffman-only compression mode, set to 2.
- Z_NO_COMPRESSION: No compression, set to 0.
- Z_NO_FLUSH: No flush mode, set to 0.
- Z_PARTIAL_FLUSH: Partial flush mode, set to 1.
- Z_RLE: Run-length encoding compression mode, set to 3.
- Z_SYNC_FLUSH: Synchronized flush mode, set to 2.
- Z_TREES: Tree block compression mode, set to 6.
Value
A named list of zlib constants.
Examples
constants <- zlib_constants()
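The returned list makes it possible to refer to settings by name rather than by number. As a brief sketch, assuming the zlib package is installed, the constants can be passed to the low-level functions documented above; the variable names are illustrative.

```r
library(zlib)

constants <- zlib_constants()

# Look up values by name instead of hard-coding the numbers.
gzip_wbits <- constants$MAX_WBITS + 16  # 15 + 16 = 31

compressor <- create_compressor(
  level = constants$Z_BEST_COMPRESSION,
  wbits = gzip_wbits
)
compressed <- c(
  compress_chunk(compressor, charToRaw("some data")),
  flush_compressor_buffer(compressor, mode = constants$Z_FINISH)
)
```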