Version: 1.0.3
Type: Package
Title: Compression and Decompression
Author: Semjon Geist [aut, cre]
License: MIT + file LICENSE
URL: https://github.com/sgeist-ionos/R-zlib
BugReports: https://github.com/sgeist-ionos/R-zlib/issues
Description: The 'zlib' package for R aims to offer an R-based equivalent of 'Python's' built-in 'zlib' module for data compression and decompression. This package provides a suite of functions for working with 'zlib' compression, including utilities for compressing and decompressing data streams, manipulating compressed files, and working with 'gzip', 'zlib', and 'deflate' formats.
Depends: R (≥ 3.6.0)
Imports: Rcpp
Suggests: testthat (≥ 3.0.0)
LinkingTo: Rcpp
Encoding: UTF-8
Language: en-US
RoxygenNote: 7.2.3
Config/testthat/edition: 3
NeedsCompilation: yes
Packaged: 2023-10-18 14:40:04 UTC; sgeist
Maintainer: Semjon Geist <mail@semjon-geist.de>
Repository: CRAN
Date/Publication: 2023-10-18 20:50:02 UTC

Single-step compression of raw data

Description

Compresses the provided raw data in a single step.

Usage

compress(
  data,
  level = -1,
  method = zlib$DEFLATED,
  wbits = zlib$MAX_WBITS,
  memLevel = zlib$DEF_MEM_LEVEL,
  strategy = zlib$Z_DEFAULT_STRATEGY,
  zdict = NULL
)

Arguments

data

Raw data to be compressed.

level

Compression level: an integer from 0 to 9, or -1 (the default) to use the library's default level.

method

Compression method, default is zlib$DEFLATED.

wbits

Window bits, default is zlib$MAX_WBITS.

memLevel

Memory level, default is zlib$DEF_MEM_LEVEL.

strategy

Compression strategy, default is zlib$Z_DEFAULT_STRATEGY.

zdict

Optional predefined compression dictionary as a raw vector.

Details

The compress function simplifies the compression process by encapsulating the creation of a compression object, compressing the data, and flushing the buffer all within a single call. This is particularly useful for scenarios where the user wants to quickly compress data without dealing with the intricacies of compression objects and buffer management. The function leverages the compressobj function to handle the underlying compression mechanics.

Value

A raw vector containing the compressed data.

Examples

compressed_data <- compress(charToRaw("some data"))
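
The wbits argument also selects the container format. In the zlib C library, window-bit values up to 15 produce a zlib-wrapped stream, and adding 16 produces a gzip-wrapped stream. The following is a minimal sketch, assuming the package forwards wbits to the C library unchanged (as the gzip examples elsewhere in this manual suggest); the variable names are illustrative.

```r
library(zlib)

payload <- charToRaw("some data")

# wbits = 15 (zlib$MAX_WBITS): zlib container; the first byte is 0x78
zlib_stream <- compress(payload, wbits = zlib$MAX_WBITS)

# wbits = 15 + 16 = 31: gzip container; the first two bytes are 1f 8b
gzip_stream <- compress(payload, wbits = zlib$MAX_WBITS + 16)

print(zlib_stream[1])
print(gzip_stream[1:2])

# Decompress each stream with a matching wbits value
from_zlib <- decompress(zlib_stream)  # wbits = 0 reads the window size from the zlib header
from_gzip <- decompress(gzip_stream, wbits = zlib$MAX_WBITS + 16)
```

Inspecting the leading bytes is a quick way to tell which container a compressed raw vector uses before choosing wbits for decompression.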


Compress a Chunk of Data

Description

Compresses a given chunk of raw binary data using a pre-existing compressor object.

Usage

compress_chunk(compressorPtr, input_chunk)

Arguments

compressorPtr

An external pointer to an existing compressor object, typically created with create_compressor().

input_chunk

A raw vector containing the uncompressed data that needs to be compressed.

Details

This function is primarily designed for use with a compressor object created by create_compressor(). It takes a chunk of raw data and compresses it, returning a raw vector of the compressed data.

Value

A raw vector containing the compressed data.

Examples

# Create a new compressor object for gzip output -> wbits = 31 (15 + 16)
zlib_compressor <- create_compressor(wbits=31)
compressed_data <- compress_chunk(zlib_compressor, charToRaw("Hello, World"))
compressed_data <- c(compressed_data, flush_compressor_buffer(zlib_compressor))
decompressed_data <- memDecompress(compressed_data, type = "gzip")
cat(rawToChar(decompressed_data))

Create a Compression Object

Description

compressobj initializes a new compression object with specified parameters and methods. The function makes use of publicEval to manage scope and encapsulation.

Usage

compressobj(
  level = -1,
  method = zlib$DEFLATED,
  wbits = zlib$MAX_WBITS,
  memLevel = zlib$DEF_MEM_LEVEL,
  strategy = zlib$Z_DEFAULT_STRATEGY,
  zdict = NULL
)

Arguments

level

Compression level: an integer from 0 to 9, or -1 (the default) to use the library's default level.

method

Compression method, default is zlib$DEFLATED.

wbits

Window bits, default is zlib$MAX_WBITS.

memLevel

Memory level, default is zlib$DEF_MEM_LEVEL.

strategy

Compression strategy, default is zlib$Z_DEFAULT_STRATEGY.

zdict

Optional predefined compression dictionary as a raw vector.

Value

Returns an environment containing the public methods compress and flush.

Methods

Examples

compressor <- compressobj(level = 6)
compressed_data <- compressor$compress(charToRaw("some data"))
compressed_data <- c(compressed_data, compressor$flush())


Create a new compressor object

Description

Initialize a new compressor object for zlib-based compression with specified settings.

Usage

create_compressor(
  level = -1L,
  method = 8L,
  wbits = 15L,
  memLevel = 8L,
  strategy = 0L,
  zdict = NULL
)

Arguments

level

Compression level, integer between 0 and 9, or -1 for default.

method

Compression method.

wbits

Window size bits.

memLevel

Memory level for internal compression state.

strategy

Compression strategy.

zdict

Optional predefined compression dictionary as a raw vector.

Value

A SEXP pointer to the new compressor object.

Examples

compressor <- create_compressor(level = 6, memLevel = 8)

Create a new decompressor object

Description

Initialize a new decompressor object for zlib-based decompression.

Usage

create_decompressor(wbits = 0L)

Arguments

wbits

The window size bits parameter. Default is 0.

Value

A SEXP pointer to the new decompressor object.

Examples

decompressor <- create_decompressor()

Single-step decompression of raw data

Description

Decompresses the provided compressed raw data in a single step.

Usage

decompress(data, wbits = 0)

Arguments

data

Compressed raw data to be decompressed.

wbits

The window size bits parameter. Default is 0.

Details

The decompress function offers a streamlined approach to decompressing raw data. By abstracting the creation of a decompression object, decompressing the data, and flushing the buffer into one function call, it provides a hassle-free way to retrieve original data from its compressed form. This function is designed to work seamlessly with data compressed using the compress function or any other zlib-based compression method.

Value

A raw vector containing the decompressed data.

Examples

original_data <- charToRaw("some data")
compressed_data <- compress(original_data)
decompressed_data <- decompress(compressed_data)
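
As a rough check of the interoperability described in Details, the sketch below round-trips data between this package and base R. It relies on base R's memCompress(type = "gzip") emitting a zlib-format stream despite its name; the variable names are illustrative.

```r
library(zlib)

original <- charToRaw(strrep("interoperability check. ", 50))

# Base R -> zlib package: memCompress(type = "gzip") writes a zlib-format stream
from_base <- memCompress(original, type = "gzip")
restored_by_pkg <- decompress(from_base)

# zlib package -> base R: compress() defaults to the zlib container
from_pkg <- compress(original)
restored_by_base <- memDecompress(from_pkg, type = "gzip")

stopifnot(identical(restored_by_pkg, original))
stopifnot(identical(restored_by_base, original))
```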


Decompress a chunk of data

Description

Perform chunk-wise decompression on a given raw vector using a decompressor object.

Usage

decompress_chunk(decompressorPtr, input_chunk)

Arguments

decompressorPtr

An external pointer to an initialized decompressor object.

input_chunk

A raw vector containing the compressed data chunk.

Value

A raw vector containing the decompressed data.

Examples

rawToChar(decompress_chunk(create_decompressor(), memCompress(charToRaw("Hello, World"))))

Create a new decompressor object

Description

Initializes a new decompressor object for zlib-based decompression.

Usage

decompressobj(wbits = 0)

Arguments

wbits

The window size bits parameter. Default is 0.

Details

The returned decompressor object has methods for performing chunk-wise decompression on compressed data using the zlib library.

Value

A decompressor object with methods for decompression.

Methods

Examples

compressor <- zlib$compressobj(zlib$Z_DEFAULT_COMPRESSION, zlib$DEFLATED, zlib$MAX_WBITS + 16)
compressed_data <- compressor$compress(charToRaw("some data"))
compressed_data <- c(compressed_data, compressor$flush())
decompressor <- decompressobj(zlib$MAX_WBITS + 16)
decompressed_data <- c(decompressor$decompress(compressed_data), decompressor$flush())


Flush the internal buffer of the compressor object.

Description

This function flushes the internal buffer according to the specified mode.

Usage

flush_compressor_buffer(compressorPtr, mode = 4L)

Arguments

compressorPtr

A SEXP pointer to an existing compressor object.

mode

A compression flush mode. Default is Z_FINISH. Available modes are Z_NO_FLUSH, Z_PARTIAL_FLUSH, Z_SYNC_FLUSH, Z_FULL_FLUSH, Z_BLOCK, and Z_FINISH.

Value

A raw vector containing the flushed output.

Examples

compressor <- create_compressor()
# ... (some compression actions)
flushed_data <- flush_compressor_buffer(compressor)
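
The mode argument is an integer code. The sketch below spells out the codes as they are defined in zlib.h and passes Z_FINISH explicitly. The flush_modes vector is a hypothetical helper typed out for illustration, not something exported by the package.

```r
library(zlib)

# Flush-mode codes as defined in zlib.h (typed out here, not read from the package)
flush_modes <- c(
  Z_NO_FLUSH = 0L, Z_PARTIAL_FLUSH = 1L, Z_SYNC_FLUSH = 2L,
  Z_FULL_FLUSH = 3L, Z_FINISH = 4L, Z_BLOCK = 5L
)

compressor <- create_compressor()
first_bytes <- compress_chunk(compressor, charToRaw("some data"))

# Z_FINISH (4L, the default) terminates the stream and returns the remaining output
final_bytes <- flush_compressor_buffer(compressor, mode = flush_modes[["Z_FINISH"]])

stream <- c(first_bytes, final_bytes)
restored <- decompress(stream)
```

Because the stream is only complete after a Z_FINISH flush, the flushed bytes must always be appended to the output of compress_chunk() before decompressing.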

Flush the internal buffer of the decompressor object.

Description

This function processes all pending input and returns the remaining uncompressed output. The function uses the provided initial buffer size and dynamically expands it as necessary to ensure all remaining data is decompressed. After calling this function, the decompress_chunk() method cannot be called again on the same object.

Usage

flush_decompressor_buffer(decompressorPtr, length = 256L)

Arguments

decompressorPtr

A SEXP pointer to an existing decompressor object.

length

An optional parameter that sets the initial size of the output buffer. Default is 256.

Value

A raw vector containing the remaining uncompressed output.

Examples

decompressor <- create_decompressor()
# ... (some decompression actions)
flushed_data <- flush_decompressor_buffer(decompressor)
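
The following sketch shows how the low-level functions might be combined for chunk-wise decompression: feed the input in slices with decompress_chunk(), then call flush_decompressor_buffer() once at the end. The chunk size and variable names are arbitrary choices for illustration.

```r
library(zlib)

original <- charToRaw(strrep("chunked decompression example. ", 200))
compressed <- compress(original)  # zlib container (default wbits)

decompressor <- create_decompressor()  # wbits = 0: use the window size from the header
chunk_size <- 16L
pieces <- list()

# Feed the compressed stream to the decompressor one slice at a time
for (start in seq(1, length(compressed), by = chunk_size)) {
  stop_at <- min(start + chunk_size - 1, length(compressed))
  pieces[[length(pieces) + 1L]] <- decompress_chunk(decompressor, compressed[start:stop_at])
}

# Collect whatever output is still pending; the decompressor is finished after this
pieces[[length(pieces) + 1L]] <- flush_decompressor_buffer(decompressor)

restored <- do.call(c, pieces)
stopifnot(identical(restored, original))
```

Collecting the pieces in a list and concatenating once avoids repeatedly copying a growing raw vector inside the loop.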

Evaluate Expression with Public and Private Environments

Description

publicEval creates an environment hierarchy consisting of public, self, and private environments. The expression expr is evaluated within these nested environments, allowing for controlled variable scope and encapsulation.

Usage

publicEval(expr, parentEnv = parent.frame(), name = NULL)

Arguments

expr

An expression to evaluate within the constructed environment hierarchy.

parentEnv

The parent environment for the new 'public' environment. Default is the parent frame.

name

Optional name attribute to set for the public environment.

Value

Returns an invisible reference to the public environment.

Environments

Examples

publicEnv <- publicEval({
  private$hidden_var <- "I am hidden"
  public_var <- "I am public"
}, parentEnv = parent.frame(), name = "MyEnvironment")

print(exists("public_var", envir = publicEnv))  # Should return TRUE
print(exists("hidden_var", envir = publicEnv))  # Should return FALSE
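
To illustrate the general idea of separating public and private state with environments, here is a base-R sketch. It is not the package's publicEval implementation: make_scoped is a hypothetical helper, and the way the environments are linked here is an assumption made for the example.

```r
# Hypothetical helper, written for illustration only (not part of the zlib package)
make_scoped <- function(expr, parent_env = parent.frame()) {
  self <- new.env(parent = parent_env)          # holds the handles 'private' and 'public'
  private_env <- new.env(parent = emptyenv())   # storage only, never on the lookup chain
  public_env <- new.env(parent = self)          # code in expr is evaluated here
  self$private <- private_env
  self$public <- public_env
  eval(substitute(expr), public_env)
  invisible(public_env)
}

counter <- make_scoped({
  assign("count", 0, envir = private)           # hidden state
  increment <- function() {                     # public method
    assign("count", get("count", envir = private) + 1, envir = private)
    get("count", envir = private)
  }
})

first <- counter$increment()
second <- counter$increment()

# The method is visible on the returned environment; the counter value is not
has_method <- exists("increment", envir = counter, inherits = FALSE)
has_state <- exists("count", envir = counter, inherits = FALSE)
```

In this sketch the public method reaches the hidden state only through the private handle, so callers holding the returned environment can call increment() but cannot read or overwrite count directly from it.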


Validate if a File is a Valid Gzip File

Description

This function takes a file path as input and checks if it's a valid gzip-compressed file. It reads the file in chunks and tries to decompress it using the zlib library. If any step fails, the function returns FALSE. Otherwise, it returns TRUE.

Usage

validate_gzip_file(file_path)

Arguments

file_path

A string representing the path of the file to validate.

Value

A logical value: TRUE if the file is a valid gzip file, FALSE otherwise.

Examples

validate_gzip_file("path/to/your/file.gz")
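
A self-contained sketch of the check described above, using temporary files created with base R. The has_gzip_magic helper is hypothetical and only inspects the two gzip magic bytes, so it is a cheap pre-check rather than a substitute for full validation.

```r
library(zlib)

# Write a real gzip file with base R's gzfile() connection
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "wb")
writeBin(charToRaw("payload to validate"), con)
close(con)

# Write a plain-text file that is not gzip-compressed
txt_path <- tempfile(fileext = ".txt")
writeLines("plain text, not gzip", txt_path)

gz_is_valid <- validate_gzip_file(gz_path)
txt_is_valid <- validate_gzip_file(txt_path)

# Hypothetical helper: compare the first two bytes with the gzip magic number 1f 8b
has_gzip_magic <- function(path) {
  identical(readBin(path, "raw", n = 2L), as.raw(c(0x1f, 0x8b)))
}
gz_has_magic <- has_gzip_magic(gz_path)
txt_has_magic <- has_gzip_magic(txt_path)

unlink(c(gz_path, txt_path))
```
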

zlib

Description

What My Package Offers

This package provides several key features:

Robustness:

Built to handle even corrupted or incomplete gzip data efficiently without causing system failures.

Demonstration:
  compressed_data <- memCompress(charToRaw(paste0(rep("This is an example string. It contains more than just 'hello, world!'", 1000), collapse = ", ")))
  decompressor <- zlib$decompressobj(zlib$MAX_WBITS)
  rawToChar(c(decompressor$decompress(compressed_data[1:300]), decompressor$flush()))  # Still working
  
Compliance:

Strict adherence to the GZIP File Format Specification, ensuring compatibility across systems.

Demonstration:
  compressor <- zlib$compressobj(zlib$Z_DEFAULT_COMPRESSION, zlib$DEFLATED, zlib$MAX_WBITS + 16)
  c(compressor$compress(charToRaw("Hello World")), compressor$flush())  # Correct 31 wbits (or custom wbits you provide)
  # [1] 1f 8b 08 00 00 00 00 00 00 03 f3 48 cd c9 c9 57 08 cf 2f ca 49 01 00 56 b1 17 4a 0b 00 00 00
  
Flexibility:

Ability to manage Gzip streams from REST APIs without the need for temporary files or other workarounds.

Demonstration:
    # Byte-Range Request and decompression in chunks

    # Initialize the decompressor
    decompressor <- zlib$decompressobj(zlib$MAX_WBITS + 16)

    # Define the URL and initial byte ranges
    url <- "https://example.com/api/data.gz"
    range_start <- 0
    range_increment <- 5000  # Adjust based on desired chunk size

    # Placeholder for the decompressed content
    decompressed_content <- character(0)

    # Loop to make multiple requests and decompress chunk by chunk
    for (i in 1:5) {  # Adjust the loop count based on the number of chunks you want to retrieve
      range_end <- range_start + range_increment

      # Make a byte-range request
      response <- httr::GET(url, httr::add_headers(`Range` = paste0("bytes=", range_start, "-", range_end)))

      # Check if the request was successful
      if (httr::http_type(response) != "application/octet-stream" || httr::http_status(response)$category != "Success") {
        stop("Failed to retrieve data.")
      }

      # Decompress the received chunk
      compressed_data <- httr::content(response, "raw")
      decompressed_chunk <- decompressor$decompress(compressed_data)
      decompressed_content <- c(decompressed_content, rawToChar(decompressed_chunk))

      # Update the byte range for the next request
      range_start <- range_end + 1
    }

    # Flush the decompressor after all chunks have been processed
    final_data <- decompressor$flush()
    decompressed_content <- c(decompressed_content, rawToChar(final_data))
  

In summary, while R’s built-in methods could someday catch up in functionality, the zlib package for now fills an important gap by providing a more robust and flexible way to handle compression and decompression tasks.

Usage

.onLoad(libname, pkgname)

Details

The 'zlib' environment is generated by the package's .onLoad hook.

The .onLoad function is called automatically when the package is loaded with library() or require(). It initializes an environment that can be reached from anywhere and is unique (i.e. cannot be overwritten), and defines in it a variety of constants and methods related to the zlib compression library.

Specifically, the function assigns a new environment named "zlib" containing constants such as DEFLATED, DEF_BUF_SIZE, MAX_WBITS, and various flush and compression strategies like Z_FINISH, Z_BEST_COMPRESSION, etc.

Value

No return value; called for its side effect of creating the zlib environment, which contains the zlib constants and methods, when the package is loaded.

Methods

Constants

See Also

publicEval() for the method used to set up the public environment.

zlib_constants() for the method used to set up the constants in the environment. The constants themselves are documented at https://www.zlib.net/manual.html#Constants

Examples

# Load the package
library(zlib)
# Create a temporary file
temp_file <- tempfile(fileext = ".txt")

# Generate example data and write to the temp file
example_data <- "This is an example string. It contains more than just 'hello, world!'"
writeBin(charToRaw(example_data), temp_file)

# Read data from the temp file into a raw vector
file_con <- file(temp_file, "rb")
raw_data <- readBin(file_con, "raw", file.info(temp_file)$size)
close(file_con)
# Create a compressor object for gzip
compressor <- zlib$compressobj(zlib$Z_DEFAULT_COMPRESSION, zlib$DEFLATED, zlib$MAX_WBITS + 16)

# Initialize variables for chunked compression
chunk_size <- 1024
compressed_data <- raw(0)

# Compress the data in chunks
for (i in seq(1, length(raw_data), by = chunk_size)) {
   chunk <- raw_data[i:min(i + chunk_size - 1, length(raw_data))]
   compressed_chunk <- compressor$compress(chunk)
   compressed_data <- c(compressed_data, compressed_chunk)
}

# Flush the compressor buffer
compressed_data <- c(compressed_data, compressor$flush())


# Create a Decompressor object for gzip
decompressor <- zlib$decompressobj(zlib$MAX_WBITS + 16)

# Initialize variable for decompressed data
decompressed_data <- raw(0)

# Decompress the data in chunks
for (i in seq(1, length(compressed_data), by = chunk_size)) {
  chunk <- compressed_data[i:min(i + chunk_size - 1, length(compressed_data))]
  decompressed_chunk <- decompressor$decompress(chunk)
  decompressed_data <- c(decompressed_data, decompressed_chunk)
}

# Flush the decompressor buffer
decompressed_data <- c(decompressed_data, decompressor$flush())

# Compress / decompress data in a single step

original_data <- charToRaw("some data")
compressed_data <- zlib$compress(original_data,
                                 zlib$Z_DEFAULT_COMPRESSION,
                                 zlib$DEFLATED,
                                 zlib$MAX_WBITS + 16)
decompressed_data <- zlib$decompress(compressed_data, zlib$MAX_WBITS + 16)


Retrieve zlib Constants

Description

This function returns a list of constants from the zlib C library.

Usage

zlib_constants()

Details

The constants are defined as follows:

Value

A named list of zlib constants.

Examples

constants <- zlib_constants()