Python gzip module

28/12/2020

In this lesson, we will study how we can use the Python gzip module to read and write compressed files. The biggest feature this module provides is that we can treat compressed files as normal file objects, which saves us from the complexity of managing files and their lifecycle in our code and lets us focus on the core business logic of the program. The gzip module provides almost the same features as the GNU programs gzip and gunzip.

Writing Compressed Files with open()

We will start with a basic example where we can create a gzip file and write some data into it. For this, we need to make a file and open it with write mode so that data can be inserted into it. Let’s look at a sample program with which we can write data into a gzip file:

import gzip
import io
import os

output_file = 'linxhint_demo.txt.gz'
write_mode = 'wb'

# Open the gzip file in binary write mode and wrap it so we can write text
with gzip.open(output_file, write_mode) as output:
    with io.TextIOWrapper(output, encoding='utf-8') as encode:
        encode.write('We can write anything we want to the file.\n')

print(output_file,
      'contains', os.stat(output_file).st_size, 'bytes')
# Ask the system "file" utility for the MIME type (Unix-like systems only)
os.system('file -b --mime {}'.format(output_file))

Here is what we get back with this command:

[Figure: Writing to a gzip file]

If you now look at the folder where you executed this script, there should be a new file with the name we provided in the program above.
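As a possible simplification of the program above, gzip.open can also open the file in text mode ('wt' and 'rt') and take the encoding directly, so the explicit io.TextIOWrapper is not required. The sketch below is only an illustration: the file name text_mode_demo.txt.gz and the temporary directory are assumptions chosen so that nothing is left behind on disk.

```python
import gzip
import os
import tempfile

# Hypothetical file name inside a temporary directory (not from the lesson)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'text_mode_demo.txt.gz')

    # 'wt' opens the gzip file in text mode; gzip.open handles the encoding
    with gzip.open(path, 'wt', encoding='utf-8') as output:
        output.write('We can write anything we want to the file.\n')

    # 'rt' reads the data back as str instead of bytes
    with gzip.open(path, 'rt', encoding='utf-8') as source:
        text = source.read()

print(repr(text))
```

Text mode keeps short scripts compact, while the binary mode plus wrapper approach shown earlier makes the two layers (compression and encoding) explicit.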

Writing multiple lines into a compressed file

We can also write multiple lines, or in fact any number of lines, into our gzip file in much the same way as in the previous example. To make this example different, we will use the itertools module as well. Let’s look at the sample program:

import gzip
import io
import os
import itertools

output_file = 'linxhint_demo.txt.gz'
write_mode = 'wb'

with gzip.open(output_file, write_mode) as output:
    with io.TextIOWrapper(output, encoding='utf-8') as enc:
        # writelines accepts any iterable of strings; repeat yields the line 10 times
        enc.writelines(
            itertools.repeat('LinuxHint, repeating same line!.\n', 10)
        )

# Print the decompressed content (on many Linux systems the command is zcat)
os.system('gzcat linxhint_demo.txt.gz')

Let’s see the output for this command:

[Figure: Writing multiple lines]
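Repeated lines like these are highly redundant, so they compress well. The following is a minimal sketch that compares the raw and compressed sizes in memory using gzip.compress and gzip.decompress. The repeat count of 100 is an arbitrary choice for illustration and is not taken from the lesson.

```python
import gzip
import itertools

# One line repeated many times (100 is an arbitrary, illustrative count)
lines = itertools.repeat('LinuxHint, repeating same line!.\n', 100)
raw = ''.join(lines).encode('utf-8')

# gzip.compress performs a one-shot, in-memory compression of bytes
compressed = gzip.compress(raw)

print('Raw size:', len(raw), 'bytes')
print('Compressed size:', len(compressed), 'bytes')

# Decompressing returns exactly the original bytes
restored = gzip.decompress(compressed)
print('Round trip OK:', restored == raw)
```

The exact compressed size depends on the Python and zlib versions, so it is not shown here.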

Reading Compressed Data

We can also read the compressed file we created in the last example using the gzip module, with a very simple call to the open function:

import gzip
import io

file_name = 'linxhint_demo.txt.gz'
file_mode = 'rb'

# Open the gzip file in binary read mode and decode the bytes as UTF-8 text
with gzip.open(file_name, file_mode) as input_file:
    with io.TextIOWrapper(input_file, encoding='utf-8') as dec:
        print(dec.read())

Here is what we get back with this command:

[Figure: Reading a gzip file]
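Calling read() loads the whole decompressed text at once. As an alternative sketch, the file object returned by gzip.open in text mode can be iterated line by line. The file name lines_demo.txt.gz and its three sample lines are assumptions made up for this illustration.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical demo file, written first so the example is self-contained
    path = os.path.join(tmp_dir, 'lines_demo.txt.gz')
    with gzip.open(path, 'wt', encoding='utf-8') as output:
        for number in range(1, 4):
            output.write('line {}\n'.format(number))

    # Iterating over the file yields one decompressed line at a time
    lines_read = []
    with gzip.open(path, 'rt', encoding='utf-8') as source:
        for line in source:
            lines_read.append(line.rstrip('\n'))

print(lines_read)  # ['line 1', 'line 2', 'line 3']
```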

Reading Streams

Because text files can be very large, it is smart to open them as a stream rather than loading the complete file into a single object, which occupies a lot of the system’s memory and in some cases may even cause the process to crash. Let’s look at a sample program which reads the given compressed data as a stream:

import gzip
from io import BytesIO
import binascii

mode_write = 'wb'
mode_read = 'rb'

non_compressed = b'Repeated line x times.\n' * 8
print('Non compressed Data:', len(non_compressed))
print(non_compressed)

# Compress the bytes into an in-memory buffer
buf = BytesIO()
with gzip.GzipFile(mode=mode_write, fileobj=buf) as file:
    file.write(non_compressed)

compressed = buf.getvalue()
print('Compressed Data:', len(compressed))
print(binascii.hexlify(compressed))

# Decompress by reading from a buffer that holds the compressed bytes
in_buffer = BytesIO(compressed)
with gzip.GzipFile(mode=mode_read, fileobj=in_buffer) as file:
    read_data = file.read(len(non_compressed))

print('\nReading it again:', len(read_data))
print(read_data)

Let’s see the output for this command:

[Figure: Reading gzip file in a Stream]

Although the program was a bit long, we actually just used Python modules to compress the data into an in-memory buffer, read it back through a GzipFile object, and print the content to the console.
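To keep memory use bounded for large inputs, the data can be read in fixed-size chunks instead of with one large read call. The following is a minimal sketch under stated assumptions: the chunk size of 16 bytes is deliberately tiny and purely illustrative, and the names original, chunk_size and chunks are not from the lesson.

```python
import gzip
from io import BytesIO

original = b'Repeated line x times.\n' * 8

# Compress into an in-memory buffer first, so the example is self-contained
buf = BytesIO()
with gzip.GzipFile(mode='wb', fileobj=buf) as file:
    file.write(original)

chunk_size = 16  # illustrative value; real programs would use a larger size
chunks = []
with gzip.GzipFile(mode='rb', fileobj=BytesIO(buf.getvalue())) as file:
    while True:
        chunk = file.read(chunk_size)  # returns at most chunk_size bytes
        if not chunk:                  # empty bytes signals end of stream
            break
        chunks.append(chunk)

print('Chunks read:', len(chunks))
print('Data intact:', b''.join(chunks) == original)
```

In a real program each chunk would be processed and discarded rather than collected in a list; the list here exists only to check that nothing was lost.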

Conclusion

In this lesson, we looked at how we can use the Python gzip module to compress and decompress files. The biggest feature this module provides is that we can treat compressed files as normal file objects.
