Read a normal file with a Linux syscall
The best way to start working with this function is by reading a normal file. This is the simplest way to use the syscall, and for a reason: a normal file doesn’t have as many constraints as other kinds of streams or pipes. That’s logical if you think about it: when you read the output of another application, some output has to be ready before you can read it, so you need to wait for that application to write it.
First, a key difference with the standard library: there is no buffering at all. Each time you call the read function, you call the Linux kernel, and that takes time: it’s almost instant if you call it once, but it can slow you down if you call it thousands of times per second. By comparison, the standard library buffers the input for you. So whenever you call read, you should read into a big buffer of a few kilobytes rather than a few bytes at a time, unless a few bytes is really all you need, for example when you check that a file exists and isn’t empty.
This has a benefit, however: each time you call read, you are sure to get up-to-date data, even if another application is modifying the file at the same time. This is especially useful for special files such as those in /proc or /sys.
Time to show you a real example. This C program checks whether a file is a PNG or not. To do so, it reads the file at the path you provide as a command-line argument and checks whether the first 8 bytes correspond to a PNG header.
Here’s the code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
typedef enum {
    IS_PNG,
    TOO_SHORT,
    INVALID_HEADER
} pngStatus_t;
unsigned int isSyscallSuccessful(const ssize_t readStatus) {
    return readStatus >= 0;
}
/*
 * checkPngHeader checks whether the pngFileHeader array corresponds to a PNG
 * file header.
 *
 * Currently it only checks the first 8 bytes of the array. If the array is
 * shorter than 8 bytes, TOO_SHORT is returned.
 *
 * pngFileHeaderLength must contain the length of the array. Any invalid value
 * may lead to undefined behavior, such as the application crashing.
 *
 * Returns IS_PNG if it corresponds to a PNG file header. If there are at least
 * 8 bytes in the array but it isn’t a PNG header, INVALID_HEADER is returned.
 *
 */
pngStatus_t checkPngHeader(const unsigned char* const pngFileHeader,
                           size_t pngFileHeaderLength) {
    const unsigned char expectedPngHeader[8] =
        {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    size_t i = 0;
    if (pngFileHeaderLength < sizeof(expectedPngHeader)) {
        return TOO_SHORT;
    }
    for (i = 0; i < sizeof(expectedPngHeader); i++) {
        if (pngFileHeader[i] != expectedPngHeader[i]) {
            return INVALID_HEADER;
        }
    }
    /* If it reaches here, all first 8 bytes conform to a PNG header. */
    return IS_PNG;
}
int main(int argumentLength, char *argumentList[]) {
    char *pngFileName = NULL;
    unsigned char pngFileHeader[8] = {0};
    ssize_t readStatus = 0;
    /* Linux uses a number to identify an open file. */
    int pngFile = 0;
    pngStatus_t pngCheckResult;
    if (argumentLength != 2) {
        fputs("You must call this program using isPng {your filename}.\n", stderr);
        return EXIT_FAILURE;
    }
    pngFileName = argumentList[1];
    pngFile = open(pngFileName, O_RDONLY);
    if (pngFile == -1) {
        perror("Opening the provided file failed");
        return EXIT_FAILURE;
    }
    /* Read a few bytes to identify if the file is PNG. */
    readStatus = read(pngFile, pngFileHeader, sizeof(pngFileHeader));
    if (isSyscallSuccessful(readStatus)) {
        /* Check if the file is a PNG since it got the data. */
        pngCheckResult = checkPngHeader(pngFileHeader, (size_t)readStatus);
        if (pngCheckResult == TOO_SHORT) {
            printf("The file %s isn’t a PNG file: it’s too short.\n", pngFileName);
        } else if (pngCheckResult == IS_PNG) {
            printf("The file %s is a PNG file!\n", pngFileName);
        } else {
            printf("The file %s is not in PNG format.\n", pngFileName);
        }
    } else {
        perror("Reading the file failed");
        close(pngFile);
        return EXIT_FAILURE;
    }
    /* Close the file… */
    if (close(pngFile) == -1) {
        perror("Closing the provided file failed");
        return EXIT_FAILURE;
    }
    pngFile = 0;
    return EXIT_SUCCESS;
}
See, it’s a full-blown, working, compilable example. Don’t hesitate to compile it yourself and test it; it really works. Call the program from a terminal with the path of the file to check as its only argument, as its usage message says: isPng {your filename}.
Now, let’s focus on the read call itself:
if (pngFile == -1) {
    perror("Opening the provided file failed");
    return EXIT_FAILURE;
}
/* Read a few bytes to identify if the file is PNG. */
readStatus = read(pngFile, pngFileHeader, sizeof(pngFileHeader));
The read signature is the following (extracted from Linux man-pages):
ssize_t read(int fd, void *buf, size_t count);
First, the fd argument represents the file descriptor. I explained this concept a bit in my fork article. A file descriptor is an int representing an open file, socket, pipe, FIFO or device; in short, many kinds of things where data can be read or written, generally in a stream-like way. I’ll go into more depth about that in a future article.
The open function is one of the ways to tell Linux: I want to do things with the file at that path, please find it and give me access to it. It gives you back this int called a file descriptor, and from now on, if you want to do anything with this file, you use that number. Don’t forget to call close when you’re done with the file, as in the example.
So you need to provide this special number to read. Then there’s the buf argument: here you provide a pointer to the array where read will store your data. Finally, count is the maximum number of bytes it will read.
The return value is of type ssize_t. Weird type, isn’t it? It means “signed size_t”; on typical Linux platforms it’s basically a long int. read returns the number of bytes it successfully read, or -1 if there’s a problem. You can find the exact cause of the problem in the errno global variable, declared in <errno.h>. But to print an error message, using perror is better, as it prints the message matching errno on your behalf.
With normal files, and only in this case, read will return less than count only if you have reached the file’s end; once you are at the end, it returns 0. The buf array you provide must be big enough to fit at least count bytes, or your program may crash or create a security bug.
Now, read is not only useful for normal files. If you want to feel its superpowers (yes, I know it’s not in any Marvel comics, but it has true powers), you will want to use it with other streams such as pipes or sockets. Let’s take a look at that:
Linux special files and read system call
The fact that read works with a variety of files such as pipes, sockets, FIFOs or special devices such as a disk or serial port is what makes it really powerful. With some adaptations, you can do really interesting things. Firstly, this means you can write a function that works on a file and use it with a pipe instead. That’s an interesting way to pass data without ever hitting the disk, which helps performance.
However, this brings special rules as well. Let’s compare reading a line from a terminal with reading a normal file. When you call read on a normal file, Linux only needs a few milliseconds to get the amount of data you request.
But when it comes to a terminal, that’s another story: let’s say you ask for a username. The user types their username in the terminal and presses Enter. Now you follow my advice above and call read with a big buffer, such as 256 bytes.
If read worked like it did with files, it would wait for the user to type 256 characters before returning! Your user would wait forever, and then sadly kill your application. It’s certainly not what you want, and you would have a big problem.
Okay, you could read one byte at a time, but this workaround is terribly inefficient, as I told you above. It must work better than that.
But the Linux developers designed read differently to avoid this problem:
- When you read normal files, read tries as much as possible to get count bytes, and it will actively fetch bytes from disk if needed.
- For all other file types, it returns as soon as some data is available, with at most count bytes:
  - For terminals, that’s generally when the user presses the Enter key.
  - For TCP sockets, that’s as soon as your computer receives something, no matter how many bytes it gets.
  - For FIFOs or pipes, that’s generally the same amount as what the other application wrote, but the Linux kernel can deliver less at a time if that’s more convenient.
So you can safely call read with your 2 KiB buffer without staying locked up forever. Note that read can also get interrupted if the application receives a signal. As reading from all these sources can take seconds or even hours (until the other side decides to write, after all), being interrupted by signals lets you avoid staying blocked for too long.
This also has a drawback, though: when you want to read exactly 2 KiB from these special files, you’ll need to check read’s return value and call read multiple times, because read will rarely fill your whole buffer. If your application uses signals, you’ll also need to use errno to check whether read failed with -1 because it was interrupted by a signal.
Let me show you how it can be interesting to use this special property of read:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <errno.h>
/*
 * isSignal tells if read syscall has been interrupted by a signal.
 *
 * Returns TRUE if the read syscall has been interrupted by a signal.
 *
 * Global variables: it reads errno defined in errno.h
 */
unsigned int isSignal(const ssize_t readStatus) {
    return (readStatus == -1 && errno == EINTR);
}
unsigned int isSyscallSuccessful(const ssize_t readStatus) {
    return readStatus >= 0;
}
/*
 * shouldRestartRead tells whether the read syscall has been interrupted by a
 * signal event or not. Given this "error" reason is transient, we can
 * safely restart the read call.
 *
 * Currently, it only checks if read has been interrupted by a signal, but it
 * could be improved to check if the target number of bytes was read and if it’s
 * not the case, return TRUE to read again.
 *
 */
unsigned int shouldRestartRead(const ssize_t readStatus) {
    return isSignal(readStatus);
}
/*
 * We need an empty handler as the read syscall will be interrupted only if the
 * signal is handled.
 */
void emptyHandler(int ignored) {
    (void)ignored;
}
int main(void) {
    /* Is in seconds. */
    const int alarmInterval = 5;
    /* sa_flags stays 0, so SA_RESTART is off and read gets interrupted. */
    const struct sigaction emptySigaction = { .sa_handler = emptyHandler };
    char lineBuf[256] = {0};
    ssize_t readStatus = 0;
    unsigned int waitTime = 0;
    /* Do not modify sigaction except if you exactly know what you’re doing. */
    sigaction(SIGALRM, &emptySigaction, NULL);
    alarm(alarmInterval);
    fputs("Your text:\n", stderr);
    do {
        /* Don’t forget to keep one byte for the terminating '\0'. */
        readStatus = read(STDIN_FILENO, lineBuf, sizeof(lineBuf) - 1);
        if (isSignal(readStatus)) {
            waitTime += alarmInterval;
            alarm(alarmInterval);
            fprintf(stderr, "%u secs of inactivity…\n", waitTime);
        }
    } while (shouldRestartRead(readStatus));
    if (isSyscallSuccessful(readStatus)) {
        /* Terminate the string to avoid a bug when providing it to fprintf. */
        lineBuf[readStatus] = '\0';
        fprintf(stderr, "You typed %zu chars. Here’s your string:\n%s\n", strlen(lineBuf),
                lineBuf);
    } else {
        perror("Reading from stdin failed");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
Once again, this is a full C application that you can compile and actually run.
It does the following: it reads a line from standard input. However, every 5 seconds, it prints a line telling the user that no input was given yet.
Example if I wait 23 seconds before typing “Penguin”:
Your text:
5 secs of inactivity…
10 secs of inactivity…
15 secs of inactivity…
20 secs of inactivity…
Penguin
You typed 8 chars. Here’s your string:
Penguin
That’s incredibly useful. It can be used to update the UI often, to show the progress of the read or of the processing your application is doing. It can also be used as a timeout mechanism. You could also get interrupted by any other signal that might be useful for your application. Anyway, this means your application can now be responsive instead of staying stuck forever.
So the benefits outweigh the drawback described above. If you wonder whether you should support special files in an application that normally works with normal files (which means calling read in a loop), I would say do it unless you’re in a hurry. My personal experience has often proved that replacing a file with a pipe or FIFO can make an application much more useful with little effort. There are even premade C functions on the Internet that implement that loop for you: they’re called readn functions.
Conclusion
As you can see, fread and read might look similar, but they’re not. And with only a few changes in how read works for the C developer, read is much more interesting for designing new solutions to the problems you meet during application development.
Next time, I will tell you how the write syscall works, as reading is cool, but being able to do both is much better. In the meantime, experiment with read, get to know it, and I wish you a Happy New Year!