Introduction: Why peek under the hood?
I have been working with Node for many years, and recently I realized that even though I use it daily, I am not actually sure what exactly happens under the hood when it comes to some of the more advanced features: file management, streams, buffers, and memory management.
Honestly, most of the time when working with Node, I do things the way I have always done, and usually AI helps me autocomplete the more advanced use cases, so things work - I really don’t have to care.
This is my small attempt to at least make things like ArrayBuffer, Buffer and TypedArray a bit clearer - even though you might not need it in your everyday work.
ArrayBuffer - raw memory
We have to start somewhere, and the best place is probably at the very beginning - the ArrayBuffer.
So what exactly is the ArrayBuffer? At its core, it is just a fixed-size chunk of binary data. An array of bytes.
A quick reminder: a byte is 8 bits, and a bit is the smallest unit of data in a computer. It can be either 0 or 1. So a byte can represent 256 different values.
Let’s have a look at how to create an ArrayBuffer in Node.js:
const buffer = new ArrayBuffer(4); // 4 bytes
This creates a new ArrayBuffer of 4 bytes, all of them initially 0. So what exactly can we do with it? Not much, actually. The ArrayBuffer is just a chunk of memory. We cannot read or write directly to it. We need to use a TypedArray or a DataView (the latter we won’t touch in this article) to access the data in the buffer.
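To make the "not much" concrete, here is a small sketch of what a bare ArrayBuffer does offer. The variable names are just for illustration. It shows that we can ask for the size and copy a range of bytes, but that indexing it like an array gives us nothing back:

```javascript
const raw = new ArrayBuffer(4); // 4 bytes, all zero

console.log(raw.byteLength); // 4
console.log(raw[0]); // undefined - an ArrayBuffer has no indexed access

// slice() copies a range of bytes into a new ArrayBuffer
const firstHalf = raw.slice(0, 2);
console.log(firstHalf.byteLength); // 2
```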
Giving Meaning to Bytes: TypedArray
The ArrayBuffer does not have any meaning on its own. Its bytes can be interpreted in many different ways. This is where the TypedArray comes into play.
So, exactly how does the TypedArray allow us to interpret things differently? We’ll dive into that, but first we’ll look into what a TypedArray is.
A TypedArray is exactly what it sounds like - an array whose elements are all of one specific type, usually some kind of number. You have probably heard of floats, doubles and ints. I won’t dive into the details of all these types; I will focus on the integers in this article.
Common Types of Integers
Some common types of integers are:
- 8-bit integer (1 byte)
- 16-bit integer (2 bytes)
- 32-bit integer (4 bytes)
- 64-bit integer (8 bytes)
The difference between these integers is the amount of memory they use, as you can see in the list. The more memory they use, the larger numbers they can represent.
- 8-bit integer can represent numbers from -128 to 127
- 16-bit integer can represent numbers from -32768 to 32767
- 32-bit integer can represent numbers from -2147483648 to 2147483647
- 64-bit integer can represent numbers from -9223372036854775808 to 9223372036854775807
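When talking about numbers, you might also hear the terms signed and unsigned. Simply put, this tells us whether the number can be negative or not. The ranges above are for signed integers. With the same number of bits, an unsigned integer gives up the negative half and reaches twice as high instead: a signed 8-bit integer goes from -128 to 127, while an unsigned 8-bit integer goes from 0 to 255.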
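The ranges above all follow from the number of bits. As a rough sketch, the helper below (integerRange is a made-up name, not a built-in) computes them. It uses BigInt so that the 64-bit values stay exact:

```javascript
// Hypothetical helper: the smallest and largest value an integer of a given size can hold
function integerRange(bits, signed) {
  const count = 2n ** BigInt(bits); // how many different values fit in that many bits
  return signed
    ? { min: -(count / 2n), max: count / 2n - 1n } // half of the values are negative
    : { min: 0n, max: count - 1n };
}

console.log(integerRange(8, true)); // { min: -128n, max: 127n }
console.log(integerRange(8, false)); // { min: 0n, max: 255n }
console.log(integerRange(64, true)); // { min: -9223372036854775808n, max: 9223372036854775807n }
```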
The TypedArray
Some of the most common TypedArray types are:
- Int8Array - 8-bit signed integer - 1 byte
- Uint8Array - 8-bit unsigned integer - 1 byte
- Int16Array - 16-bit signed integer - 2 bytes
- Uint16Array - 16-bit unsigned integer - 2 bytes
- Int32Array - 32-bit signed integer - 4 bytes
- Uint32Array - 32-bit unsigned integer - 4 bytes
- Float32Array - 32-bit float - 4 bytes
- Float64Array - 64-bit float - 8 bytes
- BigInt64Array - 64-bit signed integer - 8 bytes
- BigUint64Array - 64-bit unsigned integer - 8 bytes
There are other ones as well, but in this article we will focus mostly on the integers.
To create a TypedArray on top of an existing ArrayBuffer, we pass the ArrayBuffer to the constructor. The type is decided by which constructor we pick, for example Int8Array:
const buffer = new ArrayBuffer(4); // 4 bytes
const int8Array = new Int8Array(buffer);
console.log(int8Array); // Int8Array(4) [0, 0, 0, 0]
So What? How Does This Help Us?
The TypedArray allows us to interpret the data in the ArrayBuffer in a specific way. We can read and write through the TypedArray, and it will automatically convert the data to the correct type.
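A small sketch of what that automatic conversion looks like in practice. The values are picked only to show the edge cases: numbers that do not fit in the element type wrap around, and fractions are dropped.

```javascript
const unsignedBytes = new Uint8Array(new ArrayBuffer(2));
const signedBytes = new Int8Array(new ArrayBuffer(2));

unsignedBytes[0] = 257; // does not fit in 0..255, wraps around to 1
unsignedBytes[1] = 1.9; // the fraction is dropped, stored as 1
signedBytes[0] = 200; // does not fit in -128..127, wraps around to -56
signedBytes[1] = -1; // fits, stored as is

console.log(unsignedBytes); // Uint8Array(2) [1, 1]
console.log(signedBytes); // Int8Array(2) [-56, -1]
```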
The thing is this - based on which TypedArray we use, we can interpret the same data in different ways. The same 4 bytes can mean two completely different things based on which view we use.
Let’s compare the Int8Array and the Int32Array:
const buffer = new ArrayBuffer(4); // 4 bytes
const int8Array = new Int8Array(buffer);
const int32Array = new Int32Array(buffer);
console.log(int8Array); // Int8Array(4) [0, 0, 0, 0]
console.log(int32Array); // Int32Array(1) [0]
// You can use TypedArray to write to the buffer
int8Array[0] = 1; // [1, 0, 0, 0]
int8Array[1] = 2; // [1, 2, 0, 0]
int8Array[2] = 3; // [1, 2, 3, 0]
int8Array[3] = 4; // [1, 2, 3, 4]
console.log(int8Array); // Int8Array(4) [1, 2, 3, 4]
console.log(int32Array); // Int32Array(1) [67305985]
What? What happened here? Let’s recap.
- The smallest representable unit of data (besides a bit) is a byte.
- We created an ArrayBuffer of 4 bytes.
- We created an Int8Array and an Int32Array based on the same ArrayBuffer.
- We wrote 4 bytes to the Int8Array: [1, 2, 3, 4].
- The 8-bit array has 4 numbers.
- The 32-bit array has 1 number.
As mentioned earlier - the TypedArray helps us interpret the data in the ArrayBuffer in a specific way. The Int8Array interprets the data as 4 numbers, and the Int32Array interprets the same data as 1 number.
I won’t dive too deep into why that is, for that you need to read up on binary numbers and how they work - but just to simply summarise it:
- An 8-bit integer requires 1 byte of memory to represent a number. We have 4 bytes, so we can represent 4 numbers.
- A 32-bit integer requires 4 bytes of memory to represent a number. We have 4 bytes, so we can represent 1 number.
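For the curious, the number 67305985 can be reproduced by hand. This sketch assumes a little-endian machine (such as the x86 and ARM machines Node usually runs on), where the first byte in memory is the least significant one. On a big-endian machine the Int32Array would show a different number.

```javascript
const writtenBytes = [1, 2, 3, 4]; // the bytes we wrote through the Int8Array

// Each byte is worth 256 times more than the one before it
const combined =
  writtenBytes[0] * 256 ** 0 +
  writtenBytes[1] * 256 ** 1 +
  writtenBytes[2] * 256 ** 2 +
  writtenBytes[3] * 256 ** 3;

console.log(combined); // 67305985
console.log(combined.toString(16)); // 4030201 - the bytes 04 03 02 01 in hex
```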
Final Words on TypedArray
The TypedArray is a powerful tool that allows us to interpret the data in the ArrayBuffer in different ways. However, it is important to know that the TypedArray does not own the data. It is just a view on the data. If we change the data through the TypedArray, the data in the ArrayBuffer changes too. That is why, in the snippet above, we could use two “views” on the same memory chunk.
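A minimal sketch of that sharing, using two views of the same type so the only thing on display is the shared memory:

```javascript
const shared = new ArrayBuffer(2);
const viewA = new Uint8Array(shared);
const viewB = new Uint8Array(shared);

viewA[0] = 42; // write through one view...

console.log(viewB[0]); // 42 - ...and read it back through the other
console.log(viewA.buffer === shared); // true - the view points at the buffer, it holds no copy
```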
It is also possible to create a TypedArray directly, without creating an ArrayBuffer ourselves. This is done by passing an array to the constructor of the TypedArray, which then allocates its own ArrayBuffer behind the scenes:
const int8Array = new Int8Array([1, 2, 3, 4]); // 4 bytes
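There is still an ArrayBuffer involved here, we just did not create it ourselves. This sketch shows that, and also that the values are copied out of the plain array rather than shared with it:

```javascript
const plainArray = [1, 2, 3, 4];
const fromArray = new Int8Array(plainArray);

console.log(fromArray.buffer.byteLength); // 4 - an ArrayBuffer was allocated for us

// The values were copied, so changing the plain array does not affect the TypedArray
plainArray[0] = 99;
console.log(fromArray[0]); // 1
```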
The Buffer - Making Things Easier
What we have talked about so far is quite low-level, and I guess it might not be something you use every day. This next step might be a bit more familiar to you if you have worked with Node before - let’s talk about the Buffer.
Buffer is a class that is available globally in Node.js and allows us to work with binary data. It does sound very similar to a TypedArray, does it not? Well, it is actually built on top of the TypedArray.
The Buffer is a subclass of Uint8Array, so you could say that it is a TypedArray. The difference is that the Buffer has some additional methods that make it easier to work with binary data.
Binary data refers to data stored as a sequence of bytes, which might represent various types of information like images, audio, or even text in a specific character encoding.
The Buffer is a bit more high-level than the TypedArray, and it is easier to work with. For example, we can create a Buffer from a string, and it will automatically convert the string to binary data.
const buffer = Buffer.from("Hello World"); // 11 bytes
console.log(buffer); // <Buffer 48 65 6c 6c 6f 20 57 6f 72 6c 64>
const int8Array = new Int8Array(buffer); // 11 bytes
console.log(int8Array); // Int8Array(11) [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100]
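Since the Buffer is a subclass of Uint8Array, it can be checked and used like one. A short sketch (findIndex is one of the regular TypedArray methods, picked here only as an example):

```javascript
const greeting = Buffer.from("Hello World");

console.log(greeting instanceof Uint8Array); // true
console.log(greeting.length); // 11
console.log(greeting[0]); // 72 - the byte for "H"

// A method inherited from TypedArray: find the position of the space (byte 32)
console.log(greeting.findIndex((byte) => byte === 32)); // 5
```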
Why Buffer?
As mentioned earlier, the Buffer is a subclass of Uint8Array, so it is actually a TypedArray. This means that we can use the methods of the TypedArray on the Buffer (with a few subtle differences, for example in how slice behaves). It also means that we can get the underlying ArrayBuffer from the Buffer:
const buffer = Buffer.from("Hello World"); // 11 bytes
const arrayBuffer = buffer.buffer; // the underlying ArrayBuffer (can be larger than 11 bytes, because Node may place small Buffers in a shared pool)
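One thing to watch out for: Node may allocate small Buffers out of a larger, shared chunk of memory, so the underlying ArrayBuffer is not guaranteed to contain only our 11 bytes. A sketch of how to get a view of exactly the bytes that belong to the Buffer, using its byteOffset and byteLength:

```javascript
const hello = Buffer.from("Hello World"); // 11 bytes

// byteOffset and byteLength tell us which part of hello.buffer belongs to this Buffer
const exactView = new Uint8Array(hello.buffer, hello.byteOffset, hello.byteLength);

console.log(hello.buffer.byteLength >= hello.byteLength); // true
console.log(exactView.length); // 11
console.log(exactView[0]); // 72 - the byte for "H"
```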
These types are all tightly coupled, so why should we use the Buffer?
The Buffer was created to make working with binary data easier in Node. Besides the example above where we use it with a simple string, it also has methods for converting between bytes and strings in different encodings. It is also the default way to work with files.
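To give an idea of what those encoding methods look like, here is a short sketch using encodings that Buffer supports out of the box (utf8, hex and base64):

```javascript
const text = Buffer.from("Hello World", "utf8");

console.log(text.toString("hex")); // 48656c6c6f20576f726c64
console.log(text.toString("base64")); // SGVsbG8gV29ybGQ=

// Going the other way: decode a hex string back into bytes, then into text
console.log(Buffer.from("48656c6c6f", "hex").toString("utf8")); // Hello

// Characters and bytes are not the same thing: "é" is 1 character but 2 bytes in UTF-8
const accented = Buffer.from("\u00e9", "utf8");
console.log(accented.length); // 2
```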
Files and Buffers
When we read a file in Node, we usually get a Buffer back. This is because files are binary data, and the Buffer is the best way to work with binary data in Node.
Imagine we have a file called file.txt with the following content:
Hello World
We can read the file using the fs module in Node:
import fs from "fs";
const buffer = fs.readFileSync("file.txt");
console.log(buffer); // <Buffer 48 65 6c 6c 6f 20 57 6f 72 6c 64>
As you can see, we get a Buffer back. We can use the Buffer methods to convert it to a string:
const str = buffer.toString("utf8"); // 'Hello World'
We also get Buffers when working with streams. Streams allow us to read and write data in chunks, which is very useful when working with large files. When we read a file using a stream, we get a Buffer back for each chunk of data that is read.
import fs from "fs";
const stream = fs.createReadStream("file.txt");
stream.on("data", (chunk) => {
  console.log(chunk); // <Buffer 48 65 6c 6c 6f 20 57 6f 72 6c 64>
});
Why is this good when working with large files? Imagine you have a large file of 1GB. Loading it all at once would mean holding 1GB of 1s and 0s in memory as a single chunk, which is quite heavy. Instead, we can read it in chunks, which allows us to process it without running out of available memory.
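To see the chunking in action without a 1GB file, this sketch writes a small throwaway file to the temp directory (the file name is made up) and reads it with a deliberately tiny chunk size set through the highWaterMark option. It uses for await...of instead of the "data" event, which is just another way to consume the same stream, and it assumes the code runs as an ES module so that top-level await is available.

```javascript
import fs from "node:fs";
import os from "node:os";
import path from "node:path";

// Throwaway demo file with 1000 bytes of content
const filePath = path.join(os.tmpdir(), "buffer-demo.txt");
fs.writeFileSync(filePath, "a".repeat(1000));

// highWaterMark caps the size of each chunk (256 bytes, chosen only for the demo)
const chunkedStream = fs.createReadStream(filePath, { highWaterMark: 256 });

let chunkCount = 0;
let totalBytes = 0;
let largestChunk = 0;
for await (const chunk of chunkedStream) {
  chunkCount += 1;
  totalBytes += chunk.length; // each chunk is a Buffer
  largestChunk = Math.max(largestChunk, chunk.length);
}

console.log(`Read ${totalBytes} bytes in ${chunkCount} chunks`);
console.log(largestChunk <= 256); // true - no chunk is larger than highWaterMark

fs.unlinkSync(filePath); // clean up the demo file
```

At no point does the whole file have to sit in memory, only the chunk that is currently being processed.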
So, if we take a look at what we know now:
- We want to read a text file in Node
- We use the fs module to read it, and we receive a Buffer
- The Buffer is a subclass of Uint8Array
- The Uint8Array is a view on the ArrayBuffer
- The ArrayBuffer is a fixed-size chunk of binary data. It is just a small piece of memory.
Even though we started off small, with just a chunk of memory, we have now built up a small stack of abstractions that help us work with binary data in Node. We have the ArrayBuffer, the TypedArray, and the Buffer. Each of these abstractions helps us work with binary data in a more convenient way.
This contrived example just shows us how to work with text files, but the same principles apply to all types of files. We can read and write binary data using the Buffer, and we can use the TypedArray to interpret the data in different ways.
Final Words
Hopefully, this small introduction to the ArrayBuffer, TypedArray, and Buffer has helped you at least get a little bit more familiar with how Node works under the hood. It might not be the most useful thing to know for everyday work, but maybe it will make file operations a bit easier to understand.