What does ‘big data’ mean to you? To most it refers to huge datasets comprising millions of data points. It has become a fundamental part of our tech-reliant culture. So far a staggering 50 billion photos have been uploaded to Instagram and 250 billion to Facebook. We’re used to hearing how machine learning leverages these increasingly large datasets to analyze trends and generate novel insights. However, the ‘big’ in ‘big data’ can also refer to the data points themselves.
Smartphones now sport 4K cameras. These have a resolution of 3840 x 2160, giving a whopping 8 million pixels per image, or roughly 24MB uncompressed at 24-bit precision. A single medical image can be even larger. A typical 3D MRI scan might have a resolution of 512 x 512 x 128, resulting in 33 million voxels (3D pixels) stored using 128MB of memory (at 32-bit precision). Does a medical doctor really draw 33 million conclusions from a single scan?
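The figures above are easy to check. Here is a minimal sketch of the arithmetic, assuming raw, uncompressed storage; the helper name `raw_size` is made up for illustration.

```python
def raw_size(shape, bits_per_element):
    """Return (element count, uncompressed size in bytes) for an array shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count, count * bits_per_element // 8

# 4K photo: 3840 x 2160 pixels at 24 bits (3 bytes) per pixel
photo_pixels, photo_bytes = raw_size((3840, 2160), 24)

# 3D MRI scan: 512 x 512 x 128 voxels at 32 bits (4 bytes) per voxel
scan_voxels, scan_bytes = raw_size((512, 512, 128), 32)

print(f"4K photo: {photo_pixels:,} pixels, {photo_bytes / 1e6:.1f} MB")
print(f"MRI scan: {scan_voxels:,} voxels, {scan_bytes / 2**20:.0f} MiB")
```

Whether the photo comes out nearer 24 or 25 "megabytes" depends on whether a megabyte is counted as 10^6 or 2^20 bytes; either way, the order of magnitude is the point.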
Ultimately, this depends on what kind of conclusions we’re interested in. For example, a doctor staging cancer from a medical scan needs to know how many tumors there are, their locations, sizes and other characteristics. This information can be written down without the need for 33 million variables. Images at higher resolutions are desirable for improved detection, yet the amount of relevant symbolic information they contain remains comparatively small.
Imagine you’re describing your dream house to a friend.
“It’s on a cliff by the sea, made with red brick, two floors, a chimney and huge windows.”
Conversation as dimensionality reduction:
Autoencoders consist of an encoder E, which maps an input x to a lower-dimensional representation Z, and a decoder D, which maps Z back to a reconstruction x̂. Typically E and D are neural networks trained so that x̂ matches x as closely as possible (under some predefined definition of ‘closeness’).
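To make the encode/decode idea concrete, here is a minimal sketch of an autoencoder in NumPy. It is deliberately simplified: the encoder and decoder are single linear layers rather than deep networks, ‘closeness’ is taken to be mean squared error, and the toy data, layer sizes, learning rate and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption): 200 points in 10 dimensions that really
# live on a 2-dimensional subspace, plus a little noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
x = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

# Encoder E: 10 -> 2, decoder D: 2 -> 10, both plain linear maps.
W_enc = 0.1 * rng.normal(size=(10, 2))
W_dec = 0.1 * rng.normal(size=(2, 10))

def reconstruction_loss(data):
    """Mean squared error between the data and its reconstruction."""
    return np.mean((data @ W_enc @ W_dec - data) ** 2)

initial_loss = reconstruction_loss(x)

learning_rate = 0.01
for step in range(2000):
    z = x @ W_enc            # encode: compress to 2 numbers per point
    x_hat = z @ W_dec        # decode: reconstruct all 10 numbers
    err = x_hat - x
    # Gradients of the mean squared error with respect to each weight matrix
    grad_dec = z.T @ err * (2 / err.size)
    grad_enc = x.T @ (err @ W_dec.T) * (2 / err.size)
    W_enc -= learning_rate * grad_enc
    W_dec -= learning_rate * grad_dec

final_loss = reconstruction_loss(x)
print(f"loss before training: {initial_loss:.4f}")
print(f"loss after training:  {final_loss:.6f}")
```

Each 10-dimensional point has to squeeze through a 2-number bottleneck, so the only way to reconstruct it well is to discover the two directions that actually matter.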
Here, speech serves as a low bandwidth medium for information transfer. You’re forced to compress or encode the house in your mind’s eye into words, which are then decoded by the listener. The hope is that the image they create resembles the original you envisaged. That your words carried the essence of what you were trying to say.
In this instance of dimensionality reduction, we assume the world and all its complexities can be adequately captured in words. Clearly, and much to the frustration of writers, we can never completely describe the world this way. We are always losing some nuance in the process and can never guarantee our words are interpreted the same way each time they’re read. No two people reading this article will have imagined the same house. A skill shared by good orators and writers is to maximize the amount of relevant information they convey in a given number of words.
However, there is a trade-off between retaining the information we care about and minimizing the size of the compressed format. Any dimensionality reduction, reconstruction or denoising technique, be it linear regression, principal component analysis or autoencoders, walks this line. Lossy compression formats such as JPEG and MP3 operate on the same principle.
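This trade-off can be seen directly with principal component analysis. The sketch below is only an illustration: the synthetic dataset, its decaying feature scales and the choices of k are assumptions. It compresses the data to k numbers per sample and measures how much is lost on reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption): 500 samples with 20 features whose
# spread shrinks from one feature to the next.
scales = 1.0 / np.arange(1, 21)
x = rng.normal(size=(500, 20)) * scales
x = x - x.mean(axis=0)  # PCA works on centered data

# PCA via SVD: the rows of vt are the principal directions,
# ordered from most to least variance explained.
_, _, vt = np.linalg.svd(x, full_matrices=False)

def reconstruction_error(k):
    """Mean squared error after keeping only the top k components."""
    components = vt[:k]            # shape (k, 20)
    codes = x @ components.T       # compressed: k numbers per sample
    x_hat = codes @ components     # decoded back to 20 numbers
    return np.mean((x - x_hat) ** 2)

sizes = (1, 2, 5, 10, 20)
errors = [reconstruction_error(k) for k in sizes]
for k, err in zip(sizes, errors):
    print(f"k = {k:2d}  reconstruction error = {err:.6f}")
```

Keeping more components always lowers the error, and keeping all 20 reproduces the data up to floating-point rounding, at which point nothing has been compressed at all.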
The benefits of compression lie as much in the process itself as in the end result. Recall that autoencoders are trained to reconstruct data. This may seem like a pointless task; what’s the use in reproducing what you already have? The key is in the presence of an information bottleneck, which for human communication is typically speech, gesticulation and body language.
Have you ever gained a better understanding of an idea through explaining it to somebody else? If we break it down, two things are happening here. Superfluous details are stripped away to reveal the concept in its simplest form, which is simultaneously reorganized to become more understandable. In machine learning the former is referred to as denoising, and the latter as disentanglement¹. Both are found to be inherent properties of autoencoders² and skilled conversationalists. Essentially, your points become more interpretable in your own mind through your attempt to make them more understandable for someone else.
It’s no secret that not every engineer enjoys conversation. Some go as far as Elon Musk, seeking to solve the “data rate issue” of human communication. Musk’s company Neuralink is developing ways to bypass speech altogether by linking mind to machine, thereby increasing the bandwidth of human communication.
This sounds great in theory, and we will never know all the benefits and consequences until we experience it for ourselves. Imagine knowing someone else’s thoughts and feelings exactly as they do. Such intimate connections might help us to see past our surface differences. But might we lose the structuring of our thoughts that conversation entails?
And if we cannot organize and interpret our own thoughts and feelings, how can we expect to be understood by, or indeed understand, someone else? Should we instead accept that we can never truly know each other’s minds?
If you know what I mean.
1. Bengio Y, Courville A, Vincent P. Representation learning: A review and new perspectives. arXiv:1206.5538 [cs]. Published online April 23, 2014. Accessed June 30, 2021. http://arxiv.org/abs/1206.5538
2. Rolinek M, Zietlow D, Martius G. Variational autoencoders pursue PCA directions (by accident). arXiv:1812.06775 [cs, stat]. Published online April 16, 2019. Accessed June 30, 2021. http://arxiv.org/abs/1812.06775