Columnar Storage is Normalization • Buttondown

Posted on April 22, 2026 by safdargal12


Something I didn’t understand for a while is that the process of turning row-oriented data into column-oriented data isn’t a totally bespoke, foreign concept in the realm of databases. It’s still part of the relational abstraction. Or can be.

As an example, say we have this data:

data = [
    { "name": "Smudge", "colour": "black" },
    { "name": "Sissel", "colour": "grey" },
    { "name": "Hamlet", "colour": "black" }
]

This represents a table in a relational database. Let’s assume it is stored on disk, so that we have to do all sorts of disk access to reach any particular part of the data. This representation has some nice properties.

It’s easy to add a new row. We can just construct one:

{ "name": "Petee", "colour": "black" }

and add it to the end of our already-existing list. On disk, we probably only have to touch a couple pages to do it. And if our row were really wide, in that it had a whole bunch of columns, that wouldn’t really change. It would still have that nice property.

This is also true of looking up a row. Since all of a row’s columns are stored next to each other, it’s very fast to just pull that row out from wherever it’s stored.
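As a minimal sketch of both operations, a Python list of dicts can stand in for the on-disk table. The variable names are illustrative only, and an in-memory list is of course an assumption that glosses over pages and disk access:

```python
# Row-oriented stand-in: each element holds every column of one row.
rows = [
    {"name": "Smudge", "colour": "black"},
    {"name": "Sissel", "colour": "grey"},
    {"name": "Hamlet", "colour": "black"},
]

# Adding a row is one append at the end, however many columns the row has.
rows.append({"name": "Petee", "colour": "black"})

# Looking up a row is one access: all of its columns come back together.
second_row = rows[1]
print(second_row)
```

Both operations touch one contiguous record, which is the property the row-oriented layout is buying.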

Conversely, if we wanted to, say, compute a histogram of the different pet colours, we would have to read quite a lot of data we don’t care about (every name, in this case) in order to do so.
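To make that cost concrete, here is a small sketch of the histogram over the row-oriented list, using `collections.Counter` from the standard library. The in-memory list is again only a stand-in for a table on disk:

```python
from collections import Counter

rows = [
    {"name": "Smudge", "colour": "black"},
    {"name": "Sissel", "colour": "grey"},
    {"name": "Hamlet", "colour": "black"},
]

# Every whole row is visited, names included, even though only
# the "colour" field contributes to the result.
histogram = Counter(row["colour"] for row in rows)
print(histogram)
```

With a really wide row, the loop would still walk past every other column just to reach the one it needs.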

This is a row-oriented representation of the data. A column-oriented representation would look something like this:

data = {
    "name": [
        "Smudge",
        "Sissel",
        "Hamlet"
    ],
    "colour": [
        "black",
        "grey",
        "black"
    ],
}

This has all the opposite tradeoffs of the row-oriented design: if we only care about colour, we can very efficiently read only that data. We don’t have to read the names at all. But modifying the data, or reading a specific row, becomes harder: both require going all over the place. If we want the second row, we have to go to the second position in each column to reconstruct the original row.
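The same three operations can be sketched against the column-oriented layout. The helper names `get_row` and `append_row` are hypothetical, introduced here only for illustration:

```python
from collections import Counter

columns = {
    "name": ["Smudge", "Sissel", "Hamlet"],
    "colour": ["black", "grey", "black"],
}

# The histogram reads the colour column and nothing else.
histogram = Counter(columns["colour"])

def get_row(columns, i):
    """Rebuild row i by visiting position i of every column."""
    return {name: values[i] for name, values in columns.items()}

def append_row(columns, row):
    """Appending a row costs one write per column."""
    for name, values in columns.items():
        values.append(row[name])

second_row = get_row(columns, 1)  # the second row lives at index 1
append_row(columns, {"name": "Petee", "colour": "black"})
```

The histogram is now the cheap operation, while the lookup and the append each have to visit every column.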

So, one way to think about this shaping of the data is that it happens at the encoding level: it lives at a level of abstraction firmly beneath that of the data model. A SQL engine on top of it can’t distinguish between the two, except via the performance characteristics of various queries.
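One way to see this is to answer the same logical query against both layouts and check that the results agree. The two function names below are hypothetical, and the query (names of black pets) is just an example chosen for this sketch:

```python
rows = [
    {"name": "Smudge", "colour": "black"},
    {"name": "Sissel", "colour": "grey"},
    {"name": "Hamlet", "colour": "black"},
]

columns = {
    "name": ["Smudge", "Sissel", "Hamlet"],
    "colour": ["black", "grey", "black"],
}

def black_pets_from_rows(rows):
    """Scan whole rows, keeping the name when the colour matches."""
    return [row["name"] for row in rows if row["colour"] == "black"]

def black_pets_from_columns(columns):
    """Walk the two columns in step, keeping the name when the colour matches."""
    pairs = zip(columns["name"], columns["colour"])
    return [name for name, colour in pairs if colour == "black"]

# Same answer either way; only the access pattern underneath differs.
assert black_pets_from_rows(rows) == black_pets_from_columns(columns)
```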

A different way to think about columnarization like this is that it’s akin to a very extreme type of database normalization.

Instead of one wide table that’s represented by a bunch of vectors of data, you might think of columnar data as a set of tables which all have a primary key plus one additional attribute:

Denormalized table:

+----+------+-----+
| id | name | age |
+----+------+-----+
| 12 | Bob  |  30 |
| 93 | Tom  |  35 |
| 27 | Kim  |  28 |
+----+------+-----+

Normalized tables:

Name

+----+------+
| id | name |
+----+------+
| 12 | Bob  |
| 93 | Tom  |
| 27 | Kim  |
+----+------+

Age

+----+-----+
| id | age |
+----+-----+
| 12 |  30 |
| 93 |  35 |
| 27 |  28 |
+----+-----+

We can easily reconstruct the original table with a join on the id column.
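As a rough sketch, that join can be written as a small hash join over lists of tuples. The age table is deliberately stored in a different order here (an assumption made for this example) to show that the join depends on the id values, not on where each row happens to sit:

```python
# Each normalized table is a list of (id, attribute) pairs.
name_table = [(12, "Bob"), (93, "Tom"), (27, "Kim")]
age_table = [(27, 28), (12, 30), (93, 35)]  # same rows, shuffled order

# Build side: index the age table by id.
age_by_id = dict(age_table)

# Probe side: look up each name row's id to rebuild the wide row.
joined = [(pk, name, age_by_id[pk]) for pk, name in name_table]

for row in joined:
    print(row)
```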

In the context of a columnar-stored table, you can think of the primary key as the ordinal position of a given piece of data.

Our original data:

data = {
    "name": [
        "Smudge",
        "Sissel",
        "Hamlet"
    ],
    "colour": [
        "black",
        "grey",
        "black"
    ],
}

Looks like this:

+----+--------+
| id |   name |
+----+--------+
|  0 | Smudge |
|  1 | Sissel |
|  2 | Hamlet |
+----+--------+

+----+--------+
| id | colour |
+----+--------+
|  0 |  black |
|  1 |   grey |
|  2 |  black |
+----+--------+

However, the id column is just implied by the position in the arrays:

+--------+
|   name |
+--------+
| Smudge |
| Sissel |
| Hamlet |
+--------+

+--------+
| colour |
+--------+
|  black |
|   grey |
|  black |
+--------+
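A short sketch of the same idea in code: making the implied id explicit with `enumerate` and joining on it gives exactly what walking the arrays in step with `zip` gives. The intermediate table names are illustrative only:

```python
columns = {
    "name": ["Smudge", "Sissel", "Hamlet"],
    "colour": ["black", "grey", "black"],
}

# Make the implied id explicit: the position in the array becomes the key.
name_table = list(enumerate(columns["name"]))
colour_table = list(enumerate(columns["colour"]))

# Join the two single-attribute tables on that id.
colour_by_id = dict(colour_table)
joined = [
    {"name": name, "colour": colour_by_id[i]}
    for i, name in name_table
]

# Because the ids are just positions, the same join is a zip.
zipped = [
    {"name": name, "colour": colour}
    for name, colour in zip(columns["name"], columns["colour"])
]

assert joined == zipped
```

Dropping the id column loses nothing, because the join key can always be recovered from the position.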

I think the value of this perspective is that it unifies a lot of traditional query-processing operations, like projections and joins, with the manipulation of data formats. Some, many, most times, you probably should think about data formats like this as an implementation detail that queries are logically blind to, but it’s a useful mental model to realize that “reconstructing a row from columnar storage” doesn’t just look like performing a join, it is a join.


