I often find myself explaining the same things in real life and online, so I recently started writing technical blog posts.
This one is about why it was a mistake to call 1024 bytes a kilobyte. It’s about a 20min read so thank you very much in advance if you find the time to read it.
Feedback is very much welcome. Thank you.
Here’s my favorite part.
“In addition, the conversions were sometimes not even self-consistent and applied completely arbitrarily. The 3½-inch floppy disk for example, which was marketed as “1.44 MB”, was actually not 1.44 MB and also not 1.44 MiB. The size of the double-sided, high-density 3½-inch floppy was 512 bytes per sector, 18 sectors per track, 160 tracks, that’s 512×18×160 = 1’474’560 bytes. To get to “1.44” you must first divide 1’474’560 by 1024 (“bEcAuSE BiNaRY obviously”) to get 1440 and then divide by 1000 for perfect inconsistency, because dividing by 1024 again would get you an ugly number and we definitely don’t want that. We finally end up with “1.44”. Now let’s add “MB” because why the heck not. We already abused those units so much it’s not like they still mean anything and it’s “close enough” anyways. By the way, that “close enough” excuse never worked when I was in school but what would I know compared to the computer “scientists” back then.
When things get that messy, numbers don’t even mean anything any more. Might as well just label the products using entirely qualitative terms like “big” or “bigger”.”
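The arithmetic in that quote can be checked in a few lines. This is only a sketch: the constant names are mine, and the geometry (512 bytes per sector, 18 sectors per track, 160 tracks in total) is taken from the quote itself.

```python
# Geometry of the 3.5-inch high-density floppy as given in the quote.
BYTES_PER_SECTOR = 512
SECTORS_PER_TRACK = 18
TRACKS = 160  # total over both sides

total_bytes = BYTES_PER_SECTOR * SECTORS_PER_TRACK * TRACKS

decimal_mb = total_bytes / 1000**2        # consistently decimal: megabytes
binary_mib = total_bytes / 1024**2        # consistently binary: mebibytes
marketing_mb = total_bytes / 1024 / 1000  # mixed divisors behind the "1.44" label

print(total_bytes)
print(decimal_mb, binary_mib, marketing_mb)
```

Only the mixed conversion lands on 1.44; the two consistent conversions give roughly 1.47 MB and 1.41 MiB.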
❤️ Thank you for taking the time to read it.
The mistake is thinking that a 1000 byte file takes up 1000 bytes on any storage medium. The mistake is thinking that it even matters if a kB means 1000 or 1024 bytes. It only matters for some programmers, and to those 1024 is the number that matters.
Disregarding reality in favor of pedantry is the real mistake.
I dunno, it makes up a few gigabytes of lost storage on a terabyte hard drive.
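To put a number on that gap, here is a rough sketch (the function name and prefix labels are my own) of how far the decimal and binary readings drift apart at each prefix step, and what a drive sold as 1 TB looks like when a tool reports it in binary units.

```python
# Decimal (SI) versus binary (IEC) value for each prefix step.
PREFIXES = ["kilo/kibi", "mega/mebi", "giga/gibi", "tera/tebi"]

def shortfall_percent(step: int) -> float:
    """How much smaller 1000**step is than 1024**step, in percent."""
    return (1 - 1000**step / 1024**step) * 100

for step, name in enumerate(PREFIXES, start=1):
    print(f"{name}: {shortfall_percent(step):.2f}%")

# A drive sold as 1 TB (10**12 bytes), expressed in GiB.
one_tb_in_gib = 10**12 / 2**30
print(f"{one_tb_in_gib:.2f} GiB")
```

The difference is about 2.3% at the kilo step and grows to roughly 9% at the tera step, so on a terabyte drive it amounts to tens of gigabytes in the displayed figure.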
A lot of people are replying as if OP asked a question. It’s a link to a blog post explaining why a kilobyte is 1000 and not 1024 bytes (exactly as the title says!). OP knows the answer, in fact they know it so well they wrote an extensive post about it.
Thank you for the write up! You should re-check the spelling and grammar as some sections had some troubles. I have a sentence I need to go to the post to get, so let me edit this later!
Edit: the second half of this sentence is a mess: “The factors don’t solely consist of twos, but ten are certainly lot of them.” Otherwise nothing jumped out at me but I would reread it just in case!
OP asked for feedback.
A lot of people are replying as if OP asked a question.
I think part of that is because outgoing links without a preview image are really easy to confuse with text-only posts, particularly because Reddit didn’t allow adding both a text and a link simultaneously. Though in this case the text should’ve tipped people off that there’s a link as well.
As for the actual topic, I agree with OP. I often forget to do it right when speaking, but I try to at least get it right when writing.
Thank you very much. I’ll try to fix that sentence later. I’m not a native speaker so it’s not always obvious to me when a sentence doesn’t sound right, even though I pass sentences I’m not sure about through spell checks, the MS Word grammar check and ChatGPT 🤣
I suggest considering this from a linguistic perspective rather than a technical perspective.
For years (decades, even), KB, MB, GB, etc. were broadly used to mean 2^10, 2^20, 2^30, etc. Throughout the 80s and 90s, the only place you would likely see base-10 units was in marketing materials, such as those for storage media and modems. Mac OS exclusively used base-2 definitions well into the 21st century. Windows, as noted in the article, still does. Many Unix/POSIX tools do, as well, and this is unlikely to change.
I will spare you my full rant on the evils of linguistic prescriptivism. Suffice it to say that I am a born-again descriptivist, fully recovered from my past affliction.
From a descriptivist perspective, the only accurate way to define kilobyte, megabyte, etc. is to say that there are two common usages. This is what you will see if you look up the words in any decent dictionary. e.g.:
- https://www.dictionary.com/browse/kilobyte
- https://www.merriam-webster.com/dictionary/kilobyte
- https://en.wiktionary.org/wiki/kilobyte
I don’t recall ever seeing KiB/MiB/etc. in the 90s, although Wikipedia tells me they “were defined in 1999 by the International Electrotechnical Commission (IEC), in the IEC 60027-2 standard”.
While I wholeheartedly agree with the goal of eliminating ambiguity, I am frustrated with the half-measure of introducing unambiguous terms on one side (KiB, MiB, etc.) while failing to do the same on the other. The introduction of new terms has no bearing on the common usage of old terms. The correct thing to have done would have been to introduce two new unambiguous terms, with the goal of retiring KB/MB/etc. from common usage entirely. If we had KiB and KeB, there’d be no ambiguity. KB will always have ambiguity because that’s language, baby! regardless of any prescriptivist’s opinion on the matter.
Sadly, even that would do nothing to solve the use of common single-letter abbreviations. For example, Linux’s ls -l -h command will return sizes like 1K, 1M, 1G, referring to the base-2 definitions. Only if you specify the non-default --si flag will you receive base-10 values (again with just the first letter!). Many other standard tools have no such options and will exclusively use base-2 numbers.
I was confused when I just read the headline. Should be “Why I (that would be you not me) think a kilobyte should be 1000 instead of 1024”. Unpopular opinion would be a better sub for it.
You should read the blog post. It’s not a matter of option.
Just because you wrote about a topic doesn’t mean you’re suddenly the authority figure lol.
I know there is no option as 1024 is what the standard is now. I’m not reading that any more than someone saying how a red light really means go.
I know it’s already been explained but here is a visualization of why.
1 2 4 8 16 32 64 128 256 512 1024
Did you read the blog post? If you don’t find the time you should at least read “(Un)lucky coincidence” to see why it’s not (and never was) a bright idea to call 1024 “a kilo”.
Dude you’re pretty condescending for a new author on an old topic.
Yeah I read it and it’s very over worded.
1024 was the closest binary approximation of 1000 so that became the standard measurement. Then drive manufacturers decided to start using decimal for capacity because it was a great way to make numbers look better.
Then the IEC decided “enough of this confusion” and created binary naming standards (kibi gibi etc…) and enforced the standard decimal quantity values for standard names like kilo-.
It’s not ground breaking news and your constant arguing with people in the thread paints you as quite immature. Especially when plenty of us remember the whole story BECAUSE WE LIVED IT AS IT PROFESSIONALS.
We lacked a standard, a system was created. It was later changed to match global standard values.
You portray it with emotive language making decisions out to be stupid, or malicious. A decision was made that was perfectly sensible at the time. It was then improved. Some people have trouble with change.
Your writing and engagement styles scream of someone raised on clickbait news. Focus on facts, not emotion and sensationalism if you want to be taken seriously in tech writing.
Focus on emotion and bullshit if you want to work for BuzzFeed.
And if you just want an argument go use bloody twitter.
This has been my pet rant for a long time, but I usually explain it … almost exactly the other way around to you.
You can essentially start off with nothing using binary prefixes. IBM’s first magnetic hard drive (the IBM 350 - you’ve probably seen it in the famous “forklifting it into a plane” photo) stored 5 million characters. Not 5*1024*1024 characters, 5,000,000 characters. This isn’t some consumer-era marketing trick - this is 1956, when companies were paying half a million dollars a year (2023 inflation-adjusted) to lease a computer. I keep getting told this is some modern trick - doesn’t it blow your mind to realise HDD manufacturers have been using base 10 for nearly 70 years? Line speed was always base 10, where 1200 baud laughs at your 2^n fetish (and for that matter, baud comes from telegraphs, and was defined before computers existed), 100Mbit ethernet runs on a 25MHz clock, and speaking of clocks - kHz, MHz, MT/s, GT/s etc. are always specified in base 10. For some reason no-one asks how we got 3GHz in between 2 & 4GHz CPUs.
As you say, memory is the trouble-maker. RAM has two interesting properties for this discussion. One is that it heavily favours binary-prefixed “round numbers”, traditionally because no-one wanted RAM with unused addresses because it made address decoding nightmarish (tl;dr: when 8k of RAM was usually 8x1k chips, you’d use the first 3 bits of the address to select the chip, and the other 10 bits as the address on the chip - if chips didn’t use their entire address space you’d need to actually calculate the address map, and this calculation would have to run several times faster than the CPU itself). The second is that RAM was the first place non-CSy types saw numbers big enough for k to start becoming useful. So for the entire generation that started on microcomputers rather than big iron, memory-flavoured k were the first k they ever tasted.
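For anyone who hasn’t seen that kind of address decoding, here is a minimal sketch of the 8x1k case. The function and constant names are hypothetical, and real hardware does this with wiring rather than code, but it shows why power-of-two chips make the split free: it is just a shift and a mask.

```python
# Hypothetical 8k memory map built from eight 1k chips.
CHIP_ADDRESS_BITS = 10  # 2**10 = 1024 locations per chip
CHIP_SELECT_BITS = 3    # 2**3 = 8 chips

def decode(address: int) -> tuple[int, int]:
    """Split a 13-bit address into (chip index, offset within chip)."""
    if not 0 <= address < 2 ** (CHIP_SELECT_BITS + CHIP_ADDRESS_BITS):
        raise ValueError("address outside the 8k range")
    chip = address >> CHIP_ADDRESS_BITS                # top 3 bits pick the chip
    offset = address & ((1 << CHIP_ADDRESS_BITS) - 1)  # low 10 bits go to the chip
    return chip, offset

print(decode(0x0000))  # first location of chip 0
print(decode(0x1FFF))  # last location of chip 7
print(decode(5000))
```

With chips of, say, 1000 locations each, the same split would need a division and a remainder on every memory access instead of simply routing address lines.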
I mean, hands up who had a computer with 8-64k of RAM and a cassette deck. You didn’t measure the size of your stored program in kB, but in seconds of tape.
This shortcut then leaked into filesystems purely as an implementation detail - reading disk blocks into memory is much easier if you’re putting square pegs into square holes. So disk sectors are specified in binary sizes to enable them to fit efficiently into memory regions/pages. For example, CP/M has a 128-byte disk buffer between 0x080 and 0x100 - and its filesystem uses 128-byte sectors. Not a coincidence.
This is where we start getting into stuff like floppy disk sizes being utter madness. 360k & 720k were 720 and 1440 512-byte sectors. When they doubled up again, 2880 512-byte sectors gave us 1440k - and because nothing is ever allowed to make sense (or because 1.40625M looks stupid), we used base 10 to call this 1.44M.
So it’s never been that computers used 1024-shaped-k’s. It should be a simple story of “everything uses 1,000s, except memory because reasons”. But once we started dividing base10-flavoured storage devices into base2-flavoured sectors, we lost any hope of this ever looking logical.
aside: the little-k thing. SI has a beautifully simple rule, capital letters for prefixes >1, small letters for prefixes <1. So this disambiguates between millivolts (mV) and megavolts (MV).
But, and there’s always a but. The kilogram was the first SI unit, before they’d really thought it through. So we got both a lower-case k breaking such a beautifully simple rule, and the kilogram as a base unit instead of a gram. The Kilogram is metric’s “screw it, we’ll do it live”.
Luckily this is almost a non-issue in computing as a fraction of a bit never shows up in practice. But! If you had a system that took 1000 seconds to transfer one bit, you could call that a millibit per second, or mbps, and really mess things up.
2, 4, 8, 16, 32, 64, 128, 256, 512, 1024. It’s pretty fucking logical m8. You know what’s not logical? Base 10
Removed by mod
I’m not sure if I’m too stupid, but how so?
Removed by mod
So why don’t they just label drives in Terabit instead of terabyte. The number would be even bigger. Why don’t Europeans also use Fahrenheit, with the bigger numbers the temperature for sure would instantly feel warmer 🤣
Jokes aside. Even if HDD manufacturers benefit from “the bigger numbers”, using the 1000 conversion is objectively the only correct answer here, because there is nothing intrinsically base 2 about hard drives. You should give the blog post a read 😉
there is nothing intrinsically base 2 about hard drives
Yes there is. The addressing protocol. Sectors are 512 (2⁹) bytes, and there’s an integer number of them on a drive.
That’s true, but the entire disk size is not an exact power of two; that’s why binary prefixes (1024 conversion) don’t have any benefit whatsoever when it comes to hard drives. With memory it’s a bit different because, unlike storage devices, RAM size is always exactly a power of two.
It’s a scam by HDD makers to sell less storage for more money.
that’s what it was initially, reporting decimal ‘megabytes’ for hdd capacity. lawsuits and settlements followed.
the dust settled and what we have now is disclaimers on storage products (from the legal settlements) and they continue to use ‘decimal’ measurements…
and we also have a different set of prefixes for ‘binary’ units of measurement (standards body trying to address the problem of confusion): kibi, mebi, gibi, tebi, pebi, exbi; which are not widely used yet… the ‘old’ ones are for decimal but still commonly used for binary.
Did you read the blog post? It’s not a scam. HDD vendors might profit from “bigger numbers” but using the units they do is objectively the only sensible and correct option. It’s like saying that the weather report is in Fahrenheit because in Celsius the numbers would be lower and feel somehow colder 🤣
If it were about bigger numbers, why don’t HDD manufacturers just use terabit instead of terabyte? The “bigger number” argument is not a good one.
Videogame companies literally did use “megabit” when the truth was “128KiB”, because it sounded better. Actual computer companies were still listing binary power numbers, because buyers had more to invest and care about accuracy.
You say “sensible”, but it’s lying for profit.
WD needed to sell a drive with more advertised space than real space.
Unlike many comments here, I enjoyed reading the article, especially the parts in the “I don’t want to use gibibyte!” chapter, where you explain that this (the pedantry) is important in technical and formal situations (such as documentation). Seeing some of the comments here, I think it would have helped to focus on this aspect a bit more.
I also liked the extra part explaining the reasoning for using the Nokia E60.
I don’t quite agree with the recommendation to use base 10 SI units where neither KiB nor kB would result in nice numbers. I don’t see why base 10 should have an influence on computers, and I think it makes more sense to stick to a single unit, such as KiB.
The reasons I have this opinion are probably to do with:
- My computer has shown me values using KiB, GiB, etc. for years - I think it’s a KDE default - so I’m already used to the concept of KiB being different from kB.
- I dislike the concept of base 10 in general. I like the idea of using base 16 universally (because computers. Base 12 is also valid in a less computer-dominant society). I therefore also think 1024 is a silly number to use, and we should measure memory in multiples of 2^8 or 2^16…
p.s, I agree with other commenters that your comments starting with “Pretty obvious that you didn’t read the article.” or similar are probably not helping your case… I understand that some comments here have been quite frustrating though.
❤️ Thank you for taking the time to read it and thank you for your feedback, I really appreciate it.
deleted by creator
i mean, you can’t get to 1000 by doubling twos, so, no?
Reality doesn’t care what you prefer my dude
This whole mess regularly frustrates me… why can’t the units be used consistently?!
The other peeve of mine with this debacle is that drive capacities using SI units do not use the full available address space (since it’s binary). Is the difference between 250GB and 256GiB really used effectively for wear-levelling (which only applies to SSDs) or spare sectors?
Huh? How does the way a drive size is measured affect the available address space at all? Drives are broken up into blocks, and each block is addressable. This is independent of whether you measure it in GB or GiB and does not change the address or block size. Hell, you can have a block size in binary units and the overall capacity in SI units and it does not matter - that is how it is typically done, with typical block sizes being 512 bytes or 4096 bytes (4KiB).
Or have anything to do with wear leveling at all? If you buy a 250GB SSD then you will be able to write 250GB to it - it will have some hidden capacity for wear leveling, but that could be 10GB, 20GB, 50GB or any number they want. No relation to unit conversions at all.
Power of 2 makes more sense to the computer. 1000 makes more sense to people.
Of course. The thing is, though, that if the units had been consistent to begin with, there wouldn’t be anywhere near as much confusion. Most people would just accept MiB, GiB, etc. as the units on their storage devices. People already accept weird values for DVDs (~4.37GiB / 4.7GB), so if we had to use SI units then a 256GiB drive could be marketed as a ~275GB drive (obviously with the non-rounded value in the fine print, e.g. “Usable space approx. 274.8GB”).
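Those two conversions are easy to reproduce. A small sketch with helper names of my own choosing:

```python
GIB = 2**30  # bytes in one gibibyte (IEC binary prefix)
GB = 10**9   # bytes in one gigabyte (SI decimal prefix)

def gib_to_gb(gib: float) -> float:
    """Convert a size in GiB to decimal GB."""
    return gib * GIB / GB

def gb_to_gib(gb: float) -> float:
    """Convert a size in decimal GB to GiB."""
    return gb * GB / GIB

print(f"256 GiB = {gib_to_gb(256):.3f} GB")
print(f"4.7 GB = {gb_to_gib(4.7):.3f} GiB")
```

The figures quoted above (274.8GB and 4.37GiB) are these results cut off rather than rounded, so a formatter that rounds will show slightly different last digits.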
They were consistent until around 2005 (it’s an estimate) when drives got large enough that the absolute difference between the two forms became significant. Before that everyone in computing used base 2 prefixes.
I bet OP does too when talking about RAM.
It’s not as simple as that. A lot of “computer things” are not exact powers of two. A prominent example would be HDDs.
In terms of storage 1000 and 1024 take the same amount of bits to represent. So from a computer point of view 1024 makes a lot more sense.
It’s just a binary vs decimal thing. 1000 is not nicely represented in binary the same as 1024 isn’t in decimal.
Edit: was talking about storing the actual number.
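A quick way to look at the storage point, as a sketch only (the helper name is made up): in any fixed-width integer both numbers occupy the same space, although the shortest possible binary form of 1024 is one bit longer than that of 1000.

```python
def storage_cost(value: int) -> tuple[int, int, int]:
    """Return (minimal bits, whole bytes needed, number of 1-bits) for value."""
    minimal_bits = value.bit_length()      # bits needed with no padding
    whole_bytes = (minimal_bits + 7) // 8  # rounded up to whole bytes
    set_bits = bin(value).count("1")       # how "round" the value is in binary
    return minimal_bits, whole_bytes, set_bits

for value in (1000, 1024):
    print(value, bin(value), storage_cost(value))
```

Both values fit in two bytes. 1024 is a single 1 followed by zeros in binary, while 1000 needs six 1-bits, which is the “not nicely represented” part.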
- Kilobyte is 2^10 bytes or about a thousand bytes within a few reasonably significant digits.
- Megabyte is 2^20 bytes or about a thousand kilobytes within a few reasonably significant digits.
- Gigabyte is 2^30 bytes or about a thousand megabytes within a few reasonably significant digits.
The binary storage is always going to be a translation from a binary base to a decimal equivalent. So the shorthand terms used to refer to a specific and long integer number should come as absolutely no surprise. And that’s just it; they’re just a shorthand, slang jargon that caught on because it made sense to anyone that was using it.
Your whole article just makes it sound like you don’t actually understand the math, the way computers actually work, linguistics, or etymology very well. But you’re not really here for feedback, are you? The whole rant sounds like a reaction to a bad grade in a computer science 101 course.