Note: the Vietnamese version of this article is available at the link below.
https://duongnt.com/strong-password-vie
Recently, while changing my credential for the Japan Immigration e-Notification System, I encountered this password policy.
Use at least one from each of the alphabet letters, numbers, and signs. The total length must be 8 letters or longer and up to 32 characters.
This policy is already quite complicated, but does it really make our account more secure? Today, we will see why an overly complex policy might do more harm than good, and see how we can create a memorable yet strong password.
Know your enemy
Before talking about security, we need to define our attack vectors. Below are some common ways to compromise a password.
- Trick the users into revealing their password; either directly or by infecting their device.
- Compromise the service itself and gain access to the password storage. If passwords are stored as plain text, which is a very bad idea but unfortunately not all that uncommon, the attacker can read them immediately.
- Try a list of known passwords to see if the user is using any of those (dictionary attack).
- Try all possible passwords until the right one is found (brute-force attack).
It’s obvious that for the first attack, all the password policies in the world cannot help us. For the second attack, if your password is stored as plain text and the service is compromised, then all is already lost. But if your password is properly hashed, then the attacker needs to crack the hash before gaining access to it. The protection that hashing provides depends on the hash method in use.
A normal GPU can calculate billions of SHA1 hashes per second. But that rate drops to a few thousand hashes per second if bcrypt with the default cost is used. Of course, the attacker will try a dictionary attack first before resorting to brute-forcing. With a known hash, the attacker is limited only by their computing power. But for a non-compromised service, hopefully there are mechanisms in place to protect users against abnormal login patterns.
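To get a feel for the gap between a fast hash and a deliberately slow one, here is a rough timing sketch. bcrypt is not part of the Python standard library, so this uses PBKDF2 with 100,000 iterations as a stand-in slow hash. That substitution, the iteration count, and the sample password are assumptions for illustration, and single-threaded Python on a CPU will be far slower than the GPU rates quoted above.

```python
import hashlib
import os
import time


def seconds_per_call(fn, repeats):
    """Return the average wall-clock time of one call to fn."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


password = b"letmein1989"
salt = os.urandom(16)  # random salt, as a slow password hash would use

# Fast general-purpose hash: one SHA1 computation per guess.
fast = seconds_per_call(lambda: hashlib.sha1(password).digest(), repeats=10_000)

# Stand-in slow hash (assumption: PBKDF2-HMAC-SHA256 instead of bcrypt).
slow = seconds_per_call(
    lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 100_000), repeats=3
)

print(f"SHA1:   {1 / fast:,.0f} hashes per second")
print(f"PBKDF2: {1 / slow:,.0f} hashes per second")
```

The exact figures depend on the machine, but the slow hash should come out several orders of magnitude behind SHA1, and every guess the attacker makes pays that same cost.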
Our question then becomes: does a complex password policy protect us against dictionary attacks or brute-force attacks?
Why a complex password policy is not very helpful
Against dictionary attacks, a complex policy can make a password more vulnerable. Hardly anyone enjoys creating and remembering a long and complex password. And when they are forced to do so, usually one of these two things happens.
- They create just one password and reuse it for all other services. This means when one of those services has a leak (which seems to happen almost all the time), all services using the same password automatically become vulnerable.
- They take a common password and add some digits/symbols at the end. The result is a password that seems secure, but can easily be cracked by a simple script. It’s not hard to generate all the likely variations of a known password.
Some may argue that forcing the user to frequently change their password can solve this problem. But most of the time, they will just add a number to their password and keep increasing it. How many times have we seen someone use password1/password2/password3? This type of password is again easily defeated by a minimal amount of scripting. In fact, if our password is strong and unique, there’s really no reason to change it frequently.
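To show how little scripting is involved, here is a minimal sketch that expands one known password into its common variations. The specific rules (an optional capitalised first letter, up to two trailing digits, one optional trailing symbol from a short set) are assumptions for illustration; real cracking tools ship with far larger rule sets.

```python
from itertools import product


def suffix_variants(base, max_digits=2, symbols="!@#$"):
    """Yield base (as given and capitalised) plus digits and an optional symbol."""
    # dict.fromkeys removes the duplicate when base is already capitalised.
    for stem in dict.fromkeys([base, base.capitalize()]):
        for n in range(max_digits + 1):
            for digits in product("0123456789", repeat=n):
                number = "".join(digits)
                yield stem + number          # e.g. password1
                for symbol in symbols:
                    yield stem + number + symbol  # e.g. password1!


candidates = list(suffix_variants("password"))
print(len(candidates), "candidates, for example:", candidates[:6])
```

With these rules a single base word expands to only about a thousand candidates, which is a negligible amount of work at the hash rates discussed earlier.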
Against brute-force attacks, the problem with the majority of password policies is that they make the password harder for the user to remember, while not really preventing a computer from brute-forcing it. But to see why, we need to understand the concept of entropy.
The concept of entropy
In information theory, entropy, or Shannon entropy, is defined as follows.
The entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent in the variable’s possible outcomes.
This seems complicated, but the gist is that entropy measures the randomness of a variable. In our context, when talking about the entropy of a password, we mean how hard it is to guess that password without any other information.
For example, let’s say we toss a coin and try to guess whether it will land on heads or tails. The probabilities of heads and tails are both 50%. In other words, our result can take one out of 2 different outcomes. In this case, we say that our coin toss has log₂(2) == 1 bit of entropy. Generally, if your password is one out of X equally likely outcomes, then its entropy is calculated as follows.
log₂(X)
Likewise, to find the entropy of our password, we do not focus on just its length or complexity. Instead, we need to know how many outcomes our password can have. Take these two passwords, SnrPpSBVW53 and letmein1989, for example. The first one is totally random and consists of uppercase/lowercase letters and digits, while the second one is just a common password plus a birth year, which is a known pattern.
Since there are 26 letters and 10 digits, the entropy of the first password can be calculated as log₂((26 + 26 + 10)¹¹) == 65 bits. Estimating the entropy of the second password is harder. But assuming that the attacker is using the list of the 1000 most common passwords (which certainly contains letmein), and that a birth year is between 1900 and 2021 (122 years), then we can calculate the entropy as log₂(1000×122) == 17 bits.
It’s also perfectly possible to achieve high entropy with letters only, digits only, or even with just 0 and 1; we just need a long password. For example, the entropy of a password that consists of just 0 and 1 but is 128 digits long is log₂(2¹²⁸), which obviously is 128 bits.
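The three estimates above can be reproduced in a few lines. This is a minimal sketch; the helper name entropy_bits is my own, and it assumes every outcome is equally likely, which is what the log₂(X) formula requires.

```python
import math


def entropy_bits(outcomes):
    """Entropy in bits of a choice among equally likely outcomes."""
    return math.log2(outcomes)


# 11 random characters drawn from 26 + 26 + 10 symbols.
random_password = entropy_bits((26 + 26 + 10) ** 11)

# One of 1000 common passwords followed by one of 122 birth years.
pattern_password = entropy_bits(1000 * 122)

# 128 random binary digits.
binary_password = entropy_bits(2 ** 128)

print(f"SnrPpSBVW53 style:  {random_password:.1f} bits")
print(f"letmein1989 style:  {pattern_password:.1f} bits")
print(f"128 binary digits:  {binary_password:.1f} bits")
```

Note that the second figure describes the pattern, not the characters: both passwords are 11 characters long, yet the attacker's search space differs by dozens of bits.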
Entropy and password cracking
So why does the entropy of our password matter? Because the more entropy it has, the harder it is for an attacker to guess it. If our password has just 8 bits of entropy, then it is one out of 2⁸ == 256 outcomes and can be brute-forced with at most 256 guesses. In practice, the number of guesses needed is usually lower, because after just 2⁸⁻¹ == 128 guesses, the attacker already has a 50% chance of finding your password.
Let’s assume that the attacker already has the SHA1 hash of our password (SHA1 is actually a very fast hash and you shouldn’t use it for password hashing in practice). We will calculate how many bits of entropy are needed before we can declare our password brute-force proof. If a computer can calculate N hashes per second, then it can exhaust every password with up to log₂(N) bits of entropy in just one second. If we want our password to stay secure for T seconds, then its minimum entropy is given by the formula below.
log₂(N) + log₂(T)
The minimum amount of entropy needed to withstand different levels of attack is given below.
Attack level | Hash rate (hashes/s) | Safe for 1 second | Safe for 1 year | Safe for 100 years |
---|---|---|---|---|
Normal CPU | 24,000,000 | 25 bits | 50 bits | 57 bits |
Normal GPU | 1.14×10¹⁰ | 34 bits | 59 bits | 65 bits |
Bitcoin network¹ | 1.8×10¹⁷ | 58 bits | 83 bits | 89 bits |
¹: The highest hash rate ever of the whole Bitcoin network was 1.8×10¹⁷ Hash/s, reached in May 2021.
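The figures in the table follow directly from log₂(N) + log₂(T). Here is a minimal sketch of that calculation; rounding up to a whole number of bits and using a 365-day year are my assumptions about how the values were derived.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def min_entropy_bits(hash_rate, seconds):
    """Smallest whole number of bits that outlasts `seconds` of guessing."""
    return math.ceil(math.log2(hash_rate) + math.log2(seconds))


attackers = {
    "Normal CPU": 24_000_000,
    "Normal GPU": 1.14e10,
    "Bitcoin network": 1.8e17,
}

for name, rate in attackers.items():
    one_second = min_entropy_bits(rate, 1)
    one_year = min_entropy_bits(rate, SECONDS_PER_YEAR)
    century = min_entropy_bits(rate, 100 * SECONDS_PER_YEAR)
    print(f"{name}: {one_second} / {one_year} / {century} bits")
```

Because the time term is logarithmic, going from one year to one hundred years costs fewer than 7 extra bits.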
Entropy of a traditional password
If a password just meets the minimum requirement in the policy of the Japan Immigration e-Notification System quoted above, how many bits of entropy does it have? That password will have 8 characters, where each character can be a letter, a number, or a sign. We have 26 letters in the alphabet, 10 numbers, and around 30 special symbols; that’s 66 different choices in total. For 8 characters, we have 66⁸ == 3.6×10¹⁴ outcomes, which is equal to around 48 bits of entropy. This is not very secure: at the hash rates in the table above, a normal CPU can try every combination in about 174 days and a normal GPU in under 9 hours.
To increase the entropy of our password, we need to increase its total outcomes. If our password has 15 characters instead of 8 and it uses both uppercase and lowercase letters, then the total number of outcomes grows to 92¹⁵ == 2.86×10²⁹. That’s around 97 bits of entropy and even the whole Bitcoin network working for 100 years still comes nowhere close to cracking it.
However, the calculation above is only correct if all 15 characters are randomly generated (and good luck trying to remember such passwords). As we have seen earlier, if we take a common password like letmein and pad it to 15 characters, then the amount of entropy will be much lower. The entropy of our password will mostly depend on the remaining 8 characters, which gives us just log₂(92⁸) == 52 bits of entropy.
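The size of each search space, and how long it takes to walk through all of it, can be checked with a short script. This is a sketch under the same assumptions as the text: 66 or 92 possible symbols per character, the CPU and GPU hash rates from the table, and a worst case where the attacker has to try every candidate.

```python
import math

CPU_RATE = 24_000_000  # hashes per second, from the table
GPU_RATE = 1.14e10     # hashes per second, from the table


def search_space(alphabet_size, length):
    """Number of passwords of `length` random characters."""
    return alphabet_size ** length


def seconds_to_exhaust(outcomes, hash_rate):
    """Worst-case time to try every candidate at the given rate."""
    return outcomes / hash_rate


minimum_policy = search_space(66, 8)    # 8 characters, 66 symbols
longer_random = search_space(92, 15)    # 15 characters, 92 symbols
padded_common = search_space(92, 8)     # only 8 of 15 characters are random

cpu_days = seconds_to_exhaust(minimum_policy, CPU_RATE) / 86_400
gpu_hours = seconds_to_exhaust(minimum_policy, GPU_RATE) / 3_600

print(f"8 chars:  {math.log2(minimum_policy):.1f} bits, "
      f"CPU {cpu_days:.0f} days, GPU {gpu_hours:.1f} hours")
print(f"15 chars: {math.log2(longer_random):.1f} bits")
print(f"padded:   {math.log2(padded_common):.1f} bits")
```

On average the attacker only needs half of the worst-case time, so the real margin is even thinner than these numbers suggest.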
Use passphrase instead of password
What we should use is a passphrase, which is a sentence used for authentication in place of a password. Although a passphrase is much longer than a password, it is easier to remember while much harder to crack. All we need is a word list (for example, this list from the Electronic Frontier Foundation) and 5 dice. This is how we generate an 8-word passphrase from the EFF word list.
- Roll our 5 dice and see how many points each die gives.
- If our dice give us 1-3-6-3-2, for example, we find the word at index 13632 in our word list, which is bootie. This is the first word of our passphrase.
- Repeat the first two steps another seven times (I got the passphrase bootie backlands relieve uselessly negligee tumble impolite pediatric while writing this article).
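The steps above can also be sketched in code. Note that this swaps the physical dice for Python's secrets module as the source of randomness, which is my substitution and not the method described in this article. The parser assumes each word list line holds a five-digit index followed by a word, and the demo list below is a made-up stand-in so the example runs without downloading anything.

```python
import secrets
from itertools import product


def roll_index(dice=5):
    """Simulate rolling `dice` six-sided dice, e.g. '13632'."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(dice))


def load_wordlist(lines):
    """Parse lines of the assumed form '<index> <word>' into a dict."""
    words = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            words[parts[0]] = parts[1]
    return words


def generate_passphrase(wordlist, words=8):
    """Pick `words` entries independently and join them with spaces."""
    return " ".join(wordlist[roll_index()] for _ in range(words))


# Hypothetical stand-in list: 7776 placeholder words such as 'word13632'.
# In practice, pass the lines of a real dice word list file instead.
demo_lines = [
    f"{''.join(faces)}\tword{''.join(faces)}"
    for faces in product("123456", repeat=5)
]
wordlist = load_wordlist(demo_lines)
print(generate_passphrase(wordlist))
```

Whatever the source of randomness, the important part is that each word is chosen independently and uniformly; picking words you like from the list defeats the purpose.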
Let’s calculate the amount of entropy of our passphrase. Because each word is randomly selected from a list of 6⁵ == 7776 words, each word has log₂(7776) == 12.92 bits of entropy. The entropy of our passphrase then becomes 8×12.92 == 103 bits. If we want to increase the amount of entropy further, all we need to do is add more words to our passphrase; each additional word gives us another 12.92 bits of entropy. This is more secure than a totally random 15-character password. And considering such passwords look like this: BH~y/ekYqV@C8nW, I would much rather remember a passphrase.
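The same arithmetic can be turned around to ask how many words a given target requires. A minimal sketch, where the function names are my own and the 89-bit target is the table's figure for withstanding the Bitcoin network for 100 years:

```python
import math

BITS_PER_WORD = math.log2(6 ** 5)  # one word out of 7776


def passphrase_bits(words):
    """Entropy of a passphrase of independently chosen words."""
    return words * BITS_PER_WORD


def words_needed(target_bits):
    """Fewest words whose combined entropy reaches target_bits."""
    return math.ceil(target_bits / BITS_PER_WORD)


print(f"{BITS_PER_WORD:.2f} bits per word")
print(f"8 words give {passphrase_bits(8):.1f} bits")
print(f"{words_needed(89)} words reach 89 bits")
```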
Conclusion
If used correctly, a passphrase can make your account practically impossible to brute-force. Just to prove a point, I will give you the MD5 hash of the admin password for my site.
e56e62623a52f1a69ab5366e1526057b
My password is an 8-word passphrase generated with the EFF word list I mentioned above and my trusty dice, and MD5 is a superfast hash. Is there anyone who wants to crack it?
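For anyone tempted, checking a single guess takes only a few lines. This is a minimal sketch that assumes the passphrase was hashed as UTF-8 text with no salt; how the words are separated is not stated here, so that is one more thing an attacker would have to guess.

```python
import hashlib

TARGET = "e56e62623a52f1a69ab5366e1526057b"  # the hash published above


def md5_matches(candidate, target_hex=TARGET):
    """Return True if the MD5 of candidate equals the target hex digest."""
    digest = hashlib.md5(candidate.encode("utf-8")).hexdigest()
    return digest == target_hex.lower()


# Hypothetical guess: the sample passphrase from this article, space-separated.
guess = "bootie backlands relieve uselessly negligee tumble impolite pediatric"
print(md5_matches(guess))
```

The check itself is cheap; the obstacle is the 7776⁸ candidates it would have to be run against.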