Note: the Vietnamese version of this article is available at the link below.

https://duongnt.com/leaked-password-vie

In a previous article, we looked at how to create a strong password. There, we established that a dictionary attack is one of the most effective methods an attacker can use to compromise a password. To build a dictionary, attackers use data from security breaches, which seem to happen all the time.

In fact, there is a website called haveibeenpwned dedicated to documenting those incidents. It also provides an API that lets us check a password against a database of leaked passwords.

Today, we will find out how to use passpwnedcheck, a package I wrote in Python, to find out if our password has been leaked.

Note: if you use the method I introduced in my earlier article, you can be confident that your passphrase does not exist anywhere else. The amount of entropy is simply too high.

Checking a password without disclosing it with k-anonymity

Perhaps some of you are wary of sending your password to haveibeenpwned. This concern is valid. In fact, we should never send our password to any third party. In this case, however, we will use a privacy model called k-anonymity to check our password without actually disclosing it.

Let’s say our password is thaiduong. The steps to check it are below.

  • Calculate the SHA1 hash of our password in hex format, which is 90CF82C8ABCBEB601EF133F2407C32A00E6AB7F9.
  • Split our hash into two parts: the prefix is the first five characters, 90CF8 (20 bits), and the suffix is the rest, 2C8ABCBEB601EF133F2407C32A00E6AB7F9 (140 bits).
  • Send a GET request to https://api.pwnedpasswords.com/range/90CF8 to retrieve the list of leaked passwords whose hashes begin with 90CF8. Note that only the prefix is sent to the haveibeenpwned API, as the last segment of the URL path. At the time of this writing, the response is below.
    2C8ABCBEB601EF133F2407C32A00E6AB7F9:328
    2D48C2358A0B9705369E876C6204C275D2B:1
    2D8D15EAB16342EFECA70776E143D55183A:1
    2E78930667364DDE018A196EDE2AB6C39DE:3
    2E850D600CC1A9336CC0CB7E1933875AF84:4
    ... 606 other rows
    
  • Find the suffix of our hash in the response. If it exists then we know that our password is compromised. From the line 2C8ABCBEB601EF133F2407C32A00E6AB7F9:328, we can see that our password has been leaked 328 times (hmm, maybe I should not use my name as a password).
  • Conversely, if our suffix does not exist in the response then our password has not appeared in any breach known to haveibeenpwned, so maybe it is still safe.
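The steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the code inside passpwnedcheck. The function names and the User-Agent string are my own assumptions, and the last function needs network access.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"  # endpoint quoted in the steps above


def split_sha1(password: str) -> tuple[str, str]:
    """Return the (5-character prefix, 35-character suffix) of the upper-case SHA1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_response(suffix: str, body: str) -> int:
    """Look for our suffix in a 'SUFFIX:COUNT' response body; 0 means it was not found."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def check_password(password: str) -> int:
    """Send only the prefix to the API and compare suffixes locally (needs network access)."""
    prefix, suffix = split_sha1(password)
    # The User-Agent value is an arbitrary, hypothetical identifier for this sketch.
    request = urllib.request.Request(RANGE_URL + prefix, headers={"User-Agent": "k-anonymity-sketch"})
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8")
    return count_in_response(suffix, body)
```

Notice that the full hash and the password itself never leave our machine. The comparison against the suffix happens locally.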

How safe is k-anonymity in this case?

The concept of k-anonymity was introduced in this paper from 1998. This is its purpose.

The objective is to release information freely but to do so in a way that the identity of any individual contained in the data cannot be recognized. In this way, information can be shared freely and used for many new purposes.

And the definition of k-anonymity is below.

Let T(A₁,…, Aₙ) be a table and Q|ₜ be the quasi-identifiers associated with it. T is said to satisfy k-anonymity if and only if for each quasi-identifier QI ∈ Q|ₜ each sequence of values in T[QI] appears at least with k occurrences in T[QI].

This sounds complicated, but the gist is that every individual in the released data must be indistinguishable from at least k-1 other individuals whose information also appears in that data.
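To make the definition more concrete, here is a small sketch that checks whether a table satisfies k-anonymity for one set of quasi-identifiers. The table, its column names and the function name are all made up for illustration. The formal definition ranges over every quasi-identifier associated with the table, while this sketch checks only the one we pass in.

```python
from collections import Counter


def satisfies_k_anonymity(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Hypothetical released table: zip code and birth year are the quasi-identifiers.
table = [
    {"zip": "100", "birth_year": 1990, "diagnosis": "flu"},
    {"zip": "100", "birth_year": 1990, "diagnosis": "cold"},
    {"zip": "200", "birth_year": 1985, "diagnosis": "flu"},
    {"zip": "200", "birth_year": 1985, "diagnosis": "asthma"},
]

print(satisfies_k_anonymity(table, ["zip", "birth_year"], k=2))  # True
print(satisfies_k_anonymity(table, ["zip", "birth_year"], k=3))  # False
```

In the password case, the hash prefix plays the role of the quasi-identifier, and every password hash sharing that prefix falls into the same group.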

After some testing, I estimate that haveibeenpwned returns around 600 rows for any given 20-bit prefix. That means that even in the worst case, where our password is already compromised, their API cannot know which of those 600 or so passwords is ours. But if our password has already been leaked then we should change it as soon as possible anyway, so this case is not very interesting. What if our password has not been leaked yet?

A SHA1 hash is 160 bits long and, as mentioned above, we send only a 20-bit prefix to their API. If an attacker gets hold of that prefix, they still need the remaining 140 bits of our hash before they can start brute-forcing it. And if we use a strong password (an 8-word passphrase like the one I created in my previous post, for example), then brute-forcing would be impossible for the foreseeable future.
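The numbers above are easy to verify. The short calculation below redoes the arithmetic. The total it prints is only a back-of-the-envelope figure derived from my rough 600-rows-per-prefix estimate, not an official statistic.

```python
HEX_DIGIT_BITS = 4      # one hexadecimal character encodes 4 bits
SHA1_HEX_LENGTH = 40    # a SHA1 digest is 40 hex characters (160 bits)
PREFIX_LENGTH = 5       # number of characters sent to the API

prefix_bits = PREFIX_LENGTH * HEX_DIGIT_BITS                      # 20
suffix_bits = (SHA1_HEX_LENGTH - PREFIX_LENGTH) * HEX_DIGIT_BITS  # 140
possible_prefixes = 2 ** prefix_bits                              # 1,048,576 buckets

rows_per_prefix = 600   # rough estimate from my testing
estimated_hashes = rows_per_prefix * possible_prefixes

print(prefix_bits, suffix_bits)
print(f"{possible_prefixes:,} prefixes, roughly {estimated_hashes:,} hashes in total")
```

In other words, a prefix only tells an eavesdropper which of about one million buckets our hash falls into.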

Introducing the passpwnedcheck package

To make checking passwords against the haveibeenpwned API easier, I wrote a simple package called passpwnedcheck. You can find it at this link.

https://github.com/duongntbk/passpwnedcheck

Alternatively, you can install it using pip. Just run the following command.

pip install passpwnedcheck

Using blocking calls

Create an object of type PassChecker.

from passpwnedcheck.pass_checker import PassChecker
pass_checker = PassChecker()

Call the is_password_compromised method of the PassChecker class to make a blocking call to the API. The result is a tuple with two elements: the first is a flag indicating whether our password is compromised, and the second is the number of times it has been compromised.

password = 'Password'
# This is a blocking call, so no await is needed here
is_leaked, count = pass_checker.is_password_compromised(password)

if is_leaked:
    print(f'Your password has been leaked {count} times')
else:
    print('Your password has not been leaked (yet)')
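As an example of how that tuple could be used, here is a sketch of a sign-up validation helper. Both validate_new_password and FakeChecker are hypothetical names of mine and are not part of passpwnedcheck. The helper only assumes that the checker exposes is_password_compromised and returns an (is_leaked, count) tuple as described above, so in real code you would pass in a PassChecker object instead of the offline stand-in.

```python
def validate_new_password(password, checker):
    """Return (accepted, message) for a sign-up form.

    `checker` is any object exposing is_password_compromised(password) that
    returns an (is_leaked, count) tuple, as described for PassChecker above.
    """
    is_leaked, count = checker.is_password_compromised(password)
    if is_leaked:
        return False, f'This password appears in {count} known leak(s), please pick another one.'
    return True, 'Password accepted.'


class FakeChecker:
    """Offline stand-in for PassChecker, for illustration and tests only."""

    def __init__(self, leaked):
        self._leaked = leaked  # maps password -> leak count

    def is_password_compromised(self, password):
        count = self._leaked.get(password, 0)
        return count > 0, count


checker = FakeChecker({'Password': 12345})  # made-up count
print(validate_new_password('Password', checker))
print(validate_new_password('a much longer passphrase', checker))
```

Passing the checker in as a parameter also makes the helper easy to test without hitting the API.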

You can also run the pass_checker.py script from the command line. Make sure to install the package via pip first.

C:\> python pass_checker.py password
Your password has been compromised xxxxxxx time(s)

Using non-blocking calls

From version 2.0.0 onward, non-blocking calls are also supported. First, we need to create an object of type PassCheckerAsync, which requires an asyncio session.

from passpwnedcheck.pass_checker_async import PassCheckerAsync

# session = <code to create an asyncio session object>
pass_checker_async = PassCheckerAsync(session)

Checking a single password is very similar to the blocking call case.

password = 'Password'
is_leaked, count = await pass_checker_async.is_password_compromised(password)

Moreover, it’s possible to check multiple passwords at once. For each password, we send a separate request to the API, and those requests run concurrently.

passwords = ['Password1', 'Password2', 'Password3', 'Password4']
results = await pass_checker_async.is_passwords_compromised(passwords)

In this case, results is a dictionary where the passwords are keys and the number of times they are compromised are values. A typical dictionary looks like this.

{
  'Password1': 19,
  'Password2': 89,
  'Password3': 123,
  'Password4': 456
}

To reduce the load on the API, we send requests in batches of ten. This value is customizable, but make sure that the number of concurrent requests is kept at a reasonable level.

my_batch_size = 15

# Send requests in batches of 15
results = await pass_checker_async.is_passwords_compromised(passwords=passwords, batch_size=my_batch_size)
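If you are curious what sending requests in batches can look like, below is a generic sketch built on asyncio.gather. It is not the actual implementation inside passpwnedcheck. The names check_in_batches and fake_check are hypothetical, and fake_check merely stands in for a real API call so that the example runs offline.

```python
import asyncio


async def check_in_batches(passwords, check_one, batch_size=10):
    """Run check_one(password) concurrently, at most batch_size at a time."""
    results = {}
    for start in range(0, len(passwords), batch_size):
        batch = passwords[start:start + batch_size]
        # All coroutines in a batch run concurrently; the next batch starts
        # only after every request in the current one has finished.
        counts = await asyncio.gather(*(check_one(p) for p in batch))
        results.update(zip(batch, counts))
    return results


async def fake_check(password):
    """Stand-in for a real API call; returns a made-up count."""
    await asyncio.sleep(0)
    return len(password)


passwords = ['Password1', 'Password2', 'Password3']
print(asyncio.run(check_in_batches(passwords, fake_check, batch_size=2)))
```

A larger batch size finishes sooner but puts more simultaneous load on the API, which is why it should stay reasonably small.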

If you don’t need to reuse the session then you can use the SessionManager helper class, which is included with this library. Just wrap the code above inside an async with statement.

from passpwnedcheck.session_manager import SessionManager

async with SessionManager() as manager:
    pass_checker_async = PassCheckerAsync(manager.get_session())
    is_leaked, count = await pass_checker_async.is_password_compromised('Password')

Conclusion

For a high-value service, checking and rejecting known leaked passwords can help improve security. I had fun writing passpwnedcheck and I hope it can be useful to you.

The author is a software developer from Vietnam who is currently living in Japan.
