
Whether you develop a closed or an open source product, you rarely do it for yourself. You build tools and solutions for people. And those users generate a lot of data. Data raises privacy concerns. Recent Big Data scandals prove that users often don’t know how to protect themselves.

To bring data protection to our users, we must add encryption to our products. WhatsApp, Signal, Keybase, Nextcloud E2E module… all those apps rely on cryptography in a way that is transparent for users.

We give a lot of talks about security and cryptography as part of our Tech Evangelism action. It’s a good time to share this knowledge here, and not only at conferences. Please welcome an introduction to Cryptography For The Newcomers.


You said: “encryption”?

First, a quick reminder: encryption is all about content obfuscation. It means that any encrypted data becomes unreadable without a key. Some transformations are one-way: the obfuscated content can’t return to its original form. These are the checksum (hash) algorithms, and they’re only used for verification tasks. Other ciphers are reversible: the content will be deciphered back to its original form at some point. Those are the ones involved in secure data transmission and storage.
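To make the one-way family more concrete, here is a minimal sketch using Python's standard hashlib module. The choice of SHA-256 and the variable names are assumptions made for illustration, not something the talk prescribes.

```python
import hashlib

message = b"Hello World!"

# A hash is one-way: the digest cannot be turned back into the message.
digest = hashlib.sha256(message).hexdigest()

# The same input always gives the same digest, so it works as a checksum.
same_digest = hashlib.sha256(b"Hello World!").hexdigest()

# Changing a single character yields a completely different digest.
tampered_digest = hashlib.sha256(b"Hello World?").hexdigest()

print(digest)
print(digest == same_digest)      # True: the content is intact
print(digest == tampered_digest)  # False: the content was modified
```

Comparing digests tells you whether the content changed, but it never gives the content back. That is the job of the reversible ciphers.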

All rely on a common element: the key. Without it, no one can decipher or verify the encrypted content, regardless of its form. It’s the most critical part of any cryptographic system, so the secret needs to be hard to brute-force.

Digital encryption performs huge and complex tasks that could never be done by hand. It ensures that the content is efficiently protected. Ciphers rely on complex math problems 1) to compute the keys and (de)cipher the data.

So, what’s in the hat?

This section explains in a simple way how ciphers work with content. If you prefer to skip this theoretical part, feel free :)

The goal of encryption is to make the content unreadable to others. The process has stayed the same since the beginning: it’s a block substitution. You may have tried it yourself when you were young, playing with a cipher wheel from a magazine.

With this tool, if we perform a 3 shift substitution, then A becomes D, B becomes E, and so on. So the text Hello World! becomes KHOOR ZRUOG!.
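The cipher wheel can be sketched in a few lines of Python. The function name caesar and the choice to uppercase the text (as a paper wheel would) are assumptions made for this illustration. This is a toy and offers no real protection.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text: str, shift: int) -> str:
    """Shift each letter by `shift` positions, like turning a cipher wheel."""
    result = []
    for char in text.upper():
        if char in ALPHABET:
            index = (ALPHABET.index(char) + shift) % len(ALPHABET)
            result.append(ALPHABET[index])
        else:
            result.append(char)  # spaces and punctuation are left untouched
    return "".join(result)


encrypted = caesar("Hello World!", 3)
decrypted = caesar(encrypted, -3)  # shifting back by 3 reverses the substitution

print(encrypted)  # KHOOR ZRUOG!
print(decrypted)  # HELLO WORLD!
```

Here the shift value plays the role of the key: with only 25 useful shifts, it is trivial to brute-force, which is why modern ciphers need far larger keys.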

Modern digital algorithms do the same but in a far more complex way. They apply cascading effects on blocks of data 2) to make it harder to reverse-engineer.

We must initialize those algorithms with parameters. This avoids repetitions, which expose large amounts of data to analysis attacks. This parameter, the Initialization Vector (IV), must be random and unique. It ensures that you never encrypt two different contents from the same initial state. But we need truly random elements to make each IV unique.

Our computers aren’t unpredictable: they tend to reproduce some patterns over time. So cryptographers have designed Cryptographically Secure Pseudo-Random Number Generators (CSPRNG) to produce values that can’t be predicted. As you can see, the basic principle is simple, but implementing it correctly is far from accessible to everyone. That’s why we never perform encryption by hand. We rely on dedicated crypto libraries, developed and tested by cryptographers.
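The difference between a general-purpose generator and a CSPRNG can be sketched with Python's standard library. The helper predictable_bytes, the seed value, and the 16-byte IV size are hypothetical choices for this illustration.

```python
import random
import secrets


def predictable_bytes(seed: int, length: int) -> bytes:
    """Produce bytes from a general-purpose PRNG: same seed, same output."""
    generator = random.Random(seed)
    return bytes(generator.randrange(256) for _ in range(length))


# Anyone who learns (or guesses) the seed can replay the whole sequence.
first_run = predictable_bytes(42, 16)
second_run = predictable_bytes(42, 16)
print(first_run == second_run)  # True

# A CSPRNG draws from the operating system's entropy source instead.
iv_a = secrets.token_bytes(16)
iv_b = secrets.token_bytes(16)
print(iv_a == iv_b)  # False, with overwhelming probability
```

A value meant to be unique, such as an IV, should therefore come from the second kind of generator.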

Keep it secret

As said above, the key is the most critical part of the crypto layer. You must keep it safe to ensure your content stays protected. In the encryption/decryption family, there are two main types of ciphers. The difference is how they deal with their keys.

The first category is the Symmetric Ciphers, like AES or IDEA. They use the same key to perform both encryption and decryption. They’re particularly fast and they can handle a large amount of data. But you have to transfer both the content and the key if you want your recipient to be able to decrypt the content. If your key leaks, your content isn’t safe anymore.

The second category is the Asymmetric Ciphers, like RSA. They use a pair of keys: the first one is publicly available and is used to encrypt the data. To decipher it, you need the associated private key. Only the owner of the key pair holds this one3). So anyone can send me encrypted content using my public key (even myself), but I’m the only one able to decipher it. The problem is that those algorithms rely on heavy math computations and are very slow. They don’t fit very well with large contents.

In day-to-day development, we often rely on the key wrapping process. We use a CSPRNG to generate a unique symmetric key for the content, which gives us fast and robust encryption. Then we encrypt this symmetric key with the recipient’s public key. This operation is fast enough because the symmetric key is small compared to the content. We pack both parts (the encrypted content and the encrypted symmetric key) and we send or store them securely. The best of both worlds!

Ready? Use it!

Once again, you should not build your own cryptographic algorithm on your side. Cryptographers work together for years to create strong and well-tested ciphers. Recent abstraction libraries allow every developer to use them easily. So use them all!

The game-changer: libsodium, a multi-platform library

All languages come with their own implementation of cryptographic primitives. But using them is sometimes hard. You need to know a lot about what crypto is and how it works to pick the right algorithm and its parameters. It’s sometimes complex to figure out which algorithm fits your use case. Most of all, it means we need to have a lot of teams working on crypto libs, one for each language. This isn’t sustainable and may lead to flaws in the implementations.

An elegant solution is to concentrate our efforts on a single agnostic crypto lib. This is exactly what the libsodium project stands for. It’s a low-level, language-agnostic, portable library, dedicated to cryptography. It focuses on usability for developers, so it’s also a very good abstraction. It gives everyone an elegant way to make the right choice without going mad. You can use the lib directly in your codebase. Or you can rely on the bindings that allow you to interact with it in your favorite language!

Python

To generate cryptographically secure random values, you should rely on the built-in secrets module, which is far more secure than the functions of the random module (such as random.random()).
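As a quick sketch, these are some of the helpers the standard secrets module provides. The sizes and variable names are arbitrary choices for the example.

```python
import secrets

key = secrets.token_bytes(32)          # 32 random bytes, e.g. for a 256-bit key
api_token = secrets.token_urlsafe(32)  # URL-safe text token built from 32 random bytes
pin = secrets.randbelow(10**6)         # random integer in the range [0, 1000000)

# Compare secrets in constant time to avoid leaking information through timing.
is_same = secrets.compare_digest(api_token, api_token)

print(len(key))      # 32
print(f"{pin:06d}")  # zero-padded six-digit PIN
print(is_same)       # True
```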

To encrypt and decrypt content, the PyNaCl package is a pretty good choice:

Ruby

Use RbNaCl to rely on the libsodium library in Ruby:

PHP

Libsodium can be used from PHP with the libsodium-php extension:

You can easily install the PECL extension on our server with the ad_install_pecl libsodium command. See our documentation about how to manage PECL extensions for more details.

Node.js®

Node.js® exposes a very good cryptography API in the Crypto module. But relying on libsodium is probably a better choice. You can do it with libsodium.js, compiled from the libsodium source code to WebAssembly:

Java

Java can rely on libsodium by using the libsodium-jna library:

The Dark Side

The earlier you encrypt the content, the more robust your architecture will be. It means that encryption should occur in the client itself. Libsodium is usable on mobile clients with libsodium-jni for Android and swift-sodium for iOS4).

The dark side is when you have to work in the Web browser. WebCrypto is a pretty good low-level API, but it is… well… low-level ¯\_(ツ)_/¯. Using it in day-to-day work is painful, and we need a more reliable way. Another issue is that JavaScript is too weak to ensure a good level of protection. Its prototype-based nature may cause data leaks, because of the unsafe environment.

Fortunately, libsodium has the libsodium.js package! Using it in the browser is as easy as:

Note that you should prefer to use it as a module5) and pack it with your codebase using a bundler like Webpack or Parcel.js.

Maybe you need a more robust way to deal with your content. Well, you can develop most of your web app core using Rust, and rely on sodiumoxide for the crypto parts! It compiles well to WASM and you can run your whole domain codebase in the browser, and keep JS for the UI only. With the complete libsodium support in your WASM modules!

Far from nowhere

Using cryptography can be really hard work. The examples above leave out many things, like:

  • how to deal with asymmetric keys
  • how to exchange them
  • how to use a Diffie-Hellman algorithm
  • how to sign content to make it trustable
  • etc.

It’s worth repeating that you should use high-level libraries. Never try to develop your own algorithm if you’re not a confirmed cryptographer.

Whatever you need to protect, always consider the worst possible scenario. No protection can be strong enough to resist time. Always rely on strong, high-grade tooling, like long ECC and RSA keys6). Use unique keys. Use nonces. Avoid repetitions. Never hesitate to update your codebase to increase security when an audit reveals a flaw. Nobody’s perfect, but we must fix security issues each time we spot them!

Consider using dedicated hardware encryption when working on the server with private keys. Hardware Security Module (HSM) devices allow you to externalize key computation. They make your keys inaccessible from the filesystem, so the keys cannot be stolen if the server is compromised. Running codebases can protect their keys at an affordable cost, thanks to open-source hardware7).


I hope this (not so) short introduction to cryptography sounds good to you. If you’re interested in the topic, you can find a live French version on YouTube by me (m4dz).

We will give this talk in the coming months at both Techorama, Netherlands and PiterPy, Saint Petersburg. Please join us at those awesome events to meet and discuss the topic!

Are you convinced about how critical cryptography is in your project? Want to use it on your alwaysdata hosting plans? Then ask us for an HSM key attached to your dedicated environment, and we’ll be pleased to provide you with it!

Notes

1) like prime numbers or elliptic curves
2) e.g. 4096-byte blocks
3) because it’s private, see?
4) Apple has also recently released its CryptoKit library, giving developers access to crypto patterns in a native Swift codebase
5) like with Node.js above
6) a 256-bit ECC key is as secure as a 3072-bit RSA key
7) like Nitrokey’s HSM dongle