
Whether you develop a closed or an open source product, you rarely do it for yourself. You build tools and solutions for people. And those users generate a lot of data. Data raises privacy issues. Recent Big Data scandals prove that users often don’t know how to protect themselves.

To bring data protection to our users, we must add encryption to our products. WhatsApp, Signal, Keybase, the Nextcloud E2E module… all those apps rely on cryptography in a way that is transparent for users.

We give a lot of talks about security and cryptography as part of our Tech Evangelism activity. It’s a good time to share this knowledge here, and not only at conferences. Please welcome an introduction to Cryptography For The Newcomers.


You said: “encryption”?

First, a quick reminder: encryption is all about content obfuscation. It means that any encrypted data becomes unreadable without a key. Some transformations are one-way: the obfuscated content can’t return to its original form. These are the checksum (hash) algorithms, and they’re only used for verification tasks. Other ciphers are reversible. The content will be deciphered to its original form at some point. Those are the ones involved in secure data transmission and storage.
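The one-way case can be sketched with Python’s standard hashlib module. The choice of SHA-256 and the sample messages are assumptions made for illustration; nothing above names a specific algorithm.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


original = checksum(b"Hello World!")
tampered = checksum(b"Hello World?")

# The digest has a fixed size and cannot be turned back into the content.
print(len(original))         # 64 hex characters, whatever the input size

# Any change in the content gives a different digest, which is how
# a checksum reveals that the data was altered.
print(original == tampered)  # False
```

The same input always gives the same digest, so comparing two digests is enough to check that a file or a message arrived unchanged.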

All rely on a common element: the key. Without it, no one can decipher or verify the encrypted content, regardless of its form. It’s the most critical part of any cryptographic system. The secret needs to be hard to brute force.

Digital encryption performs huge and complex computations that could never be done by hand. It ensures that the content is efficiently protected. Ciphers rely on complex math problems [1] to compute the keys and (de)cipher the data.

So, what’s in the hat?

This section explains in a simple way how ciphers work with content. If you prefer to skip this theoretical part, feel free 🙂

The goal of encryption is to make the content unreadable to others. The process has stayed the same since the beginning: it’s a block substitution. You may have tried it yourself when you were young, playing with a cipher wheel from a magazine.

With this tool, if we perform a 3-shift substitution, then A becomes D, B becomes E, and so on. So the text Hello World! becomes KHOOR ZRUOG!.
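Here is a minimal sketch of that cipher wheel in Python. The function name caesar_shift is ours, and the text is uppercased to match the example above. This is a toy for understanding substitution, not something to protect real data with.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift each letter by `shift` positions, as a cipher wheel does."""
    alphabet = string.ascii_uppercase
    result = []
    for char in text.upper():
        if char in alphabet:
            index = (alphabet.index(char) + shift) % len(alphabet)
            result.append(alphabet[index])
        else:
            # Spaces and punctuation are left untouched.
            result.append(char)
    return "".join(result)


encrypted = caesar_shift("Hello World!", 3)
print(encrypted)                    # KHOOR ZRUOG!
print(caesar_shift(encrypted, -3))  # HELLO WORLD!
```

Here the shift plays the role of the key: with only 25 useful values, it is trivial to brute force, which is why real ciphers need much larger keys.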

Modern digital algorithms do the same but in a much more complex way. They apply cascading effects on blocks of data [2] to make them harder to reverse-engineer.

We must initialize those algorithms with parameters. This avoids repetitions, which expose a large amount of data to analysis attacks. This Initialization Vector (IV) must be random and unique. It ensures you never encrypt two different contents from the same starting state. But we need truly random elements to make them unique.

Our computers aren’t unpredictable: they tend to reproduce some patterns over time. So cryptographers have created Cryptographically Secure Pseudo Random Number Generators (CSPRNG) to produce content that is unpredictable enough. As you can see, the basic principle is simple, but its implementation is far from accessible to everyone. That’s why we never perform encryption by hand. We rely on dedicated crypto libraries, developed and tested by cryptographers.

Keep it secret

As said above, the key is the most critical part of the crypto layer. You must keep it safe to ensure your content stays protected. In the encryption/decryption family, there are two main types of ciphers. The difference is how they deal with their keys.

The first category is the Symmetric Ciphers, like AES or IDEA. They use the same key to perform both encryption and decryption. They’re particularly fast and they can handle a large amount of data. But you have to transfer both the content and the key if you want your recipient to be able to decrypt the content. If your key leaks, your content isn’t safe anymore.

The second category is the Asymmetric Ciphers, like RSA. They use a pair of keys: the first one is publicly available and used to encrypt the data. To decipher it, you need the associated private key. Only the owner of the key pair holds this one [3]. So anyone can send encrypted content to me with my public key (even myself), but I’m the only one able to decipher it. The problem is that those algorithms rely on heavy math computations and are very slow. They don’t fit very well with large contents.

In day-to-day development, we often rely on the key wrapping process. We use a CSPRNG to generate a unique symmetric key for the content, for fast and robust encryption. Then we encrypt this symmetric key with the recipient’s public key. This operation is fast enough because the symmetric key is small compared to the content. We pack them together (the encrypted content and the encrypted symmetric key) and we send or store the package securely. Best of both worlds!

Ready? Use it!

Once again, you should not build your own cryptographic algorithm. Cryptographers work together for years to create strong and well-tested ciphers. Recent abstraction libraries allow every developer to use them easily. So use them!

The game-changer: libsodium, a multi-platform library

All languages come with their own implementation of cryptographic primitives. But using them is sometimes hard. You need to know a lot about what crypto is and how it works to pick the right algorithm and its parameters. It’s sometimes complex to figure out which algorithm is available for your use case. Most of all, it means that many teams have to work on crypto libs, one for each language. This isn’t sustainable and may lead to flaws in the implementations.

An elegant solution is to concentrate our efforts on a single agnostic crypto lib. This is exactly what the libsodium project stands for. It’s a low-level, language-agnostic, portable library, dedicated to cryptography. It focuses on developer usability, so it’s also a very good abstraction. It gives everyone an elegant way to make the right choice without going mad. You can use the lib directly in your codebase. Or you can rely on the bindings that allow you to interact with it in your favorite language!

Python

To generate CSPRNG values, you should rely on the built-in secrets module, which is far more secure than the functions of the random module.
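As a minimal sketch, here are a few calls from the standard secrets module. The sizes (32 bytes, an upper bound of 100) are arbitrary values picked for the example.

```python
import secrets

# 32 random bytes, suitable as a 256-bit symmetric key.
key = secrets.token_bytes(32)

# A URL-safe random string, e.g. for a password-reset link.
token = secrets.token_urlsafe(32)

# A random integer in the range [0, 100).
number = secrets.randbelow(100)

print(len(key))  # 32
```

Each call draws from the operating system’s secure random source, so no seeding is needed.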


To encrypt and decrypt content, the PyNaCl package is a pretty good choice.

Ruby

Use RbNaCl to rely on the libsodium library in Ruby.

PHP

Libsodium can be used from PHP with the libsodium-php extension.


You can easily install the PECL extension on our servers with the ad_install_pecl libsodium command. See our documentation about how to manage PECL extensions for more details.

Node.js®

Node.js® exposes a very good cryptography API in the Crypto module. But relying on libsodium is probably a better choice. You can do it with libsodium.js, compiled from the libsodium source code to WebAssembly.

Java

Java can rely on libsodium by using the libsodium-jna library.

The Dark Side

The earlier you encrypt the content, the more robust your architecture will be. It means that encryption should occur in the client itself. Libsodium is usable on mobile clients with libsodium-jni for Android and swift-sodium for iOS [4].

The dark side is when you have to work in the Web browser. WebCrypto is a pretty good low-level API, but it is… well… low-level ¯\_(ツ)_/¯. Using it in day-to-day work is painful, and we need a more reliable way. Another issue is that JavaScript alone is too weak to ensure a good level of protection. Its prototype-based nature may cause data leaks in such an unsafe environment.

Fortunately, libsodium has the libsodium.js package, which is easy to use in the browser!


Note that you should prefer to use it as a module [5] and pack it with your codebase using a bundler like Webpack or Parcel.js.

Maybe you need a more robust way to deal with your content. Well, you can develop most of your web app core using Rust, and rely on sodiumoxide for the crypto parts! It compiles well to WASM, so you can run your whole domain codebase in the browser and keep JS for the UI only, with complete libsodium support in your WASM modules!

Far from nowhere

Using cryptography may be a really hard job. The examples above leave out many things, like:

  • how to deal with asymmetric keys
  • how to exchange them
  • how to use a Diffie-Hellman algorithm
  • how to sign content to make it trustable
  • etc.

It’s worth repeating that you should use high-level libraries. Never try to develop your own algorithm if you’re not an experienced cryptographer.

Whatever you need to protect, always consider the worst possible scenario. No protection can be strong enough to resist time. Always rely on strong, high-grade tooling, like long ECC and RSA keys [6]. Use unique keys. Use nonces. Avoid repetitions. Never hesitate to update your codebase to increase security when an audit reveals a flaw. Nobody’s perfect, but we must fix security issues each time we spot them!

Consider using dedicated hardware encryption when working on the server with private keys. Hardware Security Module (HSM) devices allow you to externalize key computations. They make your keys inaccessible from the filesystem, so the keys cannot be extracted if the server is compromised. Running codebases can protect their keys at an affordable cost, thanks to open-source hardware [7].


I hope this (not so) short introduction to cryptography sounds good to you. If you’re interested in the topic, you can find a live French version on YouTube by myself (m4dz).

We will give this talk in the coming months at both Techorama, Netherlands and PiterPy, Saint Petersburg. Please join us at those awesome events to meet and discuss the topic!

Are you convinced about how critical cryptography is in your project? Want to use it on your alwaysdata hosting plans? Then ask us for an HSM key attached to your dedicated environment, and we’ll be pleased to provide you with it!

Notes

1. like prime numbers or elliptic curves
2. e.g. 16-byte (128-bit) blocks for AES
3. because it’s private, see?
4. Apple has also recently released its CryptoKit library, giving developers access to crypto patterns in a native Swift codebase
5. like with Node.js above
6. a 256-bit ECC key is as secure as a 3072-bit RSA key
7. like Nitrokey’s HSM dongle