Miscellaneous notes

Here you'll find my miscellaneous, mostly-unsorted notes on various topics.

Project ideas

Various ideas for projects that I do not yet have the time, knowledge or energy to work on. Feel free to take these ideas if they seem interesting, though please keep them non-commercial and free of ads!

Project ideas

Automatic authentication keys

Problem: Every website needs you to create an account. This is a pain to manage, and a barrier to entry. This is especially problematic for self-hosted things like Forgejo, because it gives centralized platforms an advantage (everyone already has an account there). It should be trivial to immediately start using a site without going through a registration process.

Other solutions: OIDC, OpenID and such all require you to have an account with a provider. You fully trust this provider with your access, or you need to self-host it which is extra work. Passkeys are extremely Google-shaped and dubiously designed and documented. Federation is a massively complex solution to design for, and really an unnecessary complexity expense for the vast majority of self-hosted cases.

Proposed solution: Authentication directly integrated into browser through a browser extension. It uses request interception APIs and such to detect "is supported" headers from websites, and inject authentication headers into requests upon confirmation from the user that they wish to authenticate (it should not disclose its existence before that point). Authentication is done through keys managed locally by the browser and optionally stored encrypted on a third-party server.

Unsolved issues: Key management and backup, and making it robust. Offer to back up to a USB key? How to deal with Manifest v3 in Chrome?

Javascript

Anything about Javascript in general, that isn't specific to Node.js.

Javascript

Whirlwind tour of (correct) npm usage

This article was originally published at https://gist.github.com/joepie91/9b9dbd8c9ac3b55a65b2

This is a quick tour of how to get started with NPM, how to use it, and how to fix it.

I'm available for tutoring and code review :)

Starting a new project

Create a folder for your project, preferably a Git repository. Navigate into that folder, and run:

npm init

It will ask you a few questions. Hit Enter without input if you're not sure about a question, and it will use the default.

You now have a package.json.

If you're using Express: Please don't use express-generator. It sucks. Just use npm init as explained above, and follow the 'Getting Started' and 'Guide' sections on the Express website. They will teach you all you need to know when starting from scratch.

Installing a package

All packages in NPM are local - that is, specific to the project you install them in, and actually installed within that project. They are also nested - if you use the foo module, and foo uses the bar module, then you will have a ./node_modules/foo/node_modules/bar. This means you pretty much never have version conflicts, and can install as many modules as you want without running into issues.

All modern versions of NPM will 'deduplicate' and 'flatten' your module folder as much as possible to save disk space, but as a developer you don't have to care about this - it will still work like it's a tree of nested modules, and you can still assume that there will be no version conflicts.

You install a package like this:

npm install packagename

While the packages themselves are installed in the node_modules directory (as that's where the Node.js runtime will look for them), that's only a temporary install location. The primary place where your dependencies are defined should be your package.json file - so that they can be safely updated and reinstalled later, even if your node_modules gets lost or corrupted somehow.

In older versions of npm, you had to manually specify the --save flag to make sure that the package is saved in your package.json; that's why you may come across this in older articles. However, modern versions of NPM do this automatically, so the command above should be enough.

One case where you do still need to use a flag, is when you're installing a module that you just need for developing your project, but that isn't needed when actually using or deploying your project. Then you can use the --save-dev flag, like so:

npm install --save-dev packagename

Works pretty much the same, but saves it as a development dependency. This allows a user to install just the 'real' dependencies, to save space and bandwidth, if they just want to use your thing and not modify it.

To install everything that is declared in package.json, you just run it without arguments:

npm install

When you're using Git or another version control system, you should add node_modules to your ignore file (eg. .gitignore for Git); this is because installed copies of modules may need to be different depending on the system. You can then use the above command to make sure that all the dependencies are correctly installed, after cloning your repository to a new system.

Semantic versioning

Packages in NPM usually use semantic versioning; that is, the changes in a version number indicate what has changed, and whether the change is breaking. Let's take 1.2.3 as an example version. The components of that version number would be:

  • 1 - the 'major' version
  • 2 - the 'minor' version
  • 3 - the 'patch' version

Depending on which number changes, there's a different kind of change to the module:

  • Major: the release contains breaking changes, and code written against an older version may need to be updated.
  • Minor: the release adds new functionality, but remains backwards-compatible.
  • Patch: the release only contains bugfixes, and is backwards-compatible.

Most NPM packages follow this, and it gives you a lot of certainty in what upgrades are safe to carry out, and what upgrades aren't. NPM explicitly adopts semver in its package.json as well, by introducing a few special version formats:

  • 1.2.3 - exactly version 1.2.3, and nothing else.
  • ~1.2.3 - version 1.2.3, or any newer patch release (eg. 1.2.4, but not 1.3.0).
  • ^1.2.3 - version 1.2.3, or any newer release without breaking changes (eg. 1.2.4 or 1.3.0, but not 2.0.0).

By default, NPM will automatically use the ^1.2.3 notation, which is usually what you want. Only configure it otherwise if you have an explicit reason to do so.
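
To make this concrete, here's a rough sketch of what the relevant parts of a package.json might look like after installing one regular dependency and one development dependency (the package names and versions here are made up):

{
    "name": "my-project",
    "version": "1.0.0",
    "dependencies": {
        "some-library": "^1.2.3"
    },
    "devDependencies": {
        "some-dev-tool": "^4.5.6"
    }
}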

A special case are 0.x.x versions - these are considered to be 'unstable', and the rules are slightly different: the minor version number indicates a breaking change, rather than the major version number. That means that ^0.1.2 will allow an upgrade to 0.1.3, but not to 0.2.0. This is commonly used for pre-release testing versions, where things may wildly change with every release.

If you end up publishing a module yourself (and you most likely eventually will), then definitely adhere to these guidelines as well. They make it a lot easier for developers to keep dependencies up to date, leading to considerably fewer bugs and security issues.

Global modules

Sometimes, you want to install a command-line utility such as peerflix, but it doesn't belong to any particular project. For this, there's the --global or -g flag:

npm install -g peerflix

If you used packages from your distribution to install Node, you may have to use sudo for global modules.

Never, ever, ever use global modules for project dependencies, ever. It may seem 'nice' and 'efficient', but you will land in dependency hell. It is not possible to enforce semver constraints on global modules, and things will spontaneously break. All the time. Don't do it. Global modules are only for project-independent, system-wide, command-line tools.

This applies even to development tools for your project. Different projects will often need different, incompatible versions of development tools - so those tools should be installed without the global flag. For local packages, the binaries are all collected in node_modules/.bin. You can then run the tools like so:

./node_modules/.bin/eslint

NPM is broken, and I don't understand the error!

The errors that NPM shows are usually not very clear. I've written a tool that will analyze your error, and try to explain it in plain English. It can be found here.

My dependencies are broken!

If you've just updated your Node version, then you may have native (compiled) modules that were built against the old Node version, and that won't work with the new one. Run this to rebuild them:

npm rebuild

My dependencies are still broken!

Make sure that all your dependencies are declared in package.json. Then just remove and recreate your node_modules:

rm -rf node_modules
npm install

Javascript

An overview of Javascript tooling

This article was originally published at https://gist.github.com/joepie91/3381ce7f92dec7a1e622538980c0c43d.

Getting confused about the piles of development tools that people use for Javascript? Here's a quick index of what is used for what.

Keep in mind that you shouldn't add tools to your workflow for the sake of it. While you'll see many production systems using a wide range of tools, these tools are typically used because they solved a concrete problem for the developers working on it. You should not add tools to your project unless you have a concrete problem that they can solve; none of the tools here are required.

Start with nothing, and add tools as needed. This will keep you from getting lost in an incomprehensible pile of tooling.

Build/task runners

Typical examples: Gulp, Grunt

These are not exactly build tools in and of themselves; rather, they're just used to glue together other tools. For example, if you have a set of build steps where you need to run tool A after tool B, a build runner can help to orchestrate those tools.

Bundlers

Typical examples: Browserify, Webpack, Parcel

These tools take a bunch of .js files that use modules (either CommonJS using require() statements, or ES Modules using import statements), and combine them into a single .js file. Some of them also allow specifying 'transformation steps', but their main purpose is bundling.

Why does bundling matter? While in Node.js you have access to a module system that lets you load files as-needed from disk, this wouldn't be practical in a browser; fetching every file individually over the network would be very slow. That's why people use a bundler, which effectively does all this work upfront, and then produces a single 'combined' file with all the same guarantees of a module system, but that can be used in a browser.
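
As a rough sketch of what that looks like in practice - the file names and the greet function are made up, and Browserify is assumed as the bundler here:

// greeting.js - a CommonJS module that exports a single function
module.exports = function greet(name) {
    return "Hello, " + name + "!";
};

// main.js - the entry point, which uses that module via require()
var greet = require("./greeting.js");
console.log(greet("world"));

// Running `browserify main.js -o bundle.js` then produces a single bundle.js
// containing both files, which can be loaded in a browser with one <script> tag.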

Bundlers can also be useful for running module-using code in very basic JS environments that don't have module support for some reason; this includes Google Sheets, extensions for PostgreSQL, GNOME, and so on.

Bundlers are not transpilers. They do not compile one language to another, and they don't "make ES6 work everywhere". Those are the job of a transpiler. Bundlers are sometimes configured to use a transpiler, but the transpiling itself isn't done by the bundler.

Bundlers are not task runners. This is an especially popular misconception around Webpack. Webpack does not replace task runners like Gulp; while Gulp is designed to glue together arbitrary build tasks, Webpack is specifically designed for browser bundles. It's commonly useful to use Webpack with Gulp or another task runner.

Transpilers

Typical examples: Babel, the TypeScript compiler, CoffeeScript

These tools take a bunch of code in one language, and 'compile' it to another language. They're commonly called 'transpilers' rather than 'compilers' because, unlike traditional compilers, these tools don't compile to a lower-level representation; the source and target are just different languages at a similar level of abstraction.

These are typically used to run code written against newer JS versions in older JS runtimes (eg. Babel), or to provide custom languages with more conveniences or constraints that can then be executed in any regular JS environment (TypeScript, CoffeeScript).
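
For a rough idea of what this means, here's a small ES2015+ snippet and an approximation of what a tool like Babel might turn it into for older runtimes (the exact output varies per tool and configuration):

// Input: newer syntax (const, arrow function)
const add = (a, b) => a + b;

// Approximate output: equivalent older syntax
var add = function add(a, b) {
    return a + b;
};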

Process restarters

Typical examples: nodemon

These tools automatically restart your (Node.js) process when the underlying code is changed. This is used for development purposes, to remove the need to manually restart your process after every change.

A process restarter may either watch for file changes itself, or be controlled by an external tool like a build runner.

Page reloaders

Typical examples: LiveReload, BrowserSync, Webpack hot-reload

These tools automatically refresh a page in the browser and/or reload stylesheets and/or re-render parts of the page, to reflect the changes in your browser-side code. They're kind of the equivalent of a process restarter, but for webpages.

These tools are usually externally controlled; typically by either a build runner or a bundler, or both.

Debuggers

Typical examples: Chrome Developer Tools, node-inspect

These tools allow you to inspect running code; in Node.js, in your browser, or both. Typically they'll support things like pausing execution, stepping through function calls manually, inspecting variables, profiling memory allocations and CPU usage, viewing execution logs, and so on.

They're typically used to find tricky bugs. It's a good idea to learn how these tools work, but often it'll still be easier to find a bug by just 'dumb logging' variables throughout your code using eg. console.log.

Javascript

Monolithic vs. modular - what's the difference?

This article was originally published at https://gist.github.com/joepie91/7f03a733a3a72d2396d6.

When you're developing in Node.js, you're likely to run into these terms - "monolithic" and "modular". They're usually used to describe the different types of frameworks and libraries; not just HTTP frameworks, but modules in general.

At a glance

Coupled?

In software development, the terms "tightly coupled" and "loosely coupled" are used to indicate how much components rely on each other; or more specifically, how many assumptions they make about each other. This directly translates to how easy it is to replace and change them.

While tight coupling can sometimes result in slightly more performant code and very occasionally makes it easier to build a 'mental model', loosely coupled code is much easier to understand and maintain - as the inner workings of a component are separated from its interface or API, you can make many more assumptions about how it behaves.

Loosely coupled code is often centered around 'events' and data - a component 'emits' changes that occur, with data attached to them, and other components may optionally 'listen' to those events and do something with it. However, the emitting component has no idea who (if anybody!) is listening, and cannot make assumptions about what the data is going to be used for.

What this means in practice, is that loosely coupled (and modular!) code rarely needs to be changed - once it is written, has a well-defined set of events and methods, and is free of bugs, it no longer needs to change. If an application wants to start using the data differently, it doesn't require changes in the component; the data is still of the same format, and the application can simply process it differently.
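
As a minimal sketch of that event-based style in Node.js - the thermometer component and its 'reading' event are made up for illustration:

var EventEmitter = require("events").EventEmitter;

/* The component: it only emits data, and makes no assumptions about who is listening. */
function createThermometer() {
    var emitter = new EventEmitter();

    setInterval(function() {
        /* In a real component, this value would come from actual hardware or an API. */
        emitter.emit("reading", { celsius: 20 + Math.random() });
    }, 1000);

    return emitter;
}

/* The application: it decides what to do with the data, without the component knowing. */
var thermometer = createThermometer();

thermometer.on("reading", function(reading) {
    console.log("Current temperature:", reading.celsius);
});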

This is only one example, of course - loose coupling is more of a practice than a pattern. The exact implementation depends on your usecase. A quick checklist to determine how loosely coupled your code is:

In this section, I've used the terms "component" and "application", but these are interchangeable with "callee"/"caller", and "provider"/"consumer". The principles remain the same.

The trade-offs

At first, a monolithic framework might look easier - after all, it already includes everything you think you're going to need. In the long run, however, you're likely to run into situations where the framework just doesn't quite work how you want it to, and you have to spend time trying to work around it. This problem gets worse if your usecase is more unusual - because the framework developers didn't keep in mind your usecase - but it's a risk that always exists to some degree.

Initially, a modular framework might look harder - you have to figure out what components to use for yourself. That's a one-time cost, however; the majority of modules are reusable across projects, so after your first project you'll have a good idea of what to start with. The remaining usecase-specific modules would've been just as much of a problem in a monolithic framework, where they likely wouldn't have existed to begin with.

Another consideration is the possibility to 'swap out' components. What if there's a bug in the framework that you're unable (or not allowed) to fix? When building your application modularly, you can simply get rid of the offending component and replace it with a different one; this usually doesn't take more than a few minutes, because components are typically small and only do one thing.

In a monolithic framework, this is more problematic - the component is an inherent part of the framework, and replacing it may be impossible or extremely hard, depending on how many assumptions the framework makes. You will almost certainly end up implementing a workaround of some sort, which can take hours; you need to understand the framework's codebase, the component you're using, and the exact reason why it's failing. Then you need to write code that works around it, sometimes even having to 'monkey-patch' framework methods.

Relatedly, you may find out halfway through the project that the framework doesn't support your usecase as well as you thought it would. Now you have to either replace the entire framework, or build hacks upon hacks to make it 'work' somehow; well enough to convince your boss or client, anyway. The higher cost for on-boarding new developers (as they have to learn an entire framework, not just the bits you're interested in right now), only compounds this problem - now they also have to learn why all those workarounds exist.

In summary, the tradeoffs look like this:

The "it's just a prototype!" argument

When explaining this to people, a common justification for picking a monolithic framework is that "it's just a prototype!", or "it's just an MVP!", with the implication that it can be changed later. In reality, it usually can't.

Try explaining to your boss that you want to throw out the working(!) code you have, and rewrite everything from the ground up in a different, more maintainable framework. The best response that you're likely to get, is your boss questioning why you didn't use that framework to begin with - but more likely, the answer is "no", and you're going to be stuck with your hard-to-maintain monolithic codebase for the rest of the project or your employment, whichever terminates first.

Again, the cost of a modular codebase is a one-time cost. After your first project, you already know where to find most modules you need, and building on a modular framework will not be more expensive than building on a monolithic one. Don't fall into the "prototype trap", and do it right from day one. You're likely to be stuck with it for the rest of your employment.

Javascript

Synchronous vs. asynchronous

This article was originally published at https://gist.github.com/joepie91/bf3d04febb024da89e3a3e61b164247d

You'll run into the terms "synchronous" and "asynchronous" a lot when working with JS. Let's look at what they actually mean.

Synchronous code is like what you might be used to already from other languages. You call a function, it does some work, and then returns the result. No other code runs in the meantime. This is simple to understand, but it's also inefficient; what if "doing some work" mostly involves getting some data from a database? In the meantime, our process is sitting around doing nothing, waiting for the database to respond. It could be doing useful work in that time!

And that's what brings us to asynchronous code. Asynchronous code works differently; you still call a function, but it doesn't return a result. Instead, you don't just pass the regular arguments to the function, but also give it a piece of code in a function (a so-called "asynchronous callback") to execute when the operation completes. The JS runtime stores this callback alongside the in-progress operation, to retrieve and execute it later when the external service (eg. the database) reports that the operation has been completed.

Crucially, this means that when you call an asynchronous function, it cannot wait until the external processing is complete before returning from the function! After all, the intention is to keep running other code in the meantime, so it needs to return from the function so that the 'caller' (the code which originally called the function) can continue doing useful things even while the external operation is in progress.

All of this takes place in what's called the "event loop" - you can pretty much think of it as a huge infinite loop that contains your entire program. Every time you trigger an external process through an asynchronous function call, that external process will eventually finish, and put its result in a 'queue' alongside the callback you specified. On each iteration ("tick") of the event loop, it then goes through that queue, executes all of the callbacks, which can then indirectly cause new items to be put into the queue, and so on. The end result is a program that calls asynchronous callbacks as and when necessary, and that keeps giving new work to the event loop through a chain of those callbacks.

This is, of course, a very simplified explanation - just enough to understand the rest of this page. I strongly recommend reading up on the event loop more, as it will make it much easier to understand JS in general. Here are some good resources that go into more depth:

  1. https://nodesource.com/blog/understanding-the-nodejs-event-loop (article)
  2. https://www.youtube.com/watch?v=8aGhZQkoFbQ (video)
  3. https://www.youtube.com/watch?v=cCOL7MC4Pl0 (video)

Now that we understand what the event loop is, and what a "tick" is, we can define more precisely what "asynchronous" means in JS:

Asynchronous code is code that happens across more than one event loop tick. An asynchronous function is a function that needs more than one event loop tick to complete.

This definition will be important later on, for understanding why asynchronous code can be more difficult to write correctly than synchronous code.

Asynchronous execution order and boundaries

This idea of "queueing code to run at some later tick" has consequences for how you write your code.

Remember how the event loop is a loop, and ticks are iterations - this means that event loop ticks are distributed across time linearly. First the first tick happens, then the second tick, then the third tick, and so on. Something that runs in the first tick can never execute before something that runs in the third tick; unless you're a time traveller anyway, in which case you probably would have more important things to do than reading this guide 😃

Anyhow, this means that code will run in a slightly counterintuitive way, if you're used to synchronous code. For example, consider the following code, which uses the asynchronous setTimeout function to run something after a specified amount of milliseconds:

console.log("one");

setTimeout(() => {
	console.log("two");
}, 300);

console.log("three");

You might expect this to print out one, two, three - but if you try running this code, you'll see that it doesn't! Instead, you get this:

one
three
two

What's going on here?!

The answer to that is what I mentioned earlier; the asynchronous callback is getting queued for later. Let's pretend for the sake of explanation that an event loop tick only happens when there's actually something to do. The first tick would then run this code:

console.log("one");

setTimeout(..., 300); // This schedules some code to run in a next tick, about 300ms later

console.log("three");

Then 300 milliseconds elapse, with nothing for the event loop to do - and after those 300ms, the callback we gave to setTimeout suddenly appears in the event loop queue. Now the second tick happens, and it executes this code:

console.log("two");

... thus resulting in the output that we saw above.

The key insight here is that code with callbacks does not execute in the order that the code is written. Only the code outside of the callbacks executes in the written order. For example, we can be certain that three will get printed after one because both are outside of the callback and so they are executed in that order, but because two is printed from inside of a callback, we can't know when it will execute.

"But hold on", you say, "then how can you know that two will be printed after three and one?"

This is where the earlier definition of "asynchronous code" comes into play! Let's reason through it:

  1. setTimeout is asynchronous.
  2. Therefore, we call console.log("two") from within an asynchronous callback.
  3. Synchronous code executes within one tick.
  4. Asynchronous code needs more than one tick to execute, ie. the asynchronous callback will be called in a later tick than the one where we started the operation (eg. setTimeout).
  5. Therefore, an asynchronous callback will always execute after the synchronous code that started the operation, no matter what.
  6. Therefore, two will always be printed after one and three.

So, we can know when the asynchronous callback will be executed, in terms of relative time. That's useful, isn't it? Doesn't that mean that we can do that for all asynchronous code? Well, unfortunately not - it gets more complicated when there is more than one asynchronous operation.

Take, for example, the following code:

console.log("one");

someAsynchronousOperation(() => {
	console.log("two");
});

someOtherAsynchronousOperation(() => {
	console.log("three");
});

console.log("four");

We have two different asynchronous operations here, and we don't know for certain which of the two will finish faster. We don't even know whether it's always the same one that finishes faster, or whether it varies between runs of the program. So while we can determine that two and three will always be printed after one and four - remember, asynchronous callbacks always run after the synchronous code that started them - we can't know whether two or three will come first.

And this is, fundamentally, what makes asynchronous code more difficult to write; you never know for sure in what order your code will complete. Every real-world program will have at least some scenarios where you can't force an order of operations (or, at least, not without horribly bad performance), so this is a problem that you have to account for in your code.

The easiest solution to this, is to avoid "shared state". Shared state is information that you store (eg. in a variable) and that gets used by multiple parts of your code independently. This can sometimes be necessary, but it also comes at a cost - if function A and function B both modify the same variable, then if they run in a different order than you expected, one of them might mess up the expected state of the other. This is generally already true in programming, but even more important when working with asynchronous code, as your chunks of code get 'interspersed' much more due to the callback model.
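
A small, contrived sketch of how that can go wrong - queryDatabase and fetchRemoteConfig are hypothetical asynchronous functions with callback APIs:

var lastResult = null; // shared state

queryDatabase((err, rows) => {
    lastResult = rows;
});

fetchRemoteConfig((err, config) => {
    lastResult = config;
});

// Code that reads lastResult later can't know whether it contains the database rows
// or the remote configuration - that depends entirely on which callback happened to
// run last, and it may differ between runs of the program.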

[...]

Javascript

What is state?

This article was originally published at https://gist.github.com/joepie91/8c2cba6a3e6d19b275fdff62bef98311

"State" is data that is associated with some part of a program, and that can be changed over time to change the behaviour of the program. It doesn't have to be changed by the user; it can be changed by anything in the program, and it can be any kind of data.

It's a bit of an abstract concept, so here's an example: say you have a button that increases a number by 1 every time you click it, and the (pseudo-)code looks something like this:

let counter = 0;
let increment = 1;

button.on("click", () => {
	counter = counter + increment;
});

In this code, there are two bits of "state" involved:

  1. Whether the button is clicked: This bit of data - specifically, the change between "yes" and "no" - is what determines when to increase the counter. The example code doesn't interact with this data directly, but the callback is called whenever it changes from "no" to "yes" and back again.
  2. The current value of the counter: This bit of data is used to determine what the next value of the counter is going to be (the current value plus one), as well as what value to show on the screen.

Now, you may note that we also define an increment variable, but that it isn't in the list of things that are "state"; this is because the increment value never changes. It's just a static value (1) that is always the same, even though it's stored in a variable. That means it's not state.

You'll also note that "whether the button is clicked" isn't stored in any variable we have access to, and that we can't access the "yes" or "no" value directly. This is an example of what we'll call invisible state - data that is state, but that we cannot see or access directly - it only exists "behind the scenes". Nevertheless, it still affects the behaviour of the code through the event handler callback that we've defined, and that means it's still state.

Javascript

Promises reading list

This article was originally published at https://gist.github.com/joepie91/791640557e3e5fd80861

This is a list of examples and articles, in roughly the order you should follow them, to show and explain how promises work and why you should use them. I'll probably add more things to this list over time.

This list primarily focuses on Bluebird, but the basic functionality should also work in ES6 Promises, and some examples are included on how to replicate Bluebird functionality with ES6 promises. You should still use Bluebird where possible, though - it's faster, less error-prone, and has more utilities.

I'm available for tutoring and code review :)

You may reuse all of the referenced posts and Gists (written by me) for any purpose under the WTFPL / CC0 (whichever you prefer).

If you get stuck

I've made a brief FAQ of common questions that people have about Promises, and how to use them. If you don't understand something listed here, or you're wondering how to implement a specific requirement, chances are that it'll be answered in that FAQ.

Compatibility

Bluebird will not work correctly (in client-side code) in older browsers. If you need to support older browsers, and you're using Webpack or Browserify, you should use the es6-promise module instead, and reimplement behaviour where necessary.

Introduction

Promise.try

Many guides and examples fail to demonstrate Promise.try, or to explain why it's important. This article will explain it.

Error handling

Many examples on the internet don't show this, but you should always start a chain of promises with Promise.try, and if it is within a function or callback, you should always return your promise chain. Not doing so will result in less reliable error handling and various other issues (eg. code executing too soon).

Promisifying

Functional (map, filter, reduce)

Nesting

ES6 Promises

Odds and ends

Some potentially useful snippets:

You're unlikely to need any of these things, if you just stick with either Bluebird or ES6 promises:

Javascript

The Promises FAQ - addressing the most common questions and misconceptions about Promises

This article was originally published at https://gist.github.com/joepie91/4c3a10629a4263a522e3bc4839a28c83. Nowadays Promises are more widely understood and supported, and it's not as relevant as it once was, but it's kept here for posterity.

By the way, I'm available for tutoring and code review :)

1. What Promises library should I use?

That depends a bit on your usecase.

My usual recommendation is Bluebird - it's robust, has good error handling and debugging facilities, is fast, and has a well-designed API. The downside is that Bluebird will not correctly work in older browsers (think Internet Explorer 8 and older), and when used in Browserified/Webpacked code, it can sometimes add a lot to your bundle size.

ES6 Promises are gaining a lot of traction purely because of being "ES6", but in practice they are just not very good. They are generally lacking standardized debugging facilities, they are missing essential utilities such as Promise.try/promisify/promisifyAll, they cannot catch specific error types (this is a big robustness issue), and so on.

ES6 Promises can be useful in constrained scenarios (eg. older browsers with a polyfill, restricted non-V8 runtimes, etc.) but I would not generally recommend them.

There are many other Promise implementations (Q, WhenJS, etc.) - but frankly, I've not seen any that are an improvement over either Bluebird or ES6 Promises in their respective 'optimal scenarios'. I'd also recommend explicitly against Q because it is extremely slow and has a very poorly designed API.

In summary: Use Bluebird, unless you have a very specific reason not to. In those very specific cases, you probably want ES6 Promises.

2. How do I create a Promise myself?

Usually, you don't. Promises are not usually something you 'create' explicitly - rather, they're a natural consequence of chaining together multiple operations. Take this example:

function getLinesFromSomething() {
    return Promise.try(() => {
        return bhttp.get("http://example.com/something.txt");
    }).then((response) => {
        return response.body.toString().split("\n");
    });
}

In this example, all of the following technically result in a new Promise:

  • the bhttp.get(...) call
  • the Promise.try(...) block
  • the .then(...) call

... but none of them are explicitly created as "a new Promise" - that's just the natural consequence of starting a chain with Promise.try and then returning Promises or values from the callbacks.

There is one exception to this, where you do need to explicitly create a new Promise - when converting a different kind of asynchronous API to a Promises API, and even then you only need to do this if promisify and friends don't work. This is explained in question 7.

3. How do I use new Promise?

You don't, usually. In almost every case, you either need Promise.try, or some kind of promisification method. Question 7 explains how you should do promisification, and when you do need new Promise.

But when in doubt, don't use it. It's very error-prone.

4. How do I resolve a Promise?

You don't, usually. Promises are not something you need to 'resolve' manually - rather, you should just return some kind of Promise, and let the Promise library handle the rest.

There's one exception here: when you're manually promisifying a strange API using new Promise, you need to call resolve() or reject() for a successful and unsuccessful state, respectively. Make sure to read question 3, though - you should almost never actually use new Promise.

5. But what if I want to resolve a synchronous result or error?

You simply return it (if it's a result) or throw it (if it's an error), from your .then callback. When using Promises, synchronously returned values are automatically converted into a resolved Promise, whereas synchronously thrown errors are automatically converted into a rejected Promise. You don't need to use Promise.resolve() or Promise.reject().
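
For example - findRecord is a hypothetical function that returns a Promise:

Promise.try(() => {
    return findRecord(someId);
}).then((record) => {
    if (record == null) {
        // A synchronously thrown error automatically becomes a rejected Promise...
        throw new Error("No such record");
    } else {
        // ... and a synchronously returned value automatically becomes a resolved Promise.
        return record.name;
    }
});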

6. But what if it's at the start of a chain, and I'm not in a .then callback yet?

Using Promise.try will make this problem not exist.

7. How do I make this non-Promises library work with Promises?

That depends on what kind of API it is. If it uses Node-style error-first callbacks, you can usually wrap it automatically with a promisification method such as Bluebird's Promise.promisify or Promise.promisifyAll. If it uses some other convention (events, custom success/error callbacks, and so on), you'll have to wrap it manually with new Promise - and that is the one situation where using new Promise is appropriate.
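
A rough sketch of both cases, using Bluebird; the waitForEvent helper is just an illustration of wrapping an EventEmitter-style API:

var Promise = require("bluebird");

/* Case 1: the library uses Node-style error-first callbacks.
 * promisifyAll creates ...Async variants of every method, which return Promises. */
var fs = Promise.promisifyAll(require("fs"));

Promise.try(() => {
    return fs.readFileAsync("./config.json", "utf8");
});

/* Case 2: the library uses some other convention (here: events).
 * This is the rare situation where you do use `new Promise` yourself. */
function waitForEvent(emitter, eventName) {
    return new Promise((resolve, reject) => {
        emitter.once(eventName, resolve);
        emitter.once("error", reject);
    });
}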

8. How do I propagate errors, like with if(err) return cb(err)?

You don't. Promises will propagate errors automatically, and you don't need to do anything special for it - this is one of the benefits that Promises provide over error-first callbacks.

When using Promises, the only case where you need to .catch an error, is if you intend to handle it - and you should always only catch the types of error you're interested in.

These two Gists (step 1, step 2) show how error propagation works, and how to .catch specific types of errors.

9. How do I break out of a Promise chain early?

You don't. You use conditionals instead. Of course, specifically for failure scenarios, you'd still throw an error.
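
A minimal sketch of what that looks like in practice - getUser and getProfile are hypothetical Promise-returning functions:

Promise.try(() => {
    return getUser(userId);
}).then((user) => {
    if (user == null) {
        /* Nothing useful to do for this user; pass nothing along instead of 'breaking out'. */
        return null;
    } else {
        return getProfile(user);
    }
}).then((profile) => {
    if (profile != null) {
        console.log("Found profile:", profile);
    }
});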

10. How do I convert a Promise to a synchronous value?

You can't. Once you write asynchronous code, all of the 'surrounding' code also needs to be asynchronous. However, you can just have a Promise chain in the 'parent code', and return the Promise from your own method.

For example:

function getUserFromDatabase(userId) {
    return Promise.try(() => {
        return database.table("users").where({id: userId}).get();
    }).then((results) => {
        if (results.length === 0) {
            throw new MyCustomError("No users found with that ID");
        } else {
            return results[0];
        }
    });
}

/* Now, to *use* that getUserFromDatabase function, we need to have another Promise chain: */

Promise.try(() => {
    // Here, we return the result of calling our own function. That return value is a Promise.
    return getUserFromDatabase(42);
}).then((user) => {
    console.log("The username of user 42 is:", user.username);
});

(If you're not sure what Promise.try is or does, this article will explain it.)

11. How do I save a value from a Promise outside of the callback?

You don't. See question 10 above - you need to use Promises "all the way down".

12. How do I access previous results from the Promise chain?

In some cases, you might need to access an earlier result from a chain of Promises, one that you don't have access to anymore. A simple example of this scenario:

'use strict';

// ...

Promise.try(() => {
    return database.query("users", {id: req.body.userId});
}).then((user) => {
    return database.query("groups", {id: req.body.groupId});
}).then((group) => {
    res.json({
        user: user, // This is not possible, because `user` is not in scope anymore.
        group: group
    });
});

This is a fairly simple case - the user query and the group query are completely independent, and they can be run at the same time. Because of that, we can use Promise.all to run them in parallel, and return a combined Promise for both of their results:

'use strict';

// ...

Promise.try(() => {
    return Promise.all([
        database.query("users", {id: req.body.userId}),
        database.query("groups", {id: req.body.groupId})
    ]);
}).spread((user, group) => {
    res.json({
        user: user, // Now it's possible!
        group: group
    });
});

Note that instead of .then, we use .spread here. Promises only support a single result argument for a .then, which is why a Promise created by Promise.all would resolve to an array of [user, group] in this case. However, .spread is a Bluebird-specific variation of .then, that will automatically "unpack" that array into multiple callback arguments. Alternatively, you can use ES6 array destructuring in a regular .then callback to accomplish the same.
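
For completeness, this is roughly what the destructuring variant looks like with a plain .then (this also works with ES6 Promises, since .spread is Bluebird-only):

Promise.try(() => {
    return Promise.all([
        database.query("users", {id: req.body.userId}),
        database.query("groups", {id: req.body.groupId})
    ]);
}).then(([user, group]) => {
    res.json({
        user: user,
        group: group
    });
});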

Now, the above example assumes that the two asynchronous operations are independent - that is, they can run in parallel without caring about the result of the other operation. In some cases, you will want to use the results of two operations that are dependent - while you still want to use the results of both at the same time, the second operation also needs the result of the first operation to work.

An example:

'use strict';

// ...

Promise.try(() => {
    return getDatabaseConnection();
}).then((databaseConnection) => {
    return databaseConnection.query("users", {id: req.body.id});
}).then((user) => {
    res.json(user);

    // This is not possible, because we don't have `databaseConnection` in scope anymore:
    databaseConnection.close();
});

In these cases, rather than using Promise.all, you'd add a level of nesting to keep something in scope:

'use strict';

// ...

Promise.try(() => {
    return getDatabaseConnection();
}).then((databaseConnection) => {
    // We nest here, so that `databaseConnection` remains in scope.

    return Promise.try(() => {
        return databaseConnection.query("users", {id: req.body.id});
    }).then((user) => {
        res.json(user);

        databaseConnection.close(); // Now it works!
    });
});

Of course, as with any kind of nesting, you should do it sparingly - and only when necessary for a situation like this. Splitting up your code into small functions, with each of them having a single responsibility, will prevent trouble with this.

Javascript

Error handling (with Promises)

This article was originally published at https://gist.github.com/joepie91/c8d8cc4e6c2b57889446. It only applies when using Promise chaining syntax; when you use async/await, you are instead expected to use try/catch, which unfortunately does not support error filtering.

There's roughly three types of errors:

  1. Expected errors - eg. "URL is unreachable" for a link validity checker. You should handle these in your code at the top-most level where it is practical to do so.
  2. Unexpected errors - eg. a bug in your code. These should crash your process (yes, really), they should be logged and ideally e-mailed to you, and you should fix them right away. You should never catch them for any purpose other than to log the error, and even then you should make the process crash.
  3. User-facing errors - not really in the same category as the above two. While you can represent them with error objects (and it's often practical to do so), they're not really errors in the programming sense - rather, they're user feedback. When represented as error objects, these should only ever be handled at the top-most point of a request - in the case of Express, that would be the error-handling middleware that sends an HTTP status code and a response.
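
As a sketch of that last category in Express - assuming a hypothetical ValidationError class that represents user feedback:

app.use(function(err, req, res, next) {
    if (err instanceof ValidationError) {
        /* User-facing: translate it into an HTTP response at the top-most point of the request. */
        res.status(422).json({ error: err.message });
    } else {
        /* Anything else: pass it along, so it gets logged (and the process can crash if it's a bug). */
        next(err);
    }
});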

Would I still need to use try/catch if I use promises?

Sort of. Not the usual try/catch, but eg. Bluebird has a .try and .catch equivalent. It works like synchronous try/catch, though - errors are propagated upwards automatically so that you can handle them where appropriate.

Bluebird's try isn't identical to a standard JS try - it's more a 'start using Promises' thing, so that you can also wrap synchronous errors. That's the magic of Promises, really - they let you handle synchronous and asynchronous errors/values like they're one and the same thing.

Below is a relatively complex example, which uses a custom 'error filter' (predicate), because filesystem errors have an error code (eg. ENOENT) but not a special error type. The error filtering is only available in Bluebird, by the way - 'native' Promises don't have it.

/* UPDATED: This example has been changed to use the new object predicates, that were
 * introduced in Bluebird 3.0. If you are using Bluebird 2.x, you will need to use the
 * older example below, with the predicate function. */

var Promise = require("bluebird");
var fs = Promise.promisifyAll(require("fs"));

Promise.try(function(){
	return fs.readFileAsync("./config.json").then(JSON.parse);
}).catch({code: "ENOENT"}, function(err){
	/* Return an empty object. */
	return {};
}).then(function(config){
	/* `config` now either contains the JSON-parsed configuration file, or an empty object if no configuration file existed. */
});

If you are still using Bluebird 2.x, you should use predicate functions instead:

/* This example is ONLY for Bluebird 2.x. When using Bluebird 3.0 or newer, you should
 * use the updated example above instead. */

var Promise = require("bluebird");
var fs = Promise.promisifyAll(require("fs"));

var NonExistentFilePredicate = function(err) {
	return (err.code === "ENOENT");
};

Promise.try(function(){
	return fs.readFileAsync("./config.json").then(JSON.parse);
}).catch(NonExistentFilePredicate, function(err){
	/* Return an empty object. */
	return {};
}).then(function(config){
	/* `config` now either contains the JSON-parsed configuration file, or an empty object if no configuration file existed. */
});

 

Javascript

Bluebird Promise.try using ES6 Promises

This article was originally published at https://gist.github.com/joepie91/255250eeea8b94572a03.

Note that this will only be equivalent to Promise.try if your runtime or ES6 Promise shim correctly catches synchronous errors in Promise constructors.

If you are using the latest version of Node, this should be fine.

var Promise = require("es6-promise").Promise;

module.exports = function promiseTry(func) {
    return new Promise(function(resolve, reject) {
        resolve(func());
    })
}

 

Javascript

Please don't include minified builds in your npm packages!

This article was originally published at https://gist.github.com/joepie91/04cc8329df231ea3e262dffe3d41f848

There's quite a few libraries on npm that not only include the regular build in their package, but also a minified build. While this may seem like a helpful addition to make the package more complete, it actually poses a real problem: it becomes very difficult to audit these libraries.

The problem

You've probably seen incidents like the event-stream incident, where a library was compromised in some way by an attacker. This sort of thing, also known as a "supply-chain attack", is starting to become more and more common - and it's something that developers need to protect themselves against.

One effective way to do so, is by auditing dependencies. Having at least a cursory look through every dependency in your dependency tree, to ensure that there's nothing sketchy in there. While it isn't going to be 100% perfect, it will detect most of these attacks - and not only is briefly reviewing dependencies still faster than reinventing your own wheels, it'll also give you more insight into how your application actually works under the hood.

But, there's a problem: a lot of packages include almost-duplicate builds, sometimes even minified ones. It's becoming increasingly common to see a separate CommonJS and ESM build, but in many cases there's a minified build included too. And those are basically impossible to audit! Even with a code beautifier, it's very difficult to understand what's really going on. But you can't ignore them either, because if they are a part of the package, then other code can require them. So you have to audit them.

There's a workaround for this, in the form of "reproducing" the build; taking the original (Git) repository for the package which only contains the original code and not the minified code, checking out the intended version, and then just running a build that creates the minified version, which you can then compare to the one on npm. If they match, then you can assume that you only need to audit the original source in the Git repo.

Or well, that would be the case, if it weren't possible for the build tools to introduce malicious code as well. Argh! Now you need to audit all of the build tools being used as well, at the specific versions that are being used by each dependency. Basically, you're now auditing hundreds of build stacks. This is a massive waste of time for every developer who wants to make sure there's nothing sketchy in their dependencies!

All the while these minified builds don't really solve a problem. Which brings me to...

Why it's unnecessary to include minified builds

As a library author, you are going to be dealing with roughly two developer demographics:

  1. Those who just want a file they can include as a <script> tag, so that they can use your library in their (often legacy) module-less code.
  2. Those with a more modern development stack, including a package manager (npm) and often also build tooling.

For the first demographic, it makes a lot of sense to provide a pre-minified build, as they are going to directly include it in their site, and it should ideally be small. But, here's the rub: those are also the developers who probably aren't using (or don't want to use) a package manager like npm! There's not really a reason why their minified pre-build should exist on npm, specifically - you might just as well offer it as a separate download.

For the second demographic, a pre-minified build isn't really useful at all. They probably already have their own development stack that does minification (of their own code and dependencies), and so they simply won't be using your minified build.

In short: there's not really a point to having a minified build in your npm package.

The solution

Simply put: don't include minified files in your npm package - distribute them separately, instead. In most cases, you can just put it on your project's website, or even in the (Git) repository.

If you really do have some specific reason to need to distribute them through npm, at least put them in a separate package (eg. yourpackage-minified), so that only those who actually use the minified version need to add it to their dependency folder.

Ideally, try to only have a single copy of your code in your package at all - so also no separate CommonJS and ESM builds, for example. CommonJS works basically everywhere, and there's basically no reason to use ESM anyway, so this should be fine for most projects.

If you really must include an ESM version of your code, you should at least use a wrapping approach instead of duplicating the code (note that this can be a breaking change!). But if you can, please leave it out to make it easier for developers to understand what they are installing into their project!
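
As a sketch of what that wrapping approach could look like (the file names are illustrative): keep the real implementation in a single CommonJS file, and add a tiny ESM file that only re-exports it, rather than shipping a second full build. The package.json can then point ESM consumers at the wrapper (eg. via a conditional "exports" entry) while CommonJS consumers keep requiring the original file.

// index.js - the real (CommonJS) implementation
module.exports = {
    doSomething: function() { /* ... */ }
};

// wrapper.mjs - a thin ESM wrapper; the only 'duplication' is this re-export list
import cjs from "./index.js";

export default cjs;
export const doSomething = cjs.doSomething;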

Anyone should be able to audit and review their dependencies, not just large companies with deep pockets; not including unnecessarily duplicated or obfuscated code in your packages will go a long way towards that. Thanks!

Javascript

How to get the actual width of an element in jQuery, even with box-sizing: border-box

This article was originally published at https://gist.github.com/joepie91/5ffffefbf24dcfdb4477.

This is ridiculous, but per the jQuery documentation:

Note that .width() will always return the content width, regardless of the value of the CSS box-sizing property. As of jQuery 1.8, this may require retrieving the CSS width plus box-sizing property and then subtracting any potential border and padding on each element when the element has box-sizing: border-box. To avoid this penalty, use .css( "width" ) rather than .width().

function parsePx(input) {
	let match;
	
	if (match = /^([0-9.]+)px$/.exec(input)) {
		return parseFloat(match[1]);
	} else {
		throw new Error("Value is not in pixels!");
	}
}

$.prototype.actualWidth = function() {
	/* WTF, jQuery? */
	let isBorderBox = (this.css("box-sizing") === "border-box");
	let width = this.width();
	
	if (isBorderBox) {
		width = width
			+ parsePx(this.css("padding-left"))
			+ parsePx(this.css("padding-right"))
			+ parsePx(this.css("border-left-width"))
			+ parsePx(this.css("border-right-width"));
	}
	
	return width;
}

 

Javascript

A survey of unhandledRejection and rejectionHandled handlers

This article was originally published at https://gist.github.com/joepie91/06cca7058a34398f168b08223b642162

Bluebird (http://bluebirdjs.com/docs/api/error-management-configuration.html#global-rejection-events)

  • process.on//unhandledRejection: (Node.js) Potentially unhandled rejection.
  • process.on//rejectionHandled: (Node.js) Cancel unhandled rejection, it was handled anyway.
  • self.addEventListener//unhandledrejection: (WebWorkers) Potentially unhandled rejection.
  • self.addEventListener//rejectionhandled: (WebWorkers) Cancel unhandled rejection, it was handled anyway.
  • window.addEventListener//unhandledrejection: (Modern browsers, IE >= 9) Potentially unhandled rejection.
  • window.addEventListener//rejectionhandled: (Modern browsers, IE >= 9) Cancel unhandled rejection, it was handled anyway.
  • window.onunhandledrejection: (IE >= 6) Potentially unhandled rejection.
  • window.onrejectionhandled: (IE >= 6) Cancel unhandled rejection, it was handled anyway.

WhenJS (https://github.com/cujojs/when/blob/3.7.0/docs/debug-api.md)

  • process.on//unhandledRejection: (Node.js) Potentially unhandled rejection.
  • process.on//rejectionHandled: (Node.js) Cancel unhandled rejection, it was handled anyway.
  • window.addEventListener//unhandledRejection: (Modern browsers, IE >= 9) Potentially unhandled rejection.
  • window.addEventListener//rejectionHandled: (Modern browsers, IE >= 9) Cancel unhandled rejection, it was handled anyway.

Spec (https://gist.github.com/benjamingr/0237932cee84712951a2)

  • process.on//unhandledRejection: (Node.js) Potentially unhandled rejection.
  • process.on//rejectionHandled: (Node.js) Cancel unhandled rejection, it was handled anyway.

Spec (WHATWG: https://html.spec.whatwg.org/multipage/webappapis.html#unhandled-promise-rejections)

  • window.addEventListener//unhandledrejection: (Browsers) Potentially unhandled rejection.
  • window.addEventListener//rejectionhandled: (Browsers) Cancel unhandled rejection, it was handled anyway.
  • window.onunhandledrejection: (Browsers) Potentially unhandled rejection.
  • window.onrejectionhandled: (Browsers) Cancel unhandled rejection, it was handled anyway.

ES6 Promises in Node.js (https://nodejs.org/api/process.html#process_event_rejectionhandled onwards)

  • process.on//unhandledRejection: Potentially unhandled rejection.
  • process.on//rejectionHandled: Cancel unhandled rejection, it was handled anyway.

Yaku (https://github.com/ysmood/yaku#unhandled-rejection)

  • process.on//unhandledRejection: (Node.js) Potentially unhandled rejection.
  • process.on//rejectionHandled: (Node.js) Cancel unhandled rejection, it was handled anyway.
  • window.onunhandledrejection: (Browsers) Potentially unhandled rejection.
  • window.onrejectionhandled: (Browsers) Cancel unhandled rejection, it was handled anyway.
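
For reference, a minimal Node.js example of hooking both events; what you do inside the handlers (logging, alerting, crashing) is up to you:

process.on("unhandledRejection", (reason, promise) => {
    console.warn("Potentially unhandled rejection:", reason);
});

process.on("rejectionHandled", (promise) => {
    console.info("A previously-reported rejection was handled after all:", promise);
});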

Javascript

Quill.js glossary

This article was originally published at https://gist.github.com/joepie91/46241ef1ce89c74958da0fdd7d04eb55.

Since Quill.js doesn't seem to document its strange jargon-y terms anywhere, here's a glossary that I've put together for it. No guarantees that it's correct! But I've done my best.

Quill - The WYSIWYG editor library

Parchment - The internal model used in Quill to implement the document tree

Scroll - A document, expressed as a tree, technically also a Blot (node) itself, specifically the root node

Blot - A node in the document tree

Block (Blot) - A block-level node

Inline (Blot) - An inline (formatting) node

Text (Blot) - A node that contains only(!) raw text contents

Break (Blot) - A node that contains nothing, used as a placeholder where there is no actual content

"a format" - A specific formatting attribute (width, height, is bold, ...)

.format(...) - The API method that is used to set a formatting attribute on some selection

Javascript

Riot.js cheatsheet

This article was originally published at https://gist.github.com/joepie91/ed3a267de70210b46fb06dd57077827a

Component styling

This section only applies to Riot.js 2.x. Since 3.x, all styles are scoped by default and you can simply add a style tag to your component.

  1. You can use a <style> tag within your tag. This style tag is applied globally by default.
  2. You can scope your style tag to limit its effect to the component that you've defined it in. Note that scoping is based on the tag name. There are two options:
     • Use the scoped attribute, eg. <style scoped> ... </style>
     • Use the :scope pseudo-selector, eg. <style> :scope { ... } </style>
  3. You can change where global styles are 'injected' by having <style type="riot"></style> somewhere in your <head>. This is useful for eg. controlling what styles are overridden.

Mounting

"Mounting" is the act of attaching a custom tag's template and behaviour to a specific element in the DOM. The most common case is to mount all instances of a specific top-level tag, but there are more options:

  1. Mount all custom tags on the page: riot.mount("*")
  2. Mount all instances of a specific tag name: riot.mount("app")
  3. Mount a tag with a specific ID: riot.mount("#specific_element")
  4. Mount using a more complex selector: riot.mount("foo, bar")

Note that "child tags" (that is, custom tags that are specified within other custom tags) are automatically mounted as-needed. You do not need to riot.mount them separately.

The simplest example:

<script>
// Load the `app` tag's definition here somehow...

document.addEventListener("DOMContentLoaded", (event) => {
    riot.mount("app");
});
</script>

<app></app>

Tag logic

These tag logic constructs (conditionals, visibility toggles, and loops) also work on regular (ie. non-Riot) HTML tags.

If you need to add/hide/display/loop a group of tags, rather than a single one, you can wrap them in a <virtual> pseudo-tag. This works with all of the above constructs. For example:

<virtual for="{item in items}">
    <label>{item.label}</label>
    <textarea>{item.defaultValue}</textarea>
</virtual>

Javascript

Quick reference for `checkit` validators

This article was originally published at https://gist.github.com/joepie91/cd107b3a566264b28a3494689d73e589.

Presence

Character set

Value

Value (numbers)

Note that "numbers" refers to both Number-type values, and strings containing numeric values!

Relations to other fields

JavaScript types

Format

Javascript

ES Modules are terrible, actually

This post was originally published at https://gist.github.com/joepie91/bca2fda868c1e8b2c2caf76af7dfcad3, which was in turn adapted from an earlier Twitter thread.

It's incredible how many collective developer hours have been wasted on pushing through the turd that is ES Modules (often mistakenly called "ES6 Modules"). Causing a big ecosystem divide and massive tooling support issues, for... well, no reason, really. There are no actual advantages to it. At all.

It looks shiny and new and some libraries use it in their documentation without any explanation, so people assume that it's the new thing that must be used. And then I end up having to explain to them why, unlike CommonJS, it doesn't actually work everywhere yet, and may never do so. For example, you can't import ESM modules from a CommonJS file! (Update: I've released a module that works around this issue.)

And then there's Rollup, which apparently requires ESM to be used, at least to get things like treeshaking. Which then makes people believe that treeshaking is not possible with CommonJS modules. Well, it is - Rollup just chose not to support it.

And then there's Babel, which tried to transpile import/export to require/module.exports, sidestepping the ongoing effort of standardizing the module semantics for ESM, causing broken imports and require("foo").default nonsense and spec design issues all over the place.

And then people go "but you can use ESM in browsers without a build step!", apparently not realizing that that is an utterly useless feature because loading a full dependency tree over the network would be unreasonably and unavoidably slow - you'd need as many roundtrips as there are levels of depth in your dependency tree - and so you need some kind of build step anyway, eliminating this entire supposed benefit.

And then people go "well you can statically analyze it better!", apparently not realizing that ESM doesn't actually change any of the JS semantics other than the import/export syntax, and that the import/export statements are equally analyzable as top-level require/module.exports.

"But in CommonJS you can use those elsewhere too, and that breaks static analyzers!", I hear you say. Well, yes, absolutely. But that is inherent in dynamic imports, which by the way, ESM also supports with its dynamic import() syntax. So it doesn't solve that either! Any static analyzer still needs to deal with the case of dynamic imports somehow - it's just rearranging deck chairs on the Titanic.

And then, people go "but now we at least have a standard module system!", apparently not realizing that CommonJS was literally that, the result of an attempt to standardize the various competing module systems in JS. Which, against all odds, actually succeeded!

... and then promptly got destroyed by ESM, which reintroduced a split and all sorts of incompatibility in the ecosystem, rather than just importing some updated variant of CommonJS into the language specification, which would have sidestepped almost all of these issues.

And while the initial CommonJS standardization effort succeeded due to none of the competing module systems being in particularly widespread use yet, CommonJS is so ubiquitous in Javascript-land nowadays that it will never fully go away. Which means that runtimes will forever have to keep supporting two module systems, and developers will forever be paying the cost of the interoperability issues between them.

But it's the future!

Is it really? The vast majority of people who believe they're currently using ESM, aren't even actually doing so - they're feeding their entire codebase through Babel, which deftly converts all of those snazzy import and export statements back into CommonJS syntax. Which works. So what's the point of the new module system again, if it all works with CommonJS anyway?

And it gets worse; import and export are designed as special-cased statements. Aside from the obvious problem of needing to learn a special syntax (which doesn't quite work like object destructuring) instead of reusing core language concepts, this is also a downgrade from CommonJS' require, which is a first-class expression due to just being a function call.

That might sound irrelevant on the face of it, but it has very real consequences. For example, the following pattern is simply not possible with ESM:

const someInitializedModule = require("module-name")(someOptions);

Or how about this one? Also no longer possible:

const app = express();
// ...
app.use("/users", require("./routers/users"));

Having language features available as a first-class expression is one of the most desirable properties in language design; yet for some completely unclear reason, ESM proponents decided to remove that property. There's just no way anymore to directly combine an import statement with some other JS syntax, whether or not the module path is statically specified.

The only way around this is with await import, which would break the supposed static analyzer benefits, only work in async contexts, and even then require weird hacks with parentheses to make it work correctly.
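
For illustration, the closest ESM equivalent of the first example above would look something like this - assuming the module exposes its factory function as a default export:

// Only valid inside an async context (or at the top level of an ESM module):
const someInitializedModule = (await import("module-name")).default(someOptions);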

It also means that you now need to make a choice: do you want to be able to use ESM-only dependencies, or do you want to have access to patterns like the above that help you keep your codebase maintainable? ESM or maintainability, your choice!

So, congratulations, ESM proponents. You've destroyed a successful userland specification, wasted many (hundreds of?) thousands of hours of collective developer time, many hours of my own personal unpaid time trying to support people with the fallout, and created ecosystem fragmentation that will never go away, in exchange for... fuck all.

This is a disaster, and the only remaining way I see to fix it is to stop trying to make ESM happen, and deprecate it in favour of some variant of CommonJS modules being absorbed into the spec. It's not too late yet; but at some point it will be.

Javascript

A few notes on the "Gathering weak npm credentials" article

This article was originally published in 2017 at https://gist.github.com/joepie91/828532657d23d512d76c1e68b101f436. Since then, npm has implemented 2FA support in the registry, and was acquired by Microsoft through Github.

Yesterday, an article was released that describes how one person could obtain access to enough packages on npm to affect 52% of the package installations in the Node.js ecosystem. Unfortunately, this has brought about some comments from readers that completely miss the mark, and that draw away attention from the real issue behind all this.

To be very clear: This (security) issue was caused by 1) poor password management on the side of developers, 2) handing out unnecessary publish access to packages, and most of all 3) poor security on the side of the npm registry.

With that being said, let's address some of the common claims. This is going to be slightly ranty, because to be honest I'm rather disappointed that otherwise competent infosec people distract from the underlying causes like this. All that's going to do is prevent this from getting fixed in other language package registries, which almost certainly suffer from the same issues.

"This is what you get when you use small dependencies, because there are such long dependency chains"

This is very unlikely to be a relevant factor here. Don't forget that a key part of the problem here is that publisher access is handed out unnecessarily; if the Node.js ecosystem were to consist of a few large dependencies (that everybody used) instead of many small ones (that are only used by those who actually need the entire dependency), you'd just end up with each large dependency being responsible for a larger part of the 52%.

There's a potential point of discussion in that a modular ecosystem means that more different groups of people are involved in the implementation of a given dependency, and that this could provide for a larger (human) attack surface; however, this is a completely unexplored argument for which no data currently exists, and this particular article does not provide sufficient evidence to show it to be true.

Perhaps not surprisingly, the "it's because of small dependencies" argument seems to come primarily from people who don't fully understand the Node.js dependency model and make a lot of (incorrect) assumptions about its consequences, and who appear to take every opportunity to blame things on "small dependencies" regardless of technical accuracy.

In short: No, this is not because of small dependencies. It would very likely happen with large dependencies as well.

"See, that's why you should always lock your dependency versions. This is why semantic versioning is bad."

Aside from semantic versioning being a practice that's separate from automatically updating based on a semver range, preventing automatic updates isn't going to prevent this issue either. The problem here is with publish access to the modules, which is a completely separate concern from "how the obtained access is misused".

In practice, most people who "lock dependency versions" seem to follow a practice of "automatically merge any update that doesn't break tests" - which really is no different from just letting semver ranges do their thing. Even if you do audit updates before you apply them (and let's be realistic, how many people actually do this for every update?), it would be trivial to subtly backdoor most of the affected packages due to their often aging and messy codebase, where one more bit of strange code doesn't really stand out.

The chances of locked dependencies preventing exploitation are close to zero. Even if you do audit your updates, it's relatively trivial for a competent developer to sneak by a backdoor. At the same time, "people not applying updates" is a far bigger security issue than audit-less dependency locking will solve.

All this applies to "vendoring in dependencies", too - vendoring in dependencies is not technically any different from pinning a version/hash of a dependency.

In short: No, dependency locking will not prevent exploitation through this vector. Unless you have a strict auditing process (which you should, but many do not), you should not lock dependency versions.

"That's why you should be able to add a hash to your package.json, so that it verifies the integrity of the dependency.

This solves a completely different and almost unimportant problem. The only thing that a package hash will do, is assuring that everybody who installs the dependencies gets the exact same dependencies (for a locked set of versions). However, the npm registry already does that - it prevents republishing different code under an already-used version number, and even with publisher access you cannot bypass that.

Package hashes also give you absolutely zero assurances about future updates; package hashes are not signatures.

In short: This just doesn't even have anything to do with the credentials issue. It's totally unrelated.

"See? This is why Node.js is bad."

Unfortunately plenty of people are conveniently using this article as an excuse to complain about Node.js (because that's apparently the hip thing to do?), without bothering to understand what happened. Very simply put: this issue is not in any way specific to Node.js. The issue here is an issue of developers with poor password policies and poor registry access controls. It just so happens that the research was done on npm.

As far as I am aware, this kind of research has not been carried out for any other language package registries - but many other registries appear to be similarly poorly monitored and secured, and are very likely to be subject to the exact same attack.

If you're using this as an excuse to complain about Node.js, without bothering to understand the issue well enough to realize that it's a language-independent issue, then perhaps you should reconsider exactly how well-informed your point of view of Node.js (or other tools, for that matter) really is. Instead, you should take this as a lesson and prevent this from happening in other language ecosystems.

In short: This has absolutely nothing to do with Node.js specifically. That's just where the research happens to be done. Take the advice and start looking at other language package registries, to ensure they are not vulnerable to this either.

So then how should I fix this?

  1. Demand from npm Inc. that they prioritize implementing 2FA immediately, actively monitor for incidents like this, and generally implement all the mitigations suggested in the article. It's really not reasonable how poorly monitored or secured the registry is, especially given that it's operated by a commercial organization, and it's been around for a long time.
  2. If you have an npm account, follow the instructions here.
  3. Carry out or encourage the same kind of research on the package registry for your favorite language. It's very likely that other package registries are similarly insecure and poorly monitored.

Unfortunately, as a mere consumer of packages, there's nothing you can do about this other than demanding that npm Inc. gets their registry security in order. This is fundamentally an infrastructure problem.

Node.js

Things that are specific to Node.js. Note that things about Javascript in general, are found under their own "Javascript" chapter!

Node.js

How to install Node.js applications, if you're not a Node.js developer

This article was originally published at https://gist.github.com/joepie91/24f4e70174d10325a9af743a381d5ec6.

While installing a Node.js application isn't difficult in principle, it may still be confusing if you're not used to how the Node.js ecosystem works. This post will tell you how to get the application going, what to expect, and what to do if it doesn't work.

Occasionally an application may have custom installation steps, such as installing special system-wide dependencies; in those cases, you'll want to have a look at the install documentation of the application itself as well. However, most of the time it's safe to assume that the instructions below will work fine.

If the application you want to install is available in your distribution's repositories, then install it through there instead and skip this entire guide; your distribution's package manager will take care of all the dependencies.

Checklist

Before installing a Node.js application, check the following things:

  1. You're running a maintained version of Node.js. You can find a list of current maintained versions here. For minimal upgrade headaches, ensure that you're running an LTS version. If your system is running an unsupported version, you should install Node.js from the Nodesource repositories instead.
  2. Your version of Node.js is a standard one. In particular Debian and some Debian-based distributions have a habit of modifying the way Node.js works, leading to a lot of things breaking. Try running node --version - if that works, you're running a standard-enough version. If you can only do nodejs --version, you should install Node.js from the Nodesource repositories instead.
  3. You have build tools installed. In particular, you'll want to make sure that make, pkgconfig, GCC and Python exist on your system. If you don't have build tools or you're unsure, you'll want to install a package like build-essential (on Linux) or look here for further instructions (on other platforms, or unusual Linux distributions).
  4. npm works. Run npm --version to check this. If the npm command doesn't exist, your distribution is probably shipping a weird non-standard version of Node.js; use the Nodesource repositories instead. Do not install npm as a separate package, this will lead to headaches down the road.

No root/administrator access, no repositories exist for your distro, can't change your system-wide Node.js version, need a really specific Node.js version to make the application work, or have some other sort of edge case? Then nvm can be a useful solution, although keep in mind that it will not automatically update your Node.js installation.

How packages work in Node.js

Packages work a little differently in Node.js from most languages and distributions. In particular, dependencies are not installed system-wide. Every project has its own (nested) set of dependencies. This solves a lot of package management problems, but it can take a little getting used to if you're used to other systems.

In practice, this means that you should almost always do a regular npm install - that is, installing the dependencies locally into the project. The only time you need to do a 'global installation' (using npm install -g packagename) is when you're installing an application that is itself published on npm, and you want it to be available globally on your system.

This also means that you should not run npm as root by default. This is a really important thing to internalize, or you'll run into trouble down the line.

To recap:

  1. Dependencies of a project: always a regular, local npm install inside the project folder, never as root.
  2. Applications that are published on npm and that you want available system-wide: npm install -g packagename - with sudo only if your Node.js came from your distribution's package manager.

If you're curious about the details of packages in Node.js, here is a developer-focused article about them.

Installing an application from the npm registry

Is the application published on the npm registry, ie. does it have a page on npmjs.org? Great! That means that installation is a single command.

If you've installed Node.js through your distribution's package manager: sudo npm install -g packagename, where packagename is the name of the package on npm.

If you've installed Node.js through nvm or a similar tool: npm install -g packagename, where packagename is the name of the package on npm.

You'll notice that you need to run the command as root (eg. through sudo) when installing Node.js through your distribution's package manager, but not when installing it through nvm.

This is because by default, Node.js will use a system-wide folder for globally installed packages; but under nvm, your entire Node.js installation exists in a subdirectory of your unprivileged user's home directory - including the 'global packages' folder.
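
If you're ever unsure which of the two cases applies to you, a quick way to check is to ask npm where it puts globally installed packages:

npm config get prefix

If the path it prints is somewhere inside your home directory (eg. under ~/.nvm), you don't need sudo; if it points at a system-wide location like /usr or /usr/local, you do.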

After following these steps, some new binaries will probably be available for you to use system-wide. If the application's documentation doesn't tell you what binaries are available, then you should find its code repository, and look at the "bin" key in its package.json; that will contain a list of all the binaries it provides. Running them with --help will probably give you documentation.
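
For reference, such a "bin" entry might look something like this in the application's package.json (the names here are made up):

{
    "name": "some-application",
    "bin": {
        "some-application": "./bin/cli.js"
    }
}

In this case, installing the package globally would give you a some-application command on your system.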

You're done!

If you run into a problem: Scroll down to the 'troubleshooting' section.

Installing an application from a repository

Some applications are not published to the npm registry; instead, you're expected to install them from the code (eg. Git) repository. In those cases, start by looking at the application's install instructions to see if there are special requirements for cloning the repository, like eg. checking out submodules.

If there are no special instructions, then a simple git clone http://example.com/path/to/repository should work, replacing the URL with the cloning URL of the repository.

Making it available globally (like when installing from the npm registry)

Enter the cloned folder, and then - assuming the application doesn't document its own, different install command - run:
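
npm install -g .

(If you installed Node.js through your distribution's package manager, prefix the command with sudo, just like in the previous section. And note that this is the generic approach, not something specific to any one application - if the application's own documentation describes a different install command, use that instead.)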

You're done!

If you run into a problem: Scroll down to the 'troubleshooting' section.

Keeping it in the repository

Sometimes you don't really want to install the application onto your system; you just want to get it running locally from the repository.

In that case, enter the cloned folder, and run: npm install, with no other arguments.

You're done!

If you run into a problem: Scroll down to the 'troubleshooting' section.

Troubleshooting

Sometimes, things still won't work. In most cases it'll be a matter of missing some sort of undocumented external dependency, ie. a dependency that npm can't manage for you and that's typically provided by the OS. Sometimes it's a version compatibility issue. Occasionally applications are just outright broken.

When running into trouble with npm, try entering your installation output into this tool first. It's able to (fully automatically!) recognize the most common issues that people tend to run into with npm.

If the tool can't find your issue and it still doesn't work, then drop by the IRC channel (#Node.js on Libera, an online chat can be found here) and we'll be happy to help you get things going! You do need to register your username to talk in the channel; you can get help with this in the #libera channel.

Node.js

Getting started with Node.js

This article was originally published at https://gist.github.com/joepie91/95ed77b71790442b7e61. Some of the links in it still point to Gists that I have written; these will be moved over and relinked in due time.

Some of the suggestions on this page have become outdated, and better alternatives are available nowadays. However, the suggestions listed here should still work today as they did when this article was originally written. You do not need to update things to new approaches, and sometimes the newer approaches actually aren't better either, they can even be worse!

"How do I get started with Node?" is a commonly heard question in #Node.js. This gist is an attempt to compile some of the answers to that question. It's a perpetual work-in-progress.

And if this list didn't quite answer your questions, I'm available for tutoring and code review! A donation is also welcome :)

Setting expectations

Before you get started learning about JavaScript and Node.js, there's one very important article you need to read: Teach Yourself Programming in Ten Years.

Understand that it's going to take time to learn Node.js, just like it would take time to learn any other specialized topic - and that you're not going to learn effectively just by reading things, or following tutorials or courses. Get out there and build things! Experience is by far the most important part of learning, and shortcuts to this simply do not exist.

Avoid "bootcamps", courses, extensive books, and basically anything else that claims to teach you programming (or Node.js) in a single run. They all lie, and what they promise you simply isn't possible. That's also the reason this post is a list of resources, rather than a single book - they're references for when you need to learn about a certain topic at a certain point in time. Nothing more, nothing less.

There's also no such thing as a "definitive guide to Node.js", or a "perfect stack". Every project is going to have different requirements, that are best solved by different tools. There's no point in trying to learn everything upfront, because you can't know what you need to learn, until you actually need it.

In conclusion, the best way to get started with Node.js is to simply decide on a project you want to build, and start working on it. Start with the simplest possible implementation of it, and over time add bits and pieces to it, learning about those bits and pieces as you go. The links in this post will help you with that.

You'll find a table of contents for this page on your left.

Javascript refresher

Especially if you normally use a different language, or you only use Javascript occasionally, it's easy to misunderstand some of the aspects of the language.

The Node.js platform

Node.js is not a language. Rather, it's a "runtime" that lets you run Javascript without a browser. It comes with some basic additions such as a TCP library - or rather, in Node.js-speak, a "TCP module" - that you need to write server applications.

Setting up your environment

Functional programming

Javascript has part of its roots in functional programming languages, which means that you can use some of those concepts in your own projects. They can be greatly beneficial to the readability and maintainability of your code.

Module patterns

To build "configurable" modules, you can use a pattern known as "parametric modules". This gist shows an example of that. This is another example.

A commonly used pattern is the EventEmitter - this is exactly what it sounds like; an object that emits events. It's a very simple abstraction, but helps greatly in writing loosely coupled code. This gist illustrates the object, and the full documentation can be found here.
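
A very small example of an EventEmitter in action:

const EventEmitter = require("events");

let emitter = new EventEmitter();

// Anything with access to `emitter` can listen for events...
emitter.on("greeting", (name) => {
    console.log("Hello, " + name + "!");
});

// ... and anything with access to it can emit them.
emitter.emit("greeting", "world"); // prints "Hello, world!"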

Code architecture

The 'design' of your codebase matters a lot. Certain approaches for solving a problem work better than other approaches, and each approach has its own set of benefits and drawbacks. Picking the right approach is important - it will save you hours (or days!) of time down the line, when you are maintaining your code.

I'm still in the process of writing more about this, but so far, I've already written an article that explains the difference between monolithic and modular code and why it matters. You can read it here.

Express

If you want to build a website or web application, you'll probably find Express to be a good framework to start with. As a framework, it is very small. It only provides you with the basic necessities - everything else is a plugin.

If this sounds complicated, don't worry - things almost always work "out of the box". Simply follow the README for whichever "middleware" (Express plugin) you want to add.

To get started with Express, simply follow the below articles. Whatever you do, don't use the Express Generator - it generates confusing and bloated code. Just start from scratch and follow the guides!

To get a better handle on how to render pages server-side with Express:

Some more odds and ends regarding Express:

Some examples:

Combining Express and Promises:

Some common Express middleware that you might want to use:

Coming from other languages or platforms

Security

Note that this advice isn't necessarily complete. It answers some of the most common questions, but your project might have special requirements or caveats. When in doubt, you can always ask in the #Node.js channel!

Also, keep in mind the golden rule of security: humans suck at repetition, regardless of their level of competence. If a mistake can be made, then it will be made. Design your systems such that they are hard to use incorrectly.

Useful modules:

This is an incomplete list, and I'll probably be adding stuff to it in the future.

Deployment

Distribution

Scalability

Scalability is a result of your application architecture, not the technologies you pick. Be wary of anything that claims to be "scalable" - it's much more important to write loosely coupled code with small components, so that you can split out responsibilities across multiple processes and servers.

Troubleshooting

Is something not working properly? Here are some resources that might help:

Optimization

The first rule of optimization is: do not optimize.

The correct order of concerns is security first, then maintainability/readability, and then performance. Optimizing performance is something that you shouldn't care about until you have hard metrics showing you that it is needed. If you can't show a performance problem in numbers, it doesn't exist; and while it is easy to optimize readable code, it's much harder to make optimized code more readable.

There is one exception to this rule: never use any methods that end with Sync - these are blocking, synchronous methods, and will block your event loop (ie. your entire application) until they have completed. They may look convenient, but they are not worth the performance penalty.
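
To make that concrete, here's the difference for reading a file (the filename is just a placeholder):

const fs = require("fs");

// Blocks the entire process until the file has been read - avoid this:
// let contents = fs.readFileSync("./some-file.txt", "utf8");

// Non-blocking version; the rest of your application keeps handling
// requests while the file is being read:
fs.readFile("./some-file.txt", "utf8", (err, contents) => {
    if (err) throw err;

    console.log(contents);
});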

Now let's say that you are having performance issues. Here are some articles and videos to learn more about how optimization and profiling works in Node.js / V8 - they are going to be fairly in-depth, so you may want to hold off on reading these until you've gotten some practice with Node.js:

If you're seeing memory leaks, then these may be helpful articles to read:

These are some modules that you may find useful for profiling your application:

Writing C++ addons

You'll usually want to avoid this - C++ is not a memory-safe language, so it's much safer to just write your code in Javascript. V8 is rather well-optimized, so in most cases, performance isn't a problem either. That said, sometimes - eg. when writing bindings to something else - you just have to write a native module.

These are some resources on that:

Writing Rust addons

Neon is a new project that lets you write memory-safe compiled extensions for Node.js, using Rust. It's still pretty new, but quite promising - an introduction can be found here.

Odds and ends

Some miscellaneous code snippets and examples, that I haven't written a section or article for yet.

Future additions to this list

There are a few things that I'm currently working on documenting, that will be added to this list in the future. I write new documentation as I find the time to do so.

Node.js

Node.js for PHP developers

This article was originally published at https://gist.github.com/joepie91/87c5b93a5facb4f99d7b2a65f08363db. It has not been finished yet, but still contains some useful pointers.

Learning a second language

If PHP was your first language, and this is the first time you're looking to learn another language, you may be tempted to try and "make it work like it worked in PHP". While understandable, this is a really bad idea. Different languages have fundamentally different designs, with different best practices, different syntax, and so on. The result of this is that different languages are also better for different usecases.

By trying to make one language work like the other, you get the worst of both worlds - you lose the benefits that made language one good for your usecase, and add the design flaws of language two. You should always aim to learn a language properly, including how it is commonly or optimally used. Your code is going to look and feel considerably different, and that's okay!

Over time, you will gain a better understanding of how different language designs carry different tradeoffs, and you'll be able to get the best of both worlds. This will take time, however, and you should always start by learning and using each language as it is first, to gain a full understanding of it.

One thing I explicitly recommend against, is CGI-Node - you should never, ever, ever use this. It makes a lot of grandiose claims, but it actually just reimplements some of the worst and most insecure parts of PHP in Node.js. It is also completely unnecessary - the sections below will go into more detail.

Execution model

The "execution model" of a language describes how your code is executed. In the case of a web-based application, it decides how your server goes from "a HTTP request is coming in", to "the application code is executed", to "a response has been sent".

PHP uses what we'll call the "CGI model" to run your code - for every HTTP request that comes in, the webserver (usually Apache or nginx) will look in your "document root" for a .php file with the same path and filename, and then execute that file. This means that for every new request, it effectively starts a new PHP process, with a "clean slate" as far as application state is concerned. Other than $_SESSION variables, all the variables in your PHP script are thrown away after a response is sent.

This "CGI model" is a somewhat unique execution model, and only a few technologies use it - PHP, ASP and ColdFusion are the most well-known. It's also a very fragile and limited model, that makes it easy to introduce security issues; for example, "uploading a shell" is something that's only possible because of the CGI model.

Node.js, however, uses a different model: the "long-running process" model. In this model, your code is not executed by a webserver - rather, your code is the webserver. Your application is only started once, and once it has started, it will be handling an essentially infinite amount of requests, potentially hundreds or thousands at the same time. Almost every other language uses this same model.
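
To make that concrete: the smallest possible version of "your code is the webserver" looks something like this, using only the built-in http module:

const http = require("http");

// This process starts once, keeps running, and handles every
// incoming request with the same callback.
http.createServer((req, res) => {
    res.end("Hello world!");
}).listen(3000);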

This also means that your application state continues to exist after a response has been sent, and this makes a lot of projects much easier to implement, because you don't need to constantly store every little thing in a database; instead, you only need to store things in your database that you actually intend to store for a long time.

Some of the advantages of the "long-running process" model (as compared to the "CGI model"):

The reason attackers cannot upload a shell, is that there is no direct mapping between a URL and a location on your filesystem. Your application is explicitly designed to only execute specific files that are a part of your application. When you try to access a .js file that somebody uploaded, it will just send the .js file; it won't be executed.

There aren't really any disadvantages - while you do have to have a Node.js process running at all times, it can be managed in the same way as any other webserver. You can also use another webserver in front of it; for example, if you want to host multiple domains on a single server.

Hosting

Node.js applications will not run in most shared hosting environments, as they are designed to only run PHP. While there are some 'managed hosting' environments like Heroku that claim to work similarly, they are usually rather expensive and not really worth the money.

When deploying a Node.js project in production, you will most likely want to host it on a VPS or a dedicated server. These are full-blown Linux systems that you have full control over, so you can run any application or database that you want. The cheapest option here is to go with an "unmanaged provider".

Unmanaged providers are providers whose responsibility ends at the server and the network - they make sure that the system is up and running, and from that point on it's your responsibility to manage your applications. Because they do not provide support for your projects, they are a lot cheaper than "managed providers".

My usual recommendations for unmanaged providers are (in no particular order): RamNode, Afterburst, SecureDragon, Hostigation and RAM Host. Another popular choice is DigitalOcean - but while their service is stable and sufficient for most people, I personally don't find the performance/resources/price ratio to be good enough. I've also heard good things about Linode, but I don't personally use them - they do, however, apparently provide limited support for your server management.

As explained in the previous section, your application is the webserver. However, there are some reasons you might still want to run a "generic" webserver in front of your application:

My recommendation for this is Caddy. While nginx is a popular and often-recommended option, it's considerably harder to set up than Caddy, especially for TLS.

Frameworks

(this section is a work in progress, these are just some notes left for myself)

Templating

If you've already used a templater like Smarty in PHP, here's the short version: use either Pug or Nunjucks, depending on your preference. Both auto-escape values by default, but I strongly recommend Pug - it understands the actual structure of your template, which gives you more flexibility.

If you've been using include() or require() in PHP along with inline <?php echo($foobar); ?> statements, here's the long version:

The "using-PHP-as-a-templater" approach is quite flawed - it makes it very easy to introduce security issues such as XSS by accidentally forgetting to escape something. I won't go into detail here, but suffice to say that this is a serious risk, regardless of how competent you are as a developer. Instead, you should be using a templater that auto-escapes values by default, unless you explicitly tell it not to. Pug and Nunjucks are two options in Node.js that do precisely that, and both will work with Express out of the box.

Node.js

Rendering pages server-side with Express (and Pug)

This article was originally published at https://gist.github.com/joepie91/c0069ab0e0da40cc7b54b8c2203befe1.

Terminology

Pug is an example of a HTML templater. Nunjucks is an example of a string-based templater. React could technically be considered a HTML templater, although it's not really designed to be used primarily server-side.

View engine setup

Assuming you'll be using Pug, this is simply a matter of installing Pug...

npm install --save pug

... and then configuring Express to use it:

let app = express();

app.set("view engine", "pug");

/* ... rest of the application goes here ... */

You won't need to require() Pug anywhere, Express will do this internally.

You'll likely want to explicitly set the directory where your templates will be stored, as well:

let app = express();

app.set("view engine", "pug");
app.set("views", path.join(__dirname, "views"));

/* ... rest of the application goes here ... */

This will make Express look for your templates in the "views" directory, relative to the file in which you specified the above line.

Rendering a page

homepage.pug:

html
    body
        h1 Hello World!
        p Nothing to see here.

app.js:

router.get("/", (req, res) => {
    res.render("homepage");
});

Express will automatically add the view engine's extension to the template name. That means that - with our Express configuration - the "homepage" template name in the above example will point at views/homepage.pug.

Rendering a page with locals

homepage.pug:

html
    body
        h1 Hello World!
        p Hi there, #{user.username}!

app.js:

router.get("/", (req, res) => {
    res.render("homepage", {
        user: req.user
    });
});

In this example, the #{user.username} bit is an example of string interpolation. The "locals" are just an object containing values that the template can use. Since every expression in Pug is written in JavaScript, you can pass any kind of valid JS value into the locals, including functions (that you can call from the template).

For example, we could do the following as well - although there's no good reason to do this, so this is for illustrative purposes only:

homepage.pug:

html
    body
        h1 Hello World!
        p Hi there, #{getUsername()}!

app.js:

router.get("/", (req, res) => {
    res.render("homepage", {
        getUsername: function() {
            return req.user;
        }
    });
});

Using conditionals

homepage.pug:

html
    body
        h1 Hello World!

        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!

app.js:

router.get("/", (req, res) => {
    res.render("homepage", {
        user: req.user
    });
});

Again, the expression in the conditional is just a JS expression. All defined locals are accessible and usable as before.

Using loops

homepage.pug:

html
    body
        h1 Hello World!

        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!

        p Have some vegetables:

        ul
            for vegetable in vegetables
                li= vegetable

app.js:

router.get("/", (req, res) => {
    res.render("homepage", {
        user: req.user,
        vegetables: [
            "carrot",
            "potato",
            "beet"
        ]
    });
});

Note that this...

li= vegetable

... is just shorthand for this:

li #{vegetable}

By default, the contents of a tag are assumed to be a string, optionally with interpolation in one or more places. By suffixing the tag name with =, you indicate that the contents of that tag should be a JavaScript expression instead.

That expression may just be a variable name as well, but it doesn't have to be - any JS expression is valid. For example, this is completely okay:

li= "foo" + "bar"

And this is completely valid as well, as long as the randomVegetable method is defined in the locals:

li= randomVegetable()

Request-wide locals

Sometimes, you want to make a variable available in every res.render for a request, no matter what route or middleware the page is being rendered from. A typical example is the user object for the current user. This can be accomplished by setting it as a property on the res.locals object.

homepage.pug:

html
    body
        h1 Hello World!

        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!

        p Have some vegetables:

        ul
            for vegetable in vegetables
                li= vegetable

app.js:

app.use((req, res, next) => {
    res.locals.user = req.user;
    next();
});

/* ... more code goes here ... */

router.get("/", (req, res) => {
    res.render("homepage", {
        vegetables: [
            "carrot",
            "potato",
            "beet"
        ]
    });
});

Application-wide locals

Sometimes, a value even needs to be application-wide - a typical example would be the site name for a self-hosted application, or other application configuration that doesn't change for each request. This works similarly to res.locals, only now you set it on app.locals.

homepage.pug:

html
    body
        h1 Hello World, this is #{siteName}!

        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!

        p Have some vegetables:

        ul
            for vegetable in vegetables
                li= vegetable

app.js:

app.locals.siteName = "Vegetable World";

/* ... more code goes here ... */

app.use((req, res, next) => {
    res.locals.user = req.user;
    next();
});

/* ... more code goes here ... */

router.get("/", (req, res) => {
    res.render("homepage", {
        vegetables: [
            "carrot",
            "potato",
            "beet"
        ]
    });
});

The order of specificity is as follows: app.locals are overwritten by res.locals of the same name, and res.locals are overwritten by res.render locals of the same name.

In other words: if we did something like this...

router.get("/", (req, res) => {
    res.render("homepage", {
        siteName: "Totally Not Vegetable World",
        vegetables: [
            "carrot",
            "potato",
            "beet"
        ]
    });
});

... then the homepage would show "Totally Not Vegetable World" as the website name, while every other page on the site still shows "Vegetable World".

Rendering a page after asynchronous operations

homepage.pug:

html
    body
        h1 Hello World, this is #{siteName}!

        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!

        p Have some vegetables:

        ul
            for vegetable in vegetables
                li= vegetable

app.js:

app.locals.siteName = "Vegetable World";

/* ... more code goes here ... */

app.use((req, res, next) => {
    res.locals.user = req.user;
    next();
});

/* ... more code goes here ... */

router.get("/", (req, res) => {
    return Promise.try(() => {
        return db("vegetables").limit(3);
    }).map((row) => {
        return row.name;
    }).then((vegetables) => {
        res.render("homepage", {
            vegetables: vegetables
        });
    });
});

Basically the same as when you use res.send, only now you're using res.render.

Template inheritance in Pug

It would be very impractical if you had to define the entire site layout in every individual template - not only that, but the duplication would also result in bugs over time. To solve this problem, Pug (and most other templaters) support template inheritance. An example is below.

layout.pug:

html
    body
        h1 Hello World, this is #{siteName}!

        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!

        block content
            p This page doesn't have any content yet.

homepage.pug:

extends layout

block content
    p Have some vegetables:

    ul
        for vegetable in vegetables
            li= vegetable

app.js:

app.locals.siteName = "Vegetable World";

/* ... more code goes here ... */

app.use((req, res, next) => {
    res.locals.user = req.user;
    next();
});

/* ... more code goes here ... */

router.get("/", (req, res) => {
    return Promise.try(() => {
        return db("vegetables").limit(3);
    }).map((row) => {
        return row.name;
    }).then((vegetables) => {
        res.render("homepage", {
            vegetables: vegetables
        });
    });
});

That's basically all there is to it. You define a block in the base template - optionally with default content, as we've done here - and then each template that "extends" (inherits from) that base template can override such blocks. Note that you never render layout.pug directly - you still render the page layouts themselves, and they just inherit from the base template.

Things of note:

Static files

You'll probably also want to serve static files on your site, whether they are CSS files, images, downloads, or anything else. By default, Express ships with express.static, which does this for you.

All you need to do, is to tell Express where to look for static files. You'll usually want to put express.static at the very start of your middleware definitions, so that no time is wasted on eg. initializing sessions when a request for a static file comes in.

let app = express();

app.set("view engine", "pug");
app.set("views", path.join(__dirname, "views"));

app.use(express.static(path.join(__dirname, "public")));

/* ... rest of the application goes here ... */

Your directory structure might look like this:

your-project
|- node_modules ...
|- public
|  |- style.css
|  `- logo.png
|- views
|  |- homepage.pug
|  `- layout.pug
`- app.js

In the above example, express.static will look in the public directory for static files, relative to the app.js file. For example, if you tried to access https://your-project.com/style.css, it would send the user the contents of your-project/public/style.css.

You can optionally also specify a prefix for static files, just like for any other Express middleware:

let app = express();

app.set("view engine", "pug");
app.set("views", path.join(__dirname, "views"));

app.use("/static", express.static(path.join(__dirname, "public")));

/* ... rest of the application goes here ... */

Now, that same your-project/public/style.css can be accessed through https://your-project.com/static/style.css instead.

An example of using it in your layout.pug:

html
    head
        link(rel="stylesheet", href="/static/style.css")
    body
        h1 Hello World, this is #{siteName}!

        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!

        block content
            p This page doesn't have any content yet.

The slash at the start of /static/style.css is important - it tells the browser to ask for it relative to the domain, as opposed to relative to the page URL.

An example of URL resolution without a leading slash: if the browser is currently on https://your-project.com/some/page and the page references static/style.css, the browser will request https://your-project.com/some/static/style.css - probably not what you intended.

An example of URL resolution with a leading slash: that same page referencing /static/style.css will make the browser request https://your-project.com/static/style.css, no matter which page it's on.

That's it! You do the same thing to embed images, scripts, link to downloads, and so on.

Node.js

Running a Node.js application using nvm as a systemd service

This article was originally published at https://gist.github.com/joepie91/73ce30dd258296bd24af23e9c5f761aa.

Hi there! Since this post was originally written, nvm has gained some new tools, and some people have suggested alternative (and potentially better) approaches for modern systems. Make sure to have a look at the comments on the original Gist before following this guide!

Trickier than it seems.

1. Set up nvm

Let's assume that you've already created an unprivileged user named myapp. You should never run your Node.js applications as root!

Switch to the myapp user, and do the following:

  1. curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.31.0/install.sh | bash (however, this will immediately run the nvm installer - you probably want to just download the install.sh manually, and inspect it before running it)
  2. Install the latest stable Node.js version: nvm install stable

2. Prepare your application

Your package.json must specify a start script, that describes what to execute for your application. For example:

...
"scripts": {
    "start": "node app.js"
},
...

3. Service file

Save this as /etc/systemd/system/my-application.service:

[Unit]
Description=My Application

[Service]
EnvironmentFile=-/etc/default/my-application
ExecStart=/home/myapp/start.sh
WorkingDirectory=/home/myapp/my-application-directory
LimitNOFILE=4096
IgnoreSIGPIPE=false
KillMode=process
User=myapp

[Install]
WantedBy=multi-user.target

You'll want to change the User, Description and ExecStart/WorkingDirectory paths to reflect your application setup.

4. Startup script

Next, save this as /home/myapp/start.sh (adjusting the username in both the path and the script if necessary):

#!/bin/bash
. /home/myapp/.nvm/nvm.sh
npm start

This script is necessary, because we can't load nvm via the service file directly.

Make sure to make it executable:

chmod +x /home/myapp/start.sh

5. Enable and start your service

Replace my-application with whatever you've named your service file after, running the following as root:

  1. systemctl enable my-application
  2. systemctl start my-application

To verify whether your application started successfully (don't forget to npm install your dependencies!), run:

systemctl status my-application

... which will show you the last few lines of its output, whether it's currently running, and any errors that might have occurred.

Done!

Node.js

Persistent state in Node.js

This article was originally published at https://gist.github.com/joepie91/bf0813626e6568e8633b.

This is an extremely simple example of how you have 'persistent state' when writing an application in Node.js. The i variable is shared across all requests, so every time the /increment route is accessed, the number is incremented and returned.

This may seem obvious, but it works quite differently from eg. PHP, where each HTTP request is effectively a 'clean slate', and you don't have persistent state. Were this written in PHP, then every request would have returned 1, rather than an incrementing number.

var i = 0;

// [...]

app.get("/increment", function(req, res) {
      i += 1;
      res.send("Current number: " + i);
})

// [...]
Node.js

node-gyp requirements

This article was originally published at https://gist.github.com/joepie91/375f6d9b415213cf4394b5ba3ae266ae. It may no longer be applicable.

Linux

Windows

OS X

Node.js

Introduction to sessions

This article was originally published at https://gist.github.com/joepie91/cf5fd6481a31477b12dc33af453f9a1d.

While a lot of Node.js guides recommend using JWT as an alternative to session cookies (sometimes even mistakenly calling it "more secure than cookies"), this is a terrible idea. JWTs are absolutely not a secure way to deal with user authentication/sessions, and this article goes into more detail about that.

Secure user authentication requires the use of session cookies.

Cookies are small key/value pairs that are usually sent by a server, and stored on the client (often a browser). The client then sends this key/value pair back with every request, in a HTTP header. This way, unique clients can be identified between requests, and client-side settings can be stored and used by the server.

Session cookies are cookies containing a unique session ID that is generated by the server. This session ID is used by the server to identify the client whenever it makes a request, and to associate session data with that request.

Session data is arbitrary data that is stored on the server side, and that is associated with a session ID. The client can't see or modify this data, but the server can use the session ID from a request to associate session data with that request.

Altogether, this allows for the server to store arbitrary data for a session (that the user can't see or touch!), that it can use on every subsequent request in that session. This is how a website remembers that you've logged in.

Step-by-step, the process goes something like this:

  1. Client requests login page.
  2. Server sends login page HTML.
  3. Client fills in the login form, and submits it.
  4. Server receives the data from the login form, and verifies that the username and password are correct.
  5. Server creates a new session in the database, containing the ID of the user in the database, and generates a unique session ID for it (which is not the same as the user ID!)
  6. Server sends the session ID to the user as a cookie header, alongside a "welcome" page.
  7. Client receives the session ID, and saves it locally as a cookie.
  8. Client displays the "welcome" page that the cookie came with.
  9. User clicks a link on the welcome page, navigating to his "notifications" page.
  10. Client retrieves the session cookie from storage.
  11. Client requests the notifications page, sending along the session cookie (containing the session ID).
  12. Server receives the request.
  13. Server looks at the session cookie, and extracts the session ID.
  14. Server retrieves the session data from the database, for the session ID that it received.
  15. Server associates the session data (containing the user ID) with the request, and passes it on to something that handles the request.
  16. Server request handler receives the request (containing the session data including user ID), and sends a personalized notifications page for the user with that ID.
  17. Client receives the personalized notifications page, and displays it.
  18. User clicks another link, and we go back to step 10.

Configuring sessions

Thankfully, you won't have to implement all this yourself - most of it is done for you by existing session implementations. If you're using Express, that implementation would be express-session.

The express-session module doesn't implement the actual session storage itself, it only handles the Express-related bits - for example, it ensures that req.session is automatically loaded from and saved to the session store for each request.

For the storage of session data, you need to specify a "session store" that's specific to the database you want to use for your session data - and when using Knex, connect-session-knex is the best option for that.

While full documentation is available in the express-session repository, this is what your express-session initialization might look like when you're using a relational database like PostgreSQL (through Knex):

const express = require("express");
const knex = require("knex");
const expressSession = require("express-session");
const KnexSessionStore = require("connect-session-knex")(expressSession);

const config = require("./config.json");

/* ... other code ... */

/* You will probably already have a line that looks something like the below.
 * You won't have to create a new Knex instance for dealing with sessions - you
 * can just use the one you already have, and the Knex initialization here is
 * purely for illustrative purposes. */
let db = knex(require("./knexfile"));

let app = express();

/* ... other app initialization code ... */

app.use(expressSession({
    secret: config.sessions.secret,
    resave: false,
    saveUninitialized: false,
    store: new KnexSessionStore({
        knex: db
    })
}));

/* ... rest of the application goes here ... */

The configuration example in more detail

require("connect-session-knex")(expressSession)

The connect-session-knex module needs access to the express-session library, so instead of exporting the session store constructor directly, it exports a wrapper function. We call that wrapper function immediately after requiring the module, passing in the express-session module, and we get back a session store constructor.

app.use(expressSession({
    secret: config.sessions.secret,
    resave: false,
    saveUninitialized: false,
    store: new KnexSessionStore({
        knex: db
    })
}));

This is where we 1) create a new express-session middleware, and 2) app.use it, so that it processes every request, attaching session data where needed.

secret: config.sessions.secret,

Every application should have a "secret" for sessions - essentially a secret key that will be used to cryptographically sign the session cookie, so that the user can't tamper with it. This should be a random value, and it should be stored in a configuration file. You should not store this value (or any other secret values) in the source code directly.

On Linux and OS X, a quick way to generate a securely random key is the following command: cat /dev/urandom | env LC_CTYPE=C tr -dc _A-Za-z0-9 | head -c${1:-64}
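
The config.json referenced in the example above might then look something like this, with the placeholder replaced by the value you generated:

{
    "sessions": {
        "secret": "<your randomly generated value goes here>"
    }
}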

resave: false,

When resave is set to true, express-session will always save the session data after every request, regardless of whether the session data was modified. This can cause race conditions, and therefore you usually don't want to do this, but with some session stores it's necessary as they don't let you reset the "expiry timer" without saving all the session data again.

connect-session-knex doesn't have this problem, and so you should set it to false, which is the safer option. If you intend to use a different session store, you should consult the express-session documentation for more details about this option.

saveUninitialized: false,

If the user doesn't have a session yet, a brand new req.session object is created for them on their first request. This setting determines whether that session should be saved to the database, even if no session data was stored into it. Setting it to false makes it so that the session is only saved if it's actually used for something, and that's the setting you want here.

store: new KnexSessionStore({
    knex: db
})

This tells express-session where to store the actual session data. In the case of connect-session-knex (which is where KnexSessionStore comes from), we need to pass in an existing Knex instance, which it will then use for interacting with the sessions table. Other options can be found in the connect-session-knex documentation.

Using sessions

The usage of sessions is quite simple - you simply set properties on req.session, and you can then access those properties from other requests within the same session. For example, this is what a login route might look like (assuming you're using Knex, scrypt-for-humans, and a custom AuthenticationError created with create-error):

router.post("/login", (req, res) => {
    return Promise.try(() => {
        return db("users").where({
            username: req.body.username
        });
    }).then((users) => {
        if (users.length === 0) {
            throw new AuthenticationError("No such username exists");
        } else {
            let user = users[0];

            return Promise.try(() => {
                return scryptForHumans.verifyHash(req.body.password, user.hash);
            }).then(() => {
                /* Password was correct */
                req.session.userId = user.id;
                res.redirect("/dashboard");
            }).catch(scryptForHumans.PasswordError, (err) => {
                throw new AuthenticationError("Invalid password");
            });
        }
    });
});

And your /dashboard route might look like this:

router.get("/dashboard", (req, res) => {
    return Promise.try(() => {
        if (req.session.userId == null) {
            /* User is not logged in */
            res.redirect("/login");
        } else {
            return Promise.try(() => {
                return db("users").where({
                    id: req.session.userId
                });
            }).then((users) => {
                if (users.length === 0) {
                    /* User no longer exists */
                    req.session.destroy();
                    res.redirect("/login");
                } else {
                    res.render("dashboard", {
                        user: users[0]
                    });
                }
            });
        }
    });
});

In this example, req.session.destroy() will - like the name suggests - destroy the session, essentially returning the user to a session-less state. In practice, this means they get "logged out".
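
For example, a logout route might look something like this (a sketch; the route and redirect paths are just examples):

router.post("/logout", (req, res, next) => {
    /* Destroying the session effectively logs the user out. */
    req.session.destroy((err) => {
        if (err != null) {
            next(err);
        } else {
            res.redirect("/login");
        }
    });
});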

Now, if you had to do all that logic for every route that requires the user to be logged in, it would get rather unwieldy. So let's move it out into some middleware:

function requireLogin(req, res, next) {
    return Promise.try(() => {
        if (req.session.userId == null) {
            /* User is not logged in */
            res.redirect("/login");
        } else {
            return Promise.try(() => {
                return db("users").where({
                    id: req.session.userId
                });
            }).then((users) => {
                if (users.length === 0) {
                    /* User no longer exists */
                    req.session.destroy();
                    res.redirect("/login");
                } else {
                    req.user = users[0];
                    next();
                }
            });
        }
    });
}

router.get("/dashboard", requireLogin, (req, res) => {
    res.render("dashboard", {
        user: req.user
    });
});

Note the following:

Node.js

Secure random values

This article was originally published at https://gist.github.com/joepie91/7105003c3b26e65efcea63f3db82dfba.

Not all random values are created equal - for security-related code, you need a specific kind of random value.

A summary of this article, if you don't want to read the entire thing:

You should seriously consider reading the entire article, though - it's not that long :)

Types of "random"

There exist roughly three types of "random":

Irregular data is fast to generate, but utterly worthless for security purposes - even if it doesn't seem like there's a pattern, there is almost always a way for an attacker to predict what the values are going to be. The only realistic usecase for irregular data is things that are represented visually, such as game elements or randomly generated phrases on a joke site.

Unpredictable data is a bit slower to generate, but still fast enough for most cases, and it's sufficiently hard to guess that it will be attacker-resistant. Unpredictable data is provided by what's called a CSPRNG.

Types of RNGs (Random Number Generators)

Every random value that you need for security-related purposes (ie. anything where there exists the possibility of an "attacker"), should be generated using a CSPRNG. This includes verification tokens, reset tokens, lottery numbers, API keys, generated passwords, encryption keys, and so on, and so on.

Bias

In Node.js, the most widely available CSPRNG is the crypto.randomBytes function, but you shouldn't use this directly, as it's easy to mess up and "bias" your random values - that is, making it more likely that a specific value or set of values is picked.

A common example of this mistake is using the % modulo operator when you have less than 256 possibilities (since a single byte has 256 possible values). Doing so actually makes lower values more likely to be picked than higher values.

For example, let's say that you have 36 possible random values - 0-9 plus every lowercase letter in a-z. A naive implementation might look something like this:

let randomCharacter = randomByte % 36;

That code is broken and insecure. With the code above, you essentially create the following ranges (all inclusive):

  • randomByte between 0 and 35: randomCharacter between 0 and 35
  • randomByte between 36 and 71: randomCharacter between 0 and 35
  • randomByte between 72 and 107: randomCharacter between 0 and 35
  • randomByte between 108 and 143: randomCharacter between 0 and 35
  • randomByte between 144 and 179: randomCharacter between 0 and 35
  • randomByte between 180 and 215: randomCharacter between 0 and 35
  • randomByte between 216 and 251: randomCharacter between 0 and 35
  • randomByte between 252 and 255: randomCharacter between 0 and 3

If you look at the above list of ranges, you'll notice that each randomCharacter between 4 and 35 (inclusive) can be produced by 7 possible byte values, while each randomCharacter between 0 and 3 (inclusive) can be produced by 8 possible byte values. This means that while there's a 2.73% (7 in 256) chance of getting any particular value between 4 and 35 (inclusive), there's a 3.13% (8 in 256) chance of getting any particular value between 0 and 3 (inclusive).

This kind of difference may look small, but it's an easy and effective way for an attacker to reduce the amount of guesses they need when bruteforcing something. And this is only one way in which you can make your random values insecure, despite them originally coming from a secure random source.
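
You can verify the bias yourself by counting how many of the 256 possible byte values map onto each result; a quick sketch:

let counts = new Array(36).fill(0);

/* Tally up which randomCharacter each possible byte value would produce. */
for (let byte = 0; byte < 256; byte++) {
    counts[byte % 36] += 1;
}

console.log(counts.slice(0, 4)); /* [ 8, 8, 8, 8 ] - values 0-3 can each be produced by 8 byte values */
console.log(counts.slice(4, 8)); /* [ 7, 7, 7, 7 ] - values 4-35 can each be produced by only 7 byte values */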

So, how do I obtain random values securely?

In Node.js:

Both of these use a CSPRNG, and 'transform' the bytes in an unbiased (ie. secure) way.
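
As another example, modern versions of Node.js also ship a built-in crypto.randomInt, which picks an integer in a given range from a CSPRNG without introducing bias; a minimal sketch:

const crypto = require("crypto");

let alphabet = "0123456789abcdefghijklmnopqrstuvwxyz";

/* Picks an index in the range [0, 36) without modulo bias. */
let index = crypto.randomInt(alphabet.length);

console.log(alphabet[index]);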

In the browser:

However, it is strongly recommended that you use a bundler, in general.
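
For reference, browsers expose a CSPRNG as crypto.getRandomValues; keep in mind that the bias caveat above still applies if you map the resulting bytes onto a smaller range yourself. A minimal sketch:

/* Fills a typed array with cryptographically secure random bytes. */
let bytes = new Uint8Array(16);
crypto.getRandomValues(bytes);

console.log(bytes);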

Node.js

Checking file existence asynchronously

This article was originally published at https://gist.github.com/joepie91/bbf495e044da043de2ba.

Checking whether a file exists before doing something with it, can lead to race conditions in your application. Race conditions are extremely hard to debug and, depending on where they occur, they can lead to data loss or security holes. Using the synchronous versions will not fix this.

Generally, just do what you want to do, and handle the error if it doesn't work. This is much safer.
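
For example, a sketch of that approach for reading a file (the filename is just an example):

const fs = require("fs");

/* Instead of checking whether the file exists first, just try to read it and
 * handle the failure - this avoids the race condition entirely. */
fs.readFile("./some-file.txt", "utf8", (err, contents) => {
    if (err != null && err.code === "ENOENT") {
        console.log("The file doesn't exist");
    } else if (err != null) {
        throw err;
    } else {
        console.log("File contents:", contents);
    }
});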

If you're really, really sure that you need to use fs.exists or fs.stat, then you can use the example code below to do so asynchronously. If you just want to know how to promisify an asynchronous callback that doesn't follow the nodeback convention, then you can look at the example below as well.

You should almost never actually use the code below. The same applies to fs.stat (when used for checking existence). Make sure you have read the text above first!

const fs = require("fs");
const Promise = require("bluebird");

function existsAsync(path) {
  return new Promise(function(resolve, reject) {
    /* fs.exists doesn't follow the nodeback convention - its callback only
     * receives a boolean, never an error - so we can't use Promise.promisify
     * and have to wrap it manually like this instead. */
    fs.exists(path, function(exists) {
      resolve(exists);
    });
  });
}
Node.js

Fixing "Buffer without new" deprecation warnings

This article was originally published at https://gist.github.com/joepie91/a0848a06b4733d8c95c95236d16765aa. Newer Node.js versions no longer behave in this exact way, but the information is kept here for posterity. If you have code that still uses new Buffer, you should still update it.

If you're using Node.js, you might run into a warning like this:

DeprecationWarning: Using Buffer without `new` will soon stop working.

The reason for this warning is that the Buffer creation API was changed to require the use of new. However, contrary to what the warning says, you should not use new Buffer either, for security reasons. Any usage of it must be converted as soon as possible to Buffer.from, Buffer.alloc, or Buffer.allocUnsafe, depending on what it's being used for. Not changing it could mean a security vulnerability in your code.
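
For reference, the typical replacements look something like this (the values are purely illustrative):

/* Deprecated and potentially unsafe:
 *   new Buffer("deadbeef", "hex")  - a Buffer from existing data
 *   new Buffer(64)                 - a new 64-byte Buffer (uninitialized memory!)
 * Replacements: */
let fromData = Buffer.from("deadbeef", "hex"); /* a Buffer from existing data */
let allocated = Buffer.alloc(64); /* a new, zero-filled 64-byte Buffer */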

Where is it coming from?

Unfortunately, the warning doesn't indicate where the issue comes from. If you've verified that your own code doesn't use Buffer without new anymore, but you're still getting the warning, then you are probably using an (outdated) dependency that still uses the old API.

The following command (for Linux and Cygwin) will list all the affected modules:

grep -rP '(?<!new |[a-zA-Z])Buffer\(' node_modules | grep "\.js" | grep -Eo '^(node_modules/[^/:]+/)*' | sort | uniq -c | sort -h

If you're on OS X, your sort tool will not have the -h flag. Therefore, you'll want to run this instead (but the result won't be sorted by frequency):

grep -rP '(?<!new |[a-zA-Z])Buffer\(' node_modules | grep "\.js" | grep -Eo '^(node_modules/[^/:]+/)*' | sort | uniq -c | sort

How do I fix it?

If the issue is in your own code, this documentation will explain how to migrate. If you're targeting older Node.js versions, you may want to use the safe-buffer shim to maintain compatibility.
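
For example, a sketch of what using the safe-buffer shim might look like; it exposes the same Buffer.from / Buffer.alloc API as modern Node.js versions:

const { Buffer } = require("safe-buffer");

/* Same API as the built-in Buffer methods, but also works on old Node.js versions. */
let fromData = Buffer.from("deadbeef", "hex");
let allocated = Buffer.alloc(64);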

If the issue is in a third-party library:

  1. Run npm ls <package name here> to determine where in your dependency tree it is installed, and look at the top-most dependency (that isn't your project itself) that it originates from.
  2. If that top-most dependency is out of date, try updating the dependency first, to see if the warning goes away.
  3. If the dependency is up-to-date, that means it's an unfixed issue in the dependency. You should create an issue ticket (or, even better, a pull request) on the dependency's repository, asking for it to be fixed.
Node.js

Why you shouldn't use Sails.js

This article was originally published at https://gist.github.com/joepie91/cc8b0c9723cc2164660e.

This article was published in 2015. Since then, the situation may have changed, and this article is kept for posterity. You should verify whether the issues still apply when making a decision.

A large list of reasons why to avoid Sails.js and Waterline: https://kev.inburke.com/kevin/dont-use-sails-or-waterline/

Furthermore, the CEO of Balderdash, the company behind Sails.js, stated the following in response to the suggestion that they promise to push security fixes within 60 days:

"we promise to push a fix within 60 days",

@kevinburkeshyp This would amount to a Service Level Agreement with the entire world; this is generally not possible, and does not exist in any software project that I know of.

Upon notifying him in the thread that I actually offer exactly that guarantee, and that his statement was thus incorrect, he accused me of "starting a flamewar", and proceeded to delete my posts.

UPDATE: The issue has been reopened by the founder of Balderdash. Mind that this article was written back when this was not the case yet, and judge appropriately.

He is apparently also unaware that Google Project Zero expects the exact same - a hard deadline of 90 days, after which an issue is publicly disclosed.

Now, just locking the thread would have been at least somewhat justifiable - he might have legitimately misconstrued my statement as inciting a flamewar.

What is not excusable, however, is removing my posts that show his (negligent) statement is wrong. This raises serious questions about what the Sails maintainers consider more important: their reputation, or the actual security of their users.

It would have been perfectly possible to just leave the posts intact - the thread would be locked, so a flamewar would not have been a possibility, and each reader could make up their own mind about the state of things.

In short: Avoid Sails.js. They do not have your best interests at heart, and this could result in serious security issues for your project.

For reference, the full thread is below, pre-deletion.

(Screenshot of the full thread.)

Node.js

Building desktop applications with Node.js

Option 1: Electron

This is the most popular and well-supported option. Electron is a combination of Node.js and Chromium Embedded Framework, and so it will give you access to the feature sets of both. The main tradeoff is that it doesn't give you much direct control over the window or the system integration.

Benefits

Drawbacks

Option 2: SDL

Using https://www.npmjs.com/package/@kmamal/sdl and https://www.npmjs.com/package/@kmamal/gl, you can use SDL and OpenGL directly from Node.js. This will take care of window creation, input handling, and so on - but you will have to do all the drawing yourself using shaders.

A full (low-level) example is available here, and you can also use regl to simplify things a bit.

For text rendering, you may wish to use Pango or Harfbuzz, which can both be used through the node-gtk library (which, despite the name, is a generic GObject Introspection library rather than anything specific to the GTK UI toolkit).

Benefits

Drawbacks

Option 3: FFI bindings

You can also use an existing UI library that's written in C, C++ or Rust, by using a generic FFI library that lets you call the necessary functions from Javascript code in Node.js directly.

For C, a good option is Koffi, which has excellent documentation. For Rust, a good option is Neon, whose documentation is not quite as extensive as that of Koffi, but still pretty okay.

Option 4: GTK

The aforementioned node-gtk library can also be used to use GTK directly. Very little documentation is available about this, so you'll likely be stuck reading the GTK documentation (for its C API) and mentally translating to what the equivalent in the bindings would be.

NixOS

NixOS

Setting up Bookstack

Turned out to be pretty simple.

deployment.secrets.bookstack-app-key = {
	source = "../private/bookstack/app-key";
	destination = "/var/lib/bookstack/app-key";
	owner = { user = "bookstack"; group = "bookstack"; };
	permissions = "0700";
};

services.bookstack = {
	enable = true;
	hostname = "wiki.slightly.tech";
	maxUploadSize = "10G";
	appKeyFile = "/var/lib/bookstack/app-key";
	nginx = { enableACME = true; forceSSL = true; };
	database = { createLocally = true; };
};

Server was running an old version of NixOS, 23.05, where MySQL doesn't work in a VPS (anymore). Upgraded the whole thing to 24.11 and then it Just Worked.

Afterwards, run:

bookstack bookstack:create-admin

... in a terminal on the server to set up the primary administrator account. Done.

NixOS

A *complete* listing of operators in Nix, and their precedence.

This article was originally published at https://gist.github.com/joepie91/c3c047f3406aea9ec65eebce2ffd449d.

The information in this article has since been absorbed into the official Nix manual. It is kept here for posterity. It may be outdated by the time you read this.

Lower precedence means a stronger binding; ie. this list is sorted from strongest to weakest binding, and in the case of equal precedence between two operators, the associativity decides the binding.

Prec | Abbreviation | Example | Assoc | Description
1 | SELECT | e . attrpath [or def] | none | Select attribute denoted by the attribute path attrpath from set e. (An attribute path is a dot-separated list of attribute names.) If the attribute doesn't exist, return default if provided, otherwise abort evaluation.
2 | APP | e1 e2 | left | Call function e1 with argument e2.
3 | NEG | -e | none | Numeric negation.
4 | HAS_ATTR | e ? attrpath | none | Test whether set e contains the attribute denoted by attrpath; return true or false.
5 | CONCAT | e1 ++ e2 | right | List concatenation.
6 | MUL | e1 * e2 | left | Numeric multiplication.
6 | DIV | e1 / e2 | left | Numeric division.
7 | ADD | e1 + e2 | left | Numeric addition, or string concatenation.
7 | SUB | e1 - e2 | left | Numeric subtraction.
8 | NOT | !e | left | Boolean negation.
9 | UPDATE | e1 // e2 | right | Return a set consisting of the attributes in e1 and e2 (with the latter taking precedence over the former in case of equally named attributes).
10 | LT | e1 < e2 | left | Less than.
10 | LTE | e1 <= e2 | left | Less than or equal.
10 | GT | e1 > e2 | left | Greater than.
10 | GTE | e1 >= e2 | left | Greater than or equal.
11 | EQ | e1 == e2 | none | Equality.
11 | NEQ | e1 != e2 | none | Inequality.
12 | AND | e1 && e2 | left | Logical AND.
13 | OR | e1 || e2 | left | Logical OR.
14 | IMPL | e1 -> e2 | none | Logical implication (equivalent to !e1 || e2).
NixOS

Setting up Hydra

This article was originally published at https://gist.github.com/joepie91/c26f01a787af87a96f967219234a8723 in 2017. The NixOS ecosystem constantly changes, and it may not be relevant anymore by the time you read this article.

Just some notes from my attempt at setting up Hydra.

Setting up on NixOS

No need for manual database creation and all that; just ensure that your PostgreSQL service is running (services.postgresql.enable = true;), and then enable the Hydra service (services.hydra.enable). The Hydra service will need a few more options to be set up, below is my configuration for it:

    services.hydra = {
        enable = true;
        port = 3333;
        hydraURL = "http://localhost:3333/";
        notificationSender = "hydra@cryto.net";
        useSubstitutes = true;
        minimumDiskFree = 20;
        minimumDiskFreeEvaluator = 20;
    };

Database and user creation and all that will happen automatically. You'll only need to run hydra-init and then hydra-create-user to create the first user. Note that you may need to run these scripts as root if you get permission or filesystem errors.

Can't run hydra-* utility scripts / access the web interface due to database errors

If you already have a services.postgresql.authentication configuration line from elsewhere (either another service, or your own configuration.nix), it may be conflicting with the one specified in the Hydra service. There's an open issue about it here.

Can't login

After running hydra-create-user in your shell, you may be running into the following error in the web interface: "Bad username or password."

When this occurs, it's likely because the hydra-* utility scripts stored your data in a local SQLite database, rather than the PostgreSQL database you configured. As far as I can tell, this happens because of some missing HYDRA_* environment variables that are set through /etc/profile, which is only applied on your next login. Simply opening a new shell is not enough.

As a workaround until your next login/boot, you can run the following to obtain the command you need to run to apply the new environment variables in your current shell:

cat /etc/profile | grep set-environment

... and then run the resulting command (including the dot at the start, if there is one!) in the shell you intend to run the hydra-* scripts in. If you intend to run them as root, make sure you run the set-environment script in the root shell - using sudo will make the environment variables get lost, so you'll be stuck with the same issue as before.

NixOS

Fixing root filesystem errors with fsck on NixOS

If you run into an error like this:

An error occurred in stage 1 of the boot process, which must mount the root filesystem on `/mnt-root` and then start stage 2. Press one of the following keys:

r) to reboot immediately
*) to ignore the error and continue

Then you can fix it like this:

  1. Boot into a live CD/DVD for NixOS, or some other environment that has fsck installed, but not your installed copy of NixOS (as that will mount the root filesystem) (source)
  2. Run fsck -yf /dev/sda1 where you replace /dev/sda1 with your root filesystem. (source)
    • If you're on a (KVM) VPS, it'll probably be /dev/vda1. If you're using LVM (even on a VPS), then you need to specify your logical volume instead (eg. /dev/vg_main/lv_root, but it depends on what you've named it).

The above command will automatically agree to whatever suggestion fsck makes. This can technically lead to data loss!

Many distributions will give you an option to drop down into a shell from the error directly, but NixOS does not do that. In theory you could add the boot.shell_on_fail flag to the boot options for your existing installation, but for reasons that I didn't bother debugging any further, the installed fsck was unable to fix the issues.

NixOS

Stepping through builder steps in your custom packages

This article was originally published at https://gist.github.com/joepie91/b0041188c043259e6e1059d026eff301

  1. Create a temporary building folder in your repository (or elsewhere) and enter it: mkdir test && cd test
  2. nix-shell ../main.nix -A packagename (assuming the entry point for your custom repository is main.nix in the parent directory)
  3. Run the phases individually by entering their name (for a default phase) or doing something like eval "$buildPhase" (for an overridden phase) in the Nix shell - a summary of the common ones: unpackPhase, patchPhase, configurePhase, buildPhase, checkPhase, installPhase, fixupPhase, distPhase

More information about these phases can be found here. If you use a different builder, you may have a different set of phases.

Don't forget to clear out your test folder after every attempt!

NixOS

Using dependencies in your build phases

This article was originally published at https://gist.github.com/joepie91/b0041188c043259e6e1059d026eff301.

You can just use string interpolation to add a dependency path to your script. For example:

{
  # ...
  preBuildPhase = ''
    ${grunt-cli}/bin/grunt prepare
  '';
  # ...
}
NixOS

Source roots that need to be renamed before they can be used

This article was originally published at https://gist.github.com/joepie91/b0041188c043259e6e1059d026eff301

Some applications (such as Brackets) are very picky about the directory name(s) of your unpacked source(s). In this case, you might need to rename one or more source roots before cding into them.

To accomplish this, do something like the following:

{
  # ...
  sourceRoot = ".";
  
  postUnpack = ''
    mv brackets-release-${version} brackets
    mv brackets-shell-${shellBranch} brackets-shell
    cd brackets-shell;
  '';
  # ...
}

This keeps Nix from trying to move into the source directories immediately, by explicitly pointing it at the current (ie. top-most) directory of the environment.

NixOS

Error: `error: cannot coerce a function to a string`

This article was originally published at https://gist.github.com/joepie91/b0041188c043259e6e1059d026eff301

Probably caused by a syntax ambiguity when invoking functions within a list. For example, the following will throw this error:

{
  # ...
  srcs = [
    fetchurl {
      url = "https://github.com/adobe/brackets-shell/archive/${shellBranch}.tar.gz";
      sha256 = shellHash;
    }
    fetchurl {
      url = "https://github.com/adobe/brackets/archive/release-${version}.tar.gz";
      sha256 = "00yc81p30yamr86pliwd465ag1lnbx8j01h7a0a63i7hsq4vvvvg";
    }
  ];
  # ...
}

This can be solved by adding parentheses around the invocations:

{
  # ...
  srcs = [
    (fetchurl {
      url = "https://github.com/adobe/brackets-shell/archive/${shellBranch}.tar.gz";
      sha256 = shellHash;
    })
    (fetchurl {
      url = "https://github.com/adobe/brackets/archive/release-${version}.tar.gz";
      sha256 = "00yc81p30yamr86pliwd465ag1lnbx8j01h7a0a63i7hsq4vvvvg";
    })
  ];
  # ...
}
NixOS

`buildInputs` vs. `nativeBuildInputs`?

This article was originally published at https://gist.github.com/joepie91/b0041188c043259e6e1059d026eff301

More can be found here.

The difference only really matters when cross-building - when building for your own system, both sets of dependencies will be exposed as nativeBuildInputs.

NixOS

QMake ignores my `PREFIX`/`INSTALL_PREFIX`/etc. variables!

This article was originally published at https://gist.github.com/joepie91/b0041188c043259e6e1059d026eff301

QMake does not have a standardized configuration variable for installation prefixes - PREFIX and INSTALL_PREFIX only work if the project files for the software you're building specify them explicitly.

If the project files have a hardcoded path, there's still a workaround to install it in $out anyway, without source code or project file patches:

{
  # ...
  preInstall = "export INSTALL_ROOT=$out";
  # ...
}

This INSTALL_ROOT environment variable will be picked up and used by make install, regardless of the paths specified by QMake.

NixOS

Useful tools for working with NixOS

This article was originally published at https://gist.github.com/joepie91/67316a114a860d4ac6a9480a6e1d9c5c. Some links have been removed, as they no longer exist, or are no longer updated.

Online things

Development tooling

(Reference) documentation

Tutorials and examples

Community and support

Miscellaneous notes and troubleshooting

NixOS

Proprietary AMD drivers (fglrx) causing fatal error in i387.h

This article was originally published at https://gist.github.com/joepie91/ce9267788fdcb37f5941be5a04fcdd0f. It should no longer be applicable, but is preserved here in case a similar issue reoccurs in the future.

If you get this error:

/tmp/nix-build-ati-drivers-15.7-4.4.18.drv-0/common/lib/modules/fglrx/build_mod/2.6.x/firegl_public.c:194:22: fatal error: asm/i387.h: No such file or directory

... it's because the drivers are not compatible with your current kernel version. I've worked around it by adding this to my configuration.nix, to switch to a 4.1 kernel:

{
  # ...
  boot.kernelPackages = pkgs.linuxPackages_4_1;
  # ...
}
NixOS

Installing a few packages from `master`

This article was originally published at https://gist.github.com/joepie91/ce9267788fdcb37f5941be5a04fcdd0f.

You probably want to install from unstable instead of master, and you probably want to do it differently than described here (eg. importing from URL or specifying it as a Flake). This documentation is kept here for posterity, as it is still helpful to understand how to import a local copy of a nixpkgs into your configuration.

  1. git clone https://github.com/NixOS/nixpkgs.git /etc/nixos/nixpkgs-master
  2. Edit your /etc/nixos/configuration.nix like this:
{ config, pkgs, ... }:

let
  nixpkgsMaster = import ./nixpkgs-master {};
  
  stablePackages = with pkgs; [
    # This is where your packages from stable nixpkgs go
  ];
  
  masterPackages = with nixpkgsMaster; [
    # This is where your packages from `master` go
    nodejs-6_x
  ];
in {
  # This is where your normal config goes, we've just added a `let` block
  
  environment = {
    # ...
    
    systemPackages = stablePackages ++ masterPackages;
  };
  
  # ...
}
NixOS

GRUB2 on UEFI

This article was originally published at https://gist.github.com/joepie91/ce9267788fdcb37f5941be5a04fcdd0f.

These instructions are most likely outdated. They are kept here for posterity.

This works fine. You need your boot section configured like this:

{
  # ...
  boot = {
    loader = {
      gummiboot.enable = false;
      
      efi = {
        canTouchEfiVariables = true;
      };
      
      grub = {
        enable = true;
        device = "nodev";
        version = 2;
        efiSupport = true;
      };
    };
  };
  # ...
}
NixOS

Unblock ports in the firewall on NixOS

This article was originally published at https://gist.github.com/joepie91/ce9267788fdcb37f5941be5a04fcdd0f

The firewall is enabled by default. This is how you open a port:

{
  # ...
  networking = {
    # ...
    
    firewall = {
      allowedTCPPorts = [ 24800 ];
    };
  };
  # ...
}
NixOS

Guake doesn't start because of a GConf issue

This article was originally published at https://gist.github.com/joepie91/ce9267788fdcb37f5941be5a04fcdd0f. It may or may not still be relevant.

From nixpkgs: GNOME's GConf implements a system-wide registry (like on Windows) that applications can use to store and retrieve internal configuration data. That concept is inherently impure, and it's very hard to support on NixOS.

  1. Follow the instructions here.
  2. Run the following to set up the GConf schema for Guake: gconftool-2 --install-schema-file $(readlink $(which guake) | grep -Eo '\/nix\/store\/[^\/]+\/')"share/gconf/schemas/guake.schemas". This will not work if you have changed your Nix store path - in that case, modify the command accordingly.

You may need to re-login to make the changes apply.

NixOS

FFMpeg support in youtube-dl

This article was originally published at https://gist.github.com/joepie91/ce9267788fdcb37f5941be5a04fcdd0f. It may no longer be necessary.

Based on this post:

{
  # ...
  stablePackages = with pkgs; [
    # ...
    (python35Packages.youtube-dl.override {
      ffmpeg = ffmpeg-full;
    })
    # ...
  ];
  # ...
}

(To understand what stablePackages is here, see this entry.)

NixOS

An incomplete rant about the state of the documentation for NixOS

This article was originally published at https://gist.github.com/joepie91/5232c8f1e75a8f54367e5dfcfd573726

Historical note: I wrote this rant in 2017, originally intending to post it on the NixOS forums. This never ended up happening, as discussing the (then private) draft already started driving changes to the documentation approach. The documentation has improved since this was written; however, some issues remain as of the time of writing this remark, in 2024. The rant ends abruptly, because I never ended up finishing it - but it still contains a lot of useful points regarding documentation quality, and so I am preserving it here.

I've now been using NixOS on my main system for a few months, and while I appreciate the technical benefits a lot, I'm constantly running into walls concerning documentation and general problem-solving. After discussing this briefly on IRC in the past, I've decided to post a rant / essay / whatever-you-want-to-call-it here.

An upfront note

My frustration about these issues has built up considerably over the past few months, moreso because I know that from a technical perspective it all makes a lot of sense, and there's a lot of potential behind NixOS. However, I've found it pretty much impenetrable on a getting-stuff-done level, because the documentation on many things is either poor or non-existent.

While my goal here is to get things fixed rather than just complaining about them, that frustration might occasionally shine through, and so I might come across as a bit harsh. This is not my intention, and there's no ill will towards any of the maintainers or users. I just want to address the issues head-on, and get them fixed effectively.

To address any "just send in a PR" comments ahead of time: while I do know how to write good documentation (and I do so on a regular basis), I still don't understand much of how NixOS and nixpkgs are structured, exactly because the documentation is so poorly accessible. I couldn't fix the documentation myself if I wanted to, simply because I don't have the understanding required to do so, and I'm finding it very hard to obtain that understanding.

One last remark: throughout the rant, I'll be posing a number of questions. These are not necessarily all questions that I still have, as I've found the answer to several of them after hours of research - they just serve to illustrate the interpretation of the documentation from the point of view of a beginner, so there's no need to try and answer them in this thread. These are just the type of questions that should be anticipated and answered in the documentation.

Types of documentation

Roughly speaking, there are three types of documentation for anything programming-related:

  1. Reference documentation
  2. Conceptual documentation
  3. Tutorials

In the sections below, "tooling" will refer to any kind of to-be-documented thing - a function, an API call, a command-line tool, and so on.

Reference documentation

Reference documentation is intended for readers who are already familiar with the tooling that is being documented. It typically follows a rigorous format, and defines things such as function names, arguments, return values, error conditions, and so on. Reference documentation is generally considered the "single source of truth" - whatever behaviour is specified there, is what the tooling should actually do.

Some examples of reference documentation:

Reference documentation generally assumes all of the following:

Conceptual documentation

Conceptual documentation is intended for readers who do not yet understand the tooling, but are already familiar with the environment (language, shell, etc.) in which it's used.

Some examples of conceptual documentation:

Good conceptual documentation doesn't make any assumptions about the background of the reader or what other tooling they might already know about, and explicitly indicates any prior knowledge that's required to understand the documentation - preferably including a link to documentation about those "dependency topics".

Tutorials

Tutorials can be intended for two different groups of readers:

  1. Readers who don't yet understand the environment (eg. "Introduction to Bash syntax")
  2. Readers who don't want to understand the environment (eg. "How to build a full-stack web application")

While I would consider tutorials pandering to the second category actively harmful, they're a thing that exists nevertheless.

Some examples of tutorials:

Tutorials don't make any assumptions about the background of the reader... but they have to be read from start to end. Starting in the middle of a tutorial is not likely to be useful, as tutorials are more designed to "hand-hold" the reader through the process (without necessarily understanding why things work how they work).

The current state of the Nix(OS) documentation

Unfortunately, the NixOS documentation is currently lacking in all three areas.

The official Nix, NixOS and nixpkgs manuals attempt to be all three types of documentation - tutorials (like this one), conceptual documentation (like this), and reference documentation (like this). The wiki sort of tries to be conceptual documentation (like here), and does so a little better than the manual, but... the wiki is being shut down, and it's still far from complete.

The most lacking aspect of the NixOS documentation is currently the conceptual documentation. What is a "derivation"? Why does it exist? How does it relate to what I, as a user, want to do? How is the Nix store structured, and what guarantees does this give me? What is the difference between /etc/nixos/configuration.nix and ~/.nixpkgs/config.nix, and can they be used interchangeably? Is nixpkgs just a set of packages, or does it also include tooling? Which tooling is provided by Nix the package manager, which is provided by NixOS, and which is provided by nixpkgs? Is this different on non-NixOS, and why?

Most of the official documentation - including the wiki - is structured more like a very extensive tutorial. You're told, step by step, what to do... but not why any of it matters, what it's for, or how to use these techniques in different situations. This wiki section is a good example. What does overrideDerivation actually do? What's the difference with override? What's the difference between 'attributes' and 'arguments'? Why is there a random link about the Oracle JDK there? Is the src completely overridden, or just the attributes that are specified there? What if I want to reevaluate all the other attributes based on the changes that I've made - for example, regenerating the name attribute based on a changed version attribute? Are any of these tools useful in other scenarios that aren't directly addressed here?

The "Nix pills" sort of try to address this lack of conceptual information, and are quite informational, but they have their problems too. They are not clearly structured (where's the index of all the articles?), the text formatting can be hard to read, and it is still half of a tutorial - it can be hard to understand later pills without having read earlier ones, because they're not fully self-contained. On top of that, they're third-party documentation and not part of the official documentation.

The official manuals have a number of formatting/structural issues as well. The single-page format is frankly horrible for navigating through - finding anything on the page is difficult, and following links to other things gets messy fast. Because it's all a single page, every tab has the exact same title, it's easy to scroll past the section you were reading, and so on. Half the point of the web is to have hyperlinked content across multiple documents, but the manuals completely forgo that and create a really poor user experience. It's awful for search engines too, because no matter what you search for, you always end up on the exact same page.

Another problem is the fact that I have to say "manuals" - there are multiple manuals, and the distinction between them is not at all clear. Because it's unclear what functionality is provided by what part of the stack, it usually becomes a hunt of going through all three manuals ctrl+F'ing for some keywords, and hoping that you will run into the thing you're looking for. Then once you (hopefully) do, you have to be careful not to accidentally scroll away from it and lose your reference. There's really no good reason for this separation; it just makes it harder to cross-reference between different parts of the stack, and most users will be using all of them anyway.

The manual, as it is, is not a viable format. While I understand that the wiki had issues with outdated information, it's still a far better structure than a set of single-page manuals. I'll go into more detail at the end of this rant, but my proposed solution here would be to follow a wiki-like format for the official documentation.

Missing documentation

Aside from the issues with the documentation format, there are also plenty of issues with its content. Many things are fully undocumented, especially where nixpkgs is concerned. For example, nothing says that I should be using callPackage_i686 to package something with 32-bits dependencies. Or how to package something that requires the user to manually add a source file from their filesystem using nix-prefetch-url, or using nix-store --add-fixed. And what's the difference between those two anyway? And why is there a separate qt5.callPackage, and when do I need it?

There are a ton of situations where you need oddball solutions to get something packaged. In fact, I would argue that this is the majority of cases - most of the easy pickings have been packaged by now, and the tricky ones are left. But as a new user that just wants to get an application working, I end up spending several hours on each of the above questions, and I'm still not convinced that I have the right answer. Had somebody taken 10 minutes to document this, even if just as a rough note, it would have saved me hours of work.

No clear path to solutions

When faced with a given packaging problem, it's not at all obvious how to get to the solution. There's no obvious process for fixing or debugging issues, and error messages are often cryptic or poorly formatted. What does "cannot coerce a set to a string" mean, and why is it happening? How can I duct-tape-debug something by adding a print statement of some variety? Is there an interactive debugger of some sort?

It's very difficult to learn enough about NixOS internals to figure out what the right way is to package any given thing, and because there's no good feedback on what's wrong either, it's too hard to get anything packaged that isn't a standard autotools build. There's no "Frequently Asked Questions" or "Common Packaging Problems" section, nor have I found any useful tooling for analyzing packaging problems in more detail. I've had to write some of this tooling myself!

The documentation should anticipate the common problems that new users run into, and give them some hints on where to start looking. It currently completely fails to do so, and assumes that the users will figure out the relation between things themselves.

Reading code

Because of the above issues, often the only solution is to read the code of existing packages, and try to infer from their expressions how to approach certain problems - but that comes with its own set of problems. There does not appear to be a consistent way of solving packaging problems in NixOS, and almost every package seems to have invented its own way of solving the same problems that other packages have already solved. After several hours of research, it often turns out that half the solutions are either outdated or just wrong. And then I still have no idea what the optimal solution is, out of the remaining options.

This is made worse by the serious lack of comments in nixpkgs. Barely any packages have comments at all, and frequently there are complex multi-level abstractions in place to solve certain problems, but with absolutely no information to explain why those abstractions exist. They're not exactly self-evident either. Then there are the packages that do have comments, but they're aimed at the user rather than the packager - one such example is the Guake package. Essentially, it seems the repository is absolutely full of hacks with no standardized way of solving problems, no doubt helped by the fact that existing solutions simply aren't documented.

This is a tremendous waste of time for everybody involved, and makes it very hard to package anything unusual, often to the point of just giving up and hacking around the issue in an impure way. Right now we have what seems like a significant amount of people doing the same work over and over and over again, resulting in different implementations every time. If people took the time to document their solutions, this problem would pretty much instantly go away. From a technical point of view, there's absolutely no reason for packaging to be this hard to do.

Tooling

On top of all this, the tooling seems to change constantly - abstractions get deprecated, added, renamed, moved, and so on. Many of the stdenv abstractions aren't documented, or their documentation is incomplete. There's no clear way to determine which tooling is still in use, and which tooling has been deprecated.

The tooling that is in use - in particular the command-line tooling - is often poorly designed from a usability perspective. Different tools using different flags for the same purpose, behaving differently in different scenarios for no obvious reason. There's a UX proposal that seems to fix many of these problems, but it seems to be more or less dead, and its existence is not widely known.

Rust

Rust

Futures and Tokio

This article was originally published at https://gist.github.com/joepie91/bc2d29fab43b63d16f59e1bd20fd7b6e. It may be out of date.

Event loops

If you're not familiar with the concept of an 'event loop' yet, watch this video first. While this video is about the event loop in JavaScript, most of the concepts apply to event loops in general, and watching it will help you understand Tokio and Futures better as well.

Concepts

Databases and data management

Databases and data management

Database characteristics

This article was originally published at https://gist.github.com/joepie91/f9df0b96c600b4fb3946e68a3a3344af.

NOTE: This is simplified. However, it's a useful high-level model for determining what kind of database you need for your project.

Data models

Consistency models

Schemafulness

Maths and computer science

Articles and notes that are more about the conceptual side of maths and computer science, rather than anything specific to a particular programming language.

Maths and computer science

Prefix codes (explained simply)

This article was originally published at https://gist.github.com/joepie91/26579e2f73ad903144dd5d75e2f03d83.

A "prefix code" is a type of encoding mechanism ("code"). For something to be a prefix code, the entire set of possible encoded values ("codewords") must not contain any values that start with any other value in the set.

For example: [3, 11, 22] is a prefix code, because none of the values start with ("have a prefix of") any of the other values. However, [1, 12, 33] is not a prefix code, because one of the values (12) starts with another of the values (1).

Prefix codes are useful because, if you have a complete and accurate sequence of values, you can pick out each value without needing to know where one value starts and ends.

For example, let's say we have the following codewords: [1, 2, 33, 34, 50, 61]. And let's say that the sequence of numbers we've received looks like this:

1611333425012

We can simply start from the left, until we have the first value:

1 611333425012

It couldn't have been any value other than 1, because by definition of what a prefix code is, if we have a 1 codeword, none of the other codewords can start with a 1.

Next, we just do the same thing again, with the numbers that are left:

1 61 1333425012

Again, it could only have been 61 - because in a prefix code, none of the other codewords would have been allowed to start with 61.

Let's try it again for the next number:

1 61 1 333425012

Same story, it could only have been a 1. And again:

1 61 1 33 3425012

Remember, our set of possible codewords is [1, 2, 33, 34, 50, 61].

In this case, it could only have been a 33, because again, nothing else in the set of codewords was allowed to start with 33. It couldn't have been 34 either - even though it also starts with a 3 (like 33 does), the lack of a 4 as the second digit excludes it as an option.

You can simply keep repeating this until there are no numbers left:

1 61 1 33 34 2 50 1 2

... and now we've 'decoded' the sequence of numbers, even though the sequence didn't contain any information on where one number starts and the next number ends.
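
In code, that greedy decoding procedure might look something like this (a minimal sketch in JavaScript, using the example codewords from above):

const codewords = ["1", "2", "33", "34", "50", "61"];

function decode(sequence) {
    let remaining = sequence;
    let values = [];

    while (remaining.length > 0) {
        /* Because this is a prefix code, at most one codeword can match the
         * start of the remaining sequence. */
        let match = codewords.find((word) => remaining.startsWith(word));

        if (match == null) {
            throw new Error("No codeword matches the start of: " + remaining);
        }

        values.push(match);
        remaining = remaining.slice(match.length);
    }

    return values;
}

console.log(decode("1611333425012")); /* [ '1', '61', '1', '33', '34', '2', '50', '1', '2' ] */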

Note how the fact that both 33 and 34 start with a 3 didn't matter; shared prefixes are fine, so long as one value isn't in its entirety used as a prefix of another value. So while [33, 34] is fine (it only shares the 3, neither of the numbers in its entirety is a prefix of the other), [33, 334] would not be fine, since 33 is a prefix of 334 in its entirety (33 followed by 4).

This only works if you can be certain that you got the entire message accurately, though; for example, consider the following sequence of numbers:

11333425012

(Note how this is just the last part of the earlier sequence 1611333425012, with the first two digits removed.)

Now, let's look at the first number - it starts with a 1. However, we don't know what came before, so is it part of a 61, or is it just a single, independent 1? There's no way to know for sure, so we can't split up this message.

It doesn't work if you violate the "can't start with another value" rule, either; for example, let's say that our codewords are [1, 3, 12, 23], and we want to decode the following sequence:

12323

Let's start with the first number. It starts with a 1, so it could be either 1 or 12. We have no way to know! In this particular example we can't figure it out from the numbers after it, either, as there are two different ways to decode this sequence:

1 23 23

12 3 23

And that's why a prefix code is useful, if you want to distinguish values in a sequence that doesn't have explicit 'markers' of where a value starts and ends.

Hardware

Associated notes about hardware hacking, maintenance, etc.

Hardware

Cleaning sticky soda spills in a mechanical keyboard without disassembly

Follow these instructions at your own risk. This is an experimental approach that may or may not cause long-term damage to your keyboard.

This approach was only tested using Kailh Choc switches on a Glove80. It may not work with other switch designs.

A Glove80 is a pain to disassemble for cleaning, so I've figured out a way to deal with sticky spills without doing so. I am not certain that it is entirely safe, but so far (several weeks after the spill) I have not noticed any deterioration of functionality, despite having followed this process on multiple switches.

If you *can* disassemble your keyboard and clean it properly (with isopropyl alcohol and then relubricating switches), do that instead! This is a guide of last resort. It may damage your keyboard permanently. You have been warned.

Required tools:

Process:

  1. Turn off and disconnect your keyboard
  2. Remove keycap carefully with key puller (if you have a Glove80, follow their specific instructions for removing caps without damage)
  3. Spray some Alklanet on the paper towel - not on the switch itself!
  4. Press the damp spot of the paper towel against the front of the switch, where there is a 'rail' indent that guides the stem, so that a droplet of Alklanet seeps into the switch. It should only be a tiny amount, just enough to seep down!
  5. Rapidly press and release the switch many times, you should start seeing the liquid slightly spread inside of the switch
  6. Blow strongly into the switch for a while, using compressed air of some kind if possible, to accelerate the drying process
  7. Verify that if you press the switch, you can no longer see liquid moving or air bubbles forming inside (ie. it is fully dry)
  8. Done!

The reason this works: Alklanet is good at dissolving organics, including sugary drinks, but quite bad (though not entirely ineffective) at degreasing. This means that it will primarily dissolve and remove the sticky spill, without affecting the lubrication much. Because Alklanet dissipates into the air quickly, it leaves very little, if any, residue behind, limiting the risk of shorted contacts.

If your switch is not registering reliably after this process, it has not been fully cleaned - do it again. If your switch is registering double presses only, then it has not dried sufficiently; immediately unplug and power off, and let it dry for longer. If both happen, it is also not sufficiently cleaned.

If Alklanet is not available where you are, you may try to acquire a different cleaning agent that quickly dissolves into the air, leaves behind no residue, and that affects organic substances but not grease. Commercial window cleaners are your best bet, but this is entirely at your own risk, and you should be certain that it has these properties - labels are often misleading.

Hardware

Hacking the Areson L103G

This article was originally published many years ago at http://cryto.net/~joepie91/areson/, when I just started digging into supply chains more. The contents have not been checked for accuracy since!

The Medion MD 86079.

(it's called the Areson G3 according to the USB identification data, though.)

Several years ago, I bought a Medion laser gaming mouse, the MD 86079, at the local Aldi. I'd been using it quite happily for years, and was always amazed at the comfort and accuracy of it, especially on glossy surfaces. Recently, I recommended it to somebody else, only to find that Medion had stopped selling it and it wasn't available anywhere else, either.

So I started researching, to figure out who actually made these things - after all, Medion barely does any manufacturing themselves. I started out by searching for it on Google, but this failed to bring up any useful results. The next step was to check the manual - but that didn't turn up anything useful either. Then I got an idea - what if I looked for clues in the driver software?

Huh?

Certainly interesting, but not quite what I was looking for...

A hex dump of the relevant portion of the driver software.

Bingo! The manufacturer turned out to be Areson.

Some further searching on the manufacturer name led me to believe that the particular model was the L103G, since the exterior shape matched that of my mouse. However, when I searched for this model number a bit more... I started running across the Mtek Gaming Xtreme L103G and the Bross L103G. And, more interestingly, an earlier version of my mouse from Medion that was branded the "USB Mouse Areson L103G"! Apparently it had been staring me in the face for a while, and I failed to notice it.

Either way, the various other mice with the exact same build piqued my interest, and I started looking for other Areson L103G variants. And oh man, there were many. It started out with the Cyber Snipa mice, but as it turns out, Areson builds OEM mice for a large array of brands. Even the OCZ Dominatrix is actually just an Areson L103G! I've made a list of L103G variants and some other Areson-manufactured mice down this page.

Another thing that I noticed, was that all these L103G variants advertised configurable macro keys and DPI settings, up to sometimes 5000 DPI, while my mouse was advertised as hard-set 1600 DPI with just an auto-fire button, and only came with driver software that let me remap the navigational buttons.

Surely if these mice are all the same model, they would have the same chipset and thus the same capabilities? I also wondered why my DPI switching and autofire (macro) buttons didn't work under Linux - if these mice are programmable, then surely this functionality is handled by the mouse chipset and not by the driver?

It was time to fire up a Windows XP VM.

After some mucking around with VirtualBox to get USB passthrough to work (hey, openSUSE packagers, you should probably document that you've disabled that by default for security reasons!), I installed the original driver software for my Medion mouse. Apparently it's not even really a kernel driver - it seems to just be a piece of userspace software that sends signals to the device.

Sure enough, when I installed the driver, then disabled the USB passthrough, and thereby returned the device to the host (Linux) OS... the DPI switcher and macro button still worked fine, despite there being no driver to talk to anymore.

So, what was going on here?

My initial guess was that the mouse initially acts as a 'dumb' preconfigured USB/2.0 mouse, in order to have acceptable behaviour and DPI on a driver-less system, and that it would only enable the 'advanced features' (macros, DPI switching) if it got a signal from the driver saying that the configuration software was present. Now of course this makes sense for a highly configurable gaming mouse, but as my mouse didn't come with such software I found it a little odd.

So I fired up SnoopyPro, and had a look at the interaction that took place. Compared to a 'regular' 5 euro optical USB mouse - which I always have laying around as a spare - I noticed that two more interactions took place:

USB protocol dump, part 1.

USB protocol dump, part 2.

I haven't gotten around to looking at this in more detail yet (more to come!), but to me, that looks like it registers an extra non-standard configuration interface. Presumably, that interface is used for configuring the DPI and macros, and I suspect that the registration of it triggers enabling the DPI and macro buttons on the device.

USB protocol stuff aside, I wondered - is the hardware in my mouse really the same as that in the other models? And could I (ab)use that fact to configure my mouse beyond its advertised DPI?

The Trust GXT 33 control panel.

As it turns out, yes, I can!

The Trust GXT 33 is another Areson L103G model, advertised as configurable up to 5000 DPI. Its 'driver' software happily lets me configure my mouse up to those 5000 DPI - even though my Medion mouse was only advertised as 1600 DPI! I've changed the configuration (as you can see in the screenshot), and it really does take effect. It even keeps working after detaching it from the USB passthrough and thus returning it to Linux. And it doesn't stop there...

The Trust GXT 33 control panel, macro panel.

I can even configure macros for it. The interface isn't the most pleasant, but it works. And apparently, I now have some 5.7 KB of free storage space! I wonder if you could store arbitrary data in there...

Either way, back to the L103G. There is a quite wide array of variants of it, and I've made a list below for your perusal. Most of these are not sold anymore, but the Trust GXT 33 is - if it's sold near you (or any of the other L103G models are), I'd definitely recommend picking one up :)

A sidenote: some places reported particular mice (such as the Mtek L103G) as having a 1600 DPI sensor that can interpolate up to 3200 DPI with accuracy loss. However, even when cranking up mine to 5000 DPI, I did not notice any loss in quality - it is therefore possible that there are some differences between the sensors in different models.

The model list

Know of a model not listed here, or have a suggestion / correction / other addition? E-mail me!

Medion MD 86079 / Medion X81007 / Medion L103G

Advertised default DPI: 400 / 800 / 1200 / 1600
Advertised maximum configurable DPI: N/A, advertised as hard-set resolution. Native sensor resolution unclear.
Actual maximum configurable DPI: 5000 DPI
Advertised macro features: Hard-set, macro key enables auto-fire.
Actual macro features: Freely configurable mouse/keyboard macros, 5888 bytes internal storage space.
Sold at: No longer available.

Bross L103G

Advertised default DPI: Not listed.
Advertised maximum configurable DPI: 400 - 3200 DPI
Actual maximum configurable DPI: Not tested. Send me feedback!
Advertised macro features: Freely configurable mouse/keyboard macros.
Actual macro features: Not tested. Send me feedback!
Notes: Sold primarily in Turkey.
Sold at: No longer available.

Cyber Snipa Stinger

Advertised default DPI: Not listed.
Advertised maximum configurable DPI: 400 - 3200 DPI
Actual maximum configurable DPI: Not tested. Send me feedback!
Advertised macro features: Freely configurable mouse/keyboard macros, 3 profiles with 6 each.
Actual macro features: Not tested. Send me feedback!
Notes: Company (Cyber Snipa) appears to have gone defunct. E-mail bounces, Twitter compromised, most of their site broken.
Sold at: No longer available.

Mtek Gaming Extreme L103G

Advertised default DPI: 400 / 800 / 1600 / 2000
Advertised maximum configurable DPI: 400 - 3200 DPI
Actual maximum configurable DPI: Not tested. Send me feedback!
Advertised macro features: Freely configurable mouse/keyboard macros.
Actual macro features: Not tested. Send me feedback!
Notes: Sold primarily in Brazil.
Sold at: No longer available.

Trust GXT 33

Advertised default DPI: 450 / 900 / 1800 / 3600
Advertised maximum configurable DPI: 3600 DPI native
Actual maximum configurable DPI: Not tested. Send me feedback!
Advertised macro features: Freely configurable mouse/keyboard macros.
Actual macro features: Not tested. Send me feedback!
Notes: Software for this mouse let me configure my Medion mouse to 5000 DPI. Not sure if also possible for the GXT 33 itself, or whether interpolation is involved.
Sold at: Physical stores, online Dutch shops (from €35), Amazon (from $61.88).

MSI StarMouse GS501

Advertised default DPI: 400 / 800 / 1600 / 2400
Advertised maximum configurable DPI: 1600 DPI native
Actual maximum configurable DPI: Not tested. Send me feedback!
Advertised macro features: Freely configurable mouse/keyboard macros. Macro key acts as mode/profile switch button. Two programmable buttons.
Actual macro features: Not tested. Send me feedback!
Notes: Slightly different shell design.
Sold at: No longer available.

OCZ Dominatrix

Advertised default DPI: 400 / 800 / 1600 / 2000
Advertised maximum configurable DPI: 3200 DPI, unclear if native or interpolated
Actual maximum configurable DPI: Not tested. Send me feedback!
Advertised macro features: Freely configurable mouse/keyboard macros.
Actual macro features: Not tested. Send me feedback!
Notes: Slightly different shell; not a single-piece cover, and differently shaped DPI / macro keys. Possibly more customized.
Sold at: No longer available.

Revoltec FightMouse Pro

Advertised default DPI: 400 / 800 / 1600 / 2000
Advertised maximum configurable DPI: 3200 DPI native
Actual maximum configurable DPI: Not tested. Send me feedback!
Advertised macro features: Freely configurable mouse/keyboard macros.
Actual macro features: Not tested. Send me feedback!
Notes: Same shell layout as the OCZ Dominatrix, but with carbon print.
Sold at: No longer manufactured. Azerty (NL, €41,02), Amazon UK (£39.59).

Earlier/simpler models (no macros, etc.)

Gigabyte GM-M6800

Advertised default DPI: 800 / 1600
Advertised maximum configurable DPI: Advertised as hard-set.
Actual maximum configurable DPI: Not tested. Send me feedback!
Advertised macro features: None. No physical macro button either.
Actual macro features: Not tested. Send me feedback!
Notes: This is an optical mouse, not a laser mouse! This appears to be a custom (cheaper) optical build. No LED illumination, no side-scroll, and no weight adjustment.
Sold at: Online Dutch shops (from €14,39), Amazon (from $9.99).

Gigabyte GM-M6880

Advertised default DPI: 400 / 800 / 1600 (version 1 only supports 800 / 1600)
Advertised maximum configurable DPI: Advertised as hard-set.
Actual maximum configurable DPI: Not tested. Send me feedback!
Advertised macro features: None. No physical macro button either.
Actual macro features: Not tested. Send me feedback!
Notes: The same as the Gigabyte GM-M6800, but with a laser sensor and a differently colored shell. This appears to be a custom (cheaper) build. No LED illumination, no side-scroll, and no weight adjustment.
Sold at: Online Dutch shops (from €12,77), Amazon (from $19.67).

PureTrak Valor

Advertised default DPI: 800 / 1600 / 2400 / 3500
Advertised maximum configurable DPI: 3500 DPI native
Actual maximum configurable DPI: Not tested. Send me feedback!
Advertised macro features: None. No physical macro button either.
Actual macro features: Not tested. Send me feedback!
Notes: This is an optical mouse, not a laser mouse! Similar shell to the OCZ Dominatrix and Revoltec FightMouse Pro, but without macro key. Has the same weight adjustment system as the standard L103G.
Sold at: Online Dutch shops (from €24,90), Amazon (from $19.95).

Sentey Whirlwind X

Advertised default DPI: 400 / 800 / 1600 / 3200
Advertised maximum configurable DPI: 3200 DPI native
Actual maximum configurable DPI: Not tested. Send me feedback!
Advertised macro features: None. No physical macro button either.
Actual macro features: Not tested. Send me feedback!
Notes: This is an optical mouse, not a laser mouse! Similar shell to the PureTrak Valor, but no weight adjustment. Also no macro key. Pixart PAW-3305 chipset, rather than the AVAGO ADNS series that is common in this type of mouse.
Sold at: Many stores (from $29.99), Amazon (sale $9.99, regular $34.99).

CANYON CNR-MSG01

Advertised default DPI: 400 / 800 / 1600 / 2400 (?)
Advertised maximum configurable DPI: 3200 DPI (possibly interpolated)
Actual maximum configurable DPI: Not tested. Send me feedback!
Advertised macro features: None. No physical macro button either.
Actual macro features: Not tested. Send me feedback!
Notes: This is an optical mouse, not a laser mouse! Similar shell to the regular L103G, but without macro key. No weight adjustment, likely no sidescroll either.
Sold at: No longer available.

Server administration

General Linux server management notes, not specific to anything in particular.

Server administration

Batch-migrating Gitolite repositories to Gogs

This article was originally published at https://gist.github.com/joepie91/2ff74545f079352c740a

NOTE: This will only work if you are an administrator on your Gogs instance, or if an administrator has enabled local repository importing for all users.

First, save the following as migrate.sh somewhere, and make it executable (chmod +x migrate.sh):

#!/bin/bash

# Hostname of your Gogs instance, and the filesystem path where your
# Gitolite repositories live
HOSTNAME="git.cryto.net"
BASEPATH="/home/git/old-repositories/projects/joepie91"

# The Gogs user ID that should own the migrated repositories (first argument)
OWNER_ID="$1"
# Extract the CSRF token from the exported cookies (Netscape cookies.txt format)
CSRF=`cat ./cookies.txt | grep _csrf | cut -f 7`

# Read repository names from stdin, one per line, and import each of them
while read REPO; do
	REPONAME=`echo "$REPO" | sed "s/\.git\$//"`
	curl "https://$HOSTNAME/repo/migrate" \
		-b "./cookies.txt" \
		-H 'origin: null' \
		-H 'content-type: application/x-www-form-urlencoded' \
		-H "authority: $HOSTNAME" \
		--data "_csrf=$CSRF" \
		--data-urlencode "clone_addr=$BASEPATH/$REPO" \
		--data-urlencode "uid=$OWNER_ID" \
		--data-urlencode "auth_username=" \
		--data-urlencode "auth_password=" \
		--data-urlencode "repo_name=$REPONAME" \
		--data-urlencode "description=Automatically migrated from Gitolite"
done

Change HOSTNAME to point at your Gogs installation, and BASEPATH to point at the folder where your Gitolite repositories live on the filesystem. It must be the entire base path - the repository names cannot contain slashes!

Now save the Gogs cookies from your browser as cookies.txt - in the standard Netscape cookies.txt format, which is what curl expects - and create a file (eg. repositories.txt) containing all your repository names, each on a new line. It could look something like this:

project1.git
project2.git
project3.git
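
If all of your Gitolite repositories live directly under the base path, you could generate this list with something like the following - the path here is just the example BASEPATH from the script, so adjust it to your own setup:

ls /home/git/old-repositories/projects/joepie91 | grep '\.git$' > repositories.txt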

After that, run the following command:

cat repositories.txt | ./migrate.sh 1

... where you replace 1 with your User ID on your Gogs instance.

Done!

Server administration

What is(n't) Docker actually for?

This article was originally published at https://gist.github.com/joepie91/1427c8fb172e07251a4bbc1974cdb9cd.

This article was written in 2016. Some details may have changed since.

A brief listing of some misconceptions about the purpose of Docker.

Secure isolation

Some people try to use Docker as a 'containment system' - for example, for running untrusted code, or for isolating potentially vulnerable services from the rest of the system.

But Docker explicitly does not provide that kind of functionality. You get essentially the same level of security from just running things under a user account.

If you want secure isolation, either use a full virtualization technology (Xen HVM, QEMU/KVM, VMWare, ...), or a containerization/paravirtualization technology that's explicitly designed to provide secure isolation (OpenVZ, Xen PV, unprivileged LXC, ...)

"Runs everywhere"

Absolutely false. Docker only really works on modern Linux systems; it will not run (well) on other operating systems, older kernel versions, or more exotic architectures.

Docker is just a containerization system. It doesn't do magic. And due to environmental limitations, chances are that using Docker will actually make your application run in fewer environments.

No dependency conflicts

Sort of true, but misleading. There are many solutions to this - most languages have some form of project-local dependency management, for example - and in many cases it isn't even a realistic problem.

If you do need to isolate something and such tooling either doesn't suffice or doesn't integrate with your management flow well enough, you should rather look at something like Nix/NixOS, which solves the dependency isolation problem in a much more robust and efficient way, and also solves the problem of state. It does incur management overhead, like Docker would.

Magic scalability

First of all: you probably don't need any of this. 99.99% of projects will never have to scale beyond a single system, and all you'll be doing is adding management overhead and moving parts that can break, to solve a problem you never had to begin with.

If you do need to scale beyond a single system, even if that needs to be done rapidly, you probably still don't get a big benefit from automated orchestration. You set up each server once, and assuming you run the same OS/distro on each system, the updating process will be basically the same for every system. It'll likely take you more time to set up and manage automated orchestration, than it would to just do it manually when needed.

The only usecase where automated orchestration really shines, is in cases where you have high variance in the amount of infrastructure you need - one day you need a single server, the next day you need ten, and yet another day later it's back down to five. There are extremely few applications that fall into this category, but even if your application does - there have been automated orchestration systems for a long time (Puppet, Chef, Ansible, ...) that don't introduce the kind of limitations or overhead that Docker does.

No need to rely on a sysadmin

False. Docker is not your system administrator, and you still need to understand what the moving parts are, and how they interact together. Docker is just a container system, and putting an application in a container doesn't somehow magically absolve you from having to have somebody manage your systems.

Server administration

Blocking LLM scrapers on Alibaba Cloud from your nginx configuration

There are currently LLM scrapers running off many Alibaba Cloud IPs, that ignore robots.txt and pretend to be desktop browsers. They also generate absurd request rates, to the point of being basically a DDoS attack. One way to deal with them is to simply block all of Alibaba Cloud.

This will also block legitimate users of Alibaba Cloud!

Here's how you can block them:

  1. Generate a deny entry list at https://www.enjen.net/asn-blocklist/index.php?asn=45102&type=nginx
  2. Add the entries to your nginx configuration. It goes directly in the server { ... } block.
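
For reference, the result of step 2 would look something like the sketch below. The addresses here are just placeholders from the documentation ranges; the real entries come from the generated list:

server {
    # ... your existing configuration ...

    # generated deny entries for Alibaba Cloud (AS45102)
    deny 203.0.113.0/24;
    deny 198.51.100.0/24;
}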

On NixOS

If you're using Nix or NixOS, you can keep the deny list in a separate file, which makes it easier to maintain and won't clutter up your nginx configuration as much. It would look something like this:

services.nginx.virtualHosts.<name>.extraConfig = ''
  ${import ./alibaba-blocklist.nix}
  # other config goes here
'';

... where you replace <name> with your virtual host's hostname.
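
The alibaba-blocklist.nix file itself would then just contain a Nix string with the generated deny entries - something like this, again with placeholder addresses:

''
  deny 203.0.113.0/24;
  deny 198.51.100.0/24;
''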

Server administration

Dealing with a degraded btrfs array due to disk failure

Forcing a btrfs filesystem to be mounted even though some drives are missing (in a default multi-disk setup, ie. RAID0 for data but RAID1 for metadata):

mount -o degraded,ro /path/to/mount

This assumes that the mounting configuration is defined in your fstab, and will mount it as read-only in a degraded state. You will be able to browse the filesystem, but any file contents may have unexplained gaps and/or be corrupted. Mostly useful to figure out what data used to be on a degraded filesystem.
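
To find out which device is actually missing, you can ask btrfs itself; it will list the filesystem's member devices and explicitly report any missing ones:

btrfs filesystem show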

Never mount a degraded filesystem as read-write unless you have a very specific reason to need it, and you understand the risks. If applications are allowed to write to it, they can very easily make the data corruption worse, and reduce your chances of data recovery to zero!

Privacy

Privacy

Don't use VPN services.

This article was originally published at https://gist.github.com/joepie91/5a9909939e6ce7d09e29.

No, seriously, don't. You're probably reading this because you've asked what VPN service to use, and this is the answer.

Note: The content in this post does not apply to using VPNs for their intended purpose; that is, as a virtual private (internal) network. It only applies to using them as a glorified proxy, which is what every third-party "VPN provider" does.

Why not?

Because a VPN in this sense is just a glorified proxy. The VPN provider can see all your traffic, and do with it what they want - including logging.

But my provider doesn't log!

There is no way for you to verify that, and of course this is what a malicious VPN provider would claim as well. In short: the only safe assumption is that every VPN provider logs.

And remember that it is in a VPN provider's best interest to log their users - it lets them deflect blame to the customer, if they ever were to get into legal trouble. The $10/month that you're paying for your VPN service doesn't even pay for the lawyer's coffee, so expect them to hand you over.

But a provider would lose business if they did that!

I'll believe that when HideMyAss goes out of business. They gave up their users years ago, and this was widely publicized. The reality is that most of their customers will either not care or not even be aware of it.

But I pay anonymously, using Bitcoin/PaysafeCard/Cash/drugs!

Doesn't matter. You're still connecting to their service from your own IP, and they can log that.

But I want more security!

VPNs don't provide security. They are just a glorified proxy.

But I want more privacy!

VPNs don't provide privacy, with a few exceptions (detailed below). They are just a proxy. If somebody wants to tap your connection, they can still do so - they just have to do so at a different point (ie. when your traffic leaves the VPN server).

But I want more encryption!

Use SSL/TLS and HTTPS (for centralized services), or end-to-end encryption (for social or P2P applications). VPNs can't magically encrypt your traffic - it's simply not technically possible. If the endpoint expects plaintext, there is nothing you can do about that.

When using a VPN, the only encrypted part of the connection is from you to the VPN provider. From the VPN provider onwards, it is the same as it would have been without a VPN. And remember, the VPN provider can see and mess with all your traffic.

But I want to confuse trackers by sharing an IP address!

Your IP address is a largely irrelevant metric in modern tracking systems. Marketers have gotten wise to these kinds of tactics, and combined with the increased adoption of CGNAT and an ever-increasing number of devices per household, it just isn't a reliable data point anymore.

Marketers will almost always use some kind of other metric to identify and distinguish you. That can be anything from a useragent to a fingerprinting profile. A VPN cannot prevent this.

So when should I use a VPN?

There are roughly two usecases where you might want to use a VPN:

  1. You are on a known-hostile network (eg. a public airport WiFi access point, or an ISP that is known to use MITM), and you want to work around that.
  2. You want to hide your IP from a very specific set of non-government-sanctioned adversaries - for example, circumventing a ban in a chatroom or preventing anti-piracy scareletters.

In the second case, you'd probably just want a regular proxy specifically for that traffic - sending all of your traffic over a VPN provider (as is the default with almost every VPN client) will still result in the provider being able to snoop on and mess with your traffic.

However, in practice, just don't use a VPN provider at all, even for these cases.

So, then... what?

If you absolutely need a VPN, and you understand what its limitations are, purchase a VPS and set up your own (either using something like Streisand or manually - I recommend using WireGuard). I will not recommend any specific providers (diversity is good!), but there are plenty of cheap ones to be found on LowEndTalk.
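
As a rough sketch of what that looks like with WireGuard: the server side is a single configuration file (typically /etc/wireguard/wg0.conf), where the keys are placeholders that you generate yourself with wg genkey and wg pubkey, and the addresses are whatever internal VPN range you pick. The client configuration mirrors this, with the server as its peer and an Endpoint pointing at the server's public address.

[Interface]
# VPN-internal address of the server, and the port it listens on
Address = 10.0.0.1/24
ListenPort = 51820
PrivateKey = <server private key>

[Peer]
# one of these sections per client that is allowed to connect
PublicKey = <client public key>
AllowedIPs = 10.0.0.2/32

You'd then bring the interface up with wg-quick up wg0.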

But how is that any better than a VPN service?

A VPN provider specifically seeks out those who are looking for privacy, and who may thus have interesting traffic. Statistically speaking, it is more likely that a VPN provider will be malicious or a honeypot, than that an arbitrary generic VPS provider will be.

So why do VPN services exist? Surely they must serve some purpose?

Because it's easy money. You just set up OpenVPN on a few servers, and essentially start reselling bandwidth with a markup. You can make every promise in the world, because nobody can verify them. You don't even have to know what you're doing, because again, nobody can verify what you say. It is 100% snake-oil.

So yes, VPN services do serve a purpose - it's just one that benefits the provider, not you.


This post is licensed under the WTFPL or CC0, at your choice. You may distribute, use, modify, translate, and license it in any way.


Privacy

Normies just don't care about privacy

If you're a privacy enthusiast, you probably clicked a link to this post thinking it's going to vindicate you; that it's going to prove how you've been right all along, and "normies just don't care about privacy", despite your best efforts to make them care. That it's going to show how you're smarter, because you understand the threats to privacy and how to fight them.

Unfortunately, you're not right. You never were. Let's talk about why, and what you should do next.

So, first of all, let's dispense with the "normie" term. It's a pejorative term, a name to call someone when they don't have your exact set of skills and interests, a term to use when you want to imply that someone is clueless or otherwise below you. There's no good reason to use it, and it suggests that you're looking down on them. Just call them "people", like everybody else and like yourself - you don't need to turn them into a group of "others" to begin with.

Why does that matter? Well, would you take advice from someone who looks down on you? You probably wouldn't. Talking about "normies" pretty much sets the tone for a conversation; it means that you don't care about someone else's interests or circumstances, that you won't treat them like a full human being of equal value to yourself. In other words, you're being an arrogant asshole. And no one likes arrogant assholes.

And this is also exactly why you think that they "just don't care about privacy". They might have even explicitly told you that they don't! So then it's clear, right? If they say they don't care about privacy, that must mean that they don't care about privacy, otherwise they wouldn't say that!

Unfortunately, that's not how it works. Most likely, the reason they told you that they "don't care" is to make you go away. Most likely, you've been quite pushy, telling them what they should be doing or using instead, and responding to every counterpoint with an even stronger recommendation, maybe even trying to make them feel guilty about "not caring enough" just because they're not as enthusiastic about it as you are.

And how do you make an enthusiast like that go away? You cut off the conversation. You tell them that you don't care. You leave zero space for the enthusiast to wiggle their way back into the conversation, for them to try and continue arguing something that you've grown tired of. If you don't care, then there's nothing to argue about, and so that is what they tell you.

In reality, almost everybody does care about privacy. To different degrees, in different situations, and in different ways - but almost everybody cares. People lock the bathroom door; they use changing stalls; they don't like strangers shouldersurfing their phone screen; they hide letters and other things. Clearly people do care. They probably also know that Facebook and the like are pretty shitty, considering that media outlets have been reporting on it for a decade now. You don't need to tell them that.

So what should you do? It's easy for me to say "don't be pushy", but then how do you help people keep their communications private? How do you help advance the state of private communications in general?

The answer is to understand, not argue. Don't try to convince people, at least not directly. Don't tell them what to do, or what to use. Don't try to make them feel bad about using closed or privacy-unfriendly systems. Instead, ask questions. Try to understand their circumstances - who do they talk to, why do they need to use specific services? Does their employer require it? Are their friends refusing to move over to something without a specific feature?

Recognize and accept that caring about privacy does not mean it needs to be your primary purpose in life. Someone can simultaneously care about privacy, but also refuse to stop using Facebook because they care more about talking to a long-lost friend who is not reachable anywhere else. They can care about privacy, but care more about keeping their job which requires using Slack. They're not enthusiasts, and they shouldn't need to be to have privacy in their life - that's the whole point of the privacy movement, isn't it?

Finally, once you have asked enough questions - without being judgmental or considering answers 'wrong' in any way - you can build an understanding of someone's motivations and concerns and interests. You now have enough information to understand whether you can help them make their life more private without giving up on the things they care about.

Maybe they really want reactions in their messenger when talking to their friends, and just weren't aware that Matrix can do that, and that's what kept them on Discord. Maybe they've looked at Mastodon, but it looked like a ghost town to them, just because they didn't know about a good instance to join. But these are all things that you can't know until you've learned about someone's individual concerns and priorities. Things that you would never learn about to begin with, if they cut you off with "I don't care" because you're being pushy.

And maybe, the answer is that you can't do anything for them. Maybe, they just don't have any other options, and there are issues with all your alternative suggestions that would make them unworkable in their situation. Sometimes, the answer is just that something isn't good enough yet; and that you need to accept that, and put in the work to improve the tool instead of trying to convince people to use it as-is.

Don't be the insufferable privacy nut. Be the helpful, supportive and understanding friend who happens to know things about privacy.

Security

The computer kind, mostly.

Security

Why you probably shouldn't use a wildcard certificate

This article was originally published at https://gist.github.com/joepie91/7e5cad8c0726fd6a5e90360a754fc568.

Recently, Let's Encrypt launched free wildcard certificates. While this is good news in and of itself, as it removes one of the last remaining reasons for expensive commercial certificates, I've unfortunately seen a lot of people dangerously misunderstand what wildcard certificates are for.

Therefore, in this brief post I'll explain why you probably shouldn't use a wildcard certificate, as it will put your security at risk.

A brief explainer

It's generally pretty poorly understood (and documented!) how TLS ("SSL") works, so let's go through a brief explanation of the parts that are important here.

The general (simplified) idea behind how real-world TLS deployments work, is that you:

  1. Generate a cryptographic keypair (private + public key)
  2. Generate a 'certificate' from that (containing the public key + some metadata, such as your hostname and when the certificate will expire)
  3. Send the certificate to a Certificate Authority (like Let's Encrypt), who will then validate the metadata - this is where it's ensured that you actually own the hostname you've created a certificate for, as the CA will check this.
  4. Receive a signed certificate - the original certificate, plus a cryptographic signature proving that a given CA validated it
  5. Serve up this signed certificate to your users' clients
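
In practice, steps 1 and 2 usually happen in one go, and what gets sent to the CA in step 3 is a 'certificate signing request' - essentially the certificate data from step 2, minus the signature. With OpenSSL, for example, that looks roughly like this (an ACME client like Certbot automates all of this for Let's Encrypt; the hostname and filenames are just placeholders):

openssl req -new -newkey rsa:2048 -nodes \
	-keyout example.com.key -out example.com.csr \
	-subj "/CN=example.com"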

The client will then do the following:

  1. Verify that the certificate was signed by a Certificate Authority that it trusts; the keys of all trusted CAs already exist on your system.
  2. If it's valid, treat the public key included with the certificate as the legitimate server's public key, and use that key to encrypt the communication with the server

This description is somewhat simplified, and I don't want to go into too much detail as to why this is secure from many attacks, but the general idea is this: nobody can snoop on your traffic or impersonate your server, so long as 1) no Certificate Authorities have their own keys compromised, and 2) your keypair + signed certificate have not been leaked.

So, what's a wildcard certificate really?

A typical TLS certificate will have an explicit hostname in its metadata; for example, Google might have a certificate for mail.google.com. That certificate is only valid on https://mail.google.com/ - not on https://google.com/, not on https://images.google.com/, and not on https://my.mail.google.com/ either. In other words, the hostname has to be an exact match. If you tried to use that certificate on https://my.mail.google.com/, you'd get a certificate error from your browser.

A wildcard certificate is different; as the name suggests, it uses a wildcard match rather than an exact match. You might have a certificate for *.google.com, and it would be valid on https://mail.google.com/ and https://images.google.com/ - but still not on https://google.com/ or https://my.mail.google.com/. In other words, the asterisk can match any one single 'segment' of a hostname, but nothing with a full stop in it.
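
If you're ever unsure which hostnames a certificate actually covers, you can simply inspect it; for a PEM-encoded certificate file (here called certificate.pem), something like this will print its details, including the 'Subject Alternative Name' list that holds the exact and/or wildcard hostnames:

openssl x509 -in certificate.pem -noout -text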

There are some situations where this is very useful. Say that I run a website builder from a single server, and every user gets their own subdomain - for example, my website might be at https://joepie91.somesitebuilder.com/, whereas your website might be at https://catdogcat.somesitebuilder.com/.

It would be very impractical to have to request a new certificate for every single user that signs up; so, the easier option is to just request one for *.somesitebuilder.com, and now that single certificate works for all users' subdomains.

So far, so good.

So, why can't I do this for everything with subdomains?

And this is where we run into trouble. Note how in the above example, all of the sites are hosted on a single server. If you run a larger website or organization with lots of subdomains that host different things - say, for example, Google with their images.google.com and mail.google.com - then these subdomains will probably be hosted on multiple servers.

And that's where the security of wildcard certificates breaks down.

Remember how one of the two requirements for TLS security was that "your keypair + signed certificate have not been leaked"? Sometimes certificates do leak - servers sometimes get hacked, for example.

When this happens, you'd want to limit the damage of the compromise - ideally, your certificate will expire pretty rapidly, and it doesn't affect anything other than the server that was compromised anyway. After fixing the issue, you then revoke the old compromised certificate, replace it with a new, non-compromised one, and all your other servers are unaffected.

In our single-server website builder example, this is not a problem. We have a single server, it got compromised, the stolen certificate only works for that one single server; we've limited the damage as much as possible.

But, consider the "multiple servers" scenario - maybe just the images.google.com server got hacked, and mail.google.com was unaffected. However, the certificate on images.google.com was a wildcard certificate for *.google.com, and now the thief can use it to impersonate the mail.google.com server and intercept people's e-mail traffic, even though the mail.google.com server was never hacked!

Even though originally only one server was compromised, we didn't correctly limit the damage, and now the e-mail server is at risk too. If we'd had two certificates, instead - one for mail.google.com and one for images.google.com, each of the servers only having access to their own certificate - then this would not have happened.

The moral of the story

Each certificate should only be used for one server, or one homogeneous cluster of servers. Different services on different servers should have their own, usually non-wildcard certificates.

If you have a lot of hostnames pointing at the same service on the same server(s), then it's fine to use a wildcard certificate - so long as that wildcard certificate doesn't also cover hostnames pointing at other servers; otherwise, each service should have its own certificates.

If you have a few hostnames pointing at unique servers and everything else at one single service - eg. login.mysite.com and then a bunch of user-created sites - then you may want to put the wildcard-covered hostnames under their own prefix. For example, you might have one certificate for login.mysite.com, and one (wildcard) certificate for *.users.mysite.com.

In practice, you will almost never need wildcard certificates. It's nice that the option exists, but unless you're automatically generating subdomains for users, a wildcard certificate is probably an unnecessary and insecure option.

(To be clear: this is in no way specific to Let's Encrypt, it applies to wildcard certificates in general. But now that they're suddenly not expensive anymore, I think this problem requires a bit more attention.)

The Fediverse and Mastodon

The Fediverse and Mastodon

The 5-minute guide to the fediverse and Mastodon

This article was originally published at https://gist.github.com/joepie91/f924e846c24ec7ed82d6d554a7e7c9a8.

There are lots of guides explaining Mastodon and the broader fediverse, but they often go into way too much detail. So I've written this guide - it only talks about the basics you need to know to start using it, and you can then gradually learn the rest from other helpful fediverse users. Let's get started!

The fediverse is not Twitter!

The fediverse is very different from Twitter, and that is by design. It's made for building close communities, not for building a "global town square" or as a megaphone for celebrities. That means many things will work differently from what you're used to. Give it some time, and ask around on the fediverse if you're not sure why something works how it does! People are usually happy to explain, as long as it's a genuine question. Some of the details are explained in this article, but it's not required reading.

The most important takeaway is the "community" part. Clout-chasing and dunking are strongly frowned upon in the fediverse. People expect you to talk to others like they're real human beings, and they will do the same for you.

The fediverse is also not just Mastodon

"The fediverse" is a name for the thousands of servers that connect together to form a big "federated" social network. Every server is its own community with its own people and "vibe", but you can talk to people in other communities as well. Different servers also run different software with different features, and Mastodon is the most well-known option - but you can also talk to servers using different fediverse software, like Misskey.

It doesn't matter what server you pick... mostly

Like I said, different servers have different communities. But don't get stuck on picking one - you can always move to a different server later, and your follows will move with you. Just pick the first server from https://joinmastodon.org/servers that looks good to you. In the long run, you'll probably want to use a smaller server with a closer community, but again it's okay if you start out on a big server first. Other people on your server can help you find a better option later on!

Also keep in mind that the fediverse is run by volunteers; if you run into issues with your server, then you can usually just talk to the admin to get them resolved. It's not like a faceless corporation where you get bounced from department to department!

It's a good idea to avoid mastodon.social and mastodon.online - they have long-standing moderation issues, and are frequently overloaded.

Content warnings and alt texts are important

There are two important parts of the culture on the fediverse that you might not be used to: content warnings, and image alt texts. You should always give images a useful descriptive alt text (though it doesn't have to be detailed!), so that the many blind and vision-impaired users in the fediverse can also understand them. Alt texts can also help people understand jokes that they otherwise wouldn't get. Many people will never "boost" (basically retweet) images that don't have an alt text.

Content warnings are a bit subtler, but also very important. There is a strong culture of using content warnings on the fediverse, and so when in doubt, you should err on the side of using them. Because they are so widespread, people are used to them - you don't need to worry that people won't read things behind a CW. CW rules vary across communities, but you should at least put a CW on posts about violence, politics, sexuality, heavy topics, meta stuff about Twitter or the fediverse, and anything that's currently a "hot topic" that everybody seems to be talking about.

This helps people keep control over what they see, and stops people from getting overwhelmed, like you've probably seen (or felt) happen a lot on Twitter. Replies automatically get the same CW, so it's pretty easy to use.

Take your time

The fediverse isn't built around algorithmic feeds like Twitter is, so by default you won't really find much happening - what you see is entirely determined by who you follow, and it'll take some time to find people you like. This is normal! Things will get much more lively once you're following and interacting with a few people. Likewise, there's no "one big network" - you'll have a different 'view of the network' from every server, because communities tend to be tight-knit. This also means that it's difficult for unpleasant people to find you.

It's a good idea to make an introduction post, tagged with the #introduction hashtag, and hashtags for any of the other topics you're interested in. Posts on the fediverse can only be found by their hashtag, so they're important to use if you want people to find you. Likewise, you can search for hashtags to find interesting people.

That's pretty much it! You'll find many more useful tips on the fediverse itself, under the #FediTips hashtag. Take your time, explore, get used to how everything works, learn about the local culture, and ask for help in a post if you can't figure something out! There are many people who will be happy to help you out.

Cryptocurrency

Cryptocurrency

No, your cryptocurrency cannot work

This article was originally published at https://gist.github.com/joepie91/daa93b9686f554ac7097158383b97838.

Whenever the topic of Bitcoin's energy usage comes up, there's always a flood of hastily-constructed comments by people claiming that their favourite cryptocurrency isn't like Bitcoin, that their favourite cryptocurrency is energy-efficient and scalable and whatnot.

They're wrong, and are quite possibly trying to scam you. Let's look at why.

What is a cryptocurrency anyway?

There are plenty of intricate and complex articles trying to convince you that cryptocurrencies are the future. They usually heavily use jargon and vague terms, make vague promises, and generally give you a sense that there must be something there, but you always come away from them more confused than you were before.

That's not because you're not smart enough; that's because such articles are intentionally written to be confusing and complex, to create the impression of cryptocurrency being some revolutionary technology that you must invest in, while trying to obscure that it's all just smoke and mirrors and there's really not much to it.

So we're not going to do any of that. Let's look at what cryptocurrency really is, the fundamental concept, in simple terms.

A cryptocurrency, put simply, is a currency that is not controlled by an appointed organization like a central bank. Instead, it's a system that's built out of technical rules, code that can independently decide whether someone holds a certain amount of currency and whether a given transaction is valid. The rules are defined upfront and difficult for anybody to change afterwards, because some amount of 'consensus' (agreement) between the systems of different users is needed for that. You can think of it kind of like an automated voting process.

Basically, a cryptocurrency is a currency that is built as software, and that software runs on many people's computers. On paper, this means that "nobody controls it", because everybody has to play by the predefined rules of the system. In practice, it's unfortunately not that simple, and cryptocurrencies end up being heavily centralized, as we'll get to later.

So why does Bitcoin need so much energy?

The idea of a currency that can be entirely controlled by independent software sounds really cool, but there are some problems. For example, how do you prevent one person from convincing the software that they are actually a million different people, and misusing that to influence that consensus process? If you have a majority vote system, then you want to make really sure that everybody can only cast one vote, otherwise it would be really easy to tamper with the outcome.

Cryptocurrencies try to solve this using a 'proof scheme', and Bitcoin specifically uses what's called "proof of work". The idea is that there is a finite amount of computing power in the world, computing power is expensive, and so you can prevent someone from tampering with the 'vote' by requiring them to do some difficult computations. After all, computations can be automatically and independently checked, and so nobody can pretend to have more computing power than they really do. So that's the problem solved, right?

The underlying trick here is to make a 'vote' require the usage of something scarce, something relatively expensive, something that you can't just infinitely wish into existence, like you could do with digital identities. It makes it costly in the real world to participate in the network. That's the core concept behind a proof scheme, and it is crucial for the functioning of a cryptocurrency - without a proof scheme requiring a scarce resource of some sort, the network cannot protect itself and would be easy to tamper with, making it useless as a currency.

To incentivize people to actually do this kind of computation - keep in mind, it's expensive! - cryptocurrencies are set up to reward those who do it, by essentially giving them first dibs on any newly minted currency. This is all fully automated based on that predefined set of rules, there are no manual decisions from some organization involved here.

Unfortunately, we're talking about currencies, and where there are currencies, there is money to be made. And many greedy people have jumped at the chance of doing so with Bitcoin. That's why there are entire datacenters filled with "Bitcoin miners" - computers that are built for just a single purpose, doing those computations, to get a claim on that newly minted currency.

And that is why Bitcoin uses so much energy. As long as the newly minted coins are worth slightly more than the cost of the computations, it's economically viable for these large mining organizations to keep building more and more 'miners' and consuming more and more energy to stake their claim. This is also why energy usage will always go up alongside the exchange rate; the more a Bitcoin is 'worth', the more energy miners are willing to put into obtaining one.

And that's a fundamental problem, one that simply cannot be solved, because it is so crucial to how Bitcoin works. Bitcoin will forever continue consuming more energy as the exchange rate rises, which is currently happening due to speculative bubbles, but which would happen if it gained serious real-world adoption as well. If everybody started using Bitcoin, it would essentially eat the world. There's no way around this.

Even renewable energy can't solve this; renewable energy still requires polluting manufacturing processes, it is often difficult to scale, and it is often more expensive than fossil fuels. So in practice, "mining Bitcoins on renewable energy" - insofar that happens at all - means that all the renewable energy you are now using could not be distributed to factories or households, and they have to continue running on non-renewable energy instead, so you're just shuffling chairs! And because of the endless growth of Bitcoin's energy consumption, it is pretty much guaranteed that those renewable energy resources won't even be enough in the end.

So there's this proof-of-stake thing, right?

You'll often see 'proof of stake' mentioned as an alternative proof scheme in response to this. So what is that, anyway?

The exact implementations vary and can get very complex, but every proof-of-stake scheme is basically some variation of "instead of the scarce resource being energy, it's the currency itself". In other words: the more of the currency that you own, the more votes you have, the more control you have over how the network (and therefore the currency) works as a whole.

You can probably begin to see the problem here already: if the currency is controlled by those who have most of it, how is this any different from government-issued currency, if it's the wealthy controlling the financial system either way? And you'd be completely right. There isn't really a difference.

But what you might not realize, is that this applies for proof-of-work cryptocurrencies too. The frequent claim is that Bitcoin is decentralized and controlled by nobody, but that isn't really true. Because who can afford to invest the most in specialized mining hardware? Exactly, the wealthy. And in practice, almost the entire network is controlled by a small handful of large mining companies and 'mining pools'. Not very decentralized at all.

The same is true for basically every other proof scheme, such as Chia's "proof of space and time", where the scarce resource is just "free storage space". Wealthy people can afford to buy more empty harddrives and SSDs and gain an edge. Look at any cryptocurrency with any proof scheme and you will find the same problem, because it is a fundamental one - if power in your system is handed out based on ownership of a scarce resource of some sort, the wealthy will always have an edge, because they can afford to buy whatever it is.

In other words: it doesn't actually matter what the specific scarce resource is, and it doesn't matter what the proof scheme is! Power will always centralize in the hands of the wealthy, either those who already were wealthy, or those who have recently gotten wealthy with cryptocurrency through dubious means.

The only redeeming feature of proof-of-stake (and many other proof schemes) over proof-of-work is that it does indeed address the energy consumption problem - but that's little comfort when none of these options actually work in a practical sense anyway. This is ultimately a socioeconomic problem, not a technical one, and so you can't solve it with technology.

And that brings us to the next point...

Yes, cryptocurrencies are effectively pyramid schemes

While Bitcoin was not originally designed to be a pyramid scheme, it is very much one now. Nearly every other cryptocurrency was designed to be one from the start.

The trick lies in encouraging people to buy a cryptocurrency. Whoever is telling you that their favourite cryptocurrency is the real deal, the solution to all problems, probably is holding quite a bit of that currency, and is waiting for it to appreciate in value so that they can 'cash out' and turn a profit. The way to make that value appreciation happen, is by trying to convince people like you to 'invest' or 'get in' on it. If you buy the cryptocurrency, that will drive up the price. If a lot of people buy the cryptocurrency, that will drive up the price a lot.

The more hype you can create for a cryptocurrency, the more profit potential there is in it, because more people will 'buy in' and drive up the price before you cash out. This is why there are flashy websites for cryptocurrencies promising the world and revolutionary technology, this is why people on Twitter follow you around incessantly spamming your replies with their favourite cryptocurrency, this is why people take out billboards to advertise the currency. It's a pump-and-dump stock.

This is also the reason why proponents of cryptocurrencies are always so mysterious about how it works, invoking jargon and telling you how much complicated work 'the team' has done on it. The goal is to make you believe that 'there must be something to it' for long enough that you will buy in and they can sell off. By the time you figure out it was all just smoke and mirrors, they're long gone with their profits.

And then the only choice to recoup your investment is for you to hype it up and try to replicate the rise in value. Like a pyramid scheme.

The bottom line

Cryptocurrency as we know it today, simply cannot work. It promises to decentralize power, but proof schemes necessarily give an edge to the wealthy. Meanwhile there's every incentive for people to hype up worthless cryptocurrencies to make a quick buck, all the while disrupting supply chains (GPUs, CPUs, hard drives, ...), and boiling the earth through energy usage that far exceeds that of all of Google.

Maybe some day, a legitimate cryptocurrency without Bitcoin's flaws will come to exist. If it does, it will be some boring research paper out of an academic lab in three decades, not a flashy startup promising easy money or revolutionary new tech today. There are no useful cryptocurrencies today, and there will not be any at any time in the near future. The tech just doesn't work.

Cryptocurrency

Is my blockchain a blockchain?

This article was originally published at https://gist.github.com/joepie91/e49d2bdc9dfec4adc9da8a8434fd029b

Your blockchain must have all of the following properties:

  • It's a merkle tree, or a construct with equivalent properties.
  • There is no single point of trust or authority; nodes are operated by different parties.
  • Multiple 'forks' of the blockchain may exist - that is, nodes may disagree on what the full sequence of blocks looks like.
  • In the case of such a fork, there must exist a deterministic consensus algorithm of some sort to decide what the "real" blockchain looks like (ie. which fork is "correct").
  • The consensus algorithm must be executable with only the information contained in the blockchain (or its forks), and no external input (eg. no decisionmaking from a centralized 'trust node').

If your blockchain is missing any of the above properties, it is not a blockchain, it is just a ledger.

Cryptocurrency

You don't need a blockchain.

This article was originally published at https://gist.github.com/joepie91/a90e21e3d06e1ad924a1bfdfe3c16902.

If you're reading this, you probably suggested to somebody that a particular technical problem could be solved with a blockchain.

Blockchains aren't a desirable thing; they're defined by having trustless consensus, which necessarily has to involve some form of costly signaling to work - that's what prevents attacks like Sybil attacks.

In other words: blockchains must be expensive to operate, to work effectively. This makes it a last-resort solution, when you truly have no other options available for solving your problem; in almost every case you want a cheaper and less complex solution than a blockchain.

In particular, if your usecase is commercial, then you do not need or want trustless consensus. This especially includes usecases like supply chain tracking, ticketing, and so on. The whole point of a company is to centralize control; that's what allows a company to operate efficiently. Trustless consensus is the exact opposite of that.

Of course, you may still have a problem of trust, so let's look at some common solutions to common trust problems; solutions that are a better option than a blockchain. Think of things like cryptographically signed (and possibly replicated) append-only logs, merkle trees, trusted timestamping, or - where acceptable - simply a trusted third party.

Some people may try to sell you one of the above things as a "blockchain". It's not, and they're lying to you. A blockchain is defined by its trustless consensus; all of the above schemes have existed for way longer than blockchains have, and solve much simpler problems. The above systems also don't provide full decentralization - and that is a feature, because decentralization is expensive.

If somebody talks to you about a "permissioned blockchain" or a "private blockchain", they are also feeding you bullshit. Those things do not actually exist, and they are just buzzwords to make older concepts sound like a blockchain, when they're really not. It's most likely just a replicated append-only log.

There's quite a few derivatives of blockchains, like "tangles" and whatnot. They are all functionally the same as a blockchain, and they suffer from the same tradeoffs. If you do not need a blockchain, then you also do not need any of the blockchain derivatives.

In conclusion: blockchains were an interesting solution to an extremely specific problem, and certainly valuable from a research standpoint. But you probably don't have that extremely specific problem, so you don't need and shouldn't want a blockchain. It'll just cost you crazy amounts of money, and you'll end up with something that either doesn't work, or something that has conceptually existed for 20 years and that you could've just grabbed off GitHub yourself.


Additions

I'm going to add some common claims here over time, and address them.

"But it's useful as a platform to build upon!"

One of the most important properties of a platform is that it must be cost-efficient, or at least as cost-efficient as the requirements allow. When you build on an unnecessarily expensive foundation, you can never build anything competitive - whether commercial or otherwise.

Like all decentralized systems, blockchains fail this test for usecases that do not benefit from being decentralized, because decentralized systems are inherently more expensive than centralized systems; the lack of a trusted party means that work needs to be duplicated for both availability and verification purposes. It is a flat-out impossibility to do less work in an optimal decentralized system than in an equivalent optimal centralized system.

Unlike most decentralized systems, blockchains add an extra cost factor: costly signaling, as described above. For a blockchain to be resiliently decentralized, it must introduce some sort of significant participation cost. For proof-of-work, that cost is in the energy and hardware required, but any tangible participation cost will work. Forms of proof-of-stake are not resiliently decentralized; the cost factor can be bypassed by malicious adversaries in a number of ways, meaning that PoS-based systems aren't reliably decentralized.

In other words: due to blockchains being inherently expensive to operate, they only make sense as a platform for things that actually need trustless consensus - and that list pretty much ends at 'digital currency'. For everything else, it is an unnecessary expense and therefore a poor platform choice.

Dependency management

Dependency management

Transitive dependencies and the commons

In this article, I want to explain why I now only work with programming languages that allow conflicting transitive dependencies, and why this matters for the purpose of building a commons for software.

Types of dependency structures

There are a lot of considerations in designing dependency systems, but there are two axes I've found to be particularly relevant to the topic of a software commons: nested vs. flat dependencies, and system-global vs. project-local dependencies.

Nested vs. flat dependencies

There are roughly two ways to handle transitive dependencies - that is, dependencies of your dependencies:

  1. Either you make the whole dependency set a 'flat' one, where every dependency is a top-level one, or
  2. You represent the dependency structure as a tree, where each dependency in your dependency set has its own, isolated dependency set internally.

I'm talking about the conceptual structure here, so it doesn't actually matter how these dependencies are stored on disk, but I'll illustrate the two forms below.

This is what a nested dependency tree might look like in some hypothetical project (the package names are made up for illustration):
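
my-project
├── library-a 1.0.0
│   └── utility-x 1.2.0
└── library-b 2.0.0
    └── utility-x 2.0.0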

And this would be the equivalent tree in a flat dependency structure:
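
my-project
├── library-a 1.0.0
├── library-b 2.0.0
└── utility-x 1.2.0 or 2.0.0 (only one version can exist here, so these requirements conflict)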

This also immediately shows the primary limitation of a flat dependency set: you cannot have conflicting versions of a transitive dependency in your project! This is what most of the process of "dependency solving" is about - of all the theoretically possible versions of every dependency in your project, find the set that actually works together, ie. where every dependency matches the version constraints for all of its occurrences in the project.

Project-local vs. system-global dependencies

Another important, and related, design decision in a dependency system is whether dependencies are isolated to the project, or whether you have a single dependency set that's used system-wide. This is somewhat self-explanatory; if your dependencies are project-local then that means that they are stored within the project, but if they are system-global then there's a system-wide dependency set of some sort, and so the project gets its dependencies from the environment.

Some examples

Here are some examples of different combinations of these properties, and where you might find them: npm (Node.js) uses nested, project-local dependencies; Python's pip inside a virtualenv gives you flat, project-local dependencies; and traditional Linux distribution package managers like apt are flat and system-global.

So why does any of this actually matter?

It might seem like all of this is just an implementation detail, and it's the problem of the package manager developers to deal with. But the choice of dependency model actually has a big impact on how people use the dependency system.

The cost of conflict

The problem at the center of all of this, is that dependency conflicts are not free. Every time you run into a dependency conflict, you have to stop what you are doing and resolve a conflict. Resolving it may require anything from a small change to a complete architectural overhaul of your code, like in the case where a new version of a critical dependency introduced a different design paradigm.

Now you might think "huh, but I rarely run into that", and that is likely correct - but it's not because the problem doesn't happen. What tends to happen in mature language ecosystems, is that the whole ecosystem centers around a handful of large frameworks over time, where the maintainers do all this work of resolving conflicts preventatively; they coordinate with maintainers of other dependencies, for example, to make sure that these conflicts do not occur.

This carries a large maintenance cost for maintainers, and indirectly also a cost for you - it means that that time is not spent on, for example, nice new features or usability improvements in the tools that you use. The cost is still there; it's just very difficult to see if you are not a maintainer of a large framework.

Frameworks and libraries

This also touches on another consequence of working with conflict-prone dependency systems: they incentivize centralization of the ecosystem around a handful of major frameworks, which are usually quite opinionated about how they should be used. In a vacuum, small, single-responsibility libraries would be the optimal structure for an ecosystem, but that is simply not a sustainable model when your transitive dependencies can conflict; every dependency you add superlinearly increases the chance of running into a conflict.

These frameworks are usually acceptable if you work on common systems that solve common problems; many people before you have built similar things, and so the framework will likely have been designed to account for your usecase. But it's a deadly barrier for unconventional or innovative projects, which do not fit into that mold; they are severely disadvantaged because, in a framework-heavy ecosystem, every package comes with a large set of assumptions about what you'll be doing with it, and those are usually not going to be the right ones - leaving you to either not use packages at all, or to spend half your time working around them.

Consequences for the commons

A more abstract way in which this problem occurs, is in its impact on the commons. The idea of a 'software commons' is simple: a large, public, shared, freely accessible and usable collection of software that anyone can build upon according to their needs and contribute to according to their ability, resulting in a low-friction way to collaborate on software at a large scale. Some of the idealized consequences of such a commons: every problem would only need to be solved once, it would then be reliably solved for everyone forever, and we could all collectively move on to solving other problems.

This is a laudable goal, but it too is harmed by conflict-prone dependency systems. For this goal to be achievable, there must be some sort of distribution format for 'software' that is universally usable, assumption-free, and isolated from the rest of the environment, so that it is guaranteed to fit into any project that has the problem it is designed to solve. But a flat or even system-global dependency model cannot provide that - in such a model, it is possible for one piece of software to make it impossible to use another; that is, after all, exactly what a dependency conflict is.

In other words, to achieve a true software commons on a technical level (the social requirements are for another article), we need a nested, project-local dependency mechanism - or at least a mechanism that can approximate or simulate those properties in some way.

So why are dependency systems so conflict-prone?

So given all of that, the answer would seem obvious, right? Just build nested, project-local dependency systems! And that does indeed solve these issues, but it brings some problems of its own.

Duplication

One of the most obvious problems, but also one of the easiest to solve, is that of duplication. If there are two uses of the same dependency in different parts of the dependency tree, you ideally want those to use the same copy to save space and resources, and indeed this is exactly the typical justification for a flat dependency space. This also applies to compilation; you'd usually want to avoid compiling more than one copy of the same library.

But there is a better way, and it is implemented today by systems like npm: a nested dependency tree which opportunistically moves dependencies to the top level when they are conflict-free. This way, they are only stored in a nested structure on disk when it is necessary to preserve the guarantees of a conflict-free dependency system, ie. when otherwise a dependency conflict would occur. This could be considered a hybrid form between nested and flattened dependencies, and is pretty close to an optimal representation.
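
Continuing the hypothetical example from earlier, the on-disk layout that npm produces might look something like this (which of the two conflicting versions ends up hoisted can vary):

  node_modules/
  ├── some-http-client/
  ├── some-cookie-library/            <- 1.0.0, hoisted to the top level
  └── some-scraper/
      └── node_modules/
          └── some-cookie-library/    <- 2.0.0, kept nested because it conflicts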

The duplication problem also exists in another form: duplication between projects. Two pieces of independent end-user software might use the same version of the same dependency, and you would probably want to reuse a single copy for all the same reasons as above. This is typically used as a justification for system-global dependency systems.

But here again, there is a better option, and this time it is truly an optimal representation: a single shared system-global store of packages, identified by some sort of unique identifier, with the software pointing to specific copies within that store. This optimally deduplicates everything, but still allows conflicting implementations to exist. This exists today in Nix (where each store entry is hashed and referenced by path from the dependent) and pnpm (an alternative Node.js package manager where the store is keyed by version and symlinks and hardlinks are used to access it in an npm-compatible manner).
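
As a rough sketch of what this looks like in practice (paths simplified, and purely illustrative):

  /nix/store/<hash>-some-cookie-library-1.0.0/
      (Nix: every package lives at a content-addressed path in a shared store,
       and dependents reference that exact path)

  node_modules/some-cookie-library
      -> node_modules/.pnpm/some-cookie-library@1.0.0/node_modules/some-cookie-library
      (pnpm: top-level entries are symlinks into a shared, version-keyed store)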

Nominal types

Unfortunately, there is also a more difficult problem - it affects only a subset of languages, and explains why Node.js does have nested dependencies while a lot of other new systems do not. That problem is the nominal typing problem.

If you have a system with nominal typing, then that means that types are not identified by what they are shaped like (as in structural typing), but by what they are called, or more accurately by their identity. In a typical nominal typing system, if you define the same thing under the same name twice, but in different files, they are different types.

This poses an obvious problem for a nested dependency system: if you can have two different copies of a dependency in your tree, that means you can also have two different types that are supposed to be identical! This would cause a lot of issues - for example, say that a value in a dependency is generated by a transitive dependency, and consumed by a different dependency that uses a different version of that same transitive dependency... the value generated by one copy would be rejected by the other, for not being the same type.

This is what can happen, for example, in Rust - Cargo will nominally let you have conflicting versions, but as soon as you try to exchange values between those copies in your code, you'll encounter a type mismatch.

There are some theoretical language-level solutions to this problem, for example in the form of type adapters - a specification of how one copy of a type may be converted into another copy of that type. But this is a non-trivial thing to account for in a language design, and to date I have not seen any major language that has such a mechanism. This means that nominally typed languages are, generally, stuck with flat dependencies.

(If you're wondering how this problem is overcome without nominal typing: the answer is that you're mostly just relying on the internal structure of types not changing in a breaking way between versions, or at least not without also changing the attribute names or, in a structurally typed system, the internal types. That sounds unreliable, but in practice it is very rare to run into situations where this goes wrong, to the point that it's barely worth worrying about.)

But even if this problem were overcome, there's another one.

Backwards compatibility

Dependencies are, almost always, something that is deeply integrated into a language. Whether through the design of the import syntax, or the compiler's lookup rules, or anything else, there's usually something in the design of a language that severely constrains the possibilities for package management. Nested dependencies can work for Node.js because CommonJS accounted for the needs of a nested dependency system from the start, but it is virtually impossible to retrofit such a system into most existing systems.

For the same reason that a software commons is a possible concept, dependencies are also subject to the network effect - they are a social endeavour, an exercise in interoperation and labour delegation, and that means that there is an immense ecosystem cost associated with breaking dependency interoperability - just ask anyone who has had to try fitting an ES Module into a CommonJS project, for example, or anyone who has gone through the Python 2 to 3 transition. This makes changing the dependency system a very unappealing move.

So in practice, a lot of languages simply aren't able to adopt a nested dependency system, because it would break everything they have today. For the most part, only new languages can adopt nested dependencies, and most new languages are going to be borrowing ideas from existing languages, which... have flat dependencies. Among other things, I'm hoping that this article might serve as an inspiration to choose differently.

My personal view

Now, to get back to why I, personally, don't want to work with conflict-prone languages anymore, which has to do with the 'software commons' point mentioned earlier. I have many motivations behind the projects I work on, but one of them is the desire to build on a software commons using FOSS; to build reliable implementations for solving problems that are generically usable, ergonomic, and just as useful in 20 years (or more) as they are today.

I do not think that this is achievable in a conflict-prone language. Even with the best possible API design that needs no changes, you would still need to periodically update your library's dependencies, to make sure that its transitive dependencies remain compatible with widely-used frameworks and tools. This makes it impossible to write 'forever libraries' that are written once and then, eventually, after real-world testing and improvements, are done forever. The maintenance cost alone would become unsustainable.

The problem is made worse by conflict-prone dependency systems' preference for monolithic frameworks, as those are necessarily opinionated and make assumptions about the usecase - and unlike a singular solution to a singular problem, those assumptions will not stand the test of time; needs change, and with them, so do common usecases. Therefore, 'forever libraries' cannot take the shape that a conflict-prone dependency system encourages.

In short, a conflict-prone dependency system simply throws up too many barriers to credibly and sustainably build a long-term software commons, and that means that whatever work I do in the context of such a system, does not contribute towards my actual goals. In practice this means that I am mostly stuck with Javascript today, and I am hoping to see more languages adopt a conflict-free dependency system in the future.

Community governance

Community governance

How to de-escalate situations

I originally drafted this guide for the (public, semi-open) NixOS governance talks in 2024. It was written for participants in those governance discussions, as a de-escalation guide to steer conversation back to a constructive path. The recommendations in it, however, are more generally applicable to any sort of discussion, especially those in which decisions are to be made.

Governance is a complicated topic that often creates conflicts; some of them small, some of them not so small. Moderators are tasked with ensuring that the governance Zulip remains a constructive space for people to talk these things out, but there is a lot that you can do yourself to keep the discussion constructive; or even as a third party intervening in someone else's escalating discussion.

This guide describes some techniques that can be used to prevent and de-escalate conflicts, and help to keep the governance discussion productive for everyone involved. We encourage you to use them!

This guide is based in part on https://libera.chat/guides/catalyst, although several changes and additions have been made to better fit our specific situation.

Assume good faith

The people who participate in these governance conversations, are most likely here because they want the project governance to be improved, just like you. Try to assume that the other person is doing what they're doing in good faith. There are only very few people who genuinely seek to cause disruption, and if that is the case, it becomes a task for moderators to handle.

Listen and ask

A lot of conflicts can be both prevented and de-escalated by simply asking more questions and listening more, instead of speaking. In general, prefer to ask people why they feel a certain way if that is unclear, rather than assuming their intentions - this will provide more space for concerns that would otherwise go overlooked, and avoid creating conflicts due to wrong assumptions.

Even when a conflict has already arisen, asking questions can still be effective to de-escalate; asking people why they are doing something will encourage them to reflect on their behaviour, and this can often lead to self-moderation. Most people do not want to be viewed as "the bad guy".

Likewise, in a conflict, prefer asking for someone's input rather than admonishing their behaviour; this centers the conversation on them and their thoughts, instead of on yourself. Even if you disagree, you are more likely to gain a useful insight this way, and calming down the situation helps everyone involved.

If you need to concretely ask someone to change their behaviour, prefer asking them as a "can you do this?" question, rather than outright demanding it - they will likely be more receptive to your request, and if there is a reason why they cannot, you can look for a solution to that together.

Compromises and reconciliation

Many disagreements are not really fundamental: often there is just a miscommunication, or some mismatch in assumptions. When it seems like you cannot find agreement, try narrowing down exactly where the disagreement comes from, what the most precise difference between your views is. Often, this will inspire new solutions that work for everyone involved, and that reconcile your differences - eliminating the disagreement entirely.

If all else fails, it is often better to find a compromise that everyone can be reasonably happy with, than to leave one side of the conflict entirely unsatisfied. This should be a last resort; too many compromises can easily stack together into a sense of nothing ever being decided, or nothing being changeable. You should always prefer finding reconciliation instead, as described above. True compromise should be very rarely needed.

Health

Health

Treating hair lice in difficult hair

Theory

Hair lice are relatively innocent parasites that live in human head hair. Although they are typically not disease carriers, the itching they cause can be extremely frustrating.

Hair lice attach themselves by holding onto the hair, typically close to the skin of the head, which they need to do actively, ie. they need to be alive to do so. Dead hair lice will eventually fall out.

Typical recommendations revolve around using substances that are in some way deadly to lice; depending on substance, by poisoning them, dehydrating, or both. These substances are used along with a lice comb, which is a comb with very fine teeth (with just enough space between teeth for strands of hair to pass through), and essentially 'pulls' the weakened lice out of the hair.

Unfortunately this approach doesn't work for everybody; if you have long or particularly tangle-prone hair, it can be nearly impossible to get down to every bit of skin on your head. Given that the treatment needs to be repeated daily, and missing even one louse can make your efforts futile, this can make it impractical.

Heat treatment

There are experimental heat treatment techniques to remove hair lice; these involve purpose-built devices for removing lice by dehydrating them, causing them to die and lose grip. Heat travels through tangled hair much more easily than a comb, and so can have a higher success rate. Unfortunately, you are unlikely to have such a specialized device at home, and it can be difficult to find someone to do it for you, especially if traveling is difficult or you do not have a lot of money to spend.

Fortunately, however, this process can be replicated with a simple hairdryer, as long as you are careful. Make sure your hairdryer is set to 'hot' mode, and your hair is dry. Then blow hot air through your hair, close to your head, for at least several minutes daily, for the usual treatment period of two weeks.

You need to be very careful when using hot air this close to your head. It's okay for your head to start feeling hot, but as soon as you start getting a burning or scorched feeling on the top of your head, stop the treatment immediately, and keep more distance the next day. If you do not have a lot of hair, you may need to keep the hairdryer at quite some distance - thickness of the hair affects what the correct distance is for you.

A method that I've found particularly effective is to blow air upwards; that is, instead of blowing onto your hair from the top, point the hairdryer upwards and blow it under your mop of hair, as it were - it should feel a bit weird, causing your hair to go in all directions. This maximizes airflow, as the air somewhat gets trapped under your hair, and the only way out is through; this minimizes the deflection of air you would get when blowing from the top down. Note that the upwards technique can worsen hair tangling.

You may also want to use a lice comb in the places where this is easy to do; it is not strictly required for the treatment to work, but it makes it easier to clear out the dead lice in one go, instead of having them fall out by themselves over time.

Make sure you continue for the full two weeks, with daily treatment; hair lice have a short breeding cycle, and this treatment only affects the living lice, not their eggs. This means that over the span of two weeks, you will need to gradually dehydrate every new generation of lice. Doing it daily without fail ensures that no generation has a chance to lay new eggs. If you miss a day, you may need to restart the two week timer.

Matrix

Matrix

State resolution attacks

These are some notes on various different kinds of attacks that might be attempted on state resolution algorithms, such as the one in Matrix. Different kinds of state resolution algorithms are vulnerable to different kinds of attacks; a reliable state resolution algorithm should be vulnerable to none of them.

These notes are not complete. More details, graphs, etc. will be added at some later time.

Frontrunning attack

Detect an event that bans or demotes the user, then quickly craft a fake branch full of malicious events (eg. banning other users), but do not submit those events to any other homeserver yet, and then craft an event that parents both the fake branch and the event prior to the detected ban/demote, claiming that the fake branch came earlier and thereby bypassing the ban. Requires a malicious homeserver.

Dead horse attack

Attach crafted event to recent parent and ancient parent, to try and pull in ancient state and confuse the current state; eg. an event from back when a user wasn't banned yet, to try and get the membership state to revert to 'joined' by pulling it into current state. Named this way because it involves "beating a dead horse".

Piggybacking attack

A low-powerlevel user places an event in a DAG branch that a high-powerlevel user has also attempted to change state in, as the high-powerlevel state change might cause their branch to become prioritized (ie. sorted in front) in state resolution.

Fir tree attack

Resource exhaustion attack; deliberately constantly creating side branches to trigger state resolution processes. Named after the shape of half a fir tree that it generates in the graph.

Huge graph attack

Resource exhaustion attack; attach crafted event to a wide range of other parent events throughout the history of the room, to pull as many sections of the event graph into state resolution as possible.

Mirror attack

Takes advantage of non-deterministic state resolution algorithms to create a split-brain situation that breaks the room, by creating a fake branch containing the exact inverse operations of the real branch, and then resolving the two together; as there is no canonically 'correct' answer under these circumstances, the goal of the attack is to make different servers come to different conclusions.

Protocols and formats

Protocols and formats

Working with DBus

This article is a work in progress. It'll likely be expanded over time, but for now it's incomplete.

What is DBus?

DBus is a standardized 'message bus' protocol that is mainly used on Linux. It serves to let different applications on the same system talk to each other through a standardized format, with a standardized way of specifying the available API.

Additionally, and this is probably the most-used feature, it allows different applications to 'claim' specific pre-defined ("well-known") namespaces, if they intend to provide the corresponding service. For example, there are many different services that can show desktop notifications to the user, and the user may be using any one of them depending on their desktop environment, but whichever one it is, it will always claim the standard org.freedesktop.Notifications name.

That way, applications that want to show notifications don't need to know which specific notification service is running on the system - they can just send them to whoever claimed that name and implements the corresponding API.

How do you use DBus as a user?

As an end user, you don't really need to care about DBus. As long as a DBus daemon is running on your system (and this will be the case by default on almost every Linux distribution), applications using DBus should just work.

If you're curious, though, you can use a DBus introspection tool such as QDBusViewer or D-Spy to have a look at what sort of APIs the programs on your system provide. Just be careful not to send anything through it without researching it first - you can break things this way!

How do you use DBus as a developer?

You'll need a DBus protocol client. There are roughly two options:

  1. Bindings to libdbus for the language you are using, or
  2. A client implementation that's written directly in the language you are using (eg. dbus-next in JS)

You could also write your own client, as DBus typically just works over a local socket, but note that the serialization format is a little unusual, so it'll take some time to implement it correctly. Using an existing implementation is usually a better idea.

Note that you use a DBus client even when you want to provide an API over DBus; the 'server' in this arrangement is the DBus daemon, not your application.

How the protocol works

DBus implements a few different kinds of interaction mechanisms:
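
  1. Properties - named values on a service that you can read, and sometimes write
  2. Methods - functions exposed by a service that you can call, with arguments and a return value
  3. Signals - events that a service broadcasts, and that you can subscribe to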

All of these - properties, methods and signals - are addressable by pre-defined names. However, it takes a few steps to get there:
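
  1. You connect to a bus; usually the 'session bus' (one per logged-in user), sometimes the 'system bus' (one shared by the whole system).
  2. On that bus, you address a bus name; either a well-known name like org.freedesktop.Notifications, or a unique connection name.
  3. Under that bus name, you select an object path, like /org/freedesktop/Notifications.
  4. On that object, you pick one of the interfaces it implements, like org.freedesktop.Notifications.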

After these steps, you will end up with an interface that you can interact with - it has properties, methods, and/or signals. Don't worry too much about how exactly the hierarchy works here - the division between bus name, object path and interface can be (and in practice, is) implemented in many different ways depending on requirements, and if you merely wish to use a DBus API from some other application, you can simply specify whatever its documentation tells you for all of these values.

Some more information and context about this division can be found here, though keep in mind that you'll often encounter exactly one possible value for bus name, object path and interface, for any given application that exposes an API over DBus, so it's not required reading.
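
As a concrete sketch, this is roughly what calling the standard notification API looks like from Javascript with the dbus-next library mentioned earlier (the exact API surface may differ a little between library versions, so treat this as illustrative rather than authoritative):

  // Rough sketch using the 'dbus-next' library.
  const dbus = require('dbus-next');

  async function sendNotification() {
    // Connect to the session bus (the per-user bus).
    const bus = dbus.sessionBus();

    // Bus name and object path of the notification service.
    const proxy = await bus.getProxyObject(
      'org.freedesktop.Notifications',
      '/org/freedesktop/Notifications'
    );

    // The interface that actually contains the methods we want to call.
    const notifications = proxy.getInterface('org.freedesktop.Notifications');

    // Notify(app_name, replaces_id, app_icon, summary, body, actions, hints, expire_timeout)
    await notifications.Notify('my-app', 0, '', 'Hello', 'Sent over DBus!', [], {}, 5000);

    bus.disconnect();
  }

  sendNotification().catch((err) => console.error(err));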

Introspection

An additional feature of DBus is that it allows introspection of DBus APIs; that is, you can use the DBus protocol itself to interrogate an API provider about its available API surface, the argument types, and so on. The details of this are currently not covered here.

Some well-known DBus APIs
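
A few examples (non-exhaustive):

  1. org.freedesktop.Notifications - desktop notifications, as described above (session bus)
  2. org.freedesktop.secrets - the Secret Service API, implemented by keyrings such as GNOME Keyring and KeePassXC (session bus)
  3. org.mpris.MediaPlayer2.* - the MPRIS API for controlling media players (session bus)
  4. org.freedesktop.NetworkManager - network configuration through NetworkManager (system bus)
  5. org.freedesktop.login1 - session and power management through systemd-logind (system bus)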

Problems

Things I'm trying to work out.

Problems

Subgraph sorting

We have a graph:
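
For illustration, assume a shape like the following, where A is the root, B and C sit in different branches, and D descends from B (the exact shape doesn't matter much for the problem):

      A
     / \
    B   C
    |
    D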

We sort this graph topologically into a one-dimensional sequence:

A, B, C, D

The exact sorting order is determined by inspecting the contents of these nodes (not shown here), and doing some kind of unspecified complex comparison on those contents. As this is a topological sort, the comparison is essentially the secondary sorting criterion; the primary sorting criterion is whatever preserves the graph order of the nodes (that is, an ancestor always comes before the node that it is an ancestor of). Crucially, this means that nodes in different branches are compared to each other.

The resulting sorting order is stored in a database, in some sort of order representation. The exact representation is undefined; which representation would work best here, is part of the problem being posed.

Now, the graph is expanded with a newly discovered side branch, introducing two new nodes, E and F:
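
Continuing the illustrative shape from above, E branches off directly from A, with F as its only descendant:

      A
     /|\
    B C E
    |   |
    D   F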

The new node E now participates in the sorting alongside B, C, and D - we know that E must come after A and before F, because of the ancestor relationships, but we do not know how exactly its ordering position in the sequence relates to the other three nodes, without actually doing the comparison against them.

The problem: the existing order (A, B, C, D) must be updated in the database, such that E and F also become part of the ordered sequence. The constraints are:

You may choose any internal representation in the database, and any sorting mechanism, as long as it fits within the above constraints.