Miscellaneous notes
Here you'll find my miscellaneous, mostly-unsorted notes on various topics.
- Project ideas
- Javascript
- Whirlwind tour of (correct) npm usage
- An overview of Javascript tooling
- Monolithic vs. modular - what's the difference?
- Synchronous vs. asynchronous
- What is state?
- Promises reading list
- The Promises FAQ - addressing the most common questions and misconceptions about Promises
- Error handling (with Promises)
- Bluebird Promise.try using ES6 Promises
- Please don't include minified builds in your npm packages!
- How to get the actual width of an element in jQuery, even with box-sizing: border-box
- A survey of unhandledRejection and rejectionHandled handlers
- Quill.js glossary
- Riot.js cheatsheet
- Quick reference for `checkit` validators
- ES Modules are terrible, actually
- A few notes on the "Gathering weak npm credentials" article
- Node.js
- How to install Node.js applications, if you're not a Node.js developer
- Getting started with Node.js
- Node.js for PHP developers
- Rendering pages server-side with Express (and Pug)
- Running a Node.js application using nvm as a systemd service
- Persistent state in Node.js
- node-gyp requirements
- Introduction to sessions
- Secure random values
- Checking file existence asynchronously
- Fixing "Buffer without new" deprecation warnings
- Why you shouldn't use Sails.js
- Building desktop applications with Node.js
- NixOS
- Setting up Bookstack
- A *complete* listing of operators in Nix, and their precedence.
- Setting up Hydra
- Fixing root filesystem errors with fsck on NixOS
- Stepping through builder steps in your custom packages
- Using dependencies in your build phases
- Source roots that need to be renamed before they can be used
- Error: `error: cannot coerce a function to a string`
- `buildInputs` vs. `nativeBuildInputs`?
- QMake ignores my `PREFIX`/`INSTALL_PREFIX`/etc. variables!
- Useful tools for working with NixOS
- Proprietary AMD drivers (fglrx) causing fatal error in i387.h
- Installing a few packages from `master`
- GRUB2 on UEFI
- Unblock ports in the firewall on NixOS
- Guake doesn't start because of a GConf issue
- FFMpeg support in youtube-dl
- An incomplete rant about the state of the documentation for NixOS
- Rust
- Databases and data management
- Maths and computer science
- Hardware
- Server administration
- Batch-migrating Gitolite repositories to Gogs
- What is(n't) Docker actually for?
- Blocking LLM scrapers on Alibaba Cloud from your nginx configuration
- Dealing with a degraded btrfs array due to disk failure
- Privacy
- Security
- The Fediverse and Mastodon
- Cryptocurrency
- test.js
- Dependency management
- Community governance
- Health
- Matrix
- Protocols and formats
- Problems
Project ideas
Various ideas for projects that I do not yet have the time, knowledge or energy to work on. Feel free to take these ideas if they seem interesting, though please keep them non-commercial and free of ads!
Automatic authentication keys
Problem: Every website needs you to create an account. This is a pain to manage, and a barrier. This is especially problematic for self-hosted things like Forgejo, because it gives centralized platforms an advantage (everyone already has an account there). It should be trivial to immediately start using a site without going through a registration process.
Other solutions: OIDC, OpenID and such all require you to have an account with a provider; you either have to fully trust this provider with your access, or you need to self-host it, which is extra work. Passkeys are extremely Google-shaped and dubiously designed and documented. Federation is a massively complex solution to design for, and really an unnecessary complexity expense for the vast majority of self-hosted cases.
Proposed solution: Authentication directly integrated into browser through a browser extension. It uses request interception APIs and such to detect "is supported" headers from websites, and inject authentication headers into requests upon confirmation from the user that they wish to authenticate (it should not disclose its existence before that point). Authentication is done through keys managed locally by the browser and optionally stored encrypted on a third-party server.
Unsolved issues: Key management and backup, making it robust. Offer to backup to a USB key? How to deal with Manifest v3 in Chrome?
Javascript
Anything about Javascript in general, that isn't specific to Node.js.
Whirlwind tour of (correct) npm usage
This article was originally published at https://gist.github.com/joepie91/9b9dbd8c9ac3b55a65b2.
This is a quick tour of how to get started with NPM, how to use it, and how to fix it.
I'm available for tutoring and code review :)
Starting a new project
Create a folder for your project, preferably a Git repository. Navigate into that folder, and run:
npm init
It will ask you a few questions. Hit Enter without input if you're not sure about a question, and it will use the default. You now have a `package.json`.
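The exact contents depend on your answers, but the generated file will look roughly like this (the values below are just an illustration):

```json
{
  "name": "my-project",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC"
}
```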
If you're using Express: Please don't use `express-generator`. It sucks. Just use `npm init` like explained above, and follow the 'Getting Started' and 'Guide' sections on the Express website. They will teach you all you need to know when starting from scratch.
Installing a package
All packages in NPM are local - that is, specific to the project you install them in, and actually installed within that project. They are also nested - if you use the `foo` module, and `foo` uses the `bar` module, then you will have a `./node_modules/foo/node_modules/bar`. This means you pretty much never have version conflicts, and can install as many modules as you want without running into issues.
All modern versions of NPM will 'deduplicate' and 'flatten' your module folder as much as possible to save disk space, but as a developer you don't have to care about this - it will still work like it's a tree of nested modules, and you can still assume that there will be no version conflicts.
You install a package like this:
npm install packagename
While the packages themselves are installed in the `node_modules` directory (as that's where the Node.js runtime will look for them), that's only a temporary install location. The primary place where your dependencies are defined should be in your `package.json` file - so that they can be safely updated and reinstalled later, even if your `node_modules` gets lost or corrupted somehow.
In older versions of npm, you had to manually specify the `--save` flag to make sure that the package is saved in your `package.json`; that's why you may come across this in older articles. However, modern versions of NPM do this automatically, so the command above should be enough.
One case where you do still need to use a flag, is when you're installing a module that you just need for developing your project, but that isn't needed when actually using or deploying your project. Then you can use the `--save-dev` flag, like so:
npm install --save-dev packagename
Works pretty much the same, but saves it as a development dependency. This allows a user to install just the 'real' dependencies, to save space and bandwidth, if they just want to use your thing and not modify it.
To install everything that is declared in `package.json`, you just run it without arguments:
npm install
When you're using Git or another version control system, you should add `node_modules` to your ignore file (eg. `.gitignore` for Git); this is because installed copies of modules may need to be different depending on the system. You can then use the above command to make sure that all the dependencies are correctly installed, after cloning your repository to a new system.
Semantic versioning
Packages in NPM usually use semantic versioning; that is, the changes in a version number indicate what has changed, and whether the change is breaking. Let's take 1.2.3 as an example version. The components of that version number would be:
- Major version number: 1
- Minor version number: 2
- Patch version number: 3
Depending on which number changes, there's a different kind of change to the module:
- Patch version upgrade (eg. `1.2.3` -> `1.2.4`): An internal change was made, but the API hasn't changed. It's safe to upgrade.
- Minor version upgrade (eg. `1.2.3` -> `1.3.0`): The API has changed, but in a backwards-compatible manner - for example, a new feature or option was added. It's safe to upgrade. You may still want to read the changelog, in case there's new features that you want to use, or that you were waiting for.
- Major version upgrade (eg. `1.2.3` -> `2.0.0`): The API has changed, and is not backwards-compatible. For example, a feature was removed, a default was changed, and so on. It is not safe to upgrade. You first need to read the changelog, to see whether the changes affect your application.
Most NPM packages follow this, and it gives you a lot of certainty in what upgrades are safe to carry out, and what upgrades aren't. NPM explicitly adopts semver in its package.json as well, by introducing a few special version formats:
- `~1.2.3`: Allow automatic patch upgrades, but not minor or major upgrades. Upgrading to `1.2.4` is allowed, but upgrading to `1.3.0` or `2.0.0` is not. You still can't downgrade below `1.2.3` - for example, `1.2.2` is not allowed.
- `^1.2.3`: Allow automatic patch and minor upgrades, but not major upgrades. Upgrading to `1.2.4` or `1.3.0` is allowed, but upgrading to `2.0.0` is not. You still can't downgrade below `1.2.3` - for example, `1.2.2` or `1.1.0` are not allowed.
- `1.2.3`: Require this specific version. No upgrades are allowed. You will rarely need this - only for misbehaving packages, really.
- `*`: Allow upgrades to whatever the latest version is. You should never use this.
By default, NPM will automatically use the ^1.2.3 notation, which is usually what you want. Only configure it otherwise if you have an explicit reason to do so.
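For example, the dependency sections of a `package.json` might end up looking something like this (the package names are just placeholders):

```json
{
  "dependencies": {
    "some-library": "^1.2.3",
    "fragile-library": "~1.2.3",
    "misbehaving-library": "1.2.3"
  },
  "devDependencies": {
    "some-dev-tool": "^4.5.6"
  }
}
```

Here, `some-library` may be automatically upgraded to any `1.x.x` version that is at least `1.2.3`, `fragile-library` only to newer `1.2.x` patch releases, and `misbehaving-library` is pinned to exactly `1.2.3`.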
`0.x.x` versions are a special case - these are considered to be 'unstable', and the rules are slightly different: the minor version number indicates a breaking change, rather than the major version number. That means that `^0.1.2` will allow an upgrade to `0.1.3`, but not to `0.2.0`. This is commonly used for pre-release testing versions, where things may wildly change with every release.
If you end up publishing a module yourself (and you most likely eventually will), then definitely adhere to these guidelines as well. They make it a lot easier for developers to keep dependencies up to date, leading to considerably fewer bugs and security issues.
Global modules
Sometimes, you want to install a command-line utility such as `peerflix`, but it doesn't belong to any particular project. For this, there's the `--global` or `-g` flag:
npm install -g peerflix
If you used packages from your distribution to install Node, you may have to use `sudo` for global modules.
Never, ever, ever use global modules for project dependencies, ever. It may seem 'nice' and 'efficient', but you will land in dependency hell. It is not possible to enforce semver constraints on global modules, and things will spontaneously break. All the time. Don't do it. Global modules are only for project-independent, system-wide, command-line tools.
This applies even to development tools for your project. Different projects will often need different, incompatible versions of development tools - so those tools should be installed without the global flag. For local packages, the binaries are all collected in `node_modules/.bin`. You can then run the tools like so:
./node_modules/.bin/eslint
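Typing out that path every time gets tedious. Since npm adds `node_modules/.bin` to the `PATH` while running `scripts` defined in your `package.json`, a common approach is to define something like this (the `lint` script name is just an example):

```json
{
  "scripts": {
    "lint": "eslint ."
  }
}
```

... which you can then run with `npm run lint`, and it will use the locally installed `eslint`.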
NPM is broken, and I don't understand the error!
The errors that NPM shows are usually not very clear. I've written a tool that will analyze your error, and try to explain it in plain English. It can be found here.
My dependencies are broken!
If you've just updated your Node version, then you may have native (compiled) modules that were built against the old Node version, and that won't work with the new one. Run this to rebuild them:
npm rebuild
My dependencies are still broken!
Make sure that all your dependencies are declared in `package.json`. Then just remove and recreate your `node_modules`:
rm -rf node_modules
npm install
An overview of Javascript tooling
This article was originally published at https://gist.github.com/joepie91/3381ce7f92dec7a1e622538980c0c43d.
Getting confused about the piles of development tools that people use for Javascript? Here's a quick index of what is used for what.
Keep in mind that you shouldn't add tools to your workflow for the sake of it. While you'll see many production systems using a wide range of tools, these tools are typically used because they solved a concrete problem for the developers working on it. You should not add tools to your project unless you have a concrete problem that they can solve; none of the tools here are required.
Start with nothing, and add tools as needed. This will keep you from getting lost in an incomprehensible pile of tooling.
Build/task runners
Typical examples: Gulp, Grunt
These are not exactly build tools in and of themselves; they're rather just used to glue together other tools. For example, if you have a set of build steps where you need to run tool A after tool B, a build runner can help to orchestrate those tools.
Bundlers
Typical examples: Browserify, Webpack, Parcel
These tools take a bunch of `.js` files that use modules (either CommonJS using `require()` statements, or ES Modules using `import` statements), and combine them into a single `.js` file. Some of them also allow specifying 'transformation steps', but their main purpose is bundling.
Why does bundling matter? While in Node.js you have access to a module system that lets you load files as-needed from disk, this wouldn't be practical in a browser; fetching every file individually over the network would be very slow. That's why people use a bundler, which effectively does all this work upfront, and then produces a single 'combined' file with all the same guarantees of a module system, but that can be used in a browser.
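As a small sketch of what that looks like in practice (the file names here are made up): given two CommonJS modules like the ones below, a bundler produces a single output file in which the `require()` calls have effectively been resolved ahead of time, so the code can run in a browser without a module loader.

```js
// math.js - a CommonJS module
module.exports = {
  add: function (a, b) {
    return a + b;
  }
};

// main.js - the entry point; a bundler combines main.js and math.js into a
// single bundle.js, which you can then include with a <script> tag
var math = require("./math");

console.log(math.add(1, 2));
```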
Bundlers can also be useful for running module-using code in very basic JS environments that don't have module support for some reason; this includes Google Sheets, extensions for PostgreSQL, GNOME, and so on.
Bundlers are not transpilers. They do not compile one language to another, and they don't "make ES6 work everywhere". Those are the job of a transpiler. Bundlers are sometimes configured to use a transpiler, but the transpiling itself isn't done by the bundler.
Bundlers are not task runners. This is an especially popular misconception around Webpack. Webpack does not replace task runners like Gulp; while Gulp is designed to glue together arbitrary build tasks, Webpack is specifically designed for browser bundles. It's commonly useful to use Webpack with Gulp or another task runner.
Transpilers
Typical examples: Babel, the TypeScript compiler, CoffeeScript
These tools take a bunch of code in one language, and 'compile' it to another language. They're commonly called 'transpilers' rather than 'compilers' because unlike traditional compilers, these tools don't compile to a lower-level representation; the source and target are just different languages at a similar level of abstraction.
These are typically used to run code written against newer JS versions in older JS runtimes (eg. Babel), or to provide custom languages with more conveniences or constraints that can then be executed in any regular JS environment (TypeScript, CoffeeScript).
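As a rough illustration (the exact output differs per tool, configuration and target environment), a transpiler might turn the first line below into something like the commented-out lines:

```js
// Input: modern JS, using `const` and an arrow function
const double = (x) => x * 2;

// Possible transpiled output for an older runtime:
// var double = function (x) {
//   return x * 2;
// };
```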
Process restarters
Typical examples: nodemon
These tools automatically restart your (Node.js) process when the underlying code is changed. This is used for development purposes, to remove the need to manually restart your process every change.
A process restarter may either watch for file changes itself, or be controlled by an external tool like a build runner.
Page reloaders
Typical examples: LiveReload, BrowserSync, Webpack hot-reload
These tools automatically refresh a page in the browser and/or reload stylesheets and/or re-render parts of the page, to reflect the changes in your browser-side code. They're kind of the equivalent of a process restarter, but for webpages.
These tools are usually externally controlled; typically by either a build runner or a bundler, or both.
Debuggers
Typical examples: Chrome Developer Tools, node-inspect
These tools allow you to inspect running code; in Node.js, in your browser, or both. Typically they'll support things like pausing execution, stepping through function calls manually, inspecting variables, profiling memory allocations and CPU usage, viewing execution logs, and so on.
They're typically used to find tricky bugs. It's a good idea to learn how these tools work, but often it'll still be easier to find a bug by just 'dumb logging' variables throughout your code using eg. `console.log`.
Monolithic vs. modular - what's the difference?
This article was originally published at https://gist.github.com/joepie91/7f03a733a3a72d2396d6.
When you're developing in Node.js, you're likely to run into these terms - "monolithic" and "modular". They're usually used to describe the different types of frameworks and libraries; not just HTTP frameworks, but modules in general.
At a glance
- Monolithic: "Batteries-included" and typically tightly coupled, it tries to include all the stuff that's needed for common usecases. An example of a monolithic web framework would be Sails.js.
- Modular: "Minimal" and loosely coupled. Only includes the bare minimum of functionality and structure, and the rest is a plugin. Fundamentally, it generally only has a single 'responsibility'. An example of a modular web framework would be Express.
Coupled?
In software development, the terms "tightly coupled" and "loosely coupled" are used to indicate how much components rely on each other; or more specifically, how many assumptions they make about each other. This directly translates to how easy it is to replace and change them.
- Tightly coupled: Highly cohesive code, where every part of the code makes assumptions about every other part of the code.
- Loosely coupled: Very "separated" code, where every part of the code communicates with other parts through more-or-less standardized and neutral interfaces.
While tight coupling can sometimes result in slightly more performant code and very occasionally makes it easier to build a 'mental model', loosely coupled code is much easier to understand and maintain - as the inner workings of a component are separated from its interface or API, you can make many more assumptions about how it behaves.
Loosely coupled code is often centered around 'events' and data - a component 'emits' changes that occur, with data attached to them, and other components may optionally 'listen' to those events and do something with it. However, the emitting component has no idea who (if anybody!) is listening, and cannot make assumptions about what the data is going to be used for.
What this means in practice, is that loosely coupled (and modular!) code rarely needs to be changed - once it is written, has a well-defined set of events and methods, and is free of bugs, it no longer needs to change. If an application wants to start using the data differently, it doesn't require changes in the component; the data is still of the same format, and the application can simply process it differently.
This is only one example, of course - loose coupling is more of a practice than a pattern. The exact implementation depends on your usecase. A quick checklist to determine how loosely coupled your code is:
- Does your component rely on external state? This is an absolute no-no. Your component cannot rely on any state outside of the component itself. It may not make any assumptions about the application whatsoever. Don't even rely on configuration files or other filesystem files - all such data must be passed in by the application explicitly, always. What isn't in the component itself, doesn't exist.
- How many assumptions does it make about how the result will be used? Loosely coupled code shouldn't care about how its output will be used, whether it's a return value or an event. The output just needs to be consistent, documented, and neutral.
- How many custom 'types' are used? Loosely coupled code should generally only accept objects that are defined on a language or runtime level, and in common use. Arrays and A+ promises are fine, for example - a proprietary representation of an ongoing task is not.
- If you need a custom type, how simple is it? If absolutely needed, your custom object type should be as plain as possible - just a plain Javascript object, optimally. It should be well-documented, and not duplicate an existing implementation to represent this kind of data. Ideally, it should be defined in a separate project, just for documenting the type; that way, others can implement it as well.
In this section, I've used the terms "component" and "application", but these are interchangeable with "callee"/"caller", and "provider"/"consumer". The principles remain the same.
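To make the event-based style described above a bit more concrete, here's a minimal sketch; the component and event names are made up, and this is just one possible shape for loosely coupled code:

```js
const EventEmitter = require("events");

// The component: it only emits data about what happened, and makes no
// assumptions about who (if anybody) is listening, or what the data is used for.
class TemperatureSensor extends EventEmitter {
  report(celsius) {
    this.emit("reading", { celsius: celsius, at: new Date() });
  }
}

// The application: it decides what to do with the data, without the component
// needing to know or change.
const sensor = new TemperatureSensor();

sensor.on("reading", (reading) => {
  console.log(`Measured ${reading.celsius}°C at ${reading.at.toISOString()}`);
});

sensor.report(21);
```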
The trade-offs
At first, a monolithic framework might look easier - after all, it already includes everything you think you're going to need. In the long run, however, you're likely to run into situations where the framework just doesn't quite work how you want it to, and you have to spend time trying to work around it. This problem gets worse if your usecase is more unusual - because the framework developers didn't keep in mind your usecase - but it's a risk that always exists to some degree.
Initially, a modular framework might look harder - you have to figure out what components to use for yourself. That's a one-time cost, however; the majority of modules are reusable across projects, so after your first project you'll have a good idea of what to start with. The remaining usecase-specific modules would've been just as much of a problem in a monolithic framework, where they likely wouldn't have existed to begin with.
Another consideration is the possibility to 'swap out' components. What if there's a bug in the framework that you're unable (or not allowed) to fix? When building your application modularly, you can simply get rid of the offending component and replace it with a different one; this usually doesn't take more than a few minutes, because components are typically small and only do one thing.
In a monolithic framework, this is more problematic - the component is an inherent part of the framework, and replacing it may be impossible or extremely hard, depending on how many assumptions the framework makes. You will almost certainly end up implementing a workaround of some sort, which can take hours; you need to understand the framework's codebase, the component you're using, and the exact reason why it's failing. Then you need to write code that works around it, sometimes even having to 'monkey-patch' framework methods.
In summary, the tradeoffs look like this:
- Monolithic: Slightly faster to get started with, but less control over its workings, more chance of the framework not supporting your usecase, and higher long-term maintenance cost due to the inevitable need for workarounds.
- Modular: Takes slightly longer to get started on your first project, but total control over its workings, practically every usecase is supported, and long-term maintenance is cheaper.
The "it's just a prototype!" argument
When explaining this to people, a common justification for picking a monolithic framework is that "it's just a prototype!", or "it's just an MVP!", with the implication that it can be changed later. In reality, it usually can't.
Try explaining to your boss that you want to throw out the working(!) code you have, and rewrite everything from the ground up in a different, more maintainable framework. The best response that you're likely to get, is your boss questioning why you didn't use that framework to begin with - but more likely, the answer is "no", and you're going to be stuck with your hard-to-maintain monolithic codebase for the rest of the project or your employment, whichever terminates first.
Again, the cost of a modular codebase is a one-time cost. After your first project, you already know where to find most modules you need, and building on a modular framework will not be more expensive than building on a monolithic one. Don't fall into the "prototype trap", and do it right from day one. You're likely to be stuck with it for the rest of your employment.
Synchronous vs. asynchronous
This article was originally published at https://gist.github.com/joepie91/bf3d04febb024da89e3a3e61b164247d.
You'll run into the terms "synchronous" and "asynchronous" a lot when working with JS. Let's look at what they actually mean.
Synchronous code is like what you might be used to already from other languages. You call a function, it does some work, and then returns the result. No other code runs in the meantime. This is simple to understand, but it's also inefficient; what if "doing some work" mostly involves getting some data from a database? In the meantime, our process is sitting around doing nothing, waiting for the database to respond. It could be doing useful work in that time!
And that's what brings us to asynchronous code. Asynchronous code works differently; you still call a function, but it doesn't return a result. Instead, you don't just pass the regular arguments to the function, but also give it a piece of code in a function (a so-called "asynchronous callback") to execute when the operation completes. The JS runtime stores this callback alongside the in-progress operation, to retrieve and execute it later when the external service (eg. the database) reports that the operation has been completed.
Crucially, this means that when you call an asynchronous function, it cannot wait until the external processing is complete before returning from the function! After all, the intention is to keep running other code in the meantime, so it needs to return from the function so that the 'caller' (the code which originally called the function) can continue doing useful things even while the external operation is in progress.
All of this takes place in what's called the "event loop" - you can pretty much think of it as a huge infinite loop that contains your entire program. Every time you trigger an external process through an asynchronous function call, that external process will eventually finish, and put its result in a 'queue' alongside the callback you specified. On each iteration ("tick") of the event loop, it then goes through that queue, executes all of the callbacks, which can then indirectly cause new items to be put into the queue, and so on. The end result is a program that calls asynchronous callbacks as and when necessary, and that keeps giving new work to the event loop through a chain of those callbacks.
This is, of course, a very simplified explanation - just enough to understand the rest of this page. I strongly recommend reading up on the event loop more, as it will make it much easier to understand JS in general. Here are some good resources that go into more depth:
- https://nodesource.com/blog/understanding-the-nodejs-event-loop (article)
- https://www.youtube.com/watch?v=8aGhZQkoFbQ (video)
- https://www.youtube.com/watch?v=cCOL7MC4Pl0 (video)
Now that we understand what the event loop is, and what a "tick" is, we can define more precisely what "asynchronous" means in JS:
Asynchronous code is code that happens across more than one event loop tick. An asynchronous function is a function that needs more than one event loop tick to complete.
This definition will be important later on, for understanding why asynchronous code can be more difficult to write correctly than synchronous code.
Asynchronous execution order and boundaries
This idea of "queueing code to run at some later tick" has consequences for how you write your code.
Remember how the event loop is a loop, and ticks are iterations - this means that event loop ticks are distributed across time linearly. First the first tick happens, then the second tick, then the third tick, and so on. Something that runs in the first tick can never execute before something that runs in the third tick; unless you're a time traveller anyway, in which case you probably would have more important things to do than reading this guide 😃
Anyhow, this means that code will run in a slightly counterintuitive way, if you're used to synchronous code. For example, consider the following code, which uses the asynchronous `setTimeout` function to run something after a specified amount of milliseconds:
console.log("one");
setTimeout(() => {
console.log("two");
}, 300);
console.log("three");
You might expect this to print out `one, two, three` - but if you try running this code, you'll see that it doesn't! Instead, you get this:
one
three
two
What's going on here?!
The answer to that is what I mentioned earlier; the asynchronous callback is getting queued for later. Let's pretend for the sake of explanation that an event loop tick only happens when there's actually something to do. The first tick would then run this code:
console.log("one");
setTimeout(..., 300); // This schedules some code to run in a next tick, about 300ms later
console.log("three");
Then 300 milliseconds elapse, with nothing for the event loop to do - and after those 300ms, the callback we gave to `setTimeout` suddenly appears in the event loop queue. Now the second tick happens, and it executes this code:
console.log("two");
... thus resulting in the output that we saw above.
The key insight here is that code with callbacks does not execute in the order that the code is written. Only the code outside of the callbacks executes in the written order. For example, we can be certain that `three` will get printed after `one`, because both are outside of the callback and so they are executed in that order; but because `two` is printed from inside of a callback, we can't know when it will execute.
"But hold on", you say, "then how can you know that `two` will be printed after `three` and `one`?"
This is where the earlier definition of "asynchronous code" comes into play! Let's reason through it:
- `setTimeout` is asynchronous.
- Therefore, we call `console.log("two")` from within an asynchronous callback.
- Synchronous code executes within one tick.
- Asynchronous code needs more than one tick to execute, ie. the asynchronous callback will be called in a later tick than the one where we started the operation (eg. `setTimeout`).
- Therefore, an asynchronous callback will always execute after the synchronous code that started the operation, no matter what.
- Therefore, `two` will always be printed after `one` and `three`.
So, we can know when the asynchronous callback will be executed, in terms of relative time. That's useful, isn't it? Doesn't that mean that we can do that for all asynchronous code? Well, unfortunately not - it gets more complicated when there is more than one asynchronous operation.
Take, for example, the following code:
console.log("one");
someAsynchronousOperation(() => {
console.log("two");
});
someOtherAsynchronousOperation(() => {
console.log("three");
});
console.log("four");
We have two different asynchronous operations here, and we don't know for certain which of the two will finish faster. We don't even know whether it's always the same one that finishes faster, or whether it varies between runs of the program. So while we can determine that `two` and `three` will always be printed after `one` and `four` - remember, asynchronous callbacks always run after the synchronous code that started them - we can't know whether `two` or `three` will come first.
And this is, fundamentally, what makes asynchronous code more difficult to write; you never know for sure in what order your code will complete. Every real-world program will have at least some scenarios where you can't force an order of operations (or, at least, not without horribly bad performance), so this is a problem that you have to account for in your code.
The easiest solution to this, is to avoid "shared state". Shared state is information that you store (eg. in a variable) and that gets used by multiple parts of your code independently. This can sometimes be necessary, but it also comes at a cost - if function A and function B both modify the same variable, then if they run in a different order than you expected, one of them might mess up the expected state of the other. This is generally already true in programming, but even more important when working with asynchronous code, as your chunks of code get 'interspersed' much more due to the callback model.
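Here's a small runnable sketch of that problem; the two 'operations' below just use `setTimeout` with a random delay, standing in for real asynchronous work:

```js
// Two operations that finish after an unpredictable amount of time:
function someAsynchronousOperation(callback) {
  setTimeout(() => callback("result of the first operation"), Math.random() * 100);
}

function someOtherAsynchronousOperation(callback) {
  setTimeout(() => callback("result of the second operation"), Math.random() * 100);
}

let lastResult; // shared state, written to by both callbacks

someAsynchronousOperation((result) => {
  lastResult = result;
});

someOtherAsynchronousOperation((result) => {
  lastResult = result;
});

setTimeout(() => {
  // Which value ends up here depends on which operation happened to finish
  // last - and that can differ between runs of the program.
  console.log(lastResult);
}, 200);
```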
[...]
What is state?
This article was originally published at https://gist.github.com/joepie91/8c2cba6a3e6d19b275fdff62bef98311.
"State" is data that is associated with some part of a program, and that can be changed over time to change the behaviour of the program. It doesn't have to be changed by the user; it can be changed by anything in the program, and it can be any kind of data.
It's a bit of an abstract concept, so here's an example: say you have a button that increases a number by 1 every time you click it, and the (pseudo-)code looks something like this:
let counter = 0;
let increment = 1;
button.on("click", () => {
counter = counter + increment;
});
In this code, there are two bits of "state" involved:
- Whether the button is clicked: This bit of data - specifically, the change between "yes" and "no" - is what determines when to increase the counter. The example code doesn't interact with this data directly, but the callback is called whenever it changes from "no" to "yes" and back again.
- The current value of the counter: This bit of data is used to determine what the next value of the counter is going to be (the current value plus one), as well as what value to show on the screen.
Now, you may note that we also define an `increment` variable, but that it isn't in the list of things that are "state"; this is because the `increment` value never changes. It's just a static value (`1`) that is always the same, even though it's stored in a variable. That means it's not state.
You'll also note that "whether the button is clicked" isn't stored in any variable we have access to, and that we can't access the "yes" or "no" value directly. This is an example of what we'll call invisible state - data that is state, but that we cannot see or access directly - it only exists "behind the scenes". Nevertheless, it still affects the behaviour of the code through the event handler callback that we've defined, and that means it's still state.
Promises reading list
This article was originally published at https://gist.github.com/joepie91/791640557e3e5fd80861.
This is a list of examples and articles, in roughly the order you should follow them, to show and explain how promises work and why you should use them. I'll probably add more things to this list over time.
This list primarily focuses on Bluebird, but the basic functionality should also work in ES6 Promises, and some examples are included on how to replicate Bluebird functionality with ES6 promises. You should still use Bluebird where possible, though - it is faster, less error-prone, and has more utilities.
I'm available for tutoring and code review :)
You may reuse all of the referenced posts and Gists (written by me) for any purpose under the WTFPL / CC0 (whichever you prefer).
If you get stuck
I've made a brief FAQ of common questions that people have about Promises, and how to use them. If you don't understand something listed here, or you're wondering how to implement a specific requirement, chances are that it'll be answered in that FAQ.
Compatibility
Bluebird will not work correctly (in client-side code) in older browsers. If you need to support older browsers, and you're using Webpack or Browserify, you should use the `es6-promise` module instead, and reimplement behaviour where necessary.
Introduction
- Start reading here, to understand why Promises matter.
- If it's not quite clear yet, some code that uses callbacks, and its equivalent using Bluebird.
- A demonstration of how promise chains can be 'flattened'
Promise.try
Many guides and examples fail to demonstrate Promise.try, or to explain why it's important. This article will explain it.
Error handling
- A quick introduction
- An illustration of error bubbling: step 1, step 2
- Implementing 'fallback' values (ie. defaults for when an asynchronous operation fails)
- bluebird-tap-error, a module for intercepting and looking at errors, without preventing propagation. Useful if you need to do the actual error handling elsewhere.
- Handling errors in Express, using Promises
Many examples on the internet don't show this, but you should always start a chain of promises with Promise.try, and if it is within a function or callback, you should always return your promise chain. Not doing so will result in less reliable error handling and various other issues (eg. code executing too soon).
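For example, a sketch of what that looks like in practice, using Bluebird; `fetchThing` is a made-up asynchronous operation standing in for whatever you're actually doing:

```js
const Promise = require("bluebird");

function getParsedThing() {
  // Starting the chain with Promise.try means that even a synchronous throw
  // inside the callback becomes a rejection, and *returning* the chain means
  // the caller can attach its own .then/.catch, so errors keep propagating.
  return Promise.try(() => {
    return fetchThing(); // hypothetical Promise-returning operation
  }).then((thing) => {
    return JSON.parse(thing);
  });
}
```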
Promisifying
- Promisifying functions and modules that use nodebacks (Node.js callbacks)
- An example of manually promisifying an EventEmitter
- Promisifying `fs.exists` (which is async, but doesn't follow the nodeback convention)
Functional (map, filter, reduce)
- Functional programming in Javascript: map, filter and reduce (an introduction, not Bluebird-specific, but important to understand)
- (Synchronous) examples of map, filter, and reduce in Bluebird
- Example of using map for retrieving a (remote) list of URLs with bhttp
Nesting
- Example of retaining scope through nesting
- Example of 'breaking out' of a chain through nesting
- Example of a nested Promise.map
- An example with increasing complexity, implementing an 'error-tolerant' Promise.map: part 1, part 2, part 3
ES6 Promises
Odds and ends
Some potentially useful snippets:
You're unlikely to need any of these things, if you just stick with either Bluebird or ES6 promises:
The Promises FAQ - addressing the most common questions and misconceptions about Promises
This article was originally published at https://gist.github.com/joepie91/4c3a10629a4263a522e3bc4839a28c83. Nowadays Promises are more widely understood and supported, and it's not as relevant as it once was, but it's kept here for posterity.
By the way, I'm available for tutoring and code review :)
1. What Promises library should I use?
That depends a bit on your usecase.
My usual recommendation is Bluebird - it's robust, has good error handling and debugging facilities, is fast, and has a well-designed API. The downside is that Bluebird will not correctly work in older browsers (think Internet Explorer 8 and older), and when used in Browserified/Webpacked code, it can sometimes add a lot to your bundle size.
ES6 Promises are gaining a lot of traction purely because of being "ES6", but in practice they are just not very good. They are generally lacking standardized debugging facilities, they are missing essential utilities such as Promise.try/promisify/promisifyAll, they cannot catch specific error types (this is a big robustness issue), and so on.
ES6 Promises can be useful in constrained scenarios (eg. older browsers with a polyfill, restricted non-V8 runtimes, etc.) but I would not generally recommend them.
There are many other Promise implementations (Q, WhenJS, etc.) - but frankly, I've not seen any that are an improvement over either Bluebird or ES6 Promises in their respective 'optimal scenarios'. I'd also recommend explicitly against Q because it is extremely slow and has a very poorly designed API.
In summary: Use Bluebird, unless you have a very specific reason not to. In those very specific cases, you probably want ES6 Promises.
2. How do I create a Promise myself?
Usually, you don't. Promises are not usually something you 'create' explicitly - rather, they're a natural consequence of chaining together multiple operations. Take this example:
function getLinesFromSomething() {
return Promise.try(() => {
return bhttp.get("http://example.com/something.txt");
}).then((response) => {
return response.body.toString().split("\n");
});
}
In this example, all of the following technically result in a new Promise:
- `Promise.try(...)`
- `bhttp.get(...)`
- The synchronous value from the `.then` callback, which gets converted automatically to a resolved Promise (see question 5)

... but none of them are explicitly created as "a new Promise" - that's just the natural consequence of starting a chain with `Promise.try` and then returning Promises or values from the callbacks.
There is one exception to this, where you do need to explicitly create a new Promise - when converting a different kind of asynchronous API to a Promises API, and even then you only need to do this if `promisify` and friends don't work. This is explained in question 7.
3. How do I use `new Promise`?
You don't, usually. In almost every case, you either need Promise.try, or some kind of promisification method. Question 7 explains how you should do promisification, and when you do need `new Promise`.
But when in doubt, don't use it. It's very error-prone.
4. How do I resolve a Promise?
You don't, usually. Promises are not something you need to 'resolve' manually - rather, you should just return some kind of Promise, and let the Promise library handle the rest.
There's one exception here: when you're manually promisifying a strange API using `new Promise`, you need to call `resolve()` or `reject()` for a successful and unsuccessful state, respectively. Make sure to read question 3, though - you should almost never actually use `new Promise`.
5. But what if I want to resolve a synchronous result or error?
You simply `return` it (if it's a result) or `throw` it (if it's an error), from your `.then` callback. When using Promises, synchronously returned values are automatically converted into a resolved Promise, whereas synchronously thrown errors are automatically converted into a rejected Promise. You don't need to use `Promise.resolve()` or `Promise.reject()`.
6. But what if it's at the start of a chain, and I'm not in a `.then` callback yet?
Using Promise.try will make this problem not exist.
7. How do I make this non-Promises library work with Promises?
That depends on what kind of API it is.
- Node.js-style error-first callbacks: Use Promise.promisify and/or Promise.promisifyAll to convert the library to a Promises API. For ES6 Promises, use the es6-promisify and es6-promisify-all libraries respectively. In Node.js, `util.promisify` can also be used.
- EventEmitters: It depends. Promises are explicitly meant to represent an operation that succeeds or fails precisely once, so most EventEmitters cannot be converted to a Promise, as they will have multiple results. Some exceptions exist; for example, the `response` event when making a HTTP request - in these cases, use something like bluebird-events.
- setTimeout: Use `Promise.delay` instead, which comes with Bluebird.
- setInterval: Avoid `setInterval` entirely (this is why), and use a recursive `Promise.delay` instead.
- Asynchronous callbacks with a single result argument, and no `err`: Use promisify-simple-callback.
- A different Promises library: No manual conversion is necessary, as long as it is compliant with the Promises/A+ specification (and nearly every implementation is). Make sure to use Promise.try in your code, though.
- Synchronous functions: No manual conversion is necessary. Synchronous returns and throws are automatically converted by your Promises library. Make sure to use Promise.try in your code, though.
- Something else not listed here: You'll probably have to promisify it manually, using `new Promise` (see the sketch below). Make sure to keep the code within `new Promise` as minimal as possible - you should have a function that only promisifies the API you intend to use, without doing anything else. All further processing should happen outside of `new Promise`, once you already have a Promise object.
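A couple of these cases sketched out, using Bluebird; `someWeirdApi` is a made-up example of an API that doesn't follow any standard convention:

```js
const Promise = require("bluebird");

// Node.js-style error-first callbacks: promisify a whole module at once.
// Bluebird adds Async-suffixed methods, eg. fs.readFileAsync, which return Promises.
const fs = Promise.promisifyAll(require("fs"));

// Something else not listed: wrap *only* the odd API itself in `new Promise`,
// and do all further processing outside of it, once you have a Promise.
function someWeirdApiAsync(input) {
  return new Promise((resolve, reject) => {
    someWeirdApi(input, {
      onSuccess: resolve,
      onError: reject
    });
  });
}
```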
8. How do I propagate errors, like with `if(err) return cb(err)`?
You don't. Promises will propagate errors automatically, and you don't need to do anything special for it - this is one of the benefits that Promises provide over error-first callbacks.
When using Promises, the only case where you need to `.catch` an error, is if you intend to handle it - and you should always only catch the types of error you're interested in.
These two Gists (step 1, step 2) show how error propagation works, and how to `.catch` specific types of errors.
9. How do I break out of a Promise chain early?
You don't. You use conditionals instead. Of course, specifically for failure scenarios, you'd still throw an error.
10. How do I convert a Promise to a synchronous value?
You can't. Once you write asynchronous code, all of the 'surrounding' code also needs to be asynchronous. However, you can just have a Promise chain in the 'parent code', and return the Promise from your own method.
For example:
function getUserFromDatabase(userId) {
return Promise.try(() => {
return database.table("users").where({id: userId}).get();
}).then((results) => {
if (results.length === 0) {
throw new MyCustomError("No users found with that ID");
} else {
return results[0];
}
});
}
/* Now, to *use* that getUserFromDatabase function, we need to have another Promise chain: */
Promise.try(() => {
// Here, we return the result of calling our own function. That return value is a Promise.
return getUserFromDatabase(42);
}).then((user) => {
console.log("The username of user 42 is:", user.username);
});
(If you're not sure what Promise.try is or does, this article will explain it.)
11. How do I save a value from a Promise outside of the callback?
You don't. See question 10 above - you need to use Promises "all the way down".
12. How do I access previous results from the Promise chain?
In some cases, you might need to access an earlier result from a chain of Promises, one that you don't have access to anymore. A simple example of this scenario:
'use strict';
// ...
Promise.try(() => {
return database.query("users", {id: req.body.userId});
}).then((user) => {
return database.query("groups", {id: req.body.groupId});
}).then((group) => {
res.json({
user: user, // This is not possible, because `user` is not in scope anymore.
group: group
});
});
This is a fairly simple case - the `user` query and the `group` query are completely independent, and they can be run at the same time. Because of that, we can use `Promise.all` to run them in parallel, and return a combined Promise for both of their results:
'use strict';
// ...
Promise.try(() => {
return Promise.all([
database.query("users", {id: req.body.userId}),
database.query("groups", {id: req.body.groupId})
]);
}).spread((user, group) => {
res.json({
user: user, // Now it's possible!
group: group
});
});
Note that instead of `.then`, we use `.spread` here. Promises only support a single result argument for a `.then`, which is why a Promise created by `Promise.all` would resolve to an array of `[user, group]` in this case. However, `.spread` is a Bluebird-specific variation of `.then`, that will automatically "unpack" that array into multiple callback arguments. Alternatively, you can use ES6 array destructuring to accomplish the same.
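With ES6 array destructuring, that last part could instead use a plain `.then`, like so:

```js
Promise.try(() => {
  return Promise.all([
    database.query("users", {id: req.body.userId}),
    database.query("groups", {id: req.body.groupId})
  ]);
}).then(([user, group]) => { // destructure the [user, group] array from Promise.all
  res.json({
    user: user,
    group: group
  });
});
```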
Now, the above example assumes that the two asynchronous operations are independent - that is, they can run in parallel without caring about the result of the other operation. In some cases, you will want to use the results of two operations that are dependent - while you still want to use the results of both at the same time, the second operation also needs the result of the first operation to work.
An example:
'use strict';
// ...
Promise.try(() => {
return getDatabaseConnection();
}).then((databaseConnection) => {
return databaseConnection.query("users", {id: req.body.id});
}).then((user) => {
res.json(user);
// This is not possible, because we don't have `databaseConnection` in scope anymore:
databaseConnection.close();
});
In these cases, rather than using `Promise.all`, you'd add a level of nesting to keep something in scope:
'use strict';
// ...
Promise.try(() => {
return getDatabaseConnection();
}).then((databaseConnection) => {
// We nest here, so that `databaseConnection` remains in scope.
return Promise.try(() => {
return databaseConnection.query("users", {id: req.body.id});
}).then((user) => {
res.json(user);
databaseConnection.close(); // Now it works!
});
});
Of course, as with any kind of nesting, you should do it sparingly - and only when necessary for a situation like this. Splitting up your code into small functions, with each of them having a single responsibility, will prevent trouble with this.
Error handling (with Promises)
This article was originally published at https://gist.github.com/joepie91/c8d8cc4e6c2b57889446. It only applies when using Promise chaining syntax; when you use `async`/`await`, you are instead expected to use `try`/`catch`, which unfortunately does not support error filtering.
There's roughly three types of errors:
- Expected errors - eg. "URL is unreachable" for a link validity checker. You should handle these in your code at the top-most level where it is practical to do so.
- Unexpected errors - eg. a bug in your code. These should crash your process (yes, really), they should be logged and ideally e-mailed to you, and you should fix them right away. You should never catch them for any purpose other than to log the error, and even then you should make the process crash.
- User-facing errors - not really in the same category as the above two. While you can represent them with error objects (and it's often practical to do so), they're not really errors in the programming sense - rather, they're user feedback. When represented as error objects, these should only ever be handled at the top-most point of a request - in the case of Express, that would be the error-handling middleware that sends a HTTP status code and a response.
Would I still need to use try/catch if I use promises?
Sort of. Not the usual `try`/`catch`, but eg. Bluebird has a `.try` and `.catch` equivalent. It works like synchronous `try`/`catch`, though - errors are propagated upwards automatically so that you can handle them where appropriate.
Bluebird's `try` isn't identical to a standard JS `try` - it's more a 'start using Promises' thing, so that you can also wrap synchronous errors. That's the magic of Promises, really - they let you handle synchronous and asynchronous errors/values like they're one and the same thing.
Below is a relatively complex example, that uses a custom 'error filter' (predicate) function, because filesystem errors have a `code` property but not a special error type. The error filtering is only available in Bluebird, by the way - 'native' Promises don't have the filtering.
/* UPDATED: This example has been changed to use the new object predicates, that were
* introduced in Bluebird 3.0. If you are using Bluebird 2.x, you will need to use the
* older example below, with the predicate function. */
var Promise = require("bluebird");
var fs = Promise.promisifyAll(require("fs"));
Promise.try(function(){
return fs.readFileAsync("./config.json").then(JSON.parse);
}).catch({code: "ENOENT"}, function(err){
/* Return an empty object. */
return {};
}).then(function(config){
/* `config` now either contains the JSON-parsed configuration file, or an empty object if no configuration file existed. */
});
If you are still using Bluebird 2.x, you should use predicate functions instead:
/* This example is ONLY for Bluebird 2.x. When using Bluebird 3.0 or newer, you should
* use the updated example above instead. */
var Promise = require("bluebird");
var fs = Promise.promisifyAll(require("fs"));
var NonExistentFilePredicate = function(err) {
return (err.code === "ENOENT");
};
Promise.try(function(){
return fs.readFileAsync("./config.json").then(JSON.parse);
}).catch(NonExistentFilePredicate, function(err){
/* Return an empty object. */
return {};
}).then(function(config){
/* `config` now either contains the JSON-parsed configuration file, or an empty object if no configuration file existed. */
});
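For comparison: as mentioned at the start of this article, with `async`/`await` you'd use `try`/`catch` instead, and the error filtering has to be done by hand. A rough sketch of the same 'missing configuration file' case, using the built-in `fs.promises` API (available in recent Node.js versions):

```js
const fs = require("fs");

async function loadConfig() {
  try {
    return JSON.parse(await fs.promises.readFile("./config.json", "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") {
      /* No configuration file existed; fall back to an empty object. */
      return {};
    } else {
      /* Not an error we expected; rethrow it so it keeps propagating. */
      throw err;
    }
  }
}
```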
Bluebird Promise.try using ES6 Promises
This article was originally published at https://gist.github.com/joepie91/255250eeea8b94572a03.
Note that this will only be equivalent to `Promise.try` if your runtime or ES6 Promise shim correctly catches synchronous errors in Promise constructors.
If you are using the latest version of Node, this should be fine.
var Promise = require("es6-promise").Promise;
module.exports = function promiseTry(func) {
return new Promise(function(resolve, reject) {
resolve(func());
})
}
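A usage sketch of the above; note how the synchronous `JSON.parse` error ends up as a rejection, just like it would with Bluebird's `Promise.try` (the file name in the `require` is an assumption about where you saved the module):

```js
var promiseTry = require("./promise-try");

var untrustedInput = "{ this is not valid JSON";

promiseTry(function () {
  return JSON.parse(untrustedInput); // throws synchronously
}).then(function (parsed) {
  console.log("Parsed:", parsed);
}).catch(function (err) {
  console.log("The chain was rejected:", err.message);
});
```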
Please don't include minified builds in your npm packages!
This article was originally published at https://gist.github.com/joepie91/04cc8329df231ea3e262dffe3d41f848.
There's quite a few libraries on npm that not only include the regular build in their package, but also a minified build. While this may seem like a helpful addition to make the package more complete, it actually poses a real problem: it becomes very difficult to audit these libraries.
The problem
You've probably seen incidents like the `event-stream` incident, where a library was compromised in some way by an attacker. This sort of thing, also known as a "supply-chain attack", is starting to become more and more common - and it's something that developers need to protect themselves against.
One effective way to do so, is by auditing dependencies. Having at least a cursory look through every dependency in your dependency tree, to ensure that there's nothing sketchy in there. While it isn't going to be 100% perfect, it will detect most of these attacks - and not only is briefly reviewing dependencies still faster than reinventing your own wheels, it'll also give you more insight into how your application actually works under the hood.
But, there's a problem: a lot of packages include almost-duplicate builds, sometimes even minified ones. It's becoming increasingly common to see a separate CommonJS and ESM build, but in many cases there's a minified build included too. And those are basically impossible to audit! Even with a code beautifier, it's very difficult to understand what's really going on. But you can't ignore them either, because if they are a part of the package, then other code can require them. So you have to audit them.
There's a workaround for this, in the form of "reproducing" the build; taking the original (Git) repository for the package which only contains the original code and not the minified code, checking out the intended version, and then just running a build that creates the minified version, which you can then compare to the one on npm. If they match, then you can assume that you only need to audit the original source in the Git repo.
Or well, that would be the case, if it weren't possible for the build tools to introduce malicious code as well. Argh! Now you need to audit all of the build tools being used as well, at the specific versions that are being used by each dependency. Basically, you're now auditing hundreds of build stacks. This is a massive waste of time for every developer who wants to make sure there's nothing sketchy in their dependencies!
All the while these minified builds don't really solve a problem. Which brings me to...
Why it's unnecessary to include minified builds
Broadly speaking, there are two demographics of developers who might use your library:
- Those who just want a file they can include as a `<script>` tag, so that they can use your library in their (often legacy) module-less code.
- Those with a more modern development stack, including a package manager (npm) and often also build tooling.
For the first demographic, it makes a lot of sense to provide a pre-minified build, as they are going to directly include it in their site, and it should ideally be small. But, here's the rub: those are also the developers who probably aren't using (or don't want to use) a package manager like npm! There's not really a reason why their minified pre-build should exist on npm, specifically - you might just as well offer it as a separate download.
For the second demographic, a pre-minified build isn't really useful at all. They probably already have their own development stack that does minification (of their own code and dependencies), and so they simply won't be using your minified build.
In short: there's not really a point to having a minified build in your npm package.
The solution
Simply put: don't include minified files in your npm package - distribute them separately, instead. In most cases, you can just put it on your project's website, or even in the (Git) repository.
If you really do have some specific reason to need to distribute them through npm, at least put them in a separate package (eg. `yourpackage-minified`), so that only those who actually use the minified version need to add it to their dependency folder.
Ideally, try to only have a single copy of your code in your package at all - so also no separate CommonJS and ESM builds, for example. CommonJS works basically everywhere, and there's basically no reason to use ESM anyway, so this should be fine for most projects.
If you really must include an ESM version of your code, you should at least use a wrapping approach instead of duplicating the code (note that this can be a breaking change!). But if you can, please leave it out to make it easier for developers to understand what they are installing into their project!
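For illustration, a minimal sketch of such an ESM wrapper file (assuming a hypothetical package whose CommonJS entrypoint is `index.js`; the exact file layout depends on your package):
// wrapper.mjs - a thin ESM wrapper that re-exports the existing CommonJS build,
// instead of shipping a second, duplicated copy of the code.
import cjs from "./index.js";

export default cjs;
// Individual named exports can be re-exported as well, eg.:
// export const { someFunction } = cjs;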
Anyone should be able to audit and review their dependencies, not just large companies with deep pockets; and not including unnecessarily duplicated or obfuscated code in your packages will go a long way towards making that possible. Thanks!
How to get the actual width of an element in jQuery, even with border-box: box-sizing
This article was originally published at https://gist.github.com/joepie91/5ffffefbf24dcfdb4477.
This is ridiculous, but per the jQuery documentation:
Note that `.width()` will always return the content width, regardless of the value of the CSS `box-sizing` property. As of jQuery 1.8, this may require retrieving the CSS width plus the `box-sizing` property and then subtracting any potential border and padding on each element when the element has `box-sizing: border-box`. To avoid this penalty, use `.css( "width" )` rather than `.width()`.
function parsePx(input) {
    let match;

    // Matches pixel values like "12px" or "12.5px".
    if (match = /^([0-9]+(?:\.[0-9]+)?)px$/.exec(input)) {
        return parseFloat(match[1]);
    } else {
        throw new Error("Value is not in pixels!");
    }
}

$.prototype.actualWidth = function() {
    /* WTF, jQuery? */
    let isBorderBox = (this.css("box-sizing") === "border-box");
    let width = this.width(); // always the content width, per the jQuery documentation quoted above

    if (isBorderBox) {
        // For border-box elements, add the padding and borders back in to get the full width.
        width = width
            + parsePx(this.css("padding-left"))
            + parsePx(this.css("padding-right"))
            + parsePx(this.css("border-left-width"))
            + parsePx(this.css("border-right-width"));
    }

    return width;
}
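Usage is then like any other jQuery method - for example, with a hypothetical `#sidebar` element:
// Returns the width including padding and borders for border-box elements,
// and behaves like .width() for content-box elements.
let sidebarWidth = $("#sidebar").actualWidth();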
A survey of unhandledRejection and rejectionHandled handlers
This article was originally published at https://gist.github.com/joepie91/06cca7058a34398f168b08223b642162.
Bluebird (http://bluebirdjs.com/docs/api/error-management-configuration.html#global-rejection-events)
- `process.on("unhandledRejection")`: (Node.js) Potentially unhandled rejection.
- `process.on("rejectionHandled")`: (Node.js) Cancel unhandled rejection, it was handled anyway.
- `self.addEventListener("unhandledrejection")`: (WebWorkers) Potentially unhandled rejection.
- `self.addEventListener("rejectionhandled")`: (WebWorkers) Cancel unhandled rejection, it was handled anyway.
- `window.addEventListener("unhandledrejection")`: (Modern browsers, IE >= 9) Potentially unhandled rejection.
- `window.addEventListener("rejectionhandled")`: (Modern browsers, IE >= 9) Cancel unhandled rejection, it was handled anyway.
- `window.onunhandledrejection`: (IE >= 6) Potentially unhandled rejection.
- `window.onrejectionhandled`: (IE >= 6) Cancel unhandled rejection, it was handled anyway.
WhenJS (https://github.com/cujojs/when/blob/3.7.0/docs/debug-api.md)
- `process.on("unhandledRejection")`: (Node.js) Potentially unhandled rejection.
- `process.on("rejectionHandled")`: (Node.js) Cancel unhandled rejection, it was handled anyway.
- `window.addEventListener("unhandledRejection")`: (Modern browsers, IE >= 9) Potentially unhandled rejection.
- `window.addEventListener("rejectionHandled")`: (Modern browsers, IE >= 9) Cancel unhandled rejection, it was handled anyway.
Spec (https://gist.github.com/benjamingr/0237932cee84712951a2)
- `process.on("unhandledRejection")`: (Node.js) Potentially unhandled rejection.
- `process.on("rejectionHandled")`: (Node.js) Cancel unhandled rejection, it was handled anyway.
Spec (WHATWG: https://html.spec.whatwg.org/multipage/webappapis.html#unhandled-promise-rejections)
- `window.addEventListener("unhandledrejection")`: (Browsers) Potentially unhandled rejection.
- `window.addEventListener("rejectionhandled")`: (Browsers) Cancel unhandled rejection, it was handled anyway.
- `window.onunhandledrejection`: (Browsers) Potentially unhandled rejection.
- `window.onrejectionhandled`: (Browsers) Cancel unhandled rejection, it was handled anyway.
ES6 Promises in Node.js (https://nodejs.org/api/process.html#process_event_rejectionhandled onwards)
- `process.on("unhandledRejection")`: Potentially unhandled rejection.
- `process.on("rejectionHandled")`: Cancel unhandled rejection, it was handled anyway.
Yaku (https://github.com/ysmood/yaku#unhandled-rejection)
- `process.on("unhandledRejection")`: (Node.js) Potentially unhandled rejection.
- `process.on("rejectionHandled")`: (Node.js) Cancel unhandled rejection, it was handled anyway.
- `window.onunhandledrejection`: (Browsers) Potentially unhandled rejection.
- `window.onrejectionhandled`: (Browsers) Cancel unhandled rejection, it was handled anyway.
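For completeness, here's roughly what hooking into the two Node.js events looks like - a minimal sketch that only logs, whereas a real tracker would want to keep a list of outstanding rejections:
process.on("unhandledRejection", (reason, promise) => {
    // The rejection may still get a handler later; treat this as a warning for now.
    console.warn("Potentially unhandled rejection:", reason);
});

process.on("rejectionHandled", (promise) => {
    // A .catch() was attached after all, so the earlier warning can be disregarded.
    console.warn("Rejection was handled after all.");
});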
Quill.js glossary
This article was originally published at https://gist.github.com/joepie91/46241ef1ce89c74958da0fdd7d04eb55.
Since Quill.js doesn't seem to document its strange jargon-y terms anywhere, here's a glossary that I've put together for it. No guarantees that it's correct! But I've done my best.
Quill - The WYSIWYG editor library
Parchment - The internal model used in Quill to implement the document tree
Scroll - A document, expressed as a tree, technically also a Blot (node) itself, specifically the root node
Blot - A node in the document tree
Block (Blot) - A block-level node
Inline (Blot) - An inline (formatting) node
Text (Blot) - A node that contains only(!) raw text contents
Break (Blot) - A node that contains nothing, used as a placeholder where there is no actual content
"a format" - A specific formatting attribute (width, height, is bold, ...)
`.format(...)` - The API method that is used to set a formatting attribute on some selection
Riot.js cheatsheet
This article was originally published at https://gist.github.com/joepie91/ed3a267de70210b46fb06dd57077827a.
Component styling
This section only applies to Riot.js 2.x. Since 3.x, all styles are scoped by default and you can simply add a `style` tag to your component.
- You can use a `<style>` tag within your tag. This style tag is applied globally by default.
- You can scope your style tag to limit its effect to the component that you've defined it in. Note that scoping is based on the tag name. There are two options:
  - Use the `scoped` attribute, eg. `<style scoped> ... </style>`
  - Use the `:scope` pseudo-selector, eg. `<style> :scope { ... } </style>`
- You can change where global styles are 'injected' by having `<style type="riot"></style>` somewhere in your `<head>`. This is useful for eg. controlling what styles are overridden.
Mounting
"Mounting" is the act of attaching a custom tag's template and behaviour to a specific element in the DOM. The most common case is to mount all instances of a specific top-level tag, but there are more options:
- Mount all custom tags on the page: `riot.mount("*")`
- Mount all instances of a specific tag name: `riot.mount("app")`
- Mount a tag with a specific ID: `riot.mount("#specific_element")`
- Mount using a more complex selector: `riot.mount("foo, bar")`
Note that "child tags" (that is, custom tags that are specified within other custom tags) are automatically mounted as-needed. You do not need to riot.mount
them separately.
The simplest example:
<script>
    // Load the `app` tag's definition here somehow...
    document.addEventListener("DOMContentLoaded", (event) => {
        riot.mount("app");
    });
</script>

<app></app>
Tag logic
- Conditionally add to DOM: `<your-tag if="{ something === true }"> ... </your-tag>`
- Conditionally display: `<your-tag show="{ something === true }"> ... </your-tag>` (but the tag always exists in the DOM)
- Conditionally hide: `<your-tag hide="{ something === true }"> ... </your-tag>` (but the tag always exists in the DOM)
- For-each loop: `<your-tag for="{ item in items }"> ... (you can access 'item' from within the tag) ... </your-tag>` (one instance of `your-tag` for each `item` in `items`)
- For-each loop of an object: `<your-tag for="{ key, value in someObject }"> ... (you can access 'key' and 'value' from within the tag) ... </your-tag>` (this is slow!)
All of the above also work on regular (ie. non-Riot) HTML tags.
If you need to add/hide/display/loop a group of tags, rather than a single one, you can wrap them in a `<virtual>` pseudo-tag. This works with all of the above constructs. For example:
<virtual for="{item in items}">
    <label>{item.label}</label>
    <textarea>{item.defaultValue}</textarea>
</virtual>
Quick reference for `checkit` validators
This article was originally published at https://gist.github.com/joepie91/cd107b3a566264b28a3494689d73e589.
Presence
- exists - The field must exist, and not be `undefined`.
- required - The field must exist, and not be `undefined`, `null` or an empty string.
- empty - The field must be some kind of "empty". Things that are considered "empty" are as follows:
  - `""` (empty string)
  - `[]` (empty array)
  - `{}` (empty object)
  - Other falsey values
Character set
- alpha - `a-z`, `A-Z`
- alphaNumeric - `a-z`, `A-Z`, `0-9`
- alphaUnderscore - `a-z`, `A-Z`, `0-9`, `_`
- alphaDash - `a-z`, `A-Z`, `0-9`, `_`, `-`
Value
- exactLength:`length` - The value must have a length of exactly `length`.
- minLength:`length` - The value must have a length of at least `length`.
- maxLength:`length` - The value must have a length of at most `length`.
- contains:`needle` - The value must contain the specified `needle` (applies to both strings and arrays).
- accepted - Must be a value that indicates agreement - varies by language (defaulting to `en`):
  - en, fr, nl - `"yes"`, `"on"`, `"1"`, `1`, `"true"`, `true`
  - es - `"yes"`, `"on"`, `"1"`, `1`, `"true"`, `true`, `"si"`
  - ru - `"yes"`, `"on"`, `"1"`, `1`, `"true"`, `true`, `"да"`
Value (numbers)
Note that "numbers" refers to both Number-type values, and strings containing numeric values!
- numeric - Must be a finite numeric value of some sort.
- integer - Must be an integer value (either positive or negative).
- natural - Must be a natural number (ie. an integer value of 0 or higher).
- naturalNonZero - Must be a natural number, but higher than 0 (ie. an integer value of 1 or higher).
- between:`min`:`max` - The value must numerically be between the `min` and `max` values (exclusive).
- range:`min`:`max` - The value must numerically be within the `min` and `max` values (inclusive).
- lessThan:`maxValue` - The value must numerically be less than the specified `maxValue` (exclusive).
- lessThanEqualTo:`maxValue` - The value must numerically be less than or equal to the specified `maxValue` (inclusive).
- greaterThan:`minValue` - The value must numerically be greater than the specified `minValue` (exclusive).
- greaterThanEqualTo:`minValue` - The value must numerically be greater than or equal to the specified `minValue` (inclusive).
Relations to other fields
- matchesField:`field` - The value in this field must equal the value in the specified other `field`.
- different:`field` - The value in this field must not equal the value in the specified other `field`.
JavaScript types
- NaN - Must be `NaN`.
- null - Must be `null`.
- string - Must be a `String`.
- number - Must be a `Number`.
- array - Must be an `Array`.
- plainObject - Must be a plain `object` (ie. an object literal).
- date - Must be a `Date` object.
- function - Must be a `Function`.
- regExp - Must be a `RegExp` object.
- arguments - Must be an `arguments` object.
Format
- email - Must be a validly formatted e-mail address.
- luhn - Must be a validly formatted credit card number (according to a Luhn regular expression).
- url - Must be a validly formatted URL.
- ipv4 - Must be a validly formatted IPv4 address.
- ipv6 - Must be a validly formatted IPv6 address.
- uuid - Must be a validly formatted UUID.
- base64 - Must be a validly formatted base64 string.
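As a rough usage sketch (this assumes `checkit`'s `new Checkit(rules).run(input)` API, which returns a Promise; the field names below are made up):
const Checkit = require("checkit");

// Each field maps to one or more of the validators listed above; validators
// that take arguments use the colon syntax, eg. minLength:8.
const checkit = new Checkit({
    email: ["required", "email"],
    password: ["required", "minLength:8"],
    age: ["natural"]
});

checkit.run({ email: "user@example.com", password: "correct horse battery", age: 30 })
    .then((validated) => {
        // All validators passed.
    })
    .catch((error) => {
        // One or more validators failed; the error describes which ones.
    });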
ES Modules are terrible, actually
This post was originally published at https://gist.github.com/joepie91/bca2fda868c1e8b2c2caf76af7dfcad3, which was in turn adapted from an earlier Twitter thread.
It's incredible how many collective developer hours have been wasted on pushing through the turd that is ES Modules (often mistakenly called "ES6 Modules"). Causing a big ecosystem divide and massive tooling support issues, for... well, no reason, really. There are no actual advantages to it. At all.
It looks shiny and new and some libraries use it in their documentation without any explanation, so people assume that it's the new thing that must be used. And then I end up having to explain to them why, unlike CommonJS, it doesn't actually work everywhere yet, and may never do so. For example, you can't import ESM modules from a CommonJS file! (Update: I've released a module that works around this issue.)
And then there's Rollup, which apparently requires ESM to be used, at least to get things like treeshaking. Which then makes people believe that treeshaking is not possible with CommonJS modules. Well, it is - Rollup just chose not to support it.
And then there's Babel, which tried to transpile `import`/`export` to `require`/`module.exports`, sidestepping the ongoing effort of standardizing the module semantics for ESM, causing broken imports and `require("foo").default` nonsense and spec design issues all over the place.
And then people go "but you can use ESM in browsers without a build step!", apparently not realizing that that is an utterly useless feature because loading a full dependency tree over the network would be unreasonably and unavoidably slow - you'd need as many roundtrips as there are levels of depth in your dependency tree - and so you need some kind of build step anyway, eliminating this entire supposed benefit.
And then people go "well you can statically analyze it better!", apparently not realizing that ESM doesn't actually change any of the JS semantics other than the import
/export
syntax, and that the import
/export
statements are equally analyzable as top-level require
/module.exports
.
"But in CommonJS you can use those elsewhere too, and that breaks static analyzers!", I hear you say. Well, yes, absolutely. But that is inherent in dynamic imports, which by the way, ESM also supports with its dynamic import()
syntax. So it doesn't solve that either! Any static analyzer still needs to deal with the case of dynamic imports somehow - it's just rearranging deck chairs on the Titanic.
And then, people go "but now we at least have a standard module system!", apparently not realizing that CommonJS was literally that, the result of an attempt to standardize the various competing module systems in JS. Which, against all odds, actually succeeded!
... and then promptly got destroyed by ESM, which reintroduced a split and all sorts of incompatibility in the ecosystem, rather than just importing some updated variant of CommonJS into the language specification, which would have sidestepped almost all of these issues.
And while the initial CommonJS standardization effort succeeded due to none of the competing module systems being in particularly widespread use yet, CommonJS is so ubiquitous in Javascript-land nowadays that it will never fully go away. Which means that runtimes will forever have to keep supporting two module systems, and developers will forever be paying the cost of the interoperability issues between them.
But it's the future!
Is it really? The vast majority of people who believe they're currently using ESM, aren't even actually doing so - they're feeding their entire codebase through Babel, which deftly converts all of those snazzy `import` and `export` statements back into CommonJS syntax. Which works. So what's the point of the new module system again, if it all works with CommonJS anyway?
And it gets worse; `import` and `export` are designed as special-cased statements. Aside from the obvious problem of needing to learn a special syntax (which doesn't quite work like object destructuring) instead of reusing core language concepts, this is also a downgrade from CommonJS' `require`, which is a first-class expression due to just being a function call.
That might sound irrelevant on the face of it, but it has very real consequences. For example, the following pattern is simply not possible with ESM:
const someInitializedModule = require("module-name")(someOptions);
Or how about this one? Also no longer possible:
const app = express();
// ...
app.use("/users", require("./routers/users"));
Having language features available as a first-class expression is one of the most desirable properties in language design; yet for some completely unclear reason, ESM proponents decided to remove that property. There's just no way anymore to directly combine an `import` statement with some other JS syntax, whether or not the module path is statically specified.
The only way around this is with `await import`, which would break the supposed static analyzer benefits, only work in async contexts, and even then require weird hacks with parentheses to make it work correctly.
It also means that you now need to make a choice: do you want to be able to use ESM-only dependencies, or do you want to have access to patterns like the above that help you keep your codebase maintainable? ESM or maintainability, your choice!
So, congratulations, ESM proponents. You've destroyed a successful userland specification, wasted many (hundreds of?) thousands of hours of collective developer time, many hours of my own personal unpaid time trying to support people with the fallout, and created ecosystem fragmentation that will never go away, in exchange for... fuck all.
This is a disaster, and the only remaining way I see to fix it is to stop trying to make ESM happen, and deprecate it in favour of some variant of CommonJS modules being absorbed into the spec. It's not too late yet; but at some point it will be.
A few notes on the "Gathering weak npm credentials" article
This article was originally published in 2017 at https://gist.github.com/joepie91/828532657d23d512d76c1e68b101f436. Since then, npm has implemented 2FA support in the registry, and was acquired by Microsoft through Github.
Yesterday, an article was released that describes how one person could obtain access to enough packages on npm to affect 52% of the package installations in the Node.js ecosystem. Unfortunately, this has brought about some comments from readers that completely miss the mark, and that draw away attention from the real issue behind all this.
To be very clear: This (security) issue was caused by 1) poor password management on the side of developers, 2) handing out unnecessary publish access to packages, and most of all 3) poor security on the side of the npm registry.
With that being said, let's address some of the common claims. This is going to be slightly ranty, because to be honest I'm rather disappointed that otherwise competent infosec people distract from the underlying causes like this. All that's going to do is prevent this from getting fixed in other language package registries, which almost certainly suffer from the same issues.
"This is what you get when you use small dependencies, because there are such long dependency chains"
This is very unlikely to be a relevant factor here. Don't forget that a key part of the problem here is that publisher access is handed out unnecessarily; if the Node.js ecosystem were to consist of a few large dependencies (that everybody used) instead of many small ones (that are only used by those who actually need the entire dependency), you'd just end up with each large dependency being responsible for a larger part of the 52%.
There's a potential point of discussion in that a modular ecosystem means that more different groups of people are involved in the implementation of a given dependency, and that this could provide for a larger (human) attack surface; however, this is a completely unexplored argument for which no data currently exists, and this particular article does not provide sufficient evidence to show it to be true.
Perhaps not surprisingly, the "it's because of small dependencies" argument seems to come primarily from people who don't fully understand the Node.js dependency model and make a lot of (incorrect) assumptions about its consequences, and who appear to take every opportunity to blame things on "small dependencies" regardless of technical accuracy.
In short: No, this is not because of small dependencies. It would very likely happen with large dependencies as well.
"See, that's why you should always lock your dependency versions. This is why semantic versioning is bad."
Aside from semantic versioning being a practice that's separate from automatically updating based on a semver range, preventing automatic updates isn't going to prevent this issue either. The problem here is with publish access to the modules, which is a completely separate concern from "how the obtained access is misused".
In practice, most people who "lock dependency versions" seem to follow a practice of "automatically merge any update that doesn't break tests" - which really is no different from just letting semver ranges do their thing. Even if you do audit updates before you apply them (and let's be realistic, how many people actually do this for every update?), it would be trivial to subtly backdoor most of the affected packages due to their often aging and messy codebase, where one more bit of strange code doesn't really stand out.
The chances of locked dependencies preventing exploitation are close to zero. Even if you do audit your updates, it's relatively trivial for a competent developer to sneak by a backdoor. At the same time, "people not applying updates" is a far bigger security issue than audit-less dependency locking will solve.
All this applies to "vendoring in dependencies", too - vendoring in dependencies is not technically different from pinning a version/hash of a dependency.
In short: No, dependency locking will not prevent exploitation through this vector. Unless you have a strict auditing process (which you should, but many do not), you should not lock dependency versions.
"That's why you should be able to add a hash to your package.json, so that it verifies the integrity of the dependency.
This solves a completely different and almost unimportant problem. The only thing that a package hash will do is ensure that everybody who installs the dependencies gets the exact same dependencies (for a locked set of versions). However, the npm registry already does that - it prevents republishing different code under an already-used version number, and even with publisher access you cannot bypass that.
Package hashes also give you absolutely zero assurances about future updates; package hashes are not signatures.
In short: This just doesn't even have anything to do with the credentials issue. It's totally unrelated.
"See? This is why Node.js is bad."
Unfortunately plenty of people are conveniently using this article as an excuse to complain about Node.js (because that's apparently the hip thing to do?), without bothering to understand what happened. Very simply put: this issue is not in any way specific to Node.js. The issue here is an issue of developers with poor password policies and poor registry access controls. It just so happens that the research was done on npm.
As far as I am aware, this kind of research has not been carried out for any other language package registries - but many other registries appear to be similarly poorly monitored and secured, and are very likely to be subject to the exact same attack.
If you're using this as an excuse to complain about Node.js, without bothering to understand the issue well enough to realize that it's a language-independent issue, then perhaps you should reconsider exactly how well-informed your point of view of Node.js (or other tools, for that matter) really is. Instead, you should take this as a lesson and prevent this from happening in other language ecosystems.
In short: This has absolutely nothing to do with Node.js specifically. That's just where the research happens to be done. Take the advice and start looking at other language package registries, to ensure they are not vulnerable to this either.
So then how should I fix this?
- Demand from npm Inc. that they prioritize implementing 2FA immediately, actively monitor for incidents like this, and generally implement all the mitigations suggested in the article. It's really not reasonable how poorly monitored or secured the registry is, especially given that it's operated by a commercial organization, and it's been around for a long time.
- If you have an npm account, follow the instructions here.
- Carry out or encourage the same kind of research on the package registry for your favorite language. It's very likely that other package registries are similarly insecure and poorly monitored.
Unfortunately, as a mere consumer of packages, there's nothing you can do about this other than demanding that npm Inc. gets their registry security in order. This is fundamentally an infrastructure problem.
Node.js
Things that are specific to Node.js. Note that things about Javascript in general, are found under their own "Javascript" chapter!
How to install Node.js applications, if you're not a Node.js developer
This article was originally published at https://gist.github.com/joepie91/24f4e70174d10325a9af743a381d5ec6.
While installing a Node.js application isn't difficult in principle, it may still be confusing if you're not used to how the Node.js ecosystem works. This post will tell you how to get the application going, what to expect, and what to do if it doesn't work.
Occasionally an application may have custom installation steps, such as installing special system-wide dependencies; in those cases, you'll want to have a look at the install documentation of the application itself as well. However, most of the time it's safe to assume that the instructions below will work fine.
If the application you want to install is available in your distribution's repositories, then install it through there instead and skip this entire guide; your distribution's package manager will take care of all the dependencies.
Checklist
Before installing a Node.js application, check the following things:
- You're running a maintained version of Node.js. You can find a list of current maintained versions here. For minimal upgrade headaches, ensure that you're running an LTS version. If your system is running an unsupported version, you should install Node.js from the Nodesource repositories instead.
- Your version of Node.js is a standard one. In particular, Debian and some Debian-based distributions have a habit of modifying the way Node.js works, leading to a lot of things breaking. Try running `node --version` - if that works, you're running a standard-enough version. If you can only do `nodejs --version`, you should install Node.js from the Nodesource repositories instead.
- You have build tools installed. In particular, you'll want to make sure that `make`, `pkgconfig`, GCC and Python exist on your system. If you don't have build tools or you're unsure, you'll want to install a package like `build-essential` (on Linux) or look here for further instructions (on other platforms, or unusual Linux distributions).
- npm works. Run `npm --version` to check this. If the `npm` command doesn't exist, your distribution is probably shipping a weird non-standard version of Node.js; use the Nodesource repositories instead. Do not install npm as a separate package, as this will lead to headaches down the road.
No root/administrator access, no repositories exist for your distro, can't change your system-wide Node.js version, need a really specific Node.js version to make the application work, or have some other sort of edge case? Then nvm can be a useful solution, although keep in mind that it will not automatically update your Node.js installation.
How packages work in Node.js
Packages work a little differently in Node.js from most languages and distributions. In particular, dependencies are not installed system-wide. Every project has its own (nested) set of dependencies. This solves a lot of package management problems, but it can take a little getting used to if you're used to other systems.
In practice, this means that you should almost always do a regular `npm install` - that is, installing the dependencies locally into the project. The only time you need to do a 'global installation' (using `npm install -g packagename`) is when you're installing an application that is itself published on npm, and you want it to be available globally on your system.
This also means that you should not run npm as root by default. This is a really important thing to internalize, or you'll run into trouble down the line.
To recap:
- Run npm under your own, unprivileged user - unless instructions specifically state that you should run it as root.
- Run npm in 'local' mode, installing dependencies into the project folder - unless instructions specifically state that you should do a global installation.
If you're curious about the details of packages in Node.js, here is a developer-focused article about them.
Installing an application from the npm registry
Is the application published on the npm registry, ie. does it have a page on `npmjs.org`? Great! That means that installation is a single command.
If you've installed Node.js through your distribution's package manager: `sudo npm install -g packagename`, where `packagename` is the name of the package on npm.
If you've installed Node.js through `nvm` or a similar tool: `npm install -g packagename`, where `packagename` is the name of the package on npm.
You'll notice that you need to run the command as root (eg. through `sudo`) when installing Node.js through your distribution's package manager, but not when installing it through `nvm`.
This is because by default, Node.js will use a system-wide folder for globally installed packages; but under `nvm`, your entire Node.js installation exists in a subdirectory of your unprivileged user's home directory - including the 'global packages' folder.
After following these steps, some new binaries will probably be available for you to use system-wide. If the application's documentation doesn't tell you what binaries are available, then you should find its code repository, and look at the `"bin"` key in its `package.json`; that will contain a list of all the binaries it provides. Running them with `--help` will probably give you documentation.
You're done!
If you run into a problem: Scroll down to the 'troubleshooting' section.
Installing an application from a repository
Some applications are not published to the npm registry, and instead you're expected to install it from the code (eg. Git) repository. In those cases, start by looking at the application's install instructions to see if there are special requirements for cloning the repository, like eg. checking out submodules.
If there are no special instructions, then a simple `git clone http://example.com/path/to/repository` should work, replacing the URL with the cloning URL of the repository.
Making it available globally (like when installing from the npm registry)
Enter the cloned folder, and then run:
- If you installed Node.js from your distribution's repositories: `sudo npm install -g`, with no other arguments.
- If you installed Node.js through `nvm` or a similar tool: `npm install -g`, with no other arguments.
You're done!
If you run into a problem: Scroll down to the 'troubleshooting' section.
Keeping it in the repository
Sometimes you don't want to really install the application onto your system, but you rather just want to get it running locally from the repository.
In that case, enter the cloned folder, and run: `npm install`, with no other arguments.
You're done!
If you run into a problem: Scroll down to the 'troubleshooting' section.
Troubleshooting
Sometimes, things still won't work. In most cases it'll be a matter of missing some sort of undocumented external dependency, ie. a dependency that npm can't manage for you and that's typically provided by the OS. Sometimes it's a version compatibility issue. Occasionally applications are just outright broken.
When running into trouble with npm, try entering your installation output into this tool first. It's able to (fully automatically!) recognize the most common issues that people tend to run into with npm.
If the tool can't find your issue and it still doesn't work, then drop by the IRC channel (#Node.js on Libera, an online chat can be found here) and we'll be happy to help you get things going! You do need to register your username to talk in the channel; you can get help with this in the #libera channel.
Getting started with Node.js
This article was originally published at https://gist.github.com/joepie91/95ed77b71790442b7e61. Some of the links in it still point to Gists that I have written; these will be moved over and relinked in due time.
Some of the suggestions on this page have become outdated, and better alternatives are available nowadays. However, the suggestions listed here should still work today as they did when this article was originally written. You do not need to update things to new approaches, and sometimes the newer approaches actually aren't better either, they can even be worse!
"How do I get started with Node?" is a commonly heard question in #Node.js. This gist is an attempt to compile some of the answers to that question. It's a perpetual work-in-progress.
And if this list didn't quite answer your questions, I'm available for tutoring and code review! A donation is also welcome :)
Setting expectations
Before you get started learning about JavaScript and Node.js, there's one very important article you need to read: Teach Yourself Programming in Ten Years.
Understand that it's going to take time to learn Node.js, just like it would take time to learn any other specialized topic - and that you're not going to learn effectively just by reading things, or following tutorials or courses. Get out there and build things! Experience is by far the most important part of learning, and shortcuts to this simply do not exist.
Avoid "bootcamps", courses, extensive books, and basically anything else that claims to teach you programming (or Node.js) in a single run. They all lie, and what they promise you simply isn't possible. That's also the reason this post is a list of resources, rather than a single book - they're references for when you need to learn about a certain topic at a certain point in time. Nothing more, nothing less.
There's also no such thing as a "definitive guide to Node.js", or a "perfect stack". Every project is going to have different requirements, that are best solved by different tools. There's no point in trying to learn everything upfront, because you can't know what you need to learn, until you actually need it.
In conclusion, the best way to get started with Node.js is to simply decide on a project you want to build, and start working on it. Start with the simplest possible implementation of it, and over time add bits and pieces to it, learning about those bits and pieces as you go. The links in this post will help you with that.
You'll find a table of contents for this page on your left.
Javascript refresher
Especially if you normally use a different language, or you only use Javascript occasionally, it's easy to misunderstand some of the aspects of the language.
These links will help you refresh your knowledge of JS, and make sure that you understand the OOP model correctly.
- A whirlwind tour of the language: http://learnxinyminutes.com/docs/javascript/
- Javascript is asynchronous, through using an 'event loop'. This video explains what an event loop is, and this video goes into more detail about how it works and how to deal with corner cases. If you're not familiar with the event loop yet, you should watch both.
- Javascript does automatic typecasting ("type conversion") in some cases. This shows how various values cast to a boolean, and this shows how `null` and `undefined` relate to each other.
- In Javascript, braces are optional for single-line statements - however, you should always use them. This gist demonstrates why.
- Asynchronous execution in Javascript is normally implemented using CPS. This stands for "continuation-passing style", and this shows an example of how that works.
- However, in practice, you shouldn't use that, and you should use Promises instead. Whereas it is very easy to mess up CPS code, that is not an issue with Promises - error handling is much more reliable, for example. This guide should give you a decent introduction.
- A callback should be either consistently synchronous, or consistently asynchronous. You don't really have to worry about this when you're using Promises (as they ensure that this is consistent), but this article still has a good explanation of the reasons for this. A simpler example can be found here.
- Javascript does not have classes, and constructor functions are a bad idea. This short article will help you understand the prototypical OOP model that Javascript uses. This gist shows a brief example of what the `this` variable refers to. Often you don't need inheritance at all - this gist shows an example of creating an object in the simplest possible way.
- In Javascript, closures are everywhere, by default. This gist shows an example, and a small sketch follows right after this list.
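A minimal closure sketch, to illustrate that last point:
function createCounter() {
    let count = 0; // captured by the inner function below

    return function increment() {
        count += 1; // still accessible here, even after createCounter has returned
        return count;
    };
}

let counter = createCounter();
counter(); // 1
counter(); // 2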
The Node.js platform
Node.js is not a language. Rather, it's a "runtime" that lets you run Javascript without a browser. It comes with some basic additions such as a TCP library - or rather, in Node.js-speak, a "TCP module" - that you need to write server applications.
- The easiest way to install Node.js on Linux and OS X, is to use `nvm`. The instructions for that can be found here. Make sure you create a `default` alias (as explained in the documentation), if you want it to work like a 'normal' installation.
- If you are using Windows: You can download an installer from the Node.js website. You should consider using a different operating system, though - Windows is generally rather poorly suited for software development outside of .NET. Things will be a lot easier if you use Linux or OS X.
- The package manager you'll use for Node.js, is called NPM. While it's very simple to use, it's not particularly well-documented. This article will give you an introduction to it.
- Don't hesitate to add dependencies, even small ones! Node.js and NPM are specifically designed to make this possible without running into issues, and you will get big benefits from doing so. This post explains more about that.
- The module system is very simple. The Node.js documentation explains this further.
- MongoDB is commonly recommended and used with Node.js. It is, however, extremely poorly designed - and you shouldn't use it. This article goes into more detail about why you shouldn't use it. If you're not sure what to use, use PostgreSQL.
- The rest of the documentation for all the modules included with Node.js, can be found here.
Setting up your environment
- To be able to install "native addons" (compiled C++ modules), you need to take some additional steps. If you are on Linux or OS X, you likely already have everything you need - however, on Windows you'll have to install a few additional pieces of software. The instructions for all of these platforms can be found here. Do not skip this step. Installing pure-Javascript modules is not always a viable solution, especially where it concerns cryptography-related modules such as `scrypt` or `bcrypt`.
- If you're running into issues on Windows, try these instructions from Microsoft.
- There are a lot of build tools for helping you manage your code. It can get a bit confusing, though - there are a lot of articles that just tell you to combine a pile of different tools, without ever explaining what they're for. This is a hype-free overview of different kinds of build tools, and what they may be useful for.
Functional programming
Javascript has part of its roots in functional programming languages, which means that you can use some of those concepts in your own projects. They can be greatly beneficial to the readability and maintainability of your code.
- This article gives an introduction to `map`, `filter` and `reduce` - three functional programming operations that help a lot in writing maintainable and predictable code (a small sketch follows after this list).
- This gist shows an example of using those with Bluebird, the Promises library that I recommended in the Promises Reading Guide.
- This slide deck demonstrates currying in Javascript, another functional programming technique - think of them as "partially executed functions".
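Here's the small sketch of those three operations mentioned above (plain Javascript, no libraries involved):
let orders = [
    { product: "widget", quantity: 2, price: 10 },
    { product: "gadget", quantity: 0, price: 25 },
    { product: "doodad", quantity: 1, price: 5 }
];

let total = orders
    .filter((order) => order.quantity > 0)          // drop empty orders
    .map((order) => order.quantity * order.price)   // per-order totals
    .reduce((sum, amount) => sum + amount, 0);      // add them all up

console.log(total); // 25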
Module patterns
To build "configurable" modules, you can use a pattern known as "parametric modules". This gist shows an example of that. This is another example.
A commonly used pattern is the `EventEmitter` - this is exactly what it sounds like; an object that emits events. It's a very simple abstraction, but helps greatly in writing loosely coupled code. This gist illustrates the object, and the full documentation can be found here.
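A minimal sketch of the idea (the event name and payload are made up for illustration):
const EventEmitter = require("events");

let downloader = new EventEmitter();

// Subscribers don't need to be known in advance; they just listen for events.
downloader.on("done", (filename) => {
    console.log(`Finished downloading ${filename}`);
});

// Somewhere else in the code, once the work is finished:
downloader.emit("done", "example.zip");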
Code architecture
The 'design' of your codebase matters a lot. Certain approaches for solving a problem work better than other approaches, and each approach has its own set of benefits and drawbacks. Picking the right approach is important - it will save you hours (or days!) of time down the line, when you are maintaining your code.
I'm still in the process of writing more about this, but so far, I've already written an article that explains the difference between monolithic and modular code and why it matters. You can read it here.
Express
If you want to build a website or web application, you'll probably find Express to be a good framework to start with. As a framework, it is very small. It only provides you with the basic necessities - everything else is a plugin.
If this sounds complicated, don't worry - things almost always work "out of the box". Simply follow the `README` for whichever "middleware" (Express plugin) you want to add.
To get started with Express, simply follow the below articles. Whatever you do, don't use the Express Generator - it generates confusing and bloated code. Just start from scratch and follow the guides!
- Installing Express (some of this was already covered in the NPM guide above)
- A Hello World example
- Routing
- Using template engines
- Writing Middleware
- Using Middleware
- Static File Handling (this is middleware, too!)
- Error Handling
- Debugging
To get a better handle on how to render pages server-side with Express:
- Rendering pages server-side with Express (and Pug) (a step-by-step walkthrough, work in progress)
Some more odds and ends about Express:
- Some FAQs (don't use MVC, however - this is why.)
- Express Behind Proxies
- The full Express API documentation
Some examples:
- Making something "globally available" in an Express application
- Writing configurable middleware (using the same technique as the parametric modules I showed earlier)
Combining Express and Promises:
- A short article explaining how to use `express-promise-router`
- An example, also explaining what would happen if you didn't handle errors.
Some common Express middleware that you might want to use:
- Sessions: express-session, with connect-session-knex if you are using Knex.
- Message flashing: connect-flash
- Handling request payloads ("form/POST data"): body-parser
- Handling uploads and other multipart data: multer if you want it written to disk like PHP would do, or connect-busboy if you want to interact with the upload stream directly.
- Access logs: morgan
- OAuth/OpenID integration: Passport
Coming from other languages or platforms
- If you are used to PHP or similar: Contrary to PHP, Node.js does not use a CGI-like model (ie. "one pageload is one script"). Instead, it is a persistent process - your code is the webserver, and it handles many incoming requests at the same time, for as long as the process keeps running. This means you can have persistent state - this gist shows an example of that.
- If you are used to synchronous platforms: This gist illustrates the differences between a (synchronous) PHP script and an (asynchronous) Node.js application.
Security
Note that this advice isn't necessarily complete. It answers some of the most common questions, but your project might have special requirements or caveats. When in doubt, you can always ask in the #Node.js channel!
Also, keep in mind the golden rule of security: humans suck at repetition, regardless of their level of competence. If a mistake can be made, then it will be made. Design your systems such that they are hard to use incorrectly.
- Sessions: Use something that implements session cookies. If you're using Express, express-session will take care of this for you. Whatever you do, don't use JWT for sessions, even if many blog posts recommend it - it will cause security problems. This article goes into more detail.
- Password hashing: Use `scrypt`. This wrapper module will make it easier to use.
- CSRF protection: You need this if you are building a website. Use csurf.
- XSS: Every good templater will escape output by default. Only use templaters that do this (such as Jade or Nunjucks)! If you need to explicitly escape things, you should consider it insecure - it's too easy to forget to do this, and is practically guaranteed to result in vulnerabilities.
- SQL injection: Always use parameterized queries. When using MySQL, use the `node-mysql2` module instead of the `node-mysql` module - the latter doesn't use real parameterized queries. Ideally, use something like Knex, which will also prevent many other issues, and make your queries much more readable and maintainable (a short Knex sketch follows after this list).
- Random numbers and values: Generating unpredictable random numbers is a lot harder than it seems. `Math.random()` will generate numbers that may seem random, but are actually quite predictable to an attacker. If you need random values, read this article for recommendations. It also goes into more detail about the types of "randomness" that exist.
- Cryptography: Follow the suggestions in this gist. Whatever you do, do not use the `crypto` module directly, unless you really have no other choice. Never use pure-Javascript reimplementations - always use bindings to the original implementation, where possible (in the form of native addons).
- Vulnerability advisories: The Node Security Project keeps track of known vulnerabilities in Node.js modules. Services like VersionEye will e-mail you, if your project uses a module that is found vulnerable.
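To illustrate the SQL injection point above, a small sketch of a parameterized query using Knex (the table and column names are made up):
// Knex builds a parameterized query under the hood; the user-supplied `email`
// value is sent separately from the SQL text, so it cannot break out of the query.
function findUserByEmail(knex, email) {
    return knex("users")
        .where({ email: email })
        .first();
}

// Even raw queries should use placeholder bindings rather than string concatenation:
// knex.raw("SELECT * FROM users WHERE email = ?", [email])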
Useful modules:
This is an incomplete list, and I'll probably be adding stuff to it in the future.
- Determining the type of a value: type-of-is
- Date/time handling: Moment.js
- Making HTTP requests: bhttp
- Clean debugging logs: debug
- Cleaner stacktraces and errors: pretty-error
- Markdown parsing: marked
- HTML parsing: cheerio (has a jQuery-like API)
- WebSockets: ws
Deployment
- Don't run Node.js as root, ever! If you want to expose your service at a privileged port (eg. port 80), and you probably do, then you can use authbind to accomplish that safely.
Distribution
- Your project is ready for release! But... you should still pick a license. This article will give you a very basic introduction to copyright, and the different kinds of (common) licenses you can use.
Scalability
Scalability is a result of your application architecture, not the technologies you pick. Be wary of anything that claims to be "scalable" - it's much more important to write loosely coupled code with small components, so that you can split out responsibilities across multiple processes and servers.
Troubleshooting
Is something not working properly? Here are some resources that might help:
- Is `npm install` causing an error? Use this error explaining tool to find out what's wrong.
- `DeprecationWarning: Using Buffer without new will soon stop working.` - the solution for this can be found here.
Optimization
The first rule of optimization is: do not optimize.
The correct order of concerns is security first, then maintainability/readability, and then performance. Optimizing performance is something that you shouldn't care about, until you have hard metrics showing you that it is needed. If you can't show a performance problem in numbers, it doesn't exist; while it is easy to optimize readable code, it's much harder to make optimized code more readable.
There is one exception to this rule: never use any methods that end with `Sync` - these are blocking, synchronous methods, and will block your event loop (ie. your entire application) until they have completed. They may look convenient, but they are not worth the performance penalty.
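For example, prefer the asynchronous variant of filesystem calls - a minimal sketch, reading a hypothetical `config.json`:
const fs = require("fs");

// fs.readFileSync would block the entire event loop until the file is read.
// The asynchronous version lets other requests keep being handled in the meantime:
fs.readFile("./config.json", "utf8", (error, contents) => {
    if (error) {
        throw error;
    }

    let config = JSON.parse(contents);
    // ... use `config` here ...
});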
Now let's say that you are having performance issues. Here are some articles and videos to learn more about how optimization and profiling works in Node.js / V8 - they are going to be fairly in-depth, so you may want to hold off on reading these until you've gotten some practice with Node.js:
- Common causes of deoptimization
- Monomorphism, and why it is important
- Tuning Node.js
- A tour of V8: object representation
- Node.js in flames
- Realtime Node.js App: A Stress Testing Story (using Socket.IO)
- A bigger list of resources about V8 optimization and internals can be found here.
If you're seeing memory leaks, then these may be helpful articles to read:
These are some modules that you may find useful for profiling your application:
- node-inspector: Based on Chrome Developer Tools, this tool gives you many features, including CPU and heap profiling. Also useful for debugging in general. Since Node.js v6.3.0, you can also connect directly using Chrome Developer Tools.
- heapdump: On-demand heap dumps, for later analysis. Usable from application code in production, so very useful for making a heap dump the moment your application goes over a certain heap size.
- memwatch-next: Provides memory leak detection, and heap diffing.
Writing C++ addons
You'll usually want to avoid this - C++ is not a memory-safe language, so it's much safer to just write your code in Javascript. V8 is rather well-optimized, so in most cases, performance isn't a problem either. That said, sometimes - eg. when writing bindings to something else - you just have to write a native module.
These are some resources on that:
- The addon documentation
- `nan`, an abstraction layer for making your module work across Node.js versions (you should absolutely use this)
- `node-gyp`, the build tool you will need for this purpose
- V8 API documentation for every supported Node.js version
Writing Rust addons
Neon is a new project that lets you write memory-safe compiled extensions for Node.js, using Rust. It's still pretty new, but quite promising - an introduction can be found here.
Odds and ends
Some miscellaneous code snippets and examples, that I haven't written a section or article for yet.
- Named logging in Gulp: https://gist.github.com/joepie91/e7d66ffdb17d1ea69c56
- Cached image: https://gist.github.com/joepie91/cee42198b6bc6a24ea44
- Combining Gulp and Electron: https://gist.github.com/joepie91/f81cdbc1b45d52ab4b87
Future additions to this list
There are a few things that I'm currently working on documenting, that will be added to this list in the future. I write new documentation as I find the time to do so.
- Node.js for PHP developers (a migration guide) - In progress.
- A comprehensive guide to Promises - Planned.
- A comprehensive guide to streams - Planned.
- Error handling mechanisms and strategies - Planned.
- Introduction to HTTP - Planned.
- Writing a secure authentication system - Planned.
- Writing abstractions - Planned.
Node.js for PHP developers
This article was originally published at https://gist.github.com/joepie91/87c5b93a5facb4f99d7b2a65f08363db. It has not been finished yet, but still contains some useful pointers.
Learning a second language
If PHP was your first language, and this is the first time you're looking to learn another language, you may be tempted to try and "make it work like it worked in PHP". While understandable, this is a really bad idea. Different languages have fundamentally different designs, with different best practices, different syntax, and so on. The result of this is that different languages are also better for different usecases.
By trying to make one language work like the other, you get the worst of both worlds - you lose the benefits that made language one good for your usecase, and add the design flaws of language two. You should always aim to learn a language properly, including how it is commonly or optimally used. Your code is going to look and feel considerably different, and that's okay!
Over time, you will gain a better understanding of how different language designs carry different tradeoffs, and you'll be able to get the best of both worlds. This will take time, however, and you should always start by learning and using each language as it is first, to gain a full understanding of it.
One thing I explicitly recommend against, is CGI-Node - you should never, ever, ever use this. It makes a lot of grandiose claims, but it actually just reimplements some of the worst and most insecure parts of PHP in Node.js. It is also completely unnecessary - the sections below will go into more detail.
Execution model
The "execution model" of a language describes how your code is executed. In the case of a web-based application, it decides how your server goes from "a HTTP request is coming in", to "the application code is executed", to "a response has been sent".
PHP uses what we'll call the "CGI model" to run your code - for every HTTP request that comes in, the webserver (usually Apache or nginx) will look in your "document root" for a `.php` file with the same path and filename, and then execute that file. This means that for every new request, it effectively starts a new PHP process, with a "clean slate" as far as application state is concerned. Other than `$_SESSION` variables, all the variables in your PHP script are thrown away after a response is sent.
Node.js, however, uses a different model: the "long-running process" model. In this model, your code is not executed by a webserver - rather, your code is the webserver. Your application is only started once, and once it has started, it will be handling an essentially infinite amount of requests, potentially hundreds or thousands at the same time. Almost every other language uses this same model.
This also means that your application state continues to exist after a response has been sent, and this makes a lot of projects much easier to implement, because you don't need to constantly store every little thing in a database; instead, you only need to store things in your database that you actually intend to store for a long time.
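A tiny sketch of what that enables - an in-memory visit counter that survives across requests, something the CGI model cannot do without external storage:
const express = require("express");

let app = express();
let visits = 0; // lives for as long as the process keeps running

app.get("/", (req, res) => {
    visits += 1;
    res.send(`This page has been visited ${visits} times since the server started.`);
});

app.listen(3000);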
Some of the advantages of the "long-running process" model (as compared to the "CGI model"):
The reason attackers cannot upload a shell, is that there is no direct mapping between a URL and a location on your filesystem. Your application is explicitly designed to only execute specific files that are a part of your application. When you try to access a `.js` file that somebody uploaded, it will just send the `.js` file; it won't be executed.
There aren't really any disadvantages - while you do have to have a Node.js process running at all times, it can be managed in the same way as any other webserver. You can also use another webserver in front of it; for example, if you want to host multiple domains on a single server.
Hosting
Node.js applications will not run in most shared hosting environments, as they are designed to only run PHP. While there are some 'managed hosting' environments like Heroku that claim to work similarly, they are usually rather expensive and not really worth the money.
When deploying a Node.js project in production, you will most likely want to host it on a VPS or a dedicated server. These are full-blown Linux systems that you have full control over, so you can run any application or database that you want. The cheapest option here is to go with an "unmanaged provider".
Unmanaged providers are providers whose responsibility ends at the server and the network - they make sure that the system is up and running, and from that point on it's your responsibility to manage your applications. Because they do not provide support for your projects, they are a lot cheaper than "managed providers".
My usual recommendations for unmanaged providers are (in no particular order): RamNode, Afterburst, SecureDragon, Hostigation and RAM Host. Another popular choice is DigitalOcean - but while their service is stable and sufficient for most people, I personally don't find the performance/resources/price ratio to be good enough. I've also heard good things about Linode, but I don't personally use them - they do, however, apparently provide limited support for your server management.
As explained in the previous section, your application is the webserver. However, there are some reasons you might still want to run a "generic" webserver in front of your application:
- Easier setup of TLS ("SSL").
- Multiple applications for different domains, on the same server ("virtual hosts").
- Slightly faster static file serving.
My recommendation for this is Caddy. While nginx is a popular and often-recommended option, it's considerably harder to set up than Caddy, especially for TLS.
Frameworks
(this section is a work in progress, these are just some notes left for myself)
- execution model
- Express
- small modules
Templating
If you've already used a templater like Smarty in PHP, here's the short version: use either Pug or Nunjucks, depending on your preference. Both auto-escape values by default, but I strongly recommend Pug - it understands the actual structure of your template, which gives you more flexibility.
If you've been using `include()` or `require()` in PHP along with inline `<?php echo($foobar); ?>` statements, here's the long version:
The "using-PHP-as-a-templater" approach is quite flawed - it makes it very easy to introduce security issues such as XSS by accidentally forgetting to escape something. I won't go into detail here, but suffice to say that this is a serious risk, regardless of how competent you are as a developer. Instead, you should be using a templater that auto-escapes values by default, unless you explicitly tell it not to. Pug and Nunjucks are two options in Node.js that do precisely that, and both will work with Express out of the box.
Rendering pages server-side with Express (and Pug)
This article was originally published at https://gist.github.com/joepie91/c0069ab0e0da40cc7b54b8c2203befe1.
Terminology
- View: Also called a "template", a file that contains markup (like HTML) and optionally additional instructions on how to generate snippets of HTML, such as text interpolation, loops, conditionals, includes, and so on.
- View engine: Also called a "template library" or "templater", ie. a library that implements view functionality, and potentially also a custom language for specifying it (like Pug does).
- HTML templater: A template library that's designed specifically for generating HTML. It understands document structure and thus can provide useful advanced tools like mixins, as well as more secure output escaping (since it can determine the right escaping approach from the context in which a value is used), but it also means that the templater is not useful for anything other than HTML.
- String-based templater: A template library that implements templating logic, but that has no understanding of the content it is generating - it simply concatenates together strings, potentially multiple copies of those strings with different values being used in them. These templaters offer a more limited feature set, but are more widely usable.
- Text interpolation / String interpolation: The insertion of variable values into a string of some kind. Typical examples include ES6 template strings, or this example in Pug:
Hello #{user.username}!
- Locals: The variables that are passed into a template, to be used in rendering that template. These are generally specified every time you wish to render a template.
Pug is an example of an HTML templater. Nunjucks is an example of a string-based templater. React could technically be considered an HTML templater, although it's not really designed to be used primarily server-side.
View engine setup
Assuming you'll be using Pug, this is simply a matter of installing Pug...
npm install --save pug
... and then configuring Express to use it:
let app = express();
app.set("view engine", "pug");
/* ... rest of the application goes here ... */
You won't need to `require()` Pug anywhere; Express will do this internally.
You'll likely want to explicitly set the directory where your templates will be stored, as well:
let app = express();
app.set("view engine", "pug");
app.set("views", path.join(__dirname, "views"));
/* ... rest of the application goes here ... */
This will make Express look for your templates in the "views" directory, relative to the file in which you specified the above line.
Rendering a page
homepage.pug:
html
    body
        h1 Hello World!
        p Nothing to see here.
app.js:
router.get("/", (req, res) => {
res.render("homepage");
});
Express will automatically add an extension to the file. That means that - with our Express configuration - the `"homepage"` template name in the above example will point at `views/homepage.pug`.
Rendering a page with locals
homepage.pug:
html
    body
        h1 Hello World!
        p Hi there, #{user.username}!
app.js:
router.get("/", (req, res) => {
res.render("homepage", {
user: req.user
});
});
In this example, the `#{user.username}` bit is an example of string interpolation. The "locals" are just an object containing values that the template can use. Since every expression in Pug is written in JavaScript, you can pass any kind of valid JS value into the locals, including functions (that you can call from the template).
For example, we could do the following as well - although there's no good reason to do this, so this is for illustrative purposes only:
homepage.pug:
html
    body
        h1 Hello World!
        p Hi there, #{getUsername()}!
app.js:
router.get("/", (req, res) => {
res.render("homepage", {
getUsername: function() {
return req.user;
}
});
});
Using conditionals
homepage.pug:
html
    body
        h1 Hello World!
        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!
app.js:
router.get("/", (req, res) => {
res.render("homepage", {
user: req.user
});
});
Again, the expression in the conditional is just a JS expression. All defined locals are accessible and usable as before.
Using loops
homepage.pug:
html
    body
        h1 Hello World!
        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!
        p Have some vegetables:
        ul
            for vegetable in vegetables
                li= vegetable
app.js:
router.get("/", (req, res) => {
res.render("homepage", {
user: req.user,
vegetables: [
"carrot",
"potato",
"beet"
]
});
});
Note that this...
li= vegetable
... is just shorthand for this:
li #{vegetable}
By default, the contents of a tag are assumed to be a string, optionally with interpolation in one or more places. By suffixing the tag name with `=`, you indicate that the contents of that tag should be a JavaScript expression instead.
That expression may just be a variable name as well, but it doesn't have to be - any JS expression is valid. For example, this is completely okay:
li= "foo" + "bar"
And this is completely valid as well, as long as the randomVegetable method is defined in the locals:
li= randomVegetable()
Request-wide locals
Sometimes, you want to make a variable available in every `res.render` for a request, no matter what route or middleware the page is being rendered from. A typical example is the user object for the current user. This can be accomplished by setting it as a property on the `res.locals` object.
homepage.pug:
html
    body
        h1 Hello World!
        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!
        p Have some vegetables:
        ul
            for vegetable in vegetables
                li= vegetable
app.js:
app.use((req, res, next) => {
res.locals.user = req.user;
next();
});
/* ... more code goes here ... */
router.get("/", (req, res) => {
res.render("homepage", {
vegetables: [
"carrot",
"potato",
"beet"
]
});
});
Application-wide locals
Sometimes, a value even needs to be application-wide - a typical example would be the site name for a self-hosted application, or other application configuration that doesn't change for each request. This works similarly to `res.locals`, only now you set it on `app.locals`.
homepage.pug:
html
    body
        h1 Hello World, this is #{siteName}!
        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!
        p Have some vegetables:
        ul
            for vegetable in vegetables
                li= vegetable
app.js:
app.locals.siteName = "Vegetable World";
/* ... more code goes here ... */
app.use((req, res, next) => {
res.locals.user = req.user;
next();
});
/* ... more code goes here ... */
router.get("/", (req, res) => {
res.render("homepage", {
vegetables: [
"carrot",
"potato",
"beet"
]
});
});
The order of specificity is as follows: `app.locals` are overwritten by `res.locals` of the same name, and `res.locals` are overwritten by `res.render` locals of the same name.
In other words: if we did something like this...
router.get("/", (req, res) => {
res.render("homepage", {
siteName: "Totally Not Vegetable World",
vegetables: [
"carrot",
"potato",
"beet"
]
});
});
... then the homepage would show "Totally Not Vegetable World" as the website name, while every other page on the site still shows "Vegetable World".
Rendering a page after asynchronous operations
homepage.pug:
html
    body
        h1 Hello World, this is #{siteName}!
        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!
        p Have some vegetables:
        ul
            for vegetable in vegetables
                li= vegetable
app.js:
app.locals.siteName = "Vegetable World";
/* ... more code goes here ... */
app.use((req, res, next) => {
res.locals.user = req.user;
next();
});
/* ... more code goes here ... */
router.get("/", (req, res) => {
return Promise.try(() => {
return db("vegetables").limit(3);
}).map((row) => {
return row.name;
}).then((vegetables) => {
res.render("homepage", {
vegetables: vegetables
});
});
});
Basically the same as when you use `res.send`, only now you're using `res.render`.
Template inheritance in Pug
It would be very impractical if you had to define the entire site layout in every individual template - not only that, but the duplication would also result in bugs over time. To solve this problem, Pug (and most other templaters) support template inheritance. An example is below.
layout.pug:
html
    body
        h1 Hello World, this is #{siteName}!
        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!
        block content
            p This page doesn't have any content yet.
homepage.pug:
extends layout

block content
    p Have some vegetables:
    ul
        for vegetable in vegetables
            li= vegetable
app.js:
app.locals.siteName = "Vegetable World";
/* ... more code goes here ... */
app.use((req, res, next) => {
res.locals.user = req.user;
next();
});
/* ... more code goes here ... */
router.get("/", (req, res) => {
return Promise.try(() => {
return db("vegetables").limit(3);
}).map((row) => {
return row.name;
}).then((vegetables) => {
res.render("homepage", {
vegetables: vegetables
});
});
});
That's basically all there is to it. You define a `block` in the base template - optionally with default content, as we've done here - and then each template that "extends" (inherits from) that base template can override such `block`s. Note that you never render `layout.pug` directly - you still render the page templates themselves, and they just inherit from the base template.
Things of note:
- Overriding a `block` is optional. If you don't override a `block`, it will simply contain either the default content from the base template (if any is specified), or no content at all (if not).
- You can have an unlimited number of `block`s with different names - for example, the one in our example is called `content`. You can decide to override any of them from a template, all of them, or none at all. It's up to you.
- You can nest multiple `block`s with different names. This can be useful for more complex layout variations.
- You can have multiple levels of inheritance - any template you are inheriting from can itself inherit from another template. This can be especially useful in combination with nested `block`s, for complex cases.
Static files
You'll probably also want to serve static files on your site, whether they are CSS files, images, downloads, or anything else. By default, Express ships with `express.static`, which does this for you.
All you need to do is tell Express where to look for static files. You'll usually want to put `express.static` at the very start of your middleware definitions, so that no time is wasted on eg. initializing sessions when a request for a static file comes in.
let app = express();
app.set("view engine", "pug");
app.set("views", path.join(__dirname, "views"));
app.use(express.static(path.join(__dirname, "public")));
/* ... rest of the application goes here ... */
Your directory structure might look like this:
your-project
|- node_modules ...
|- public
| |- style.css
| `- logo.png
|- views
| |- homepage.pug
| `- layout.pug
`- app.js
In the above example, `express.static` will look in the `public` directory for static files, relative to the `app.js` file. For example, if you tried to access `https://your-project.com/style.css`, it would send the user the contents of `your-project/public/style.css`.
You can optionally also specify a prefix for static files, just like for any other Express middleware:
let app = express();
app.set("view engine", "pug");
app.set("views", path.join(__dirname, "views"));
app.use("/static", express.static(path.join(__dirname, "public")));
/* ... rest of the application goes here ... */
Now, that same `your-project/public/style.css` can be accessed through `https://your-project.com/static/style.css` instead.
An example of using it in your layout.pug:
html
    head
        link(rel="stylesheet", href="/static/style.css")
    body
        h1 Hello World, this is #{siteName}!
        if user != null
            p Hi there, #{user.username}!
        else
            p Hi there, unknown person!
        block content
            p This page doesn't have any content yet.
The slash at the start of `/static/style.css` is important - it tells the browser to ask for it relative to the domain, as opposed to relative to the page URL.
An example of URL resolution without a leading slash:
- Page URL: `https://your-project.com/some/deeply/nested/page`
- Stylesheet URL: `static/style.css`
- Resulting stylesheet request URL: `https://your-project.com/some/deeply/nested/static/style.css`
An example of URL resolution with the leading slash:
- Page URL: `https://your-project.com/some/deeply/nested/page`
- Stylesheet URL: `/static/style.css`
- Resulting stylesheet request URL: `https://your-project.com/static/style.css`
That's it! You do the same thing to embed images, scripts, link to downloads, and so on.
Running a Node.js application using nvm as a systemd service
This article was originally published at https://gist.github.com/joepie91/73ce30dd258296bd24af23e9c5f761aa.
Hi there! Since this post was originally written, `nvm` has gained some new tools, and some people have suggested alternative (and potentially better) approaches for modern systems. Make sure to have a look at the comments on the original Gist, before following this guide!
Trickier than it seems.
1. Set up nvm
Let's assume that you've already created an unprivileged user named `myapp`. You should never run your Node.js applications as root!
Switch to the `myapp` user, and do the following:
- `curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.31.0/install.sh | bash` (however, this will immediately run the nvm installer - you probably want to just download the `install.sh` manually, and inspect it before running it)
- Install the latest stable Node.js version: `nvm install stable`
2. Prepare your application
Your package.json must specify a `start` script that describes what to execute for your application. For example:
...
"scripts": {
"start": "node app.js"
},
...
3. Service file
Save this as `/etc/systemd/system/my-application.service`:
[Unit]
Description=My Application
[Service]
EnvironmentFile=-/etc/default/my-application
ExecStart=/home/myapp/start.sh
WorkingDirectory=/home/myapp/my-application-directory
LimitNOFILE=4096
IgnoreSIGPIPE=false
KillMode=process
User=myapp
[Install]
WantedBy=multi-user.target
You'll want to change the `User`, `Description`, and the `ExecStart`/`WorkingDirectory` paths to reflect your application setup.
4. Startup script
Next, save this as `/home/myapp/start.sh` (adjusting the username in both the path and the script if necessary):
#!/bin/bash
. /home/myapp/.nvm/nvm.sh
npm start
This script is necessary, because we can't load nvm via the service file directly.
Make sure to make it executable:
chmod +x /home/myapp/start.sh
5. Enable and start your service
Replace `my-application` with whatever you've named your service file after, running the following as root:
systemctl enable my-application
systemctl start my-application
To verify whether your application started successfully (don't forget to `npm install` your dependencies!), run:
systemctl status my-application
... which will show you the last few lines of its output, whether it's currently running, and any errors that might have occurred.
Done!
Persistent state in Node.js
This article was originally published at https://gist.github.com/joepie91/bf0813626e6568e8633b.
This is an extremely simple example of how you have 'persistent state' when writing an application in Node.js. The `i` variable is shared across all requests, so every time the `/increment` route is accessed, the number is incremented and returned.
This may seem obvious, but it works quite differently from eg. PHP, where each HTTP request is effectively a 'clean slate', and you don't have persistent state. Were this written in PHP, then every request would have returned `1`, rather than an incrementing number.
var i = 0;
// [...]
app.get("/increment", function(req, res) {
i += 1;
res.send("Current number: " + i);
})
// [...]
node-gyp requirements
This article was originally published at https://gist.github.com/joepie91/375f6d9b415213cf4394b5ba3ae266ae. It may no longer be applicable.
Linux
- Python 2.7 (not 3.x!), `build-essential` (make, gcc, etc.)
Windows
- As Administrator: `npm install --global --production windows-build-tools`
OS X
- Old OS X: http://osxdaily.com/2012/07/06/install-gcc-without-xcode-in-mac-os-x/
- New OS X: http://osxdaily.com/2014/02/12/install-command-line-tools-mac-os-x/
Introduction to sessions
This article was originally published at https://gist.github.com/joepie91/cf5fd6481a31477b12dc33af453f9a1d.
While a lot of Node.js guides recommend using JWT as an alternative to session cookies (sometimes even mistakenly calling it "more secure than cookies"), this is a terrible idea. JWTs are absolutely not a secure way to deal with user authentication/sessions, and this article goes into more detail about that.
Secure user authentication requires the use of session cookies.
Session data is arbitrary data that is stored on the server side, and that is associated with a session ID. The client can't see or modify this data, but the server can use the session ID from a request to associate session data with that request.
Altogether, this allows for the server to store arbitrary data for a session (that the user can't see or touch!), that it can use on every subsequent request in that session. This is how a website remembers that you've logged in.
Step-by-step, the process goes something like this:
- Client requests login page.
- Server sends login page HTML.
- Client fills in the login form, and submits it.
- Server receives the data from the login form, and verifies that the username and password are correct.
- Server creates a new session in the database, containing the ID of the user in the database, and generates a unique session ID for it (which is not the same as the user ID!)
- Server sends the session ID to the user as a cookie header, alongside a "welcome" page.
- Client receives the session ID, and saves it locally as a cookie.
- Client displays the "welcome" page that the cookie came with.
- User clicks a link on the welcome page, navigating to his "notifications" page.
- Client retrieves the session cookie from storage.
- Client requests the notifications page, sending along the session cookie (containing the session ID).
- Server receives the request.
- Server looks at the session cookie, and extracts the session ID.
- Server retrieves the session data from the database, for the session ID that it received.
- Server associates the session data (containing the user ID) with the request, and passes it on to something that handles the request.
- Server request handler receives the request (containing the session data including user ID), and sends a personalized notifications page for the user with that ID.
- Client receives the personalized notifications page, and displays it.
- User clicks another link, and we go back to step 10.
Configuring sessions
Thankfully, you won't have to implement all this yourself - most of it is done for you by existing session implementations. If you're using Express, that implementation would be express-session.
The `express-session` module doesn't implement the actual session storage itself; it only handles the Express-related bits - for example, it ensures that `req.session` is automatically loaded from and saved to the session store.
For the storage of session data, you need to specify a "session store" that's specific to the database you want to use for your session data - and when using Knex, `connect-session-knex` is the best option for that.
While full documentation is available in the `express-session` repository, this is what your `express-session` initialization might look like when you're using a relational database like PostgreSQL (through Knex):
const express = require("express");
const knex = require("knex");
const expressSession = require("express-session");
const KnexSessionStore = require("connect-session-knex")(expressSession);
const config = require("./config.json");
/* ... other code ... */
/* You will probably already have a line that looks something like the below.
* You won't have to create a new Knex instance for dealing with sessions - you
* can just use the one you already have, and the Knex initialization here is
* purely for illustrative purposes. */
let db = knex(require("./knexfile"));
let app = express();
/* ... other app initialization code ... */
app.use(expressSession({
secret: config.sessions.secret,
resave: false,
saveUninitialized: false,
store: new KnexSessionStore({
knex: db
})
}));
/* ... rest of the application goes here ... */
The configuration example in more detail
require("connect-session-knex")(expressSession)
The `connect-session-knex` module needs access to the `express-session` library, so instead of exporting the session store constructor directly, it exports a wrapper function. We call that wrapper function immediately after requiring the module, passing in the `express-session` module, and we get back a session store constructor.
app.use(expressSession({
secret: config.sessions.secret,
resave: false,
saveUninitialized: false,
store: new KnexSessionStore({
knex: db
})
}));
This is where we 1) create a new `express-session` middleware, and 2) `app.use` it, so that it processes every request, attaching session data where needed.
secret: config.sessions.secret,
Every application should have a "secret" for sessions - essentially a secret key that will be used to cryptographically sign the session cookie, so that the user can't tamper with it. This should be a random value, and it should be stored in a configuration file. You should not store this value (or any other secret values) in the source code directly.
On Linux and OS X, a quick way to generate a securely random key is the following command: cat /dev/urandom | env LC_CTYPE=C tr -dc _A-Za-z0-9 | head -c${1:-64}
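If you'd rather not use a shell one-liner, a quick Node.js alternative (a convenience sketch; run it once and paste the output into your configuration file) could look like this:
/* Prints a 64-character hexadecimal secret, generated from a CSPRNG. */
const crypto = require("crypto");

console.log(crypto.randomBytes(32).toString("hex"));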
resave: false,
When `resave` is set to `true`, `express-session` will always save the session data after every request, regardless of whether the session data was modified. This can cause race conditions, and therefore you usually don't want to do this, but with some session stores it's necessary as they don't let you reset the "expiry timer" without saving all the session data again.
`connect-session-knex` doesn't have this problem, and so you should set it to `false`, which is the safer option. If you intend to use a different session store, you should consult the `express-session` documentation for more details about this option.
saveUninitialized: false,
If the user doesn't have a session yet, a brand new `req.session` object is created for them on their first request. This setting determines whether that session should be saved to the database, even if no session data was stored into it. Setting it to `false` makes it so that the session is only saved if it's actually used for something, and that's the setting you want here.
store: new KnexSessionStore({
knex: db
})
This tells `express-session` where to store the actual session data. In the case of `connect-session-knex` (which is where `KnexSessionStore` comes from), we need to pass in an existing Knex instance, which it will then use for interacting with the `sessions` table. Other options can be found in the `connect-session-knex` documentation.
Using sessions
The usage of sessions is quite simple - you simply set properties on `req.session`, and you can then access those properties from other requests within the same session. For example, this is what a login route might look like (assuming you're using Knex, `scrypt-for-humans`, and a custom `AuthenticationError` created with `create-error`):
router.post("/login", (req, res) => {
return Promise.try(() => {
return db("users").where({
username: req.body.username
});
}).then((users) => {
if (users.length === 0) {
throw new AuthenticationError("No such username exists");
} else {
let user = users[0];
return Promise.try(() => {
return scryptForHumans.verifyHash(req.body.password, user.hash);
}).then(() => {
/* Password was correct */
req.session.userId = user.id;
res.redirect("/dashboard");
}).catch(scryptForHumans.PasswordError, (err) => {
throw new AuthenticationError("Invalid password");
});
}
});
});
And your `/dashboard` route might look like this:
router.get("/dashboard", (req, res) => {
return Promise.try(() => {
if (req.session.userId == null) {
/* User is not logged in */
res.redirect("/login");
} else {
return Promise.try(() => {
return db("users").where({
id: req.session.userId
});
}).then((users) => {
if (users.length === 0) {
/* User no longer exists */
req.session.destroy();
res.redirect("/login");
} else {
res.render("dashboard", {
user: users[0]
});
}
});
}
});
});
In this example, `req.session.destroy()` will - like the name suggests - destroy the session, essentially returning the user to a session-less state. In practice, this means they get "logged out".
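For completeness, a logout route built on this could look something like the sketch below (a hypothetical example - the route path is made up, and `req.session.destroy` accepts a callback that receives a possible error):
router.post("/logout", (req, res, next) => {
    req.session.destroy((err) => {
        if (err != null) {
            /* Something went wrong while destroying the session; hand the
             * error off to the error-handling middleware. */
            next(err);
        } else {
            res.redirect("/login");
        }
    });
});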
Now, if you had to do all that logic for every route that requires the user to be logged in, it would get rather unwieldy. So let's move it out into some middleware:
function requireLogin(req, res, next) {
return Promise.try(() => {
if (req.session.userId == null) {
/* User is not logged in */
res.redirect("/login");
} else {
return Promise.try(() => {
return db("users").where({
id: req.session.userId
});
}).then((users) => {
if (users.length === 0) {
/* User no longer exists */
req.session.destroy();
res.redirect("/login");
} else {
req.user = users[0];
next();
}
});
}
});
}
router.get("/dashboard", requireLogin, (req, res) => {
res.render("dashboard", {
user: req.user
});
});
Note the following:
- We now have a separate `requireLogin` function that verifies whether the user is logged in.
- That same function also sets `req.user` if they are logged in, with their user data, before calling `next()` (which passes control to the next middleware/route).
- Instead of only specifying a path and a route in the `router.get` call, we now specify our `requireLogin` middleware as well. It will get called before the route, and the route is only ever called if the `requireLogin` middleware calls `next()` (which it only does for logged-in users).
Secure random values
This article was originally published at https://gist.github.com/joepie91/7105003c3b26e65efcea63f3db82dfba.
Not all random values are created equal - for security-related code, you need a specific kind of random value.
A summary of this article, if you don't want to read the entire thing:
- Don't use `Math.random()`. There are extremely few cases where `Math.random()` is the right answer. Don't use it, unless you've read this entire article, and determined that it's necessary for your case.
- Don't use `crypto.randomBytes` directly. While it's a CSPRNG, it's easy to bias the result when 'transforming' it, such that the output becomes more predictable.
- If you want to generate random tokens or API keys: Use `uuid`, specifically the `uuid.v4()` method. Avoid `node-uuid` - it's not the same package, and doesn't produce reliably secure random values.
- If you want to generate random numbers in a range: Use `random-number-csprng`.
You should seriously consider reading the entire article, though - it's not that long :)
Types of "random"
There exist roughly three types of "random":
- Truly random: Exactly as the name describes. True randomness, to which no pattern or algorithm applies. It's debatable whether this really exists.
- Unpredictable: Not truly random, but impossible for an attacker to predict. This is what you need for security-related code - it doesn't matter how the data is generated, as long as it can't be guessed.
- Irregular: This is what most people think of when they think of "random". An example is a game with a background of a star field, where each star is drawn in a "random" position on the screen. This isn't truly random, and it isn't even unpredictable - it just doesn't look like there's a pattern to it, visually.
Irregular data is fast to generate, but utterly worthless for security purposes - even if it doesn't seem like there's a pattern, there is almost always a way for an attacker to predict what the values are going to be. The only realistic usecase for irregular data is things that are represented visually, such as game elements or randomly generated phrases on a joke site.
Unpredictable data is a bit slower to generate, but still fast enough for most cases, and it's sufficiently hard to guess that it will be attacker-resistant. Unpredictable data is provided by what's called a CSPRNG.
Types of RNGs (Random Number Generators)
- CSPRNG: A Cryptographically Secure Pseudo-Random Number Generator. This is what produces unpredictable data that you need for security purposes.
- PRNG: A Pseudo-Random Number Generator. This is a broader category that includes CSPRNGs and generators that just return irregular values - in other words, you cannot rely on a PRNG to provide you with unpredictable values.
- RNG: A Random Number Generator. The meaning of this term depends on the context. Most people use it as an even broader category that includes PRNGs and truly random number generators.
Every random value that you need for security-related purposes (ie. anything where there exists the possibility of an "attacker"), should be generated using a CSPRNG. This includes verification tokens, reset tokens, lottery numbers, API keys, generated passwords, encryption keys, and so on, and so on.
Bias
In Node.js, the most widely available CSPRNG is the `crypto.randomBytes` function, but you shouldn't use this directly, as it's easy to mess up and "bias" your random values - that is, making it more likely that a specific value or set of values is picked.
A common example of this mistake is using the `%` (modulo) operator when you have less than 256 possibilities (since a single byte has 256 possible values). Doing so actually makes lower values more likely to be picked than higher values.
For example, let's say that you have 36 possible random values - `0-9` plus every lowercase letter in `a-z`. A naive implementation might look something like this:
let randomCharacter = randomByte % 36;
That code is broken and insecure. With the code above, you essentially create the following ranges (all inclusive):
- 0-35 stays 0-35.
- 36-71 becomes 0-35.
- 72-107 becomes 0-35.
- 108-143 becomes 0-35.
- 144-179 becomes 0-35.
- 180-215 becomes 0-35.
- 216-251 becomes 0-35.
- 252-255 becomes 0-3.
If you look at the above list of ranges, you'll notice that while there are 7 possible byte values that map to each `randomCharacter` between 4 and 35 (inclusive), there are 8 possible byte values that map to each `randomCharacter` between 0 and 3 (inclusive). This means that while there's a roughly 2.73% (7/256) chance of getting any particular value between 4 and 35 (inclusive), there's a roughly 3.13% (8/256) chance of getting any particular value between 0 and 3 (inclusive).
This kind of difference may look small, but it's an easy and effective way for an attacker to reduce the amount of guesses they need when bruteforcing something. And this is only one way in which you can make your random values insecure, despite them originally coming from a secure random source.
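To illustrate how this kind of bias can be avoided in principle, below is a rough sketch of "rejection sampling" - throwing away bytes that don't divide evenly into the desired range. This is for illustration only (the function name is made up); in practice, use the tools listed in the next section, which already do this correctly for you.
const crypto = require("crypto");

/* Illustration only: returns an unbiased random number from 0 up to (but not
 * including) max, where max is at most 256. */
function unbiasedRandomBelow(max) {
    /* For max = 36, this is 252 - the largest multiple of 36 that fits in a byte. */
    let limit = 256 - (256 % max);

    while (true) {
        let byte = crypto.randomBytes(1)[0];

        if (byte < limit) {
            /* Every value from 0 to max - 1 is now equally likely. */
            return byte % max;
        }

        /* Otherwise, throw the byte away and try again. */
    }
}

console.log(unbiasedRandomBelow(36));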
So, how do I obtain random values securely?
In Node.js:
- If you need a sequence of random bytes: Use `crypto.randomBytes`.
- If you need individual random numbers in a certain range: Use `crypto.randomInt`.
- If you need a random string: You have two good options here, depending on your needs.
  - Use a v4 UUID. Safe ways to generate this are `crypto.randomUUID`, and the `uuid` library (only the v4 variant!).
  - Use a nanoid, using the `nanoid` library. This also allows specifying a custom alphabet to use for your random string.
Both of these use a CSPRNG, and 'transform' the bytes in an unbiased (ie. secure) way.
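As a quick sketch of what that looks like in practice (assuming a reasonably recent Node.js version, which ships both `crypto.randomInt` and `crypto.randomUUID`):
const crypto = require("crypto");

/* A random integer from 0 up to (but not including) 36, without modulo bias. */
let randomIndex = crypto.randomInt(36);

/* A random v4 UUID, eg. usable as a verification token or API key. */
let token = crypto.randomUUID();

console.log(randomIndex, token);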
In the browser:
- When using the Node.js options, your bundler should automatically select equivalently safe browser implementations for all of these.
- If not using a bundler:
  - If you need a sequence of random bytes: Use `crypto.getRandomValues` with a `Uint8Array`. Other array types will get you numbers in different ranges.
  - If you need a random string: You have two good options here, depending on your needs.
    - Use a v4 UUID, with the `crypto.randomUUID` method.
    - Use a nanoid, using the standalone build of the `nanoid` library. This also allows specifying a custom alphabet to use for your random string.
However, it is strongly recommended that you use a bundler, in general.
Checking file existence asynchronously
This article was originally published at https://gist.github.com/joepie91/bbf495e044da043de2ba.
Checking whether a file exists before doing something with it can lead to race conditions in your application. Race conditions are extremely hard to debug and, depending on where they occur, they can lead to data loss or security holes. Using the synchronous versions will not fix this.
Generally, just do what you want to do, and handle the error if it doesn't work. This is much safer - a short sketch of that approach follows the list below.
- If you want to check whether a file exists before reading it: just try to open the file, and handle the `ENOENT` error when it doesn't exist.
- If you want to make sure a file doesn't exist before writing to it: open the file using an exclusive mode, eg. `wx` or `ax`, and handle the error when the file already exists.
- If you want to create a directory: just try to create it, and handle the error if it already exists.
- If you want to remove a file or directory: just try to unlink the path, and handle the error if it doesn't exist.
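As promised above, here's a minimal sketch of the "just try it and handle the error" approach (the filename is made up for the example):
const fs = require("fs");

/* Don't check whether the file exists first - just try to read it, and handle
 * the ENOENT error if it turns out not to exist. */
fs.readFile("./config.json", "utf8", (err, contents) => {
    if (err != null) {
        if (err.code === "ENOENT") {
            console.log("No config file found, using defaults");
        } else {
            /* Some other error - don't silently swallow it! */
            throw err;
        }
    } else {
        console.log("Loaded config:", contents);
    }
});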
If you're really, really sure that you need to use `fs.exists` or `fs.stat`, then you can use the example code below to do so asynchronously. If you just want to know how to promisify an asynchronous callback that doesn't follow the nodeback convention, then you can look at the example below as well.
You should almost never actually use the code below. The same applies to `fs.stat` (when used for checking existence). Make sure you have read the text above first!
const fs = require("fs");
const Promise = require("bluebird");
function existsAsync(path) {
return new Promise(function(resolve, reject){
fs.exists(path, function(exists){
resolve(exists);
})
})
}
Fixing "Buffer without new" deprecation warnings
This article was originally published at https://gist.github.com/joepie91/a0848a06b4733d8c95c95236d16765aa. Newer Node.js versions no longer behave in this exact way, but the information is kept here for posterity. If you have code that still uses `new Buffer`, you should still update it.
If you're using Node.js, you might run into a warning like this:
DeprecationWarning: Using Buffer without `new` will soon stop working.
The reason for this warning is that the Buffer creation API was changed to require the use of `new`. However, contrary to what the warning says, you should not use `new Buffer` either, for security reasons. Any usage of it must be converted as soon as possible to `Buffer.from`, `Buffer.alloc`, or `Buffer.allocUnsafe`, depending on what it's being used for. Not changing it could mean a security vulnerability in your code.
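As a rough before/after sketch of what that migration looks like (the values are made up for the example):
/* Deprecated and potentially unsafe: */
// let fromString = new Buffer("hello world", "utf8");
// let zeroed = new Buffer(16);

/* Safe replacements: */
let fromString = Buffer.from("hello world", "utf8"); /* copies existing data */
let zeroed = Buffer.alloc(16);                       /* new, zero-filled buffer */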
Where is it coming from?
Unfortunately, the warning doesn't indicate where the issue comes from. If you've verified that your own code doesn't use `Buffer` without `new` anymore, but you're still getting the warning, then you are probably using an (outdated) dependency that still uses the old API.
The following command (for Linux and Cygwin) will list all the affected modules:
grep -rP '(?<!new |[a-zA-Z])Buffer\(' node_modules | grep "\.js" | grep -Eo '^(node_modules/[^/:]+/)*' | sort | uniq -c | sort -h
If you're on OS X, your `sort` tool will not have the `-h` flag. Therefore, you'll want to run this instead (but the result won't be sorted by frequency):
grep -rP '(?<!new |[a-zA-Z])Buffer\(' node_modules | grep "\.js" | grep -Eo '^(node_modules/[^/:]+/)*' | sort | uniq -c | sort
How do I fix it?
If the issue is in your own code, this documentation will explain how to migrate. If you're targeting older Node.js versions, you may want to use the `safe-buffer` shim to maintain compatibility.
If the issue is in a third-party library:
- Run `npm ls <package name here>` to determine where in your dependency tree it is installed, and look at the top-most dependency (that isn't your project itself) that it originates from.
- If that top-most dependency is out of date, try updating the dependency first, to see if the warning goes away.
- If the dependency is up-to-date, that means it's an unfixed issue in the dependency. You should create an issue ticket (or, even better, a pull request) on the dependency's repository, asking for it to be fixed.
Why you shouldn't use Sails.js
This article was originally published at https://gist.github.com/joepie91/cc8b0c9723cc2164660e.
This article was published in 2015. Since then, the situation may have changed, and this article is kept for posterity. You should verify whether the issues still apply when making a decision.
A large list of reasons why to avoid Sails.js and Waterline: https://kev.inburke.com/kevin/dont-use-sails-or-waterline/
Furthermore, in response to the suggestion that they "promise to push a fix within 60 days", the CEO of Balderdash, the company behind Sails.js, stated the following:
"@kevinburkeshyp This would amount to a Service Level Agreement with the entire world; this is generally not possible, and does not exist in any software project that I know of."
Upon notifying him in the thread that I actually offer exactly that guarantee, and that his statement was thus incorrect, he accused me of "starting a flamewar", and proceeded to delete my posts.
UPDATE: The issue has been reopened by the founder of Balderdash. Mind that this article was written back when this was not the case yet, and judge appropriately.
He is apparently also unaware that Google Project Zero expects the exact same - a hard deadline of 90 days, after which an issue is publicly disclosed.
Now, just locking the thread would have been at least somewhat justifiable - he might have legitimately misconstrued my statement as inciting a flamewar.
What is not excusable, however, is removing my posts that show his (negligent) statement is wrong. This raises serious questions about what the Sails maintainers consider more important: their reputation, or the actual security of their users.
It would have been perfectly possible to just leave the posts intact - the thread would be locked, so a flamewar would not have been a possibility, and each reader could make up their own mind about the state of things.
In short: Avoid Sails.js. They do not have your best interests at heart, and this could result in serious security issues for your project.
For reference, the full pre-deletion thread was included in the original publication of this article.
Building desktop applications with Node.js
Option 1: Electron
This is the most popular and well-supported option. Electron is a combination of Node.js and Chromium Embedded Framework, and so it will give you access to the feature sets of both. The main tradeoff is that it doesn't give you much direct control over the window or the system integration.
Benefits
- Cross-platform
- Well-supported, with a large developer base and a lot of (third-party) documentation
- Works pretty much out of the box, and lets you use HTML and CSS
- Can use native Node.js modules
Drawbacks
- Relatively high baseline memory usage; expect 50-100MB of RAM before running any application code. This is fine for most applications, but probably not for tiny utilities.
- Somewhat restrictive; does not give you much control over the system integration, instead has a default setup that's okay for most purposes and abstracts away platform-specific things for the most part.
- Limited OpenGL support; only WebGL is available.
Option 2: SDL
Using https://www.npmjs.com/package/@kmamal/sdl and https://www.npmjs.com/package/@kmamal/gl, you can use SDL and OpenGL directly from Node.js. This will take care of window creation, input handling, and so on - but you will have to do all the drawing yourself using shaders.
A full (low-level) example is available here, and you can also use regl to simplify things a bit.
For text rendering, you may wish to use Pango or Harfbuzz, which can both be used through the node-gtk library (which, despite the name, is a generic GObject Introspection library rather than anything specific to the GTK UI toolkit).
Benefits
- Direct OpenGL access
- Does not enforce any particular structure on your project
- Good selection of examples
Drawbacks
- You have to do all of the drawing yourself; there are no widgets, there is no CSS, and so on. You will be writing OpenGL shaders. There is support for canvas-style drawing, but it is not fast.
- More research required to understand how to use it; not a lot of people use these libraries, and there are not very many tutorials.
Option 3: FFI bindings
You can also use an existing UI library that's written in C, C++ or Rust, by using a generic FFI library that lets you call the necessary functions from Javascript code in Node.js directly.
For C, a good option is Koffi, which has excellent documentation. For Rust, a good option is Neon, whose documentation is not quite as extensive as that of Koffi, but still pretty okay.
Option 4: GTK
The aforementioned node-gtk library can also be used to use GTK directly. Very little documentation is available about this, so you'll likely be stuck reading the GTK documentation (for its C API) and mentally translating to what the equivalent in the bindings would be.
NixOS
Setting up Bookstack
Turned out to be pretty simple.
deployment.secrets.bookstack-app-key = {
source = "../private/bookstack/app-key";
destination = "/var/lib/bookstack/app-key";
owner = { user = "bookstack"; group = "bookstack"; };
permissions = "0700";
};
services.bookstack = {
enable = true;
hostname = "wiki.slightly.tech";
maxUploadSize = "10G";
appKeyFile = "/var/lib/bookstack/app-key";
nginx = { enableACME = true; forceSSL = true; };
database = { createLocally = true; };
};
Server was running an old version of NixOS, 23.05, where MySQL doesn't work in a VPS (anymore). Upgraded the whole thing to 24.11 and then it Just Worked.
Afterwards, run:
bookstack bookstack:create-admin
... in a terminal on the server to set up the primary administrator account. Done.
A *complete* listing of operators in Nix, and their precedence.
This article was originally published at https://gist.github.com/joepie91/c3c047f3406aea9ec65eebce2ffd449d.
The information in this article has since been absorbed into the official Nix manual. It is kept here for posterity. It may be outdated by the time you read this.
Lower precedence means a stronger binding; ie. this list is sorted from strongest to weakest binding, and in the case of equal precedence between two operators, the associativity decides the binding.
Prec | Abbreviation | Example | Assoc | Description
---|---|---|---|---
1 | SELECT | `e . attrpath [ or def ]` | none | Select attribute denoted by the attribute path `attrpath` from set `e`. (An attribute path is a dot-separated list of attribute names.) If the attribute doesn't exist, return default if provided, otherwise abort evaluation.
2 | APP | `e1 e2` | left | Call function `e1` with argument `e2`.
3 | NEG | `-e` | none | Numeric negation.
4 | HAS_ATTR | `e ? attrpath` | none | Test whether set `e` contains the attribute denoted by `attrpath`; return true or false.
5 | CONCAT | `e1 ++ e2` | right | List concatenation.
6 | MUL | `e1 * e2` | left | Numeric multiplication.
6 | DIV | `e1 / e2` | left | Numeric division.
7 | ADD | `e1 + e2` | left | Numeric addition, or string concatenation.
7 | SUB | `e1 - e2` | left | Numeric subtraction.
8 | NOT | `!e` | left | Boolean negation.
9 | UPDATE | `e1 // e2` | right | Return a set consisting of the attributes in `e1` and `e2` (with the latter taking precedence over the former in case of equally named attributes).
10 | LT | `e1 < e2` | left | Less than.
10 | LTE | `e1 <= e2` | left | Less than or equal.
10 | GT | `e1 > e2` | left | Greater than.
10 | GTE | `e1 >= e2` | left | Greater than or equal.
11 | EQ | `e1 == e2` | none | Equality.
11 | NEQ | `e1 != e2` | none | Inequality.
12 | AND | `e1 && e2` | left | Logical AND.
13 | OR | `e1 \|\| e2` | left | Logical OR.
14 | IMPL | `e1 -> e2` | none | Logical implication (equivalent to `!e1 \|\| e2`).
Setting up Hydra
This article was originally published at https://gist.github.com/joepie91/c26f01a787af87a96f967219234a8723 in 2017. The NixOS ecosystem constantly changes, and it may not be relevant anymore by the time you read this article.
Just some notes from my attempt at setting up Hydra.
Setting up on NixOS
No need for manual database creation and all that; just ensure that your PostgreSQL service is running (`services.postgresql.enable = true;`), and then enable the Hydra service (`services.hydra.enable`). The Hydra service will need a few more options to be set up; below is my configuration for it:
services.hydra = {
enable = true;
port = 3333;
hydraURL = "http://localhost:3333/";
notificationSender = "hydra@cryto.net";
useSubstitutes = true;
minimumDiskFree = 20;
minimumDiskFreeEvaluator = 20;
};
Database and user creation and all that will happen automatically. You'll only need to run `hydra-init` and then `hydra-create-user` to create the first user. Note that you may need to run these scripts as root if you get permission or filesystem errors.
Can't run `hydra-*` utility scripts / access the web interface due to database errors
If you already have a `services.postgresql.authentication` configuration line from elsewhere (either another service, or your own `configuration.nix`), it may be conflicting with the one specified in the Hydra service. There's an open issue about it here.
Can't login
After running `hydra-create-user` in your shell, you may be running into the following error in the web interface: "Bad username or password."
When this occurs, it's likely because the `hydra-*` utility scripts stored your data in a local SQLite database, rather than the PostgreSQL database you configured. As far as I can tell, this happens because of some missing `HYDRA_*` environment variables that are set through `/etc/profile`, which is only applied on your next login. Simply opening a new shell is not enough.
As a workaround until your next login/boot, you can run the following to obtain the command you need to run to apply the new environment variables in your current shell:
cat /etc/profile | grep set-environment
... and then run the resulting command (including the dot at the start, if there is one!) in the shell you intend to run the `hydra-*` scripts in. If you intend to run them as root, make sure you run the `set-environment` script in the root shell - using `sudo` will make the environment variables get lost, so you'll be stuck with the same issue as before.
Fixing root filesystem errors with fsck on NixOS
If you run into an error like this:
An error occurred in stage 1 of the boot process, which must mount the root filesystem on `/mnt-root` and then start stage 2. Press one of the following keys:
r) to reboot immediately
*) to ignore the error and continue
Then you can fix it like this:
- Boot into a live CD/DVD for NixOS, or some other environment that has `fsck` installed, but not your installed copy of NixOS (as that will mount the root filesystem) (source)
- Run `fsck -yf /dev/sda1`, where you replace `/dev/sda1` with your root filesystem. (source)
  - If you're on a (KVM) VPS, it'll probably be `/dev/vda1`. If you're using LVM (even on a VPS), then you need to specify your logical volume instead (eg. `/dev/vg_main/lv_root`, but it depends on what you've named it).
The above command will automatically agree to whatever suggestion fsck makes. This can technically lead to data loss!
Many distributions will give you an option to drop down into a shell from the error directly, but NixOS does not do that. In theory you could add the `boot.shell_on_fail` flag to the boot options for your existing installation, but for reasons that I didn't bother debugging any further, the installed `fsck` was unable to fix the issues.
Stepping through builder steps in your custom packages
This article was originally published at https://gist.github.com/joepie91/b0041188c043259e6e1059d026eff301.
- Create a temporary building folder in your repository (or elsewhere) and enter it: `mkdir test && cd test`
- `nix-shell ../main.nix -A packagename` (assuming the entry point for your custom repository is `main.nix` in the parent directory)
- Run the phases individually by entering their name (for a default phase) or doing something like `eval "$buildPhase"` (for an overridden phase) in the Nix shell - a summary of the common ones: `unpackPhase`, `patchPhase`, `configurePhase`, `buildPhase`, `checkPhase`, `installPhase`, `fixupPhase`, `distPhase`
More information about these phases can be found here. If you use a different builder, you may have a different set of phases.
Don't forget to clear out your `test` folder after every attempt!
Using dependencies in your build phases
This article was originally published at https://gist.github.com/joepie91/b0041188c043259e6e1059d026eff301.
You can just use string interpolation to add a dependency path to your script. For example:
{
  # ...
  preBuildPhase = ''
    ${grunt-cli}/bin/grunt prepare
  '';
  # ...
}
Source roots that need to be renamed before they can be used
This article was originally published at https://gist.github.com/joepie91/b0041188c043259e6e1059d026eff301.
Some applications (such as Brackets) are very picky about the directory name(s) of your unpacked source(s). In this case, you might need to rename one or more source roots before `cd`ing into them.
To accomplish this, do something like the following:
{
  # ...
  sourceRoot = ".";
  postUnpack = ''
    mv brackets-release-${version} brackets
    mv brackets-shell-${shellBranch} brackets-shell
    cd brackets-shell;
  '';
  # ...
}
This keeps Nix from trying to move into the source directories immediately, by explicitly pointing it at the current (ie. top-most) directory of the environment.
Error: `error: cannot coerce a function to a string`
This article was originally published at https://gist.github.com/joepie91/b0041188c043259e6e1059d026eff301.
Probably caused by a syntax ambiguity when invoking functions within a list. For example, the following will throw this error:
{
  # ...
  srcs = [
    fetchurl {
      url = "https://github.com/adobe/brackets-shell/archive/${shellBranch}.tar.gz";
      sha256 = shellHash;
    }
    fetchurl {
      url = "https://github.com/adobe/brackets/archive/release-${version}.tar.gz";
      sha256 = "00yc81p30yamr86pliwd465ag1lnbx8j01h7a0a63i7hsq4vvvvg";
    }
  ];
  # ...
}
This can be solved by adding parentheses around the invocations:
{
  # ...
  srcs = [
    (fetchurl {
      url = "https://github.com/adobe/brackets-shell/archive/${shellBranch}.tar.gz";
      sha256 = shellHash;
    })
    (fetchurl {
      url = "https://github.com/adobe/brackets/archive/release-${version}.tar.gz";
      sha256 = "00yc81p30yamr86pliwd465ag1lnbx8j01h7a0a63i7hsq4vvvvg";
    })
  ];
  # ...
}
`buildInputs` vs. `nativeBuildInputs`?
This article was originally published at https://gist.github.com/joepie91/b0041188c043259e6e1059d026eff301.
More can be found here.
- buildInputs: Dependencies for the (target) system that your built package will eventually run on.
- nativeBuildInputs: Dependencies for the system where the build is being created.
The difference only really matters when cross-building - when building for your own system, both sets of dependencies will be exposed as `nativeBuildInputs`.
QMake ignores my `PREFIX`/`INSTALL_PREFIX`/etc. variables!
This article was originally published at https://gist.github.com/joepie91/b0041188c043259e6e1059d026eff301.
QMake does not have a standardized configuration variable for installation prefixes - `PREFIX` and `INSTALL_PREFIX` only work if the project files for the software you're building specify them explicitly.
If the project files have a hardcoded path, there's still a workaround to install it in `$out` anyway, without source code or project file patches:
{
  # ...
  preInstall = "export INSTALL_ROOT=$out";
  # ...
}
This `INSTALL_ROOT` environment variable will be picked up and used by `make install`, regardless of the paths specified by QMake.
Useful tools for working with NixOS
This article was originally published at https://gist.github.com/joepie91/67316a114a860d4ac6a9480a6e1d9c5c. Some links have been removed, as they no longer exist, or are no longer updated.
Online things
Development tooling
- A `.drv` file parser in JS
- rnix, a Nix (language) parser in Rust
(Reference) documentation
- Nix manual
- A complete list of Nix operators (the list in the official manual is incomplete)
- nixpkgs manual
- NixOS manual
- Official NixOS wiki
Tutorials and examples
- Step-by-step walkthrough of the Nix language
- A shorter primer of the Nix language (probably a better option if you already know another programming language)
- Nix pills (a series of articles about different aspects of Nix; ongoing work on a compact edition can be found here)
- Example configurations
- Hardware configurations (includes configurations for dealing with quirks on specific hardware and models)
Community and support
Miscellaneous notes and troubleshooting
- My Nix and NixOS notes: see the rest of the articles in this chapter!
- My Hydra setup notes
Proprietary AMD drivers (fglrx) causing fatal error in i387.h
This article was originally published at https://gist.github.com/joepie91/ce9267788fdcb37f5941be5a04fcdd0f. It should no longer be applicable, but is preserved here in case a similar issue reoccurs in the future.
If you get this error:
/tmp/nix-build-ati-drivers-15.7-4.4.18.drv-0/common/lib/modules/fglrx/build_mod/2.6.x/firegl_public.c:194:22: fatal error: asm/i387.h: No such file or directory
... it's because the drivers are not compatible with your current kernel version. I've worked around it by adding this to my `configuration.nix`, to switch to a 4.1 kernel:
{
  # ...
  boot.kernelPackages = pkgs.linuxPackages_4_1;
  # ...
}
Installing a few packages from `master`
This article was originally published at https://gist.github.com/joepie91/ce9267788fdcb37f5941be5a04fcdd0f.
You probably want to install from `unstable` instead of `master`, and you probably want to do it differently than described here (eg. importing from URL or specifying it as a Flake). This documentation is kept here for posterity, as it is still helpful to understand how to import a local copy of nixpkgs into your configuration.
git clone https://github.com/NixOS/nixpkgs.git /etc/nixos/nixpkgs-master
- Edit your `/etc/nixos/configuration.nix` like this:
{ config, pkgs, ... }:

let
  nixpkgsMaster = import ./nixpkgs-master {};

  stablePackages = with pkgs; [
    # This is where your packages from stable nixpkgs go
  ];

  masterPackages = with nixpkgsMaster; [
    # This is where your packages from `master` go
    nodejs-6_x
  ];
in {
  # This is where your normal config goes, we've just added a `let` block
  environment = {
    # ...
    systemPackages = stablePackages ++ masterPackages;
  };
  # ...
}
GRUB2 on UEFI
This article was originally published at https://gist.github.com/joepie91/ce9267788fdcb37f5941be5a04fcdd0f.
These instructions are most likely outdated. They are kept here for posterity.
This works fine. You need your `boot` section configured like this:
{
  # ...
  boot = {
    loader = {
      gummiboot.enable = false;
      efi = {
        canTouchEfiVariables = true;
      };
      grub = {
        enable = true;
        device = "nodev";
        version = 2;
        efiSupport = true;
      };
    };
  };
  # ...
}
Unblock ports in the firewall on NixOS
This article was originally published at https://gist.github.com/joepie91/ce9267788fdcb37f5941be5a04fcdd0f.
The firewall is enabled by default. This is how you open a port:
{
  # ...
  networking = {
    # ...
    firewall = {
      allowedTCPPorts = [ 24800 ];
    };
  };
  # ...
}
Guake doesn't start because of a GConf issue
This article was originally published at https://gist.github.com/joepie91/ce9267788fdcb37f5941be5a04fcdd0f. It may or may not still be relevant.
From nixpkgs: GNOME's GConf implements a system-wide registry (like on Windows) that applications can use to store and retrieve internal configuration data. That concept is inherently impure, and it's very hard to support on NixOS.
- Follow the instructions here.
- Run the following to set up the GConf schema for Guake:
gconftool-2 --install-schema-file $(readlink $(which guake) | grep -Eo '\/nix\/store\/[^\/]+\/')"share/gconf/schemas/guake.schemas"
This will not work if you have changed your Nix store path - in that case, modify the command accordingly.
You may need to re-login to make the changes apply.
FFMpeg support in youtube-dl
This article was originally published at https://gist.github.com/joepie91/ce9267788fdcb37f5941be5a04fcdd0f. It may no longer be necessary.
Based on this post:
{
  # ...
  stablePackages = with pkgs; [
    # ...
    (python35Packages.youtube-dl.override {
      ffmpeg = ffmpeg-full;
    })
    # ...
  ];
  # ...
}
(To understand what `stablePackages` is here, see this entry.)
An incomplete rant about the state of the documentation for NixOS
This article was originally published at https://gist.github.com/joepie91/5232c8f1e75a8f54367e5dfcfd573726.
Historical note: I wrote this rant in 2017, originally intended to be posted on the NixOS forums. This never ended up happening, as discussing the (then private) draft already started driving changes to the documentation approach. The documentation has improved since this was written, but some issues remain at the time of writing this remark, in 2024. The rant ends abruptly, because I never ended up finishing it - but it still contains a lot of useful points regarding documentation quality, and so I am preserving it here.
I've now been using NixOS on my main system for a few months, and while I appreciate the technical benefits a lot, I'm constantly running into walls concerning documentation and general problem-solving. After discussing this briefly on IRC in the past, I've decided to post a rant / essay / whatever-you-want-to-call-it here.
An upfront note
My frustration about these issues has built up considerably over the past few months, more so because I know that from a technical perspective it all makes a lot of sense, and there's a lot of potential behind NixOS. However, I've found it pretty much impenetrable on a getting-stuff-done level, because the documentation on many things is either poor or non-existent.
While my goal here is to get things fixed rather than just complaining about them, that frustration might occasionally shine through, and so I might come across as a bit harsh. This is not my intention, and there's no ill will towards any of the maintainers or users. I just want to address the issues head-on, and get them fixed effectively.
To address any "just send in a PR" comments ahead of time: while I do know how to write good documentation (and I do so on a regular basis), I still don't understand much of how NixOS and nixpkgs are structured, exactly because the documentation is so poorly accessible. I couldn't fix the documentation myself if I wanted to, simply because I don't have the understanding required to do so, and I'm finding it very hard to obtain that understanding.
One last remark: throughout the rant, I'll be posing a number of questions. These are not necessarily all questions that I still have, as I've found the answer to several of them after hours of research - they just serve to illustrate the interpretation of the documentation from the point of view of a beginner, so there's no need to try and answer them in this thread. These are just the type of questions that should be anticipated and answered in the documentation.
Types of documentation
Roughly speaking, there are three types of documentation for anything programming-related:
- Reference documentation
- Conceptual documentation
- Tutorials
In the sections below, "tooling" will refer to any kind of to-be-documented thing - a function, an API call, a command-line tool, and so on.
Reference documentation
Reference documentation is intended for readers who are already familiar with the tooling that is being documented. It typically follows a rigorous format, and defines things such as function names, arguments, return values, error conditions, and so on. Reference documentation is generally considered the "single source of truth" - whatever behaviour is specified there, is what the tooling should actually do.
Some examples of reference documentation:
Reference documentation generally assumes all of the following:
- The reader understands the purpose of the tooling
- The reader understands the concepts that the tooling uses or implements
- The reader understands the relation of the tooling to other tooling
Conceptual documentation
Conceptual documentation is intended for readers who do not yet understand the tooling, but are already familiar with the environment (language, shell, etc.) in which it's used.
Some examples of conceptual documentation:
- http://cryto.net/~joepie91/blog/2016/05/11/what-is-promise-try-and-why-does-it-matter/
- https://hughfdjackson.com/javascript/prototypes-the-short(est-possible)-story/
- https://doc.rust-lang.org/stable/book/the-stack-and-the-heap.html
Good conceptual documentation doesn't make any assumptions about the background of the reader or what other tooling they might already know about, and explicitly indicates any prior knowledge that's required to understand the documentation - preferably including a link to documentation about those "dependency topics".
Tutorials
Tutorials can be intended for two different groups of readers:
- Readers who don't yet understand the environment (eg. "Introduction to Bash syntax")
- Readers who don't want to understand the environment (eg. "How to build a full-stack web application")
While I would consider tutorials pandering to the second category actively harmful, they're a thing that exists nevertheless.
Some examples of tutorials:
- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide
- https://zellwk.com/blog/crud-express-mongodb/
- http://www.freewebmasterhelp.com/tutorials/phpmysql
Tutorials don't make any assumptions about the background of the reader... but they have to be read from start to end. Starting in the middle of a tutorial is not likely to be useful, as tutorials are more designed to "hand-hold" the reader through the process (without necessarily understanding why things work how they work).
The current state of the Nix(OS) documentation
Unfortunately, the NixOS documentation is currently lacking in all three areas.
The official Nix, NixOS and nixpkgs manuals attempt to be all three types of documentation - tutorials (like this one), conceptual documentation (like this), and reference documentation (like this). The wiki sort of tries to be conceptual documentation (like here), and does so a little better than the manual, but... the wiki is being shut down, and it's still far from complete.
The most lacking aspect of the NixOS documentation is currently the conceptual documentation. What is a "derivation"? Why does it exist? How does it relate to what I, as a user, want to do? How is the Nix store structured, and what guarantees does this give me? What is the difference between `/etc/nixos/configuration.nix` and `~/.nixpkgs/config.nix`, and can they be used interchangeably? Is `nixpkgs` just a set of packages, or does it also include tooling? Which tooling is provided by Nix the package manager, which is provided by NixOS, and which is provided by nixpkgs? Is this different on non-NixOS, and why?
Most of the official documentation - including the wiki - is structured more like a very extensive tutorial. You're told, step by step, what to do... but not why any of it matters, what it's for, or how to use these techniques in different situations. This wiki section is a good example. What does `overrideDerivation` actually do? What's the difference with `override`? What's the difference between 'attributes' and 'arguments'? Why is there a random link about the Oracle JDK there? Is the `src` completely overridden, or just the attributes that are specified there? What if I want to reevaluate all the other attributes based on the changes that I've made - for example, regenerating the `name` attribute based on a changed `version` attribute? Are any of these tools useful in other scenarios that aren't directly addressed here?
The "Nix pills" sort of try to address this lack of conceptual information, and are quite informational, but they have their problems too. They are not clearly structured (where's the index of all the articles?), the text formatting can be hard to read, and it is still half of a tutorial - it can be hard to understand later pills without having read earlier ones, because they're not fully self-contained. On top of that, they're third-party documentation and not part of the official documentation.
The official manuals have a number of formatting/structural issues as well. The single-page format is frankly horrible for navigating through - finding anything on the page is difficult, and following links to other things gets messy fast. Because it's all a single page, every tab has the exact same title, it's easy to scroll past the section you were reading, and so on. Half the point of the web is to have hyperlinked content across multiple documents, but the manuals completely forgo that and create a really poor user experience. It's awful for search engines too, because no matter what you search for, you always end up on the exact same page.
Another problem is the fact that I have to say "manuals" - there are multiple manuals, and the distinction between them is not at all clear. Because it's unclear what functionality is provided by what part of the stack, it usually becomes a hunt of going through all three manuals ctrl+F'ing for some keywords, and hoping that you will run into the thing you're looking for. Then once you (hopefully) do, you have to be careful not to accidentally scroll away from it and lose your reference. There's really no good reason for this separation; it just makes it harder to cross-reference between different parts of the stack, and most users will be using all of them anyway.
The manual, as it is, is not a viable format. While I understand that the wiki had issues with outdated information, it's still a far better structure than a set of single-page manuals. I'll go into more detail at the end of this rant, but my proposed solution here would be to follow a wiki-like format for the official documentation.
Missing documentation
Aside from the issues with the documentation format, there are also plenty of issues with its content. Many things are fully undocumented, especially where `nixpkgs` is concerned. For example, nothing says that I should be using `callPackage_i686` to package something with 32-bit dependencies. Or how to package something that requires the user to manually add a source file from their filesystem using `nix-prefetch-url`, or using `nix-store --add-fixed`. And what's the difference between those two anyway? And why is there a separate `qt5.callPackage`, and when do I need it?
There are a ton of situations where you need oddball solutions to get something packaged. In fact, I would argue that this is the majority of cases - most of the easy pickings have been packaged by now, and the tricky ones are left. But as a new user that just wants to get an application working, I end up spending several hours on each of the above questions, and I'm still not convinced that I have the right answer. Had somebody taken 10 minutes to document this, even if just as a rough note, it would have saved me hours of work.
No clear path to solutions
When faced with a given packaging problem, it's not at all obvious how to get to the solution. There's no obvious process for fixing or debugging issues, and error messages are often cryptic or poorly formatted. What does "cannot coerce a set to a string" mean, and why is it happening? How can I duct-tape-debug something by adding a `print` statement of some variety? Is there an interactive debugger of some sort?
It's very difficult to learn enough about NixOS internals to figure out what the right way is to package any given thing, and because there's no good feedback on what's wrong either, it's too hard to get anything packaged that isn't a standard autotools build. There's no "Frequently Asked Questions" or "Common Packaging Problems" section, nor have I found any useful tooling for analyzing packaging problems in more detail. I've had to write some of this tooling myself!
The documentation should anticipate the common problems that new users run into, and give them some hints on where to start looking. It currently completely fails to do so, and assumes that the users will figure out the relation between things themselves.
Reading code
Because of the above issues, often the only solution is to read the code of existing packages, and try to infer from their expressions how to approach certain problems - but that comes with its own set of problems. There does not appear to be a consistent way of solving packaging problems in NixOS, and almost every package seems to have invented its own way of solving the same problems that other packages have already solved. After several hours of research, it often turns out that half the solutions are either outdated or just wrong. And then I still have no idea what the optimal solution is, out of the remaining options.
This is made worse by the serious lack of comments in `nixpkgs`. Barely any packages have comments at all, and frequently there are complex multi-level abstractions in place to solve certain problems, but with absolutely no information to explain why those abstractions exist. They're not exactly self-evident either. Then there are the packages that do have comments, but they're aimed at the user rather than the packager - one such example is the Guake package. Essentially, it seems the repository is absolutely full of hacks with no standardized way of solving problems, no doubt helped by the fact that existing solutions simply aren't documented.
This is a tremendous waste of time for everybody involved, and makes it very hard to package anything unusual, often to the point of just giving up and hacking around the issue in an impure way. Right now we have what seems like a significant amount of people doing the same work over and over and over again, resulting in different implementations every time. If people took the time to document their solutions, this problem would pretty much instantly go away. From a technical point of view, there's absolutely no reason for packaging to be this hard to do.
Tooling
On top of all this, the tooling seems to change constantly - abstractions get deprecated, added, renamed, moved, and so on. Many of the `stdenv` abstractions aren't documented, or their documentation is incomplete. There's no clear way to determine which tooling is still in use, and which tooling has been deprecated.
The tooling that is in use - in particular the command-line tooling - is often poorly designed from a usability perspective. Different tools use different flags for the same purpose, and behave differently in different scenarios for no obvious reason. There's a UX proposal that seems to fix many of these problems, but it seems to be more or less dead, and its existence is not widely known.
Rust
Futures and Tokio
This article was originally published at https://gist.github.com/joepie91/bc2d29fab43b63d16f59e1bd20fd7b6e. It may be out of date.
Event loops
If you're not familiar with the concept of an 'event loop' yet, watch this video first. While this video is about the event loop in JavaScript, most of the concepts apply to event loops in general, and watching it will help you understand Tokio and Futures better as well.
Concepts
- Futures: Think of a `Future` like an asynchronous `Result`; it represents some sort of result (a value or an error) that will eventually exist, but doesn't yet. It has many of the same combinators as a `Result` does, the difference being that they are executed at a later point in time, not immediately. Aside from representing a future result, a `Future` also contains the logic that is necessary to obtain it. A `Future` will 'complete' (either successfully or with an error) precisely once.
- Streams: Think of a `Stream` like an asynchronous `Iterator`; like a `Future`, it represents some sort of data that will be obtained at a later point in time, but unlike a `Future` it can produce multiple results over time. It has many of the same combinators as an `Iterator` does. Essentially, a `Stream` is the "multiple results over time instead of one" counterpart to a `Future`.
- Executor: An `Executor` is a thing that, when you pass a `Future` or `Stream` into it, is responsible for 1) turning the logic stored in the `Future`/`Stream` into an internal task, 2) scheduling the work for that task, and 3) wiring up the Future's state to any underlying resources. You don't usually implement these yourself, but use a pre-made `Executor` from some third-party library. The exact scheduling is left up to the implementation of the `Executor`.
- `.wait()`: A method on a `Future` that will block the current thread until the `Future` has completed, and that then returns the result of that `Future`. This is an example of an `Executor`, although not a particularly useful one; it won't allow you to do work concurrently.
- Tokio reactor core: This is also an `Executor`, provided by the Tokio library. It's probably what you'll be using when you use Tokio. The frontpage of the Tokio website provides an example on how to use it.
- `futures_cpupool`: Yet another `Executor`; this one schedules the work across a pool of threads.
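To tie these concepts together, here is a minimal sketch assuming the futures 0.1-era API that this note describes (modern Tokio code uses async/await instead, so treat this purely as an illustration of the concepts above, not as current best practice):
// Sketch only; assumes the futures 0.1 crate described in this note.
extern crate futures;

use futures::Future;

fn main() {
    // A Future representing a result that doesn't exist yet, plus the
    // logic (the combinators below) that is needed to obtain it.
    let future = futures::future::ok::<u32, ()>(2)
        .map(|x| x * 2)                            // like Result::map, but deferred
        .and_then(|x| futures::future::ok(x + 1)); // like Result::and_then, but deferred

    // `.wait()` is the simplest Executor: it blocks the current thread
    // until the Future has completed, and then returns its result.
    let result = future.wait();
    assert_eq!(result, Ok(5));
}
In real code you'd hand the `Future` to a proper `Executor` (like the Tokio reactor core or `futures_cpupool`) instead of calling `.wait()`, so that other work can happen concurrently.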
Databases and data management
Database characteristics
This article was originally published at https://gist.github.com/joepie91/f9df0b96c600b4fb3946e68a3a3344af.
NOTE: This is simplified. However, it's a useful high-level model for determining what kind of database you need for your project.
Data models
- Documents: Single type of thing, no relations
- Relational: Multiple types of things, relations between different types of things
- Graph: Single or multiple types of things, relations between different things of the same type
Consistency models
- Strong consistency: There is a single canonical view of the database, and everything connecting to any node in the database cluster is guaranteed to see the same data at the same moment in time.
- Eventual consistency: There can be multiple different views of the database (eg. different nodes in the cluster may have a different idea of what the current state of the data is), but once you stop changing stuff, they will eventually converge into a single view.
- No consistency: There's no guarantee that all nodes in the cluster will ever converge to the same view, whatsoever.
Schemafulness
- Schemaful: You know the format (fields, types, etc.) of the data upfront. Fields may be optional, but every field you use is defined in the schema upfront.
- Schemaless: You have no idea what the format is going to be. This is rarely applicable, and only really applies when dealing with storing data from a source that doesn't produce data in a reliable format.
Maths and computer science
Articles and notes that are more about the conceptual side of maths and computer science, rather than anything specific to a particular programming language.
Prefix codes (explained simply)
This article was originally published at https://gist.github.com/joepie91/26579e2f73ad903144dd5d75e2f03d83.
A "prefix code" is a type of encoding mechanism ("code"). For something to be a prefix code, the entire set of possible encoded values ("codewords") must not contain any values that start with any other value in the set.
For example: [3, 11, 22] is a prefix code, because none of the values start with ("have a prefix of") any of the other values. However, [1, 12, 33] is not a prefix code, because one of the values (12) starts with another of the values (1).
Prefix codes are useful because, if you have a complete and accurate sequence of values, you can pick out each value without needing to know where one value starts and ends.
For example, let's say we have the following codewords: [1, 2, 33, 34, 50, 61]. And let's say that the sequence of numbers we've received looks like this:
1611333425012
We can simply start from the left, until we have the first value:
1 611333425012
It couldn't have been any value other than 1, because by definition of what a prefix code is, if we have a 1 codeword, none of the other codewords can start with a 1.
Next, we just do the same thing again, with the numbers that are left:
1 61 1333425012
Again, it could only have been 61 - because in a prefix code, none of the other codewords would have been allowed to start with 61.
Let's try it again for the next number:
1 61 1 333425012
Same story, it could only have been a 1. And again:
1 61 1 33 3425012
Remember, our set of possible codewords is [1, 2, 33, 34, 50, 61].
In this case, it could only have been a 33, because again, nothing else in the set of codewords was allowed to start with 33. It couldn't have been 34 either - even though it also starts with a 3 (like 33 does), the lack of a 4 as the second digit excludes it as an option.
You can simply keep repeating this until there are no numbers left:
1 61 1 33 34 2 50 1 2
... and now we've 'decoded' the sequence of numbers, even though the sequence didn't contain any information on where one number starts and the next number ends.
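If you prefer code over prose, here's a small Rust sketch of the greedy left-to-right decoding we just walked through; the function name and structure are my own illustration, not part of any particular library:
// Decodes a sequence using a prefix code, by repeatedly matching a
// codeword at the start of whatever is left. Returns None if the
// remainder doesn't start with any known codeword.
fn decode(sequence: &str, codewords: &[&str]) -> Option<Vec<String>> {
    let mut decoded = Vec::new();
    let mut rest = sequence;

    while !rest.is_empty() {
        // Because no codeword is a prefix of another codeword, at most
        // one codeword can match at the current position.
        let matched = codewords.iter().find(|code| rest.starts_with(*code))?;
        decoded.push(matched.to_string());
        rest = &rest[matched.len()..];
    }

    Some(decoded)
}

fn main() {
    let codewords = ["1", "2", "33", "34", "50", "61"];
    // Prints: Some(["1", "61", "1", "33", "34", "2", "50", "1", "2"])
    println!("{:?}", decode("1611333425012", &codewords));
}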
Note how the fact that both 33 and 34 start with a 3 didn't matter; shared prefixes are fine, so long as one value isn't in its entirety used as a prefix of another value. So while [33, 34] is fine (it only shares the 3, neither of the numbers in its entirety is a prefix of the other), [33, 334] would not be fine, since 33 is a prefix of 334 in its entirety (33 followed by 4).
This only works if you can be certain that you got the entire message accurately, though; for example, consider the following sequence of numbers:
11333425012
(Note how this is just the last part of 1611333425012, with the leading 16 cut off.)
Now, let's look at the first number - it starts with a 1. However, we don't know what came before, so is it part of a 61, or is it just a single, independent 1? There's no way to know for sure, so we can't split up this message.
It doesn't work if you violate the "can't start with another value" rule, either; for example, let's say that our codewords are [1, 3, 12, 23], and we want to decode the following sequence:
12323
Let's start with the first number. It starts with a 1, so it could be either 1 or 12. We have no way to know! In this particular example we can't figure it out from the numbers after it, either, as there are two different ways to decode this sequence:
1 23 23
12 3 23
And that's why a prefix code is useful, if you want to distinguish values in a sequence that doesn't have explicit 'markers' of where a value starts and ends.
Hardware
Associated notes about hardware hacking, maintenance, etc.
Cleaning sticky soda spills in a mechanical keyboard without disassembly
Follow these instructions at your own risk. This is an experimental approach that may or may not cause long-term damage to your keyboard.
This approach was only tested using Kailh Choc switches on a Glove80. It may not work with other switch designs.
A Glove80 is a pain to disassemble for cleaning, so I've figured out a way to deal with sticky spills without doing so. I am not certain that it is entirely safe, but so far (several weeks after the spill) I have not noticed any deterioration of functionality, despite having followed this process on multiple switches.
If you *can* disassemble your keyboard and clean it properly (with isopropyl alcohol and then relubricating switches), do that instead! This is a guide of last resort. It may damage your keyboard permanently. You have been warned.
Required tools:
- Alklanet all-purpose cleaner (note: not substitutable by any other cleaning agent, it has specific desirable properties!)
- Paper towel
- Key puller
Process:
- Turn off and disconnect your keyboard
- Remove keycap carefully with key puller (if you have a Glove80, follow their specific instructions for removing caps without damage)
- Spray some Alklanet on the paper towel - not on the switch itself!
- Press against the front of the switch, where there is a 'rail' indent that guides the stem, causing a droplet of Alklanet to seep into the switch. It should only be a tiny amount, just enough to seep down!
- Rapidly press and release the switch many times; you should start seeing the liquid slightly spread inside of the switch
- Blow strongly into the switch for a while, using compressed air of some kind if possible, to accelerate the drying process
- Verify that if you press the switch, you can no longer see liquid moving or air bubbles forming inside (ie. it is fully dry)
- Done!
The reason this works: Alklanet is good at dissolving organics, including sugary drinks, but quite bad (though not entirely ineffective) at degreasing. This means that it will primarily dissolve and remove the sticky spill, without affecting the lubrication much. Because Alklanet dissipates into the air quickly, it leaves very little, if any, residue behind, limiting the risk of shorted contacts.
If your switch is not registering reliably after this process, it has not been fully cleaned - do it again. If your switch is registering double presses only, then it has not dried sufficiently; immediately unplug and power off, and let it dry for longer. If both happen, it is also not sufficiently cleaned.
If Alklanet is not available where you are, you may try to acquire a different cleaning agent that quickly dissolves into the air, leaves behind no residue, and that affects organic substances but not grease. Commercial window cleaners are your best bet, but this is entirely at your own risk, and you should be certain that it has these properties - labels are often misleading.
Hacking the Areson L103G
This article was originally published many years ago at http://cryto.net/~joepie91/areson/, when I just started digging into supply chains more. The contents have not been checked for accuracy since!
(it's called the Areson G3 according to the USB identification data, though.)
Several years ago, I bought a Medion laser gaming mouse, the MD 86079, at the local Aldi. I'd been using it quite happily for years, and was always amazed at the comfort and accuracy of it, especially on glossy surfaces. Recently, I recommended it to somebody else, only to find that Medion had stopped selling it and it wasn't available anywhere else, either.
So I started researching, to figure out who actually made these things - after all, Medion barely does any manufacturing themselves. I started out by searching for it on Google, but this failed to bring up any useful results. The next step was to check the manual - but that didn't turn up anything useful either. Then I got an idea - what if I looked for clues in the driver software?
Certainly interesting, but not quite what I was looking for...
Bingo! The manufacturer turned out to be Areson.
Some further searching on the manufacturer name led me to believe that the particular model was the L103G, since the exterior shape matched that of my mouse. However, when I searched for this model number a bit more... I started running across the Mtek Gaming Xtreme L103G and the Bross L103G. And, more interestingly, an earlier version of my mouse from Medion that was branded the "USB Mouse Areson L103G"! Apparently it had been staring me in the face for a while, and I failed to notice it.
Either way, the various other mice with the exact same build piqued my interest, and I started looking for other Areson L103G variants. And oh man, there were many. It started out with the Cyber Snipa mice, but as it turns out, Areson builds OEM mice for a large array of brands. Even the OCZ Dominatrix is actually just an Areson L103G! I've made a list of L103G variants and some other Areson-manufactured mice down this page.
Another thing that I noticed, was that all these L103G variants advertised configurable macro keys and DPI settings, up to sometimes 5000 DPI, while my mouse was advertised as hard-set 1600 DPI with just an auto-fire button, and only came with driver software that let me remap the navigational buttons.
Surely if these mice are all the same model, they would have the same chipset and thus the same capabilities? I also wondered why my DPI switching and autofire (macro) buttons didn't work under Linux - if these mice are programmable, then surely this functionality is handled by the mouse chipset and not by the driver?
It was time to fire up a Windows XP VM.
After some mucking around with VirtualBox to get USB passthrough to work (hey, openSUSE packagers, you should probably document that you've disabled that by default for security reasons!), I installed the original driver software for my Medion mouse. Apparently it's not even really a kernel driver - it seems to just be a piece of userspace software that sends signals to the device.
Sure enough, when I installed the driver, then disabled the USB passthrough, and thereby returned the device to the host (Linux) OS... the DPI switcher and macro button still worked fine, despite there being no driver to talk to anymore.
So, what was going on here?
My initial guess was that the mouse initially acts as a 'dumb' preconfigured USB/2.0 mouse, in order to have acceptable behaviour and DPI on a driver-less system, and that it would only enable the 'advanced features' (macros, DPI switching) if it got a signal from the driver saying that the configuration software was present. Now of course this makes sense for a highly configurable gaming mouse, but as my mouse didn't come with such software I found it a little odd.
So I fired up SnoopyPro, and had a look at the interaction that took place. Compared to a 'regular' 5 euro optical USB mouse - which I always have laying around as a spare - I noticed that two more interactions took place:
I haven't gotten around to looking at this in more detail yet (more to come!), but to me, that looks like it registers an extra non-standard configuration interface. Presumably, that interface is used for configuring the DPI and macros, and I suspect that the registration of it triggers enabling the DPI and macro buttons on the device.
USB protocol stuff aside, I wondered - is the hardware in my mouse really the same as that in the other models? And could I (ab)use that fact to configure my mouse beyond its advertised DPI?
As it turns out, yes, I can!
The Trust GXT 33 is another Areson L103G model, advertised as configurable up to 5000 DPI. Its 'driver' software happily lets me configure my mouse up to those 5000 DPI - even though my Medion mouse was only advertised as 1600 DPI! I've changed the configuration (as you can see in the screenshot), and it really does take effect. It even keeps working after detaching it from the USB passthrough and thus returning it to Linux. And it doesn't stop there...
I can even configure macros for it. The interface isn't the most pleasant, but it works. And apparently, I now have some 5.7 KB of free storage space! I wonder if you could store arbitrary data in there...
Either way, back to the L103G. There is a quite wide array of variants of it, and I've made a list below for your perusal. Most of these are not sold anymore, but the Trust GXT 33 is - if it's sold near you (or any of the other L103G models are), I'd definitely recommend picking one up :)
A sidenote: some places reported particular mice (such as the Mtek L103G) as having a 1600 DPI sensor that can interpolate up to 3200 DPI with accuracy loss. However, even when cranking up mine to 5000 DPI, I did not notice any loss in quality - it is therefore possible that there are some differences between the sensors in different models.
The model list
Know of a model not listed here, or have a suggestion / correction / other addition? E-mail me!
- Medion MD 86079 / Medion X81007 / Medion L103G
- Bross L103G
- Cyber Snipa Stinger
- Mtek Gaming Extreme L103G
- Trust GXT 33
- MSI StarMouse GS501
- OCZ Dominatrix
- Revoltec FightMouse Pro
Earlier/simpler models (no macros, etc.)
- Gigabyte GM-M6800
- Gigabyte GM-M6880
- PureTrak Valor
- Sentey Whirlwind X
- CANYON CNR-MSG01
Server administration
General Linux server management notes, not specific to anything in particular.
Batch-migrating Gitolite repositories to Gogs
This article was originally published at https://gist.github.com/joepie91/2ff74545f079352c740a.
NOTE: This will only work if you are an administrator on your Gogs instance, or if an administrator has enabled local repository importing for all users.
First, save the following as `migrate.sh` somewhere, and make it executable (`chmod +x migrate.sh`):
HOSTNAME="git.cryto.net"
BASEPATH="/home/git/old-repositories/projects/joepie91"
OWNER_ID="$1"
CSRF=`cat ./cookies.txt | grep _csrf | cut -f 7`
while read REPO; do
REPONAME=`echo "$REPO" | sed "s/\.git\$//"`
curl "https://$HOSTNAME/repo/migrate" \
-b "./cookies.txt" \
-H 'origin: null' \
-H 'content-type: application/x-www-form-urlencoded' \
-H "authority: $HOSTNAME" \
--data "_csrf=$CSRF" \
--data-urlencode "clone_addr=$BASEPATH/$REPO" \
--data-urlencode "uid=$OWNER_ID" \
--data-urlencode "auth_username=" \
--data-urlencode "auth_password=" \
--data-urlencode "repo_name=$REPONAME" \
--data-urlencode "description=Automatically migrated from Gitolite"
done
Change `HOSTNAME` to point at your Gogs installation, and `BASEPATH` to point at the folder where your Gitolite repositories live on the filesystem. It must be the entire base path - the repository names cannot contain slashes!
Now save the Gogs cookies from your browser as `cookies.txt`, and create a file (eg. `repositories.txt`) containing all your repository names, each on a new line. It could look something like this:
project1.git
project2.git
project3.git
After that, run the following command:
cat repositories.txt | ./migrate.sh 1
... where you replace 1 with your User ID on your Gogs instance.
Done!
What is(n't) Docker actually for?
This article was originally published at https://gist.github.com/joepie91/1427c8fb172e07251a4bbc1974cdb9cd.
This article was written in 2016. Some details may have changed since.
A brief listing of some misconceptions about the purpose of Docker.
Secure isolation
Some people try to use Docker as a 'containment system' for either:
- Untrusted user-submitted code, or
- Compromised applications
... but Docker explicitly does not provide that kind of functionality. You get essentially the same level of security from just running things under a user account.
If you want secure isolation, either use a full virtualization technology (Xen HVM, QEMU/KVM, VMWare, ...), or a containerization/paravirtualization technology that's explicitly designed to provide secure isolation (OpenVZ, Xen PV, unprivileged LXC, ...)
"Runs everywhere"
Absolutely false. Docker will not run (well) on:
- Old kernels
- OpenVZ
- Non-*nix systems (without additional virtualization that you could do yourself anyway)
- Many other containerized/paravirtualized environments
- Exotic architectures like MIPS
Docker is just a containerization system. It doesn't do magic. And due to environmental limitations, chances are that using Docker will actually make your application run in less environments.
No dependency conflicts
Sort of true, but misleading. There are many solutions to this, and in many cases it isn't even a realistic problem.
- Compiled languages: Just compile your binary statically. Same library overhead as when using Docker, less management overhead.
- Node.js: Completely unnecessary. Dependencies are already local to the project. For different Node.js versions (although you generally shouldn't need this due to LTS schedules and polyfills), nvm.
- Python: virtualenv and pyenv.
- Ruby: This one might actually be a valid reason to use some kind of containerization system. Supposedly tools like `rvm` exist but frankly I've never seen them work well. Even then, Docker is probably not the ideal option (see below).
- External dependencies and other stuff: Usually, isolation isn't necessary, as these applications tend to have extremely lengthy backwards compatibility, so you can just run a recent version.
If you do need to isolate something and the above either doesn't suffice or it doesn't integrate with your management flow well enough, you should rather look at something like Nix/NixOS, which solves the dependency isolation problem in a much more robust and efficient way, and also solves the problem of state. It does incur management overhead, like Docker would.
Magic scalability
First of all: you probably don't need any of this. 99.99% of projects will never have to scale beyond a single system, and all you'll be doing is adding management overhead and moving parts that can break, to solve a problem you never had to begin with.
If you do need to scale beyond a single system, even if that needs to be done rapidly, you probably still don't get a big benefit from automated orchestration. You set up each server once, and assuming you run the same OS/distro on each system, the updating process will be basically the same for every system. It'll likely take you more time to set up and manage automated orchestration, than it would to just do it manually when needed.
The only usecase where automated orchestration really shines, is in cases where you have high variance in the amount of infrastructure you need - one day you need a single server, the next day you need ten, and yet another day later it's back down to five. There are extremely few applications that fall into this category, but even if your application does - there have been automated orchestration systems for a long time (Puppet, Chef, Ansible, ...) that don't introduce the kind of limitations or overhead that Docker does.
No need to rely on a sysadmin
False. Docker is not your system administrator, and you still need to understand what the moving parts are, and how they interact together. Docker is just a container system, and putting an application in a container doesn't somehow magically absolve you from having to have somebody manage your systems.
Blocking LLM scrapers on Alibaba Cloud from your nginx configuration
There are currently LLM scrapers running off many Alibaba Cloud IPs that ignore `robots.txt` and pretend to be desktop browsers. They also generate absurd request rates, to the point of being basically a DDoS attack. One way to deal with them is to simply block all of Alibaba Cloud.
This will also block legitimate users of Alibaba Cloud!
Here's how you can block them:
- Generate a deny entry list at https://www.enjen.net/asn-blocklist/index.php?asn=45102&type=nginx
- Add the entries to your nginx configuration. They go directly in the `server { ... }` block.
On NixOS
If you're using Nix or NixOS, you can keep the deny list in a separate file, which makes it easier to maintain and won't clutter up your nginx configuration as much. It would look something like this:
services.nginx.virtualHosts.<name>.extraConfig = ''
  ${import ./alibaba-blocklist.nix}
  # other config goes here
'';
... where you replace `<name>` with your hostname.
Dealing with a degraded btrfs array due to disk failure
Forcing a btrfs filesystem to be mounted even though some drives are missing (in a default multi-disk setup, ie. RAID0 for data but RAID1 for metadata):
mount -o degraded,ro /path/to/mount
This assumes that the mounting configuration is defined in your `fstab`, and will mount it as read-only in a degraded state. You will be able to browse the filesystem, but any file contents may have unexplained gaps and/or be corrupted. Mostly useful to figure out what data used to be on a degraded filesystem.
Never mount a degraded filesystem as read-write unless you have a very specific reason to need it, and you understand the risks. If applications are allowed to write to it, they can very easily make the data corruption worse, and reduce your chances of data recovery to zero!
Privacy
Don't use VPN services.
This article was originally published at https://gist.github.com/joepie91/5a9909939e6ce7d09e29.
No, seriously, don't. You're probably reading this because you've asked what VPN service to use, and this is the answer.
Note: The content in this post does not apply to using VPN for their intended purpose; that is, as a virtual private (internal) network. It only applies to using it as a glorified proxy, which is what every third-party "VPN provider" does.
- A Russian translation of this article can be found here, contributed by Timur Demin.
- A Turkish translation can be found here, contributed by agyild.
- There's also this article about VPN services, which is honestly better written (and has more cat pictures!) than my article.
Why not?
Because a VPN in this sense is just a glorified proxy. The VPN provider can see all your traffic, and do with it what they want - including logging.
But my provider doesn't log!
There is no way for you to verify that, and of course this is what a malicious VPN provider would claim as well. In short: the only safe assumption is that every VPN provider logs.
And remember that it is in a VPN provider's best interest to log their users - it lets them deflect blame to the customer, if they ever were to get into legal trouble. The $10/month that you're paying for your VPN service doesn't even pay for the lawyer's coffee, so expect them to hand you over.
But a provider would lose business if they did that!
I'll believe that when HideMyAss goes out of business. They gave up their users years ago, and this was widely publicized. The reality is that most of their customers will either not care or not even be aware of it.
But I pay anonymously, using Bitcoin/PaysafeCard/Cash/drugs!
Doesn't matter. You're still connecting to their service from your own IP, and they can log that.
But I want more security!
VPNs don't provide security. They are just a glorified proxy.
But I want more privacy!
VPNs don't provide privacy, with a few exceptions (detailed below). They are just a proxy. If somebody wants to tap your connection, they can still do so - they just have to do so at a different point (ie. when your traffic leaves the VPN server).
But I want more encryption!
Use SSL/TLS and HTTPS (for centralized services), or end-to-end encryption (for social or P2P applications). VPNs can't magically encrypt your traffic - it's simply not technically possible. If the endpoint expects plaintext, there is nothing you can do about that.
When using a VPN, the only encrypted part of the connection is from you to the VPN provider. From the VPN provider onwards, it is the same as it would have been without a VPN. And remember, the VPN provider can see and mess with all your traffic.
But I want to confuse trackers by sharing an IP address!
Your IP address is a largely irrelevant metric in modern tracking systems. Marketers have gotten wise to these kinds of tactics, and combined with increased adoption of CGNAT and an ever-increasing number of devices per household, it just isn't a reliable data point anymore.
Marketers will almost always use some kind of other metric to identify and distinguish you. That can be anything from a useragent to a fingerprinting profile. A VPN cannot prevent this.
So when should I use a VPN?
There are roughly two usecases where you might want to use a VPN:
- You are on a known-hostile network (eg. a public airport WiFi access point, or an ISP that is known to use MITM), and you want to work around that.
- You want to hide your IP from a very specific set of non-government-sanctioned adversaries - for example, circumventing a ban in a chatroom or preventing anti-piracy scareletters.
In the second case, you'd probably just want a regular proxy specifically for that traffic - sending all of your traffic over a VPN provider (like is the default with almost every VPN client) will still result in the provider being able to snoop on and mess with your traffic.
However, in practice, just don't use a VPN provider at all, even for these cases.
So, then... what?
If you absolutely need a VPN, and you understand what its limitations are, purchase a VPS and set up your own (either using something like Streisand or manually - I recommend using Wireguard). I will not recommend any specific providers (diversity is good!), but there are plenty of cheap ones to be found on LowEndTalk.
But how is that any better than a VPN service?
A VPN provider specifically seeks out those who are looking for privacy, and who may thus have interesting traffic. Statistically speaking, it is more likely that a VPN provider will be malicious or a honeypot, than that an arbitrary generic VPS provider will be.
So why do VPN services exist? Surely they must serve some purpose?
Because it's easy money. You just set up OpenVPN on a few servers, and essentially start reselling bandwidth with a markup. You can make every promise in the world, because nobody can verify them. You don't even have to know what you're doing, because again, nobody can verify what you say. It is 100% snake-oil.
So yes, VPN services do serve a purpose - it's just one that benefits the provider, not you.
This post is licensed under the WTFPL or CC0, at your choice. You may distribute, use, modify, translate, and license it in any way.
Normies just don't care about privacy
If you're a privacy enthusiast, you probably clicked a link to this post thinking it's going to vindicate you; that it's going to prove how you've been right all along, and "normies just don't care about privacy", despite your best efforts to make them care. That it's going to show how you're smarter, because you understand the threats to privacy and how to fight them.
Unfortunately, you're not right. You never were. Let's talk about why, and what you should do next.
So, first of all, let's dispense with the "normie" term. It's a pejorative term, a name to call someone when they don't have your exact set of skills and interests, a term to use when you want to imply that someone is clueless or otherwise below you. There's no good reason to use it, and it suggests that you're looking down on them. Just call them "people", like everybody else and like yourself - you don't need to turn them into a group of "others" to begin with.
Why does that matter? Well, would you take advice from someone who looks down on you? You probably wouldn't. Talking about "normies" pretty much sets the tone for a conversation; it means that you don't care about someone else's interests or circumstances, that you won't treat them like a full human being of equal value to yourself. In other words, you're being an arrogant asshole. And no one likes arrogant assholes.
And this is also exactly why you think that they "just don't care about privacy". They might have even explicitly told you that they don't! So then it's clear, right? If they say they don't care about privacy, that must mean that they don't care about privacy, otherwise they wouldn't say that!
Unfortunately, that's not how it works. Most likely, the reason they told you that they "don't care" is to make you go away. Most likely, you've been quite pushy, telling them what they should be doing or using instead, and responding to every counterpoint with an even stronger recommendation, maybe even trying to make them feel guilty about "not caring enough" just because they're not as enthusiastic about it as you are.
And how do you make an enthusiast like that go away? You cut off the conversation. You tell them that you don't care. You leave zero space for the enthusiast to wiggle their way back into the conversation, for them to try and continue arguing something that you've grown tired of. If you don't care, then there's nothing to argue about, and so that is what they tell you.
In reality, almost everybody does care about privacy. To different degrees, in different situations, and in different ways - but almost everybody cares. People lock the bathroom door; they use changing stalls; they don't like strangers shouldersurfing their phone screen; they hide letters and other things. Clearly people do care. They probably also know that Facebook and the like are pretty shitty, considering that media outlets have been reporting on it for a decade now. You don't need to tell them that.
So what should you do? It's easy for me to say "don't be pushy", but then how do you help people keep their communications private? How do you help advance the state of private communications in general?
The answer is to understand, not argue. Don't try to convince people, at least not directly. Don't tell them what to do, or what to use. Don't try to make them feel bad about using closed or privacy-unfriendly systems. Instead, ask questions. Try to understand their circumstances - who do they talk to, why do they need to use specific services? Does their employer require it? Are their friends refusing to move over to something without a specific feature?
Recognize and accept that caring about privacy does not mean it needs to be your primary purpose in life. Someone can simultaneously care about privacy, but also refuse to stop using Facebook because they care more about talking to a long-lost friend who is not reachable anywhere else. They can care about privacy, but care more about keeping their job which requires using Slack. They're not enthusiasts, and they shouldn't need to be to have privacy in their life - that's the whole point of the privacy movement, isn't it?
Finally, once you have asked enough questions - without being judgmental or considering answers 'wrong' in any way - you can build an understanding of someone's motivations and concerns and interests. You now have enough information to understand whether you can help them make their life more private without giving up on the things they care about.
Maybe they really want reactions in their messenger when talking to their friends, and just weren't aware that Matrix can do that, and that's what kept them on Discord. Maybe they've looked at Mastodon, but it looked like a ghost town to them, just because they didn't know about a good instance to join. But these are all things that you can't know until you've learned about someone's individual concerns and priorities. Things that you would never learn about to begin with, if they cut you off with "I don't care" because you're being pushy.
And maybe, the answer is that you can't do anything for them. Maybe, they just don't have any other options, and there are issues with all your alternative suggestions that would make them unworkable in their situation. Sometimes, the answer is just that something isn't good enough yet; and that you need to accept that, and put in the work to improve the tool instead of trying to convince people to use it as-is.
Don't be the insufferable privacy nut. Be the helpful, supportive and understanding friend who happens to know things about privacy.
Security
The computer kind, mostly.
Why you probably shouldn't use a wildcard certificate
This article was originally published at https://gist.github.com/joepie91/7e5cad8c0726fd6a5e90360a754fc568.
Recently, Let's Encrypt launched free wildcard certificates. While this is good news in and of itself, as it removes one of the last remaining reasons for expensive commercial certificates, I've unfortunately seen a lot of people dangerously misunderstand what wildcard certificates are for.
Therefore, in this brief post I'll explain why you probably shouldn't use a wildcard certificate, as it will put your security at risk.
A brief explainer
It's generally pretty poorly understood (and documented!) how TLS ("SSL") works, so let's go through a brief explanation of the parts that are important here.
The general (simplified) idea behind how real-world TLS deployments work, is that you:
- Generate a cryptographic keypair (private + public key)
- Generate a 'certificate' from that (containing the public key + some metadata, such as your hostname and when the certificate will expire)
- Send the certificate to a Certificate Authority (like Let's Encrypt), who will then validate the metadata - this is where it's ensured that you actually own the hostname you've created a certificate for, as the CA will check this.
- Receive a signed certificate - the original certificate, plus a cryptographic signature proving that a given CA validated it
- Serve up this signed certificate to your users' clients
The client will then do the following:
- Verify that the certificate was signed by a Certificate Authority that it trusts; the keys of all trusted CAs already exist on your system.
- If it's valid, treat the public key included with the certificate as the legitimate server's public key, and use that key to encrypt the communication with the server
This description is somewhat simplified, and I don't want to go into too much detail as to why this is secure from many attacks, but the general idea is this: nobody can snoop on your traffic or impersonate your server, so long as 1) no Certificate Authorities have their own keys compromised, and 2) your keypair + signed certificate have not been leaked.
So, what's a wildcard certificate really?
A typical TLS certificate will have an explicit hostname in its metadata; for example, Google might have a certificate for `mail.google.com`. That certificate is only valid on https://mail.google.com/ - not on https://google.com/, not on https://images.google.com/, and not on https://my.mail.google.com/ either. In other words, the hostname has to be an exact match. If you tried to use that certificate on https://my.mail.google.com/, you'd get a certificate error from your browser.
A wildcard certificate is different; as the name suggests, it uses a wildcard match rather than an exact match. You might have a certificate for `*.google.com`, and it would be valid on https://mail.google.com/ and https://images.google.com/ - but still not on https://google.com/ or https://my.mail.google.com/. In other words, the asterisk can match any one single 'segment' of a hostname, but nothing with a full stop in it.
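To make that matching rule concrete, here's a small illustrative sketch of how a client conceptually decides whether a certificate name matches a hostname; the function is made up for this example, and real TLS clients handle more edge cases than this:
// Illustrative sketch of the wildcard matching rule described above;
// real TLS clients implement this (plus more edge cases) internally.
fn certificate_name_matches(cert_name: &str, hostname: &str) -> bool {
    let cert_labels: Vec<&str> = cert_name.split('.').collect();
    let host_labels: Vec<&str> = hostname.split('.').collect();

    // The asterisk stands for exactly one label ('segment'), so the
    // number of labels must be the same on both sides.
    if cert_labels.len() != host_labels.len() {
        return false;
    }

    cert_labels.iter().zip(host_labels.iter()).all(|(cert, host)| {
        // A "*" label matches any single label; everything else must match exactly.
        *cert == "*" || cert.eq_ignore_ascii_case(host)
    })
}

fn main() {
    assert!(certificate_name_matches("*.google.com", "mail.google.com"));
    assert!(certificate_name_matches("*.google.com", "images.google.com"));
    assert!(!certificate_name_matches("*.google.com", "google.com"));
    assert!(!certificate_name_matches("*.google.com", "my.mail.google.com"));
    println!("all assertions passed");
}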
There are some situations where this is very useful. Say that I run a website builder from a single server, and every user gets their own subdomain - for example, my website might be at https://joepie91.somesitebuilder.com/, whereas your website might be at https://catdogcat.somesitebuilder.com/.
It would be very impractical to have to request a new certificate for every single user that signs up; so, the easier option is to just request one for `*.somesitebuilder.com`, and now that single certificate works for all users' subdomains.
So far, so good.
So, why can't I do this for everything with subdomains?
And this is where we run into trouble. Note how in the above example, all of the sites are hosted on a single server. If you run a larger website or organization with lots of subdomains that host different things - say, for example, Google with their `images.google.com` and `mail.google.com` - then these subdomains will probably be hosted on multiple servers.
And that's where the security of wildcard certificates breaks down.
Remember how one of the two requirements for TLS security is "your keypair + signed certificate have not been leaked". Sometimes certificates do leak - servers sometimes get hacked, for example.
When this happens, you'd want to limit the damage of the compromise - ideally, your certificate will expire pretty rapidly, and it doesn't affect anything other than the server that was compromised anyway. After fixing the issue, you then revoke the old compromised certificate, replace it with a new, non-compromised one, and all your other servers are unaffected.
In our single-server website builder example, this is not a problem. We have a single server, it got compromised, the stolen certificate only works for that one single server; we've limited the damage as much as possible.
But, consider the "multiple servers" scenario - maybe just the `images.google.com` server got hacked, and `mail.google.com` was unaffected. However, the certificate on `images.google.com` was a wildcard certificate for `*.google.com`, and now the thief can use it to impersonate the `mail.google.com` server and intercept people's e-mail traffic, even though the `mail.google.com` server was never hacked!
Even though originally only one server was compromised, we didn't correctly limit the damage, and now the e-mail server is at risk too. If we'd had two certificates, instead - one for `mail.google.com` and one for `images.google.com`, each of the servers only having access to their own certificate - then this would not have happened.
The moral of the story
Each certificate should only be used for one server, or one homogeneous cluster of servers. Different services on different servers should have their own, usually non-wildcard certificates.
If you have a lot of hostnames pointing at the same service on the same server(s), then it's fine to use a wildcard certificate - so long as that wildcard certificate doesn't also cover hostnames pointing at other servers; otherwise, each service should have its own certificates.
If you have a few hostnames pointing at unique servers and everything else at one single service - eg. login.mysite.com and then a bunch of user-created sites - then you may want to put the wildcard-covered hostnames under their own prefix. For example, you might have one certificate for login.mysite.com, and one (wildcard) certificate for *.users.mysite.com.
In practice, you will almost never need wildcard certificates. It's nice that the option exists, but unless you're automatically generating subdomains for users, a wildcard certificate is probably an unnecessary and insecure option.
(To be clear: this is in no way specific to Let's Encrypt, it applies to wildcard certificates in general. But now that they're suddenly not expensive anymore, I think this problem requires a bit more attention.)
The Fediverse and Mastodon
The 5-minute guide to the fediverse and Mastodon
This article was originally published at https://gist.github.com/joepie91/f924e846c24ec7ed82d6d554a7e7c9a8.
There are lots of guides explaining Mastodon and the broader fediverse, but they often go into way too much detail. So I've written this guide - it only talks about the basics you need to know to start using it, and you can then gradually learn the rest from other helpful fediverse users. Let's get started!
The fediverse is not Twitter!
The fediverse is very different from Twitter, and that is by design. It's made for building close communities, not for building a "global town square" or as a megaphone for celebrities. That means many things will work differently from what you're used to. Give it some time, and ask around on the fediverse if you're not sure why something works how it does! People are usually happy to explain, as long as it's a genuine question. Some of the details are explained in this article, but it's not required reading.
The most important takeaway is the "community" part. Clout-chasing and dunking are strongly frowned upon in the fediverse. People expect you to talk to others like they're real human beings, and they will do the same for you.
The fediverse is also not just Mastodon
"The fediverse" is a name for the thousands of servers that connect together to form a big "federated" social network. Every server is its own community with its own people and "vibe", but you can talk to people in other communities as well. Different servers also run different software with different features, and Mastodon is the most well-known option - but you can also talk to servers using different fediverse software, like Misskey.
It doesn't matter what server you pick... mostly
Like I said, different servers have different communities. But don't get stuck on picking one - you can always move to a different server later, and your follows will move with you. Just pick the first server from https://joinmastodon.org/servers that looks good to you. In the long run, you'll probably want to use a smaller server with a closer community, but again it's okay if you start out on a big server first. Other people on your server can help you find a better option later on!
Also keep in mind that the fediverse is run by volunteers; if you run into issues with your server, then you can usually just talk to the admin to get them resolved. It's not like a faceless corporation where you get bounced from department to department!
It's a good idea to avoid mastodon.social and mastodon.online - they have long-standing moderation issues, and are frequently overloaded.
Content warnings and alt texts are important
There are two important parts of the culture on the fediverse that you might not be used to: content warnings, and image alt texts. You should always give images a useful descriptive alt text (though it doesn't have to be detailed!), so that the many blind and vision-impaired users in the fediverse can also understand them. Alt texts can also help people understand jokes that they otherwise wouldn't get. Many people will never "boost" (basically retweet) images that don't have an alt text.
Content warnings are a bit subtler, but also very important. There is a strong culture of using content warnings on the fediverse, and so when in doubt, you should err on the side of using them. Because they are so widespread, people are used to them - you don't need to worry that people won't read things behind a CW. CW rules vary across communities, but you should at least put a CW on posts about violence, politics, sexuality, heavy topics, meta stuff about Twitter or the fediverse, and anything that's currently a "hot topic" that everybody seems to be talking about.
This helps people keep control over what they see, and stops people from getting overwhelmed, like you've probably seen (or felt) happen a lot on Twitter. Replies automatically get the same CW, so it's pretty easy to use.
Take your time
The fediverse isn't built around algorithmic feeds like Twitter is, so by default you won't really find much happening - what you see is entirely determined by who you follow, and it'll take some time to find people you like. This is normal! Things will get much more lively once you're following and interacting with a few people. Likewise, there's no "one big network" - you'll have a different 'view of the network' from every server, because communities tend to be tight-knit. This also means that it's difficult for unpleasant people to find you.
It's a good idea to make an introduction post, tagged with the #introduction hashtag, and hashtags for any of the other topics you're interested in. Posts on the fediverse can only be found by their hashtag, so they're important to use if you want people to find you. Likewise, you can search for hashtags to find interesting people.
That's pretty much it! You'll find many more useful tips on the fediverse itself, under the #FediTips hashtag. Take your time, explore, get used to how everything works, learn about the local culture, and ask for help in a post if you can't figure something out! There are many people who will be happy to help you out.
Cryptocurrency
No, your cryptocurrency cannot work
This article was originally published at https://gist.github.com/joepie91/daa93b9686f554ac7097158383b97838.
Whenever the topic of Bitcoin's energy usage comes up, there's always a flood of hastily-constructed comments by people claiming that their favourite cryptocurrency isn't like Bitcoin, that their favourite cryptocurrency is energy-efficient and scalable and whatnot.
They're wrong, and are quite possibly trying to scam you. Let's look at why.
What is a cryptocurrency anyway?
There are plenty of intricate and complex articles trying to convince you that cryptocurrencies are the future. They usually heavily use jargon and vague terms, make vague promises, and generally give you a sense that there must be something there, but you always come away from them more confused than you were before.
That's not because you're not smart enough; that's because such articles are intentionally written to be confusing and complex, to create the impression of cryptocurrency being some revolutionary technology that you must invest in, while trying to obscure that it's all just smoke and mirrors and there's really not much to it.
So we're not going to do any of that. Let's look at what cryptocurrency really is, the fundamental concept, in simple terms.
A cryptocurrency, put simply, is a currency that is not controlled by an appointed organization like a central bank. Instead, it's a system that's built out of technical rules, code that can independently decide whether someone holds a certain amount of currency and whether a given transaction is valid. The rules are defined upfront and difficult for anybody to change afterwards, because some amount of 'consensus' (agreement) between the systems of different users is needed for that. You can think of it kind of like an automated voting process.
Basically, a cryptocurrency is a currency that is built as software, and that software runs on many people's computers. On paper, this means that "nobody controls it", because everybody has to play by the predefined rules of the system. In practice, it's unfortunately not that simple, and cryptocurrencies end up being heavily centralized, as we'll get to later.
So why does Bitcoin need so much energy?
The idea of a currency that can be entirely controlled by independent software sounds really cool, but there are some problems. For example, how do you prevent one person from convincing the software that they are actually a million different people, and misusing that to influence that consensus process? If you have a majority vote system, then you want to make really sure that everybody can only cast one vote, otherwise it would be really easy to tamper with the outcome.
Cryptocurrencies try to solve this using a 'proof scheme', and Bitcoin specifically uses what's called "proof of work". The idea is that there is a finite amount of computing power in the world, computing power is expensive, and so you can prevent someone from tampering with the 'vote' by requiring them to do some difficult computations. After all, computations can be automatically and independently checked, and so nobody can pretend to have more computing power than they really do. So that's the problem solved, right?
The underlying trick here is to make a 'vote' require the usage of something scarce, something relatively expensive, something that you can't just infinitely wish into existence, like you could do with digital identities. It makes it costly in the real world to participate in the network. That's the core concept behind a proof scheme, and it is crucial for the functioning of a cryptocurrency - without a proof scheme requiring a scarce resource of some sort, the network cannot protect itself and would be easy to tamper with, making it useless as a currency.
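As a very rough sketch of that "expensive to produce, cheap to verify" idea - this is not Bitcoin's actual block format or difficulty mechanism, just a toy version of the concept - a proof-of-work puzzle can look something like this:

```js
const crypto = require('crypto');

// Toy proof-of-work puzzle, for illustration only.
function hash(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Producing a proof is expensive: keep trying nonces until the hash happens
// to start with enough zeroes. On average this takes many thousands of attempts.
function mine(blockData, difficulty) {
  let nonce = 0;
  while (!hash(blockData + nonce).startsWith('0'.repeat(difficulty))) {
    nonce++;
  }
  return nonce;
}

// ... but verifying someone else's proof is cheap: a single hash.
function verify(blockData, nonce, difficulty) {
  return hash(blockData + nonce).startsWith('0'.repeat(difficulty));
}

const nonce = mine('some block data', 4);          // takes many attempts
console.log(verify('some block data', nonce, 4));  // true, checked instantly
```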
To incentivize people to actually do this kind of computation - keep in mind, it's expensive! - cryptocurrencies are set up to reward those who do it, by essentially giving them first dibs on any newly minted currency. This is all fully automated based on that predefined set of rules, there are no manual decisions from some organization involved here.
Unfortunately, we're talking about currencies, and where there are currencies, there is money to be made. And many greedy people have jumped at the chance of doing so with Bitcoin. That's why there are entire datacenters filled with "Bitcoin miners" - computers that are built for just a single purpose, doing those computations, to get a claim on that newly minted currency.
And that is why Bitcoin uses so much energy. As long as the newly minted coins are worth slightly more than the cost of the computations, it's economically viable for these large mining organizations to keep building more and more 'miners' and consuming more and more energy to stake their claim. This is also why energy usage will always go up alongside the exchange rate; the more a Bitcoin is 'worth', the more energy miners are willing to put into obtaining one.
And that's a fundamental problem, one that simply cannot be solved, because it is so crucial to how Bitcoin works. Bitcoin will forever continue consuming more energy as the exchange rate rises, which is currently happening due to speculative bubbles, but which would happen if it gained serious real-world adoption as well. If everybody started using Bitcoin, it would essentially eat the world. There's no way around this.
Even renewable energy can't solve this; renewable energy still requires polluting manufacturing processes, it is often difficult to scale, and it is often more expensive than fossil fuels. So in practice, "mining Bitcoins on renewable energy" - insofar that happens at all - means that all the renewable energy you are now using could not be distributed to factories or households, and they have to continue running on non-renewable energy instead, so you're just shuffling chairs! And because of the endless growth of Bitcoin's energy consumption, it is pretty much guaranteed that those renewable energy resources won't even be enough in the end.
So there's this proof-of-stake thing, right?
You'll often see 'proof of stake' mentioned as an alternative proof scheme in response to this. So what is that, anyway?
The exact implementations vary and can get very complex, but every proof-of-stake scheme is basically some variation of "instead of the scarce resource being energy, it's the currency itself". In other words: the more of the currency that you own, the more votes you have, the more control you have over how the network (and therefore the currency) works as a whole.
You can probably begin to see the problem here already: if the currency is controlled by those who have most of it, how is this any different from government-issued currency, if it's the wealthy controlling the financial system either way? And you'd be completely right. There isn't really a difference.
But what you might not realize, is that this applies for proof-of-work cryptocurrencies too. The frequent claim is that Bitcoin is decentralized and controlled by nobody, but that isn't really true. Because who can afford to invest the most in specialized mining hardware? Exactly, the wealthy. And in practice, almost the entire network is controlled by a small handful of large mining companies and 'mining pools'. Not very decentralized at all.
The same is true for basically every other proof scheme, such as Chia's "proof of space and time", where the scarce resource is just "free storage space". Wealthy people can afford to buy more empty hard drives and SSDs and gain an edge. Look at any cryptocurrency with any proof scheme and you will find the same problem, because it is a fundamental one - if power in your system is handed out based on ownership of a scarce resource of some sort, the wealthy will always have an edge, because they can afford to buy whatever it is.
In other words: it doesn't actually matter what the specific scarce resource is, and it doesn't matter what the proof scheme is! Power will always centralize in the hands of the wealthy, either those who already were wealthy, or those who have recently gotten wealthy with cryptocurrency through dubious means.
The only redeeming feature of proof-of-stake (and many other proof schemes) over proof-of-work is that it does indeed address the energy consumption problem - but that's little comfort when none of these options actually work in a practical sense anyway. This is ultimately a socioeconomic problem, not a technical one, and so you can't solve it with technology.
And that brings us to the next point...
Yes, cryptocurrencies are effectively pyramid schemes
While Bitcoin was not originally designed to be a pyramid scheme, it is very much one now. Nearly every other cryptocurrency was designed to be one from the start.
The trick lies in encouraging people to buy a cryptocurrency. Whoever is telling you that their favourite cryptocurrency is the real deal, the solution to all problems, probably is holding quite a bit of that currency, and is waiting for it to appreciate in value so that they can 'cash out' and turn a profit. The way to make that value appreciation happen, is by trying to convince people like you to 'invest' or 'get in' on it. If you buy the cryptocurrency, that will drive up the price. If a lot of people buy the cryptocurrency, that will drive up the price a lot.
The more hype you can create for a cryptocurrency, the more profit potential there is in it, because more people will 'buy in' and drive up the price before you cash out. This is why there are flashy websites for cryptocurrencies promising the world and revolutionary technology, this is why people on Twitter follow you around incessantly spamming your replies with their favourite cryptocurrency, this is why people take out billboards to advertise the currency. It's a pump-and-dump stock.
This is also the reason why proponents of cryptocurrencies are always so mysterious about how it works, invoking jargon and telling you how much complicated work 'the team' has done on it. The goal is to make you believe that 'there must be something to it' for long enough that you will buy in and they can sell off. By the time you figure out it was all just smoke and mirrors, they're long gone with their profits.
And then the only choice to recoup your investment is for you to hype it up and try to replicate the rise in value. Like a pyramid scheme.
The bottom line
Cryptocurrency as we know it today, simply cannot work. It promises to decentralize power, but proof schemes necessarily give an edge to the wealthy. Meanwhile there's every incentive for people to hype up worthless cryptocurrencies to make a quick buck, all the while disrupting supply chains (GPUs, CPUs, hard drives, ...), and boiling the earth through energy usage that far exceeds that of all of Google.
Maybe some day, a legitimate cryptocurrency without Bitcoin's flaws will come to exist. If it does, it will be some boring research paper out of an academic lab in three decades, not a flashy startup promising easy money or revolutionary new tech today. There are no useful cryptocurrencies today, and there will not be any at any time in the near future. The tech just doesn't work.
Is my blockchain a blockchain?
This article was originally published at https://gist.github.com/joepie91/e49d2bdc9dfec4adc9da8a8434fd029b.
Your blockchain must have all of the following properties:
- It's a merkle tree, or a construct with equivalent properties.
- There is no single point of trust or authority; nodes are operated by different parties.
- Multiple 'forks' of the blockchain may exist - that is, nodes may disagree on what the full sequence of blocks looks like.
- In the case of such a fork, there must exist a deterministic consensus algorithm of some sort to decide what the "real" blockchain looks like (ie. which fork is "correct").
- The consensus algorithm must be executable with only the information contained in the blockchain (or its forks), and no external input (eg. no decisionmaking from a centralized 'trust node').
If your blockchain is missing any of the above properties, it is not a blockchain, it is just a ledger.
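For context: the data structure part (the first bullet) is the easy bit, and on its own it is exactly the "just a ledger" case. A heavily simplified sketch of such a hash-linked log, with none of the fork/consensus properties, might look like this:

```js
const crypto = require('crypto');

// A heavily simplified hash-linked log. This alone is NOT a blockchain -
// it has none of the fork/consensus properties listed above; it's just a ledger.
function hashBlock(block) {
  return crypto.createHash('sha256').update(JSON.stringify(block)).digest('hex');
}

function appendBlock(chain, data) {
  // Each block commits to the hash of its predecessor, so tampering with any
  // earlier block invalidates every block that comes after it.
  const previousHash = chain.length > 0 ? hashBlock(chain[chain.length - 1]) : null;
  return [...chain, { previousHash, data }];
}

let ledger = [];
ledger = appendBlock(ledger, 'first entry');
ledger = appendBlock(ledger, 'second entry');
console.log(ledger);
```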
You don't need a blockchain.
This article was originally published at https://gist.github.com/joepie91/a90e21e3d06e1ad924a1bfdfe3c16902.
If you're reading this, you probably suggested to somebody that a particular technical problem could be solved with a blockchain.
Blockchains aren't a desirable thing; they're defined by having trustless consensus, which necessarily has to involve some form of costly signaling to work - that costly signaling is what prevents things like Sybil attacks.
In other words: blockchains must be expensive to operate, to work effectively. This makes it a last-resort solution, when you truly have no other options available for solving your problem; in almost every case you want a cheaper and less complex solution than a blockchain.
In particular, if your usecase is commercial, then you do not need or want trustless consensus. This especially includes usecases like supply chain tracking, ticketing, and so on. The whole point of a company is to centralize control; that's what allows a company to operate efficiently. Trustless consensus is the exact opposite of that.
Of course, you may still have a problem of trust, so let's look at some common solutions to common trust problems; solutions that are a better option than a blockchain.
- If you just need to provide authenticity for a piece of data: A cryptographic signature. There's plenty of options for this. Learn more about basic cryptographic concepts here.
- If you need an immutable chain of data: Something simple that uses a merkle tree. A well-known example of this application is Git, especially in combination with signed commits.
- If that immutable chain of data needs to be added to by multiple parties (eg. companies) that mutually distrust each other: A cryptographically signed, append-only, replicated log. Chronicle can do this, and a well-known public deployment of this type of technology is Certificate Transparency. There are probably other options. These are not blockchains.
- If you need to verify that nobody has tampered with physical goods: This is currently impossible, with or without a blockchain. Nobody has yet figured out a reliable way to feed information about the real world into a digital system, without allowing the person entering it (or handling the sensors that do so) to tamper with that data.
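As an illustration of how little machinery the first option (a plain cryptographic signature) needs, here's a minimal sketch using Node.js's built-in crypto module (Ed25519 picked arbitrarily for the example):

```js
const crypto = require('crypto');

// Providing authenticity for a piece of data with a plain cryptographic
// signature - no blockchain required.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');

const data = Buffer.from('the data whose authenticity we care about');

// The holder of the private key signs the data...
const signature = crypto.sign(null, data, privateKey);

// ... and anyone with the public key can verify that it hasn't been tampered with.
console.log(crypto.verify(null, data, publicKey, signature)); // true
```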
Some people may try to sell you one of the above things as a "blockchain". It's not, and they're lying to you. A blockchain is defined by its trustless consensus; all of the above schemes have existed for way longer than blockchains have, and solve much simpler problems. The above systems also don't provide full decentralization - and that is a feature, because decentralization is expensive.
If somebody talks to you about a "permissioned blockchain" or a "private blockchain", they are also feeding you bullshit. Those things do not actually exist, and they are just buzzwords to make older concepts sound like a blockchain, when they're really not. It's most likely just a replicated append-only log.
There's quite a few derivatives of blockchains, like "tangles" and whatnot. They are all functionally the same as a blockchain, and they suffer from the same tradeoffs. If you do not need a blockchain, then you also do not need any of the blockchain derivatives.
In conclusion: blockchains were an interesting solution to an extremely specific problem, and certainly valuable from a research standpoint. But you probably don't have that extremely specific problem, so you don't need and shouldn't want a blockchain. It'll just cost you crazy amounts of money, and you'll end up with something that either doesn't work, or something that has conceptually existed for 20 years and that you could've just grabbed off GitHub yourself.
Additions
I'm going to add some common claims here over time, and address them.
"But it's useful as a platform to build upon!"
One of the most important properties of a platform is that it must be cost-efficient, or at least as cost-efficient as the requirements allow. When you build on an unnecessarily expensive foundation, you can never build anything competitive - whether commercial or otherwise.
Like all decentralized systems, blockchains fail this test for usecases that do not benefit from being decentralized, because decentralized systems are inherently more expensive than centralized systems; the lack of a trusted party means that work needs to be duplicated for both availability and verification purposes. It is a flat-out impossibility to do less work in an optimal decentralized system than in an equivalent optimal centralized system.
Unlike most decentralized systems, blockchains add an extra cost factor: costly signaling, as described above. For a blockchain to be resiliently decentralized, it must introduce some sort of significant participation cost. For proof-of-work, that cost is in the energy and hardware required, but any tangible participation cost will work. Forms of proof-of-stake are not resiliently decentralized; their cost factor can be bypassed by malicious adversaries in a number of ways.
In other words: due to blockchains being inherently expensive to operate, they only make sense as a platform for things that actually need trustless consensus - and that list pretty much ends at 'digital currency'. For everything else, it is an unnecessary expense and therefore a poor platform choice.
test.js
Test please ignore
Dependency management
Transitive dependencies and the commons
In this article, I want to explain why, these days, I personally only work with programming languages that allow conflicting transitive dependencies, and why this matters for the purpose of building a commons for software.
Types of dependency structures
There are a lot of considerations in designing dependency systems, but there are two axes I've found to be particularly relevant to the topic of a software commons: nested vs. flat dependencies, and system-global vs. project-local dependencies.
Nested vs. flat dependencies
There are roughly two ways to handle transitive dependencies - that is, dependencies of your dependencies:
- Either you make the whole dependency set a 'flat' one, where every dependency is a top-level one, or
- You represent the dependency structure as a tree, where each dependency in your dependency set has its own, isolated dependency set internally.
I'm talking about the conceptual structure here, so it doesn't actually matter how these dependencies are stored on disk, but I'll illustrate the two forms below.
This is what a nested dependency tree in some project might look like:
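(The package names and versions below are made up, purely to illustrate the shape.)

```
my-project
├── some-http-client 2.0.0
│   └── query-string-parser 6.2.0
└── some-form-library 1.4.0
    └── query-string-parser 6.3.1
```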
And this would be the equivalent tree in a flat dependency structure:
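```
my-project
├── some-http-client 2.0.0
├── some-form-library 1.4.0
└── query-string-parser 6.3.1   (the single version shared by both)
```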
This also immediately shows the primary limitation of a flat dependency set: you cannot have conflicting versions of a transitive dependency in your project! This is what most of the process of "dependency solving" is about - of all the theoretically possible versions of every dependency in your project, find the set that actually works together, ie. where every dependency matches the version constraints for all of its occurrences in the project.
Project-local vs. system-global dependencies
Another important, and related, design decision in a dependency system is whether dependencies are isolated to the project, or whether you have a single dependency set that's used system-wide. This is somewhat self-explanatory; if your dependencies are project-local then that means that they are stored within the project, but if they are system-global then there's a system-wide dependency set of some sort, and so the project gets its dependencies from the environment.
Some examples
Here are some examples of different combinations of these properties, and where you might find them:
- System-global, flat: Python (without virtual environment tools). There is a single system-wide collection of Python packages, that every piece of Python software on the system uses. Can be turned into "Project-local, flat" using `virtualenv`. Another example would be C-style libraries.
- Project-local, flat: PHP, with Composer. Packages are installed to a `vendor` directory in your project, but only one version of a given dependency can be installed at a time.
- System-global, nested: Nix. There is a system-wide package store called the "Nix store", where every unique variant of a package is stored under a deterministic hash, and every dependency within a piece of software is referenced by hash from that store explicitly.
- Project-local, nested: Node.js. Each project has its own `node_modules` folder, and each dependency has its own `node_modules` folder containing its transitive dependencies, and so on.
So why does any of this actually matter?
It might seem like all of this is just an implementation detail, and it's the problem of the package manager developers to deal with. But the choice of dependency model actually has a big impact on how people use the dependency system.
The cost of conflict
The problem at the center of all of this, is that dependency conflicts are not free. Every time you run into a dependency conflict, you have to stop what you are doing and resolve a conflict. Resolving it may require anything from a small change to a complete architectural overhaul of your code, like in the case where a new version of a critical dependency introduced a different design paradigm.
Now you might think "huh, but I rarely run into that", and that is likely correct - but it's not because the problem doesn't happen. What tends to happen in mature language ecosystems, is that the whole ecosystem centers around a handful of large frameworks over time, where the maintainers do all this work of resolving conflicts preventatively; they coordinate with maintainers of other dependencies, for example, to make sure that these conflicts do not occur.
This has a large maintenance cost to maintainers, and indirectly also a cost to you - it means that that time is not spent on, for example, nice new features or usability improvements in the tools that you use. The cost is still there, it's just very difficult to see if you are not a maintainer of a large framework.
Frameworks and libraries
This also touches on another consequence of working with conflict-prone dependency systems: they incentivize centralization of the ecosystem around a handful of major frameworks, that are usually quite opinionated about how to use them. In a vacuum, small and single-responsibility libraries would be the optimal structure of an ecosystem, but that is simply not a sustainable model when your transitive dependencies can conflict; every dependency you add would superlinearly increase the chance of running into a conflict.
These frameworks are usually acceptable if you work on common systems, that solve common problems; there have been many people before you building similar things, and so the framework will likely have been designed to account for it. But it's a deadly barrier for unconventional or innovative projects, which do not fit into that mold; they are severely disadvantaged because in a framework-heavy ecosystem, every package comes with a large set of assumptions about what you'll be doing with it, and they're usually not going to be the right ones - leaving you to either not use packages at all, or spend half your time working around them.
Consequences for the commons
A more abstract way in which this problem occurs is in its impact on the commons. The idea of a 'software commons' is simple: a large, public, shared, freely accessible and usable collection of software that anyone can build upon according to their needs and contribute to according to their ability, resulting in a low-friction way to collaborate on software at a large scale. Some of the idealized consequences of such a commons would be that every problem only needs to be solved exactly once, and it will forever be reliably solved for everyone, and we can all collectively move on to solving other problems.
This is a laudable goal, but it too is harmed by conflict-prone dependency systems. For this goal to be achievable, there must be some sort of distribution format for 'software' that is universally usable, assumption-free, and isolated from the rest of the environment, so that it is guaranteed to fit into any project that has the problem it is designed to solve. But a flat or even system-global dependency model cannot do that - in such a model, it is possible for one piece of software to make it impossible to use another; that is, after all, exactly what a dependency conflict is.
In other words, to achieve a true software commons on a technical level (the social requirements are for another article), we need a nested, project-local dependency mechanism - or at least a mechanism that can approximate or simulate those properties in some way.
So why are dependency systems so conflict-prone?
So given all of that, the answer would seem obvious, right? Just build nested, project-local dependency systems! And that does indeed solve these issues, but it brings some problems of its own.
Duplication
One of the most obvious problems, but also one of the easiest to solve, is that of duplication. If there are two uses of the same dependency in different parts of the dependency tree, you ideally want those to use the same copy to save space and resources, and indeed this is exactly the typical justification for a flat dependency space. This also applies to compilation; you'd usually want to avoid compiling more than one copy of the same library.
But there is a better way, and it is implemented today by systems like npm: a nested dependency tree which opportunistically moves dependencies to the top level when they are conflict-free. This way, they are only stored in a nested structure on disk when it is necessary to preserve the guarantees of a conflict-free dependency system, ie. when otherwise a dependency conflict would occur. This could be considered a hybrid form between nested and flattened dependencies, and is pretty close to an optimal representation.
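Concretely, reusing the hypothetical packages from earlier (and assuming this time that their version constraints genuinely conflict), npm's on-disk layout might look something like this:

```
node_modules/
├── query-string-parser/           (6.3.1 - hoisted to the top level)
├── some-form-library/             (uses the hoisted 6.3.1)
└── some-http-client/
    └── node_modules/
        └── query-string-parser/   (5.1.0 - kept nested, because it conflicts)
```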
The duplication problem also exists in a second form: duplication between projects. Two pieces of independent end-user software might use the same version of the same dependency, and you would probably want to reuse a single copy for all the same reasons as above. This is typically used as a justification for system-global dependency systems.
But here again, there is a better option, and this time it is truly an optimal representation: a single shared system-global store of packages, identified by some sort of unique identifier, with the software pointing to specific copies within that store. This optimally deduplicates everything, but still allows conflicting implementations to exist. This exists today in Nix (where each store entry is hashed and referenced by path from the dependent) and pnpm (an alternative Node.js package manager where the store is keyed by version and symlinks and hardlinks are used to access it in an npm-compatible manner).
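As a rough sketch of the pnpm-style approach (paths heavily simplified; the real store layout is more involved), with the same hypothetical package:

```
~/.pnpm-store/
└── query-string-parser@6.3.1/     (one physical copy on the whole system)

project-a/node_modules/
└── query-string-parser  ->  linked into the store

project-b/node_modules/
└── query-string-parser  ->  linked into the store
```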
Nominal types
Unfortunately, there is also a more difficult problem - it affects only a subset of languages, and explains why Node.js does have nested dependencies while a lot of other new systems do not. That problem is the nominal typing problem.
If you have a system with nominal typing, then that means that types are not identified by what they are shaped like (as in structural typing), but by what they are called, or more accurately by their identity. In a typical nominal typing system, if you define the same thing under the same name twice, but in different files, they are different types.
This poses an obvious problem for a nested dependency system: if you can have two different copies of a dependency in your tree, that means you can also have two different types that are supposed to be identical! This would cause a lot of issues - for example, say that a value in a dependency is generated by a transitive dependency, and consumed by a different dependency that uses a different version of that same transitive dependency... the value generated by one copy would be rejected by the other, for not being the same type.
This is what can happen, for example, in Rust - Cargo will nominally let you have conflicting versions, but as soon as you try to exchange values between those copies in your code, you'll encounter a type mismatch.
There are some theoretical language-level solutions to this problem, for example in the form of type adapters - a specification of how one copy of a type may be converted to another copy of that type. But this is a non-trivial thing to account for in a language design, and to date I have not seen any major languages that have such a mechanism. Which means that nominally typed languages are, generally, stuck with flat dependencies.
(If you're wondering how this problem is overcome without nominal typing: the answer is that you're mostly just relying on the internal structure of types not changing in a breaking way between versions, or at least not without also changing the attribute names or, in a structurally typed system, the internal types. That sounds unreliable, but in practice it is very rare to run into situations where this goes wrong, to the point that it's barely worth worrying about.)
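To illustrate that last point with a hypothetical example (a made-up 'length' library, standing in for any dependency that happens to exist twice in a nested tree):

```js
// Simulate two copies of the same dependency, as a nested tree might contain
// (eg. two resolved versions of the same package under different parents).
function makeLengthLib() {
  return {
    centimeters: (value) => ({ value, unit: 'cm' }),
    add: (a, b) => {
      if (a.unit !== b.unit) throw new Error('unit mismatch');
      return { value: a.value + b.value, unit: a.unit };
    },
  };
}

const copyA = makeLengthLib(); // the copy one dependency ended up with
const copyB = makeLengthLib(); // the copy another dependency ended up with

// A value produced by one copy is happily accepted by the other; nothing ever
// checks *which* copy created the object, only what it looks like.
const total = copyB.add(copyA.centimeters(5), copyB.centimeters(3));
console.log(total); // { value: 8, unit: 'cm' }
```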
But even if this problem were overcome, there's another one.
Backwards compatibility
Dependencies are, almost always, something that is deeply integrated into a language. Whether through the design of the import syntax, or the compiler's lookup rules, or anything else, there's usually something in the design of a language that severely constrains the possibilities for package management. Nested dependencies can work for Node.js because CommonJS accounted for the needs of a nested dependency system from the start, and it is virtually impossible to retrofit it into most existing systems.
For the same reason that a software commons is a possible concept, dependencies are also subject to the network effect - they are a social endeavour, an exercise in interoperation and labour delegation, and that means that there is an immense ecosystem cost associated with breaking dependency interoperability - just ask anyone who has had to try fitting an ES Module into a CommonJS project, for example, or anyone who has gone through the Python 2 to 3 transition. This makes changing the dependency system a very unappealing move.
So in practice, a lot of languages simply aren't able to adopt a nested dependency system, because it would break everything they have today. For the most part, only new languages can adopt nested dependencies, and most new languages are going to be borrowing ideas from existing languages, which... have flat dependencies. Among other things, I'm hoping that this article might serve as an inspiration to choose differently.
My personal view
Now, to get back to why I, personally, don't want to work with conflict-prone languages anymore, which has to do with the 'software commons' point mentioned earlier. I have many motivations behind the projects I work on, but one of them is the desire to build on a software commons using FOSS; to build reliable implementations for solving problems that are generically usable, ergonomic, and just as useful in 20 years (or more) as they are today.
I do not think that this is achievable in a conflict-prone language. Even with the best possible API design that needs no changes, you would still need to periodically update dependencies to make sure that your dependency's transitive dependencies remain compatible with widely-used frameworks and tools. This makes it impossible to write 'forever libraries' that are written once and then, eventually, after real-world testing and improvements, done forever. The maintenance cost alone would become unsustainable.
The problem is made worse by conflict-prone dependency systems' preference for monolithic frameworks, as those necessarily are opinionated and make assumptions about the usecase; which, unlike a singular solution to a singular problem, is not something that will stand the test of time - needs change, and as such, so do common usecases. Therefore, 'forever libraries' cannot take the shape that a conflict-prone dependency system encourages.
In short, a conflict-prone dependency system simply throws up too many barriers to credibly and sustainably build a long-term software commons, and that means that whatever work I do in the context of such a system, does not contribute towards my actual goals. In practice this means that I am mostly stuck with Javascript today, and I am hoping to see more languages adopt a conflict-free dependency system in the future.
Community governance
How to de-escalate situations
I originally drafted this guide for the (public, semi-open) NixOS governance talks in 2024. It was written for participants in those governance discussions, as a de-escalation guide to steer conversation back to a constructive path. The recommendations in it, however, are more generally applicable to any sort of discussion, especially those in which decisions are to be made.
Governance is a complicated topic that often creates conflicts; some of them small, some of them not so small. Moderators are tasked with ensuring that the governance Zulip remains a constructive space for people to talk these things out, but there is a lot that you can do yourself to keep the discussion constructive; or even as a third party intervening in someone else's escalating discussion.
This guide describes some techniques that can be used to prevent and de-escalate conflicts, and help to keep the governance discussion productive for everyone involved. We encourage you to use them!
This guide is based in part on https://libera.chat/guides/catalyst, although several changes and additions have been made to better fit our specific situation.
Assume good faith
The people who participate in these governance conversations, are most likely here because they want the project governance to be improved, just like you. Try to assume that the other person is doing what they're doing in good faith. There are only very few people who genuinely seek to cause disruption, and if that is the case, it becomes a task for moderators to handle.
Listen and ask
A lot of conflicts can be both prevented and de-escalated by simply asking more questions and listening more, instead of speaking. In general, prefer to ask people why they feel a certain way if that is unclear, rather than assuming their intentions - this will provide more space for concerns that would otherwise go overlooked, and avoid creating conflicts due to wrong assumptions.
Even when a conflict has already arisen, asking questions can still be effective to de-escalate; asking people why they are doing something will encourage them to reflect on their behaviour, and this can often lead to self-moderation. Most people do not want to be viewed as "the bad guy".
Likewise, in a conflict, prefer asking for someone's input rather than admonishing their behaviour; this centers the conversation on them and their thoughts, instead of on yourself. Even if you disagree, you are more likely to gain a useful insight this way, and calming down the situation helps everyone involved.
If you need to concretely ask someone to change their behaviour, prefer asking them as a "can you do this?" question, rather than outright demanding it - they will likely be more receptive to your request, and if there is a reason why they cannot, you can look for a solution to that together.
Compromises and reconciliation
Many disagreements are not really fundamental: often there is just a miscommunication, or some mismatch in assumptions. When it seems like you cannot find agreement, try narrowing down exactly where the disagreement comes from, what the most precise difference between your views is. Often, this will inspire new solutions that work for everyone involved, and that reconcile your differences - eliminating the disagreement entirely.
If all else fails, it is often better to find a compromise that everyone can be reasonably happy with, than to leave one side of the conflict entirely unsatisfied. This should be a last resort; too many compromises can easily stack together into a sense of nothing ever being decided, or nothing being changeable. You should always prefer finding reconciliation instead, as described above. True compromise should be very rarely needed.
Health
Treating hair lice in difficult hair
Theory
Hair lice are a relatively innocent parasite that lives in human head hair. Although they are typically not disease carriers, the itching can be extremely frustrating.
Hair lice attach themselves by holding onto the hair, typically close to the skin of the head, which they need to do actively, ie. they need to be alive to do so. Dead hair lice will eventually fall out.
Typical recommendations revolve around using substances that are in some way deadly to lice; depending on the substance, by poisoning them, dehydrating them, or both. These substances are used along with a lice comb, which is a comb with very fine teeth (with just enough space between teeth for strands of hair to pass through), and essentially 'pulls' the weakened lice out of the hair.
Unfortunately this approach doesn't work for everybody; if you have long or particularly tangle-prone hair, it can be nearly impossible to get down to every bit of skin on your head. Given that the treatment needs to be repeated daily, and missing even one louse can make your efforts futile, this can make it impractical.
Heat treatment
There are experimental heat treatment techniques to remove hair lice; these involve purpose-built devices for removing lice by dehydrating them, causing them to die and lose grip. Heat travels through tangled hair much more easily than a comb, and so can have a higher success rate. Unfortunately, you are unlikely to have such a specialized device at home, and it can be difficult to find someone to do it for you, especially if traveling is difficult or you do not have a lot of money to spend.
Fortunately, however, this process can be replicated with a simple hairdryer, as long as you are careful. Make sure your hairdryer is set to 'hot' mode, and your hair is dry. Then blow hot air through your hair, close to your head, for at least several minutes daily, for the usual treatment period of two weeks.
You need to be very careful when using hot air this close to your head. It's okay for your head to start feeling hot, but as soon as you start getting a burning or scorched feeling on the top of your head, stop the treatment immediately, and keep more distance the next day. If you do not have a lot of hair, you may need to keep the hairdryer at quite some distance - thickness of the hair affects what the correct distance is for you.
A method that I've found particularly effective is to blow air upwards; that is, instead of blowing onto your hair from the top, point the hairdryer upwards and blow it under your mop of hair, as it were - it should feel a bit weird, causing your hair to go in all directions. This maximizes airflow, as the air somewhat gets trapped under your hair, and the only way out is through; this minimizes the deflection of air you would get when blowing from the top down. Note that the upwards technique can worsen hair tangling.
You may also want to use a lice comb in the places where this is easily possible to do; it is not strictly required for the treatment to work, but it makes it easier to clear out the dead lice in one go, instead of having them fall out by themselves over time.
Make sure you continue for the full two weeks, with daily treatment; hair lice have a short breeding cycle, and this treatment only affects the living lice, not their eggs. This means that over the span of two weeks, you will need to gradually dehydrate every new generation of lice. Doing it daily without fail ensures that no generation has a chance to lay new eggs. If you miss a day, you may need to restart the two week timer.
Matrix
State resolution attacks
These are some notes on various kinds of attacks that might be attempted on state resolution algorithms, such as the one in Matrix. Different state resolution algorithms are vulnerable to different kinds of attacks; a reliable state resolution algorithm should be vulnerable to none of them.
These notes are not complete. More details, graphs, etc. will be added at some later time.
Frontrunning attack
Detect an event that bans or demotes the user, then quickly craft a fake branch full of malicious events (eg. banning other users), but do not submit those events to any other homeserver yet, and then craft an event that parents both the fake branch and the event prior to the detected ban/demote, claiming that the fake branch came earlier and thereby bypassing the ban. Requires a malicious homeserver.
Dead horse attack
Attach crafted event to recent parent and ancient parent, to try and pull in ancient state and confuse the current state; eg. an event from back when a user wasn't banned yet, to try and get the membership state to revert to 'joined' by pulling it into current state. Named this because it involves "beating a dead horse".
Piggybacking attack
A low-powerlevel user places an event in a DAG branch that a high-powerlevel user has also attempted to change state in, as the high-powerlevel state change might cause their branch to become prioritized (ie. sorted in front) in state resolution.
Fir tree attack
Resource exhaustion attack; deliberately constantly creating side branches to trigger state resolution processes. Named after the shape of half a fir tree that it generates in the graph.
Huge graph attack
Resource exhaustion attack; attach crafted event to a wide range of other parent events throughout the history of the room, to pull as many sections of the event graph into state resolution as possible.
Mirror attack
Takes advantage of non-deterministic state resolution algorithms to create a split-brain situation that breaks the room, by creating a fake branch containing the exact inverse operations of the real branch, and then resolving the two together; as there is no canonically 'correct' answer under these circumstances, the goal of the attack is to make different servers come to different conclusions.
Protocols and formats
Working with DBus
This article is a work in progress. It'll likely be expanded over time, but for now it's incomplete.
What is DBus?
DBus is a standardized 'message bus' protocol that is mainly used on Linux. It serves to let different applications on the same system talk to each other through a standardized format, with a standardized way of specifying the available API.
Additionally, and this is probably the most-used feature, it allows for different applications to 'claim' specific pre-defined ("well-known") namespaces, if they intend to provide the corresponding service. For example, there are many different services that can show desktop notifications to the user, and the user may be using any one of them depending on their desktop environment, but whichever one it is, it will always claim the standard `org.freedesktop.Notifications` name.
That way, applications that want to show notifications don't need to know which specific notification service is running on the system - they can just send them to whoever claimed that name and implements the corresponding API.
How do you use DBus as a user?
As an end user, you don't really need to care about DBus. As long as a DBus daemon is running on your system (and this will be the case by default on almost every Linux distribution), applications using DBus should just work.
If you're curious, though, you can use a DBus introspection tool such as QDBusViewer or D-Spy to have a look at what sort of APIs the programs on your system provide. Just be careful not to send anything through it without researching it first - you can break things this way!
How do you use DBus as a developer?
You'll need a DBus protocol client. There are roughly two options:
- Bindings to libdbus for the language you are using, or
- A client implementation that's written directly in the language you are using (eg. `dbus-next` in JS)
You could also write your own client, as DBus typically just works over a local socket, but note that the serialization format is a little unusual, so it'll take some time to implement it correctly. Using an existing implementation is usually a better idea.
Note that you use a DBus client even when you want to provide an API over DBus; the 'server' in this arrangement is the DBus daemon, not your application.
How the protocol works
DBus implements a few different kinds of interaction mechanisms:
- Properties: These are (optionally read-only) values that can be read or written. They're usually used to check or change something.
- Methods: These are callable and can produce a result. They're usually used to do something.
- Signals: These are like events, and can be subscribed to. They're usually emitted when something happens on the other side.
All of these - properties, methods and signals - are addressable by pre-defined names. However, it takes a few steps to get there:
- First, you need to select a bus name - this is kind of like a process name (or, in the case of a "well-known" API, the standard name), although technically one process can present multiple bus names. Its components are delimited by dots.
- Then, on the resulting bus, you select an object path - essentially, this is the specific 'object' (or object type) within the process that you wish to access. Its components are delimited by slashes.
- Finally, on the selected object, you then select an interface - you can think of this as the 'service' that you wish to access. Custom DBus APIs often only implement a single interface, in addition to the standard DBus-specified interfaces for introspection (see below).
After these steps, you will end up with an interface that you can interact with - it has properties, methods, and/or signals. Don't worry too much about how exactly the hierarchy works here - the division between bus name, object path and interface can be (and in practice, is) implemented in many different ways depending on requirements, and if you merely wish to use a DBus API from some other application, you can simply specify whatever its documentation tells you for all of these values.
Some more information and context about this division can be found here, though keep in mind that you'll often encounter exactly one possible value for bus name, object path and interface, for any given application that exposes an API over DBus, so it's not required reading.
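For example, with the `dbus-next` library mentioned earlier, sending a desktop notification through the standard org.freedesktop.Notifications API looks roughly like this (the exact Notify arguments come from that API's specification; treat this as a sketch rather than a reference):

```js
const dbus = require('dbus-next');

async function main() {
  const bus = dbus.sessionBus();

  // The bus name and object path select *who* we're talking to...
  const proxy = await bus.getProxyObject(
    'org.freedesktop.Notifications',   // bus name
    '/org/freedesktop/Notifications'   // object path
  );

  // ... and the interface selects *which* set of properties/methods/signals we want.
  const notifications = proxy.getInterface('org.freedesktop.Notifications');

  // Methods on the interface can then be called like normal (async) functions.
  await notifications.Notify(
    'my-app',          // application name
    0,                 // replaces_id (0 = new notification)
    '',                // icon
    'Hello!',          // summary
    'Sent over DBus',  // body
    [],                // actions
    {},                // hints
    5000               // expiry timeout in milliseconds
  );

  bus.disconnect();
}

main();
```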
Introspection
An additional feature of DBus is that it allows introspection of DBus APIs; that is, you can use the DBus protocol itself to interrogate an API provider about its available API surface, the argument types, and so on. The details of this are currently not covered here.
Some well-known DBus APIs
- MPRIS - The 'media player control' API.
- FreeDesktop Notifications - The standard API for displaying desktop notifications.
Problems
Things I'm trying to work out.
Subgraph sorting
We have a graph:
We sort this graph topologically into a one-dimensional sequence:
A, B, C, D
The exact sorting order is determined by inspecting the contents of these nodes (not shown here), and doing some kind of unspecified complex comparison on those contents. As this is a topological sort, the comparison is essentially the secondary sorting criterion; the primary sorting criterion is whatever preserves the graph order of the nodes (that is, an ancestor always comes before the node that it is an ancestor of). Crucially, this means that nodes in different branches are compared to each other.
The resulting sorting order is stored in a database, in some sort of order representation. The exact representation is undefined; which representation would work best here, is part of the problem being posed.
Now, the graph is expanded with a newly discovered side branch, introducing two new nodes, E and F:
The new node E now participates in the sorting alongside B, C, and D - we know that E must come after A and before F, because of the ancestor relationships, but we do not know how exactly its ordering position in the sequence relates to the other three nodes, without actually doing the comparison against them.
The problem: the existing order (A, B, C, D) must be updated in the database, such that E and F also become part of the ordered sequence. The constraints are:
- The process may not load A, B, C and D into memory all at once. Loading them into memory on-demand is acceptable, as long as the performance cost is not too high, and there is never more than one node loaded into memory at once. Assume a standard file-backed key/value store as the database.
- The process should avoid rewriting the entry in the database for all of the existing nodes A, B, C and D. Some rewriting is acceptable, but having to rewrite every other participating node constitutes a performance problem.
- The outcome must be deterministically identical to what the outcome would have been if the graph had been fully known upfront, and sorted topologically, all at once in memory.
You may choose any internal representation in the database, and any sorting mechanism, as long as it fits within the above constraints.