base-64-url-no-pad inconsistent with RFC 4648 and previous use

The specification mandates base-64-url-no-pad for the `u` Multibase header, as defined in the document. However, that definition appears to differ from base64url in RFC 4648, and even from how the `u` header was previously defined in https://github.com/multiformats/multibase. Is this difference intentional or an error?

Specifically, the definition refers to a particular algorithm. That algorithm is written generically for both base58 and base64, but the common usage of these two encodings differs, crucially, in the direction from which the bits are grouped. As a result, the two algorithms can agree only when the input length is a multiple of 3 bytes (and, as the leading-zero cases discussed below show, not always even then):
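```js
// `baseEncode` is an implementation of the algorithm referenced by the
// specification (not reproduced here), called with base 64 and the URL-safe alphabet.
const message = 'Hello world';
const bytes = new TextEncoder().encode(message);

// Specification's algorithm: EhlbGxvIHdvcmxk
console.log(baseEncode(bytes, 64, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"));

// RFC 4648: btoa produces standard, padded base64; for this input it matches
// base64url apart from the trailing '=': SGVsbG8gd29ybGQ=
console.log(btoa(message));
```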
I would expect these two outputs to be equal (apart from the padding). I have also never seen this reverse-direction base64 used anywhere else, and it is somewhat more computationally expensive as well.
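
For contrast, the RFC 4648 behaviour can be sketched as a plain bit-grouping encoder: it consumes bits starting from the most-significant bit of the first byte, emits one character per 6 bits, and zero-fills the final partial group without emitting `=`. This is a minimal illustrative sketch, not code from the specification or from any library; the function name `base64urlNoPad` is hypothetical.

```javascript
// Hypothetical helper: RFC 4648 Section 5 base64url, with padding omitted.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

function base64urlNoPad(bytes) {
  let out = "";
  let buffer = 0; // pending bits, most significant first
  let bits = 0;   // number of pending bits held in `buffer`
  for (const byte of bytes) {
    buffer = (buffer << 8) | byte;
    bits += 8;
    // Emit 6-bit groups starting from the front of the bit stream.
    while (bits >= 6) {
      bits -= 6;
      out += ALPHABET[(buffer >> bits) & 0x3f];
    }
    buffer &= (1 << bits) - 1; // keep only the unconsumed low bits
  }
  // Zero-fill the last partial group on the right; no '=' padding.
  if (bits > 0) out += ALPHABET[(buffer << (6 - bits)) & 0x3f];
  return out;
}

const bytes = new TextEncoder().encode("Hello world");
console.log(base64urlNoPad(bytes)); // SGVsbG8gd29ybGQ
```

Because bits are taken from the front, the output for "Hello world" is exactly the `btoa` result with the trailing `=` removed, which is the equality the issue expects.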

Below is further elaboration on this that I received from the Named Information designated expert:

> Normal base64url encoding (RFC 4648) splits a sequence of 8-bit bytes into 6-bit segments from the start (most-significant bit of the first byte).
> CID’s base-64-url-no-pad splits a sequence of 8-bit bytes into 6-bit segments from the end (least-significant bit of the last byte).
>
> Consider the single byte: 11111111 = 0xFF
> Base64url splits it from the start, then appends four 0’s to fill the last segment: 111111 | 110000 = `_w`
> Base-64-url-no-pad splits it from the end, then assumes leading 0’s in the first segment: 000011 | 111111 = `D_`
> And the handling of leading 0x00 bytes is also different. Consider 0x00 0x00 0x00.
> Base64url splits 3 8-bit bytes into 4 6-bit chars (3\*8 = 24 = 4\*6): AAAA
> Base-64-url-no-pad replaces each of the 3 leading 0x00 bytes with 1 char: AAA
>
> ASIDE: The encoding algorithm is defined in terms of `log(256) / log(targetBase) * length + 1`. It mixes floating-point and integer arithmetic without enough care. There is a step (5) to “skip leading zeros in the base-encoded result”, which may correct any ambiguity from the floating-point step.
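
The expert's examples can be reproduced with a short sketch of a big-integer ("base-x" style) encoder. It treats the whole input as one number, peels off 6-bit digits from the least-significant end, and maps each leading 0x00 byte to one zero character. This is an assumption about how the referenced algorithm behaves, reconstructed from the description above rather than copied from the specification, and `bigIntBase64` is a hypothetical name. It uses `BigInt` throughout, so no floating-point size estimate is needed. The RFC 4648 comparison uses Node.js's built-in `Buffer` `base64url` encoding, which omits padding.

```javascript
// Hypothetical sketch of the big-integer ("reverse-direction") encoding
// described above; not the specification's text. Requires Node.js (Buffer).
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

function bigIntBase64(bytes) {
  // Each leading 0x00 byte becomes one zero digit ('A').
  let zeros = 0;
  while (zeros < bytes.length && bytes[zeros] === 0) zeros++;

  // Interpret the remaining bytes as one big-endian integer.
  let n = 0n;
  for (const byte of bytes.subarray(zeros)) n = (n << 8n) | BigInt(byte);

  // Emit 6-bit digits from the least-significant end, integer-only.
  let digits = "";
  while (n > 0n) {
    digits = ALPHABET[Number(n & 63n)] + digits;
    n >>= 6n;
  }
  return ALPHABET[0].repeat(zeros) + digits;
}

for (const input of [[0xff], [0x00, 0x00, 0x00], [0x01, 0x02, 0x03]]) {
  const bytes = Uint8Array.from(input);
  console.log(
    bigIntBase64(bytes),                      // reverse-direction result
    Buffer.from(bytes).toString("base64url"), // RFC 4648 base64url, no padding
  );
}
// D_ _w
// AAA AAAA
// QID AQID
```

The last input shows a 3-byte value that still disagrees: when the first 6 bits are all zero, the big-integer form drops that digit, while RFC 4648 keeps it.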

If this discrepancy is an error, the best solution seems to me to be simply referring to RFC 4648's base64url encoding without padding as the definition of `u`.
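
Under that proposal, a `u` Multibase string could be produced and parsed with existing RFC 4648 implementations. A minimal sketch, assuming Node.js's `Buffer` `base64url` encoding (which omits padding on output and accepts unpadded input); `encodeMultibaseU` and `decodeMultibaseU` are hypothetical helper names.

```javascript
// Hypothetical helpers: 'u' prefix followed by RFC 4648 base64url, no padding.
// Requires Node.js for the global Buffer.
function encodeMultibaseU(bytes) {
  return "u" + Buffer.from(bytes).toString("base64url");
}

function decodeMultibaseU(str) {
  if (!str.startsWith("u")) throw new Error("not a 'u' multibase string");
  return new Uint8Array(Buffer.from(str.slice(1), "base64url"));
}

const encoded = encodeMultibaseU(new TextEncoder().encode("Hello world"));
console.log(encoded); // uSGVsbG8gd29ybGQ
console.log(new TextDecoder().decode(decodeMultibaseU(encoded))); // Hello world
```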

base-64-url-no-pad inconsistent with RFC 4648 and previous use #158