💡 Discuss the RFC here.

RFC 001: Core Architecture

In the context of this document, localization (L10n) oftentimes implicitly includes internationalization (i18n). A glossary can be found at the end of this document.

Inlang's first goal is to make software localization easier for every stakeholder (developers, translators, product managers). We believe that localization is underutilized and that, if the effort to localize is low enough, the majority of projects and organizations will localize.

Inlang's second goal might become the provision of infrastructure to build apps with built-in version control and automation. The first app to be built on such infrastructure would be inlang itself.

Background

Localization of software requires too much effort. Basic tools for developers are missing, translators face Excel spreadsheets, existing solutions are too complicated, and organizations struggle to implement localization. In conclusion: Nobody involved with localization is truly satisfied.

What started with a proof of concept to solve @samuelstroschein's frustrations turned into a year-long research project: "What solution(s) are required to make localization simple across all stakeholders?". Hundreds of interviews and multiple proofs of concept later, a pattern emerged: Software and translations are stored in git. Yet, most solutions do not acknowledge and embrace that fact. Developers manage translations in git while translators manage translations with Excel or CRUD UIs on top of databases. Workflows of developers and translators are decoupled and siloed.

This RFC proposes a localization system that acknowledges git as the single source of truth, utilizes git for automation, and extends git to close collaboration gaps between developers and translators.

In short, a git-based localization system for developers and translators with a high degree of composability and automation.

<br/> <figure> <img src="./assets/001-git-based-architecture.png" alt="Git-based architecture"> <figcaption> <small> A git-based localization system enables seamless collaboration between developers and translators with endless automation possibilities. </small> </figcaption> </figure>

Scope of this RFC

Goals

  • Define components of which inlang will eventually consist.
  • Define a core architecture that can be shared among those components.
  • Focus on the web platform but keep other platforms (Flutter, iOS, Android) in mind.

Non-goals

  • Define the detailed architecture of individual components.

Components

The following components are exposed to users when localizing software: A syntax to express human languages, dev tools, an i18n library, automation (CI/CD), and a translation editor (also called CAT).

A user can either be a translator or developer.

flowchart BT
    Syntax[Human Language Syntax]
    Library
    DT[Dev Tools]
    Automation
    Editor


example =
    {$userName} {$photoCount ->
        [one] added a new photo
       *[other] added {$photoCount} new photos
    } to {$userGender ->
        [male] his stream
        [female] her stream
       *[other] their stream
    }.

$userName = Anne
$photoCount = 3
$userGender = female

Anne added 3 new photos to her stream.
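
The selection logic of the example message above can be mimicked in plain JavaScript. This is only a sketch of what a formatter does with such a message, not how Fluent is implemented: the function name `formatExample` is hypothetical and the message is hard-coded rather than parsed. Plural categories come from the standard `Intl.PluralRules` API.

```javascript
// Hypothetical stand-in for a message formatter. It hard-codes the
// "example" message instead of parsing the syntax shown above.
function formatExample({ userName, photoCount, userGender }) {
  // Intl.PluralRules maps a number to a plural category such as "one" or "other".
  const category = new Intl.PluralRules("en").select(photoCount);
  const photos =
    category === "one" ? "added a new photo" : `added ${photoCount} new photos`;

  // Gender selector; anything unknown falls back to the default variant
  // (the one marked with * in the message).
  const streams = { male: "his stream", female: "her stream" };
  const stream = Object.hasOwn(streams, userGender)
    ? streams[userGender]
    : "their stream";

  return `${userName} ${photos} to ${stream}.`;
}

console.log(
  formatExample({ userName: "Anne", photoCount: 3, userGender: "female" })
);
// prints "Anne added 3 new photos to her stream."
```

The translator only edits the variants; the selection itself stays with the formatter, which is why the syntax matters to both developers and translators.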

Resources -> Retrieving a message -> Output

Observations

  1. A variety of good, widely adopted open source libraries exist [1, 2, 3, 4, and more]. Each serves a different programming language, framework, niche, or feature.

  2. The internals are identical: Resource -> Reference and Format a Message -> Output
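
The shared internals can be sketched in a few lines of JavaScript. The resource shape, the `{name}` placeholder style, and the function name `t` are assumptions made for illustration and do not mirror any particular library.

```javascript
// Resource: a collection of messages keyed by id (shape is an assumption).
const resources = {
  en: { welcome: "Welcome, {name}!" },
  de: { welcome: "Willkommen, {name}!" },
};

// Reference and format: look the message up by id, then fill in placeholders.
function t(language, id, args = {}) {
  const message = resources[language]?.[id];
  if (message === undefined) {
    // Falling back to the id keeps the UI usable when a message is missing.
    return id;
  }
  return message.replace(/\{(\w+)\}/g, (match, name) =>
    name in args ? String(args[name]) : match
  );
}

// Output: the formatted string that ends up in the UI.
console.log(t("de", "welcome", { name: "Anne" })); // prints "Willkommen, Anne!"
```

Libraries differ in how resources are loaded and which syntax the messages use, but the three steps stay the same.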

Decision

Leverage existing i18n libraries instead of forcing a specific inlang library.

Pros
  • Already localized codebases can easily adopt inlang.

    Localized codebases use i18n libraries or custom-built solutions. By being i18n library agnostic, those codebases are not required to migrate to a library provided by inlang. Instead, inlang complements the existing library or solution in use.

  • Different libraries have different design goals and trade-offs.

    One library to rule them all is unlikely to be feasible. Applications have different requirements. For example, different rendering techniques alone (client side vs server side) lead to different localization requirements. Furthermore, some languages and frameworks such as Apple's Swift language support localization out of the box, making the requirement for a (basic) i18n library obsolete. Supporting native localization features instead of forcing the use of an i18n library is arguably better. Likewise, supporting the localization approach best suited to a project, i.e. different i18n libraries, is better than forcing a specific but less suitable i18n library onto developers.

Cons
  • The user experience could potentially be better with a dedicated library.

Considered alternative

Develop (and require) a dedicated inlang library.

Pros
  • Potentially better user experience by delivering an "end to end" solution.
Cons
  • Adoption is severely limited.
  • Different libraries exist because a "one size fits all" solution likely does not work.

Dev tools

Using a library does not relieve developers from two extremely time-consuming and ever-repeating tasks:

  1. Extracting hard-coded strings.

    Developers have to manually copy & paste hardcoded strings into resources. That process can take weeks.

  2. Validating messages.

    No widely adopted tools exist to validate messages: Does the German message exist? Is the French message correct? Does the UI work as expected for Arabic?

Furthermore, the developer experience (DX) of localizing software in general leaves room for improvement. The pseudocode below illustrates some problems:
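
As a minimal sketch of such problems, assuming a hypothetical lookup function `t`, a made-up message id `continue-hint`, and in-memory resources:

```javascript
// Hypothetical resources; the id "continue-hint" is made up for illustration.
const resources = {
  en: { "continue-hint": "Click the button to continue." },
  de: {}, // Problem 1: the German message is missing and nothing reports it.
};

function t(language, id) {
  // Problem 2: a typo in the id or a missing message only surfaces at runtime.
  return resources[language][id] ?? `[missing: ${id}]`;
}

// Problem 3: the hard-coded string had to be moved into the resource by hand
// and replaced with an id that says little about the text it stands for.
console.log(t("en", "continue-hint")); // prints "Click the button to continue."
console.log(t("de", "continue-hint")); // prints "[missing: continue-hint]"
```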

At this point you might be wondering why the text "Click the button to continue." itself is not used as key/id. The reason is that changing the text would break the connection to its translations or invalidate them. The Fluent project explained their rationale of using ids here.

Observations

  • Dev tools beyond i18n libraries are required to make localization effortless.
  • A CLI enables developers to build custom CI/CD pipelines.
  • Platform-specific tooling like GitHub Actions can be built on top of dev tools (the CLI).
  • An IDE extension speeds up development.

Decision

Develop a CLI and VSCode extension to extract and validate resources and messages.

<br/> <figure> <img src="./assets/001-ide-extension.gif" alt="Localization IDE extension"> <figcaption> <small> An IDE extension speeds up development by providing message extraction, linting, and more. </small> </figcaption> </figure>

Editor

<br/> <figure> <img src="./assets/001-editor.png" alt="Proof of concept translation editor CAT"> <figcaption> <small> Early iteration of the inlang editor from December 2021. </small> </figcaption> </figure>

Translators need a dedicated editor to manage translations. Those editors exist and are called CAT (Computer Assisted Translation) editors. There are two types of editors:

  1. Local single-user editors such as MemoQ.
  2. Cloud-based editors like Lokalise, Smartling, or Transifex.

Single-user editors are being displaced by cloud-based editors for simple string localization such as software localization. Collaboration is easier with a cloud-based solution. However, cloud-based editors add complexity by requiring continuous synchronization between the source code (git repository) and the cloud, thereby breaking the single source of truth contract.

Observations

  • The cloud is overhead. Git repositories are built for collaboration and store translations.

  • Git (including GitHub and GitLab) provides version control, collaboration, and an awesome review system. A CAT editor requires all of these, and git provides them essentially for free.

Proposal

A git-based editor that combines the collaboration of cloud-based solutions with the simplicity of a local-first solution. Think of a combination of Figma and VSCode: VSCode brings out-of-the-box git and local file support while Figma brings ease of use to the table by running in the browser. A working proof of concept can be found here.

Pros
  • Leverage git workflows and features.
  • Easy adoption (just like vscode.dev).
  • Not requiring a server: Synchronization pipelines are not required.
  • Not requiring a server: No lock-in. Translations are not stored and therefore not owned by inlang (enables CI/CD).
Cons
  • Engineering effort: Git is not meant to have applications built on top of it or to be run in the browser.
  • Design effort: Git is difficult to understand. Abstractions for translators need to be designed.

Automation (CI/CD)

The hand-off between developers and translators requires automation. The status quo is either no automation or limited automation via cloud-based solutions.

Observations

  • Every (software) company already has an automation solution in the form of CI/CD.
  • Since localization is tightly coupled with software development, existing CI/CD infrastructure can be used instead of building out (and forcing) another automation layer.

Proposal

Leverage existing CI/CD infrastructure that is built on top of git, like CircleCI, GitHub Actions, and GitLab.

Pros
  • Easier adoption of inlang as a suite by not forcing the adoption of a dedicated automation layer.
  • The dev tools can be leveraged for automation.
Cons
  • No GUI (graphical user interface), but also no GUI-imposed limit on expressiveness.
  • Relying on external CI/CD infrastructure.

Architecture

The following describes a core architecture that is designed to support the components defined above while sharing code and business logic. Interestingly, only two questions need to be elaborated, and both go hand-in-hand:

  1. How are messages stored?
  2. How are components configured to work hand-in-hand?

1. How are messages stored?

The common pattern across larger projects is to store messages in dedicated resource files. The pattern is easy to implement: Only the paths of resource files need to be known. A string-based config is sufficient to implement read and write operations by parsing and serializing the resource files.

Dedicated resource files pattern:

Exemplary inlang config:
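
A minimal sketch of what such a setup could look like, assuming one JSON file per language. The `pathPattern` key, the `{languageCode}` placeholder, and the `translations/` directory are hypothetical and not confirmed inlang options.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical string-based config: only the location of the resource files is known.
const config = { pathPattern: "translations/{languageCode}.json" };

function resourcePath(rootDir, languageCode) {
  return path.join(
    rootDir,
    config.pathPattern.replace("{languageCode}", languageCode)
  );
}

// Read = load the file and parse it.
function readResource(rootDir, languageCode) {
  return JSON.parse(fs.readFileSync(resourcePath(rootDir, languageCode), "utf8"));
}

// Write = serialize the messages and save the file.
function writeResource(rootDir, languageCode, messages) {
  const file = resourcePath(rootDir, languageCode);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(messages, null, 2) + "\n");
}

// Usage against a throwaway directory.
const rootDir = fs.mkdtempSync(path.join(os.tmpdir(), "inlang-sketch-"));
writeResource(rootDir, "de", { greeting: "Hallo" });
console.log(readResource(rootDir, "de"));
```

The sketch only works because the file format (JSON) and the directory layout are fixed, which is exactly the opinion the next paragraphs question.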

However, smaller projects store messages directly in code [1, 2, 3]. The motivation behind storing messages in code is the ease of implementation and use.

Proposal

Be unopinionated about where and in which format resources are stored, thereby easing adoption. Being unopinionated directly leads to the configuration question: "How can inlang be unopinionated while requiring reading and writing to resources?".

2. How is inlang configured?

Using JavaScript as a configuration format would allow unopinionated storing of resources. The read and write problem of resources could be solved by exposing readResources and writeResources as callbacks in a config file. Developers are empowered to adjust those functions, and more, to their needs. Furthermore, JavaScript as config solves two common config file annoyances. First, comments are supported. Second, type annotations via JSDoc/TypeScript enable autocomplete and type safety.

Flowchart of JS as config

flowchart BT

    subgraph Components
      Library
      DT[Dev Tools]
      CICD[CI/CD]
      Editor
    end

    Config <--> |read & write callbacks| Resources
    Config <--> |uses config functions as business logic| Components

Pseudocode inlang config
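
A sketch of what such a config could look like. Only the names `readResources` and `writeResources` come from this RFC; their signatures, the injected `fs` argument, and the `./translations/` layout are assumptions. In a real `inlang.config.js` the `config` object would be exported instead of used in place.

```javascript
// Which languages exist and where their files live is decided by the project.
const languageCodes = ["en", "de"];
const fileFor = (code) => `./translations/${code}.json`;

const config = {
  /**
   * @param {{ fs: { readFile(path: string): string } }} args
   * @returns {Record<string, Record<string, string>>} language code -> id -> message
   */
  readResources({ fs }) {
    return Object.fromEntries(
      languageCodes.map((code) => [code, JSON.parse(fs.readFile(fileFor(code)))])
    );
  },

  /**
   * @param {{ fs: { writeFile(path: string, data: string): void },
   *           resources: Record<string, Record<string, string>> }} args
   */
  writeResources({ fs, resources }) {
    for (const [code, messages] of Object.entries(resources)) {
      fs.writeFile(fileFor(code), JSON.stringify(messages, null, 2));
    }
  },
};

// An in-memory file system stands in for whatever a component would pass in.
const files = new Map([
  ["./translations/en.json", '{"greeting":"Hello"}'],
  ["./translations/de.json", '{"greeting":"Hallo"}'],
]);
const memoryFs = {
  readFile: (file) => files.get(file),
  writeFile: (file, data) => void files.set(file, data),
};

const resources = config.readResources({ fs: memoryFs });
resources.de.farewell = "Auf Wiedersehen";
config.writeResources({ fs: memoryFs, resources });
console.log(files.get("./translations/de.json"));
```

Passing the file system in, rather than importing it inside the config, is one way the same config could run in the browser-based editor and in a CLI; whether inlang takes that route is left open here.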

One (the?) drawback of JS as config is security. The JS config could contain malicious code that would be executed by inlang components. An example exploit: an attacker could steal user authentication information by writing malicious code in the config that reads authentication information from the editor. JS as config would therefore require some degree of sandboxing to close this class of vulnerability.

Proposal

Leveraging JavaScript, or any programming language, allows for tremendous flexibility and therefore unopinionated workflows. Flexibility is required: Codebases differ, approaches to localize software differ, syntaxes differ, and last but not least workflows differ. JavaScript as config could even be used to adjust the business logic of components:
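
A sketch of the idea, not a proposed API: a component ships a default rule and prefers a function from the config when one is defined. The names `isMissing`, `defaultIsMissing`, and `findMissing`, as well as the `draft-` id convention, are hypothetical.

```javascript
// Hypothetical default behaviour of a component: flag every missing message.
function defaultIsMissing(message) {
  return message === undefined || message === "";
}

// Hypothetical config: the project overrides the check so that ids starting
// with "draft-" are ignored until they are ready for translation.
const config = {
  isMissing: (message, id) =>
    !id.startsWith("draft-") && defaultIsMissing(message),
};

// A component uses the config function when present, else its default.
function findMissing(messages, ids, config = {}) {
  const isMissing = config.isMissing ?? defaultIsMissing;
  return ids.filter((id) => isMissing(messages[id], id));
}

const german = { greeting: "Hallo" };
const ids = ["greeting", "farewell", "draft-banner"];
console.log(findMissing(german, ids)); // [ 'farewell', 'draft-banner' ]
console.log(findMissing(german, ids, config)); // [ 'farewell' ]
```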

Conclusion

The common denominator across all proposed components is a JavaScript [instead of JSON/YAML/TOML] config and an AST. Hence, the core architecture is a config and AST package that is consumed by every other component of inlang.

The JS config solves the storage [of resources] and different syntaxes [to express human languages] problems. Developers can define how resources are read, parsed, serialized, and written to the filesystem. Furthermore, developers are empowered to adjust the business logic of inlang components to the needs of the project. Due to the complexity of sandboxing JS, the detailed design of the config will follow in RFC-003.

The config itself requires an AST specification that developers parse resources into and serialize resources from, and that all inlang components consume. TODO: THE CHOICE HAS NOT BEEN MADE YET.

Dependency graph

The ast and config packages are the core components that power inlang. The ast is used to act on resources, whether through CRUD operations or validation. The config package defines a schema (functions) that the inlang.config.js file implements. These functions define how resources are read, parsed to an AST, serialized from an AST, and written to the file system, hence the dependency of config on ast. Furthermore, config provides sandboxing to import and securely execute the implemented code.

flowchart LR
    Applications
    subgraph Packages
        subgraph Hosts
                GitHub
                Azure
                GitLab
                Other[etc...]
        end
        subgraph "git-sdk"
            git[git-sdk/api]
            Host[git-sdk/host]
        end
        subgraph core
            validation[core/validation]
            ast[core/ast]
            query[core/query]
            config[core/config]
        end
        Hosts-.-Host
    end
    subgraph External[inlang.config.js]
        direction LR
        Parsing[Read & write / AST]
        Workflow[Business logic]
        Other2[And more...]
    end
    Community[Community code]-->|can be imported|External
    config-->External
    External-->Applications
    Host-->Applications
    validation-->Applications
    git-->Applications
    ast-->config
    ast-->validation
    ast-->query
    query-->Applications

git-sdk

The SDK to build applications on top of git.

git-sdk/api

Git will be run in JS environments (browser and Node.js). isomorphic-git provides a base version that is likely fast enough. If not, a switch to a wasm version of libgit2 should be possible. In any case, git is a CLI and not an SDK. We expect to extend git substantially as the requirements evolve.

git-sdk/host

Git (hosting) providers (hosts) add features on top of git like pull or merge requests and handle authorization differently. The git-sdk/host provides one API that deals with the API differences between hosting providers.
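
One way to picture "one API over many hosts" is an adapter per host behind a shared function. The sketch below makes no network calls and only describes the request each host would need; `proposeChange`, the adapter objects, and the field names are illustrative assumptions, not the planned git-sdk/host API.

```javascript
// Hypothetical adapters: same method name, host-specific request description.
const hosts = {
  github: {
    // GitHub calls the feature a "pull request".
    proposeChange: ({ repo, branch, title }) => ({
      host: "github",
      kind: "pull request",
      repo,
      head: branch,
      title,
    }),
  },
  gitlab: {
    // GitLab calls the same feature a "merge request".
    proposeChange: ({ repo, branch, title }) => ({
      host: "gitlab",
      kind: "merge request",
      repo,
      source_branch: branch,
      title,
    }),
  },
};

// The single entry point an application would call, regardless of host.
function proposeChange(hostName, change) {
  const host = hosts[hostName];
  if (!host) throw new Error(`Unsupported host: ${hostName}`);
  return host.proposeChange(change);
}

const change = { repo: "acme/app", branch: "add-german", title: "Add German" };
console.log(proposeChange("github", change).kind); // prints "pull request"
console.log(proposeChange("gitlab", change).kind); // prints "merge request"
```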

core

The core components and packages that depend on the AST.

core/ast

Defines the AST (abstract syntax tree) that every component, and hence inlang overall, builds upon.
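
Since the AST has not been chosen yet, the following is only an illustration of the parse and serialize round trip that the config functions would perform. The node shapes (`Resource`, `Message`, `pattern`) and the function names are hypothetical, and JSON stands in for whatever format a project uses.

```javascript
// Parse: turn a JSON resource file into a (hypothetical) AST.
function parseResource(languageCode, json) {
  const body = Object.entries(JSON.parse(json)).map(([id, value]) => ({
    type: "Message",
    id,
    pattern: value,
  }));
  return { type: "Resource", languageCode, body };
}

// Serialize: turn the AST back into the file format it came from.
function serializeResource(resource) {
  const entries = resource.body.map((message) => [message.id, message.pattern]);
  return JSON.stringify(Object.fromEntries(entries));
}

const source = '{"greeting":"Hello","farewell":"Bye"}';
const ast = parseResource("en", source);
console.log(ast.body.length); // prints 2
console.log(serializeResource(ast) === source); // prints true
```

Components would only ever see the AST, so supporting a new file format means writing one parse and one serialize function.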

core/validation

Validates resources and messages based on the AST and can be used to further build on top.
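
A validation rule of this kind could look as follows. The sketch assumes a hypothetical AST shape (a resource with a `languageCode` and a `body` of messages with an `id`) and a hypothetical function name `missingMessages`; it answers the question "does the German message exist?".

```javascript
// Returns one finding per message that exists in the reference resource
// but not in the target resource.
function missingMessages(reference, target) {
  const present = new Set(target.body.map((message) => message.id));
  return reference.body
    .filter((message) => !present.has(message.id))
    .map((message) => ({
      id: message.id,
      languageCode: target.languageCode,
      reason: `Message "${message.id}" is missing in "${target.languageCode}".`,
    }));
}

const en = {
  type: "Resource",
  languageCode: "en",
  body: [
    { type: "Message", id: "greeting", pattern: "Hello" },
    { type: "Message", id: "farewell", pattern: "Bye" },
  ],
};
const de = {
  type: "Resource",
  languageCode: "de",
  body: [{ type: "Message", id: "greeting", pattern: "Hallo" }],
};

console.log(missingMessages(en, de).map((finding) => finding.reason));
// prints [ 'Message "farewell" is missing in "de".' ]
```

Because the rule only depends on the AST, the same function could back a CLI check in CI/CD, a lint hint in the IDE extension, and a warning in the editor.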

core/config

Defines the config schema(s) and provides types and utility functions to create a config.

Community code

The JS inlang config file is supposed to be able to import external code. By providing import functionality, read, write and business logic functions can be shared.


Glossary

Inlang

Inlang is the project and product name. Inlang stands for "in your lang(uage)".

Component

A component in the context of this RFC is either an application or a package. An application is usually broken down into multiple packages.

Locale

Locale refers to a language or country. A more suitable name would probably be demographic. Think of speaking different languages (German, English, Dutch) and/or living in different locations (Germany, US, UK, India).

Internationalization (i18n)

Internationalization refers to the engineering effort of ensuring that an app can be localized (display content in different languages) and behaves correctly in different locales.

i18n = developers' work.

Localization (L10n)

Localization is the act of translating a piece of software into multiple languages, including its assets (images, videos, etc.).

L10n = translators' work.

Message

The basic unit of translation is called a message. Messages are containers for information. Messages are used to identify, store, and recall translation information to be used in the product [source].

Resource

A collection of messages. Think of a file or object containing multiple messages.
