ecode/docs/customlanguages.md at main · SpartanJ/ecode

Custom languages support

Custom languages support can be added in the languages directory found at:

Linux: uses XDG_CONFIG_HOME, usually translates to ~/.config/ecode/languages
macOS: uses Application Support folder in HOME, usually translates to ~/Library/Application Support/ecode/languages
Windows: uses APPDATA, usually translates to C:\Users\{username}\AppData\Roaming\ecode\languages

ecode will read each file located at that directory with json extension. Each file can contain one or several language definitions.

Single Language: If the root element of the JSON file is an object, it defines a single language.
Multiple Languages / Sub-Grammars: If the root element is an array, it can define multiple independent languages or a main language along with sub-language definitions used for nesting within the main language (see Nested Syntaxes below). Each object in the array must be a complete language definition with at least a unique "name".

Language definitions can override any currently supported definition. ecode will prioritize user defined definitions. Sub-language definitions used only for nesting might not need fields like "files" or "headers" if they aren't intended to be selectable top-level languages.

Language definition format

{
	"name": "language_name",            // (Required) The display name of the language. Must be unique, especially if referenced by other definitions for nesting or includes.
	"files": [                          // (Required if `visible` is `true`) An array of Lua patterns matching filenames for this language.
		"%.ext$",                       // Example: Matches files ending in .ext
		"^Makefile$"                    // Example: Matches the exact filename Makefile
	],
	"comment": "//",                    // (Optional) Sets the single-line comment string used for auto-comment functionality.
	"patterns": [                       // (Required) An array defining syntax highlighting rules. See "Pattern Rule Types" below for details.
		// ... pattern rules defined here ...
	],
	"repository": {                     // (Optional) A collection of named pattern sets for reuse within this language definition.
	                                    // Keys are repository item names (e.g., "comments", "strings", "expressions").
	                                    // Values are arrays of pattern rules, following the same format as the main "patterns" array.
	                                    // These items can be referenced in any "patterns" array using an "include" rule (e.g., { "include": "#comments" }).
		"my_reusable_rules": [
			{ "pattern": "foo", "type": "keyword" },
			{ "pattern": ["bar_start", "bar_end"], "type": "string" }
		]
	},
	"symbols": [                        // (Optional) An array defining specific types for exact words, primarily used in conjunction with patterns having `type: "symbol"`.
		// Structure: An array where each element is an object containing exactly one key-value pair.
		//   - The key is the literal word (symbol) to match.
		//   - The value is the `type_name` to apply when that word is matched via a `type: "symbol"` pattern rule.

		// How it works:
		// When a pattern rule results in a `type: "symbol"` match (either for the whole pattern or a capture group),
		// the actual text matched by that pattern/group is looked up within this `symbols` array.
		// The editor iterates through the array, checking if the key of any object matches the text.
		// If a match is found (e.g., the text is "if" and an object `{ "if": "keyword" }` exists),
		// the corresponding value ("keyword") is used for highlighting.
		// If the matched text is not found as a key in any object within this array, the highlighting typically falls back to the "normal" type.
		// This mechanism is essential for highlighting keywords, built-in constants/literals, and other reserved words
		// that might otherwise be matched by more general patterns (like a pattern for all words).

		// Example:
		{ "if": "keyword" },            // If a `type: "symbol"` pattern matches "if", it will be highlighted as "keyword".
		{ "else": "keyword" },
		{ "true": "literal" },          // If a `type: "symbol"` pattern matches "true", it will be highlighted as "literal".
		{ "false": "literal" },
		{ "MyClass": "type" },      // If a `type: "symbol"` pattern matches "MyClass", it will be highlighted as "type".
		{ "begin": "keyword" },
		{ "end": "keyword" }
		// ... add other specific words and their types as needed ...
	],
	"headers": [                        // (Optional) Array of Lua Patterns to identify file type by reading the first few lines (header).
		"^#!.*[ /]bash"                 // Example: Identifies bash scripts like '#!/bin/bash'
	],
	"visible": true,                    // (Optional) If true (default), language appears in main selection menus. Set to false for internal/helper languages.
	"case_insensitive": false,          // (Optional) If true, pattern matching ignores case. Default is false (case-sensitive).
	"auto_close_xml_tag": false,        // (Optional) If true, enables auto-closing of XML/HTML tags (e.g., typing `<div>` automatically adds `</div>`). Default is false.
	"extension_priority": false,        // (Optional) If true, this language definition takes priority if multiple languages define the same file extension. Default is false.
	"lsp_name": "language_server_name", // (Optional) Specifies the name recognized by Language Servers (LSP). Defaults to the 'name' field in lowercase if omitted.

	"fold_range_type": "braces",        // (Optional) Specifies the strategy used to detect foldable code regions. Default behavior if omitted might be no folding or a global default. Possible values:
	                                    //   - "braces": Folding is determined by matching pairs of characters (defined in `fold_braces`). Suitable for languages like C, C++, Java, JavaScript, JSON.
	                                    //   - "indentation": Folding is determined by changes in indentation level. Suitable for languages like Python, YAML, Nim.
	                                    //   - "tag": Folding is based on matching HTML/XML tags (e.g., `<div>...</div>`). Suitable for HTML, XML, SVG.
	                                    //   - "markdown": Folding is based on Markdown header levels (e.g., `## Section Title`). Suitable for Markdown.
	"fold_braces": [                    // (Required *only if* `fold_range_type` is "braces") Defines the pairs of characters used for brace-based folding.
	                                    // This is an array of objects, where each object specifies a starting and ending character pair.
		{ "start": "{", "end": "}" },   // Example: Standard curly braces
		{ "start": "[", "end": "]" },   // Example: Square brackets
		{ "start": "(", "end": ")" }    // Example: Parentheses
	]
}

Pattern Rule Types

The "patterns" array is the core of syntax highlighting. It contains an ordered list of rules that ecode attempts to match against the text. Each rule is an object. Here are the types of rules you can define:

"patterns": [
	// --- Simple Rules ---
	// Rule using Lua patterns:
	{ "pattern": "lua_pattern", "type": "type_name" },
	// Rule using Lua patterns with capture groups mapping to different types:
	{ "pattern": "no_capture(pattern_capture_1)(pattern_capture_2)", "type": [ "no_capture_type_name", "capture_1_type_name", "capture_2_type_name" ] },
	// Rule using Perl-compatible regular expressions (PCRE):
	{ "regex": "perl_regex", "type": "type_name" },
	// Rule using PCRE with capture groups mapping to different types:
	{ "regex": "no_capture(pattern_capture_1)(pattern_capture_2)", "type": [ "no_capture_type_name", "capture_1_type_name", "capture_2_type_name" ] },

	// --- Multi-line Block Rules (Lua Patterns or PCRE) ---
	// Defines a block spanning multiple lines using start/end patterns and an optional escape character.
	// These rules can use either "pattern": ["start", "end", "escape?"] for Lua patterns
	// or "regex": ["start_regex", "end_regex", "escape_char?"] for PCRE.
	// They support the same "type" and "end_type" combinations for styling delimiters as detailed previously.
	//
	// Basic usage (same type for start and end delimiters):
	{ "pattern": ["lua_pattern_start", "lua_pattern_end", "escape_character"], "type": "type_name" },
	// Using different types for start and end delimiters:
	{ "regex": ["regex_start", "regex_end"], "type": "start_type_name", "end_type": "end_type_name" },
	// Using capture groups with different types for start and end delimiters:
	{ "pattern": ["start_nocap(scap1)", "end_nocap(ecap1)(ecap2)", "escape"], "type": ["start_nocap_type", "start_cap1_type"], "end_type": ["end_nocap_type", "end_cap1_type", "end_cap2_type"] },

	// --- Contextual Patterns within Blocks (Inner Patterns) ---
	// Multi-line block rules can define their own "patterns" array to apply specific rules
	// to the content *between* their start and end delimiters. This allows for more granular
	// highlighting within a block without needing to define a full sub-language via the "syntax" key.
	{
		"regex": ["<section>", "</section>"], // Defines the block
		"type": "keyword",                     // Type for "<section>" delimiter
		"end_type": "keyword",                 // Type for "</section>" delimiter
		"patterns": [                          // (Optional) Inner patterns for content *inside* <section>...</section>
			{ "regex": "highlight_this_inside", "type": "function" },
			{ "include": "#common_section_rules" } // Can also include repository items
			// These inner patterns are matched only against the text between "<section>" and "</section>".
			// Inner patterns can themselves be block rules with their own inner patterns, allowing for nested contextual highlighting.
		]
	},
	// Note: If a block rule includes both inner "patterns" and a "syntax" key (for nested languages),
	// the "syntax" key typically takes precedence, causing the content to be highlighted by the
	// specified sub-language. Inner "patterns" are primarily for applying rules from the
	// *current* language's context or ad-hoc rules specifically to the content of this block.

	// --- Custom Parser Rule ---
	// Rule using a custom parser implemented in native code (as previously described):
	{ "parser": "custom_parser_name", "type": "type_name" },

	// --- Symbol Lookup Rule ---
	// Rule assigning the "symbol" type, for lookup in the language's "symbols" definition (as previously described):
	{ "pattern": "[%a_][%w_]*", "type": "symbol" },
	// Rule using "symbol" within capture groups (as previously described):
	{ "pattern": "(%s%-%a[%w_%-]*%s+)(%a[%a%-_:=]+)", "type": [ "normal", "function", "symbol" ] },

	// --- Include Rules ---
	// Allows reusing sets of patterns defined elsewhere in the grammar.
	// This helps in organizing complex grammars and avoiding repetition.
	{
		"include": "#repository_item_name" // Includes rules from the 'repository_item_name' entry
		                                   // in the top-level "repository" object of this language definition.
		                                   // The '#' prefix is mandatory for repository items.
	},
	{
		"include": "$self"                 // Includes all rules from the main top-level "patterns" array
		                                   // of the *current* language definition. This is useful for
		                                   // recursive definitions, such as nested expressions or blocks.
	},
	// Example using "include" with a "repository":
	// "repository": {
	//   "comments_and_strings": [
	//     { "pattern": "//.*", "type": "comment" },
	//     { "pattern": ["\"", "\"", "\\\\"], "type": "string" }
	//   ]
	// },
	// "patterns": [
	//   { "include": "#comments_and_strings" },
	//   // ... other rules ...
	// ]

	// --- NESTED SYNTAX RULE ---
	// Defines a multi-line block that switches to a DIFFERENT language syntax inside.
	// Uses the same multi-line pattern/regex format and delimiter styling options ("type", "end_type").
	{
		"pattern": ["lua_pattern_start", "lua_pattern_end", "escape_character"], // Can also use "regex"
		"syntax": "NestedLanguageName",                                          // (Required for nesting) The 'name' of another language definition.
		"type": "start_delimiter_type",                                          // (Optional) Type(s) for the START delimiter.
		"end_type": "end_delimiter_type"                                         // (Optional) Type(s) for the END delimiter.
	}
	// This is distinct from "Contextual Patterns within Blocks (Inner Patterns)".
	// The "syntax" key switches highlighting to a completely different, pre-defined language
	// for the content within the delimiters. Inner "patterns", on the other hand, apply a
	// specific set of rules from the *current* language's context or ad-hoc rules to the block's content.
]

Nested Syntaxes (Sub-Grammars)

ecode supports nested syntaxes, allowing a block of code within one language to be highlighted according to the rules of another language. This is crucial for accurately representing modern languages that often embed other languages or domain-specific languages.

How it works:

Define Sub-Languages: Define the syntax for the language to be embedded (e.g., "CSS", "JavaScript", "XML", "SQL") as a separate language definition. Often, these are defined within the same JSON file as the main language, using a JSON array as the root element (see Custom languages support). The sub-language definition needs a unique "name".
Reference in Patterns: In the main language's "patterns", use a multi-line block rule (pattern or regex array). Add the "syntax" key to this rule, setting its value to the "name" of the sub-language definition you want to use inside the block.
Highlighting: When ecode encounters this block, it applies the highlighting rules from the specified sub-language to the content between the start and end delimiters. The delimiters themselves are styled according to the type (and end_type) specified in the outer rule.

Contrast with Inner Patterns: While the syntax key is used to embed an entirely different language within a block, multi-line block rules can also contain their own patterns array (see "Contextual Patterns within Blocks" under Pattern Rule Types). This inner patterns array allows for defining specific highlighting rules for the content within the block using rules from the current language's context or ad-hoc rules. This is useful when a full language switch isn't necessary but more granular control over the block's content highlighting is desired (e.g., highlighting specific keywords differently only within a certain type of block in the parent language).

Example Use Cases for syntax key:

HTML files containing <style> blocks (CSS) and <script> blocks (JavaScript).
Markdown files with fenced code blocks (e.g., python ... ).
Templating languages embedding HTML and code.

Nesting Depth: Syntax nesting (using the syntax key) is supported up to 8 levels deep. The nesting capabilities of inner patterns within block rules depend on the parser's implementation but also allow for complex hierarchical structures.

See the description of the syntax key and inner patterns under the patterns section for the exact rule format.

Type Names

Type names are used to define the color of the tokenized pattern.

These are the currently supported type names:

normal: Used for non-highlighted words. This is the default color.
comment: Code comments color.
keyword: Used for language reserved words (if, else, class, struct, then, enum, etc).
type (previously keyword2): Used for type names.
parameter (previously keyword3): Used for coloring parameters but can be used for anything.
number: Number colors
literal: Word literals colors (usually things like NULL, true, false, undefined, etc). Can be used also for any particular reserved symbol.
string: String colors.
operator: Operators colors (+, -, =, <, >, etc).
function: Function name color.
link: Links color.
link_hover: Hovered Links color.

Porting language definitions

ecode uses the same format for language definition as lite and lite-xl editors for its original features. The newly added features like repository, include, and inner patterns for block rules are inspired by TextMate grammars, extending ecode's capabilities beyond the traditional lite/lite-xl format while maintaining compatibility with the core structure. This makes it easier to add new languages to ecode. There's also a helper tool that can be download from ecode repository located here that allows to directly export a lite language definition to the JSON file format used in ecode.

The advanced features (repository, include, and inner patterns within block rules) detailed in this documentation allow for the creation of complex and accurate syntax highlighting in ecode's native format.

Using TextMate Grammars (`.tmLanguage.json`)

In addition to its native format, ecode offers support for directly loading TextMate grammar files (typically ending with .tmLanguage.json). This can be a convenient way to leverage existing TextMate grammars.

Direct Loading:
- To use a TextMate grammar, simply place the .tmLanguage.json file into your custom languages directory (e.g., ~/.config/ecode/languages). ecode will attempt to load it.
fileTypes Array Requirement:
- For ecode to correctly associate the TextMate grammar with specific file extensions, the grammar file must contain a fileTypes (or filetypes) array at its root level. This array lists the file extensions (without the leading dot) that the grammar should handle.
- Example:
```
// Inside your .tmLanguage.json file
{
  "name": "MyTextMateLanguage",
  "scopeName": "source.mylang",
  "fileTypes": [ // Or "filetypes"
    "mylang_ext1",
    "mylang_ext2"
  ],
  // ... rest of the TextMate grammar ...
}
```
  Without this fileTypes array, ecode will not know which files to apply the grammar to.
Limitations:
- Sub-syntaxes / Embedded Grammars: A key limitation when directly loading TextMate grammars is that sub-syntaxes or embedded languages that are defined by referencing external grammar scopes (e.g., include: 'source.js' where source.js is expected to be another grammar file) are not currently supported. ecode's native mechanism for nested syntaxes (using the "syntax" key to refer to a language "name" defined within the same JSON file or another loaded ecode definition) operates differently. Therefore, complex embeddings common in TextMate grammars might not render correctly when loaded directly.
Conversion to ecode's Native Format:
- To overcome limitations or to fully integrate a TextMate grammar into ecode's feature set (including potentially adapting its sub-syntax definitions), you can convert it to ecode's native JSON format.
- First, ensure the TextMate grammar file (e.g., MyLanguage.tmLanguage.json) is in the languages directory so ecode can load it. Make sure it includes the fileTypes array.
- Then, use the --export-lang CLI argument. The name you use to refer to the language should match the name field within the TextMate grammar (or how ecode lists it after loading).
```
ecode --export-lang=MyTextMateLanguage --export-lang-path=./my_language_ecode_format.json
```
- This command will process the loaded TextMate grammar and output its structure in ecode's native JSON format to my_language_ecode_format.json. You can then further edit this file, adapt its structure (especially for sub-syntaxes if needed), and use it as a standard ecode custom language definition.

This approach allows users to leverage the vast library of existing TextMate grammars, either directly with some limitations, or by converting them into ecode's more feature-rich native format for deeper integration.

Extending language definitions

It's possible to easily extend any language definition by exporting it using the CLI arguments provided: --export-lang and --export-lang-path. A user wanting to extend or improve a language definition can export it, modify it and install the definition with a .json extension in the custom languages path. For example, to extend the language vue you will need to run: ecode --export-lang=vue --export-lang-path=./vue.json, exit the exported file and move it to the custom languages path.

Language definition example

{
  "comment": "#",
  "files": [
    "%.sh$",
    "%.bash$",
    "^%.bashrc$",
    "^%.bash_profile$",
    "^%.profile$",
    "%.zsh$",
    "%.fish$",
    "^PKGBUILD$",
    "%.winlib$"
  ],
  "headers": [
    "^#!.*[ /]bash",
    "^#!.*[ /]sh"
  ],
  "lsp_name": "shellscript",
  "name": "Shell script",
  "patterns": [
    { "pattern": "$[%a_@*#][%w_]*", "type": "type" },
    { "pattern": "#.*\n", "type": "comment" },
    { "pattern": [ "<<%-?%s*EOF", "EOF" ], "type": "string" },
    { "pattern": [ "\"", "\"", "\\" ], "type": "string" },
    { "pattern": [ "'", "'", "\\" ], "type": "string" },
    { "pattern": [ "`", "`", "\\" ], "type": "string" },
    { "pattern": "%f[%w_%.%/]%d[%d%.]*%f[^%w_%.]", "type": "number" },
    { "pattern": "[!<>|&%[%]:=*]", "type": "operator" },
    { "pattern": "%f[%S][%+%-][%w%-_:]+", "type": "function" },
    { "pattern": "%f[%S][%+%-][%w%-_]+%f[=]", "type": "function" },
    { "pattern": "(%s%-%a[%w_%-]*%s+)(%d[%d%.]+)", "type": [ "normal", "function", "number" ] },
    { "pattern": "(%s%-%a[%w_%-]*%s+)(%a[%a%-_:=]+)", "type": [ "normal", "function", "symbol" ] },
    { "pattern": "[_%a][%w_]+%f[%+=]", "type": "type" },
    { "pattern": "${.-}", "type": "type" },
    { "pattern": "$[%d$%a_@*][%w_]*", "type": "type" },
    { "pattern": "[%a_%-][%w_%-]*[%s]*%f[(]", "type": "function" },
    { "pattern": "[%a_][%w_]*", "type": "symbol" }
  ],
  "symbols": [
    { "in": "keyword" },
    { "then": "keyword" },
    { "exit": "keyword" },
    { "alias": "keyword" },
    { "continue": "keyword" },
    { "getopts": "keyword" },
    { "set": "keyword" },
    { "return": "keyword" },
    { "unalias": "keyword" },
    { "pwd": "keyword" },
    { "fi": "keyword" },
    { "printf": "keyword" },
    { "unset": "keyword" },
    { "cd": "keyword" },
    { "echo": "keyword" },
    { "false": "literal" },
    { "help": "keyword" },
    { "for": "keyword" },
    { "test": "keyword" },
    { "mapfile": "keyword" },
    { "shift": "keyword" },
    { "while": "keyword" },
    { "readarray": "keyword" },
    { "eval": "keyword" },
    { "select": "keyword" },
    { "elif": "keyword" },
    { "function": "keyword" },
    { "true": "literal" },
    { "else": "keyword" },
    { "exec": "keyword" },
    { "enable": "keyword" },
    { "local": "keyword" },
    { "jobs": "keyword" },
    { "source": "keyword" },
    { "break": "keyword" },
    { "declare": "keyword" },
    { "history": "keyword" },
    { "case": "keyword" },
    { "until": "keyword" },
    { "if": "keyword" },
    { "esac": "keyword" },
    { "hash": "keyword" },
    { "kill": "keyword" },
    { "time": "keyword" },
    { "let": "keyword" },
    { "export": "keyword" },
    { "do": "keyword" },
    { "done": "keyword" },
    { "read": "keyword" },
    { "type": "keyword" }
  ]
}

For more complex syntax definitions please see the definition of all the native languages supported by ecode here and here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom languages support

Language definition format

Pattern Rule Types

Nested Syntaxes (Sub-Grammars)

Type Names

Porting language definitions

Using TextMate Grammars (`.tmLanguage.json`)

Extending language definitions

Language definition example

FilesExpand file tree

customlanguages.md

Latest commit

History

customlanguages.md

File metadata and controls

Custom languages support

Language definition format

Pattern Rule Types

Nested Syntaxes (Sub-Grammars)

Type Names

Porting language definitions

Using TextMate Grammars (.tmLanguage.json)

Extending language definitions

Language definition example

Using TextMate Grammars (`.tmLanguage.json`)