Textadept
Contents
- lexer
- Overview
- Writing a Lexer
- Fields
- CLASS
- COMMENT
- CONSTANT
- DEFAULT
- ERROR
- FUNCTION
- IDENTIFIER
- KEYWORD
- LABEL
- NUMBER
- OPERATOR
- PREPROCESSOR
- REGEX
- SC_FOLDLEVELBASE
- SC_FOLDLEVELHEADERFLAG
- SC_FOLDLEVELNUMBERMASK
- SC_FOLDLEVELWHITEFLAG
- STRING
- TYPE
- VARIABLE
- WHITESPACE
- alnum
- alpha
- any
- ascii
- cntrl
- dec_num
- digit
- extend
- float
- graph
- hex_num
- integer
- lower
- newline
- nonnewline
- nonnewline_esc
- oct_num
- print
- punct
- space
- style_class
- style_comment
- style_constant
- style_definition
- style_embedded
- style_error
- style_function
- style_identifier
- style_keyword
- style_label
- style_nothing
- style_number
- style_operator
- style_preproc
- style_regex
- style_string
- style_tag
- style_type
- style_variable
- style_whitespace
- upper
- word
- xdigit
- Functions
- Tables
lexer
Performs lexing of Scintilla documents.
Overview
Dynamic lexers are more flexible than Scintilla’s static ones. They are often more readable as well. This document provides all the information necessary in order to write a new lexer. For illustrative purposes, a Lua lexer will be created. Lexers are written using Parsing Expression Grammars or PEGs with the Lua LPeg library. Please familiarize yourself with LPeg’s documentation before proceeding.
Writing a Lexer
Rather than writing a lexer from scratch, first see if your language is similar to any of the 80+ languages supported. If so, you can copy and modify that lexer, saving some time and effort.
Introduction
All lexers are contained in the lexers/ directory. To begin, create a Lua script with the name of your lexer and open it for editing.
$> cd lexers
$> textadept lua.lua
Inside the file, the lexer should look like the following:
-- Lua LPeg lexer
local l = lexer
local token, word_match = l.token, l.word_match
local P, R, S, V = lpeg.P, lpeg.R, lpeg.S, lpeg.V
local M = { _NAME = 'lua' }
-- Lexer code goes here.
return M
where the value of _NAME should be replaced with your lexer’s name. Like most Lua modules, the lexer module stores all of its data in a table M so as not to clutter the global namespace with lexer-specific patterns and variables. Therefore, remember to use the prefix M when declaring and using non-local variables. Also, do not forget the return M at the end.
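For instance, a sketch using a hypothetical pattern name purely to illustrate module-level scoping:
-- local: visible only inside this file
local line_comment = '--' * l.nonnewline^0
-- non-local: stored on M rather than in the global namespace
M.line_comment = line_comment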
The local variables above give easy access to the many useful functions available for creating lexers.
Language Structure
It is important to spend some time considering the structure of the language you are creating the lexer for. What kinds of tokens does it have? Comments, strings, keywords, etc.? Lua has 10 token types: whitespace, comments, strings, numbers, keywords, functions, constants, identifiers, labels, and operators.
Tokens
In a lexer, tokens consist of a token type followed by an LPeg pattern. They are created using the token() function. The lexer (l) module provides a number of default token types:
DEFAULT
WHITESPACE
COMMENT
STRING
NUMBER
KEYWORD
IDENTIFIER
OPERATOR
ERROR
PREPROCESSOR
CONSTANT
VARIABLE
FUNCTION
CLASS
TYPE
LABEL
REGEX
Please note you are not limited to just these token types; you can create your own. If you create your own, you will have to specify how they are colored. The procedure is discussed later.
A whitespace token typically looks like:
local ws = token(l.WHITESPACE, S('\t\v\f\n\r ')^1)
It is difficult to remember that a whitespace character is a \t, \v, \f, \n, \r, or space. The lexer module also provides you with a shortcut for this and many other character sequences. They are:
any
ascii
extend
alpha
digit
alnum
lower
upper
xdigit
cntrl
graph
print
punct
space
newline
nonnewline
nonnewline_esc
dec_num
hex_num
oct_num
integer
float
word
The above whitespace token can be rewritten more simply as:
local ws = token(l.WHITESPACE, l.space^1)
The next Lua token is a comment. Short comments beginning with -- are easy to express with LPeg:
local line_comment = '--' * l.nonnewline^0
On the other hand, long comments are more difficult to express because they have levels. See the Lua Reference Manual for more information. As a result, a functional pattern is necessary:
local longstring = #('[[' + ('[' * P('=')^0 * '[')) * -- quick look-ahead test
  P(function(input, index)
    -- capture the level: the number of '=' signs between the opening brackets
    local level = input:match('^%[(=*)%[', index)
    if level then
      -- find the closing delimiter of the same level, e.g. ']==]'
      local _, stop = input:find(']'..level..']', index, true)
      -- consume through the delimiter, or to the end of input if unclosed
      return stop and stop + 1 or #input + 1
    end
  end)
local block_comment = '--' * longstring
The token for a comment is then:
local comment = token(l.COMMENT, line_comment + block_comment)
It is worth noting that while token names are arbitrary, you are encouraged to use the ones listed in the tokens table because a standard color theme is applied to them. If you wish to create a unique token, no problem; you can specify how it will be displayed later on.
Lua strings should be easy to express because they are just characters surrounded by ' or " characters, right? Not quite. Lua strings contain escape sequences (\char), so a \' sequence in a single-quoted string does not indicate the end of the string and must be handled appropriately. Fortunately, this is a common occurrence in many programming languages, so a convenient function is provided: delimited_range().
local sq_str = l.delimited_range("'", '\\', true)
local dq_str = l.delimited_range('"', '\\', true)
Lua also has multi-line strings, but they have the same format as block comments. All of these can be combined into a single token:
local string = token(l.STRING, sq_str + dq_str + longstring)
Numbers are easy in Lua using lexer’s predefined patterns.
local lua_integer = P('-')^-1 * (l.hex_num + l.dec_num)
local number = token(l.NUMBER, l.float + lua_integer)
Keep in mind that the predefined patterns may not be completely accurate for your language, so you may have to create your own variants. In the above case, Lua integers do not have octal sequences, so the l.integer pattern is not used.
Depending on the number of keywords for a particular language, a simple P(keyword1) + P(keyword2) + ... + P(keywordN) pattern can get quite large. In fact, LPeg has a limit on pattern size. Also, if the keywords are not case-sensitive, additional complexity arises, so a better approach is necessary. Once again, lexer has a shortcut function: word_match().
local keyword = token(l.KEYWORD, word_match {
'and', 'break', 'do', 'else', 'elseif', 'end', 'false', 'for',
'function', 'goto', 'if', 'in', 'local', 'nil', 'not', 'or', 'repeat',
'return', 'then', 'true', 'until', 'while'
})
If keywords were case-insensitive, an additional parameter would be specified in the call to word_match(); no other action is needed.
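A hedged sketch of what that might look like, using the word_match(words, word_chars, case_insensitive) signature documented below (the abbreviated word list is for illustration only):
-- nil leaves the default word characters (%w_); true enables case-insensitivity
local keyword = token(l.KEYWORD, word_match({
  'and', 'break', 'do', 'else', 'elseif', 'end'
}, nil, true))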
Lua functions and constants are specified like keywords:
local func = token(l.FUNCTION, word_match {
'assert', 'collectgarbage', 'dofile', 'error', 'getmetatable',
'ipairs', 'load', 'loadfile', 'next', 'pairs', 'pcall', 'print',
'rawequal', 'rawget', 'rawlen', 'rawset', 'require', 'setmetatable',
'tonumber', 'tostring', 'type', 'xpcall'
})
local constant = token(l.CONSTANT, word_match {
'_G', '_VERSION'
})
Like most programming languages, Lua allows the usual characters in identifier names (variables, functions, etc.), so the usual l.word can be used:
local identifier = token(l.IDENTIFIER, l.word)
Lua has labels too:
local label = token(l.LABEL, '::' * l.word * '::')
Finally, an operator character is one of the following:
local operator = token(l.OPERATOR, '~=' + S('+-*/%^#=<>;:,.{}[]()'))
Rules
Rules are just a combination of tokens. In Lua, all rules consist of a single token, but other languages may have two or more tokens in a rule. For example, an HTML tag consists of an element token followed by an optional set of attribute tokens. This allows each part of the tag to be colored distinctly.
The set of rules that comprises Lua is specified in a M._rules table for the lexer.
M._rules = {
{ 'whitespace', ws },
{ 'keyword', keyword },
{ 'function', func },
{ 'constant', constant },
{ 'identifier', identifier },
{ 'string', string },
{ 'comment', comment },
{ 'number', number },
{ 'label', label },
{ 'operator', operator },
{ 'any_char', l.any_char },
}
Each entry is a rule name and its associated pattern. Please note that the names of the rules can be completely different from the names of the tokens contained within them.
The order of the rules is important because of the nature of LPeg. LPeg tries to apply the first rule to the current position in the text it is matching. If there is a match, it colors that section appropriately and moves on. If there is not a match, it tries the next rule, and so on. Suppose instead that the identifier rule was before the keyword rule. Since all keywords satisfy the requirements for being an identifier, any keywords would be incorrectly colored as identifiers. This is why identifier is where it is in the M._rules table.
You might be wondering what that any_char is doing at the bottom of M._rules. Its purpose is to match anything not accounted for in the above rules. For example, suppose the ! character is in the input text. It will not be matched by any of the first 10 rules, so without any_char, the text would not match at all, and no coloring would occur. any_char matches one single character and moves on. It may be colored red (indicating a syntax error) if desired because it is a token, not just a pattern.
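If you would rather have stray characters flagged as syntax errors, a hedged sketch is to define your own catch-all token and use it as the final rule in place of l.any_char:
-- style any single character the other rules miss as an error
local any_char = token(l.ERROR, l.any)
-- then end M._rules with { 'any_char', any_char }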
Summary
The above method of defining tokens and rules is sufficient for a majority of lexers. The lexer module provides many useful patterns and functions for constructing a working lexer quickly and efficiently. In most cases, the amount of knowledge of LPeg required to write a lexer is minimal.
As long as you use the default token types provided by lexer, you do not have to specify any coloring (or styling) information in the lexer; it is taken care of by the user’s color theme.
The rest of this document is devoted to more complex lexer techniques.
Styling Tokens
The term for coloring text is styling. Just like with predefined LPeg patterns in lexer, predefined styles are available.
style_nothing
style_class
style_comment
style_constant
style_definition
style_error
style_function
style_keyword
style_label
style_number
style_operator
style_regex
style_string
style_preproc
style_tag
style_type
style_variable
style_whitespace
style_embedded
style_identifier
Each style consists of a set of attributes:
font: The style’s font name.
size: The style’s font size.
bold: Flag indicating whether or not the font is boldface.
italic: Flag indicating whether or not the font is italic.
underline: Flag indicating whether or not the font is underlined.
fore: The color of the font face.
back: The color of the font background.
eolfilled: Flag indicating whether or not to color the end of the line.
characterset: The character set of the font.
case: The case of the font. 1 for upper case, 2 for lower case, 0 for normal case.
visible: Flag indicating whether or not the text is visible.
changeable: Flag indicating whether the text is changeable or read-only.
hotspot: Flag indicating whether or not the style is clickable.
Styles are created with style(). For example:
-- style with default theme settings
local style_nothing = l.style { }
-- style with bold text with default theme font
local style_bold = l.style { bold = true }
-- style with bold italic text with default theme font
local style_bold_italic = l.style { bold = true, italic = true }
The style_bold_italic style can be rewritten in terms of style_bold:
local style_bold_italic = style_bold..{ italic = true }
In this way you can build on previously defined styles without having to rewrite them. Note the previous style is left unchanged.
Style colors are different from the #rrggbb RGB notation you may be familiar with. Instead, create a color using color().
local red = l.color('FF', '00', '00')
local green = l.color('00', 'FF', '00')
local blue = l.color('00', '00', 'FF')
The default set of colors varies depending on the color theme used. Please see the current theme for more information.
Finally, styles are assigned to tokens via a M._tokenstyles table in the lexer. Styles do not have to be assigned to the default tokens; that is done automatically. You only have to assign styles for tokens you create. For example:
local lua = token('lua', P('lua'))
-- ... other patterns and tokens ...
M._tokenstyles = {
{ 'lua', l.style_keyword },
}
Each entry is the token name the style is for and the style itself. The order of styles in M._tokenstyles does not matter.
For examples of how styles are created, please see the theme files in the lexers/themes/ folder.
Line Lexer
Sometimes it is advantageous to lex input text line by line rather than a chunk at a time. This occurs particularly in diff, patch, or make files. Put
M._LEXBYLINE = true
somewhere in your lexer in order to do this.
Embedded Lexers
A particular advantage that dynamic lexers have over static ones is that lexers can be embedded within one another very easily, requiring minimal effort. There are two kinds of embedded lexers: a parent lexer that embeds other child lexers in it, and a child lexer that embeds itself within a parent lexer.
Parent Lexer
An example of this kind of lexer is HTML with embedded CSS and JavaScript. After creating the parent lexer, load the child lexers into it using lexer.load(). For example:
local css = l.load('css')
There needs to be a transition from the parent HTML lexer to the child CSS lexer. This is something of the form <style type="text/css">. Similarly, the transition from child to parent is </style>.
local css_start_rule = #(P('<') * P('style') * P(function(input, index)
if input:find('^[^>]+type%s*=%s*(["\'])text/css%1', index) then
return index
end
end)) * tag
local css_end_rule = #(P('</') * P('style') * ws^0 * P('>')) * tag
where tag and ws have been previously defined in the HTML lexer.
Now the CSS lexer can be embedded using embed_lexer():
l.embed_lexer(M, css, css_start_rule, css_end_rule)
Remember M is the parent HTML lexer object; the lexer object is needed by embed_lexer().
The same procedure can be done for JavaScript.
local js = l.load('javascript')
local js_start_rule = #(P('<') * P('script') * P(function(input, index)
if input:find('^[^>]+type%s*=%s*(["\'])text/javascript%1', index) then
return index
end
end)) * tag
local js_end_rule = #('</' * P('script') * ws^0 * '>') * tag
l.embed_lexer(M, js, js_start_rule, js_end_rule)
Child Lexer
An example of this kind of lexer is PHP embedded in HTML. After creating the child lexer, load the parent lexer. As an example:
local html = l.load('hypertext')
Since HTML should be the main lexer (PHP is just a preprocessing language), the following statement changes the main lexer from PHP to HTML:
M._lexer = html
Like in the previous section, transitions from HTML to PHP and back are specified:
local php_start_rule = token('php_tag', '<?' * ('php' * l.space)^-1)
local php_end_rule = token('php_tag', '?>')
And PHP is embedded:
l.embed_lexer(html, M, php_start_rule, php_end_rule)
Code Folding
It is sometimes convenient to “fold”, or not show blocks of text. These blocks can be functions, classes, comments, etc. A folder iterates over each line of input text and assigns a fold level to it. Certain lines can be specified as fold points that fold subsequent lines with a higher fold level.
Simple Folding
To specify the fold points of your lexer’s language, create a M._foldsymbols table of the following form:
M._foldsymbols = {
_patterns = { 'patt1', 'patt2', ... },
token1 = { ['fold_on'] = 1, ['stop_on'] = -1 },
token2 = { ['fold_on'] = 1, ['stop_on'] = -1 },
token3 = { ['fold_on'] = 1, ['stop_on'] = -1 }
...
}
_patterns: Lua patterns that match a fold or stop point.
tokenN: The name of a token a fold or stop point must be part of.
fold_on: Text in a token that matches a fold point.
stop_on: Text in a token that matches a stop point.
Fold points must ultimately have a value of 1 and stop points must ultimately have a value of -1, so the value in the table could be a function as long as it returns 1, -1, or 0. Any functions are passed the following arguments (a sketch of such a function follows the list):
text: The text to fold.
pos: The position in text of the start of the current line.
line: The actual text of the current line.
s: The position in line the matched text starts at.
match: The matched text itself.
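As a hedged illustration of a function value (the "fold only on a trailing brace" condition is invented for this example):
M._foldsymbols = {
  _patterns = { '[{}]' },
  [l.OPERATOR] = {
    -- fold on '{' only when nothing but whitespace follows it on the line
    ['{'] = function(text, pos, line, s, match)
      return line:find('^%s*$', s + 1) and 1 or 0
    end,
    ['}'] = -1
  }
}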
Lua folding would be implemented as follows:
M._foldsymbols = {
_patterns = { '%l+', '[%({%)}%[%]]' },
[l.KEYWORD] = {
['if'] = 1, ['do'] = 1, ['function'] = 1, ['end'] = -1,
['repeat'] = 1, ['until'] = -1
},
[l.COMMENT] = { ['['] = 1, [']'] = -1 },
[l.OPERATOR] = { ['('] = 1, ['{'] = 1, [')'] = -1, ['}'] = -1 }
}
_patterns matches lower-case words and any brace character. These are the fold and stop points in Lua. If a lower-case word happens to be a keyword token and that word is if, do, function, or repeat, the line containing it is a fold point. If the word is end or until, the line is a stop point. Any unmatched parentheses or braces counted as operators are also fold points. Finally, unmatched brackets in comments are fold points in order to fold long (multi-line) comments.
Advanced Folding
If you need more granularity than M._foldsymbols, you can define your own fold function:
function M._fold(text, start_pos, start_line, start_level)
end
text: The text to fold.
start_pos: Current position in the buffer of the text (used for obtaining style information from the document).
start_line: The line number the text starts at.
start_level: The fold level of the text at start_line.
The function must return a table whose indices are line numbers and whose values are tables containing the fold level and optionally a fold flag.
The following Scintilla fold flags are available:
SC_FOLDLEVELHEADERFLAG: Flag indicating that the line is a fold point.
SC_FOLDLEVELWHITEFLAG: Flag indicating that the line is blank.
Have your fold function iterate over each line, setting fold levels. You can use the get_style_at(), get_property(), get_fold_level(), and get_indent_amount() functions as necessary to determine the fold level for each line. The following example sets fold points by changes in indentation.
function M._fold(text, start_pos, start_line, start_level)
local folds = {}
local current_line = start_line
local prev_level = start_level
for indent, line in text:gmatch('([\t ]*)(.-)\r?\n') do
if line ~= '' then
local current_level = l.get_indent_amount(current_line)
if current_level > prev_level then -- next level
local i = current_line - 1
while folds[i] and folds[i][2] == l.SC_FOLDLEVELWHITEFLAG do
i = i - 1
end
if folds[i] then
folds[i][2] = l.SC_FOLDLEVELHEADERFLAG -- low indent
end
folds[current_line] = { current_level } -- high indent
elseif current_level < prev_level then -- prev level
if folds[current_line - 1] then
folds[current_line - 1][1] = prev_level -- high indent
end
folds[current_line] = { current_level } -- low indent
else -- same level
folds[current_line] = { prev_level }
end
prev_level = current_level
else
folds[current_line] = { prev_level, l.SC_FOLDLEVELWHITEFLAG }
end
current_line = current_line + 1
end
return folds
end
SciTE users note: do not use get_property for getting fold options from a .properties file because SciTE is not set up to forward them to your lexer. Instead, you can provide options that can be set at the top of the lexer.
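A minimal sketch of such an option (the name fold_by_indentation is hypothetical):
-- user-editable option set at the top of the lexer instead of a .properties file
local fold_by_indentation = true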
Using with SciTE
Create a .properties file for your lexer and import it in either your SciTEUser.properties or SciTEGlobal.properties. The contents of the .properties file should contain:
file.patterns.[lexer_name]=[file_patterns]
lexer.$(file.patterns.[lexer_name])=[lexer_name]
where [lexer_name] is the name of your lexer (minus the .lua extension) and [file_patterns] is a set of file extensions matched to your lexer.
Please note any styling information in .properties files is ignored.
Using with Textadept
Put your lexer in your ~/.textadept/lexers/ directory. That way your lexer will not be overwritten when upgrading. Also, lexers in this directory override default lexers. (A user lua lexer would be loaded instead of the default lua lexer. This is convenient if you wish to tweak a default lexer to your liking.) Do not forget to add a mime-type for your lexer.
Optimization
Lexers can usually be optimized for speed by re-arranging tokens so that the most common ones are recognized first. Keep in mind the issue that was raised earlier: if you put similar tokens like identifiers before keywords, the latter will not be styled correctly.
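A hedged sketch of such a reordering, assuming comments and strings are the most common tokens in the files being lexed (note that keyword must still precede identifier):
M._rules = {
  { 'whitespace', ws },
  { 'comment', comment }, -- moved up: assumed to match most often
  { 'string', string },
  { 'keyword', keyword }, -- must still come before identifier
  { 'function', func },
  { 'constant', constant },
  { 'identifier', identifier },
  { 'number', number },
  { 'label', label },
  { 'operator', operator },
  { 'any_char', l.any_char },
}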
Troubleshooting
Errors in lexers can be tricky to debug. Lua errors are printed to STDERR, and _G.print() statements in lexers are printed to STDOUT.
Limitations
True embedded preprocessor language highlighting is not available. For most cases this will not be noticed, but code like
<div id="<?php echo $id; ?>">
or
<div <?php if ($odd) { echo 'class="odd"'; } ?>>
will not highlight correctly.
Performance
There might be some slight overhead when initializing a lexer, but loading a file from disk into Scintilla is usually more expensive.
On modern computer systems, I see no difference in speed between LPeg lexers and Scintilla’s C++ ones.
Risks
Poorly written lexers have the ability to crash Scintilla, so unsaved data might be lost. However, these crashes have only been observed in early lexer development, when syntax errors or pattern errors are present. Once the lexer actually starts styling text (either correctly or incorrectly; it does not matter), no crashes have occurred.
Acknowledgements
Thanks to Peter Odding for his lexer post on the Lua mailing list that inspired me, and of course thanks to Roberto Ierusalimschy for LPeg.
Fields
CLASS
Token type for class tokens.
COMMENT
Token type for comment tokens.
CONSTANT
Token type for constant tokens.
DEFAULT
Token type for default tokens.
ERROR
Token type for error tokens.
FUNCTION
Token type for function tokens.
IDENTIFIER
Token type for identifier tokens.
KEYWORD
Token type for keyword tokens.
LABEL
Token type for label tokens.
NUMBER
Token type for number tokens.
OPERATOR
Token type for operator tokens.
PREPROCESSOR
Token type for preprocessor tokens.
REGEX
Token type for regex tokens.
SC_FOLDLEVELBASE
The initial (root) fold level.
SC_FOLDLEVELHEADERFLAG
Flag indicating that the line is a fold point.
SC_FOLDLEVELNUMBERMASK
Flag used with SCI_GETFOLDLEVEL(line) to get the fold level of a line.
SC_FOLDLEVELWHITEFLAG
Flag indicating that the line is blank.
STRING
Token type for string tokens.
TYPE
Token type for type tokens.
VARIABLE
Token type for variable tokens.
WHITESPACE
Token type for whitespace tokens.
alnum
Matches any alphanumeric character (A-Z, a-z, 0-9).
alpha
Matches any alphabetic character (A-Z, a-z).
any
Matches any single character.
ascii
Matches any ASCII character (0..127).
cntrl
Matches any control character (0..31).
dec_num
Matches a decimal number.
digit
Matches any digit (0-9).
extend
Matches any ASCII extended character (0..255).
float
Matches a floating point number.
graph
Matches any graphical character (! to ~).
hex_num
Matches a hexadecimal number.
integer
Matches a decimal, hexadecimal, or octal number.
lower
Matches any lowercase character (a-z).
newline
Matches any newline characters.
nonnewline
Matches any non-newline character.
nonnewline_esc
Matches any non-newline character, excluding newlines escaped with \\.
oct_num
Matches an octal number.
print
Matches any printable character (space to ~).
punct
Matches any punctuation character not alphanumeric (! to /, : to @, [ to `, { to ~).
space
Matches any whitespace character (\t, \v, \f, \n, \r, space).
style_class
Style typically used for class definitions.
style_comment
Style typically used for code comments.
style_constant
Style typically used for constants.
style_definition
Style typically used for definitions.
style_embedded
Style typically used for embedded code.
style_error
Style typically used for erroneous syntax.
style_function
Style typically used for function definitions.
style_identifier
Style typically used for identifier words.
style_keyword
Style typically used for language keywords.
style_label
Style typically used for labels.
style_nothing
Style typically used for no styling.
style_number
Style typically used for numbers.
style_operator
Style typically used for operators.
style_preproc
Style typically used for preprocessor statements.
style_regex
Style typically used for regular expression strings.
style_string
Style typically used for strings.
style_tag
Style typically used for markup tags.
style_type
Style typically used for static types.
style_variable
Style typically used for variables.
style_whitespace
Style typically used for whitespace.
upper
Matches any uppercase character (A-Z).
word
Matches a typical word starting with a letter or underscore and then any alphanumeric or underscore characters.
xdigit
Matches any hexadecimal digit (0-9, A-F, a-f).
Functions
color(r, g, b)
Creates a Scintilla color.
Parameters:
r: The string red component of the hexadecimal color.
g: The string green component of the color.
b: The string blue component of the color.
Usage:
local red = color('FF', '00', '00')
delimited_range(chars, escape, end_optional, balanced, forbidden)
Creates an LPeg pattern that matches a range of characters delimited by a specific character or characters. This can be used to match a string, parenthesis, etc.
Parameters:
chars: The character(s) that bound the matched range.
escape: Optional escape character. This parameter may be omitted, nil, or the empty string.
end_optional: Optional flag indicating whether or not an ending delimiter is optional. If true, the range begun by the start delimiter matches until an end delimiter or the end of the input is reached.
balanced: Optional flag indicating whether or not a balanced range is matched, like %b in Lua’s string.find. This flag only applies if chars consists of two different characters (e.g. ‘()’).
forbidden: Optional string of characters forbidden in a delimited range. Each character is part of the set.
Usage:
local sq_str_noescapes = delimited_range("'")
local sq_str_escapes = delimited_range("'", '\\', true)
local unbalanced_parens = delimited_range('()', '\\', true)
local balanced_parens = delimited_range('()', '\\', true, true)
embed_lexer(parent, child, start_rule, end_rule)
Embeds a child lexer language in a parent one.
Parameters:
parent: The parent lexer.
child: The child lexer.
start_rule: The token that signals the beginning of the embedded lexer.
end_rule: The token that signals the end of the embedded lexer.
Usage:
embed_lexer(M, css, css_start_rule, css_end_rule)
embed_lexer(html, M, php_start_rule, php_end_rule)
embed_lexer(html, ruby, ruby_start_rule, ruby_end_rule)
fold(text, start_pos, start_line, start_level)
Folds the given text. Called by LexLPeg.cxx; do not call from Lua. If the current lexer has no _fold function, folding by indentation is performed if the ‘fold.by.indentation’ property is set.
Parameters:
text: The document text to fold.
start_pos: The position in the document that text starts at.
start_line: The line number that text starts on.
start_level: The fold level that text starts on.
Return:
- Table of fold levels.
fold_line_comments(prefix)
Returns a fold function that folds consecutive line comments.
This function should be used inside the lexer’s _foldsymbols
table.
Parameters:
prefix: The prefix string defining a line comment.
Usage:
[l.COMMENT] = { ['--'] = l.fold_line_comments('--') }
[l.COMMENT] = { ['//'] = l.fold_line_comments('//') }
get_fold_level(line_number)
Returns the fold level for a given line. This level already has SC_FOLDLEVELBASE added to it, so you do not need to add it yourself.
Parameters:
line_number: The line number to get the fold level of.
get_indent_amount(line)
Returns the indent amount of text for a given line.
Parameters:
line: The line number to get the indent amount of.
get_property(key, default)
Returns an integer property value for a given key.
Parameters:
key: The property key.
default: Optional integer value to return if key is not set.
get_style_at(pos)
Returns the string style name and style number at a given position.
Parameters:
pos: The position to get the style for.
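Usage (a hedged sketch, not from the original docs; pos here is a hypothetical absolute buffer position inside a custom _fold function):
-- the first return value is the style name string, e.g. 'comment'
local style = l.get_style_at(pos)
if style == 'comment' then
  -- treat this position as part of a comment fold
end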
lex(text, init_style)
Lexes the given text. Called by LexLPeg.cxx; do not call from Lua. If the lexer has a _LEXBYLINE flag set, the text is lexed one line at a time. Otherwise the text is lexed as a whole.
Parameters:
text: The text to lex.
init_style: The current style. Multilang lexers use this to determine which language to start lexing in.
load(lexer_name)
Initializes the specified lexer.
Parameters:
lexer_name: The name of the lexing language.
nested_pair(start_chars, end_chars, end_optional)
Similar to delimited_range(), but allows for multi-character delimiters. This is useful for lexers with tokens such as nested block comments. With single-character delimiters, this function is identical to delimited_range(start_chars..end_chars, nil, end_optional, true).
Parameters:
start_chars: The string starting a nested sequence.
end_chars: The string ending a nested sequence.
end_optional: Optional flag indicating whether or not an ending delimiter is optional. If true, the range begun by the start delimiter matches until an end delimiter or the end of the input is reached.
Usage:
local nested_comment = l.nested_pair('/*', '*/', true)
starts_line(patt)
Creates an LPeg pattern from a given pattern that matches the beginning of a line and returns it.
Parameters:
patt: The LPeg pattern to match at the beginning of a line.
Usage:
local preproc = token(l.PREPROCESSOR, #P('#') * l.starts_line('#' * l.nonnewline^0))
style(style_table)
Creates a Scintilla style from a table of style properties.
Parameters:
style_table: A table of style properties. The available properties are those listed under Styling Tokens above (font, size, bold, italic, underline, fore, back, eolfilled, characterset, case, visible, changeable, hotspot).
Usage:
local bold_italic = style { bold = true, italic = true }
token(name, patt)
Creates an LPeg capture table index with the name and position of the token.
Parameters:
name: The name of the token. If this name is not in l.tokens, then you will have to specify a style for it in lexer._tokenstyles.
patt: The LPeg pattern associated with the token.
Usage:
local ws = token(l.WHITESPACE, l.space^1)
php_start_rule = token('php_tag', '<?' * ('php' * l.space)^-1)
word_match(words, word_chars, case_insensitive)
Creates an LPeg pattern that matches a set of words.
Parameters:
words: A table of words.
word_chars: Optional string of additional characters considered to be part of a word (default is %w_).
case_insensitive: Optional boolean flag indicating whether the word match is case-insensitive.
Usage:
local keyword = token(l.KEYWORD, word_match { 'foo', 'bar', 'baz' })
local keyword = token(l.KEYWORD, word_match({ 'foo-bar', 'foo-baz', 'bar-foo', 'bar-baz', 'baz-foo', 'baz-bar' }, '-', true))
Tables
_EMBEDDEDRULES
Set of rules for an embedded lexer. For a parent lexer name, contains the child’s start_rule, token_rule, and end_rule patterns.
_RULES
List of rule names with associated LPeg patterns for a specific lexer. It is accessible to other lexers for embedded lexer applications.
colors
Table of common colors for a theme. This table should be redefined in each theme.