# API Reference

Complete API documentation for ts-syntax-highlighter.
## Tokenizer

The main class for tokenizing source code.
### Constructor

```ts
new Tokenizer(languageId: string): Tokenizer
```

Creates a new tokenizer instance for the specified language.

**Parameters:**

- `languageId` - Language identifier (`'javascript'`, `'typescript'`, `'html'`, `'css'`, `'json'`, or `'stx'`)

**Returns:** A `Tokenizer` instance

**Throws:** `Error` if the language is not supported

**Example:**

```ts
const tokenizer = new Tokenizer('javascript')
```
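Because the constructor throws for unsupported languages, instantiation can be guarded directly; a minimal sketch (see also Error Handling below):

```ts
let tokenizer: Tokenizer | undefined
try {
  tokenizer = new Tokenizer('python') // not in the supported list above
} catch (error) {
  console.error('Failed to create tokenizer:', error)
}
```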
### tokenize()

```ts
tokenize(code: string): LineTokens[]
```

Synchronously tokenizes the provided code.

**Parameters:**

- `code` - Source code to tokenize

**Returns:** Array of `LineTokens` objects

**Example:**

```ts
const tokens = tokenizer.tokenize('const x = 42')
```
### tokenizeAsync()

```ts
tokenizeAsync(code: string): Promise<LineTokens[]>
```

Asynchronously tokenizes the provided code (recommended for better performance).

**Parameters:**

- `code` - Source code to tokenize

**Returns:** Promise resolving to an array of `LineTokens` objects

**Example:**

```ts
const tokens = await tokenizer.tokenizeAsync('const x = 42')
```
## Language Functions
### getLanguage()

```ts
getLanguage(id: string): Language | undefined
```

Retrieves a language by its ID, alias, or extension.

**Parameters:**

- `id` - Language ID, alias, or extension (e.g., `'javascript'`, `'js'`, `'jsx'`, `'.js'`)

**Returns:** `Language` object, or `undefined` if not found

**Example:**

```ts
const js = getLanguage('javascript')
const jsx = getLanguage('jsx') // Returns JavaScript
const ts = getLanguage('ts') // Returns TypeScript
```
### getLanguageByExtension()

```ts
getLanguageByExtension(ext: string): Language | undefined
```

Retrieves a language by file extension.

**Parameters:**

- `ext` - File extension, with or without the leading dot (e.g., `'.js'` or `'js'`)

**Returns:** `Language` object, or `undefined` if not found

**Example:**

```ts
const js = getLanguageByExtension('.js')
const ts = getLanguageByExtension('ts')
const tsx = getLanguageByExtension('.tsx')
```
### languages

```ts
languages: Language[]
```

Array of all supported languages.

**Example:**

```ts
import { languages } from 'ts-syntax-highlighter'

console.log(languages.length) // 6

languages.forEach(lang => {
  console.log(`${lang.name}: ${lang.id}`)
})
```
## Types
### Token

Represents a single token in the source code.

```ts
interface Token {
  type: string // Scope name (e.g., 'keyword.control.js')
  content: string // The actual text content
  line: number // Line number (0-indexed)
  startIndex: number // Character position in the line (0-indexed)
}
```

**Properties:**

- `type`: Semantic scope of the token, following TextMate naming conventions (e.g., `'keyword.control.js'`, `'string.quoted.double.ts'`, `'entity.name.function.js'`)
- `content`: The actual text from the source code (e.g., `'const'`, `'Hello World'`, `'function'`)
- `line`: Zero-indexed line number where the token appears (the first line is `0`, the second is `1`, etc.)
- `startIndex`: Zero-indexed character position within the line (the first character is `0`, the second is `1`, etc.)

**Example:**

```ts
const token: Token = {
  type: 'storage.type.js',
  content: 'const',
  line: 0,
  startIndex: 0
}
```
### LineTokens

Represents all tokens on a single line.

```ts
interface LineTokens {
  line: number // Line number (0-indexed)
  tokens: Token[] // Array of tokens on this line
}
```

**Properties:**

- `line`: Zero-indexed line number
- `tokens`: Array of `Token` objects appearing on this line

**Example:**

```ts
const lineTokens: LineTokens = {
  line: 0,
  tokens: [
    { type: 'storage.type.js', content: 'const', line: 0, startIndex: 0 },
    { type: 'punctuation.js', content: ' ', line: 0, startIndex: 5 },
    { type: 'punctuation.js', content: 'x', line: 0, startIndex: 6 }
  ]
}
```
### Language

Represents a supported language.

```ts
interface Language {
  id: string // Unique identifier
  name: string // Display name
  aliases?: string[] // Alternative identifiers
  extensions?: string[] // File extensions
  grammar: Grammar // Grammar definition
}
```

**Properties:**

- `id`: Unique language identifier (e.g., `'javascript'`, `'typescript'`)
- `name`: Human-readable name (e.g., `'JavaScript'`, `'TypeScript'`)
- `aliases`: Alternative identifiers (e.g., `['js', 'jsx']` for JavaScript)
- `extensions`: Supported file extensions (e.g., `['.js', '.jsx', '.mjs', '.cjs']`)
- `grammar`: The grammar definition used for tokenization

**Example:**

```ts
const language: Language = {
  id: 'javascript',
  name: 'JavaScript',
  aliases: ['js', 'jsx'],
  extensions: ['.js', '.jsx', '.mjs', '.cjs'],
  grammar: javascriptGrammar
}
```
### Grammar

Represents a language grammar definition.

```ts
interface Grammar {
  name: string
  scopeName: string
  keywords?: Record<string, string>
  patterns: Pattern[]
  repository?: Record<string, PatternRepository>
}
```

**Properties:**

- `name`: Grammar name (e.g., `'JavaScript'`)
- `scopeName`: Root scope name (e.g., `'source.js'`)
- `keywords`: Keyword-to-scope mappings
- `patterns`: Array of patterns to match
- `repository`: Named pattern collections for reuse

**Example:**

```ts
const grammar: Grammar = {
  name: 'JavaScript',
  scopeName: 'source.js',
  keywords: {
    'const': 'storage.type.js',
    'let': 'storage.type.js',
    'var': 'storage.type.js'
  },
  patterns: [
    { include: '#keywords' },
    { include: '#strings' }
  ],
  repository: {
    keywords: {
      patterns: [
        {
          name: 'storage.type.js',
          match: '\\b(const|let|var)\\b'
        }
      ]
    }
  }
}
```
### Pattern

Represents a pattern in a grammar.

```ts
interface Pattern {
  name?: string
  match?: string
  begin?: string
  end?: string
  include?: string
  patterns?: Pattern[]
  captures?: Record<string, PatternCapture>
  beginCaptures?: Record<string, PatternCapture>
  endCaptures?: Record<string, PatternCapture>
}
```

**Properties:**

- `name`: Scope name for matched text
- `match`: Regex pattern for simple matches
- `begin`: Regex pattern for the start of a multi-line match
- `end`: Regex pattern for the end of a multi-line match
- `include`: Reference to a repository pattern or `'$self'`
- `patterns`: Nested patterns for complex matching
- `captures`: Named captures for the `match` pattern
- `beginCaptures`: Named captures for the `begin` pattern
- `endCaptures`: Named captures for the `end` pattern
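A double-quoted string is the classic `begin`/`end` case. The sketch below follows TextMate conventions; the scope names and the assumed `{ name: string }` shape of `PatternCapture` are illustrative rather than taken from the library's built-in grammars:

```ts
const doubleQuotedString: Pattern = {
  name: 'string.quoted.double.js',
  begin: '"', // opening quote starts the region
  end: '"', // closing quote ends it
  beginCaptures: {
    '0': { name: 'punctuation.definition.string.begin.js' }
  },
  endCaptures: {
    '0': { name: 'punctuation.definition.string.end.js' }
  },
  patterns: [
    // escape sequences inside the string
    { name: 'constant.character.escape.js', match: '\\\\.' }
  ]
}
```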
## Usage Examples

### Basic Tokenization

```ts
import { Tokenizer } from 'ts-syntax-highlighter'

const tokenizer = new Tokenizer('javascript')
const code = 'const greeting = "Hello World"'

// Async (recommended)
const tokens = await tokenizer.tokenizeAsync(code)

// Sync
const syncTokens = tokenizer.tokenize(code)
```
### Language Detection

```ts
import { getLanguage, getLanguageByExtension } from 'ts-syntax-highlighter'

// By ID
const js = getLanguage('javascript')

// By alias
const jsx = getLanguage('jsx')

// By extension
const ts = getLanguageByExtension('.ts')

// Check if a language exists
const lang = getLanguage('python')
if (!lang) {
  console.error('Language not supported')
}
```
### Processing Tokens

```ts
const tokenizer = new Tokenizer('typescript')
const tokens = await tokenizer.tokenizeAsync(code)

// Iterate through lines
for (const line of tokens) {
  console.log(`Line ${line.line}:`)

  // Iterate through tokens
  for (const token of line.tokens) {
    console.log(`  ${token.type}: "${token.content}"`)
  }
}

// Filter tokens
const keywords = tokens.flatMap(line =>
  line.tokens.filter(token => token.type.includes('keyword'))
)

// Map tokens
const contents = tokens.flatMap(line =>
  line.tokens.map(token => token.content)
).join('')
```
### Type Guards

```ts
function isKeyword(token: Token): boolean {
  return token.type.includes('keyword')
}

function isString(token: Token): boolean {
  return token.type.includes('string')
}

function isComment(token: Token): boolean {
  return token.type.includes('comment')
}

// Usage
const tokens = await tokenizer.tokenizeAsync(code)
const hasKeywords = tokens.some(line =>
  line.tokens.some(isKeyword)
)
```
### Custom Processing

```ts
interface ProcessedToken extends Token {
  category: string
  color: string
}

function processTokens(tokens: LineTokens[]): ProcessedToken[][] {
  return tokens.map(line =>
    line.tokens.map(token => ({
      ...token,
      category: categorize(token),
      color: getColor(token)
    }))
  )
}

function categorize(token: Token): string {
  if (token.type.includes('keyword')) return 'keyword'
  if (token.type.includes('string')) return 'literal'
  if (token.type.includes('comment')) return 'comment'
  return 'other'
}

function getColor(token: Token): string {
  const colorMap: Record<string, string> = {
    'keyword': '#C586C0',
    'string': '#CE9178',
    'comment': '#6A9955'
  }
  for (const [key, color] of Object.entries(colorMap)) {
    if (token.type.includes(key)) return color
  }
  return '#D4D4D4'
}
```
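Processed tokens map directly onto markup. A minimal sketch that renders them as inline-styled HTML, building on `processTokens` above (`renderHtml` and `escapeHtml` are illustrative helpers, not library exports):

```ts
// Render processed tokens as HTML, one <div> per line
function renderHtml(tokens: LineTokens[]): string {
  return processTokens(tokens)
    .map(line => line
      .map(token => `<span style="color: ${token.color}">${escapeHtml(token.content)}</span>`)
      .join(''))
    .map(html => `<div class="line">${html}</div>`)
    .join('\n')
}

// Minimal HTML escaping for token content
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}
```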
### Error Handling

```ts
import { getLanguage, Tokenizer } from 'ts-syntax-highlighter'
import type { LineTokens } from 'ts-syntax-highlighter'

function safeTokenize(code: string, languageId: string): LineTokens[] {
  // Check if the language exists
  const language = getLanguage(languageId)
  if (!language) {
    throw new Error(`Unsupported language: ${languageId}`)
  }

  try {
    const tokenizer = new Tokenizer(languageId)
    return tokenizer.tokenize(code)
  } catch (error) {
    console.error('Tokenization failed:', error)
    return []
  }
}

// Async version
async function safeTokenizeAsync(
  code: string,
  languageId: string
): Promise<LineTokens[]> {
  const language = getLanguage(languageId)
  if (!language) {
    throw new Error(`Unsupported language: ${languageId}`)
  }

  try {
    const tokenizer = new Tokenizer(languageId)
    return await tokenizer.tokenizeAsync(code)
  } catch (error) {
    console.error('Tokenization failed:', error)
    return []
  }
}
```
### Performance Optimization

```ts
// ✅ Reuse tokenizer instances
const tokenizer = new Tokenizer('javascript')
const results = await Promise.all(
  files.map(file => tokenizer.tokenizeAsync(file))
)

// ✅ Use async for large files
let tokens: LineTokens[]
if (code.length > 1000) {
  tokens = await tokenizer.tokenizeAsync(code)
} else {
  tokens = tokenizer.tokenize(code)
}

// ✅ Process tokens efficiently
const keywords: Token[] = []
for (const line of tokens) {
  for (const token of line.tokens) {
    if (token.type.includes('keyword')) {
      keywords.push(token)
      break // Early exit if you only need one per line
    }
  }
}
```
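When highlighting snippets across several languages, the reuse tip extends naturally to a per-language cache; a sketch (the cache is application code, not part of the library):

```ts
import { Tokenizer } from 'ts-syntax-highlighter'

// One tokenizer per language, created lazily and reused
const tokenizers = new Map<string, Tokenizer>()

function getTokenizer(languageId: string): Tokenizer {
  let tokenizer = tokenizers.get(languageId)
  if (!tokenizer) {
    tokenizer = new Tokenizer(languageId) // throws if the language is unsupported
    tokenizers.set(languageId, tokenizer)
  }
  return tokenizer
}
```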
### TypeScript Integration

```ts
import type {
  Token,
  LineTokens,
  Language,
  Grammar,
  Tokenizer
} from 'ts-syntax-highlighter'

// Type-safe token processing
function extractStrings(tokens: LineTokens[]): string[] {
  const strings: string[] = []

  for (const line of tokens) {
    for (const token of line.tokens) {
      if (token.type.includes('string')) {
        strings.push(token.content)
      }
    }
  }

  return strings
}

// Generic token filter
function filterTokens<T extends Token>(
  tokens: LineTokens[],
  predicate: (token: Token) => token is T
): T[] {
  return tokens.flatMap(line =>
    line.tokens.filter(predicate)
  )
}
```
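Paired with a user-defined type guard, `filterTokens` narrows its result type. For example (the `StringToken` shape is illustrative, not a library type):

```ts
declare const tokens: LineTokens[] // e.g., from tokenizer.tokenizeAsync(code)

interface StringToken extends Token {
  type: `string.${string}`
}

function isStringToken(token: Token): token is StringToken {
  return token.type.startsWith('string.')
}

const stringTokens = filterTokens(tokens, isStringToken)
// stringTokens is typed as StringToken[]
```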
## Next Steps
- Check out Usage Examples
- Explore Configuration
- Learn about Grammars