Synthetic.js — An Open Source Programming Language for Genetic Engineering

Dec 27, 2021

Preface

In Genetic Engineering, DNA sequences of ‘A’, ‘C’, ‘G’ and ‘T’ are combined to form a genetic program. This genetic program can then be manufactured chemically and injected into biological cells. The injected program is then executed by the cell machinery to create new molecules, logic and state.

Sequences can be categorized into different functional components, or different types of genetic LEGO parts (called BioBricks), that can be combined in accordance with a specific biological syntax.

A program to express a green fluorescent protein (GFP) can be created by combining 4 genetic parts:

Promoter
caatacgcaaaccgcctctccccgcgcgttggccgattcattaatgcagctggcacgacaggtttcccgactggaaagcgggcagtgagcgcaacgcaattaatgtgagttagctcactcattaggcaccccaggctttacactttatgcttccggctcgtatgttgtgtggaattgtgagcggataacaatttcacaca
Ribosome Binding Site (RBS)
aaagaggagaaa
Protein Coding Sequence (CDS)
atgcgtaaaggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttcggttatggtgttcaatgctttgcgagatacccagatcatatgaaacagcatgactttttcaagagtgccatgcccgaaggttatgtacaggaaagaactatatttttcaaagatgacgggaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaatagaatcgagttaaaaggtattgattttaaagaagatggaaacattcttggacacaaattggaatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatcaaagttaacttcaaaattagacacaacattgaagatggaagcgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacctgtccacacaatctgccctttcgaaagatcccaacgaaaagagagaccacatggtccttcttgagtttgtaacagctgctgggattacacatggcatggatgaactatacaaataataa
Terminator
ccaggcatcaaataaaacgaaaggctcagtcgaaagactgggcctttcgttttatctgttgtttgtcggtgaacgctctctactagagtcacactggctcaccttcgggtgggcctttctgcgtttata

The combined sequence:

caatacgcaaaccgcctctccccgcgcgttggccgattcattaatgcagctggcacgacaggtttcccgactggaaagcgggcagtgagcgcaacgcaattaatgtgagttagctcactcattaggcaccccaggctttacactttatgcttccggctcgtatgttgtgtggaattgtgagcggataacaatttcacacaaaagaggagaaaatgcgtaaaggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttcggttatggtgttcaatgctttgcgagatacccagatcatatgaaacagcatgactttttcaagagtgccatgcccgaaggttatgtacaggaaagaactatatttttcaaagatgacgggaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaatagaatcgagttaaaaggtattgattttaaagaagatggaaacattcttggacacaaattggaatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatcaaagttaacttcaaaattagacacaacattgaagatggaagcgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacctgtccacacaatctgccctttcgaaagatcccaacgaaaagagagaccacatggtccttcttgagtttgtaacagctgctgggattacacatggcatggatgaactatacaaataataaccaggcatcaaataaaacgaaaggctcagtcgaaagactgggcctttcgttttatctgttgtttgtcggtgaacgctctctactagagtcacactggctcaccttcgggtgggcctttctgcgtttata

Looking at the combined sequence, it is clear that reading, editing and collaborating on such sequence combinations is not easy. And what if we want a much more complex system, with hundreds or thousands of different genetic circuits?

Synthetic Biology — Making Biology Easier To Engineer

In the chip design world, the level of abstraction for complex systems ranges from assembling bare transistors, to describing logic gates, to writing in hardware description languages like Verilog or VHDL.

In the computer programming world, each program builds on the accumulated knowledge of tens of thousands of APIs and code bases that were tested and verified by a community of developers. There is no need to reinvent the wheel every time, and it is easy to create, collaborate on, and deploy complex architectures.

“Computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty.”

- Don Knuth

Synthetic Biology is a subset of Genetic Engineering whose aim is “Making Biology Easier To Engineer”. How can we apply the same concepts from the chip design and computer programming worlds to Genetic Engineering?

Synthetic.js — A High Level Programming Language for Genetic Engineering

The answer is clear: a high-level programming language for describing genetic circuits is essential to accelerate the development of new materials, medicine, crops, and much more.

In Synthetic.js, the example above is described in a single line of code:

promoter('BBa_R0010') + rbs('BBa_B0034') + cds('BBa_E0040') + terminator('BBa_B0015')

Combining genetic parts is done with the ‘+’ operator.
promoter(…), rbs(…), cds(…), terminator(…), etc. describe the type of genetic part that is being combined.
Genetic parts from the iGEM catalog can be referenced with their part id (‘BBa_B0034’, ‘BBa_B0015’, …).
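
JavaScript has no user-defined operator overloading, so it is worth seeing how a ‘+’ between part objects can still yield a combined sequence. One possible mechanism is string coercion: if each part defines toString(), the built-in ‘+’ concatenates the resulting strings. The sketch below is only an illustration of that idea, not the Synthetic.js implementation; makePart and the local promoter/rbs/cds/terminator functions are hypothetical stand-ins, and apart from the RBS from the preface the sequences are short toy values.

```javascript
// Hypothetical sketch: NOT the Synthetic.js implementation.
// Each part is an object whose toString() returns its DNA sequence, so the
// built-in '+' operator coerces both operands to strings and concatenates them.
function makePart(type, sequence) {
  return {
    type,
    sequence,
    toString() {
      return this.sequence;
    },
  };
}

// Local stand-ins named after the part types used in the article.
const promoter = (seq) => makePart('promoter', seq);
const rbs = (seq) => makePart('rbs', seq);
const cds = (seq) => makePart('cds', seq);
const terminator = (seq) => makePart('terminator', seq);

// Toy sequences for illustration; only the RBS is taken from the preface.
const circuit =
  promoter('ttgaca') + rbs('aaagaggagaaa') + cds('atgtaa') + terminator('ccaggc');

console.log(circuit); // ttgacaaaagaggagaaaatgtaaccaggc
```

Note that with this approach the result of ‘+’ is a plain string, so the part types are lost after combining; a real compiler would need some other way to keep track of them.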

Synthetic.js code is written on the left screen, in Visual Studio Code. The compiled DNA sequences can then be seen in the command line (right screen).

Open Source, Runs in the Web Browser

We live in a time when there is no place for a programming language that is not distributed as open source or as free software (free as in free speech, not free beer).

Furthermore, downloading and installing software belongs to the past: if it doesn’t run in the web browser, it’s not simple enough.

Synthetic.js is written as a framework on top of JavaScript. It can be used from the web browser without installing anything, or it can be downloaded from NPM or GitHub and run in almost any environment: web, mobile, desktop or server.

It is released under the permissive MIT license, so it can be integrated into any software without the need to expose the source code (as long as the copyright notice is distributed with it).

Easy Installation

1. Install Node.js.
2. Run “npm init” in the project directory.
3. Run “npm i synthetic” (see npm).
4. Create an app.js file, use the following boilerplate code:

require('synthetic.js')

compile(
  'acgt'
)

5. Run “node app” in the command line to view the compiled sequence.

6. For more code examples, view the examples folder in the Synthetic GitHub page.

Simple Syntax, Adheres to iGEM Notations

In Synthetic.js, a genetic circuit can be represented by a set of connected genetic parts. The types of genetic parts are the same as the types of genetic parts provided by the iGEM registry catalog of parts:

promoter()
rbs() — (Ribosome Binding Site)
cds() — (Protein Coding Sequence)
terminator()
proteinDomain()
translationalUnit()
dna()
composite()
proteinGenerator()
restrictionSite()
reporter()
inverter()
receiver()
sender()
measurement()
primerBindingSite()

Genetic parts can be combined with the ‘+’ operator to form a larger sequence:

promoter('agcg…') + rbs('aaactac…') + cds('atg…TGSNVFQTRAGCLIG…taataa') + terminator('aagcgaaat…')

DNA sequences are written with lowercase letters.
Amino acid sequences are written with uppercase letters.
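
The case convention means a single string can mix DNA and amino acids, as in the cds(…) example above. As a rough illustration of how such a string could be separated, here is a minimal sketch; splitByCase is a hypothetical helper, not part of Synthetic.js, and it simply skips any character that is not a letter.

```javascript
// Hypothetical helper: splits a mixed sequence into DNA (lowercase) and
// amino acid (uppercase) segments, following the article's case convention.
function splitByCase(sequence) {
  const segments = [];
  // Each match is a maximal run of lowercase letters or of uppercase letters.
  for (const match of sequence.matchAll(/[a-z]+|[A-Z]+/g)) {
    const text = match[0];
    const isDna = /^[a-z]+$/.test(text);
    // Lowercase runs must use only the four DNA letters.
    if (isDna && !/^[acgt]+$/.test(text)) {
      throw new Error(`Unexpected DNA letters in "${text}"`);
    }
    segments.push({ kind: isDna ? 'dna' : 'aminoAcids', text });
  }
  return segments;
}

console.log(splitByCase('atgTGSNVFQtaataa'));
// Three segments: dna 'atg', aminoAcids 'TGSNVFQ', dna 'taataa'
```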

Direct Connection to Databases — It’s Really Part of the Language

Parts from the iGEM registry or from the RCSB Protein Data Bank can be referenced with their id. The compiler then fetches the sequences and performs codon usage optimization for the selected chassis.

For example:

promoter('BBa_R0010') + rbs('BBa_B0034') + cds('BBa_E0040') + terminator('BBa_B0015')

Or:

promoter('BBa_R0010') + rbs('BBa_B0034') + cds('6VXX') + terminator('BBa_B0015')
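
Since the same argument can hold a raw sequence, an iGEM part id or a PDB id, the compiler has to decide which one it was given before fetching anything. The sketch below shows one way that decision could be made; classifyReference is hypothetical, and it assumes that iGEM ids start with “BBa_” and that PDB ids use the classic 4-character format (a digit followed by three letters or digits). It only classifies the reference and does not perform the fetch.

```javascript
// Hypothetical sketch of routing a part reference to the right source.
// Assumptions: iGEM registry ids start with "BBa_", and PDB ids use the
// classic 4-character format (a digit followed by three letters or digits).
function classifyReference(ref) {
  if (/^BBa_[A-Za-z0-9_]+$/.test(ref)) return 'igem';
  if (/^[1-9][A-Za-z0-9]{3}$/.test(ref)) return 'pdb';
  if (/^[acgt]+$/.test(ref)) return 'dna'; // raw lowercase DNA sequence
  return 'unknown';
}

console.log(classifyReference('BBa_E0040')); // igem
console.log(classifyReference('6VXX'));      // pdb
console.log(classifyReference('acgt'));      // dna
```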

Codon Usage Optimization

Amino acid sequences can be optimized for the selected organism. The default organism is E. coli, but any organism that exists in the Codon Usage Database can be used.

You can search for available codon usage tables for your organism of choice by calling (this function is still a work in progress):

let availableOrganisms = await searchCodonUsageTable(query)

This function then returns an array with available species and their codon usage table ids. A codon usage table id can then be used to set the current organism for optimization:

await setCodonUsageTable(availableOrganisms[0].tableId)
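
To make the idea of codon usage optimization concrete, here is a minimal sketch of one simple strategy: for each amino acid, pick the codon with the highest weight in the organism's table. The article does not say which strategy the Synthetic.js compiler uses, so this is an assumption. backTranslate and toyCodonUsage are hypothetical names; the codon assignments follow the standard genetic code, but the weights are made up for illustration and are not real usage data.

```javascript
// Illustrative codon weights for a few amino acids. The codon assignments
// follow the standard genetic code; the weights are MADE UP for this sketch
// and are not real codon usage data for any organism.
const toyCodonUsage = {
  M: { atg: 1.0 },
  K: { aaa: 0.7, aag: 0.3 },
  G: { ggt: 0.4, ggc: 0.3, gga: 0.2, ggg: 0.1 },
  E: { gaa: 0.6, gag: 0.4 },
};

// Hypothetical helper: converts an amino acid string into DNA by choosing,
// for every amino acid, the codon with the highest weight in the table.
function backTranslate(aminoAcids, usage) {
  return [...aminoAcids]
    .map((aa) => {
      const codons = usage[aa];
      if (!codons) throw new Error(`No codon data for amino acid "${aa}"`);
      // Sort codons by descending weight and take the first one.
      return Object.entries(codons).sort((a, b) => b[1] - a[1])[0][0];
    })
    .join('');
}

console.log(backTranslate('MKGE', toyCodonUsage)); // atgaaaggtgaa
```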

Combinatorial Design

One of the most important features of the programming language is the ability to easily express sequence variants. Each genetic part can contain an array of different sequence options.

For example, here 3 sequence options are suggested for the promoter and 2 sequence options are suggested for the protein domain:

promoter(['sequence 1', 'sequence 2', 'sequence 3']) + rbs('…') + cds('…') + proteinDomain(['sequence A', 'sequence B'])

The compiler will then generate raw DNA sequences for 6 variants:

promoter('sequence 1') … proteinDomain('sequence A');
promoter('sequence 1') … proteinDomain('sequence B');
promoter('sequence 2') … proteinDomain('sequence A');
promoter('sequence 2') … proteinDomain('sequence B');
promoter('sequence 3') … proteinDomain('sequence A');
promoter('sequence 3') … proteinDomain('sequence B');

Any number of sequence options can be placed anywhere, so genetic libraries can be created with a few lines of code.
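
The expansion described above is a Cartesian product over the options of each part. A minimal sketch of that expansion follows, assuming each part is represented as an object with a type and a list of options; expandVariants and this representation are hypothetical and are not taken from the Synthetic.js source.

```javascript
// Hypothetical sketch: each part is { type, options }, where options lists
// the alternative sequences for that part.
function expandVariants(parts) {
  // Start with one empty variant and extend it with every option of each part.
  return parts.reduce(
    (variants, part) =>
      variants.flatMap((variant) =>
        part.options.map((option) => [...variant, { type: part.type, sequence: option }])
      ),
    [[]]
  );
}

const variants = expandVariants([
  { type: 'promoter', options: ['sequence 1', 'sequence 2', 'sequence 3'] },
  { type: 'rbs', options: ['…'] },
  { type: 'cds', options: ['…'] },
  { type: 'proteinDomain', options: ['sequence A', 'sequence B'] },
]);

console.log(variants.length); // 6
```

The number of variants is the product of the option counts (here 3 × 1 × 1 × 2), so libraries grow quickly as more parts receive multiple options.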

What’s Next

Improving debugger messages.
Adding support for restriction sites evaluation.
Developing an open source, web based editor.
