Synthetic.js — An Open Source Programming Language for Genetic Engineering

Preface

In Genetic Engineering, DNA sequences of ‘A’, ‘C’, ‘G’ and ‘T’ are combined to form a genetic program. This genetic program can then be manufactured chemically and injected into biological cells. The injected program is then executed by the cell machinery to create new molecules, logic and state.

Sequences can be categorized into different functional components, or different types of genetic LEGO parts (called BioBricks), that can be combined in accordance with a specific biological syntax.

A program to express a green fluorescent protein (GFP) can be created by combining 4 genetic parts:

  1. Promoter
    caatacgcaaaccgcctctccccgcgcgttggccgattcattaatgcagctggcacgacaggtttcccgactggaaagcgggcagtgagcgcaacgcaattaatgtgagttagctcactcattaggcaccccaggctttacactttatgcttccggctcgtatgttgtgtggaattgtgagcggataacaatttcacaca
  2. Ribosome Binding Site (RBS)
    aaagaggagaaa
  3. Protein Coding Sequence (CDS)
    atgcgtaaaggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttcggttatggtgttcaatgctttgcgagatacccagatcatatgaaacagcatgactttttcaagagtgccatgcccgaaggttatgtacaggaaagaactatatttttcaaagatgacgggaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaatagaatcgagttaaaaggtattgattttaaagaagatggaaacattcttggacacaaattggaatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatcaaagttaacttcaaaattagacacaacattgaagatggaagcgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacctgtccacacaatctgccctttcgaaagatcccaacgaaaagagagaccacatggtccttcttgagtttgtaacagctgctgggattacacatggcatggatgaactatacaaataataa
  4. Terminator
    ccaggcatcaaataaaacgaaaggctcagtcgaaagactgggcctttcgttttatctgttgtttgtcggtgaacgctctctactagagtcacactggctcaccttcgggtgggcctttctgcgtttata

The combined sequence:

caatacgcaaaccgcctctccccgcgcgttggccgattcattaatgcagctggcacgacaggtttcccgactggaaagcgggcagtgagcgcaacgcaattaatgtgagttagctcactcattaggcaccccaggctttacactttatgcttccggctcgtatgttgtgtggaattgtgagcggataacaatttcacacaaaagaggagaaaatgcgtaaaggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttcggttatggtgttcaatgctttgcgagatacccagatcatatgaaacagcatgactttttcaagagtgccatgcccgaaggttatgtacaggaaagaactatatttttcaaagatgacgggaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaatagaatcgagttaaaaggtattgattttaaagaagatggaaacattcttggacacaaattggaatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatcaaagttaacttcaaaattagacacaacattgaagatggaagcgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacctgtccacacaatctgccctttcgaaagatcccaacgaaaagagagaccacatggtccttcttgagtttgtaacagctgctgggattacacatggcatggatgaactatacaaataataaccaggcatcaaataaaacgaaaggctcagtcgaaagactgggcctttcgttttatctgttgtttgtcggtgaacgctctctactagagtcacactggctcaccttcgggtgggcctttctgcgtttata

Looking at the combined sequence, it’s clear that reading, editing and collaborating on such sequence combinations is not easy. And what if we want a much more complex system, with hundreds or thousands of different genetic circuits?

Synthetic Biology — Making Biology Easier To Engineer

In the chip design world, the level of abstraction for complex systems varies from assembling bare transistors, to describing logic gates, and finally to hardware description languages like Verilog or VHDL.

In the computer programming world, each program builds on the accumulated knowledge of tens of thousands of APIs and code bases that have been tested and verified by a community of developers. There is no need to reinvent the wheel every time, and it’s easy to create, collaborate on, and deploy complex architectures.

“Computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty.”

- Don Knuth

Synthetic Biology is a subset of Genetic Engineering whose goal is “Making Biology Easier To Engineer”. How can we apply the same concepts from the chip design and computer programming worlds to Genetic Engineering?

Synthetic.js — A High Level Programming Language for Genetic Engineering

The answer is very clear: a high level programming language for describing genetic circuits is mandatory to accelerate the development of new materials, medicine, crops, and much more.

In Synthetic.js, the example above is described in a single line of code:

promoter('BBa_R0010') + rbs('BBa_B0034') + cds('BBa_E0040') + terminator('BBa_B0015')

  • Combining genetic parts is done with the ‘+’ operator.
  • promoter(…), rbs(…), cds(…), terminator(…), etc. describe the type of genetic part that is being combined.
  • Genetic parts from the iGEM catalog can be referenced with their part id (‘BBa_B0034’, ‘BBa_B0015’, …).

Synthetic.js code is written on the left screen, in Visual Studio Code. The compiled DNA sequences can then be seen in the command line (right screen).
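
As a rough illustration of how the ‘+’ operator can combine parts in plain JavaScript, the sketch below has each part function return an object whose toString() yields its sequence, so ‘+’ falls back to string concatenation. The makePart helper and the object shape are assumptions made for this example, not the actual Synthetic.js internals, and raw sequences are used instead of part ids so that nothing has to be fetched.

```javascript
// Hypothetical stand-in for a genetic part; not the library's real implementation.
function makePart(type) {
  return (sequence) => ({
    type,
    sequence,
    // '+' converts objects to primitives. The default valueOf() returns the
    // object itself, so JavaScript falls back to toString().
    toString() {
      return this.sequence;
    },
  });
}

const promoter = makePart('promoter');
const rbs = makePart('rbs');

// The first 16 bases of the promoter shown in the preface, and the full RBS.
const circuit = promoter('caatacgcaaaccgcc') + rbs('aaagaggagaaa');

console.log(circuit); // caatacgcaaaccgccaaagaggagaaa
```

Keeping the part type on the object is one way a compiler could later check that parts appear in a valid order, although the result of ‘+’ here is already a plain string.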

Open Source, Runs in the Web Browser

We live in a time where there is no place for a programming language that is not distributed as Open Source or as Free Software (Free as in Free Speech, not Free Beer).

Furthermore, downloading and installing software belongs to the past: if it doesn’t run in the web browser, it’s not simple enough.

Synthetic.js was written as a framework on top of JavaScript. It can be used from the web browser without installing anything, or it can be downloaded from NPM or GitHub and run in any imaginable environment: web, mobile, desktop, or as a server app.

It is released under the permissive MIT license, so it can be integrated into any software without the need to expose the source code (as long as the copyright notice is distributed with it).

Easy Installation

  1. Install Node.js.
  2. Run “npm init” in the project directory.
  3. Run “npm i synthetic” (see npm).
  4. Create an app.js file with the following boilerplate code:

require('synthetic.js')

compile(
  'acgt'
)

  5. Run “node app” in the command line to view the compiled sequence.
  6. For more code examples, view the examples folder in the Synthetic GitHub page.

Simple Syntax, Adheres to iGEM Notations

In Synthetic.js, a genetic circuit is represented by a set of connected genetic parts. The part types are the same as those provided by the iGEM registry catalog of parts:

promoter()
rbs() — (Ribosome Binding Site)
cds() — (Protein Coding Sequence)
terminator()
proteinDomain()
translationalUnit()
dna()
composite()
proteinGenerator()
restrictionSite()
reporter()
inverter()
receiver()
sender()
measurement()
primerBindingSite()

Genetic parts can be combined with the ‘+’ operator to form a larger sequence:

promoter('agcg…') + rbs('aaactac…') + cds('atg…TGSNVFQTRAGCLIG…taataa') + terminator('aagcgaaat…')

  • DNA sequences are written with lowercase letters.
  • Amino acid sequences are written with uppercase letters.
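
The case convention makes it possible to tell DNA and amino acid runs apart inside a single string. The sketch below shows one way that could be done; splitByCase is a hypothetical helper written for this example, not a Synthetic.js function, and the input is a shortened, made-up sequence.

```javascript
// Hypothetical helper: split a mixed sequence into DNA and amino acid runs,
// using the convention that DNA is lowercase and amino acids are uppercase.
function splitByCase(sequence) {
  const runs = sequence.match(/[a-z]+|[A-Z]+/g) || [];
  return runs.map((run) => ({
    kind: run === run.toLowerCase() ? 'dna' : 'aminoAcids',
    sequence: run,
  }));
}

const segments = splitByCase('atgTGSNVFQtaataa');

console.log(segments);
// [
//   { kind: 'dna', sequence: 'atg' },
//   { kind: 'aminoAcids', sequence: 'TGSNVFQ' },
//   { kind: 'dna', sequence: 'taataa' }
// ]
```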

Direct Connection to Databases — It’s Really Part of the Language

Parts from the iGEM registry or from the RCSB Protein Data Bank can be referenced with their id. The compiler then fetches the sequences and performs codon usage optimization for the selected chassis.

For example:

promoter('BBa_R0010') + rbs('BBa_B0034') + cds('BBa_E0040') + terminator('BBa_B0015')

Or:

promoter('BBa_R0010') + rbs('BBa_B0034') + cds('6VXX') + terminator('BBa_B0015')
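
Since a part function can receive a registry id, a Protein Data Bank id, or a raw sequence, a compiler has to decide which one it was given. The sketch below shows one possible way to make that decision. The classifyReference function and its patterns are assumptions for illustration (iGEM ids starting with “BBa_”, classic PDB ids being four characters starting with a digit); the actual rules used by Synthetic.js may differ.

```javascript
// Hypothetical classifier; the patterns are assumptions, not the compiler's rules.
function classifyReference(value) {
  if (/^BBa_[A-Za-z0-9_]+$/.test(value)) return 'igem';
  if (/^[0-9][A-Za-z0-9]{3}$/.test(value)) return 'pdb';
  return 'sequence';
}

console.log(classifyReference('BBa_R0010')); // igem
console.log(classifyReference('6VXX')); // pdb
console.log(classifyReference('acgt')); // sequence
```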

Codon Usage Optimization

Amino acid sequences can be optimized for the selected organism. The default organism is E. coli, but any organism that exists in the Codon Usage Database can be used.

You can search for available codon usage tables for your organism of choice by calling the following function (still a work in progress):

let availableOrganisms = await searchCodonUsageTable(query)

This function returns an array of available species and their codon usage table ids. A codon usage table id can then be used to set the current organism for optimization:

await setCodonUsageTable(availableOrganisms[0].tableId)
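
To make the idea of codon usage optimization more concrete, here is a minimal sketch of one simple strategy: translating each amino acid back to its most frequent codon. This is not necessarily the strategy Synthetic.js uses. The exampleTable, mostFrequentCodon and backTranslate names are hypothetical, and the fractions for K are made-up illustration values rather than figures from the Codon Usage Database.

```javascript
// Hypothetical excerpt of a codon usage table: amino acid -> codon -> fraction.
// The fractions for K are made-up illustration values, not database figures.
// M and W each have a single codon in the standard genetic code.
const exampleTable = {
  M: { atg: 1.0 },
  W: { tgg: 1.0 },
  K: { aaa: 0.7, aag: 0.3 },
};

// Pick the codon with the highest fraction for one amino acid.
function mostFrequentCodon(usage) {
  return Object.entries(usage).reduce((best, entry) =>
    entry[1] > best[1] ? entry : best
  )[0];
}

// Replace every amino acid letter with its most frequent codon.
function backTranslate(aminoAcids, table) {
  return [...aminoAcids]
    .map((aa) => {
      if (!table[aa]) throw new Error(`No codon usage entry for ${aa}`);
      return mostFrequentCodon(table[aa]);
    })
    .join('');
}

console.log(backTranslate('MKW', exampleTable)); // atgaaatgg
```

Always choosing the single most frequent codon is the simplest possible rule; it is used here only because it is easy to follow.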

Combinatorial Design

One of the most important features of the programming language is the ability to easily express sequence variants. Each genetic part can contain an array of different sequence options.

For example, here 3 sequence options are suggested for the promoter and 2 sequence options are suggested for the protein domain:

promoter(['sequence 1', 'sequence 2', 'sequence 3']) + rbs('…') + cds('…') + proteinDomain(['sequence A', 'sequence B'])

The compiler will then generate raw DNA sequences for 6 variants:

promoter('sequence 1') … proteinDomain('sequence A');
promoter('sequence 1') … proteinDomain('sequence B');
promoter('sequence 2') … proteinDomain('sequence A');
promoter('sequence 2') … proteinDomain('sequence B');
promoter('sequence 3') … proteinDomain('sequence A');
promoter('sequence 3') … proteinDomain('sequence B');

Any number of sequence options can be placed anywhere, so genetic libraries can be created with a few lines of code.
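
The expansion into variants amounts to a Cartesian product over the option arrays. The sketch below illustrates that idea in plain JavaScript; expandVariants is a hypothetical function written for this example, the sequences are short placeholders, and the real compiler may work differently.

```javascript
// Hypothetical expansion step: each slot is one option or an array of options.
function expandVariants(slots) {
  return slots.reduce(
    (variants, slot) => {
      const options = Array.isArray(slot) ? slot : [slot];
      // Extend every variant built so far with every option of this slot.
      return variants.flatMap((prefix) =>
        options.map((option) => prefix + option)
      );
    },
    ['']
  );
}

const variants = expandVariants([
  ['aa', 'cc', 'gg'], // three promoter options (placeholder sequences)
  'tt', // a single fixed part
  ['ac', 'gt'], // two protein domain options
]);

console.log(variants.length); // 6
console.log(variants);
// [ 'aattac', 'aattgt', 'ccttac', 'ccttgt', 'ggttac', 'ggttgt' ]
```

The number of variants is the product of the option counts, so a library grows quickly as more slots receive arrays.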

What’s Next

  • Improving debugger messages.
  • Adding support for restriction site evaluation.
  • Developing an open source, web based editor.


CTO at Innoging Medical

Eliad Moshe
