2. Description of Inputs

coralME takes a total of 7 inputs, 2 required and 5 optional. Additionally, it takes 2 configuration files.

2.1. Types of inputs

2.1.1. Required

  1. Genome file (genome.gb)

  2. M-model (m_model.json or m_model.xml)

2.1.2. Optional

The optional files can be downloaded from an existing BioCyc database under Special SmartTables. If an optional file is not provided, coralME derives the corresponding information from genome.gb.

  1. Genes file, by default: genes.txt

  2. RNAs file, by default: RNAs.txt

  3. Proteins file, by default: proteins.txt

  4. TUs file, by default: TUs.txt

  5. Sequences file, by default: sequences.fasta

2.1.3. Configuration

  1. Paths file, by default: inputs.json

  2. Parameters file, by default: organism.json

2.2. Description

2.2.1. Genome (genome.gb)

2.2.1.1. Description

The genome file provides coralME with:

  • Gene annotations.

  • Gene sequences.

2.2.1.2. Requirements

  1. Locus tags (locus_tag or old_locus_tag) MUST be consistent with m_model.json. Make sure you download the same genome file that was used to reconstruct the M-model.

  2. Has name genome.gb.

  3. GenBank-compliant file. It must be readable by Biopython without errors.

  4. It must contain the entire genome sequence. Make sure to enable Customize View>Show Sequence before downloading the GenBank file from NCBI.

See an example of genome.gb and sequences.fasta
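
Requirements 1 and 4 above can be pre-checked before running coralME. The following is a minimal sketch, assuming the GenBank file has been read into a string; the function name precheck_genbank is hypothetical and not part of coralME. It only scans the text for locus tag qualifiers and for sequence lines after ORIGIN, so it does not replace a full parse with Biopython (for example with SeqIO.parse(path, "genbank")).

```python
import re


def precheck_genbank(text):
    """Scan GenBank text and return (locus_tags, has_sequence).

    Hypothetical helper: a plain-text pre-check, not a full GenBank parser.
    """
    # Qualifiers look like /locus_tag="b0001" or /old_locus_tag="ECK0001"
    tags = set(re.findall(r'/(?:old_)?locus_tag="([^"]+)"', text))
    # A record that ships its sequence has an ORIGIN line followed by
    # numbered sequence lines, the first of which starts at position 1
    has_sequence = re.search(
        r"^ORIGIN\b.*\n\s+1 [acgtnACGTN]", text, flags=re.MULTILINE
    ) is not None
    return tags, has_sequence
```

If has_sequence is False, the file was most likely downloaded without Show Sequence enabled. The returned tags can then be compared with the gene identifiers of the M-model.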

2.2.2. M-model (m_model.json)

2.2.2.1. Description

The M-model provides coralME with the metabolic model components:

  • Metabolic network (M-matrix)

  • Gene-protein-reaction associations

  • Environmental and internal constraints

  • Reaction subsystems

  • Biomass composition

2.2.2.2. Requirements

  1. Gene identifiers MUST be consistent with genome.gb locus_tag or old_locus_tag. Make sure you download the same genome file that was used to reconstruct the M-model.

  2. Has name m_model.json.

  3. COBRApy-compliant. It must be readable by cobrapy 0.25.0.

See an example of m_model.json
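
Requirement 1 can be checked by comparing the gene identifiers of the M-model with the locus tags found in genome.gb. The sketch below assumes the model is stored in COBRApy's JSON format, where genes are listed under a top-level "genes" key with an "id" field each; the function name missing_model_genes is hypothetical, and the locus tags are assumed to have been collected from the genome file beforehand.

```python
import json


def missing_model_genes(model_json_text, locus_tags):
    """Return M-model gene IDs that are absent from the genome's locus tags.

    Hypothetical helper; works on the JSON text of a COBRApy model.
    """
    model = json.loads(model_json_text)
    # COBRApy's JSON format stores genes as a list of objects with an "id"
    gene_ids = {gene["id"] for gene in model.get("genes", [])}
    return sorted(gene_ids - set(locus_tags))
```

An empty list suggests that the model and the genome use the same identifiers. A long list usually means the genome file is not the one used to reconstruct the M-model.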

2.2.3. Gene dictionary (genes.txt) [optional]

2.2.3.1. Description

genes.txt is a gene information table that can be downloaded from the All genes of organism SmartTable of the BioCyc database. Click Export>to Spreadsheet File>frame IDs. This file is optional and is meant to complement the information from genome.gb in case the latter is missing genes.

genes.txt provides coralME with:

  • Gene locus tags

  • Gene names

  • Gene annotations

  • Gene positions

  • Gene products (protein, tRNA, etc.)

2.2.3.2. Requirements

  1. Contains the index Gene Name and columns Accession-1, Left-End-Position, Right-End-Position, and Product.

  2. Accession-1 MUST be consistent with the gene IDs in the GPRs of m_model.json and with the locus_tag (or old_locus_tag) in genome.gb.

  3. Gene Name is consistent with:

    • Column Genes of polypeptide, complex, or RNA of proteins.txt

    • Column Gene of RNAs.txt

    • Column Genes of transcription unit of TUs.txt

    • Gene identifiers in sequences.fasta

  4. Product is consistent with:

    • Index of proteins.txt

    • Index of RNAs.txt

  5. Must be tab-separated

See an example of genes.txt

Note: Requirement 2 regarding ID consistency should be directly met if the files are downloaded from the correct BioCyc database.

Note: Requirements 3 and 4 regarding ID consistency, and requirement 5 regarding the file format, should be directly met if the files are downloaded from the same BioCyc database.
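
Requirements 1 and 5 can be verified by reading the header of the table. This is a minimal sketch using only the Python standard library; the names REQUIRED_GENE_COLUMNS and missing_columns are hypothetical and not part of coralME.

```python
import csv
import io

# Column names taken from requirement 1 of genes.txt
REQUIRED_GENE_COLUMNS = [
    "Gene Name",
    "Accession-1",
    "Left-End-Position",
    "Right-End-Position",
    "Product",
]


def missing_columns(table_text, required):
    """Return the required column names absent from a tab-separated header."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    # An empty file yields an empty header, so every column is reported
    header = next(reader, [])
    return [name for name in required if name not in header]
```

A file that is not tab-separated is read as a single column, so all required names are reported as missing. The same function can be reused for the other tables by passing their column names.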

2.2.4. Proteins (proteins.txt) [optional]

2.2.4.1. Description

proteins.txt is a protein complex information table that can be downloaded from the All proteins of organism SmartTable of the BioCyc database. Click Export>to Spreadsheet File>frame IDs. This file is optional and is meant to complement the information from genome.gb.

proteins.txt provides coralME with:

  • Protein complex compositions

2.2.4.2. Requirements

  1. Contains the index (Proteins Complexes) and columns Common-Name, Genes of polypeptide, complex, or RNA, and Locations.

  2. (Proteins Complexes) is consistent with:

    • Column Product of genes.txt

  3. Genes of polypeptide, complex, or RNA is consistent with:

    • Index Gene Name of genes.txt

  4. Must be tab-separated

See an example of proteins.txt

Note: Requirements 2 and 3 regarding ID consistency, and requirement 4 regarding the file format, should be directly met if the files are downloaded from the same BioCyc database.
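
Requirement 3 can be checked by cross-referencing the two tables. The sketch below is a minimal example under two assumptions: both files have been read into strings, and cells listing several genes separate them with " // " (verify this against your own export). The function names column_values and unknown_genes are hypothetical.

```python
import csv
import io


def column_values(table_text, column, separator=" // "):
    """Collect the distinct values of one column in a tab-separated table."""
    values = set()
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        cell = row.get(column) or ""
        # Split multi-valued cells (the separator is an assumption)
        # and drop empty entries
        values.update(
            part.strip() for part in cell.split(separator) if part.strip()
        )
    return values


def unknown_genes(proteins_text, genes_text):
    """Return genes referenced in proteins.txt but not indexed in genes.txt."""
    referenced = column_values(
        proteins_text, "Genes of polypeptide, complex, or RNA"
    )
    known = column_values(genes_text, "Gene Name")
    return sorted(referenced - known)
```

The same comparison applies to column Gene of RNAs.txt and column Genes of transcription unit of TUs.txt by changing the column name passed to column_values.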

2.2.5. RNAs (RNAs.txt) [optional]

2.2.5.1. Description

RNAs.txt is an RNA annotation table that can be downloaded from the All RNAs of organism SmartTable of the BioCyc database. Click Export>to Spreadsheet File>frame IDs. This file is optional and is meant to complement the information from genome.gb.

RNAs.txt provides coralME with:

  • Genes annotated as RNA products (e.g. tRNA, rRNA, etc.)

  • RNA gene annotations (e.g. amino acids - tRNA associations)

2.2.5.2. Requirements

  1. Contains the index (All-tRNAs Misc-RNAs rRNAs) and columns Common-Name and Gene.

  2. (All-tRNAs Misc-RNAs rRNAs) is consistent with:

    • Column Product of genes.txt

  3. Gene is consistent with:

    • Index Gene Name of genes.txt

  4. Must be tab-separated

See an example of RNAs.txt

Note: Requirements 2 and 3 regarding ID consistency, and requirement 4 regarding the file format, should be directly met if the files are downloaded from the same BioCyc database.

2.2.6. TUs (TUs.txt) [optional]

2.2.6.1. Description

TUs.txt is a transcription unit annotation table that can be downloaded from the All TUs of organism SmartTable of the BioCyc database. Click Export>to Spreadsheet File>frame IDs. This file is optional and is meant to complement the information from genome.gb.

TUs.txt provides coralME with:

  • Co-transcribed genes (operons).

  • Direction of transcription.

  • TU IDs.

2.2.6.2. Requirements

  1. Contains the index Transcription-Units and columns Genes of transcription unit and Direction.

  2. Genes of transcription unit is consistent with:

    • Index Gene Name of genes.txt

  3. Must be tab-separated

See an example of TUs.txt

Note: Requirement 2 regarding ID consistency, and requirement 3 regarding the file format, should be directly met if the files are downloaded from the same BioCyc database.

2.2.7. Gene sequences (sequences.fasta) [optional]

2.2.7.1. Description

sequences.fasta is a nucleotide FASTA file that can be downloaded from the All genes of organism SmartTable of the BioCyc database. Click Export>FASTA>Find sequences. This file is optional and is meant to complement the information from genome.gb in case the latter is missing genes.

sequences.fasta provides coralME with:

  • Gene sequences

2.2.7.2. Requirements

  1. Gene identifiers are consistent with:

    • Index Gene Name of genes.txt

  2. Must be a valid nucleotide FASTA file

See an example of sequences.fasta

Note: Requirements 1 and 2 should be directly met if the files are downloaded from the same BioCyc database.
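
Requirement 1 can be checked by listing the genes that have no sequence in the FASTA file. The sketch below is illustrative only: it assumes the gene identifier is the first whitespace-delimited token of each header line, which may not match the header layout of your BioCyc export, and the function names read_fasta and genes_without_sequence are hypothetical.

```python
def read_fasta(fasta_text):
    """Parse FASTA text into a dict of {identifier: sequence}."""
    records = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Assumption: the identifier is the first token of the header
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = ""
        elif line and current is not None:
            # Sequence lines are concatenated until the next header
            records[current] += line
    return records


def genes_without_sequence(gene_names, fasta_text):
    """Return the gene names that have no (or an empty) FASTA record."""
    records = read_fasta(fasta_text)
    return sorted(name for name in gene_names if not records.get(name))
```

Here gene_names would be the Gene Name index of genes.txt. If every gene is reported as missing, the header layout probably differs from the assumption above.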

2.2.8. Configuration of paths to files (inputs.json)

2.2.8.1. Description

inputs.json is a JSON file containing paths to input files for coralME.

inputs.json provides coralME with:

  • Paths to input files

2.2.8.2. Requirements

  1. Must be JSON-compliant

  2. Must contain paths to required files (M-model and Genome).

  3. All defined files must exist.

See an example of inputs.json
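
The three requirements above can be checked with a short script. This is a minimal sketch under stated assumptions: the key names in REQUIRED_KEYS are hypothetical placeholders (take the real ones from the example inputs.json), every non-empty string value is treated as a path, and relative paths are resolved against the folder containing the configuration file. The function name check_inputs is also hypothetical.

```python
import json
from pathlib import Path

# Hypothetical key names; replace them with those of the example inputs.json
REQUIRED_KEYS = ("m-model-path", "genbank-path")


def check_inputs(config_path, required_keys=REQUIRED_KEYS):
    """Return a list of problems found in a paths configuration file."""
    config_path = Path(config_path)
    try:
        config = json.loads(config_path.read_text())
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error}"]
    if not isinstance(config, dict):
        return ["top-level JSON value is not an object"]
    problems = [
        f"missing required key: {key}"
        for key in required_keys
        if key not in config
    ]
    for key, value in config.items():
        # Assumption: non-empty strings are paths, resolved relative to
        # the folder that contains the configuration file
        if isinstance(value, str) and value:
            if not (config_path.parent / value).exists():
                problems.append(f"file not found for {key}: {value}")
    return problems
```

An empty list means the file parsed as JSON, contains the required keys, and every listed path exists.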

2.2.9. Configuration of parameters (organism.json)

2.2.9.1. Description

organism.json is a JSON file containing the ME-modeling parameters for coralME.

organism.json provides coralME with:

  • ME-modeling parameters

2.2.9.2. Requirements

  1. Must be JSON-compliant

  2. Must contain the standard fields.

See an example of organism.json