{
"cells": [
{
"cell_type": "markdown",
"id": "dbfb1aa7-3761-404c-adc6-880fdb4c6305",
"metadata": {},
"source": [
"# Description of Inputs"
]
},
{
"cell_type": "markdown",
"id": "06516aeb-7c58-4d94-8def-580b08b00ee0",
"metadata": {},
"source": [
"coralME takes a total of 7 inputs, 2 required and 5 optional. Additionally, it takes 2 configuration files.\n",
"\n",
"## Types of inputs\n",
"\n",
"### Required\n",
"\n",
"1. __Genome file__ (**genome.gb**)\n",
"2. __M-model__ (**m_model.json** or **m_model.xml**)\n",
"\n",
"### Optional\n",
"\n",
"Downloadable from an existing **BioCyc** database under **Special SmartTables**. If an optional file is not provided, coralME fills in the missing information from **genome.gb**.\n",
"\n",
"3. __Genes file__, by default: **genes.txt**\n",
"4. __RNAs file__, by default: **RNAs.txt**\n",
"5. __Proteins file__, by default: **proteins.txt**\n",
"6. __TUs file__, by default: **TUs.txt**\n",
"7. __Sequences file__, by default: **sequences.fasta**\n",
"\n",
"### Configuration\n",
"8. __Paths file__, by default: **inputs.json**\n",
"9. __Parameters file__, by default: **organism.json**"
]
},
{
"cell_type": "markdown",
"id": "5afa35ec-e485-4135-ae58-4d4aae93c151",
"metadata": {},
"source": [
"## Description\n",
"### Genome (**genome.gb**)"
]
},
{
"cell_type": "markdown",
"id": "85e9f705-8137-4736-bfc4-7e49d3194a8b",
"metadata": {},
"source": [
"#### Description\n",
"\n",
"The genome file provides coralME with:\n",
"\n",
"* Gene annotations.\n",
"* Gene sequences."
]
},
{
"cell_type": "markdown",
"id": "e7e21506-dd98-4a4c-b0ee-ef594dbdce0d",
"metadata": {},
"source": [
"#### Requirements\n",
"\n",
"1. Locus tags (locus_tag or old_locus_tag) MUST be consistent with the gene identifiers in **m_model.json**. Make sure you download the same genome file that was used to reconstruct the M-model.\n",
"2. Must be named **genome.gb**.\n",
"3. Must be a GenBank-compliant file that Biopython can read correctly.\n",
"4. Must contain the entire genome sequence. Make sure to enable **Customize View**>**Show Sequence** before downloading the GenBank file from NCBI."
]
},
{
"cell_type": "markdown",
"id": "2f561f2e-628b-4b2b-8952-89077d35f59e",
"metadata": {},
"source": [
"See an example of [genome.gb](./helper_files/inputs/genome.gb) and [sequences.fasta](./helper_files/inputs/sequences.fasta)"
]
},
{
"cell_type": "markdown",
"id": "6d682750-457a-4ccb-81da-452cd35a2abf",
"metadata": {},
"source": [
"### M-model (**m_model.json**)"
]
},
{
"cell_type": "markdown",
"id": "9a65f985-9b1e-467c-aa4c-e5619bbc3dba",
"metadata": {},
"source": [
"#### Description\n",
"\n",
"The M-model provides coralME with the metabolic model components:\n",
"\n",
"* Metabolic network (M-matrix)\n",
"* Gene-protein-reaction associations\n",
"* Environmental and internal constraints\n",
"* Reaction subsystems\n",
"* Biomass composition"
]
},
{
"cell_type": "markdown",
"id": "c28fed0f-c8be-4ee6-a44e-bba3cb3e9ad4",
"metadata": {},
"source": [
"#### Requirements\n",
"\n",
"1. Gene identifiers MUST be consistent with the locus_tag or old_locus_tag entries in **genome.gb**. Make sure you download the same genome file that was used to reconstruct the M-model.\n",
"2. Must be named **m_model.json**.\n",
"3. Must be COBRApy-compliant, i.e. readable by cobrapy-0.25.0."
]
},
{
"cell_type": "markdown",
"id": "acbfb2b3-e80e-4564-965d-5d8305c7b189",
"metadata": {},
"source": [
"See an example of [m_model.json](./helper_files/inputs/m_model.json)"
]
},
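{
"cell_type": "markdown",
"id": "7c1e5a90-3b2d-4f68-9a41-d5e8f0b6c273",
"metadata": {},
"source": [
"The identifier consistency required between **genome.gb** and **m_model.json** can be checked before running coralME. The following is a minimal sketch, not part of coralME itself: it assumes Biopython and COBRApy are installed, and the helper names `genbank_locus_tags` and `missing_model_genes` are hypothetical.\n",
"\n",
"```python\n",
"import cobra\n",
"from Bio import SeqIO\n",
"\n",
"\n",
"def genbank_locus_tags(genbank_path):\n",
"    # Collect every locus_tag and old_locus_tag qualifier in the GenBank file.\n",
"    tags = set()\n",
"    for record in SeqIO.parse(genbank_path, 'genbank'):\n",
"        for feature in record.features:\n",
"            for key in ('locus_tag', 'old_locus_tag'):\n",
"                tags.update(feature.qualifiers.get(key, []))\n",
"    return tags\n",
"\n",
"\n",
"def missing_model_genes(model_path, genbank_path):\n",
"    # Return the M-model gene IDs that match no locus tag in the genome file.\n",
"    model = cobra.io.load_json_model(model_path)\n",
"    tags = genbank_locus_tags(genbank_path)\n",
"    return sorted(gene.id for gene in model.genes if gene.id not in tags)\n",
"\n",
"\n",
"# Hypothetical usage with the example files linked above:\n",
"# missing_model_genes('./helper_files/inputs/m_model.json', './helper_files/inputs/genome.gb')\n",
"```\n",
"\n",
"An empty list means that every gene in the M-model matches a locus tag in the genome file."
]
},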
{
"cell_type": "markdown",
"id": "bb9d6160-c642-4876-8f12-8c6e0fb60538",
"metadata": {},
"source": [
"### Gene dictionary (**genes.txt**) [optional]"
]
},
{
"cell_type": "markdown",
"id": "9ded3740-aaa4-41a6-bbb7-5a9409da7a31",
"metadata": {},
"source": [
"#### Description\n",
"\n",
"**genes.txt** is a gene information table that can be downloaded from the **All genes of organism SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **Export**>**to Spreadsheet File**>**frame IDs**. This file is optional and is meant to complement the information from **genome.gb** in case the latter is missing genes.\n",
"\n",
"**genes.txt** provides coralME with:\n",
"\n",
"* Gene locus tags\n",
"* Gene names\n",
"* Gene annotations\n",
"* Gene positions\n",
"* Gene products (protein, tRNA, etc.)"
]
},
{
"cell_type": "markdown",
"id": "0213c150-af6d-4f24-811a-64a93253ac35",
"metadata": {},
"source": [
"#### Requirements\n",
"\n",
"1. Contains the index **Gene Name** and columns **Accession-1**, **Left-End-Position**, **Right-End-Position**, and **Product**.\n",
"2. **Accession-1** MUST be consistent with the gene IDs in the GPRs of **m_model.json** and with the locus_tag (or old_locus_tag) in **genome.gb**.\n",
"3. **Gene Name** is consistent with:\n",
"\n",
" * Column **Genes of polypeptide, complex, or RNA** of **proteins.txt**\n",
" * Column **Gene** of **RNAs.txt** \n",
" * Column **Genes of transcription unit** of **TUs.txt**\n",
" * Gene identifiers in **sequences.fasta**\n",
"4. **Product** is consistent with:\n",
"\n",
" * Index of **proteins.txt**\n",
" * Index of **RNAs.txt**\n",
" \n",
"5. Must be tab-separated\n",
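"\n",
"Requirements 1 and 5 can be checked from the header row alone. The following is a minimal sketch using only the Python standard library; it is not coralME's own loader, and the function name `missing_columns` is hypothetical.\n",
"\n",
"```python\n",
"import csv\n",
"\n",
"REQUIRED_COLUMNS = ['Gene Name', 'Accession-1', 'Left-End-Position', 'Right-End-Position', 'Product']\n",
"\n",
"\n",
"def missing_columns(path, required=REQUIRED_COLUMNS):\n",
"    # Read only the first row, splitting it on tabs.\n",
"    with open(path, newline='') as handle:\n",
"        header = next(csv.reader(handle, delimiter='\\t'), [])\n",
"    return [name for name in required if name not in header]\n",
"```\n",
"\n",
"If the file is not tab-separated, the header is typically read as a single field and every required column is reported as missing.\n",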
"\n",
"See an example of [genes.txt](./helper_files/inputs/genes.txt)\n",
"\n",
"### Proteins file (**proteins.txt**) [optional]"
]
},
{
"cell_type": "markdown",
"id": "4fb432b9-91fc-44d0-bea3-ff0f4502a8a0",
"metadata": {},
"source": [
"#### Description\n",
"**proteins.txt** is a protein complex information table that can be downloaded from the **All proteins of organism SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **Export**>**to Spreadsheet File**>**frame IDs**. This file is optional and is meant to complement the information from **genome.gb**.\n",
"\n",
"**proteins.txt** provides coralME with:\n",
"* Protein complex compositions"
]
},
{
"cell_type": "markdown",
"id": "6aa1307f-7ad5-4d8e-b5db-3f7934173f36",
"metadata": {},
"source": [
"#### Requirements\n",
"\n",
"1. Contains the index **(Proteins Complexes)** and columns **Common-Name**, **Genes of polypeptide, complex, or RNA**, and **Locations**.\n",
"2. **(Proteins Complexes)** is consistent with:\n",
" * Column **Product** of **genes.txt**\n",
"3. **Genes of polypeptide, complex, or RNA** is consistent with:\n",
" * Index **Gene Name** of **genes.txt**\n",
"4. Must be tab-separated\n",
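"\n",
"The cross-file consistency in requirement 3 can be checked by comparing the gene names used in **proteins.txt** against the **Gene Name** index of **genes.txt**. The following is a minimal sketch; the function names are hypothetical, and the ` // ` separator for cells that list several genes is an assumption about the BioCyc export that should be verified against your own file.\n",
"\n",
"```python\n",
"import csv\n",
"\n",
"\n",
"def column_values(path, column, sep=None):\n",
"    # Collect the values of one column; split multi-valued cells when sep is given.\n",
"    values = set()\n",
"    with open(path, newline='') as handle:\n",
"        for row in csv.DictReader(handle, delimiter='\\t'):\n",
"            cell = (row.get(column) or '').strip()\n",
"            parts = cell.split(sep) if sep else [cell]\n",
"            values.update(part.strip() for part in parts if part.strip())\n",
"    return values\n",
"\n",
"\n",
"def unknown_genes(genes_path, proteins_path):\n",
"    # Gene names referenced by proteins.txt that are absent from genes.txt.\n",
"    known = column_values(genes_path, 'Gene Name')\n",
"    # Assumption: several genes in one cell are separated by ' // '.\n",
"    used = column_values(proteins_path, 'Genes of polypeptide, complex, or RNA', sep=' // ')\n",
"    return sorted(used - known)\n",
"```\n",
"\n",
"The same comparison applies to the **Gene** column of **RNAs.txt** and the **Genes of transcription unit** column of **TUs.txt**.\n",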
"\n",
"See an example of [proteins.txt](./helper_files/inputs/proteins.txt)\n",
"\n",
"### RNAs file (**RNAs.txt**) [optional]"
]
},
{
"cell_type": "markdown",
"id": "0de2b032-94fb-4e5e-98d6-bd235b349627",
"metadata": {},
"source": [
"#### Description\n",
"**RNAs.txt** is an RNA annotation table that can be downloaded from the **All RNAs of organism SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **Export**>**to Spreadsheet File**>**frame IDs**. This file is optional and is meant to complement the information from **genome.gb**.\n",
"\n",
"**RNAs.txt** provides coralME with:\n",
"\n",
"* Genes annotated as RNA products (e.g. tRNA, rRNA, etc.)\n",
"* RNA gene annotations (e.g. amino acids - tRNA associations)"
]
},
{
"cell_type": "markdown",
"id": "82313222-2621-463c-8bb9-34e407f0aac2",
"metadata": {},
"source": [
"#### Requirements\n",
"\n",
"1. Contains the index **(All-tRNAs Misc-RNAs rRNAs)** and columns **Common-Name** and **Gene**\n",
"2. **(All-tRNAs Misc-RNAs rRNAs)** is consistent with:\n",
"\n",
" * Column **Product** of **genes.txt**\n",
"3. **Gene** is consistent with:\n",
"\n",
" * Index **Gene Name** of **genes.txt**\n",
"4. Must be tab-separated\n",
" \n",
"See an example of [RNAs.txt](./helper_files/inputs/RNAs.txt)\n",
"\n",
"### TUs file (**TUs.txt**) [optional]"
]
},
{
"cell_type": "markdown",
"id": "5d1ea005-9fc3-4dbf-bf9b-200100dd23a9",
"metadata": {},
"source": [
"#### Description\n",
"\n",
"**TUs.txt** is a transcription unit annotation table that can be downloaded from the **All TUs of organism SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **Export**>**to Spreadsheet File**>**frame IDs**. This file is optional and is meant to complement the information from **genome.gb**.\n",
"\n",
"**TUs.txt** provides coralME with:\n",
"\n",
"* Co-transcribed genes (operons).\n",
"* Direction of transcription.\n",
"* TU IDs."
]
},
{
"cell_type": "markdown",
"id": "73c53359-c89b-4f43-84b3-0f90a7d672cb",
"metadata": {},
"source": [
"#### Requirements\n",
"\n",
"1. Contains the index **Transcription-Units** and columns **Genes of transcription unit** and **Direction**\n",
"2. **Genes of transcription unit** is consistent with:\n",
"\n",
" * Index **Gene Name** of **genes.txt**\n",
"3. Must be tab-separated\n",
" \n",
"See an example of [TUs.txt](./helper_files/inputs/TUs.txt)\n",
"\n",
"### Sequences file (**sequences.fasta**) [optional]"
]
},
{
"cell_type": "markdown",
"id": "72361c99-fa06-4702-81be-3893e94df157",
"metadata": {},
"source": [
"#### Description\n",
"**sequences.fasta** is a nucleotide FASTA file that can be downloaded from the **All genes of organism SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **Export**>**FASTA**>**Find sequences**. This file is optional and is meant to complement the information from **genome.gb** in case the latter is missing genes.\n",
"\n",
"**sequences.fasta** provides coralME with:\n",
"\n",
"* Gene sequences"
]
},
{
"cell_type": "markdown",
"id": "3b0e09a0-11c6-4bd6-ab5b-9462c254d9f6",
"metadata": {},
"source": [
"#### Requirements\n",
"\n",
"1. Gene identifiers are consistent with:\n",
"\n",
" * Index **Gene Name** of **genes.txt** \n",
"2. Must be a valid nucleotide FASTA file\n",
" \n",
"See an example of [sequences.fasta](./helper_files/inputs/sequences.fasta)\n",
"\n",
"### Paths file (**inputs.json**)"
]
},
{
"cell_type": "markdown",
"id": "4a966960-a195-4752-8bc6-0ac4dbd6ec29",
"metadata": {},
"source": [
"#### Description\n",
"**inputs.json** is a JSON file containing paths to input files for coralME.\n",
"\n",
"**inputs.json** provides coralME with:\n",
"\n",
"* Paths to input files"
]
},
{
"cell_type": "markdown",
"id": "670188ec-e3aa-4ab4-9875-953c3181f04a",
"metadata": {},
"source": [
"#### Requirements\n",
"\n",
"1. Must be JSON-compliant\n",
"2. Must contain paths to required files (M-model and Genome).\n",
"3. All defined files must exist.\n",
"\n",
"See an example of [input.json](./helper_files/input.json)"
]
},
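{
"cell_type": "markdown",
"id": "b7e2c9d4-5f1a-4e6b-9c3d-2a8f7e1b0c64",
"metadata": {},
"source": [
"The three requirements above can be verified with the Python standard library. The following is a minimal sketch, not coralME's own validation: the function name `check_paths_file` is hypothetical, the names of the required entries are supplied by the caller because they are not listed here, and every string value is assumed to be a file path resolved relative to the working directory.\n",
"\n",
"```python\n",
"import json\n",
"import os\n",
"\n",
"\n",
"def check_paths_file(path, required_keys):\n",
"    # json.load raises json.JSONDecodeError if the file is not valid JSON.\n",
"    with open(path) as handle:\n",
"        config = json.load(handle)\n",
"    if not isinstance(config, dict):\n",
"        return ['top-level JSON value is not an object']\n",
"    problems = ['missing required entry: ' + key for key in required_keys if key not in config]\n",
"    for key, value in config.items():\n",
"        # Assumption of this sketch: every non-empty string value is a file path.\n",
"        if isinstance(value, str) and value and not os.path.exists(value):\n",
"            problems.append('file not found for ' + key + ': ' + value)\n",
"    return problems\n",
"```\n",
"\n",
"An empty list means the file passed all three checks."
]
},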
{
"cell_type": "markdown",
"id": "a5c00264-195c-4c3d-9c87-3e4521b60cd6",
"metadata": {},
"source": [
"### Configuration of parameters (**organism.json**)"
]
},
{
"cell_type": "markdown",
"id": "925659dd-87b8-4aa3-bb21-3529c8058ade",
"metadata": {},
"source": [
"#### Description\n",
"\n",
"**organism.json** is a JSON file containing the ME-modeling parameters for coralME.\n",
"\n",
"**organism.json** provides coralME with:\n",
"\n",
"* ME-modeling parameters"
]
},
{
"cell_type": "markdown",
"id": "f9a9ef86-30e2-4263-8048-813f4416988e",
"metadata": {},
"source": [
"#### Requirements\n",
"\n",
"1. Must be JSON-compliant\n",
"2. Must contain the standard fields.\n",
"\n",
"See an example of [organism.json](./helper_files/organism.json)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "coralme-1.1.5",
"language": "python",
"name": "coralme-1.1.5"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}