{ "cells": [ { "cell_type": "markdown", "id": "dbfb1aa7-3761-404c-adc6-880fdb4c6305", "metadata": {}, "source": [ "# Description of Inputs" ] }, { "cell_type": "markdown", "id": "06516aeb-7c58-4d94-8def-580b08b00ee0", "metadata": {}, "source": [ "coralME takes seven inputs (two required and five optional) plus two configuration files.\n", "\n", "## Types of inputs\n", "\n", "### Required\n", "\n", "1. __Genome file__ (**genome.gb**)\n", "2. __M-model__ (**m_model.json** or **m_model.xml**)\n", "\n", "### Optional\n", "\n", "These files can be downloaded from an existing **BioCyc** database under **Special SmartTables**. If no optional files are provided, coralME obtains the corresponding information from **genome.gb** alone.\n", "\n", "3. __Genes file__, by default: **genes.txt**\n", "4. __RNAs file__, by default: **RNAs.txt**\n", "5. __Proteins file__, by default: **proteins.txt**\n", "6. __TUs file__, by default: **TUs.txt**\n", "7. __Sequences file__, by default: **sequences.fasta**\n", "\n", "### Configuration\n", "\n", "8. __Paths file__, by default: **inputs.json**\n", "9. __Parameters file__, by default: **organism.json**" ] }, { "cell_type": "markdown", "id": "4c77ad67-e6d9-4803-b9b7-7fb6bfa3d5c5", "metadata": {}, "source": [ ] }, { "cell_type": "markdown", "id": "5afa35ec-e485-4135-ae58-4d4aae93c151", "metadata": {}, "source": [ "## Description\n", "### Genome (**genome.gb**)" ] }, { "cell_type": "markdown", "id": "85e9f705-8137-4736-bfc4-7e49d3194a8b", "metadata": {}, "source": [ "#### Description\n", "\n", "The genome file provides coralME with:\n", "\n", "* Gene annotations.\n", "* Gene sequences." ] }, { "cell_type": "markdown", "id": "e7e21506-dd98-4a4c-b0ee-ef594dbdce0d", "metadata": {}, "source": [ "#### Requirements\n", "\n", "1. Locus tags (locus_tag or old_locus_tag) MUST be consistent with **m_model.json**. Make sure you download the same genome file that was used to reconstruct the M-model.\n", "2. Named **genome.gb**.\n", "3. GenBank-compliant, i.e., it can be parsed correctly by BioPython.\n", "4. It must contain the entire genome sequence. Make sure to enable **Customize View**>**Show Sequence** before downloading the GenBank file from NCBI." ] }, { "cell_type": "markdown", "id": "2f561f2e-628b-4b2b-8952-89077d35f59e", "metadata": {}, "source": [ "See an example of [genome.gb](./helper_files/inputs/genome.gb) and [sequences.fasta](./helper_files/inputs/sequences.fasta)" ] }, { "cell_type": "markdown", "id": "6d682750-457a-4ccb-81da-452cd35a2abf", "metadata": {}, "source": [ "### M-model (**m_model.json**)" ] }, { "cell_type": "markdown", "id": "9a65f985-9b1e-467c-aa4c-e5619bbc3dba", "metadata": {}, "source": [ "#### Description\n", "\n", "The M-model provides coralME with the metabolic model components:\n", "\n", "* Metabolic network (M-matrix)\n", "* Gene-protein-reaction associations\n", "* Environmental and internal constraints\n", "* Reaction subsystems\n", "* Biomass composition" ] }, { "cell_type": "markdown", "id": "c28fed0f-c8be-4ee6-a44e-bba3cb3e9ad4", "metadata": {}, "source": [ "#### Requirements\n", "\n", "1. Gene identifiers MUST be consistent with the locus_tag or old_locus_tag qualifiers in **genome.gb**. Make sure you download the same genome file that was used to reconstruct the M-model.\n", "2. Named **m_model.json**.\n", "3. COBRApy-compliant, i.e., it can be loaded by cobrapy-0.25.0." ] }, { "cell_type": "markdown", "id": "acbfb2b3-e80e-4564-965d-5d8305c7b189", "metadata": {}, "source": [ "See an example of [m_model.json](./helper_files/inputs/m_model.json)" ] }, { "cell_type": "markdown", "id": "bb9d6160-c642-4876-8f12-8c6e0fb60538", "metadata": {}, "source": [ "### Gene dictionary (**genes.txt**) [optional]" ] }, { "cell_type": "markdown", "id": "9ded3740-aaa4-41a6-bbb7-5a9409da7a31", "metadata": {}, "source": [ "#### Description\n", "\n", "**genes.txt** is a gene information table that can be downloaded from the **All genes of organism SmartTable** of the **[BioCyc](https://biocyc.org/)** database. 
Click **Export**>**to Spreadsheet File**>**frame IDs**. This file is optional and is meant to complement the information from **genome.gb** in case the latter is missing genes.\n", "\n", "**genes.txt** provides coralME with:\n", "\n", "* Gene locus tags\n", "* Gene names\n", "* Gene annotations\n", "* Gene positions\n", "* Gene products (protein, tRNA, etc.)" ] }, { "cell_type": "markdown", "id": "0213c150-af6d-4f24-811a-64a93253ac35", "metadata": {}, "source": [ "#### Requirements\n", "\n", "1. Contains the index **Gene Name** and columns **Accession-1**, **Left-End-Position**, **Right-End-Position**, and **Product**.\n", "2. **Accession-1** MUST be consistent with the gene IDs in the GPRs of **m_model.json** and with the locus_tag (or old_locus_tag) in **genome.gb**.\n", "3. **Gene Name** is consistent with:\n", "\n", " * Column **Genes of polypeptide, complex, or RNA** of **proteins.txt**\n", " * Column **Gene** of **RNAs.txt** \n", " * Column **Genes of transcription unit** of **TUs.txt**\n", " * Gene identifiers in **sequences.fasta**\n", "4. **Product** is consistent with:\n", "\n", " * Index of **proteins.txt**\n", " * Index of **RNAs.txt**\n", " \n", "5. Must be tab-separated\n", "\n", "See an example of [genes.txt](./helper_files/inputs/genes.txt)\n", "\n", "
\n", "**Note:** **Requirement 2** (ID consistency) should be met automatically if the files are downloaded from the correct BioCyc database, i.e., the one that corresponds to **genome.gb**.\n", "
\n", "\n", "
\n", "**Note:** **Requirements 3, 4 and 5** should be met automatically if the files are downloaded from the same BioCyc database.\n", "
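You can check the genes.txt requirements above with a short script before running coralME. The sketch below uses only the Python standard library and is not part of coralME.

- The helper `check_genes_table` is hypothetical.
- It assumes the M-model gene IDs are already collected in a set, for example with `{g.id for g in model.genes}` after loading **m_model.json** with `cobra.io.load_json_model('m_model.json')`.

```python
import csv
from typing import Iterable, List, Set

# Columns named in requirement 1; Gene Name is the index (first column).
REQUIRED_COLUMNS = ['Gene Name', 'Accession-1', 'Left-End-Position',
                    'Right-End-Position', 'Product']


def check_genes_table(lines: Iterable[str], model_gene_ids: Set[str]) -> List[str]:
    # Hypothetical helper: parse tab-separated genes.txt lines and list problems.
    reader = csv.DictReader(lines, delimiter='\t')
    header = reader.fieldnames or []
    problems = [f'missing column: {col}' for col in REQUIRED_COLUMNS
                if col not in header]
    if problems:
        return problems
    # Collect the locus tags stored in Accession-1 (empty cells are skipped).
    accessions = {row['Accession-1'].strip()
                  for row in reader if row['Accession-1']}
    # Requirement 2: every gene ID used in the M-model GPRs should appear
    # among the Accession-1 values.
    for gene_id in sorted(model_gene_ids - accessions):
        problems.append(f'M-model gene not found in Accession-1: {gene_id}')
    return problems
```

To use it, pass an open handle such as `open('genes.txt', newline='')`. An empty list means no problems were found for requirements 1 and 2. The check only asks that M-model genes be present, because genes.txt normally lists many genes that are absent from the M-model.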
\n", "\n", "\n", " " ] }, { "cell_type": "markdown", "id": "ddfee514-b485-48b6-8fce-a9bdbbda86fe", "metadata": {}, "source": [ "### Proteins (**proteins.txt**) [optional]" ] }, { "cell_type": "markdown", "id": "4fb432b9-91fc-44d0-bea3-ff0f4502a8a0", "metadata": {}, "source": [ "#### Description\n", "**proteins.txt** is a protein complex information table that can be downloaded from the **All proteins of organism SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **Export**>**to Spreadsheet File**>**frame IDs**. This file is optional and is meant to complement the information from **genome.gb**.\n", "\n", "**proteins.txt** provides coralME with:\n", "* Protein complex compositions" ] }, { "cell_type": "markdown", "id": "6aa1307f-7ad5-4d8e-b5db-3f7934173f36", "metadata": {}, "source": [ "#### Requirements\n", "\n", "1. Contains the index **(Proteins Complexes)** and columns **Common-Name**, **Genes of polypeptide, complex, or RNA**, and **Locations**.\n", "2. **(Proteins Complexes)** is consistent with:\n", " * Column **Product** of **genes.txt**\n", "3. **Genes of polypeptide, complex, or RNA** is consistent with:\n", " * Index **Gene Name** of **genes.txt**\n", "4. Must be tab-separated\n", "\n", "See an example of [proteins.txt](./helper_files/inputs/proteins.txt)\n", "\n", "
\n", "**Note:** **Requirements 2, 3 and 4** should be met automatically if the files are downloaded from the same BioCyc database.\n", "
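You can verify requirement 3 in a similar way. This is a sketch with two unconfirmed assumptions, and the helper names are hypothetical:

- BioCyc separates multiple genes in a single cell with ` // `.
- The first column of proteins.txt holds the protein or complex IDs.

```python
import csv
from typing import Dict, Iterable, List, Optional, Set

# Assumption: BioCyc exports separate multiple values in one cell with ' // '.
SEPARATOR = ' // '
GENES_COLUMN = 'Genes of polypeptide, complex, or RNA'


def split_multi(value: Optional[str]) -> List[str]:
    # Split a multi-valued cell and drop empty entries.
    return [v.strip() for v in (value or '').split(SEPARATOR) if v.strip()]


def unknown_complex_genes(lines: Iterable[str],
                          gene_names: Set[str]) -> Dict[str, List[str]]:
    # Hypothetical helper: map each protein/complex ID (first column) to the
    # genes it lists that are absent from the Gene Name index of genes.txt.
    reader = csv.DictReader(lines, delimiter='\t')
    index_col = (reader.fieldnames or [''])[0]
    missing: Dict[str, List[str]] = {}
    for row in reader:
        absent = [g for g in split_multi(row.get(GENES_COLUMN))
                  if g not in gene_names]
        if absent:
            missing[row[index_col]] = absent
    return missing
```

An empty dictionary means every gene referenced in proteins.txt is indexed in genes.txt.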
\n", "\n" ] }, { "cell_type": "markdown", "id": "37d1e491-259c-4add-b863-e67ff301033c", "metadata": {}, "source": [ "### RNAs (**RNAs.txt**) [optional]" ] }, { "cell_type": "markdown", "id": "0de2b032-94fb-4e5e-98d6-bd235b349627", "metadata": {}, "source": [ "#### Description\n", "**RNAs.txt** is an RNA annotation table that can be downloaded from the **All RNAs of organism SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **Export**>**to Spreadsheet File**>**frame IDs**. This file is optional and is meant to complement the information from **genome.gb**.\n", "\n", "**RNAs.txt** provides coralME with:\n", "\n", "* Genes annotated as RNA products (e.g., tRNA, rRNA)\n", "* RNA gene annotations (e.g., which amino acid each tRNA carries)" ] }, { "cell_type": "markdown", "id": "82313222-2621-463c-8bb9-34e407f0aac2", "metadata": {}, "source": [ "#### Requirements\n", "\n", "1. Contains the index **(All-tRNAs Misc-RNAs rRNAs)** and columns **Common-Name** and **Gene**\n", "2. **(All-tRNAs Misc-RNAs rRNAs)** is consistent with:\n", "\n", " * Column **Product** of **genes.txt**\n", "3. **Gene** is consistent with:\n", "\n", " * Index **Gene Name** of **genes.txt**\n", "4. Must be tab-separated\n", " \n", "See an example of [RNAs.txt](./helper_files/inputs/RNAs.txt)\n", "\n", "
\n", "**Note:** **Requirements 2, 3 and 4** should be met automatically if the files are downloaded from the same BioCyc database.\n", "
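Requirement 2 links the two tables in the opposite direction: every RNA ID should appear as a **Product** in genes.txt. This sketch assumes the following, and the helper name is hypothetical:

- Multi-valued cells are separated by ` // `.
- The first column of RNAs.txt holds the RNA IDs.

```python
import csv
from typing import Iterable, List

# Assumption: BioCyc exports separate multiple values in one cell with ' // '.
SEPARATOR = ' // '


def rnas_missing_from_products(rna_lines: Iterable[str],
                               gene_lines: Iterable[str]) -> List[str]:
    # Hypothetical helper for requirement 2: return the RNA IDs (first column
    # of RNAs.txt) that never appear in the Product column of genes.txt.
    products = set()
    for row in csv.DictReader(gene_lines, delimiter='\t'):
        cell = row.get('Product') or ''
        products.update(p.strip() for p in cell.split(SEPARATOR) if p.strip())
    reader = csv.DictReader(rna_lines, delimiter='\t')
    index_col = (reader.fieldnames or [''])[0]
    return sorted(row[index_col] for row in reader
                  if row[index_col] not in products)
```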
" ] }, { "cell_type": "markdown", "id": "b734e85f-de85-463e-85d7-5d094a97274d", "metadata": {}, "source": [ "### TUs (**TUs.txt**) [optional]" ] }, { "cell_type": "markdown", "id": "5d1ea005-9fc3-4dbf-bf9b-200100dd23a9", "metadata": {}, "source": [ "#### Description\n", "\n", "**TUs.txt** is a transcription unit annotation table that can be downloaded from the **All TUs of organism SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **Export**>**to Spreadsheet File**>**frame IDs**. This file is optional and is meant to complement the information from **genome.gb**.\n", "\n", "**TUs.txt** provides coralME with:\n", "\n", "* Co-transcribed genes (operons).\n", "* Direction of transcription.\n", "* TU IDs." ] }, { "cell_type": "markdown", "id": "73c53359-c89b-4f43-84b3-0f90a7d672cb", "metadata": {}, "source": [ "#### Requirements\n", "\n", "1. Contains the index **Transcription-Units** and columns **Genes of transcription unit** and **Direction**\n", "2. **Genes of transcription unit** is consistent with:\n", "\n", " * Index **Gene Name** of **genes.txt**\n", "3. Must be tab-separated\n", " \n", "See an example of [TUs.txt](./helper_files/inputs/TUs.txt)\n", " \n", "
\n", "**Note:** **Requirements 2 and 3** should be met automatically if the files are downloaded from the same BioCyc database.\n", "
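To see how a TU table encodes operons, the sketch below groups each TU's co-transcribed genes and checks requirement 2. It makes these assumptions, and the helper names are hypothetical:

- The first column holds the TU IDs.
- Genes within a cell are separated by ` // `.
- **Direction** is kept as an opaque string, because its exact values are not specified here.

```python
import csv
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: BioCyc exports separate multiple values in one cell with ' // '.
SEPARATOR = ' // '


def read_tus(lines: Iterable[str]) -> Dict[str, Tuple[str, List[str]]]:
    # Hypothetical helper: map each TU ID (first column) to its Direction
    # value and the ordered list of genes it co-transcribes.
    reader = csv.DictReader(lines, delimiter='\t')
    index_col = (reader.fieldnames or [''])[0]
    tus: Dict[str, Tuple[str, List[str]]] = {}
    for row in reader:
        cell = row.get('Genes of transcription unit') or ''
        genes = [g.strip() for g in cell.split(SEPARATOR) if g.strip()]
        tus[row[index_col]] = ((row.get('Direction') or '').strip(), genes)
    return tus


def genes_outside_index(tus: Dict[str, Tuple[str, List[str]]],
                        gene_names: Set[str]) -> Set[str]:
    # Requirement 2: every TU gene should be a Gene Name in genes.txt.
    return {g for _, genes in tus.values() for g in genes if g not in gene_names}
```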
" ] }, { "cell_type": "markdown", "id": "b72b58d0-d3a6-4325-bfdd-2e3c49beb65a", "metadata": {}, "source": [ "### Gene sequences (**sequences.fasta**) [optional]" ] }, { "cell_type": "markdown", "id": "72361c99-fa06-4702-81be-3893e94df157", "metadata": {}, "source": [ "#### Description\n", "**sequences.fasta** is a nucleotide FASTA file that can be downloaded from the **All genes of organism SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **Export**>**FASTA**>**Find sequences**. This file is optional and is meant to complement the information from **genome.gb** in case the latter is missing genes.\n", "\n", "**sequences.fasta** provides coralME with:\n", "\n", "* Gene sequences" ] }, { "cell_type": "markdown", "id": "3b0e09a0-11c6-4bd6-ab5b-9462c254d9f6", "metadata": {}, "source": [ "#### Requirements\n", "\n", "1. Gene identifiers are consistent with:\n", "\n", " * Index **Gene Name** of **genes.txt**\n", "2. Must be in FASTA format\n", " \n", "See an example of [sequences.fasta](./helper_files/inputs/sequences.fasta)\n", " \n", "
\n", "**Note:** **Requirements 1 and 2** should be met automatically if the files are downloaded from the same BioCyc database.\n", "
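The identifiers in sequences.fasta can be compared with the **Gene Name** index using a minimal, standard-library FASTA reader. This is a sketch under two stated assumptions:

- The record ID is the first whitespace-delimited token after `>`.
- The helper names are hypothetical.

BioPython's `Bio.SeqIO.parse(handle, 'fasta')` is a more complete alternative.

```python
from typing import Dict, Iterable, List, Optional, Set


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    # Minimal FASTA reader: map each record ID to its concatenated sequence.
    # Assumption: the ID is the first whitespace-delimited token after '>'.
    chunks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith('>'):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ''
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {rid: ''.join(parts) for rid, parts in chunks.items()}


def fasta_ids_outside_index(records: Dict[str, str],
                            gene_names: Set[str]) -> Set[str]:
    # Requirement 1: FASTA identifiers should match Gene Name values in genes.txt.
    return set(records) - gene_names
```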
" ] }, { "cell_type": "markdown", "id": "ca6c798f-08e1-4a02-a5b5-be62afd5dfb7", "metadata": {}, "source": [ "### Configuration of paths to files (**inputs.json**)" ] }, { "cell_type": "markdown", "id": "4a966960-a195-4752-8bc6-0ac4dbd6ec29", "metadata": {}, "source": [ "#### Description\n", "**inputs.json** is a JSON file containing paths to input files for coralME.\n", "\n", "**inputs.json** provides coralME with:\n", "\n", "* Paths to input files" ] }, { "cell_type": "markdown", "id": "670188ec-e3aa-4ab4-9875-953c3181f04a", "metadata": {}, "source": [ "#### Requirements\n", "\n", "1. Must be JSON-compliant\n", "2. Must contain paths to required files (M-model and Genome).\n", "3. All defined files must exist.\n", "\n", "See an example of [input.json](./helper_files/input.json)" ] }, { "cell_type": "markdown", "id": "a5c00264-195c-4c3d-9c87-3e4521b60cd6", "metadata": {}, "source": [ "### Configuration of parameters (**organism.json**)" ] }, { "cell_type": "markdown", "id": "925659dd-87b8-4aa3-bb21-3529c8058ade", "metadata": {}, "source": [ "#### Description\n", "\n", "**organism.json** is a JSON file containing the ME-modeling parameters for coralME.\n", "\n", "**organism.json** provides coralME with:\n", "\n", "* ME-modeling parameters" ] }, { "cell_type": "markdown", "id": "f9a9ef86-30e2-4263-8048-813f4416988e", "metadata": {}, "source": [ "#### Requirements\n", "\n", "1. Must be JSON-compliant\n", "2. Must contain the standard fields.\n", "\n", "See an example of [organism.json](./helper_files/organism.json)" ] } ], "metadata": { "kernelspec": { "display_name": "coralme-1.1.5", "language": "python", "name": "coralme-1.1.5" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }