{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "dbfb1aa7-3761-404c-adc6-880fdb4c6305",
   "metadata": {},
   "source": [
    "# Description of Inputs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06516aeb-7c58-4d94-8def-580b08b00ee0",
   "metadata": {},
   "source": [
    "coralME takes a total of 7 inputs, 2 required and 5 optional. Additionally, it takes 2 configuration files.\n",
    "\n",
    "## Types of inputs\n",
    "\n",
    "### Required\n",
    "\n",
    "1. __Genome file__ (**<code>genome.gb</code>**)\n",
    "2. __M-model__ (**<code>m_model.json</code>** or **<code>m_model.xml</code>**)\n",
    "\n",
    "### Optional\n",
    "\n",
    "Downloadable from an existing **BioCyc** database under **<code>Special SmartTables</code>**. If an optional file is not provided, coralME derives the corresponding information from **<code>genome.gb</code>**.\n",
    "\n",
    "3. __Genes file__, by default: **<code>genes.txt</code>**\n",
    "4. __RNAs file__, by default: **<code>RNAs.txt</code>**\n",
    "5. __Proteins file__, by default: **<code>proteins.txt</code>**\n",
    "6. __TUs file__, by default: **<code>TUs.txt</code>**\n",
    "7. __Sequences file__, by default: **<code>sequences.fasta</code>**\n",
    "\n",
    "### Configuration\n",
    "\n",
    "8. __Paths file__, by default: **<code>inputs.json</code>**\n",
    "9. __Parameters file__, by default: **<code>organism.json</code>**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c77ad67-e6d9-4803-b9b7-7fb6bfa3d5c5",
   "metadata": {},
   "source": [
    "<img src=\"./pngs/inputs.png\" alt=\"Drawing\" style=\"width: 800px;\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5afa35ec-e485-4135-ae58-4d4aae93c151",
   "metadata": {},
   "source": [
    "## Description\n",
    "### Genome (**<code>genome.gb</code>**)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85e9f705-8137-4736-bfc4-7e49d3194a8b",
   "metadata": {},
   "source": [
    "#### Description\n",
    "\n",
    "The genome file provides coralME with:\n",
    "\n",
    "* Gene annotations.\n",
    "* Gene sequences."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7e21506-dd98-4a4c-b0ee-ef594dbdce0d",
   "metadata": {},
   "source": [
    "#### Requirements\n",
    "\n",
    "1. Locus tags (locus_tag or old_locus_tag) MUST be consistent with the gene identifiers in **<code>m_model.json</code>**. Make sure you download the same genome file that was used to reconstruct the M-model.\n",
    "2. Is named **<code>genome.gb</code>**.\n",
    "3. Must be a GenBank-compliant file that Biopython can parse correctly.\n",
    "4. It must contain the entire genome sequence. Make sure to enable **<code>Customize View</code>**>**<code>Show Sequence</code>** before downloading the genbank file from NCBI."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f561f2e-628b-4b2b-8952-89077d35f59e",
   "metadata": {},
   "source": [
    "See an example of [genome.gb](./helper_files/inputs/genome.gb) and [sequences.fasta](./helper_files/inputs/sequences.fasta)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d682750-457a-4ccb-81da-452cd35a2abf",
   "metadata": {},
   "source": [
    "### M-model (**<code>m_model.json</code>**)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a65f985-9b1e-467c-aa4c-e5619bbc3dba",
   "metadata": {},
   "source": [
    "#### Description\n",
    "\n",
    "The M-model provides coralME with the metabolic model components:\n",
    "\n",
    "* Metabolic network (M-matrix)\n",
    "* Gene-protein-reaction associations\n",
    "* Environmental and internal constraints\n",
    "* Reaction subsystems\n",
    "* Biomass composition"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c28fed0f-c8be-4ee6-a44e-bba3cb3e9ad4",
   "metadata": {},
   "source": [
    "#### Requirements\n",
    "\n",
    "1. Gene identifiers MUST be consistent with the locus_tag or old_locus_tag entries in **<code>genome.gb</code>**. Make sure you download the same genome file that was used to reconstruct the M-model.\n",
    "2. Is named **<code>m_model.json</code>**.\n",
    "3. COBRApy-compliant. Must be readable by COBRApy 0.25.0."
   ]
  },
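  {
   "cell_type": "markdown",
   "id": "sketch-locus-tag-consistency",
   "metadata": {},
   "source": [
    "The consistency requirement between M-model gene identifiers and genome locus tags can be checked before running coralME. The following is a minimal sketch, not part of coralME: the helper names `genbank_locus_tags`, `m_model_gene_ids` and `missing_genes` are hypothetical, and it assumes Biopython and COBRApy are installed.\n",
    "\n",
    "```python\n",
    "def genbank_locus_tags(path):\n",
    "    # Collect every locus_tag and old_locus_tag found in a GenBank file.\n",
    "    from Bio import SeqIO  # Biopython, imported only when needed\n",
    "\n",
    "    tags = set()\n",
    "    for record in SeqIO.parse(path, 'genbank'):\n",
    "        for feature in record.features:\n",
    "            for key in ('locus_tag', 'old_locus_tag'):\n",
    "                tags.update(feature.qualifiers.get(key, []))\n",
    "    return tags\n",
    "\n",
    "\n",
    "def m_model_gene_ids(path):\n",
    "    # Collect the gene identifiers of a COBRApy JSON model.\n",
    "    import cobra  # COBRApy, imported only when needed\n",
    "\n",
    "    return {gene.id for gene in cobra.io.load_json_model(path).genes}\n",
    "\n",
    "\n",
    "def missing_genes(model_gene_ids, genome_tags):\n",
    "    # Return the M-model gene IDs that have no matching tag in the genome.\n",
    "    return sorted(set(model_gene_ids) - set(genome_tags))\n",
    "```\n",
    "\n",
    "For example, `missing_genes(m_model_gene_ids('m_model.json'), genbank_locus_tags('genome.gb'))` lists the M-model genes that have no counterpart in the genome file; an empty list suggests the two files are consistent."
   ]
  },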
  {
   "cell_type": "markdown",
   "id": "acbfb2b3-e80e-4564-965d-5d8305c7b189",
   "metadata": {},
   "source": [
    "See an example of [m_model.json](./helper_files/inputs/m_model.json)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb9d6160-c642-4876-8f12-8c6e0fb60538",
   "metadata": {},
   "source": [
    "### Gene dictionary (**<code>genes.txt</code>**) [optional]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ded3740-aaa4-41a6-bbb7-5a9409da7a31",
   "metadata": {},
   "source": [
    "#### Description\n",
    "\n",
    "**<code>genes.txt</code>** is a gene information table that can be downloaded from the **All genes of <i>organism</i> SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **<code>Export</code>**>**<code>to Spreadsheet File</code>**>**<code>frame IDs</code>**. This file is optional and is meant to complement the information from **<code>genome.gb</code>** in case the latter is missing genes.\n",
    "\n",
    "**<code>genes.txt</code>** provides coralME with:\n",
    "\n",
    "* Gene locus tags\n",
    "* Gene names\n",
    "* Gene annotations\n",
    "* Gene positions\n",
    "* Gene products (protein, tRNA, etc.)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0213c150-af6d-4f24-811a-64a93253ac35",
   "metadata": {},
   "source": [
    "#### Requirements\n",
    "\n",
    "1. Contains the index **Gene Name** and columns **Accession-1**, **Left-End-Position**, **Right-End-Position**, and **Product**.\n",
    "2. **Accession-1** MUST be consistent with the gene IDs in the GPRs of **<code>m_model.json</code>** and with the locus_tag (or old_locus_tag) in **<code>genome.gb</code>**.\n",
    "3. **Gene Name** is consistent with:\n",
    "\n",
    "    * Column **Genes of polypeptide, complex, or RNA** of **<code>proteins.txt</code>**\n",
    "    * Column **Gene** of **<code>RNAs.txt</code>** \n",
    "    * Column **Genes of transcription unit** of **<code>TUs.txt</code>**\n",
    "    * Gene identifiers in **<code>sequences.fasta</code>**\n",
    "4. **Product** is consistent with:\n",
    "\n",
    "    * Index of **<code>proteins.txt</code>**\n",
    "    * Index of **<code>RNAs.txt</code>**\n",
    "    \n",
    "5. Must be tab-separated\n",
    "\n",
    "See an example of [genes.txt](./helper_files/inputs/genes.txt)\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "**Note:** **Requirement 2** regarding ID consistency should be directly met if the files are downloaded from the correct BioCyc database.\n",
    "</div>\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "**Note:** **Requirements 3, 4 and 5** should be met automatically if the files are downloaded from the same BioCyc database.\n",
    "</div>"
   ]
  },
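  {
   "cell_type": "markdown",
   "id": "sketch-required-columns-check",
   "metadata": {},
   "source": [
    "The column requirements above can be verified with the standard library alone. This is a rough sketch under stated assumptions: `missing_columns` and `GENES_COLUMNS` are hypothetical names that are not part of coralME, and the index is assumed to be stored as an ordinary column in the header row of the exported table.\n",
    "\n",
    "```python\n",
    "import csv\n",
    "\n",
    "# Column names taken from the requirements listed above.\n",
    "GENES_COLUMNS = ['Gene Name', 'Accession-1', 'Left-End-Position',\n",
    "                 'Right-End-Position', 'Product']\n",
    "\n",
    "\n",
    "def missing_columns(path, required):\n",
    "    # Read only the header row of a tab-separated table and report\n",
    "    # which of the required column names are absent.\n",
    "    with open(path, newline='', encoding='utf-8') as handle:\n",
    "        header = next(csv.reader(handle, dialect='excel-tab'), [])\n",
    "    present = {name.strip() for name in header}\n",
    "    return [name for name in required if name not in present]\n",
    "```\n",
    "\n",
    "Calling `missing_columns('genes.txt', GENES_COLUMNS)` should return an empty list for a compliant file. The same helper can be reused for the other optional tables by passing their own column lists."
   ]
  },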
  {
   "cell_type": "markdown",
   "id": "ddfee514-b485-48b6-8fce-a9bdbbda86fe",
   "metadata": {},
   "source": [
    "### Proteins (**<code>proteins.txt</code>**) [optional]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fb432b9-91fc-44d0-bea3-ff0f4502a8a0",
   "metadata": {},
   "source": [
    "#### Description\n",
    "**<code>proteins.txt</code>** is a protein complex information table that can be downloaded from the **All proteins of <i>organism</i> SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **<code>Export</code>**>**<code>to Spreadsheet File</code>**>**<code>frame IDs</code>**. This file is optional and is meant to complement the information from **<code>genome.gb</code>**.\n",
    "\n",
    "**<code>proteins.txt</code>** provides coralME with:\n",
    "* Protein complex compositions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6aa1307f-7ad5-4d8e-b5db-3f7934173f36",
   "metadata": {},
   "source": [
    "#### Requirements\n",
    "\n",
    "1. Contains the index **(Proteins Complexes)** and columns **Common-Name**, **Genes of polypeptide, complex, or RNA**, and **Locations**.\n",
    "2. **(Proteins Complexes)** is consistent with:\n",
    "    * Column **Product** of **<code>genes.txt</code>**\n",
    "3. **Genes of polypeptide, complex, or RNA** is consistent with:\n",
    "    * Index **Gene Name** of **<code>genes.txt</code>**\n",
    "4. Must be tab-separated\n",
    "\n",
    "See an example of [proteins.txt](./helper_files/inputs/proteins.txt)\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "**Note:** **Requirements 2, 3 and 4** should be met automatically if the files are downloaded from the same BioCyc database.\n",
    "</div>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37d1e491-259c-4add-b863-e67ff301033c",
   "metadata": {},
   "source": [
    "### RNAs (**<code>RNAs.txt</code>**) [optional]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0de2b032-94fb-4e5e-98d6-bd235b349627",
   "metadata": {},
   "source": [
    "#### Description\n",
    "**<code>RNAs.txt</code>** is an RNA annotation table that can be downloaded from the **All RNAs of <i>organism</i> SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **<code>Export</code>**>**<code>to Spreadsheet File</code>**>**<code>frame IDs</code>**. This file is optional and is meant to complement the information from **<code>genome.gb</code>**.\n",
    "\n",
    "**<code>RNAs.txt</code>** provides coralME with:\n",
    "\n",
    "* Genes annotated as RNA products (e.g. tRNA, rRNA, etc.)\n",
    "* RNA gene annotations (e.g. amino acids - tRNA associations)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82313222-2621-463c-8bb9-34e407f0aac2",
   "metadata": {},
   "source": [
    "#### Requirements\n",
    "\n",
    "1. Contains the index **(All-tRNAs Misc-RNAs rRNAs)** and columns **Common-Name** and **Gene**.\n",
    "2. **(All-tRNAs Misc-RNAs rRNAs)** is consistent with:\n",
    "\n",
    "    * Column **Product** of **<code>genes.txt</code>**\n",
    "3. **Gene** is consistent with:\n",
    "\n",
    "    * Index **Gene Name** of **<code>genes.txt</code>**\n",
    "4. Must be tab-separated\n",
    "    \n",
    "See an example of [RNAs.txt](./helper_files/inputs/RNAs.txt)\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "**Note:** **Requirements 2, 3 and 4** should be met automatically if the files are downloaded from the same BioCyc database.\n",
    "</div>"
   ]
  },
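  {
   "cell_type": "markdown",
   "id": "sketch-product-cross-check",
   "metadata": {},
   "source": [
    "The cross-file requirements (the **Product** column of **<code>genes.txt</code>** matching the indexes of **<code>proteins.txt</code>** and **<code>RNAs.txt</code>**) can be spot-checked as follows. This is only a sketch: `read_column` and `unmatched_products` are hypothetical helpers, not coralME functions, and splitting multi-valued cells on `//` is an assumption about the export format.\n",
    "\n",
    "```python\n",
    "import csv\n",
    "\n",
    "\n",
    "def read_column(path, column):\n",
    "    # Return the set of non-empty values found in one column of a\n",
    "    # tab-separated table. Cells holding several values are split on\n",
    "    # '//' (assumed separator).\n",
    "    values = set()\n",
    "    with open(path, newline='', encoding='utf-8') as handle:\n",
    "        for row in csv.DictReader(handle, dialect='excel-tab'):\n",
    "            cell = row.get(column) or ''\n",
    "            values.update(v.strip() for v in cell.split('//') if v.strip())\n",
    "    return values\n",
    "\n",
    "\n",
    "def unmatched_products(genes_path, proteins_path, rnas_path):\n",
    "    # Products named in genes.txt that appear in neither index.\n",
    "    products = read_column(genes_path, 'Product')\n",
    "    known = read_column(proteins_path, '(Proteins Complexes)')\n",
    "    known |= read_column(rnas_path, '(All-tRNAs Misc-RNAs rRNAs)')\n",
    "    return sorted(products - known)\n",
    "```\n",
    "\n",
    "An empty result from `unmatched_products('genes.txt', 'proteins.txt', 'RNAs.txt')` suggests the three tables come from the same BioCyc database."
   ]
  },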
  {
   "cell_type": "markdown",
   "id": "b734e85f-de85-463e-85d7-5d094a97274d",
   "metadata": {},
   "source": [
    "### TUs (**<code>TUs.txt</code>**) [optional]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d1ea005-9fc3-4dbf-bf9b-200100dd23a9",
   "metadata": {},
   "source": [
    "#### Description\n",
    "\n",
    "**<code>TUs.txt</code>** is a transcription unit annotation table that can be downloaded from the **All TUs of <i>organism</i> SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **<code>Export</code>**>**<code>to Spreadsheet File</code>**>**<code>frame IDs</code>**. This file is optional and is meant to complement the information from **<code>genome.gb</code>**.\n",
    "\n",
    "**<code>TUs.txt</code>** provides coralME with:\n",
    "\n",
    "* Co-transcribed genes (operons).\n",
    "* Direction of transcription.\n",
    "* TU IDs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73c53359-c89b-4f43-84b3-0f90a7d672cb",
   "metadata": {},
   "source": [
    "#### Requirements\n",
    "\n",
    "1. Contains the index **Transcription-Units** and columns **Genes of transcription unit** and **Direction**.\n",
    "2. **Genes of transcription unit** is consistent with:\n",
    "\n",
    "    * Index **Gene Name** of **<code>genes.txt</code>**\n",
    "3. Must be tab-separated\n",
    "    \n",
    "See an example of [TUs.txt](./helper_files/inputs/TUs.txt)\n",
    "    \n",
    "<div class=\"alert alert-info\">\n",
    "**Note:** **Requirements 2 and 3** should be met automatically if the files are downloaded from the same BioCyc database.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b72b58d0-d3a6-4325-bfdd-2e3c49beb65a",
   "metadata": {},
   "source": [
    "### Gene sequences (**<code>sequences.fasta</code>**) [optional]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72361c99-fa06-4702-81be-3893e94df157",
   "metadata": {},
   "source": [
    "#### Description\n",
    "**<code>sequences.fasta</code>** is a nucleotide FASTA file that can be downloaded from the **All genes of <i>organism</i> SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **<code>Export</code>**>**<code>FASTA</code>**>**<code>Find sequences</code>**. This file is optional and is meant to complement the information from **<code>genome.gb</code>** in case the latter is missing genes.\n",
    "\n",
    "**<code>sequences.fasta</code>** provides coralME with:\n",
    "\n",
    "* Gene sequences"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b0e09a0-11c6-4bd6-ab5b-9462c254d9f6",
   "metadata": {},
   "source": [
    "#### Requirements\n",
    "\n",
    "1. Gene identifiers are consistent with:\n",
    "\n",
    "    * Index **Gene Name** of **<code>genes.txt</code>**    \n",
    "2. Must be a valid nucleotide FASTA file\n",
    "    \n",
    "See an example of [sequences.fasta](./helper_files/inputs/sequences.fasta)\n",
    "    \n",
    "<div class=\"alert alert-info\">\n",
    "**Note:** **Requirements 1 and 2** should be met automatically if the file is downloaded from the same BioCyc database.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca6c798f-08e1-4a02-a5b5-be62afd5dfb7",
   "metadata": {},
   "source": [
    "### Configuration of paths to files (**<code>inputs.json</code>**)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a966960-a195-4752-8bc6-0ac4dbd6ec29",
   "metadata": {},
   "source": [
    "#### Description\n",
    "**<code>inputs.json</code>** is a JSON file containing paths to input files for coralME.\n",
    "\n",
    "**<code>inputs.json</code>** provides coralME with:\n",
    "\n",
    "* Paths to input files"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "670188ec-e3aa-4ab4-9875-953c3181f04a",
   "metadata": {},
   "source": [
    "#### Requirements\n",
    "\n",
    "1. Must be JSON-compliant.\n",
    "2. Must contain paths to the required files (M-model and genome).\n",
    "3. All files it references must exist.\n",
    "\n",
    "See an example of [input.json](./helper_files/input.json)"
   ]
  },
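  {
   "cell_type": "markdown",
   "id": "sketch-paths-file-check",
   "metadata": {},
   "source": [
    "The requirement that every referenced file exists can be tested before a run. The snippet below is a minimal sketch rather than coralME functionality: `missing_input_files` is a hypothetical helper, it treats every non-empty string value in the JSON file as a path, and it resolves relative paths against the directory holding the paths file. Both of those are assumptions that may not hold for every entry.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import os\n",
    "\n",
    "\n",
    "def missing_input_files(config_path):\n",
    "    # Load the paths file and return the entries whose string value\n",
    "    # does not point to an existing file or directory.\n",
    "    with open(config_path, encoding='utf-8') as handle:\n",
    "        config = json.load(handle)\n",
    "    base = os.path.dirname(os.path.abspath(config_path))\n",
    "    missing = {}\n",
    "    for key, value in config.items():\n",
    "        if isinstance(value, str) and value:\n",
    "            if not os.path.exists(os.path.join(base, value)):\n",
    "                missing[key] = value\n",
    "    return missing\n",
    "```\n",
    "\n",
    "A malformed file makes `json.load` raise `json.JSONDecodeError`, which covers the JSON-compliance requirement; an empty dictionary as the return value means every referenced path was found."
   ]
  },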
  {
   "cell_type": "markdown",
   "id": "a5c00264-195c-4c3d-9c87-3e4521b60cd6",
   "metadata": {},
   "source": [
    "### Configuration of parameters (**<code>organism.json</code>**)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "925659dd-87b8-4aa3-bb21-3529c8058ade",
   "metadata": {},
   "source": [
    "#### Description\n",
    "\n",
    "**<code>organism.json</code>** is a JSON file containing the ME-modeling parameters for coralME.\n",
    "\n",
    "**<code>organism.json</code>** provides coralME with:\n",
    "\n",
    "* ME-modeling parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9a9ef86-30e2-4263-8048-813f4416988e",
   "metadata": {},
   "source": [
    "#### Requirements\n",
    "\n",
    "1. Must be JSON-compliant\n",
    "2. Must contain the standard fields.\n",
    "\n",
    "See an example of [organism.json](./helper_files/organism.json)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "coralme-1.1.5",
   "language": "python",
   "name": "coralme-1.1.5"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}