10. Frequently Asked Questions

10.1. My gene identifiers are not consistent, what should I do?

Inconsistent gene ID conventions are a known problem across bioinformatics platforms. We tried to support as many gene ID conventions as possible, but different genome assemblies or M-model reconstructions often yield inconsistent files.

What you should do depends on your problem. We classify gene ID convention issues according to the three main sources of information that must be consistent with each other:

  • M-model gene identifiers

  • Genome locus_tag

  • The Accession-1 column of the optional file genes.txt

Overall, you can assume that modifying the genome GenBank file is the hardest approach and therefore the last resort.
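Before changing any file, it can help to measure how well the three sources agree. The following is a minimal sketch, not part of the tool itself: it assumes you have already extracted the gene IDs from each source into plain Python lists, and the function name overlap_report and the toy IDs are hypothetical.

```python
def overlap_report(model_ids, genbank_ids, biocyc_ids):
    """Summarize how well three collections of gene IDs agree."""
    sources = {
        "m_model": set(model_ids),
        "genbank": set(genbank_ids),
        "biocyc": set(biocyc_ids),
    }
    report = {}
    names = sorted(sources)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = sources[a] & sources[b]
            smaller = min(len(sources[a]), len(sources[b]))
            # Fraction of the smaller set that is also found in the other set.
            report[(a, b)] = len(shared) / smaller if smaller else 0.0
    return report


# Toy IDs for illustration only (not taken from a real database).
model = ["PP_0001", "PP_0002", "PP_0003"]
genbank = ["PP_0001", "PP_0002", "PP_0003", "PP_0004"]
biocyc = ["PP0001", "PP0002", "PP0003", "PP0004"]

for pair, fraction in overlap_report(model, genbank, biocyc).items():
    print(pair, round(fraction, 2))
```

A pair with a fraction near 1 is consistent; a pair near 0 points to the source that needs attention. In the toy data the M-model and GenBank IDs agree, while the BioCyc IDs do not.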

10.1.1. 1. M-model and Genbank are consistent, but they are not consistent with the BioCyc files.

Make sure that you searched the correct BioCyc database, that is, the one that corresponds to the M-model reconstruction. A quick way to check is to copy one gene ID from your GenBank file or M-model and paste it into the BioCyc search bar. In the best case, your microbe will appear in the list of results. Download the files from there and your problem is solved!

That didn’t help?

It is possible that, even after confirming that the BioCyc database is correct, the Accession-1 column of genes.txt is still not consistent. However, you can assume that the correct IDs are somewhere in the database, since you found it by searching for a gene ID that follows your conventions (see Getting Started).

Try:

  • Adding new columns to the gene SmartTable. Accession-2 or Synonyms could contain your IDs.

  • Checking whether your IDs and BioCyc’s differ only by an underscore, e.g. “PP0001” and “PP_0001”. If so, use a text editor to change the IDs accordingly in the Accession-1 column of genes.txt. Take care not to introduce errors while editing gene IDs!
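If there are too many IDs to edit by hand, the underscore fix can be scripted. This is a minimal sketch under stated assumptions: genes.txt is treated as a tab-delimited table with a header row, the IDs consist of a letter prefix directly followed by digits, and the "Gene Name" column and the helper names add_underscore and fix_accession_column are hypothetical. Check the pattern against your own IDs before using it.

```python
import csv
import io
import re

# Assumed pattern: a letter prefix immediately followed by digits, e.g. "PP0001".
ID_PATTERN = re.compile(r"^([A-Za-z]+)(\d+)$")


def add_underscore(gene_id):
    """Turn 'PP0001' into 'PP_0001'; leave anything else untouched."""
    match = ID_PATTERN.match(gene_id)
    return f"{match.group(1)}_{match.group(2)}" if match else gene_id


def fix_accession_column(text, column="Accession-1"):
    """Rewrite one column of a tab-delimited table held in a string."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Short rows yield None for missing fields; treat those as empty.
        row[column] = add_underscore(row[column] or "")
        writer.writerow(row)
    return out.getvalue()


# Toy table; the column layout of a real genes.txt may differ.
sample = "Gene Name\tAccession-1\ndnaA\tPP0001\ndnaN\tPP_0002\n"
print(fix_accession_column(sample))
```

IDs that already contain an underscore, or that do not match the assumed pattern, pass through unchanged, so it is worth comparing the output with the original file before replacing it.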

10.1.2. 2. M-model and Genbank are not consistent

Make sure that you downloaded the same GenBank file that was used to reconstruct the M-model. This is critical! If the M-model and the GenBank file are not consistent, you probably have the wrong GenBank file.

If you have a gene dictionary to convert between conventions, use it to change the IDs in your files to the ones that are consistent with BioCyc.
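Applying such a gene dictionary can be sketched as follows. This is only an illustration: the function name translate_ids and the toy dictionary are hypothetical, and how you read the IDs from your files and write them back depends on your file formats.

```python
def translate_ids(ids, gene_dict):
    """Map IDs through gene_dict and collect the ones without an entry."""
    translated, missing = [], []
    for gene_id in ids:
        if gene_id in gene_dict:
            translated.append(gene_dict[gene_id])
        else:
            # Keep the original ID so nothing is silently dropped.
            translated.append(gene_id)
            missing.append(gene_id)
    return translated, missing


# Hypothetical dictionary from the old convention to the BioCyc-consistent one.
gene_dict = {"PP0001": "PP_0001", "PP0002": "PP_0002"}
new_ids, unmapped = translate_ids(["PP0001", "PP0002", "PP0003"], gene_dict)
print(new_ids)
print(unmapped)
```

Reviewing the list of unmapped IDs before writing anything back is a simple way to catch gaps in the dictionary.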