Project author: margotbligh

Project description:
Command line executable tool to calculate all possible glycan molecules and the m/z values of their ions given a set of input parameters.
Language: Python
Repository: git://github.com/margotbligh/sugarMassesPredict.git
Created: 2021-03-09T13:55:27Z
Project page: https://github.com/margotbligh/sugarMassesPredict


sugarMassesPredict

command line tool to calculate all possible glycan molecules and the m/z values of their ions given a set of input parameters

NOTE: I have now also added an ‘R’ version - if you use the reticulate package in R, you can source the file (sugarMassesPredict-r.py) and then use the function ‘predict_sugars’ directly in R to generate output there.

this tool is being frequently updated :) if you have any questions or issues please feel free to contact me at mbligh@mpi-bremen.de

word of caution: this tool will also predict some sugars that are not chemically possible, because adding in all the constraints of sugar chemistry would take a long time!

e.g. calling the tool from R with reticulate:

  library(reticulate)
  # install the python dependencies (package names go in one character vector)
  py_install(c("pandas", "numpy"))
  # load predict_sugars() from the python file into the R session
  source_python("sugarMassesPredict-r.py")
  dp1 = as.integer(1)
  dp2 = as.integer(3)
  ESI_mode = 'pos'
  scan_range1 = as.integer(100)
  scan_range2 = as.integer(800)
  pent_option = as.integer(1)
  modifications = list('sulphate', 'deoxy')
  label = "procainamide"
  df <- predict_sugars(dp1 = dp1, dp2 = dp2, ESI_mode = ESI_mode,
                       scan_range1 = scan_range1, scan_range2 = scan_range2,
                       pent_option = pent_option, modifications = modifications,
                       label = label)

dependencies

  • pandas
  • numpy
  • python 3

input parameters

required

  • dp (degree of polymerisation) range
  • whether pentose monomers should be used in addition to hexose
  • modifications - possible options are none, all, or any combination of:
    • sulphate
    • carboxyl
    • phosphate
    • deoxy
    • N-acetyl
    • O-acetyl
    • O-methyl
    • anhydrobridge
    • unsaturated
    • alditol
    • amino
    • dehydrated
  • maximum number of modifications per monomer on average
  • ionisation mode
  • scan range (m/z)

optional

  • label - current options are procainamide (added by reductive amination) and benzoic acid (added on free alcohol groups; glycans are calculated with every number of labels from none up to the maximum possible)
  • output file path - defaults to “predicted_sugars.txt”
  • options for calculating the possible number of structural isomers (note: this section still needs to be fixed)
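
as a rough illustration of how the dp range and the pentose option define the search space, here is a minimal Python sketch that enumerates unmodified hexose/pentose compositions and their monoisotopic masses. it is not the tool's actual code: the helper name `enumerate_compositions` and the tuple layout are assumptions made for this example, and modifications and labels are ignored entirely.

```python
from typing import Iterator, Tuple

# Monoisotopic masses in Da, derived from standard atomic masses
HEXOSE = 180.06339   # C6H12O6
PENTOSE = 150.05282  # C5H10O5
WATER = 18.01056     # H2O, lost for each glycosidic bond formed


def enumerate_compositions(
    dp1: int, dp2: int, pent_option: int = 0
) -> Iterator[Tuple[int, int, int, float]]:
    """Yield (dp, n_hexose, n_pentose, mass) for each unmodified composition.

    Hypothetical helper for illustration only; not part of sugarMassesPredict.
    """
    for dp in range(dp1, dp2 + 1):
        # Pentoses are only considered when pent_option is 1
        max_pent = dp if pent_option == 1 else 0
        for n_pent in range(max_pent + 1):
            n_hex = dp - n_pent
            # A linear chain of dp monomers has dp - 1 glycosidic bonds
            mass = n_hex * HEXOSE + n_pent * PENTOSE - (dp - 1) * WATER
            yield dp, n_hex, n_pent, round(mass, 4)


rows = list(enumerate_compositions(1, 3, pent_option=1))
for row in rows:
    print(row)
```

with `pent_option=0` only the all-hexose composition is produced for each dp, so the number of candidate molecules grows quickly once pentoses (and, in the real tool, modifications) are allowed.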

output

tab delimited text file with one row per molecule. m/z values outside the scan range are shown as “NA”, and molecules with no ions within the scan range are not returned. columns are as follows:

  • degree of polymerisation (dp)
  • name
  • monoisotopic mass (Da)
  • sum formula
  • columns with m/z values of possible ions given the input parameters.
    • positive mode:
      • [M+H]+
      • [M+Na]+
    • negative mode:
      • [M+Cl]-
      • [M+CHOO]-
      • [M+2Cl]2-
      • [M+2CHOO]2-
      • [M+Cl-H]2-
      • [M+CHOO-H]2-
      • [M+CHOO+Cl]2-
      • [M-nH]n-, where n is 1 to the maximum number of anionic groups that any single molecule in the table has
  • if parameters related to isomers were specified in the input, there is an additional column for the number of possible isomers per molecule
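
the ion columns listed above relate to the monoisotopic mass in a simple way: add the mass of the adduct ion(s), subtract any lost protons, and divide by the absolute charge. the following Python sketch shows that arithmetic together with the scan range filter. the function names `mz` and `deprotonated_mz` are assumptions made for this example, the ion masses are standard values rounded to six decimals, and the tool's own implementation may differ.

```python
from typing import Optional, Tuple

# Monoisotopic ion masses in Da (electron mass already accounted for)
PROTON = 1.007276         # H+
SODIUM_ION = 22.989221    # Na+
CHLORIDE_ION = 34.969401  # Cl-
FORMATE_ION = 44.998203   # CHOO-


def mz(mass: float, adduct_mass: float, charge: int,
       scan_range: Tuple[float, float]) -> Optional[float]:
    """Return m/z of an ion, or None if it falls outside the scan range.

    None stands in for the "NA" entries of the output table.
    """
    value = (mass + adduct_mass) / abs(charge)
    low, high = scan_range
    return round(value, 4) if low <= value <= high else None


def deprotonated_mz(mass: float, n: int,
                    scan_range: Tuple[float, float]) -> Optional[float]:
    """m/z of [M-nH]n-: n protons removed, charge n-."""
    return mz(mass, -n * PROTON, -n, scan_range)


hexose_mass = 180.06339  # C6H12O6
scan = (100, 800)
print("[M+H]+    ", mz(hexose_mass, PROTON, 1, scan))
print("[M+Na]+   ", mz(hexose_mass, SODIUM_ION, 1, scan))
print("[M+Cl]-   ", mz(hexose_mass, CHLORIDE_ION, -1, scan))
print("[M+2Cl]2- ", mz(hexose_mass, 2 * CHLORIDE_ION, -2, scan))
print("[M+Cl-H]2-", mz(hexose_mass, CHLORIDE_ION - PROTON, -2, scan))
print("[M-H]-    ", deprotonated_mz(hexose_mass, 1, scan))
```

narrowing the scan range, e.g. to (200, 800), makes `mz` return None for [M+H]+ of a single hexose, which corresponds to an “NA” entry in the table.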

how to run

the help menu is accessed with:

  sugarMassesPredict.py -h

returns the following:

  usage: sugarMassesPredict.py [-h] -dp int int [-p int] -m str [str ...]
                               [-n int] [-ds int] [-ld str [str ...]] [-oh int]
                               [-b int] -i str [str ...] -s int int [-l label]
                               [-o filepath]

  Script to predict possible masses of unknown sugars. Written by Margot Bligh.

  optional arguments:
    -h, --help            show this help message and exit
    -dp int int, --dp_range int int
                          DP range to predict within: two space separated
                          numbers required (lower first)
    -p int, --pent_option int
                          should pentose monomers be considered as well as
                          hexose: 0 for no {default}, 1 for yes
    -m str [str ...], --modifications str [str ...]
                          space separated list of modifications to consider.
                          note that alditol and unsaturated are max once per
                          saccharide. allowed values: none OR all OR any
                          combination of carboxyl, phosphate, deoxy, nacetyl,
                          omethyl, anhydrobridge, oacetyl, unsaturated, alditol,
                          sulphate
    -n int, --nmod_max int
                          max no. of modifications per monomer on average
                          {default 1}. does not take into account unsaturated or
                          alditol.
    -ds int, --double_sulphate int
                          can monomers be double-sulphated: 0 for no {default},
                          1 for yes. for this you MUST give a value of at least
                          2 to -n/--nmod_max
    -ld str [str ...], --LorD_isomers str [str ...]
                          isomers calculated for L and/or D enantiomers {default
                          D only}. write space separated if both
    -oh int, --OH_stereo int
                          stereochem of OH groups considered when calculating
                          no. of isomers: 0 for no {default}, 1 for yes
    -b int, --bond_stereo int
                          stereochem of glycosidic bonds and reducing end
                          anomeric carbons considered when calculating no. of
                          isomers: 0 for no {default}, 1 for yes
    -i str [str ...], --ESI_mode str [str ...]
                          neg and/or pos mode for ionisation (space separated if
                          both)
    -s int int, --scan_range int int
                          mass spec scan range to predict within: two space
                          separated numbers required (lower first)
    -l label, --label label
                          name a label added to the oligosaccharide. if not
                          labelled do not include. options: procainamide OR
                          benzoic_acid.
    -o filepath, --output filepath
                          filepath to .txt file for output table {default:
                          predicted_sugars.txt}
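
to show how the flags in the help text fit together, here is a small Python sketch that assembles a command line and checks the constraints stated above (lower bound first, and -n of at least 2 when -ds is 1). the helper `build_command` is hypothetical and written only for this example; it assumes sugarMassesPredict.py sits in the current working directory and uses only flags that appear in the help output.

```python
import sys
from typing import List, Optional, Sequence, Tuple


def build_command(
    dp_range: Tuple[int, int],
    modifications: Sequence[str],
    esi_modes: Sequence[str],
    scan_range: Tuple[int, int],
    pent_option: int = 0,
    nmod_max: int = 1,
    double_sulphate: int = 0,
    label: Optional[str] = None,
    output: Optional[str] = None,
    script: str = "sugarMassesPredict.py",  # assumed location of the script
) -> List[str]:
    """Build an argument list for the tool (hypothetical helper)."""
    if dp_range[0] > dp_range[1]:
        raise ValueError("dp range must be given lower first")
    if scan_range[0] > scan_range[1]:
        raise ValueError("scan range must be given lower first")
    if double_sulphate == 1 and nmod_max < 2:
        raise ValueError("-ds 1 requires -n/--nmod_max of at least 2")

    cmd = [sys.executable, script]
    cmd += ["-dp", str(dp_range[0]), str(dp_range[1])]
    cmd += ["-p", str(pent_option)]
    cmd += ["-m", *modifications]
    cmd += ["-n", str(nmod_max), "-ds", str(double_sulphate)]
    cmd += ["-i", *esi_modes]
    cmd += ["-s", str(scan_range[0]), str(scan_range[1])]
    if label is not None:
        cmd += ["-l", label]
    if output is not None:
        cmd += ["-o", output]
    return cmd


cmd = build_command(
    dp_range=(1, 3),
    modifications=["sulphate", "deoxy"],
    esi_modes=["pos"],
    scan_range=(100, 800),
    pent_option=1,
    label="procainamide",
)
print(" ".join(cmd[1:]))
# To actually run the tool (requires the script and its dependencies):
#   import subprocess; subprocess.run(cmd, check=True)
```

the example mirrors the R call shown earlier: dp 1 to 3, pentoses allowed, sulphate and deoxy modifications, positive mode, scan range 100 to 800 and a procainamide label.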