Markovian State and Action Abstractions for MDPs via Hierarchical MCTS within a POMDP Formulation
This is the code release of the paper "Markovian State and Action Abstractions for MDPs via Hierarchical MCTS within a POMDP Formulation".

Run `make` to build hplanning.
Run `./run.sh` to run a problem instance with default settings.
Run `./debug.sh` to build and run using the debug version.
Run `./release.sh` to build and run using the release version.
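For a direct invocation without the helper scripts, a session might look like the sketch below. The problem name `rooms` and the map filename are illustrative assumptions; substitute a problem and map file that exist in your checkout.

```sh
# Build the planner, then run one instance with explicit flags.
# "rooms" and "rooms.map" are assumed values for illustration only.
make
./hplanning --problem rooms --map rooms.map --runs 1 --outputfile output.txt
```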
Allowed options of hplanning:
--help produce help message
--test run unit tests
--problem arg problem to run
--map arg map to use for (continuous) rooms domain
--outputfile arg (=output.txt) summary output file
--size arg size of problem (problem specific)
--number arg number of elements in problem (problem
specific)
--timeout arg timeout (seconds)
--mindoubles arg minimum power of two simulations
--maxdoubles arg maximum power of two simulations
--runs arg number of runs
--accuracy arg accuracy level used to determine horizon
--horizon arg horizon to use when not discounting
--numsteps arg number of steps to run when using average
reward
--verbose arg verbosity level
--usetransforms arg Use transforms
--useparticlefilter arg Use particle filter
--transformdoubles arg Relative power of two for transforms compared
to simulations
--transformattempts arg Number of attempts for each transform
--treeknowledge arg Knowledge level in tree (0=Pure, 1=Legal,
2=Smart)
--rolloutknowledge arg Knowledge level in rollouts (0=Pure, 1=Legal,
2=Smart)
--smarttreecount arg Prior count for preferred actions during smart
tree search
--smarttreevalue arg Prior value for preferred actions during smart
tree search
--reusetree arg Reuse tree generated during previous search
--seeding arg Use pid as random seed
--thompsonsampling arg use Thompson Sampling instead of UCB1
--timeoutperaction arg timeout per action (seconds)
--polling arg use polling rollout for hplanning
--stack arg use call stack for hplanning
--localreward arg use local reward
--hplanning arg use hplanning when possible
--actionabstraction arg use hplanning w/ action abstraction when
possible
--memoryless arg find a memoryless policy in hplanning
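The `--mindoubles` and `--maxdoubles` options appear to be exponents, so a run sweeps from 2^mindoubles up to 2^maxdoubles simulations per search. As an example, a hierarchical-planning run with action abstraction and Thompson sampling might look like the sketch below; the flag values, and the assumption that the boolean-style options take 1/0 arguments, are illustrative rather than prescribed defaults.

```sh
# Illustrative invocation; values are examples, not recommended settings.
# Boolean-style options are assumed here to take 1/0 arguments.
./hplanning --problem rooms --map rooms.map \
  --hplanning 1 --actionabstraction 1 --thompsonsampling 1 \
  --mindoubles 10 --maxdoubles 16 --runs 10 --outputfile summary.txt
```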