12/17/2023

Peg solitaire a star solution

Solving the game of peg solitaire with a Reinforcement Learning (RL) algorithm.

I used an adapted version of Asynchronous Advantage Actor-Critic (A3C), which I implemented myself from scratch, to train an RL agent to solve the game of peg solitaire.

The game consists of 32 marbles (or pegs) set out in a cross shape. The cross-shaped board has 33 positions. In the initial position every position holds a marble except the center of the cross, which is empty. The goal is to remove the marbles one by one until only one is left. To remove a marble, another marble jumps over it, horizontally or vertically, into the empty space directly behind it, and the marble that was jumped over is taken off the board. See the gif demo below to better understand the game:

It is fairly easy to leave between 2 and 5 marbles at the end of the game, but much more difficult to leave only 1. This is what makes the game hard for a reinforcement learning algorithm: the agent can easily learn to collect high rewards by leaving only a few marbles, but it has to leave exactly one marble to actually solve the game.

The folder env contains three files: env.py, rendering.py and border_constraints.py. The first file implements the solitaire environment as a Python class Env, along with the basic functions (init, step, reset, etc.) used to interact with it. It also contains a function (render) to visualize the environment. The file border_constraints.py contains a function that computes the actions which would move a marble outside the borders of the board.

The file agent.py contains the implementation of different classes of agents. The base class Agent and its methods are described first. The classes RandomAgent and ActorCriticAgent are then implemented using the base methods inherited from Agent. The actor-critic agent implements A3C and relies on a neural network implemented in the file network.py, in the folder network.

The folder network contains two Python files, network.py and build.py. The former contains a Python class Net implementing a neural network in TensorFlow with a shared representation of the state, which then splits into two heads: the policy head and the value head. The latter contains the functions that build the different blocks of the network.

The file buffer.py contains a small Python class implementing a buffer structure.
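To make the rules and the env folder more concrete, here is a minimal sketch of such an environment in plain Python. It is not the author's env.py. The 7x7 grid encoding, the action numbering (hole index times 4 directions), the helpers decode and border_mask, and the +1 reward per removed marble are all assumptions made for illustration. border_mask plays the role described for border_constraints.py: it flags the actions that would land a marble outside the board.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]

# Jump directions as (row, col) offsets: up, down, left, right.
DIRECTIONS: List[Pos] = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def on_board(r: int, c: int) -> bool:
    """True if (r, c) is one of the 33 holes of the cross drawn on a 7x7 grid."""
    return 0 <= r < 7 and 0 <= c < 7 and (2 <= r <= 4 or 2 <= c <= 4)


POSITIONS: List[Pos] = [(r, c) for r in range(7) for c in range(7) if on_board(r, c)]
N_ACTIONS = len(POSITIONS) * len(DIRECTIONS)  # 33 holes x 4 directions = 132


def decode(action: int) -> Tuple[Pos, Pos, Pos]:
    """Hypothetical encoding: action -> (source, jumped-over, landing) holes."""
    (r, c), (dr, dc) = POSITIONS[action // 4], DIRECTIONS[action % 4]
    return (r, c), (r + dr, c + dc), (r + 2 * dr, c + 2 * dc)


def border_mask() -> List[bool]:
    """Hypothetical stand-in for border_constraints.py: False if the landing hole is off the board."""
    return [on_board(*decode(a)[2]) for a in range(N_ACTIONS)]


class Env:
    """Minimal peg solitaire environment (a sketch, not the author's env.py)."""

    def __init__(self) -> None:
        self.inside = border_mask()
        self.reset()

    def reset(self) -> Dict[Pos, bool]:
        # Every hole holds a marble except the center of the cross.
        self.board = {p: p != (3, 3) for p in POSITIONS}
        return self.board

    def legal_actions(self) -> List[int]:
        legal = []
        for a in range(N_ACTIONS):
            if not self.inside[a]:
                continue  # this jump would move a marble outside the board
            src, mid, dst = decode(a)
            if self.board[src] and self.board[mid] and not self.board[dst]:
                legal.append(a)
        return legal

    def step(self, action: int) -> Tuple[Dict[Pos, bool], float, bool]:
        if action not in self.legal_actions():
            raise ValueError(f"illegal action {action}")
        src, mid, dst = decode(action)
        # Jump: the source empties, the jumped-over marble is removed, the landing hole fills.
        self.board[src], self.board[mid], self.board[dst] = False, False, True
        reward = 1.0  # hypothetical reward: +1 per removed marble
        done = not self.legal_actions()  # the episode ends when no jump is possible
        return self.board, reward, done

    def n_marbles(self) -> int:
        return sum(self.board.values())
```

With this encoding only four moves are legal at the start (one into the center from each arm). An episode counts as solved when n_marbles() returns 1 once done is True.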
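The agent hierarchy in agent.py (a base class Agent with RandomAgent and ActorCriticAgent built on top of it) could look roughly like the sketch below. This is an assumption, not the author's code. The method name select_action, the helper masked_softmax and the policy_fn callable, which stands in for the network's policy head, are all hypothetical. Masking illegal jumps before sampling is one common way to stop a policy from choosing impossible moves. The post does not say whether the original agent does this.

```python
import math
import random
from typing import Callable, List, Sequence


class Agent:
    """Base class: subclasses decide which legal action to play in a state."""

    def select_action(self, state, legal_actions: Sequence[int]) -> int:
        raise NotImplementedError


class RandomAgent(Agent):
    """Baseline that plays a uniformly random legal move."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)

    def select_action(self, state, legal_actions: Sequence[int]) -> int:
        return self.rng.choice(list(legal_actions))


def masked_softmax(logits: Sequence[float], legal_actions: Sequence[int]) -> List[float]:
    """Softmax over the legal actions only; illegal actions get probability 0."""
    m = max(logits[a] for a in legal_actions)  # subtract the max for numerical stability
    exps = {a: math.exp(logits[a] - m) for a in legal_actions}
    total = sum(exps.values())
    return [exps.get(a, 0.0) / total for a in range(len(logits))]


class ActorCriticAgent(Agent):
    """Samples an action from the masked output of a policy head.

    policy_fn is a hypothetical stand-in for the network's policy head:
    it maps a state to one logit per action.
    """

    def __init__(self, policy_fn: Callable[[object], Sequence[float]], seed: int = 0) -> None:
        self.policy_fn = policy_fn
        self.rng = random.Random(seed)

    def select_action(self, state, legal_actions: Sequence[int]) -> int:
        probs = masked_softmax(self.policy_fn(state), legal_actions)
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Because both subclasses share the select_action interface, a training loop can swap the random baseline for the actor-critic agent without other changes.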
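The post does not say what the buffer in buffer.py stores. In A3C each worker typically collects a short rollout and turns it into discounted returns and advantages, so here is a minimal sketch of such a buffer under that assumption. The class layout, the field names and the gamma default are hypothetical. The return is computed backwards as R_t = r_t + gamma * R_{t+1}, seeded with the value estimate of the last state (0 if the episode ended).

```python
from typing import List


class Buffer:
    """Rollout buffer for one A3C worker (a sketch; field names are hypothetical)."""

    def __init__(self, gamma: float = 0.99) -> None:
        self.gamma = gamma  # discount factor (hypothetical default)
        self.clear()

    def clear(self) -> None:
        self.states, self.actions, self.rewards, self.values = [], [], [], []

    def add(self, state, action: int, reward: float, value: float) -> None:
        """Store one transition together with the value head's estimate V(s_t)."""
        self.states.append(state)
        self.actions.append(action)
        self.rewards.append(reward)
        self.values.append(value)

    def returns(self, bootstrap_value: float = 0.0) -> List[float]:
        """Discounted returns R_t = r_t + gamma * R_{t+1}, seeded with V(s_T)."""
        R = bootstrap_value
        out = []
        for r in reversed(self.rewards):
            R = r + self.gamma * R
            out.append(R)
        return out[::-1]

    def advantages(self, bootstrap_value: float = 0.0) -> List[float]:
        """Advantage estimates A_t = R_t - V(s_t) used by the policy update."""
        return [R - v for R, v in zip(self.returns(bootstrap_value), self.values)]
```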
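For reference, the standard A3C objective combines three terms: a policy-gradient term weighted by the advantage, a squared-error loss for the value head, and an entropy bonus that discourages the policy from collapsing too early. The author describes the method as an "adapted" version of A3C, so the actual loss may differ. The sketch below uses plain Python floats instead of TensorFlow tensors, and the coefficients value_coef and entropy_coef are hypothetical defaults. In a real implementation the advantages are treated as constants (no gradient flows through them) in the policy term.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of one action distribution (higher means more exploration)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def a3c_loss(
    log_probs: Sequence[float],   # log pi(a_t | s_t) for the actions actually taken
    advantages: Sequence[float],  # A_t = R_t - V(s_t)
    returns: Sequence[float],     # discounted returns R_t
    values: Sequence[float],      # value-head outputs V(s_t)
    entropies: Sequence[float],   # entropy of pi(. | s_t) at each step
    value_coef: float = 0.5,      # hypothetical weighting of the value loss
    entropy_coef: float = 0.01,   # hypothetical weighting of the entropy bonus
) -> float:
    """Standard A3C loss for one rollout, to be minimized."""
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages))
    value_loss = sum((R - v) ** 2 for R, v in zip(returns, values))
    return policy_loss + value_coef * value_loss - entropy_coef * sum(entropies)
```

Since the network shares its state representation between the two heads, gradients from both the policy term and the value term flow into the shared layers.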