The environment rolls two standard six-sided dice, one red and one blue. The red die shows the number of identical copies of your agent that will be created. Each agent is shown one of the two dice, chosen by an independent coin flip for each agent. The agents are colourblind, so they have no idea which die they saw; they just see a number. Each agent must assign probabilities to each of the 36 outcomes and is scored on the log of the probability it assigned to the correct outcome.
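To make the setup concrete, here is a minimal sketch of the environment as described above. It is a test harness, not a solution. The names `run_episode`, `log_score`, `uniform_agent` and `evaluate` are hypothetical. It assumes the coin flip is fair and that "average score" means the mean over the copies within a single roll.

```python
import math
import random


def run_episode(rng):
    """Roll both dice and return (red, blue, observations), one observation per copy."""
    red = rng.randint(1, 6)
    blue = rng.randint(1, 6)
    # The red die sets the number of copies. Each copy independently sees
    # the red or the blue die (assumed: fair coin, probability 1/2 each).
    observations = [red if rng.random() < 0.5 else blue for _ in range(red)]
    return red, blue, observations


def log_score(probs, red, blue):
    """Log of the probability assigned to the true outcome.

    probs maps each (red, blue) pair to a probability.
    """
    return math.log(probs[(red, blue)])


def uniform_agent(observation):
    """Placeholder agent: ignores what it saw and spreads probability evenly."""
    return {(r, b): 1 / 36 for r in range(1, 7) for b in range(1, 7)}


def evaluate(agent, episodes=10_000, seed=0):
    """Return (mean total score, mean per-copy average score) over many rolls."""
    rng = random.Random(seed)
    total_sum = 0.0
    average_sum = 0.0
    for _ in range(episodes):
        red, blue, observations = run_episode(rng)
        scores = [log_score(agent(obs), red, blue) for obs in observations]
        total_sum += sum(scores)                  # summed over all copies
        average_sum += sum(scores) / len(scores)  # assumed: mean over copies
    return total_sum / episodes, average_sum / episodes
```

Any candidate agent is a function from the observed number to a dictionary of 36 probabilities. Passing it to `evaluate` in place of `uniform_agent` gives an empirical estimate of both objectives.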
Write code for an agent that maximizes its total score across all copies.
Write code for an agent that maximizes its average score.
Explain how and why these differ.