huggingbutt/roller_ball

RollerBall

Objective:

The blue ball aims to catch the red block.

Agent:

The agent is a blue ball.

It accepts an np.array of two np.float32 values as input. The first value ranges from -1 to 1 and represents the force along the x-axis. The second value ranges from -1 to 1 and represents the force along the z-axis.
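As a minimal sketch of building such an action, the snippet below packs two forces into a float32 array and clips them to the documented range. The helper name make_action is hypothetical and not part of the environment; how the action is then passed to the environment is not shown here.

```python
import numpy as np

def make_action(force_x, force_z):
    """Build a 2-element float32 action (hypothetical helper).

    Each force is clipped to the documented range [-1, 1].
    """
    action = np.array([force_x, force_z], dtype=np.float32)
    return np.clip(action, -1.0, 1.0)

# Out-of-range values are clipped rather than rejected.
action = make_action(0.5, -1.7)
print(action, action.shape, action.dtype)
```

Clipping is a design choice in this sketch; the source only states the valid range and does not say how out-of-range values are handled.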

Observation:

The observation space consists of 13 dimensions:

target_position_x: x-axis position of the target block

target_position_y: y-axis position of the target block

target_position_z: z-axis position of the target block

player_position_x: x-axis position of the blue ball

player_position_y: y-axis position of the blue ball

player_position_z: z-axis position of the blue ball

player_velocity_x: x-axis velocity of the blue ball

player_velocity_y: y-axis velocity of the blue ball

player_velocity_z: z-axis velocity of the blue ball

player_angular_velocity_x: x-axis angular velocity of the blue ball

player_angular_velocity_y: y-axis angular velocity of the blue ball

player_angular_velocity_z: z-axis angular velocity of the blue ball

distance: distance between the target block and the blue ball, computed from their positions and used to decide when the game terminates.
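The source does not state which metric is used for distance. Assuming it is the Euclidean distance between the two positions, it could be reproduced from the other observation fields as sketched below. The function name euclidean_distance is hypothetical, and SimpleNamespace only stands in for the real observation object.

```python
import math
from types import SimpleNamespace

def euclidean_distance(obs):
    """Euclidean distance between target and player (assumed metric)."""
    dx = obs.target_position_x - obs.player_position_x
    dy = obs.target_position_y - obs.player_position_y
    dz = obs.target_position_z - obs.player_position_z
    return math.sqrt(dx * dx + dy * dy + dz * dz)

# Stand-in observation with only the fields this sketch needs.
obs = SimpleNamespace(
    target_position_x=3.0, target_position_y=0.5, target_position_z=4.0,
    player_position_x=0.0, player_position_y=0.5, player_position_z=0.0,
)
print(euclidean_distance(obs))  # 5.0
```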

Customizable Functions:

transform_fun: Processes the raw observation returned by the environment for further handling by reward_fun and control_fun.

reward_fun: Determines the reward for the current step based on the original observation.

control_fun: Determines the environment's state based on the original observation.

Default Function Implementations:

import numpy as np

def transform_fun(obs):
  return np.array([
    # Write your code here.
    obs.player_position_x,
    obs.player_position_y,
    obs.player_position_z,
    obs.player_velocity_x,
    obs.player_velocity_y,
    obs.player_velocity_z,
    obs.player_angular_velocity_x,
    obs.player_angular_velocity_y,
    obs.player_angular_velocity_z,
    obs.target_position_x,
    obs.target_position_y,
    obs.target_position_z
    # End of your code.
  ], dtype=np.float32)

def reward_fun(obs):
  # You must return a floating-point number as the reward for this step.
  # Write your reward function here.
  if obs.distance <= 1.42:
    return 1.0
  else:
    return 0.0
  # End of your code.
  # Do not alter the following line.
  return 0.0

def control_fun(obs):
  response = {}
  # Write your code here.
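  # The ball is within 1.42 of the block: end the episode.
  if obs.distance <= 1.42:
    response['terminated'] = 'true'
  # The ball's y position is at or below 0: end the episode and set 're-entry'.
  if obs.player_position_y <= 0:
    response['terminated'] = 'true'
    response['re-entry'] = 'true'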
  # End of your code.
  return response
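As one illustration of customizing the reward, the sketch below replaces the sparse default with a denser signal: the same reward of 1.0 on a catch, a penalty when the ball's y position is at or below 0, and a small distance-proportional penalty otherwise. The name shaped_reward_fun, the constant CATCH_DISTANCE, and the penalty values are assumptions chosen for illustration, not part of the environment; SimpleNamespace stands in for the real observation object.

```python
from types import SimpleNamespace

CATCH_DISTANCE = 1.42  # same threshold as the default reward_fun

def shaped_reward_fun(obs):
    """Hypothetical denser alternative to the default reward_fun."""
    if obs.distance <= CATCH_DISTANCE:
        return 1.0   # caught the block
    if obs.player_position_y <= 0:
        return -1.0  # same condition that ends the episode in control_fun
    # Small per-step penalty that grows with distance to the block.
    return -0.01 * obs.distance

# Stand-in observations covering the three branches.
caught = SimpleNamespace(distance=1.0, player_position_y=0.5)
fallen = SimpleNamespace(distance=5.0, player_position_y=-0.2)
far = SimpleNamespace(distance=4.0, player_position_y=0.5)
for obs in (caught, fallen, far):
    print(shaped_reward_fun(obs))
```

Whether a shaped reward trains better than the sparse default depends on the learning algorithm; the penalty scale here is a starting point, not a tuned value.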