MrMind comments on Open thread, Jan. 25 - Jan. 31, 2016

MrMind Jan 28, 2016, 11:17 AM
−1 points
What follows are random spurts of ideas that emerged while thinking about AlphaGo. I make no claim of validity, soundness or even sanity. But they are random interesting directions that are fun for me to investigate, and they might turn out to be interesting for you too:
- AlphaGo uses two deep neural networks to prune the enormous search tree of a Go position, and it does so unsupervised.
- Information geometry allows us to treat information theory as geometry.
- Neural networks allow us to partition high-dimensional data.
- Pruning a search tree is also strangely similar to dual intuitionistic logic.
- Deep neural networks can thus apply a sort of paraconsistent probabilistic deduction.
- Probabilistic self-reflection is possible.
- Deep neural networks can operate a sort of paraconsistent probabilistic self-reflection?
- Gunnar_Zarncke Jan 29, 2016, 10:18 PM
  2 points
  Parent
  See the Alpha Go Discussion Post.
- bogus Jan 29, 2016, 10:25 PM
  0 points
  Parent
  
  AlphaGo uses two deep neural networks to prune the enormous search tree of a Go position, and it does so unsupervised.
  
  lol no. The pruning (‘policy’) network is entirely the result of supervised learning from human games. The other network is used to evaluate game states.
  
  Your other ideas are more interesting, but they are not related to AlphaGo specifically, just deep neural networks.
  - MrMind Feb 1, 2016, 9:32 AM
    0 points
    Parent
    
    lol no. The pruning (‘policy’) network is entirely the result of supervised learning from human games.
    
    If I understood correctly, this is only the first stage in the training of the policy network. Then (quoting from Nature):
    
    The second stage of the training pipeline aims at improving the policy network by policy gradient reinforcement learning (RL). The RL policy network pρ is identical in structure to the SL policy network, and its weights ρ are initialised to the same values, ρ = σ. We play games between the current policy network pρ and a randomly selected previous iteration of the policy network.
    - bogus Feb 1, 2016, 8:04 PM
      0 points
      Parent
      
      The second stage of the training pipeline aims at improving the policy network by policy gradient reinforcement learning (RL).
      
      Except that they don’t seem to use the resulting network in actual play; the only use is for deriving their state-evaluation network.
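
The second training stage quoted from Nature in the thread above (self-play between the current policy and a randomly selected earlier iteration, updated by policy gradient) can be sketched roughly as follows. This is only an illustration under heavy simplifying assumptions, not AlphaGo's implementation: the game is a toy single-heap Nim rather than Go, the "network" is a table of logits rather than a deep network, the update is plain REINFORCE, and the starting weights are zeros where the paper starts from the supervised weights. All names (`new_policy`, `probs`, `play`, `train`, `snapshot_every`) are hypothetical.

```python
import math
import random

# Toy stand-in for Go (an assumption for illustration): single-heap Nim.
# A player removes 1-3 stones; whoever takes the last stone wins.
HEAP = 10
MOVES = (1, 2, 3)


def new_policy():
    # Tabular "network": one logit per (stones_left, move) pair.
    return {s: [0.0] * len(MOVES) for s in range(1, HEAP + 1)}


def probs(policy, stones):
    # Softmax over legal moves only (cannot take more stones than remain).
    legal = [i for i, m in enumerate(MOVES) if m <= stones]
    mx = max(policy[stones][i] for i in legal)
    exps = {i: math.exp(policy[stones][i] - mx) for i in legal}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def sample(policy, stones, rng):
    p = probs(policy, stones)
    idxs = list(p)
    return rng.choices(idxs, weights=[p[i] for i in idxs])[0]


def play(current, opponent, rng):
    # Returns the current policy's (state, action) pairs and the outcome z:
    # +1 if the current policy won the game, -1 if it lost.
    stones, traj = HEAP, []
    current_to_move = rng.random() < 0.5
    while True:
        pol = current if current_to_move else opponent
        a = sample(pol, stones, rng)
        if current_to_move:
            traj.append((stones, a))
        stones -= MOVES[a]
        if stones == 0:
            return traj, (1.0 if current_to_move else -1.0)
        current_to_move = not current_to_move


def train(iterations=3000, lr=0.1, snapshot_every=100, seed=0):
    rng = random.Random(seed)
    current = new_policy()  # the paper starts from the SL weights instead
    pool = [{s: list(v) for s, v in current.items()}]
    for t in range(1, iterations + 1):
        # Opponent is a randomly selected previous iteration of the policy.
        opponent = rng.choice(pool)
        traj, z = play(current, opponent, rng)
        for stones, a in traj:
            p = probs(current, stones)
            for i, pi in p.items():
                # REINFORCE: d log pi(a|s) / d logit_i = 1[i == a] - pi_i
                grad = (1.0 if i == a else 0.0) - pi
                current[stones][i] += lr * z * grad
        if t % snapshot_every == 0:
            pool.append({s: list(v) for s, v in current.items()})
    return current
```

Calling `train()` returns the table of logits, and `probs(policy, stones)` turns a row of it into move probabilities. Sampling the opponent from a pool of snapshots, rather than always playing the latest policy against itself, mirrors the quoted passage.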