I'll take a look, but afaik AlphaZero only uses an NN for position evaluation in MCTS, and not for the search part itself?

Looking at the AlphaZero paper:

"Our new method uses a deep neural network fθ with parameters θ. This neural network takes as an input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) = fθ(s). The vector of move probabilities p represents the probability of selecting each move a (including pass), p_a = Pr(a | s). The value v is a scalar evaluation, estimating the probability of the current player winning from position s. This neural network combines the roles of both policy network and value network [12] into a single architecture. The neural network consists of many residual blocks [4] of convolutional layers [16,17] with batch normalization [18] and rectifier nonlinearities [19] (see Methods)."
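The (p, v) = fθ(s) interface quoted above can be made concrete with a toy forward pass. This is not the paper's architecture (which is a deep residual conv net); the single linear-ReLU "trunk" and all sizes here are placeholders, but the two heads — a softmax over 19×19 moves plus pass, and a scalar evaluation — match the description:

```python
import numpy as np

# Toy sketch of (p, v) = f_theta(s). The linear trunk is a stand-in for the
# paper's residual conv tower; only the input/output shapes follow the text:
# 17 binary feature planes of a 19x19 board in, 361 moves + pass out.
rng = np.random.default_rng(0)
N_FEATURES = 17 * 19 * 19      # flattened board representation s (with history)
N_MOVES = 19 * 19 + 1          # every board point, plus pass
HIDDEN = 256                   # arbitrary placeholder width

W_trunk = rng.normal(0, 0.01, (HIDDEN, N_FEATURES))
W_pol = rng.normal(0, 0.01, (N_MOVES, HIDDEN))
W_val = rng.normal(0, 0.01, (1, HIDDEN))

def f_theta(s):
    h = np.maximum(W_trunk @ s, 0)                    # shared trunk (ReLU)
    logits = W_pol @ h
    p = np.exp(logits - logits.max())                 # policy head:
    p /= p.sum()                                      #   softmax over moves
    v = np.tanh(W_val @ h)[0]                         # value head: scalar in (-1, 1)
    return p, v
```

The point of the sketch is just that one network emits both outputs from a shared trunk, which is what "combines the roles of both policy network and value network" means.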
So if I’m interpreting that correctly, the NN is used for both position evaluation and also for the search part.
The implicit claim is that the policy might be doing internal search?
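For what the quote literally describes, the network does enter the search itself: in AlphaZero-style MCTS the prior p biases in-tree move selection via the PUCT rule, and v is backed up from leaves in place of a rollout. A minimal sketch (not DeepMind's code; the Node fields and c_puct value are illustrative):

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior        # p_a from the network for the move into this node
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}        # move -> Node

def puct_score(parent, child, c_puct=1.5):
    # Q + U: exploitation plus a prior-weighted exploration bonus.
    # The network's policy prior scales U, steering which branches get expanded.
    q = child.value_sum / child.visits if child.visits else 0.0
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return q + u

def select_child(parent, c_puct=1.5):
    return max(parent.children.items(),
               key=lambda mc: puct_score(parent, mc[1], c_puct))

def backup(path, v):
    # v is the network's leaf evaluation, from the leaf player's perspective;
    # flip the sign at each step up the tree (two-player zero-sum).
    for node in reversed(path):
        node.visits += 1
        node.value_sum += v
        v = -v
```

So "NN for position evaluation only" undersells it: selection inside the tree is shaped by p on every simulation, not just at the leaves.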
Leaving aside the issue of to what extent the NN itself is already doing something approximately isomorphic to search or how easy it would be to swap in MuZero instead, I think that the important thing is to measure the benefit of search in particular problems (like Jones does by sweeping over search budgets vs training budgets for Hex etc) rather than how hard the exact algorithm of search itself is.
I mean, MCTS is a simple generic algorithm; you can just treat learning it in a ‘neural’ way as a fixed cost—there’s not much in the way of scaling laws to measure about the MCTS implementation itself. MCTS is MCTS. You can plug in chess as easily as Go or Hex.
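The "plug in chess as easily as Go" point can be made concrete: vanilla MCTS touches the game only through a small state interface, so swapping games means swapping that object, not the search. A sketch under assumed interface names (they're mine, not from any particular codebase), with a toy take-away game standing in for Go:

```python
import random

class TakeAwayGame:
    """Toy game: players alternately remove 1 or 2 stones;
    whoever takes the last stone wins."""
    def __init__(self, stones, player=0, winner=None):
        self.stones, self.player, self.winner = stones, player, winner

    def legal_moves(self):
        return [m for m in (1, 2) if m <= self.stones]

    def play(self, move):
        left = self.stones - move
        return TakeAwayGame(left, 1 - self.player,
                            winner=self.player if left == 0 else None)

    def is_terminal(self):
        return self.stones == 0

    def result(self):
        return self.winner    # id of the winning player

def rollout(state):
    """Vanilla MCTS's default leaf evaluator: uniformly random play to the
    end. Identical code whatever game object is passed in — this is the
    sense in which the search algorithm itself is a fixed cost."""
    while not state.is_terminal():
        state = state.play(random.choice(state.legal_moves()))
    return state.result()
```

Replace `TakeAwayGame` with a chess, Go, or Hex state exposing the same four methods and nothing in the search changes — which is why the interesting scaling questions attach to the problems and the networks, not to the MCTS implementation.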
It seems much more interesting to know about how expensive ‘real’ problems like Hex or Go are, how well NNs learn, how to trade off architectures or allocate compute between train & runtime...