reallyeli’s Shortform

reallyeliMar 4, 2025, 12:47 AM

3 points

4 comments LW link

reallyeli Apr 10, 2025, 3:37 AM
12 points
0

If AI turns out to be very useful for cheaply writing formally verified code, what does that do for AI control? We can now request that the untrusted AI produce along with any code it writes a spec and certificate verifying that the code matches the spec.
How bad of a position does this put the untrusted model in, when it’s trying to write code that does bad stuff? Some sub-questions towards answering that question:
- How tight are the constraints that the formal verification process, at its AI-driven heights, can put on the code? Seems like once something is formally verified at all this rules out large classes of vulnerabilities an AI might exploit. Can we push this further to remove almost all attack vectors?
- How easy is it to write a spec that looks good to humans, but actually contains a significant omission?
- How feasible is it for the spec writer to collude with the code writer?
reallyeli Mar 4, 2025, 12:47 AM
10 points
9

A good ask for frontier AI companies, for avoiding massive concentration of power, might be:
- “don’t have critical functions controllable by the CEO alone or any one person alone, and check that this is still the case / check for backdoors periodically”
since this seems both important and likely to be popular.
reallyeli Jun 9, 2025, 9:35 PM
3 points
0

A Berkeley professor speculates that LLMs are doing something more like “copying the human mind” than “learning from the world.” This seems like it would imply some things we already see (e.g. fuzzily, they’re “not very creative”), and it seems like it would imply nontrivial things for what we should expect out of LLMs in the future, though I’m finding it hard to concretize this.
That is, if LLMs are trained with a simple algorithm and acquire functionality that resembles that of the mind, then their underlying algorithm should also resemble the algorithm by which the mind acquires its functionality. However, there is one very different alternative explanation: instead of acquiring its capabilities by observing the world in the same way as humans, LLMs might acquire their capabilities by observing the human mind and copying its function. Instead of implementing a learning process that can learn how the world works, they implement an incredibly indirect process for scanning human brains to construct a crude copy of human cognitive processes.
- cubefox Jun 12, 2025, 11:45 PM
  2 points
  0
  Parent
  
  This is also discussed here. LLMs seem to do a form of imitation learning (evidence: the plateau in base LLM capabilities roughly at human level), while humans, and animals generally, predict reality more directly. The latter is not a form of imitation and therefore not bounded by the abilities of some imitated system.

  

Since:

Keyboard shortcuts

Keys shown in yellow (e.g., ]) are accesskeys, and require a browser-specific modifier key (or keys).

Keys shown in grey (e.g., ?) do not require any modifier keys.

General
? Show keyboard shortcuts
Esc Hide keyboard shortcuts

Site navigation
h Go to Home (a.k.a. “Frontpage”) view
f Go to Featured (a.k.a. “Curated”) view
a Go to All (a.k.a. “Community”) view
m Go to Meta view
v Go to Tags view
c Go to Recent Comments view
r Go to Archive view
q Go to Sequences view
t Go to About page
u Go to User or Login page
o Go to Inbox page

Page navigation
, Jump up to top of page
. Jump down to bottom of page
/ Jump to top of comments section
s Search

Page actions
n New post or comment
e Edit current post

Post/comment list views
. Focus next entry in list
, Focus previous entry in list
; Cycle between links in focused entry
Enter Go to currently focused entry
Esc Unfocus currently focused entry
] Go to next page
[ Go to previous page
\ Go to first page
e Edit currently focused post

Editor
k Bold text
i Italic text
l Insert hyperlink
q Blockquote text

Appearance
= Increase text size
- Decrease text size
0 Reset to default text size
′ Cycle through content width settings
1 Switch to default theme [A]
2 Switch to dark theme [B]
3 Switch to grey theme [C]
4 Switch to ultramodern theme [D]
5 Switch to simple theme [E]
6 Switch to brutalist theme [F]
7 Switch to ReadTheSequences theme [G]
8 Switch to classic Less Wrong theme [H]
9 Switch to modern Less Wrong theme [I]
; Open theme tweaker
Enter Save changes and close theme tweaker
Esc Close theme tweaker (without saving)

Slide shows
l Start/resume slideshow
Esc Exit slideshow
→↓ Next slide
←↑ Previous slide
Space Reset slide zoom

Miscellaneous
x Switch to next view on user page
z Switch to previous view on user page
` Toggle compact comment list view
g Toggle anti-kibitzer