One learning experience for me here was trying out LLM-empowered programming after the initial spreadsheet-based solution finding. Claude enables quickly writing (from my perspective as a non-programmer, at least) even a relatively non-trivial program. And you can often ask it to write a program that solves a problem without specifying the algorithm, and it will actually give something useful...but if you’re not asking for something conventional, it might be full of bugs—not just in the write-up but also in the algorithm chosen. I don’t object, per se, to doing things that are sketchy mathematically—I do that myself all the time—but when I’m doing it myself I usually have a fairly good sense of how sketchy what I’m doing is*, whereas if you ask Claude to do something it doesn’t know how to do rigorously, it seems it will write something sketchy and present it as the solution just the same as if it actually had a rigorous approach. So you have to check. I will probably be doing more of this LLM-based programming in the future, but I’m thinking about how I can get Claude to check its own work—maybe some automated way to pipe the output to another (or the same) LLM and ask “how sketchy is this and what are the most likely problems?”. Manually looking through to see what it’s doing, or at least getting the LLM to explain how the code works, may be unavoidable for now.
* when I have a clue what I’m doing, which is not the case in, e.g., machine learning.
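For what it’s worth, the “pipe the output to another LLM” idea could be as simple as wrapping the generated code in a review prompt and sending it through a second pass. Here is a minimal sketch using the Anthropic Python SDK; the model name, prompt wording, and review question are all illustrative choices, not a tested pipeline.

```python
# Sketch of an automated "sketchiness check": send LLM-generated code to a
# second LLM pass and ask it to flag likely problems. Model name and prompt
# wording are assumptions for illustration.
import os

REVIEW_QUESTION = (
    "How sketchy is this, mathematically and otherwise, "
    "and what are the most likely problems?"
)

def build_review_prompt(code: str, task_description: str) -> str:
    """Wrap generated code in a review request for a second LLM pass."""
    return (
        f"I asked an LLM to write a program for this task:\n"
        f"{task_description}\n\n"
        f"It produced the following code:\n```\n{code}\n```\n\n"
        f"{REVIEW_QUESTION}"
    )

def review_with_claude(code: str, task_description: str) -> str:
    # Requires `pip install anthropic` and an API key; untested sketch.
    import anthropic
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # hypothetical model choice
        max_tokens=1024,
        messages=[{"role": "user",
                   "content": build_review_prompt(code, task_description)}],
    )
    return response.content[0].text
```

Sending the code to a *different* model family than the one that wrote it might help, since the reviewer is then less likely to share the writer’s blind spots—but that’s speculation on my part.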
I had done some data exploration, but didn’t have time to focus and go much deeper. I did have a conviction, though, that bouts were determined in a fairly simple way without persistent hidden variables (see Appendix A). I’ve done work with genetic programming, but it’s been many years, so I tried getting ChatGPT-4o w/ canvas to set me up a good structure with crossover and such and fill out the various operation nodes, etc. This was fairly ineffective; perhaps I could have better described the sort of operation trees I wanted, but I’ve done plenty of LLM generate / tweak / iterate work, and it felt like it would take a good bit of time to get something actually useful.
That said, I believe any halfway-decently regularized genetic programming setup would have found either the correct ruleset or something close enough that manual inspection would yield the right guess. The setup I had begun contained exactly one source of randomness: an operation “roll a d6”. :D
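To make the kind of setup I mean concrete: here is a minimal sketch of operation trees over the fight’s inputs, with subtree crossover and the lone “roll a d6” node as the only source of randomness. The feature names and operation set are made up for illustration, not the actual puzzle’s inputs.

```python
# Minimal genetic-programming-style operation trees: leaves are input
# features, small constants, or the single random "d6" operation; internal
# nodes are binary operations. Crossover swaps random subtrees.
# Feature names are illustrative, not the real puzzle's inputs.
import copy
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "min": min, "max": max}
FEATURES = ["speed", "strength", "range"]  # hypothetical inputs

def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        # Leaf: a feature, a small constant, or the lone random op.
        return random.choice(FEATURES + list(range(1, 7)) + ["d6"])
    op = random.choice(list(OPS))
    return [op, random_tree(depth - 1), random_tree(depth - 1)]

def evaluate(tree, env):
    if tree == "d6":
        return random.randint(1, 6)  # the single source of randomness
    if isinstance(tree, str):
        return env[tree]              # feature lookup
    if isinstance(tree, int):
        return tree                   # constant
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))

def subtrees(tree, path=()):
    """Yield the path of every subtree (root included)."""
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_(tree, path, sub):
    if not path:
        return sub
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = sub
    return tree

def crossover(a, b):
    """Swap a random subtree of `a` with a random subtree of `b`."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    pa = random.choice(list(subtrees(a)))
    pb = random.choice(list(subtrees(b)))
    sub_a, sub_b = get(a, pa), get(b, pb)
    return set_(a, pa, sub_b), set_(b, pb, sub_a)
```

Regularization here would mean penalizing tree depth or node count in the fitness function, so the search prefers small rulesets—which is what makes a simple hidden generator like this one recoverable.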
Appendix A: an excerpt from my LLM instructions
I believe the hidden generation is a simple fairly intuitive simulation. For example (this isn’t right, just illustrative) maybe first we check for range (affected by class), see if speed (affected by race and boots) changes who goes first, see if strength matters at all for whether you hit (race and gauntlets), determine who gets the first hit (if everything else is tied then 50⁄50 chance), and first hit wins. Maybe some simple dice rolls are involved.
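Written out as code, the excerpt’s example rules—which, again, are illustrative rather than the actual hidden generator—might look something like this. All stat names, the boots/gauntlets modifiers, and the tie-breaking order are invented for the sketch.

```python
# Sketch of the illustrative ruleset above: range decides first contact,
# then speed (plus boots) decides who goes first, then strength (plus
# gauntlets) decides whose hit lands, a full tie is a 50/50 coin flip,
# and the first hit wins. Every rule and stat here is made up.
import random

def winner(a, b):
    """Return the winning fighter dict between `a` and `b`."""
    # 1. Range check: the longer-reaching fighter lands the first hit
    #    before the other can close (made-up rule).
    if a["range"] != b["range"]:
        return a if a["range"] > b["range"] else b
    # 2. Speed, boosted by boots, decides who goes first.
    sa = a["speed"] + a.get("boots", 0)
    sb = b["speed"] + b.get("boots", 0)
    if sa != sb:
        return a if sa > sb else b
    # 3. Strength, boosted by gauntlets, decides whose hit connects.
    ta = a["strength"] + a.get("gauntlets", 0)
    tb = b["strength"] + b.get("gauntlets", 0)
    if ta != tb:
        return a if ta > tb else b
    # 4. Everything tied: 50/50 chance, and the first hit wins.
    return random.choice([a, b])
```

The point of spelling it out this way in the LLM instructions was to convey the *shape* of the hypothesis space—a short cascade of simple comparisons, possibly with a dice roll—rather than any particular rule.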
Yeah, my recent experience with trying out LLMs has not filled me with confidence.
In my case, the correct solution to my problem (how to use Kerberos credentials to authenticate a database connection using a certain library) was literally ‘do nothing; the library will find a correctly-initialized krb file on its own as long as you don’t tell it to use a different authentication approach’. Sadly, the AI advice kept inventing ways for me to pass in the path of the krb file, none of which worked.
I’m hopeful that they’ll get better going forward, but right now they are a substantial drawback rather than a useful tool.