Edit: I didn’t spot gwern’s more careful analysis. I am still digesting it. gwern, you should use the above link; it also contains the quotes scoring below 10.
The extra data doesn’t seem to make much difference:
R> karma <- read.table("http://people.mokk.bme.hu/~daniel/rationality_quotes_2012/scores")
R> karma <- sort(karma$V2)
R> summary(karma)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   -8.0     4.0     8.0    10.7    15.0   105.0
…
Nonlinear regression model
model: y ~ exp(a + b * x)
data: temp
       a        b
-0.01088  0.00134
residual sum-of-squares: 22772
Number of iterations to convergence: 7
Achieved convergence tolerance: 3.59e-06
It is roughly exponential in the range between 3 and 60 karma.
Eyeballing it, looks like the previous fit crosses around 40.
R> karma <- karma[karma<40]
...
Nonlinear regression model
model: y ~ exp(a + b * x)
data: temp
       a        b
-0.01088  0.00134
residual sum-of-squares: 22772
Number of iterations to convergence: 7
Achieved convergence tolerance: 3.59e-06
When I stated that the middle is roughly exponential, this was the graph that I was looking at:
d <- density(karma)
plot(log(d$y) ~ d$x)
I don’t do this for a living, so I am not at all sure, but if I really had to make this formal, I would probably fit an exponential distribution to the relevant interval by maximum likelihood and then run a Kolmogorov-Smirnov test. It’s what shminux said, except that there is probably no closed formula, because the cutoffs complicate things. And at least one of the cutoffs really is necessary, because below 3 it is obviously not exponential.
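A minimal sketch of that procedure, run on simulated data rather than the karma scores. The cutoffs lo and hi, the rate true_rate, and the helper names negloglik and ptrunc_exp are all assumptions made up for illustration. Two caveats: the KS p-value is optimistic when the rate is estimated from the same sample, and the real scores are integers, so ks.test would warn about ties.

```r
# Sketch: ML fit of an exponential truncated to [lo, hi], then a KS check.
# Simulated data stand in for the karma scores; lo, hi, true_rate are assumptions.
set.seed(1)
lo <- 3; hi <- 60; true_rate <- 0.07; n <- 500

# Draw from the truncated exponential by inverting its CDF.
u <- runif(n)
x <- lo - log(1 - u * (1 - exp(-true_rate * (hi - lo)))) / true_rate

# Negative log-likelihood of the truncated exponential for a given rate.
negloglik <- function(rate) {
  -(length(x) * log(rate) - rate * sum(x - lo) -
      length(x) * log(1 - exp(-rate * (hi - lo))))
}

# One-dimensional numerical minimisation, since the upper cutoff
# rules out a closed-form estimate.
fit <- optimize(negloglik, interval = c(1e-4, 5))
rate_hat <- fit$minimum

# CDF of the fitted truncated exponential, clamped to [0, 1].
ptrunc_exp <- function(q, rate) {
  p <- (1 - exp(-rate * (q - lo))) / (1 - exp(-rate * (hi - lo)))
  pmin(pmax(p, 0), 1)
}

# One-sample KS test against the fitted CDF.
ks <- ks.test(x, ptrunc_exp, rate = rate_hat)
print(rate_hat)
print(ks)
```

On the real data one would replace x with the scores that fall between the two cutoffs.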
You can find the raw data here.
The fit looks much better:
I am afraid I don’t understand your methodology. What is a rank-versus-value function supposed to look like for an exponentially distributed sample?
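For reference, a small simulated sketch of that shape. For an exponential sample sorted in ascending order, the value at rank i out of n is roughly the quantile function, -log(1 - i/(n+1)) / rate, so the curve is logarithmic in rank rather than exponential. The rate and sample size below are arbitrary assumptions and are not taken from the karma data.

```r
# Sketch: sorted values against rank for a simulated exponential sample.
# rate and n are arbitrary assumptions, not estimates from the karma scores.
set.seed(1)
n <- 1000; rate <- 0.1
x <- sort(rexp(n, rate = rate))
rank <- seq_len(n)

# Theoretical counterpart: the exponential quantile function
# evaluated at the plotting positions rank / (n + 1).
expected <- qexp(rank / (n + 1), rate = rate)

plot(rank, x, xlab = "rank (ascending)", ylab = "value")
lines(rank, expected, col = "red")

# The log of the empirical survival fraction should be close to
# linear in the value, with slope near -rate.
slope <- coef(lm(log(1 - rank / (n + 1)) ~ x))[["x"]]
print(slope)
```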
How else would you do it?
I expected something like this or the section thereafter.