Thanks for the replies and sorry for the inaccuracies. I initially reported 4,331 blog posts and 890k words; the real results are that Robin wrote 3,302 blog posts (thanks DominikPeters for pointing this out, and for finding these better urls) and 1.5M words.
(4,331 blog posts corresponds to all authors on overcomingbias. 890k words doesn’t represent anything, because the posts were truncated when accessed from the monthly archive urls.)
# Get the real number of words from Robin
$ n_current_pages=331
$ echo https://www.overcomingbias.com/author/robin-hanson > /tmp/page_urls
$ for i in $(seq 2 $n_current_pages); do echo https://www.overcomingbias.com/author/robin-hanson/page/$i >> /tmp/page_urls; done
$ getwords() { curl "$1" | pup '#content' | html2text --ignore-links | wc -w; }
$ export -f getwords
$ parallel getwords < /tmp/page_urls > /tmp/words_by_page
$ awk '{sum += $1} END {print sum}' /tmp/words_by_page
1481344
ORIGINAL
Scoping: 4331 blog posts and 890k words (for overcomingbias only).
# Number of blog posts
$ curl https://www.overcomingbias.com/archives | pup '#monthly-archives' | rg '\(\d+\)' | tr -d ' ()' | awk '{sum += $1} END {print sum}'
4331
# Rough number of words (bash)
$ curl https://www.overcomingbias.com/archives | pup '#monthly-archives a attr{href}' > /tmp/urls_monthly_archives
$ getwords() { curl "$1" | pup '#content' | html2text --ignore-links | wc -w; }
$ export -f getwords
$ parallel getwords < /tmp/urls_monthly_archives > /tmp/words_per_month
$ awk '{sum += $1} END {print sum}' /tmp/words_per_month
891666
The first author archives page that throws a 404 is https://www.overcomingbias.com/author/robin-hanson/page/332, but https://www.overcomingbias.com/author/robin-hanson/page/331 exists. Each page contains 10 posts, except the last one (page 331) which contains two posts. So there are 3302 posts by Hanson.
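The page arithmetic can be checked with a quick sketch. It assumes, as described here, that pages 1 to 330 each hold 10 posts and page 331 holds 2; the variable names are only illustrative.

```shell
# Assumed from the thread: 331 author-archive pages,
# 10 posts on every full page, 2 posts on the last page.
n_pages=331
posts_per_page=10
posts_on_last_page=2

# All pages except the last are full; add the last page's posts.
posts=$(( (n_pages - 1) * posts_per_page + posts_on_last_page ))
echo "$posts"
```

This prints 3302, matching the count above.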
Doesn’t that include posts by other people too? Like Eliezer, for example?
Yes. So I think you’d have to use that scoping as an approximation. Maybe around 90-95% is Robin’s.
I think much less, maybe 50-75%. Eliezer posted a lot in the early years.
And the word count should be even less as other people’s posts (like Eliezer’s) are usually much longer than Hanson’s.
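For a rough check on these guesses, the two post counts reported in this thread (3,302 posts by Hanson out of 4,331 posts overall) give his share by post count. This is only a sketch with illustrative variable names, and it says nothing about the share of words.

```shell
# Counts taken from this thread; not re-scraped here.
robin_posts=3302
all_posts=4331

# Shell arithmetic is integer-only, so use awk for the percentage.
share=$(awk -v r="$robin_posts" -v a="$all_posts" 'BEGIN { printf "%.1f", 100 * r / a }')
echo "${share}%"
```

This prints 76.2%, just above the top of the 50-75% guess.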