There was a serious bug in this post that invalidated the results, so I took it down for a while. The bug has now been fixed and the posted results should be correct.
SatvikBeri
SatvikBeri’s Shortform
One sort-of counterexample would be The Unreasonable Effectiveness of Mathematics in the Natural Sciences, where a lot of math has remained surprisingly accurate even when its assumptions were violated.
The Mathematical Theory of Communication by Shannon and Weaver. It’s an extended version of Shannon’s original paper that established Information Theory, with some extra explanations and background. 144 pages.
Atiyah & Macdonald’s Introduction to Commutative Algebra fits. It’s 125 pages long, and it’s possible to do all the exercises in 2–3 weeks – I did them over winter break in preparation for a course.
Lang’s Algebra and Eisenbud’s Commutative Algebra are both supersets of Atiyah & Macdonald; I’ve studied each of those as well and thought A&M was significantly better.
Unfortunately, I think it isn’t very compatible with the way management works at most companies. Normally there’s pressure to get your tickets done quickly, which leaves less time for “refactor as you go”.
I’ve heard this a lot, but I’ve worked at 8 companies so far, and none of them have had this kind of time pressure. Is there a specific industry or location where this is more common?
A big piece is that companies are extremely siloed by default. It’s pretty easy for a team to improve things in their silo, it’s significantly harder to improve something that requires two teams, it’s nearly impossible to reach beyond that.
Uber is particularly siloed: they have a huge number of microservices owned by small teams, at least according to their engineering talks on YouTube. Address validation is probably a separate service from anything related to maps, which in turn is separate from contacts.
Because of silos, companies have to make an extraordinary effort to actually end up with good UX. Apple is an example of this, where the effort was literally driven by the founder & CEO of the company. Tumblr was known for this as well. But from what I heard, Travis was more of a logistics person than a UX person, etc.
(I don’t think silos explain the bank validation issue)
Cooking:
Smelling ingredients & food is a good way to develop intuition about how things will taste when combined
Salt early is generally much better than salt late
Data Science:
Interactive environments like Jupyter notebooks are a huge productivity win, even with their disadvantages
Automatic code reloading makes Jupyter much more productive (e.g. autoreload for Python, or Revise for Julia)
Bootstrapping gives you fast, accurate statistics in a lot of areas without needing to be too precise about theory
Programming:
Do everything in a virtual environment or the equivalent for your language. Even if you use literally one environment on your machine, the tooling around these tends to be much better
Have some form of reasonably accurate, reasonably fast feedback loop(s). Types, tests, whatever – the best choice depends a lot on the problem domain. But the worst default is no feedback loop
Ping-pong:
People adapt to your style very rapidly, even within a single game. Learn 2-3 complementary styles and switch them up when somebody gets used to one
Friendship:
Set up easy, default ways to interact with your friends, such as getting weekly coffees, making it easy for them to visit, hosting board game nights etc.
Take notes on what your friends like
When your friends have persistent problems, take notes on what they’ve tried. When you hear about something they haven’t tried, recommend it. This is practical, and the fact that you’ve tailored the recommendation is generally appreciated.
Conversations:
Realize that small amounts of awkwardness, silence etc. are generally not a problem. I was implicitly following a strategy that tried to absolutely minimize awkwardness for a long time, which was a bad idea
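The bootstrapping point under Data Science can be sketched in a few lines of Python. This is a minimal illustration with made-up data, using a simple percentile bootstrap for a confidence interval on the mean:

```python
import random
import statistics

random.seed(0)
# Made-up skewed sample where normal-theory formulas are less reliable
data = [random.expovariate(1 / 10) for _ in range(500)]

def bootstrap_ci(sample, stat=statistics.mean, n_resamples=5000, alpha=0.05):
    """Percentile bootstrap: resample with replacement many times,
    compute the statistic each time, and take percentiles."""
    stats = sorted(
        stat(random.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lo = stats[int(alpha / 2 * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

low, high = bootstrap_ci(data)
print(f"mean = {statistics.mean(data):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The same function works for medians, ratios, or any other statistic, which is the appeal: no per-statistic theory needed.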
using vector syntax is much faster than loops in Python
To generalize this slightly, using Python to call C/C++ is generally much faster than pure Python. For example, built-in operations in Pandas tend to be pretty fast, while using `.apply()` is usually pretty slow.
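As a toy illustration of that pandas point (column name and numbers are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": np.random.default_rng(0).uniform(1, 100, 100_000)})

# Slow: .apply calls a Python-level function once per element.
slow = df["price"].apply(lambda p: p * 1.08)

# Fast: the vectorized operation runs in compiled code inside pandas/NumPy.
fast = df["price"] * 1.08

assert slow.equals(fast)  # same result, very different speed
```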
I didn’t know about that, thanks!
I found Loop Hero much better with higher speed, which you can fix by modifying a `variables.ini` file: https://www.pcinvasion.com/loop-hero-speed-mod/
I’ve used `Optim.jl` for similar problems with good results; here’s an example: https://julianlsolvers.github.io/Optim.jl/stable/#user/minimization/
The general lesson is that “magic” interfaces which try to ‘do what I mean’ are nice to work with at the top-level, but it’s a lot easier to reason about composing primitives if they’re all super-strict.
100% agree. In general I usually aim to have a thin boundary layer that does validation and converts everything to nice types/data structures, and then a much stricter core of inner functionality. Part of the reason I chose to write about this example is because it’s very different from what I normally do.
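That boundary/core split can be sketched like this (all names here are hypothetical, not from the original code):

```python
from dataclasses import dataclass

@dataclass
class Order:
    item: str
    quantity: int

def parse_order(raw: dict) -> Order:
    """Thin boundary layer: validate untrusted input and convert it
    to a nice typed structure."""
    if "item" not in raw or "quantity" not in raw:
        raise ValueError(f"missing keys in {raw!r}")
    quantity = int(raw["quantity"])
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return Order(item=str(raw["item"]), quantity=quantity)

def total_cost(order: Order, unit_price: float) -> float:
    """Strict core: no defensive checks, relies on the boundary."""
    return order.quantity * unit_price

order = parse_order({"item": "widget", "quantity": "3"})
print(total_cost(order, 2.5))  # 7.5
```

The core functions stay small and easy to reason about because every messy case has already been rejected or normalized at the boundary.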
Important caveat for the pass-through approach: if any of your `build_dataset()` functions accept `**kwargs`, you have to be very careful about how they’re handled to preserve the property that “calling a function with unused arguments is an error”. It was a lot of work to clean this up in Matplotlib...

To make the pass-through approach work, the `build_dataset` functions do accept excess parameters and throw them away. That’s definitely a cost. The easiest way to handle it is to have the `build_dataset` functions themselves just pass the actually needed arguments to a stricter, core function, e.g.:

```python
def build_dataset(a, b, **kwargs):
    build_dataset_strict(a, b)

build_dataset(**parameters)  # Succeeds as long as keys named "a" and "b" are in parameters
```
This is a perfect example of the AWS Batch API ‘leaking’ into your code. The whole point of a compute resource pool is that you don’t have to think about how many jobs you create.
This is true. We’re using AWS Batch because it’s the best tool we could find for other jobs that actually do need hundreds/thousands of spot instances, and this particular job goes in the middle of those. If most of our jobs looked like this one, using Batch wouldn’t make sense.
You get language-level validation either way. The `assert` statements are superfluous in that sense. What they do add is in effect `check_dataset_params()`, whose logic probably doesn’t belong in this file.

You’re right. In the explicit example, it makes more sense to have that sort of logic at the call site.
The reason to be explicit is to be able to handle control flow.
The datasets aren’t dependent on each other, though some of them use the same input parameters.
If your jobs are independent, then they should be scheduled as such. This allows jobs to run in parallel.
Sure, there’s some benefit to breaking down jobs even further. There’s also overhead to spinning up workers. Each of these functions takes ~30s to run, so it ends up being more efficient to put them in one job instead of multiple.
Your errors would come out just as fast if you ran `check_dataset_params()` up front.

So then you have to maintain `check_dataset_params`, which gives you a level of indirection. I don’t think this is likely to be much less error-prone. The benefit of the pass-through approach is that it uses language-level features to do the validation – you simply check whether the parameters dict has keywords for each argument the function is expecting.
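That language-level check can be made explicit with `inspect.signature`. This is a sketch under my own assumptions, not code from the post; `call_with_needed` and `build_dataset` here are hypothetical:

```python
import inspect

def build_dataset(a, b):
    return {"a": a, "b": b}

def call_with_needed(func, params):
    """Pass only the arguments func declares; fail loudly if any are missing."""
    needed = inspect.signature(func).parameters
    missing = [name for name in needed if name not in params]
    if missing:
        raise TypeError(f"{func.__name__} missing parameters: {missing}")
    return func(**{name: params[name] for name in needed})

parameters = {"a": 1, "b": 2, "unused": 3}
result = call_with_needed(build_dataset, parameters)
print(result)  # {'a': 1, 'b': 2}
```

The validation logic lives in one generic helper rather than a hand-maintained `check_dataset_params`, and the function signatures themselves remain the source of truth.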
A good way to increase feedback rate is to write better tests.
I agree in general, but I don’t think there are particularly good ways to test this without introducing indirection.
Failure in production should be the exception, not the norm.
The failure you’re talking about here is tripping a `try` clause. I agree that exceptions aren’t the best control flow – I would prefer if the pattern I’m talking about could be implemented with if statements – but it’s not really a major failure, and (unfortunately) a pretty common pattern in Python.
“refine definite theories”
Where does this quote come from – is it in the book?
It’s really useful to ask the simple question “what tests could have caught the most costly bugs we’ve had?”
At one job, our code had a lot of math, and the worst bugs were when our data pipelines ran without crashing but gave the wrong numbers, sometimes due to weird stuff like “a bug in our vendor’s code caused them to send us numbers denominated in pounds instead of dollars”. This is pretty hard to catch with unit tests, but we ended up applying a layer of statistical checks that ran every hour or so and raised an alert if something was anomalous, and those alerts probably saved us more money than all other tests combined.
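A very rough sketch of that kind of statistical check (all numbers, names, and the threshold here are made up for illustration):

```python
import statistics

def is_anomalous(history, new_value, threshold=4.0):
    """Flag a value more than `threshold` standard deviations from the
    historical mean. Crude, but it catches gross unit errors like
    pounds arriving instead of dollars."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return abs(new_value - mean) > threshold * stdev

daily_totals = [100.5, 98.2, 101.7, 99.9, 100.3, 102.1, 97.8]
print(is_anomalous(daily_totals, 100.8))  # False: an ordinary day
print(is_anomalous(daily_totals, 78.0))   # True: e.g. a currency mix-up
```

A check like this runs on the pipeline's outputs on a schedule and raises an alert, so it catches whole classes of bugs (vendor data changes, silent unit errors) that unit tests on your own code never see.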