Each of these functions takes ~30s to run, so it ends up being more efficient to put them in one job instead of multiple.
This is a perfect example of the AWS Batch API ‘leaking’ into your code. The whole point of a compute resource pool is that you don’t have to think about how many jobs you create.
It sounds like you’re using the wrong tool for the job (or it’s misconfigured, e.g. you could limit the Batch template to 1 vCPU).
The benefit of the pass-through approach is that it uses language-level features to do the validation.
You get language-level validation either way. The assert statements are superfluous in that sense. What they do add is in effect check_dataset_params(), whose logic probably doesn’t belong in this file.
The failure you’re talking about here is tripping a try clause.
No, I meant a developer introducing a runtime bug.
This is a perfect example of the AWS Batch API ‘leaking’ into your code. The whole point of a compute resource pool is that you don’t have to think about how many jobs you create.
This is true. We’re using AWS Batch because it’s the best tool we could find for other jobs that actually do need hundreds/thousands of spot instances, and this particular job goes in the middle of those. If most of our jobs looked like this one, using Batch wouldn’t make sense.
You get language-level validation either way. The assert statements are superfluous in that sense. What they do add is in effect check_dataset_params(), whose logic probably doesn’t belong in this file.
You’re right. In the explicit example, it makes more sense to have that sort of logic at the call site.
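For anyone following along, here is a rough sketch of what "explicit signature, checks at the call site" could look like. Everything in it is illustrative: `build_dataset`, `run_job`, and the `name` / `n_shards` parameters are made-up names, and the body of `check_dataset_params()` is an assumption, since the thread only mentions the function by name.

```python
def check_dataset_params(name: str, n_shards: int) -> None:
    """Hypothetical domain check: raises ValueError on bad values."""
    if not name:
        raise ValueError("name must be non-empty")
    if n_shards < 1:
        raise ValueError("n_shards must be >= 1")


def build_dataset(name: str, n_shards: int = 1) -> dict:
    """Explicit signature: a missing or misspelled argument raises
    TypeError at call time, so no assert statements are needed here."""
    return {"name": name, "n_shards": n_shards}


def run_job(params: dict) -> dict:
    """Hypothetical call site: the domain validation lives here,
    not inside build_dataset()."""
    check_dataset_params(**params)
    return build_dataset(**params)
```

With this layout, `build_dataset(name="train", n_shard=4)` fails with a `TypeError` from the language itself, while `run_job({"name": "train", "n_shards": 0})` fails with the `ValueError` raised by the call-site check.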