A few pretty good ones are in development right now. For me, the most promising is CodeScore (https://arxiv.org/abs/2301.09043).
There are also datasets of interview and olympiad problems that go deeper than the 164 Python tasks in HumanEval, but they are still not closely related to general-purpose coding tasks.