There's still some work to do here, but I should have a PR ready for review soon. I found a lot of interesting bugs and inconsistencies along the way; see the child tickets.
Hm. This looks like a complete rewrite to me. That's okay, but I'm going to have to make a few general comments before I move on to a line-by-line review. I hope that's okay.
The changes made by this file look good, and the improved documentation and code structure are nice.
I think this is the kind of rewrite where we want to have tests now. Do you think tests are in order here?
Do you think that we should standardize on one of PRIVATE, INTERNAL, or EXPOSE? Alternatively, do you think we should document the difference between them?
Can/should we reuse Python's logging framework rather than rolling our own set of warning/error reporting functions?
When using "global", please make sure that you're actually modifying the variable. It isn't necessary to say "global" when you're only reading the variable.
This branch uses several different new maps, with new semantics. Would it make sense to turn them into one or more classes? If not, we should at least have an overview listing all of them and what they're for.
Does it really make sense to have "current_file" be a global? Is there some more OO approach that would make the code cleaner? (Obviously we shouldn't do that if it makes the code uglier.)
Consider running this script through a Python style checker, if you haven't done so already; it usually catches a few things when I remember to do that.
I don't think that we should remove any normalizations that this script does, but before we add any more normalizations, we should be sure that we aren't replicating work that our chosen code styling tool can already do for us.
Once we've decided what to do with the above, I can start on a line-by-line review.
Hm. This looks like a complete rewrite to me. That's okay, but I'm going to have to make a few general comments before I move on to a line-by-line review. I hope that's okay.
The changes made by this file look good, and the improved documentation and code structure are nice.
I think this is the kind of rewrite where we want to have tests now. Do you think tests are in order here?
Yes, and the script can be re-targeted at a test directory using its command-line options.
Do you think that we should standardize on one of PRIVATE, INTERNAL, or EXPOSE? Alternatively, do you think we should document the difference between them?
Yes, probably PRIVATE. I can make this change, and then modify the script.
Can/should we reuse Python's logging framework rather than rolling our own set of warning/error reporting functions?
Probably, if we can work out how to get the current file context into it.
When using "global", please make sure that you're actually modifying the variable. It isn't necessary to say "global" when you're only reading the variable.
I think this is tied up with 4) and 7) (the logging framework and the "current_file" global).
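To illustrate the "global" rule in the quoted comment, here is a minimal sketch. Only the name "current_file" comes from this thread; the three helper functions are hypothetical and are not taken from the script.

```python
# Hypothetical module-level state, standing in for the script's global.
current_file = None


def describe():
    # Reading a module-level name needs no "global" declaration.
    return "checking {}".format(current_file)


def set_current_file(path):
    # Rebinding the name does need "global"; without it, the assignment
    # would create a new local variable and leave the module-level
    # name unchanged.
    global current_file
    current_file = path


def set_current_file_broken(path):
    # No "global" here, so this only binds a local name.
    current_file = path
    return current_file


set_current_file("a.c")
set_current_file_broken("b.c")
print(describe())  # prints "checking a.c"
```

So a "global" statement in a function that only reads the variable can be dropped without changing behaviour.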
This branch uses several different new maps, with new semantics. Would it make sense to turn them into one or more classes? If not, we should at least have an overview listing all of them and what they're for.
Probably one or two classes, implemented with an internal map.
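As a rough sketch of what "a class implemented with an internal map" could look like: the class name `UniqueMap`, its methods, and the no-duplicates rule are all hypothetical, since the branch's actual maps and their semantics aren't spelled out in this thread.

```python
class UniqueMap:
    """Hypothetical wrapper: maps a key to one value, refusing silent overwrites.

    The docstring is where the map's purpose and semantics would be recorded.
    """

    def __init__(self, description):
        self.description = description
        self._entries = {}  # internal map; callers never touch it directly

    def add(self, key, value):
        # Make the "no duplicate keys" rule explicit instead of implicit.
        if key in self._entries:
            raise KeyError("duplicate key in {}: {!r}".format(self.description, key))
        self._entries[key] = value

    def get(self, key, default=None):
        return self._entries.get(key, default)

    def __contains__(self, key):
        return key in self._entries

    def __len__(self):
        return len(self._entries)


seen = UniqueMap("example map")
seen.add("a.h", 1)
print("a.h" in seen, len(seen), seen.get("b.h"))  # prints "True 1 None"
```

The point of the wrapper is that each map's rules live in one documented place, rather than being implied by how a bare dict happens to be used.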
Does it really make sense to have "current_file" be a global? Is there some more OO approach that would make the code cleaner? (Obviously we shouldn't do that if it makes the code uglier.)
I'll see if Python's logging has some kind of context argument.
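For what it's worth, the standard library's `logging.LoggerAdapter` (or the `extra` keyword on individual logging calls) can carry this kind of context. A minimal sketch, assuming a hypothetical logger name and a hypothetical `check_file` helper, neither of which is part of the script:

```python
import io
import logging

# Output is captured in memory here only so the example is easy to inspect;
# a real script would more likely log to stderr.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
# Assumption: every record carries a "current_file" attribute.
handler.setFormatter(
    logging.Formatter("%(levelname)s: %(current_file)s: %(message)s")
)

logger = logging.getLogger("example_script")  # hypothetical logger name
logger.setLevel(logging.WARNING)
logger.addHandler(handler)
logger.propagate = False


def check_file(path):
    # LoggerAdapter attaches the same "extra" dict to every record it
    # emits, so the file name travels with the adapter rather than
    # living in a global.
    log = logging.LoggerAdapter(logger, {"current_file": path})
    log.warning("example warning")


check_file("a.c")
check_file("b.c")
print(stream.getvalue(), end="")
```

One caveat with this approach: a record logged directly through `logger`, without the adapter or an `extra` dict, would have no `current_file` attribute for the formatter to use, so all calls would need to go through the adapter.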
Consider running this script through a Python style checker, if you haven't done so already; it usually catches a few things when I remember to do that.
Do we have a recommended Python style checker? Should we standardise on one, and start moving our scripts to it?
I don't think that we should remove any normalizations that this script does, but before we add any more normalizations, we should be sure that we aren't replicating work that our chosen code styling tool can already do for us.
I don't expect to add any more normalizations; my next step is #32655 (moved), which a styling tool definitely can't do.
Once we've decided what to do with the above, I can start on a line-by-line review.