Reduce code duplication across client programs

added component::archived/flashproxy owner::dcf priority::low resolution::fixed status::closed type::enhancement labels

I have a branch for this at https://github.com/aallai/flashproxy.git code_dup.

It factors out parse_addr_spec and format_addr into a .py file and adds a setup.py to install the client programs. I think it pretty much replaces the install make target.

Some odds and ends:

The setup.py is copied into the zip for the windows distribution, where it is useless.
The version is hard coded in the setup.py script. So it will need to be changed both in the makefile and the setup.py. Maybe the dist target could modify the setup script to include the correct version. I notice George uses something called versioneer to manage his version numbers, I'll look into that.

Trac:
Username: aallai
Status: new to needs_review

Replying to aallai:

I have a branch for this at https://github.com/aallai/flashproxy.git code_dup.

It factors out parse_addr_spec and format_addr into a .py file and adds a setup.py to install the client programs. I think it pretty much replaces the install make target.

Some odds and ends:

The setup.py is copied into the zip for the windows distribution, where it is useless.

The version is hard coded in the setup.py script. So it will need to be changed both in the makefile and the setup.py. Maybe the dist target could modify the setup script to include the correct version. I notice George uses something called versioneer to manage his version numbers, I'll look into that.

Thanks for doing this. I made some changes and started a branch at https://gitweb.torproject.org/user/dcf/flashproxy.git/shortlog/refs/heads/code_dup.

So far, I found problems that make the setuptools solution appear worse than code duplication, at least for now.

Version number in two places, which you noted.
Everything should be driven from Makefile. make install should call setup.py if that's what it takes to install. This is pretty easy to fix.
dist directory name conflicts with our existing use of that name. That is probably fixable.
setup.py doesn't install the man pages, and it seems setuptools doesn't even know about man pages. Man pages and other documentation need to appear in dist packages and with make install.
python setup.py dist puts a dumb PKG-INFO file in the dist.
If you do python setup.py build and then python setup.py dist, then remove a file, then do python setup.py dist again, the removed file will remain in the dist because setuptools doesn't clean the existing build first. This is infuriating and makes me question what ends up in the dist every time.

Packaging in Python seems really terrible. If we're only using setuptools to help install the package/module, it might be better to just roll that step ourselves in the makefile.

Trac:
Status: needs_review to new

https://github.com/infinity0/flashproxy/compare/src-tree...code-dedup

This is my attempt. Some key points:

common code in the client moved to a library, "flashproxy-common" (import flashproxy.common)
common code in the facilitator (fac.py) moved to a library, "flashproxy-fac" (import flashproxy.fac)
relevant unit tests moved away from client/facilitator into library tests.

flashproxy-fac stays with the main facilitator package and is installed together with it. But since flashproxy-common is used by the facilitator and client, it's currently in its own subdirectory. HOWEVER, "make dist" (at the top-level) still builds a self-contained "binary" package (as in, for users not developers) which includes the flashproxy-common code, which I guess is what you intended. "make dist-exe" should work similarly, but I haven't tested it yet.

Oh, and to "install" flashproxy-common onto your system you can run something like

$ sudo [OPTIONAL checkinstall] python setup.py install --single-version-externally-managed

After doing this, "make install" for both the client and facilitator should work.

I'll continue to work on this, so this process will be smoother.

Trac:
Status: new to needs_review

Can you explain the purpose and benefits of the versioneer library?

I wonder if it's possible to flatten the taxonomy a bit. I think I prefer

import flashproxy
flashproxy.parse_addr_spec()

over

from flashproxy.common.util import parse_addr_spec
parse_addr_spec()

Three levels just feels a bit too deep. I'd lean toward packages flashproxy and flashproxy.fac. What do you think?

I think it's fine to ship the facilitator library functions in the client tarball, so there's no need for a split between common and non-common functions.

Replying to infinity0:

Oh, and to "install" flashproxy-common onto your system you can run something like

$ sudo [OPTIONAL checkinstall] python setup.py install --single-version-externally-managed

After doing this, "make install" for both the client and facilitator should work.

I'll continue to work on this, so this process will be smoother.

make install was a sticking point the last time, too. Maybe we should just remove the install target? It can't be that very many people are using make install or even using the client packages. Or do you need make install for the Debian package?

Your idea of using setup.py only for installing the library was a good one. It avoids the bugs we found earlier in trying to use setup.py to install binaries and man pages, and to build tarballs. Can we just do the python setup.py install thing on make install automatically? I know it's possible to pass the bindir and destdir in this way.

Replying to dcf:

Can you explain the purpose and benefits of the versioneer library?

It automates version numbers from git tags, so you don't need to keep editing the version file. It generates versions like 1.3+git1fce958 (like git describe) for non-tag commits. I copied the idea over from obfsproxy.

I wonder if it's possible to flatten the taxonomy a bit. I think I prefer {{{ import flashproxy flashproxy.parse_addr_spec() }}} over {{{ from flashproxy.common.util import parse_addr_spec parse_addr_spec() }}} Three levels just feels a bit too deep. I'd lean toward packages flashproxy and flashproxy.fac. What do you think?

How about 2 levels so flashproxy.{keys,util,proc,fac} - I originally had this split, I only went to 3 levels to work around some python paths issue, but that won't be necessary if I merge the fac library into here as suggested below. (I could also merge proc.py/util.py together.)

I think it's fine to ship the facilitator library functions in the client tarball, so there's no need for a split between common and non-common functions.

OK, it will make the packaging smaller as well. The original split was because I thought you wanted to keep the client binary distribution as small as possible.

Replying to dcf:

make install was a sticking point the last time, too. Maybe we should just remove the install target? It can't be that very many people are using make install or even using the client packages. Or do you need make install for the Debian package?

Your idea of using setup.py only for installing the library was a good one. It avoids the bugs we found earlier in trying to use setup.py to install binaries and man pages, and to build tarballs. Can we just do the python setup.py install thing on make install automatically? I know it's possible to pass the bindir and destdir in this way.

Yes, that's fine, and here we are going into packaging territory so I've written some more background info FYI on #9668 (closed)

Replying to infinity0:

Replying to dcf:

Can you explain the purpose and benefits of the versioneer library?

It automates version numbers from git tags, so you don't need to keep editing the version file. It generates versions like 1.3+git1fce958 (like git describe) for non-tag commits. I copied the idea over from obfsproxy.

Okay. I wonder, could we deal with the versioneer issue in a separate patch/ticket? The factoring of functions into packages is very close to a slam dunk. The addition of a new largish third-party source file is less obviously a slam dunk. Also it's nice if a deduplication branch ends up with a net loss of lines of code.

I'm relatively open to the idea of using versioneer, but as I say, could we deal with it separately? It appears there may be a technical obstacle in communicating the version number to setup.py from Makefile, but maybe you have an idea for how to deal with that?

Replying to dcf:

make install was a sticking point the last time, too. Maybe we should just remove the install target? It can't be that very many people are using make install or even using the client packages. Or do you need make install for the Debian package?

Your idea of using setup.py only for installing the library was a good one. It avoids the bugs we found earlier in trying to use setup.py to install binaries and man pages, and to build tarballs. Can we just do the python setup.py install thing on make install automatically? I know it's possible to pass the bindir and destdir in this way.

Yes, that's fine, and here we are going into packaging territory so I've written some more background info FYI on #9668 (closed)

Sure, but understand I can't merge a patch that breaks make install, unless we also decide to jettison make install. Could you try having setup.py called automatically? My first concern is ease-of-use from the end-user's point of view. Ease for packagers is still important but secondary. That is, I would rather have make install take one command and packaging take two commands, than the other way around.

I'd like to retain the ability to run the client programs from within a source checkout, if possible. Could we put the flashproxy package directory next to the client programs, the way it ends up with make dist? Or is that weird from a Python packaging point of view?

Replying to dcf:

I'm relatively open to the idea of using versioneer, but as I say, could we deal with it separately? It appears there may be a technical obstacle in communicating the version number to setup.py from Makefile, but maybe you have an idea for how to deal with that?

OK, I'll split this off, and will look into that issue too.

Replying to dcf:

Sure, but understand I can't merge a patch that breaks make install, unless we also decide to jettison make install. Could you try having setup.py called automatically? My first concern is ease-of-use from the end-user's point of view. Ease for packagers is still important but secondary. That is, I would rather have make install take one command and packaging take two commands, than the other way around.

make install is actually not relevant for the currently-distributed packages, which are binary packages. For example, make dist does not currently package the Makefile, and you are not distributing it in the zipball on the main website, only in the git repo. I doubt anyone actually uses it, so I don't see the importance of not breaking it. For our current purposes, we can jettison make install.

However, make install is relevant and necessary for a source package. But, the current Makefile's structure is ambiguous, which is where I think your confusion comes from. If we make a source package for flashproxy-client (in a separate client-only Makefile), its make install should not install flashproxy-common, since the latter is a separate source package to be installed separately. (And this is necessary if we want to be able to install the facilitator without installing the client.)

Note that this does not affect the existing custom binary packages; you still have your easy-to-use run-from-source single package (make dist) that contains both -client and -common.

(I also want to create a Makefile.all whose make install installs all components, which would be useful for the Debian package, but that is again an independent concern that doesn't affect anything discussed so far.)

Replying to dcf:

I'd like to retain the ability to run the client programs from within a source checkout, if possible. Could we put the flashproxy package directory next to the client programs, the way it ends up with make dist? Or is that weird from a Python packaging point of view?

Yes, this now makes more sense that we are merging fac.py into it too. But note that in the pending proposed setup, it is still possible to run from checkout with e.g. PYTHONPATH=common ./flashproxy-client. It's not unusual for source repos to forgo the ability to run-from-checkout if the install process is smooth.

Replying to infinity0:

Replying to dcf:

I'd like to retain the ability to run the client programs from within a source checkout, if possible. Could we put the flashproxy package directory next to the client programs, the way it ends up with make dist? Or is that weird from a Python packaging point of view?

Yes, this now makes more sense that we are merging fac.py into it too. But note that in the pending proposed setup, it is still possible to run from checkout with e.g. PYTHONPATH=common ./flashproxy-client. It's not unusual for source repos to forgo the ability to run-from-checkout if the install process is smooth.

I just took another look and I don't think this is a good idea now. If we have flashproxy/ in the top-level, we also must have setup.py and MANIFEST.in in the top-level, plus anything else we might want to add later. This would cause some confusion on which files belonged to -client, vs -common.

Are you OK with running things like PYTHONPATH=common ./flashproxy-client? To make the process easier I can create a script init-dev-env that exports this, so you can just do . ./init-dev-env then run ./flashproxy-client as normal.

Replying to infinity0:

Replying to infinity0:

Replying to dcf:

I'd like to retain the ability to run the client programs from within a source checkout, if possible. Could we put the flashproxy package directory next to the client programs, the way it ends up with make dist? Or is that weird from a Python packaging point of view?

Yes, this now makes more sense that we are merging fac.py into it too. But note that in the pending proposed setup, it is still possible to run from checkout with e.g. PYTHONPATH=common ./flashproxy-client. It's not unusual for source repos to forgo the ability to run-from-checkout if the install process is smooth.

I just took another look and I don't think this is a good idea now. If we have flashproxy/ in the top-level, we also must have setup.py and MANIFEST.in in the top-level, plus anything else we might want to add later. This would cause some confusion on which files belonged to -client, vs -common.

It seems to me that someone having a source checkout won't be confused by -client versus -common because the question will never enter their mind. I don't think we have to divide the source code based on how the Debian packages are divided.

We have a setup.py in the top level already, only for py2exe purposes. I would be fine with adding library installation code to it, if that's convenient. I suppose there is some danger that people will see the setup.py and automatically assume they should run python setup.py install. (But that danger exists already.) Any other files we decide to add to support the distutils installation of the common libraries, we should strive to keep as few as possible. I'm thinking maybe one or two files over the next few years, not five or ten.

What I'm trying to say is that having MANIFEST.in and setup.py in the root is fine with me, if you decide that is better than alternatives.

Are you OK with running things like PYTHONPATH=common ./flashproxy-client? To make the process easier I can create a script init-dev-env that exports this, so you can just do . ./init-dev-env then run ./flashproxy-client as normal.

Having to set PYTHONPATH is annoying but probably tolerable. I would rather not have an init-dev-env script; let's just document that you have to set PYTHONPATH. (The name init-dev-env is mostly unknown to Google--is there some other name commonly used by other projects?)

obfsproxy does a trick to insert the directory containing the executable to sys.path. I would consider such an approach acceptable. (But sadly would that little code block have to be duplicated in every program, since it needs to run before the common library is available?)

The above comments apply to the client programs. Having to set PYTHONPATH only for running the facilitator programs is far less annoying, I'm completely fine with it.

Having code duplication sucks (thus this ticket), but the current setup has these nice properties:

You can download a single .py file and run it standalone.
You can run from a source checkout.
Installation is super easy on all platforms. Number 1 I don't care about at all. Number 2 is nice but I'd be willing to give it up if there are sufficient other benefits (elimination of code duplication being the most obvious benefit). Number 3, in my opinion, can change from "super easy" to "easy," where "super easy" is "copy some files to bindir" and "easy" is "copy some files to bindir and copy some other files to the platform-specific library directory."

I haven't decided on what I'll do yet, but regarding the manual-environment approach:

Replying to dcf:

Having to set PYTHONPATH is annoying but probably tolerable. I would rather not have an init-dev-env script; let's just document that you have to set PYTHONPATH. (The name init-dev-env is mostly unknown to Google--is there some other name commonly used by other projects?)

The abstract idea here would be {launch a shell with,export} an environment suitable for manually {running,debugging} {from source,built-but-not-installed components,tests}.

Or in simpler terms, do everything you need to do to run tests, only instead of building-and-running the tests, you run an interactive shell (or some other program) instead.

It sounds like something that's just on the wrong side of the boundary of "popular enough to be standardised". I did find a few examples though:

"init-dev-env": https://github.com/karma-runner/karma/tree/master/scripts
"devshell": http://www.openembedded.org/wiki/Devshell

I'll look into the topic a bit more. Documenting is OK for now (that is if I choose this path), and if we ever need to do more than just "set the environment" we can consider turning it into a script then.

Replying to dcf:

We have a setup.py in the top level already, only for py2exe purposes. I would be fine with adding library installation code to it, if that's convenient.

We would need to have two separate setup.py scripts (e.g setup.py and setup-common.py), there is no easy way to have the same script act on two distinct source packages. However, I think I have found the appropriate solution, this one below:

Replying to dcf:

Having to set PYTHONPATH is annoying but probably tolerable. I would rather not have an init-dev-env script; let's just document that you have to set PYTHONPATH. (The name init-dev-env is mostly unknown to Google--is there some other name commonly used by other projects?)

setuptools supports installing a package in development mode which hardlinks a dummy system lib to the current directory. This process is reversible.

flashproxy$ ./flashproxy-client
Traceback (most recent call last):
  File "./flashproxy-client", line 23, in <module>
    from flashproxy.util import parse_addr_spec, format_addr
ImportError: No module named flashproxy.util
1

flashproxy$ ( cd common/ && sudo python setup.py develop )
running develop
[..]
Creating /usr/local/lib/python2.7/dist-packages/flashproxy-common.egg-link (link to .)
[..]

flashproxy$ ./flashproxy-client
2013-09-26 17:18:53 VERSION-ERROR no-version
VERSION-ERROR no-version

No TOR_PT_MANAGED_TRANSPORT_VER found in environment.
If you are running flashproxy-client from the command line and not from
a ClientTransportPlugin configuration line, you must use the --external
option.
1

flashproxy$ ( cd common/ && sudo python setup.py develop --uninstall )
running develop
[..]
Removing /usr/local/lib/python2.7/dist-packages/flashproxy-common.egg-link (link to .)
[..]

flashproxy$ ./flashproxy-client
Traceback (most recent call last):
  File "./flashproxy-client", line 23, in <module>
    from flashproxy.util import parse_addr_spec, format_addr
ImportError: No module named flashproxy.util
1

This is even better than setting your environment, because it's persistent across shell sessions. The only annoyance is you do have to cd common first, but I can document this.

To explain a bit more: I'd like to avoid having two setup-*.py scripts in the same dir. Theoretically, it could work, if we specify different build directories per script, but this is very fiddly to set up and I'd like to avoid this headache. Also, after doing some tests, the install target always builds in build/ and you can't override this with --build-base nor --bdist-base as you can for the build and bdist targets.

Practically, we won't actually run into this situation since setup.py only runs py2exe. So if you insist we could go down this path, but I would then insist to rename setup.py to setup-client-exe.py, to enforce the fact that these two scripts have distinct functionalities that have no chance of stepping on each other's toes.

OTOH, one of the other advantages of the common/ and setup.py develop approach is that once you run it, it also works for the facilitator without having to do anything extra. This is my recommended approach and I intend to start work on it if I don't hear back soon.

I have some code. Haven't tested py2exe yet but fixing it (on any broken branches) would be a small tweak:

flashproxy/ in top-level, two "setup.py" scripts in same dir: https://github.com/infinity0/flashproxy/compare/master...common-toplevel

common/flashproxy with instructions on running from source checkout: https://github.com/infinity0/flashproxy/compare/master...common-sub

also a separate commit for versioneer: https://github.com/infinity0/flashproxy/compare/versioneer^...versioneer

Some of the above share commits, see diagram here: https://github.com/infinity0/flashproxy/network

Replying to infinity0:

I have some code. Haven't tested py2exe yet but fixing it (on any broken branches) would be a small tweak:

flashproxy/ in top-level, two "setup.py" scripts in same dir: https://github.com/infinity0/flashproxy/compare/master...common-toplevel

I like this one the best. obfsproxy also has both setup.py and setup_py2exe.py--we could use the same names.

common/flashproxy with instructions on running from source checkout: https://github.com/infinity0/flashproxy/compare/master...common-sub

This one is acceptable too. I'll let you make the decision.

Aside: I think I will stick with setting PYTHONPATH. At a first (and second) glance, this setuptools develop mode seems unpredictable. You're supposed to do something as root just so you can run code in a directory you own? If you have two separate source directories, one of them will mysteriously fail until you realize it's using modules hardlinked from the other?

This is ready to merge.

https://github.com/infinity0/flashproxy/compare/master...common-toplevel

I've settled on doing it the top-level way and added a big descriptive docstring to the top of setup-common.py explaining the potential future issues with this.

I've also refactored Makefile a bit - Makefile retains the original dist/dist-exe targets (with exactly the same contents as before - I verified this), and Makefile.client deals with the client specifics. Although a bit more code, it is less entangled than before. In particular, Makefile.client adheres to the GNU conventions and "looks like" a normal source-package Makefile; and Makefile is a more coherent "composition" of the other per-component build scripts. (For example, now it is structurally possible to build dist and dist-exe in a single command line - that is if py2exe worked natively on Linux.)

edit: just added another minor tweak so now you can build both dist and dist-exe at once, using `make PYTHON_W32="wine python" dist dist-exe". Yes it's crazy but should work in principle.

Replying to infinity0:

This is ready to merge.

https://github.com/infinity0/flashproxy/compare/master...common-toplevel

I've settled on doing it the top-level way and added a big descriptive docstring to the top of setup-common.py explaining the potential future issues with this.

Looks fine. Let's merge it once #7167 (moved) and #10006 (moved) are done, when we have some comfortable time to make any other changes in e.g. the bundle build system. In fact, it might be a good time to cut straight to #9444 (closed).

I just tagged flashproxy-1.4. Feel free to merge this.

If you would, check if our bundle instructions still work, and commit any small fixes if necessary. (I would guess that there needs to be at least another new command to copy the new flashproxy Python package.)

Merged to git.tpo, thanks!

I realised that now the version is defined in 4 separate places including the ChangeLog, so wrote a commit to get this automatically from the ChangeLog - it's a bit hacky but it's very short and works even for your in-between-versions changelogs where the "Changes in version x" aren't on the first line. If you like it, I'll push that to git.tpo too.

I had to change update pt-bundle too, and have pushed those updates to a separate branch - if you like that, I'll merge/fast-forward that into master as well.

Replying to infinity0:

Merged to git.tpo, thanks!

I realised that now the version is defined in 4 separate places including the ChangeLog, so wrote a commit to get this automatically from the ChangeLog - it's a bit hacky but it's very short and works even for your in-between-versions changelogs where the "Changes in version x" aren't on the first line. If you like it, I'll push that to git.tpo too.

I don't think we should grep the version from ChangeLog. Getting it from a git tag like versioneer does sounds better. Or, manually updating in three places also sounds acceptable. (I don't think of ChangeLog as being something that has to be kept in sync, because nothing breaks if it's not.)

Maybe you could create a VERSION file with the contents

VERSION = "1.4"

which happens to parse as both Make and Python, and then include/import it from Makefile and the setup.py scripts. If the quotes cause a problem in Makefile, maybe follow the include with a

VERSION := $(subst ",,$(VERSION))

I had to change update pt-bundle too, and have pushed those updates to a separate branch - if you like that, I'll merge/fast-forward that into master as well.

That looks good, go ahead and apply it to master.

Replying to dcf:

I don't think we should grep the version from ChangeLog. [..] (I don't think of ChangeLog as being something that has to be kept in sync, because nothing breaks if it's not.)

There's some precedence for this - Debian packaging only defines the version from debian/changelog and nowhere else, and debian build tools read this as the canonical place to get the version string. (It's doesn't matter if you don't sync it, as it will just use the old version automatically, nothing "breaks" as such, and it's a good habit to update the ChangeLog when appropriate.)

I could reduce the code duplication of the sed script by, instead of having a VERSION file as you suggest, turn it into a version.sh that contains the sed script and call it from the Makefile/setup.py. How's that? If not, no worries, I'll just do the VERSION file.

Committed and pushed as discussed on IRC. Closing!

Trac:
Status: needs_review to closed
Resolution: N/A to fixed

closed

mentioned in issue #8092 (closed)

mentioned in issue #9349 (closed)

mentioned in issue #9668 (closed)

mentioned in issue #9739 (closed)

mentioned in issue #9975 (closed)

Reduce code duplication across client programs

Child items ...

Activity