Friday, April 8, 2011

Python - How to check if a file is used by another application?

I want to open a file which is periodically written to by another application. This application cannot be modified. I'd therefore like to open the file only when I know it is not being written to by another application.

Is there a pythonic way to do this? Otherwise, how do I achieve this in Unix and Windows?

edit: I'll try and clarify. Is there a way to check if the current file has been opened by another application?

I'd like to start with this question. Whether those other applications read or write is irrelevant for now.

I realize it is probably OS dependent, so this may not really be python related right now.

From stackoverflow
  • Does your Python script need to open the file for writing or for reading? Is the legacy application opening and closing the file between writes, or does it keep it open?

    It is extremely important that we understand what the legacy application is doing, and what your python script is attempting to achieve.

    This area of functionality is highly OS-dependent, and the fact that you have no control over the legacy application only makes things harder unfortunately. Whether there is a pythonic or non-pythonic way of doing this will probably be the least of your concerns - the hard question will be whether what you are trying to achieve will be possible at all.


    UPDATE

    OK, so knowing (from your comment) that:

    the legacy application is opening and closing the file every X minutes, but I do not want to assume that at t = t_0 + n*X + eps it already closed the file.

    then the problem's parameters are changed. It can actually be done in an OS-independent way given a few assumptions, or as a combination of OS-dependent and OS-independent techniques. :)

    1. OS-independent way: this works if it is safe to assume that the legacy application keeps the file open for at most some known amount of time, say T seconds (e.g. it opens the file, performs one write, then closes the file), and re-opens it more or less every X seconds, where X is larger than 2*T. In that case:
      • stat the file
      • subtract file's modification time from now(), yielding D
      • if T <= D < X then open the file and do what you need with it
      • This may be safe enough for your application. Safety increases as T/X decreases. On *nix you may have to double check /etc/ntp.conf for proper time-stepping vs. slew configuration (see the tinker command). For Windows see MSDN
    2. Windows: in addition to (or in lieu of) the OS-independent method above, you may attempt to use either:
      • sharing (locking): this assumes that the legacy program also opens the file in shared mode (usually the default in Windows apps); moreover, if your application acquires the lock just as the legacy application is attempting the same (race condition), the legacy application will fail.
        • this is extremely intrusive and error prone. Unless both the new application and the legacy application need synchronized access for writing to the same file and you are willing to handle the possibility of the legacy application being denied opening of the file, do not use this method.
      • attempting to find out what files are open in the legacy application, using the same techniques as ProcessExplorer (the equivalent of *nix's lsof)
        • you are even more vulnerable to race conditions than with the OS-independent technique
    3. Linux/etc.: in addition to (or in lieu of) the OS-independent method above, you may attempt to use the same technique as lsof or, on some systems, simply check which file the symbolic link /proc/<pid>/fd/<fdes> points to
      • you are even more vulnerable to race conditions than with the OS-independent technique
      • it is highly unlikely that the legacy application uses locking, but if it does, locking is not a real option unless the legacy application can handle a locked file gracefully (by blocking, not by failing) and your own application can guarantee that the file will not remain locked for long, which would block the legacy application for extended periods of time.
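
The OS-independent check in item 1 could be sketched in Python roughly as follows. This is only an illustration under the assumptions stated there: T and X are known, the writer updates the file's modification time on each write, and the system clock is not stepped in between. The function and parameter names are made up for this example.

```python
import os
import time


def seconds_since_modification(path, now=None):
    """Return D: seconds elapsed since the file was last modified."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime


def probably_idle(path, t_max_open, x_period, now=None):
    """Return True if T <= D < X.

    t_max_open is T (longest time the writer keeps the file open) and
    x_period is X (roughly how often the writer reopens the file).
    Both values are assumptions the caller has to supply.
    """
    d = seconds_since_modification(path, now)
    return t_max_open <= d < x_period
```

A call such as `probably_idle("data.log", t_max_open=5, x_period=60)` only says the file is probably not being written right now. If the work on the file takes a while, a tighter upper bound (for example X - T) would leave more margin before the next write.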

    UPDATE 2

    If you favour the "check whether the legacy application has the file open" approach (intrusive and prone to race conditions), you can solve that race condition by:

    1. checking whether the legacy application has the file open (a la lsof or ProcessExplorer)
    2. suspending the legacy application process
    3. repeating the check in step 1 to confirm that the legacy application did not open the file between steps 1 and 2; delay and restart at step 1 if so, otherwise proceed to step 4
    4. doing your business on the file -- ideally simply renaming it for subsequent, independent processing in order to keep the legacy application suspended for a minimal amount of time
    5. resuming the legacy application process
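
On Linux, the five steps above might look something like the sketch below. It assumes you already know the legacy application's PID, that you are allowed to signal it and read its /proc/<pid>/fd directory, and that stopping it briefly with SIGSTOP is acceptable. The function names and the `claimed_path` parameter are hypothetical.

```python
import os
import signal


def pid_has_file_open(pid, path):
    """Linux only: True if a /proc/<pid>/fd/* symlink points at path."""
    target = os.path.realpath(path)
    fd_dir = f"/proc/{pid}/fd"
    try:
        fds = os.listdir(fd_dir)
    except FileNotFoundError:
        return False  # the process no longer exists
    for fd in fds:
        try:
            if os.readlink(os.path.join(fd_dir, fd)) == target:
                return True
        except FileNotFoundError:
            continue  # descriptor was closed while we were scanning
    return False


def take_file_while_suspended(pid, path, claimed_path):
    """Check, suspend, re-check, rename, resume.

    Returns True if the file was renamed to claimed_path, or False if
    the process had it open (the caller should delay and retry).
    """
    if pid_has_file_open(pid, path):        # step 1
        return False
    os.kill(pid, signal.SIGSTOP)            # step 2: suspend
    try:
        if pid_has_file_open(pid, path):    # step 3: re-check
            return False
        os.rename(path, claimed_path)       # step 4: keep it short
        return True
    finally:
        os.kill(pid, signal.SIGCONT)        # step 5: always resume
```

The `finally` clause is there so the legacy process is resumed even if the rename raises.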

    Cheers, V.

    Gilad Naor : The legacy application is opening and closing the file every X minutes, but I do not want to assume that at t = t_0 + n*X + eps it already closed the file.
  • Unix does not have file locking as a default. The best suggestion I have for a Unix environment would be to look at the sources for the lsof command. It has deep knowledge about which processes have which files open. You could use that as the basis of your solution. Here are the Ubuntu sources for lsof.

    Gilad Naor : This looks like it may be a good solution on Unix. I can just parse the output of lsof for the file I need before accessing it. Not completely safe without a lock, but I can handle correctness only in 99.99% of the cases. I just need a solution for Windows as well.
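
Parsing lsof output from Python, as described in the comment above, could be sketched like this. It assumes lsof is installed and on the PATH, and relies on its `-t` (terse) option, which prints only process IDs, one per line. The helper names are invented for this example.

```python
import subprocess


def parse_lsof_terse(output):
    """Parse `lsof -t` output: one PID per line."""
    return [int(tok) for tok in output.split() if tok.isdigit()]


def pids_with_file_open(path):
    """Return the PIDs that lsof reports as having path open.

    lsof exits with status 1 both when nothing has the file open and
    on some errors, so the exit status is deliberately not checked.
    """
    result = subprocess.run(
        ["lsof", "-t", "--", path],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_lsof_terse(result.stdout)
```

An empty list means lsof saw no process with the file open at the moment it ran. Another process can still open the file immediately afterwards.
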
  • You can extend the following approach:

    http://mail.python.org/pipermail/python-list/2004-June/267512.html

    http://answers.google.com/answers/threadview/id/17003.html

  • One thing I've done is have python very temporarily rename the file. If we're able to rename it, then no other process is using it. I only tested this on Windows.

    John Fouhy : The phrase "race condition" springs to mind :-/
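
For illustration, the rename trick might be written as below. It is a sketch that only makes sense on Windows, where renaming a file that another process holds open typically fails with a PermissionError. On Unix the rename succeeds whether or not the file is open, so the result says nothing there. The `.probe` suffix is an arbitrary choice for this example.

```python
import os


def can_rename(path):
    """Rename path to a temporary name and back.

    Returns False if the first rename is refused with PermissionError
    (on Windows this usually means another process has the file open).
    """
    probe = path + ".probe"  # hypothetical temporary name
    if os.path.exists(probe):
        raise FileExistsError(probe)  # refuse to clobber another file
    try:
        os.rename(path, probe)
    except PermissionError:
        return False
    os.rename(probe, path)  # put the original name back
    return True
```

As the comment above points out, this is subject to a race: the other process can open the file right after the check, or fail to find it during the brief moment it is renamed.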
