internationalization - Python's os.path choking on Hebrew filenames


I am writing a script that has to move some files around, but unfortunately os.path does not seem to play very well with internationalization. When I have files named in Hebrew, there are problems. Here is a screenshot of the contents of a directory:

Now consider this code, which goes over the files in this directory:

  import os

  files = os.listdir('test_source')
  for f in files:
      pf = os.path.join('test_source', f)
      print pf, os.path.exists(pf)

Output is:

  test_source\ex True
  test_source\joe True
  test_source\mie.txt True
  test_source\__()'''.txt True
  test_source\????.txt False

Notice how os.path.exists thinks that the Hebrew-named file does not even exist? How can I fix this?

ActivePython 2.5.2 on Windows XP Home SP2

Hmm, it appears that when os.listdir is supplied with a Unicode string, it works:

  files = os.listdir(u'test_source')
  for f in files:
      pf = os.path.join(u'test_source', f)
      # Encode for display only; the Unicode path itself is passed to os.path.exists
      print pf.encode('ascii', 'replace'), os.path.exists(pf)

===>

  test_source\ex True
  test_source\joe True
  test_source\mie.txt True
  test_source\__()'''.txt True
  test_source\????.txt True

Here are some important observations:
  • Windows XP (like all NT derivatives) stores all filenames in Unicode.
  • os.listdir (and similar functions, such as os.walk) must be passed a Unicode string in order to work correctly with Unicode paths. Here is a quote from the link mentioned above:

os.listdir(), which returns filenames, raises an issue: should it return the Unicode version of filenames, or should it return 8-bit strings containing the encoded versions? os.listdir() will do both, depending on whether you provided the directory path as an 8-bit string or a Unicode string. If you pass a Unicode string as the path, filenames will be decoded using the filesystem's encoding and a list of Unicode strings will be returned, while passing an 8-bit path will return the 8-bit versions of the filenames.

  • And finally, print wants an ASCII string, not Unicode, so the path has to be encoded to ASCII.
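
The observations above can be tried out with a small self-contained sketch. It is not the original script: it is written in Python 3 syntax so that it runs on a current interpreter, where the "8-bit string" of the quote corresponds to bytes and the Unicode string to str. The helper names list_with_existence, ascii_safe and demo, and the Hebrew file name, are hypothetical and chosen only for illustration.

```python
import os
import tempfile

# Hypothetical file name: four Hebrew letters plus an extension.
HEBREW_NAME = "\u05e9\u05dc\u05d5\u05dd.txt"


def list_with_existence(directory):
    """Return (path, exists) pairs; the path type follows the type of `directory`."""
    pairs = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        pairs.append((path, os.path.exists(path)))
    return pairs


def ascii_safe(path):
    """Render a text path using ASCII only; any other character becomes '?'."""
    return path.encode("ascii", "replace").decode("ascii")


def demo():
    """List a throwaway directory once with a text path and once with a bytes path."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("joe", HEBREW_NAME):
            open(os.path.join(root, name), "w").close()
        # A text (Unicode) directory path yields text file names.
        text_pairs = list_with_existence(root)
        # A bytes (8-bit) directory path yields bytes file names.
        byte_pairs = list_with_existence(os.fsencode(root))
    return text_pairs, byte_pairs


if __name__ == "__main__":
    for path, exists in demo()[0]:
        print(ascii_safe(path), exists)
```

Here ascii_safe plays the role of pf.encode('ascii', 'replace') in the answer: it only changes what is displayed, while the unmodified Unicode path is what gets passed to os.path.exists.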
