Read Columns in Txt and Save Each Column to an Array

11. Reading and Writing Data Files: ndarrays

By Bernd Klein. Last modified: 01 February 2022.

There are many ways of reading from and writing to data files in NumPy. We will discuss the different ways and the corresponding functions in this chapter:

  • savetxt
  • loadtxt
  • tofile
  • fromfile
  • save
  • load
  • genfromtxt

Saving textfiles with savetxt


The first two functions we will cover are savetxt and loadtxt.

In the following simple example, we define an array x and save it as a textfile with savetxt:

    import numpy as np

    x = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], np.int32)

    np.savetxt("test.txt", x)

The file "test.txt" is a textfile and its content looks like this:

    $ more test.txt
    1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00
    4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00
    7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00

Attention: The above output has been created on the Linux command prompt!

It's also possible to write the array in a special format, for instance with three decimal places, or as integers that are padded on the left (with zeros in the example below) if the number has fewer than four digits. For this purpose we assign a format string to the third parameter 'fmt'. We saw in our first example that the default delimiter is a blank. We can change this behaviour by assigning a string to the parameter "delimiter". In most cases this string will consist of a single character, but it can also be a sequence of characters, like a smiley " :-) ":

    np.savetxt("test2.txt", x, fmt="%2.3f", delimiter=",")
    np.savetxt("test3.txt", x, fmt="%04d", delimiter=" :-) ")

The newly created files look like this:

    $ more test2.txt
    1.000,2.000,3.000
    4.000,5.000,6.000
    7.000,8.000,9.000
    $ more test3.txt
    0001 :-) 0002 :-) 0003
    0004 :-) 0005 :-) 0006
    0007 :-) 0008 :-) 0009

The complete syntax of savetxt looks like this:

savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ')

Parameter   Meaning
X           array_like. Data to be saved to a text file.
fmt         str or sequence of strs, optional.
            A single format (%10.5f), a sequence of formats, or a multi-format string, e.g. 'Iteration %d -- %10.5f', in which case 'delimiter' is ignored. For complex 'X', the legal options for 'fmt' are:
            a) a single specifier, "fmt='%.4e'", resulting in numbers formatted like "' (%s+%sj)' % (fmt, fmt)"
            b) a full string specifying every real and imaginary part, e.g. "' %.4e %+.4ej %.4e %+.4ej %.4e %+.4ej'" for 3 columns
            c) a list of specifiers, one per column. In this case, the real and imaginary part must have separate specifiers, e.g. "['%.3e + %.3ej', '(%.15e%+.15ej)']" for 2 columns
delimiter   A string used for separating the columns.
newline     A string (e.g. "\n", "\r\n" or ",\n") which will end a line instead of the default line ending.
header      A string that will be written at the beginning of the file.
footer      A string that will be written at the end of the file.
comments    A string that will be prepended to the 'header' and 'footer' strings, to mark them as comments. The hash tag '#' is used as the default.
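
The header, footer and comments parameters are not used in the examples above, so here is a minimal sketch of how they interact. The column names, the footer text and the use of an in-memory buffer instead of a real file are assumptions made purely for illustration:

```python
import io

import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

# Assumption for this sketch: write into an in-memory text buffer so that
# nothing is left on disk. savetxt accepts open file objects as well as names.
buffer = io.StringIO()
np.savetxt(
    buffer,
    x,
    fmt="%d",              # plain integers
    delimiter=";",
    header="a;b;c",        # hypothetical column names
    footer="end of data",  # hypothetical footer text
    comments="# ",         # prefix put in front of header and footer
)

text = buffer.getvalue()
print(text)
```

Both the header and the footer line come out prefixed with "# ", which is what lets loadtxt skip them again when the file is read back with its default comment marker.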

Loading Textfiles with loadtxt

We will now read in the file "test.txt", which we wrote in the previous subchapter:

    y = np.loadtxt("test.txt")
    print(y)

OUTPUT:

[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]]

    y = np.loadtxt("test2.txt", delimiter=",")
    print(y)

OUTPUT:

[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]]

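Nothing new happens if we read in the file in which we used a smiley as the separator. (Newer NumPy releases, 1.23 and later, accept only single-character delimiters in loadtxt, so this particular call needs an older version.)

    y = np.loadtxt("test3.txt", delimiter=" :-) ")
    print(y)

OUTPUT:

[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]]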

It's also possible to choose the columns by index:

    y = np.loadtxt("test3.txt", delimiter=" :-) ", usecols=(0, 2))
    print(y)

OUTPUT:

[[ 1.  3.]
 [ 4.  6.]
 [ 7.  9.]]
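
Since the title of this post asks for each column in its own array, the following sketch shows one possible way to do that with the unpack parameter of loadtxt, which transposes the result so that it can be unpacked column by column. The in-memory sample data and the variable names are assumptions for illustration:

```python
import io

import numpy as np

# Assumption: a small whitespace-separated table held in memory,
# standing in for a file such as "test.txt".
data = io.StringIO("1 2 3\n4 5 6\n7 8 9\n")

# unpack=True returns the transposed array, so each column can be
# assigned to its own variable.
first, second, third = np.loadtxt(data, unpack=True)

# unpack can be combined with usecols to pick only some columns.
data.seek(0)
left, right = np.loadtxt(data, usecols=(0, 2), unpack=True)

print(first)
print(right)
```

Each of the resulting names refers to a one-dimensional array holding one column of the table.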

In our next example we will read the file "times_and_temperatures.txt", which we created in the chapter on generators of our Python tutorial. Every line contains a time in the format "hh:mm:ss" and a random temperature between 10.0 and 25.0 degrees. We have to convert the time string into a float. The time will be expressed in minutes, with the seconds as a decimal fraction of a minute. We first define a function which converts "hh:mm:ss" into minutes:

    def time2float_minutes(time):
        # loadtxt may hand over a byte string, so decode it first
        if isinstance(time, bytes):
            time = time.decode()
        t = time.split(":")
        # hours * 60 + minutes + seconds / 60 (0.05 / 3 equals 1 / 60)
        minutes = float(t[0]) * 60 + float(t[1]) + float(t[2]) * 0.05 / 3
        return minutes

    for t in ["06:00:10", "06:27:45", "12:59:59"]:
        print(time2float_minutes(t))

OUTPUT:

360.1666666666667
387.75
779.9833333333333

You might have noticed that we check whether time is a byte string. The reason for this is the use of our function time2float_minutes in loadtxt in the following example. The keyword parameter converters takes a dictionary which can hold a function for a column (the key of the dictionary corresponds to the index of the column) to convert the string data of this column into a float. The string data is a byte string. That is why we had to convert it into a unicode string in our function:

    y = np.loadtxt("times_and_temperatures.txt",
                   converters={0: time2float_minutes})
    print(y)

OUTPUT:

[[  360.     20.1]
 [  361.5    16.1]
 [  363.     16.9]
 ...,
 [ 1375.5    22.5]
 [ 1377.     11.1]
 [ 1378.5    15.2]]

    # delimiter = ";" ,  # i.e. use ";" as delimiter instead of whitespace
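
The file "times_and_temperatures.txt" is not reproduced here, so the following self-contained sketch uses two made-up lines in the same layout to show the converters mechanism. The sample values and the helper name to_minutes are hypothetical. Depending on the NumPy version, the converter may receive either bytes or str, which is why the sketch accepts both:

```python
import io

import numpy as np


def to_minutes(value):
    """Convert 'hh:mm:ss' (bytes or str) into minutes as a float."""
    if isinstance(value, bytes):
        value = value.decode()
    hours, minutes, seconds = value.split(":")
    return float(hours) * 60 + float(minutes) + float(seconds) / 60


# Assumption: two invented sample lines, a time followed by a temperature.
sample = io.StringIO("06:00:00 20.1\n06:01:30 16.1\n")

# Column 0 is passed through to_minutes, column 1 is parsed as a float.
y = np.loadtxt(sample, converters={0: to_minutes})
print(y)
```

Only column 0 goes through the converter. Every other column is parsed with the default float conversion.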

tofile

tofile is a function to write the content of an array to a file, either in binary format, which is the default, or in text format.

A.tofile(fid, sep="", format="%s")

The data of the ndarray A is always written in 'C' order, regardless of the order of A.

The data file written by this method can be reloaded with the function fromfile().

Parameter   Meaning
fid         can be either an open file object, or a string containing a filename.
sep         The string 'sep' defines the separator between array items for text output. If it is empty (''), a binary file is written, equivalent to file.write(a.tobytes()) (a.tostring() in older NumPy versions).
format      Format string for text file output. Each entry in the array is formatted to text by first converting it to the closest Python type, and then using 'format' % item.

Remark:

Information on endianness and precision is lost. Therefore it may not be a good idea to use the function to archive data or transport data between machines with different endianness. Some of these problems can be overcome by outputting the data as text files, at the expense of speed and file size.
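
As a rough illustration of the text alternative mentioned in the remark, the sketch below writes an array with a non-empty sep and reads it back with fromfile. The file name, the values and the temporary directory are assumptions chosen for this example:

```python
import os
import tempfile

import numpy as np

values = np.array([1.5, 2.5, 3.5])

# Assumption: a throwaway directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "values.txt")

    # A non-empty sep switches tofile from binary to text output.
    values.tofile(path, sep=",", format="%.2f")

    with open(path) as fh:
        text = fh.read()

    # The same separator is needed to parse the text file again.
    restored = np.fromfile(path, sep=",")

print(text)
print(restored)
```

The text file is readable on any platform, but only as many digits survive as the format string writes out.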

    # structured dtype: a nested 'time' record (min, sec) plus a 'temp' float
    dt = np.dtype([('time', [('min', int), ('sec', int)]),
                   ('temp', float)])
    x = np.zeros((1,), dtype=dt)
    x['time']['min'] = 10
    x['temp'] = 98.25
    print(x)

    # the with statement makes sure the file is closed after writing
    with open("test6.txt", "wb") as fh:
        x.tofile(fh)

OUTPUT:

[((10, 0), 98.25)]


fromfile

fromfile is used to read in data which has been written with the tofile function. It's possible to read binary data if the data type is known. It's also possible to parse simply formatted text files. The data from the file is turned into an array.

The general syntax looks like this:

numpy.fromfile(file, dtype=float, count=-1, sep='')

Parameter   Meaning
file        'file' can be either a file object or the name of the file to read.
dtype       defines the data type of the array which will be constructed from the file data. For binary files, it is used to determine the size and byte-order of the items in the file.
count       defines the number of items which will be read. -1 means all items will be read.
sep         The string 'sep' defines the separator between the items, if the file is a text file. If it is empty (''), the file will be treated as a binary file. A space (" ") in a separator matches zero or more whitespace characters. A separator consisting solely of spaces has to match at least one whitespace.

    # test4.txt holds 50 int32 values (it is written in the next example);
    # reading it with the structured dtype dt reinterprets the raw bytes
    fh = open("test4.txt", "rb")
    np.fromfile(fh, dtype=dt)

OUTPUT:

array([((4294967296, 12884901890), 1.0609978957e-313),
       ((30064771078, 38654705672), 2.33419537056e-313),
       ((55834574860, 64424509454), 3.60739284543e-313),
       ((81604378642, 90194313236), 4.8805903203e-313),
       ((107374182424, 115964117018), 6.1537877952e-313),
       ((133143986206, 141733920800), 7.42698527006e-313),
       ((158913789988, 167503724582), 8.70018274493e-313),
       ((184683593770, 193273528364), 9.9733802198e-313)],
      dtype=[('time', [('min', '<i8'), ('sec', '<i8')]), ('temp', '<f8')])

    import numpy as np
    import os

    # platform dependent: difference between Linux and Windows
    #data = np.arange(50, dtype=np.int)
    data = np.arange(50, dtype=np.int32)
    data.tofile("test4.txt")

    fh = open("test4.txt", "rb")
    # skip the first 32 items: 32 items * 4 bytes = 128 bytes
    fh.seek(128, os.SEEK_SET)

    x = np.fromfile(fh, dtype=np.int32)
    print(x)

OUTPUT:

[32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]            

Attention:

It can cause problems to use tofile and fromfile for data storage, because the binary files generated are not platform independent. There is no byte-order or data-type information saved by tofile. Data can be stored in the platform independent .npy format using save and load instead.
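
To make the warning concrete, the following sketch writes four 32-bit integers and then reads the same bytes back with three different dtypes. The file name and the temporary directory are assumptions. The byte order is spelled out explicitly ('<' for little-endian, '>' for big-endian) so that the result does not depend on the machine running it:

```python
import os
import tempfile

import numpy as np

data = np.arange(4, dtype="<i4")  # little-endian 32-bit integers 0..3

# Assumption: a throwaway directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw.bin")
    data.tofile(path)

    same = np.fromfile(path, dtype="<i4")      # correct dtype and byte order
    swapped = np.fromfile(path, dtype=">i4")   # wrong byte order
    as_int64 = np.fromfile(path, dtype="<i8")  # wrong item size

print(same)      # the original values
print(swapped)   # the same bytes read in reverse order per item
print(as_int64)  # pairs of 32-bit items fused into 64-bit integers
```

None of the three calls raises an error, because the file contains nothing but the raw bytes. Choosing the right dtype is entirely up to the reader of the file.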

Best Practice to Load and Save Data

The recommended way to store and load data with NumPy in Python is to use load and save. We also use a temporary file in the following example:

    import numpy as np

    # x still holds the array from the previous example
    print(x)

    from tempfile import TemporaryFile

    outfile = TemporaryFile()

    x = np.arange(10)
    np.save(outfile, x)

    outfile.seek(0)  # Only needed here to simulate closing & reopening file
    np.load(outfile)

OUTPUT:

[32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
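
In contrast to tofile, a file written by save records the dtype and the shape of the array. A minimal sketch, assuming a file name in a throwaway directory rather than an open file object:

```python
import os
import tempfile

import numpy as np

original = np.arange(6, dtype=np.int16).reshape(2, 3)

# Assumption: a throwaway directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")
    np.save(path, original)

    # No dtype or shape has to be passed: both are stored in the file.
    restored = np.load(path)

print(restored.dtype, restored.shape)
print(restored)
```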

and yet another way: genfromtxt

There is yet another way to read tabular input from a file to create arrays. As the name implies, the input file is supposed to be a text file. The text file can also be compressed. genfromtxt can process the archive formats gzip and bzip2. The type of the archive is determined by the extension of the file, i.e. '.gz' for gzip and '.bz2' for bzip2.
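
A small sketch of the gzip case: the file is written with the gzip module from the standard library and then passed to genfromtxt by name. The file name and its contents are assumptions for illustration:

```python
import gzip
import os
import tempfile

import numpy as np

# Assumption: a throwaway directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "values.txt.gz")

    # Write a small comma-separated table as gzip-compressed text.
    with gzip.open(path, "wt") as fh:
        fh.write("1,2,3\n4,5,6\n")

    # The '.gz' extension tells genfromtxt to decompress while reading.
    z = np.genfromtxt(path, delimiter=",")

print(z)
```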

genfromtxt is slower than loadtxt, but it is capable of coping with missing data. It processes the file data in two passes. First it converts the lines of the file into strings. Then it converts the strings into the requested data type. loadtxt, on the other hand, works in one go, which is the reason why it is faster.
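
The handling of missing data can be sketched as follows. The sample table with two empty fields is made up for this example, and filling_values is just one of several genfromtxt options that deal with gaps:

```python
import io

import numpy as np

# Assumption: an invented comma-separated table with two empty fields.
text = io.StringIO("1,2,3\n4,,6\n7,8,\n")

# By default an empty field in a float column becomes nan.
with_nan = np.genfromtxt(text, delimiter=",")

# filling_values replaces missing entries with a value of our choice.
text.seek(0)
with_zero = np.genfromtxt(text, delimiter=",", filling_values=0)

print(with_nan)
print(with_zero)
```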

recfromcsv(fname, **kwargs)

This is not really another way to read in csv data. 'recfromcsv' is basically a shortcut for

np.genfromtxt(filename, delimiter=",", dtype=None)
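
The following sketch calls genfromtxt directly with the two arguments quoted above. The extra names=True, the column names and the in-memory data are assumptions added for this example. With dtype=None the type of each column is guessed from its contents:

```python
import io

import numpy as np

# Assumption: an invented csv table with a header line.
csv_text = io.StringIO("id,temp\n1,20.1\n2,16.1\n")

# names=True (added here for illustration) takes the field names from
# the first line; dtype=None lets genfromtxt guess a type per column.
table = np.genfromtxt(csv_text, delimiter=",", dtype=None, names=True)

print(table.dtype.names)
print(table["temp"])
```

The result is a structured array, so a whole column can be selected by its field name.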


Source: https://python-course.eu/numerical-programming/reading-and-writing-data-files-ndarrays.php
