Loading data from csv file

Linuxias 2017. 6. 3. 15:38

1. numpy - loadtxt

Load data from a text file.

Each row in the text file must have the same number of values.
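Here is a minimal sketch of reading a four-column CSV with loadtxt. It assumes the same layout as the TensorFlow example later in this post: three feature columns followed by one label column. An in-memory io.StringIO stands in for the file, and the values are made up. With a real file you would pass its path instead. delimiter="," is needed because loadtxt splits on whitespace by default.

```python
import io

import numpy as np

# Stand-in for a CSV file on disk (made-up values); loadtxt also accepts a
# path such as "data.csv" (hypothetical file name).
csv_text = io.StringIO(
    "73,80,75,152\n"
    "93,88,93,185\n"
    "89,91,90,180\n"
)

# delimiter="," is required for CSV; the default separator is whitespace.
xy = np.loadtxt(csv_text, delimiter=",", dtype=np.float32)

# Every row has the same number of values, so the result is a 2-D array.
x_data = xy[:, 0:-1]  # first three columns: features
y_data = xy[:, [-1]]  # last column, kept 2-D: label

print(xy.shape, x_data.shape, y_data.shape)  # (3, 4) (3, 3) (3, 1)
```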

Parameters:

fname : file, str, or pathlib.Path

File, filename, or generator to read. If the filename extension is .gz or .bz2, the file is first decompressed. Note that generators should return byte strings for Python 3k.

dtype : data-type, optional

Data-type of the resulting array; default: float. If this is a structured data-type, the resulting array will be 1-dimensional, and each row will be interpreted as an element of the array. In this case, the number of columns used must match the number of fields in the data-type.

comments : str or sequence, optional

The characters or list of characters used to indicate the start of a comment; default: ‘#’.

delimiter : str, optional

The string used to separate values. By default, this is any whitespace.

converters : dict, optional

A dictionary mapping column number to a function that will convert that column to a float. E.g., if column 0 is a date string: converters = {0: datestr2num}. Converters can also be used to provide a default value for missing data (but see also genfromtxt): converters = {3: lambda s: float(s.strip() or 0)}. Default: None.

skiprows : int, optional

Skip the first skiprows lines; default: 0.

usecols : int or sequence, optional

Which columns to read, with 0 being the first. For example, usecols = (1,4,5) will extract the 2nd, 5th and 6th columns. The default, None, results in all columns being read.
New in version 1.11.0: when a single column has to be read, it is possible to use an integer instead of a tuple. E.g., usecols = 3 reads the fourth column the same way as usecols = (3,) would.

unpack : bool, optional

If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...). When used with a structured data-type, arrays are returned for each field. Default is False.

ndmin : int, optional

The returned array will have at least ndmin dimensions. Otherwise mono-dimensional axes will be squeezed. Legal values: 0 (default), 1 or 2.
New in version 1.6.0.
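The following sketch shows how several of the parameters above combine on one small CSV: skiprows to skip a header, a converters entry that fills an empty field with 0 (the lambda from the converters description), an integer usecols, and unpack. The CSV text is made up and held in an io.StringIO rather than a real file.

```python
import io

import numpy as np

# Made-up CSV: a header line, then two rows; the second row's column 1 is empty.
text = "q1,q2,q3,final\n73,80,75,152\n93,,93,185\n"

# Converter from the parameter description: empty field in column 1 -> 0.0.
fill_missing = {1: lambda s: float(s.strip() or 0)}

# skiprows=1 skips the header line; delimiter="," because this is a CSV.
data = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1,
                  converters=fill_missing)
# data[1, 1] is 0.0 because the converter filled the empty field.

# usecols=3 (an int, NumPy >= 1.11) reads only the fourth column, as a 1-D array.
finals = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1, usecols=3)

# unpack=True transposes the result, so each column lands in its own variable.
q1, q2, q3, final = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1,
                               converters=fill_missing, unpack=True)
```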

Returns:

out : ndarray

Data read from the text file.
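The shape of out depends on the input and on ndmin. This matters when later code slices columns, as in xy[:, 0:-1]. The sketch below uses made-up one-row and one-column inputs in io.StringIO to show the squeezing described under ndmin.

```python
import io

import numpy as np

# Made-up inputs: a single CSV row and a single CSV column.
one_row = "1,2,3\n"
one_col = "5\n6\n7\n"

# Default ndmin=0: singleton dimensions are squeezed away.
row_default = np.loadtxt(io.StringIO(one_row), delimiter=",")
col_default = np.loadtxt(io.StringIO(one_col), delimiter=",")

# ndmin=2 keeps a 2-D result, so column slicing still works
# even when a file happens to contain only one row.
row_2d = np.loadtxt(io.StringIO(one_row), delimiter=",", ndmin=2)
col_2d = np.loadtxt(io.StringIO(one_col), delimiter=",", ndmin=2)

print(row_default.shape, col_default.shape)  # (3,) (3,)
print(row_2d.shape, col_2d.shape)            # (1, 3) (3, 1)
```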

2. Using TensorFlow

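import tensorflow as tf  # TensorFlow 1.x API

# Queue of CSV file names, read in the listed order (shuffle=False).
filename_queue = tf.train.string_input_producer(
    ['file1.csv', 'file2.csv', 'file3.csv'],  # .. (more files as needed)
    shuffle=False, name='filename_queue')

# Create a reader that returns the files one line (key, value) at a time.
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default value (which also fixes the type) for each column when a field is empty.
# The CSV files here have 4 columns, so all 4 are read as floats.
record_defaults = [[0.], [0.], [0.], [0.]]
xy = tf.decode_csv(value, record_defaults=record_defaults)

# Collect rows into batches of 10 with tf.train.batch:
# the first 3 columns are the input x, the last column is the label y.
train_x_batch, train_y_batch = \
    tf.train.batch([xy[0:-1], xy[-1:]], batch_size=10)

# placeholders for a tensor that will be always fed.
X = tf.placeholder(tf.float32, shape=[None, 3])
Y = tf.placeholder(tf.float32, shape=[None, 1])

# The queues only produce data after the queue runners are started in a session.
sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

# Fetch one batch; feed it to X and Y in a training step.
x_batch, y_batch = sess.run([train_x_batch, train_y_batch])

coord.request_stop()
coord.join(threads)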
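The queue-based calls in the TensorFlow example (tf.train.string_input_producer, tf.TextLineReader, tf.train.batch) belong to the TensorFlow 1.x API. In TensorFlow 2.x they are only reachable through tf.compat.v1, and tf.data is the recommended input pipeline. To check the same steps without TensorFlow, the rough sketch below uses plain NumPy instead. It reads several CSV sources in order, fills empty fields with per-column defaults (similar in spirit to record_defaults), splits off the last column as the label, and yields batches of 10. csv_batches is a hypothetical helper, not a TensorFlow API, and the in-memory files and values are made up. Unlike string_input_producer with num_epochs unset, which keeps cycling through the file list, this sketch makes a single pass and drops a final partial batch.

```python
import io
from typing import Iterable, Iterator, List, TextIO, Tuple

import numpy as np


def csv_batches(
    files: Iterable[TextIO],
    defaults: List[float],
    batch_size: int = 10,
) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Hypothetical helper: yield (x, y) batches from CSV sources in order.

    Empty fields become the per-column default, the last column is y,
    and a final batch smaller than batch_size is dropped.
    """
    # One converter per column; d=d binds each default at definition time.
    converters = {
        i: (lambda s, d=d: float(s.strip() or d)) for i, d in enumerate(defaults)
    }
    # Read each source in the given order (like shuffle=False) and stack the rows.
    xy = np.concatenate(
        [np.loadtxt(f, delimiter=",", converters=converters, ndmin=2) for f in files]
    )
    if xy.shape[1] != len(defaults):
        raise ValueError("number of columns does not match len(defaults)")
    for start in range(0, len(xy) - batch_size + 1, batch_size):
        batch = xy[start:start + batch_size]
        yield batch[:, 0:-1], batch[:, -1:]


# Two made-up in-memory "files" with 4 columns; file2 leaves column 1 empty.
file1 = io.StringIO("".join(f"{i},{i},{i},{3 * i}\n" for i in range(12)))
file2 = io.StringIO("".join(f"{i},,{i},{2 * i}\n" for i in range(8)))

batches = list(csv_batches([file1, file2], defaults=[0.0, 0.0, 0.0, 0.0]))
x0, y0 = batches[0]
print(len(batches), x0.shape, y0.shape)  # 2 (10, 3) (10, 1)
```

Loading everything into memory first keeps the sketch short. Files too large for memory would need a streaming reader instead.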

References

https://www.youtube.com/watch?v=o2q4QNnoShY

https://www.tensorflow.org/programmers_guide/reading_data

Attribution (Creative Commons)