
How to define field delimiter for text file on HDFS? #1440

Open
foxgarden opened this issue Sep 5, 2019 · 2 comments

Comments

foxgarden commented Sep 5, 2019

I have a Hive table stored on HDFS:

CREATE TABLE `ewt_ods.crm_customer_f_1d`(
	  `id` bigint,
	  `ctmname` string,
	  `areacode` string,
	  `addr` string,
	  `addtime` string,
	  `isdelete` string)
PARTITIONED BY ( 
	  `day` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS textfile;

Now I want to use it as a SnappyData external table, so I run:

snappy>
CREATE EXTERNAL TABLE crm_customer_ext
using text options(path 'hdfs://hadoop01:8020/user/hive/warehouse/ewt_ods.db/crm_customer_f_1d/');

When I select from this table, I get only two columns (one being the partition column day):

snappy> select * from crm_customer_ext limit 1;
value                                                                                                                           |day       
-------------------------------------------------------------------------------------------------------------------------------------------
12660   testuser 513231  513200  2013-07-24 14:45:43.96  false                                                                          |2019-08-20
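Since the text data source exposes each whole row as a single value column, the tab-separated fields have to be split apart by the consumer. A minimal client-side sketch (the sample row and column names below are taken from the Hive DDL above; the exact row string is hypothetical):

```python
# A row as it might appear in the single "value" column produced by the
# text data source (fields joined by the '\t' declared in the Hive DDL).
row = "12660\ttestuser\t513231\t513200\t2013-07-24 14:45:43.96\tfalse"

# Column names from the Hive table definition above.
columns = ["id", "ctmname", "areacode", "addr", "addtime", "isdelete"]

# Split on the tab delimiter (FIELDS TERMINATED BY '\t') and pair with names.
fields = dict(zip(columns, row.split("\t")))
print(fields["ctmname"])  # testuser
```

This recovers the columns, but it is a workaround, not a fix; the real question is how to declare the delimiter on the external table itself.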

If I define the columns when creating the table, the CREATE TABLE statement executes successfully, but a select raises an exception:

snappy>
CREATE EXTERNAL TABLE crm_customer_ext(
ID BIGINT,
CTMNAME STRING,
AREACODE STRING,
ADDR STRING,
ADDTIME STRING,
ISDELETE STRING) using text options(path 'hdfs://hadoop01:8020/user/hive/warehouse/ewt_ods.db/crm_customer_f_1d/');

snappy> select * from crm_customer_ext limit 1;

ERROR 38000: (SQLState=38000 Severity=20000) (Server=test-spark03/10.0.11.111[1527] Thread=ThriftProcessor-1) The exception 'com.pivotal.gemfirexd.internal.engine.jdbc.GemFireXDRuntimeException: myID: 10.0.11.111(21792)<v3>:12486, caused by java.lang.AssertionError: assertion failed: Text data source only produces a single data column named "value".' was thrown while evaluating an expression.

I want to know how to define the field delimiter when creating a text external table. (Parquet tables do not have this issue, but converting all tables to Parquet would be a huge workload.)
Version: 1.1.0 & 1.1.1
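One possible workaround is to use the csv data source instead of text, since Spark's csv reader accepts a delimiter option. A hedged sketch under that assumption (same path and columns as above; whether SnappyData passes the option through unchanged should be verified):

```sql
-- Sketch: csv data source with an explicit tab delimiter.
-- Path and column names are taken from the Hive table above.
CREATE EXTERNAL TABLE crm_customer_ext(
  ID BIGINT,
  CTMNAME STRING,
  AREACODE STRING,
  ADDR STRING,
  ADDTIME STRING,
  ISDELETE STRING)
USING csv OPTIONS(
  path 'hdfs://hadoop01:8020/user/hive/warehouse/ewt_ods.db/crm_customer_f_1d/',
  delimiter '\t',
  header 'false');
```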

jramnara (Contributor) commented Sep 5, 2019 via email

foxgarden (Author) commented:

@jramnara
This works for my table.
Furthermore, I checked the source code: the csv file format supports these special-character delimiters: \t, \r, \b, \f, \", ', \u0000, but it does not support \u0001 (which is Hive's default field delimiter). Could you add support for it as an enhancement?
