
How to define field delimiter for text file on HDFS? #1440

Open
foxgarden opened this issue Sep 5, 2019 · 2 comments

Comments

foxgarden commented Sep 5, 2019

I have a Hive table stored on HDFS:

CREATE TABLE `ewt_ods.crm_customer_f_1d`(
	  `id` bigint,
	  `ctmname` string,
	  `areacode` string,
	  `addr` string,
	  `addtime` string,
	  `isdelete` string)
PARTITIONED BY ( 
	  `day` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS textfile;

Now I want to use it as a SnappyData external table, so I run:

snappy>
CREATE EXTERNAL TABLE crm_customer_ext
using text options(path 'hdfs://hadoop01:8020/user/hive/warehouse/ewt_ods.db/crm_customer_f_1d/');

When I select from this table, I get only two columns (one being the partition column day):

snappy> select * from crm_customer_ext limit 1;
value                                                                                                                           |day       
-------------------------------------------------------------------------------------------------------------------------------------------
12660   testuser 513231  513200  2013-07-24 14:45:43.96  false                                                                          |2019-08-20
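Since the text data source exposes each whole row as a single value column, the tab-separated fields have to be split apart by the consumer. A minimal client-side sketch (the sample row and column names below are taken from the Hive DDL above; the exact row string is hypothetical):

```python
# A row as it might appear in the single "value" column produced by the
# text data source (fields joined by the '\t' declared in the Hive DDL).
row = "12660\ttestuser\t513231\t513200\t2013-07-24 14:45:43.96\tfalse"

# Column names from the Hive table definition above.
columns = ["id", "ctmname", "areacode", "addr", "addtime", "isdelete"]

# Split on the tab delimiter (FIELDS TERMINATED BY '\t') and pair with names.
fields = dict(zip(columns, row.split("\t")))
print(fields["ctmname"])  # testuser
```

This recovers the columns, but it is a workaround, not a fix; the real question is how to declare the delimiter on the external table itself.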

If I define the columns when creating the table, the CREATE TABLE statement executes successfully, but a select raises an exception:

snappy>
CREATE EXTERNAL TABLE crm_customer_ext(
ID BIGINT,
CTMNAME STRING,
AREACODE STRING,
ADDR STRING,
ADDTIME STRING,
ISDELETE STRING) using text options(path 'hdfs://hadoop01:8020/user/hive/warehouse/ewt_ods.db/crm_customer_f_1d/');

snappy> select * from crm_customer_ext limit 1;

ERROR 38000: (SQLState=38000 Severity=20000) (Server=test-spark03/10.0.11.111[1527] Thread=ThriftProcessor-1) The exception 'com.pivotal.gemfirexd.internal.engine.jdbc.GemFireXDRuntimeException: myID: 10.0.11.111(21792)<v3>:12486, caused by java.lang.AssertionError: assertion failed: Text data source only produces a single data column named "value".' was thrown while evaluating an expression.

I want to know how to define the field delimiter when creating a text external table. (Parquet tables do not have this issue, but converting all tables to Parquet would be a huge workload.)
Version: 1.1.0 & 1.1.1
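One possible workaround is to use the csv data source instead of text, since Spark's csv reader accepts a delimiter option. A hedged sketch under that assumption (same path and columns as above; whether SnappyData passes the option through unchanged should be verified):

```sql
-- Sketch: csv data source with an explicit tab delimiter.
-- Path and column names are taken from the Hive table above.
CREATE EXTERNAL TABLE crm_customer_ext(
  ID BIGINT,
  CTMNAME STRING,
  AREACODE STRING,
  ADDR STRING,
  ADDTIME STRING,
  ISDELETE STRING)
USING csv OPTIONS(
  path 'hdfs://hadoop01:8020/user/hive/warehouse/ewt_ods.db/crm_customer_f_1d/',
  delimiter '\t',
  header 'false');
```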

jramnara (Contributor) commented Sep 5, 2019 via email

foxgarden (Author) commented:

@jramnara
This works for my table.
Furthermore, I checked the source code: the csv file format supports these special-character delimiters: \t, \r, \b, \f, \", ', \u0000, but it does not support \u0001 (which is Hive's default field delimiter). Could you add support for it as an enhancement?
