
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): INode is not a regular file: / (inode 0) [Lease. Holder: #1650

Open
bright-zy opened this issue Apr 25, 2024 · 3 comments
Labels: bug (Something isn't working) · can not reproduce · P2 (minor issue)

Comments

@bright-zy


Bug Report

Errors when importing 1 billion rows of HDFS data

Briefly describe the bug

2024.04.24 18:39:15.592303 [ 34916 ] {} <Error> PlanSegmentExecutor: [e91000a7-d251-4e35-8c09-aa7d83f1d2d5_1]: Query has excpetion with code: 204, detail 
: Code: 204, e.displayText() = DB::ErrnoException: Cannot HDFS sync/user/bigdata/bak_data/c68e0516-329d-4ad0-a854-b38512130b76/20240225_449303060040122370_449303060040122370_0_449302400528809994_0/data HdfsIOException: Unexpected exception: when unwrap the rpc remote exception "org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException", INode is not a regular file: / (inode 0) [Lease.  Holder: libhdfs3_client_random_129678365_count_146249_pid_47956_tid_139914715764480, pending creates: 1]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2880)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.abandonBlock(FSDirWriteFileOp.java:123)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.abandonBlock(FSNamesystem.java:2854)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.abandonBlock(NameNodeRpcServer.java:966)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.abandonBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:562)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1045)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2976)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): INode is not a regular file: / (inode 0) [Lease.  Holder: libhdfs3_client_random_129678365_count_146249_pid_47956_tid_139914715764480, pending creates: 1]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2880)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.abandonBlock(FSDirWriteFileOp.java:123)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.abandonBlock(FSNamesystem.java:2854)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.abandonBlock(NameNodeRpcServer.java:966)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.abandonBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:562)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1045)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2976)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
        at org.apache.hadoop.ipc.Client.call(Client.java:1492)
        at org.apache.hadoop.ipc.Client.call(Client.java:1389)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
        at com.sun.proxy.$Proxy19.abandonBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.abandonBlock(ClientNamenodeProtocolTranslatorPB.java:495)
        at sun.reflect.GeneratedMethodAccessor341.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:557)
        at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:457)
        at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSingle(RouterRpcClient.java:746)
        at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSingleBlockPool(RouterRpcClient.java:718)
        at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSingle(R, errno: 5, strerror: Input/output error: While executing TableWrite SQLSTATE: HY000, Stack trace (when copying this message, always include the lines below):

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x26f45d32 in /data/byconity/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x105d16e0 in /data/byconity/clickhouse
2. DB::throwFromErrno(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int) @ 0x105d2314 in /data/byconity/clickhouse
3. DB::WriteBufferFromHDFS::WriteBufferFromHDFSImpl::sync() const @ 0x20084f42 in /data/byconity/clickhouse
4. DB::MergeTreeCNCHDataDumper::dumpTempPart(std::__1::shared_ptr<DB::IMergeTreeDataPart> const&, std::__1::shared_ptr<DB::IDisk> const&, bool) const @ 0x21ba32b1 in /data/byconity/clickhouse
5. void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<DB::CnchDataWriter::dumpCnchParts(std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart> > > const&, std::__1::vector<std::__1::shared_ptr<DB::LocalDeleteBitmap>, std::__1::allocator<std::__1::shared_ptr<DB::LocalDeleteBitmap> > > const&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart> > > const&)::$_2, void ()> >(std::__1::__function::__policy_storage const*) @ 0x1fe3a585 in /data/byconity/clickhouse
6. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x106114f1 in /data/byconity/clickhouse
7. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x10613008 in /data/byconity/clickhouse
8. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x1060dc00 in /data/byconity/clickhouse
9. void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()> >(void*) @ 0x1061207a in /data/byconity/clickhouse
10. start_thread @ 0x7e25 in /usr/lib64/libpthread-2.17.so
11. __clone @ 0xfebad in /usr/lib64/libc-2.17.so
 (version 21.8.7.1)

The result you expected

The import should complete normally.

How to Reproduce

The error occurs when I run the import below:

INSERT OVERWRITE olap.olap_comm_deal_margin_agreement_indicator_di_cube_v37 (
shop_code,performance_city_code,city_code,agent_code,statistical_code,brand_code,is_core_30_city_yj_code,is_agent_city_code,battle_region_code,province_region_code,corp_code,marketing_code,area_code,shop_director_code,team_code,sub_brand_code,role_category_code,biz_type_code,dim_ref_employee_base_info_da_on_job_status_code,dim_ref_shop_brand_org_info_da_original_battle_region_code,dim_ref_shop_brand_org_info_da_original_province_region_code,dim_ref_shop_brand_org_info_da_original_performance_city_code,dim_ref_shop_brand_org_info_da_original_city_code,dim_ref_shop_brand_org_info_da_original_corp_code,dim_ref_shop_brand_org_info_da_original_region_code,dim_ref_shop_brand_org_info_da_original_marketing_code,dim_ref_shop_brand_org_info_da_original_area_code,dim_ref_shop_brand_org_info_da_original_shop_code,pt,stat_date,statistical_name,brand_name,is_core_30_city_yj_name,is_agent_city_name,is_has_lianjia,is_direct_sale,is_valid_agent,on_job_days_section,position_type,del_type_name,battle_region_abbr_name,province_region_abbr_name,performance_city_abbr_name,city_abbr_name,corp_name,marketing_name,area_name,shop_director_name,shop_name,team_name,agent_name,sub_brand_name,is_direct,role_category_name,is_effective,is_last_180d_linkshop,biz_type_name,corp_director_no,corp_director_name,marketing_director_no,marketing_director_name,area_director_no,area_director_name,dim_public_date_info_year,dim_public_date_info_month,dim_public_date_info_week,dim_public_date_info_sun_dt,dim_public_date_info_short_week_cn,dim_public_date_info_short_year_weeks,dim_ref_employee_base_info_da_is_link_agent,dim_ref_employee_base_info_da_job_level_seq_name,dim_ref_employee_base_info_da_agent_job_category_name,dim_ref_employee_base_info_da_on_job_status,dim_ref_shop_director_role_relation_da_is_link,dim_ref_shop_brand_org_info_da_original_battle_region_abbr_name,dim_ref_shop_brand_org_info_da_original_province_region_abbr_name,dim_ref_shop_brand_org_info_da_original_performance_city_abbr_name,
dim_ref_shop_brand_org_info_da_original_city_abbr_name,dim_ref_shop_brand_org_info_da_original_corp_name,dim_ref_shop_brand_org_info_da_original_region_name,dim_ref_shop_brand_org_info_da_original_marketing_name,dim_ref_shop_brand_org_info_da_original_area_name,dim_ref_shop_brand_org_info_da_original_shop_name,dim_ref_city_base_info_da_city_level,dim_ref_city_base_info_da_is_bplus_city,original_team_code,original_team_name,original_agent_ucid,original_agent_name,dim_public_date_info_date,assess_income_amt_sum,same_shop_margin_assess_income_amt_sum,coop_agent_order_outer_rebate_amt_sum,receivable_assign_amt_sum,received_assign_amt_sum,sign_receivable_amt_sum,sign_receivable_pre_amt_sum,received_pre_amt_sum,valid_assess_income_amt_sum,person_create_amt_sum,s_person_create_amt_sum
) FORMAT Parquet
INFILE 'hdfs://xx-xxx/user/olap/data/byconity/olap_comm_deal_margin_agreement_indicator_di_cube_v37/188d6e60f4ad4e9eb386196b898535e3/part-{00000..00099}-3a617160-f743-4aff-a6e3-9ca444fdfe2e-c000.parquet';
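Since smaller batches reportedly succeed while the single 100-file import fails, one possible workaround (a sketch only, not a verified fix) is to split the `part-{00000..00099}` glob into smaller ranges so each INSERT writes less data and holds its HDFS lease for a shorter window. The HDFS path and table name below are copied from the report, the column list is elided, and driving the statements through `clickhouse-client` is an assumption:

```shell
#!/bin/sh
# Hypothetical batching sketch: emit one INSERT per group of 10 Parquet
# files instead of a single INSERT over all 100 files.
BASE='hdfs://xx-xxx/user/olap/data/byconity/olap_comm_deal_margin_agreement_indicator_di_cube_v37/188d6e60f4ad4e9eb386196b898535e3'
STEP=10                                  # files per batch (tunable)
for start in $(seq 0 "$STEP" 99); do
  end=$((start + STEP - 1))
  # Build a brace-expansion range like {00000..00009} for this batch.
  range=$(printf '{%05d..%05d}' "$start" "$end")
  # Only the first batch may OVERWRITE; later batches must append,
  # otherwise each batch would replace the previous one.
  if [ "$start" -eq 0 ]; then verb='INSERT OVERWRITE'; else verb='INSERT INTO'; fi
  # (...) stands for the full column list from the original statement.
  echo "$verb olap.olap_comm_deal_margin_agreement_indicator_di_cube_v37 (...) FORMAT Parquet INFILE '${BASE}/part-${range}-3a617160-f743-4aff-a6e3-9ca444fdfe2e-c000.parquet';"
done
```

Each emitted statement could then be piped into `clickhouse-client --query "..."` (or whatever client the deployment uses); this only narrows the failure window per write, it does not address the underlying lease expiry.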

Version

0.4.0

@bright-zy bright-zy added the bug Something isn't working label Apr 25, 2024
@Alima777 (Collaborator)

Hi, @bright-zy

Have you retried, and does the error still occur?

@bright-zy (Author)

Hi, @Alima777
Yes. Batches of around 10 million rows import without problems; the error occurs when the data reaches hundreds of millions of rows.

@Alima777 (Collaborator)

Sorry, @bright-zy

It seems I can't reproduce your problem.

[two screenshots attached showing the reproduction attempt]
