Automatic duplication of files to alive servers #1188

Open · w1nns opened this issue Jul 9, 2019 · 4 comments

w1nns commented Jul 9, 2019

Hello!
I deployed a LeoFS cluster in Docker: a Master Manager, a Slave Manager, three storage nodes, and a gateway. Replication is configured with two replicas.
I can't figure out whether there is a mechanism that automatically duplicates a file to the third alive storage node when one of the two storage nodes containing the file goes down.

I found the command leofs-adm recover-file, but an error is displayed:

[ERROR] Could not recover

Error logs are empty.

Regards

yosukehara self-assigned this Jul 10, 2019

yosukehara (Member) commented Jul 10, 2019

To solve the problem, let me know your LeoManager configuration file (leo_manager_0/etc/leo_manager.conf) and the result of $ leofs-adm status.

mocchira (Member) commented:

@Kirsun25 If I understand your question correctly, there is no mechanism for auto-duplicating a file to a third alive storage node (I think this is the feature called Hinted Handoff) in the current latest LeoFS,
so recover-file never works without bringing the failed node back into the cluster.

As we noted at https://github.com/leo-project/leofs#version-2, Hinted Handoff (the feature you'd expect) will be implemented in version 2.1.

Let me know if I misunderstood your question.
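
For reference, once the failed node is reachable again, the re-replication commands look roughly like this (a minimal sketch, assuming the standard leofs-adm subcommands; the node name and object path below are illustrative, taken from this cluster):

## Resume the storage node after it is back online
$ leofs-adm resume storage_0@172.21.0.9

## Re-replicate a single object onto its assigned nodes
$ leofs-adm recover-file mybucket/adaptive2.smil

## Or restore every object that belongs on the recovered node
$ leofs-adm recover-node storage_0@172.21.0.9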

w1nns (Author) commented Jul 10, 2019

Thank you for the quick reply!

root@917285ece53a:/# cat /usr/local/leofs/1.4.3/leo_manager_0/etc/leo_manager.conf 
##======================================================================
## LeoFS - Manager Configuration (MASTER)
##
## See: http://leo-project.net/leofs/docs/configuration/configuration_1.html
##
## Additional configuration files from leo_manager.d/*.conf (if exist) are
## processed after this file and can be used to override these settings.
##======================================================================
## --------------------------------------------------------------------
## SASL
## --------------------------------------------------------------------
## See: http://www.erlang.org/doc/man/sasl_app.html
##
## The following configuration parameters are defined for
## the SASL application. See app(4) for more information
## about configuration parameters

## SASL error log path
## sasl.sasl_error_log = ./log/sasl/sasl-error.log

## Restricts the error logging performed by the specified sasl_error_logger
## to error reports, progress reports, or both.
## errlog_type = [error | progress | all]
## sasl.errlog_type = error

## Specifies in which directory the files are stored.
## If this parameter is undefined or false, the error_logger_mf_h is not installed.
## sasl.error_logger_mf_dir = ./log/sasl

## Specifies how large each individual file can be.
## If this parameter is undefined, the error_logger_mf_h is not installed.
## sasl.error_logger_mf_maxbytes = 10485760

## Specifies how many files are used.
## If this parameter is undefined, the error_logger_mf_h is not installed.
## sasl.error_logger_mf_maxfiles = 5

## --------------------------------------------------------------------
## MANAGER
## --------------------------------------------------------------------
## Partner of manager's alias
manager.partner = manager_1@172.21.0.8

## Manager-console acceptable IP address
console.bind_address = localhost

## Manager-console acceptable port number
console.port.cui  = 10010
console.port.json = 10020

## Manager-console's number of acceptors
console.acceptors.cui = 3
console.acceptors.json = 16

## # of histories to display at once
console.histories.num_of_display = 200

## --------------------------------------------------------------------
## MANAGER - System
##     * Only set its configurations to **Manager-master**
## --------------------------------------------------------------------
## DC Id
system.dc_id = dc_1

## Cluster Id
system.cluster_id = leofs_1

## --------------------------------------------------------------------
## MANAGER - Consistency Level
##     * Only set its configurations to **Manager-master**
##     * See: http://leo-project.net/leofs/docs/configuration/configuration_1.html
## --------------------------------------------------------------------
## A number of replicas
consistency.num_of_replicas = 2

## A number of replicas needed for a successful WRITE operation
consistency.write = 1

## A number of replicas needed for a successful READ operation
consistency.read = 1

## A number of replicas needed for a successful DELETE operation
consistency.delete = 1

## A number of rack-aware replicas
consistency.rack_aware_replicas = 0


## --------------------------------------------------------------------
## MANAGER - Multi DataCenter Settings
## --------------------------------------------------------------------
## A number of replication targets
## mdc_replication.max_targets = 2

## A number of replicas per a datacenter
## [note] A local LeoFS sends a stacked object which contains the items of a replication method:
##          - [L1_N] A number of replicas
##          - [L1_W] A number of replicas needed for a successful WRITE-operation
##          - [L1_R] A number of replicas needed for a successful READ-operation
##          - [L1_D] A number of replicas needed for a successful DELETE-operation
##       A remote cluster of the LeoFS system receives this cluster's objects
##       and then replicates them, adhering to the replication method of each object
## mdc_replication.num_of_replicas_a_dc = 1

## MDC replication / A number of replicas needed for a successful WRITE-operation
## mdc_replication.consistency.write = 1

## MDC replication / A number of replicas needed for a successful READ-operation
## mdc_replication.consistency.read = 1

## MDC replication / A number of replicas needed for a successful DELETE-operation
## mdc_replication.consistency.delete = 1


## --------------------------------------------------------------------
## MANAGER - Mnesia
##     * Store the info storage-cluster and the info of gateway(s)
##     * Store the RING and the command histories
## --------------------------------------------------------------------
## Mnesia dir
mnesia.dir = ./work/mnesia/127.0.0.1

## The write threshold for transaction log dumps
## as the number of writes to the transaction log
mnesia.dump_log_write_threshold = 50000

## Controls how often disc_copies tables are dumped from memory
mnesia.dc_dump_limit = 40


## --------------------------------------------------------------------
## MANAGER - Log
## --------------------------------------------------------------------
## Log level: [0:debug, 1:info, 2:warn, 3:error]
## log.log_level = 1

## Output log file(s) - Erlang's log
## log.erlang = ./log/erlang

## Output log file(s) - app
## log.app = ./log/app

## Output log file(s) - members of storage-cluster
## log.member_dir = ./log/ring

## Output log file(s) - ring
## log.ring_dir = ./log/ring


## --------------------------------------------------------------------
## MANAGER - Other Directories
## --------------------------------------------------------------------
## Directory of queue for monitoring "RING"
## queue_dir = ./work/queue

## Directory of SNMP agent configuration
## snmp_agent = ./snmp/snmpa_manager_0/LEO-MANAGER


## --------------------------------------------------------------------
## RPC
## --------------------------------------------------------------------
## RPC-Server's acceptors
rpc.server.acceptors = 16

## RPC-Server's listening port number
rpc.server.listen_port = 13075

## RPC-Server's listening timeout
rpc.server.listen_timeout = 5000

## RPC-Client's size of connection pool
rpc.client.connection_pool_size = 16

## RPC-Client's size of connection buffer
rpc.client.connection_buffer_size = 16


## --------------------------------------------------------------------
## Other Libs
## --------------------------------------------------------------------
## Enable profiler - leo_backend_db
## leo_backend_db.profile = false

## Enable profiler - leo_logger
## leo_logger.profile = false

## Enable profiler - leo_mq
## leo_mq.profile = false

## Enable profiler - leo_redundant_manager
## leo_redundant_manager.profile = false

## Enable profiler - leo_statistics
## leo_statistics.profile = false


##======================================================================
## For vm.args
##======================================================================
## Name of the LeoFS's manager node
nodename = manager_0@172.21.0.7

## Cookie for distributed node communication.  All nodes in the same cluster
## should use the same cookie or they will not be able to communicate.
distributed_cookie = 401321b4

## Enable kernel poll
erlang.kernel_poll = true

## Number of async threads
erlang.asyc_threads = 32

## Increase number of concurrent ports/sockets
erlang.max_ports = 64000

## Set the location of crash dumps
erlang.crash_dump = ./log/erl_crash.dump

## Raise the ETS table limit
erlang.max_ets_tables = 256000

## Enable SMP
erlang.smp = enable

## Raise the default erlang process limit
process_limit = 1048576

## Path of SNMP-agent configuration
## snmp_conf = ./snmp/snmpa_manager_0/leo_manager_snmp
root@917285ece53a:/# leofs-adm status
/usr/local/bin/leofs-adm: line 71: lsb_release: command not found
/usr/local/bin/leofs-adm: line 72: lsb_release: command not found
 [System Configuration]
-----------------------------------+----------
 Item                              | Value    
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.4.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
 [mdcr] max number of joinable DCs | 2
 [mdcr] total replicas per a DC    | 1
 [mdcr] number of successes of R   | 1
 [mdcr] number of successes of W   | 1
 [mdcr] number of successes of D   | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 6c4473de
                previous ring-hash | 6c4473de
-----------------------------------+----------

 [State of Node(s)]
-------+----------------------------+--------------+---------+----------------+----------------+----------------------------
 type  |            node            |    state     | rack id |  current ring  |   prev ring    |          updated at         
-------+----------------------------+--------------+---------+----------------+----------------+----------------------------
  S    | storage_0@172.21.0.9       | stop         |         |                |                | 2019-07-10 08:42:38 +0000
  S    | storage_1@172.21.0.10      | running      |         | 6c4473de       | 6c4473de       | 2019-07-05 14:17:48 +0000
  S    | storage_2@172.21.0.12      | running      |         | 6c4473de       | 6c4473de       | 2019-07-05 14:17:53 +0000
  G    | gateway_0@172.21.0.11      | running      |         | 6c4473de       | 6c4473de       | 2019-07-10 08:16:35 +0000
-------+----------------------------+--------------+---------+----------------+----------------+----------------------------

root@917285ece53a:/# leofs-adm whereis mybucket/adaptive2.smil
/usr/local/bin/leofs-adm: line 71: lsb_release: command not found
/usr/local/bin/leofs-adm: line 72: lsb_release: command not found
-------+----------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |            node            |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when            
-------+----------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
       | storage_0@172.21.0.9       |                                      |            |              |                |                |                | 
       | storage_1@172.21.0.10      | cfbd634b9e7d0fc0c65a2b80d027e435     |       656B |   e4eb5205af | false          |              0 | 58ca1291adedd  | 2019-07-01 16:31:30 +0000

[Failure Nodes]
storage_0@172.21.0.9:unavailable

Perhaps I formulated my question incorrectly. Actually, I'm interested in the recovery mechanism.
As I said above, there are three storage nodes. Two of them contain a file. Suppose one of the storage nodes that contains the file goes down and never recovers. How can I restore the number of replicas of the file (back to two) on the remaining servers?

yosukehara (Member) commented:

Sorry for the late reply. Please read the following LeoFS documentation: Cluster Settings / Consistency Level.
The document seems to cover what you want to know.
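
If the failed node will never come back, one documented way to restore the replica count is to remove the dead node from the RING and redistribute its data across the surviving nodes. A minimal sketch, assuming the standard leofs-adm node-management subcommands and this cluster's node names:

## Remove the permanently failed node from the cluster
$ leofs-adm detach storage_0@172.21.0.9

## Rebuild the RING and re-replicate data across the remaining nodes
$ leofs-adm rebalance

## Verify the replica count for a sample object afterwards
$ leofs-adm whereis mybucket/adaptive2.smil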
