


Replication of Data

Replication is the process of sharing information between redundant resources, such as software or hardware components, to ensure consistency between them. It is used to improve accessibility, fault tolerance, and reliability. When the same computing task is repeated many times, this is referred to as computation replication; when the same data is stored on multiple storage devices, it is known as data replication. Computational tasks are generally replicated in space, for example executed on separate devices, or replicated in time, when they are repeatedly executed on a single device. The replication itself should be transparent to an external user: access to a replicated entity should look the same as access to a single, non-replicated entity, and in a failure scenario the failover to a replica is hidden as much as possible.

Systems that replicate services or data use either passive or active replication. In passive replication, each request is processed on a single replica, whose resulting state is then transferred to the other replicas. In active replication, the same request is processed at every replica. A multi-primary scheme is one in which any replica may process a request and then distribute the new state; it requires some form of distributed concurrency control, such as a distributed lock manager. A primary-backup scheme is one in which, at any time, a single master replica is designated to process all requests; this scheme is predominant in high-availability clusters.
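
To make the contrast concrete, here is a minimal sketch (not from the original post) of the two styles for a toy in-memory key-value store; the Replica class and the install_state and apply helpers are illustrative names, not part of any real system:

# Illustrative sketch: passive (primary-backup) vs. active replication
# for a toy in-memory key-value store.

class Replica:
    def __init__(self, name):
        self.name = name
        self.state = {}          # the replicated data

    def apply(self, request):
        """Deterministically apply a request (key, value) to local state."""
        key, value = request
        self.state[key] = value

    def install_state(self, state):
        """Overwrite local state with a snapshot shipped by the primary."""
        self.state = dict(state)


def passive_replication(primary, backups, request):
    # Passive: only the primary executes the request,
    # then transfers its resulting state to the backups.
    primary.apply(request)
    for b in backups:
        b.install_state(primary.state)


def active_replication(replicas, request):
    # Active: the same request is processed at every replica,
    # relying on a deterministic apply() to keep the states identical.
    for r in replicas:
        r.apply(request)


primary, backup = Replica("primary"), Replica("backup")
passive_replication(primary, [backup], ("x", 1))
assert primary.state == backup.state == {"x": 1}

r1, r2 = Replica("r1"), Replica("r2")
active_replication([r1, r2], ("x", 1))
assert r1.state == r2.state == {"x": 1}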

Load balancing differs from task replication in that it distributes a load of different computations across machines, and it allows a single computation to be dropped in the event of failure; it does, however, occasionally use data replication internally to distribute its data among machines. Backup also differs from replication, since a backup saves a copy of data that remains unchanged for a long period of time, whereas replicas are updated frequently and quickly lose any historical state. A small sketch of this distinction follows.
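
The following toy round-robin balancer (again, only an illustration under assumed names) shows the difference in behaviour: each task runs on exactly one worker, so a task whose worker fails is simply lost rather than recovered from a replica:

# Illustrative sketch: a trivial round-robin load balancer. Unlike
# replication, each task runs on only one worker, and a task whose
# worker fails is dropped rather than recovered.
from itertools import cycle

def balance(tasks, workers):
    results = []
    assignment = cycle(workers)
    for task in tasks:
        worker = next(assignment)
        try:
            results.append(worker(task))
        except Exception:
            # No replica holds this computation, so it is lost.
            pass
    return results

def healthy(task):
    return task * 2

def failed(task):
    raise RuntimeError("worker failed")

print(balance([1, 2, 3, 4], [healthy, failed]))   # tasks 2 and 4 dropped -> [2, 6]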

Database replication is often used on systems with a master/slave relationship between the original and the copies. The master logs the updates, which then flow down to the slaves. Once a slave has received an update successfully, it outputs an acknowledgment message, which allows subsequent updates to be sent.
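
A minimal sketch of that flow, assuming a toy update log and a simple "ACK" acknowledgment string (both invented for this example, not a real protocol):

# Illustrative sketch: master/slave replication where the master logs each
# update and only ships the next one after the slave acknowledges the previous.

class Master:
    def __init__(self):
        self.log = []            # ordered update log

    def record(self, update):
        self.log.append(update)

class Slave:
    def __init__(self):
        self.data = {}

    def receive(self, update):
        key, value = update
        self.data[key] = value
        return "ACK"             # acknowledgment unblocks the next update

def replicate(master, slave):
    for update in master.log:
        if slave.receive(update) != "ACK":
            break                # stop until the slave confirms receipt

master, slave = Master(), Slave()
master.record(("balance", 100))
master.record(("balance", 80))
replicate(master, slave)
assert slave.data == {"balance": 80}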

Multi-master replication is the process whereby updates can be submitted to any database node and are then propagated to the other servers. Although this method is often desirable, it can substantially increase complexity and cost, making it an impractical choice in some situations. The most common issue in multi-master replication is transactional conflict prevention or resolution: eager (synchronous) replication solutions generally prevent conflicts, while lazy (asynchronous) solutions must resolve them after the fact.

If a record is changed on two nodes at the same time, an eager replication system detects the conflict before confirming the commit and aborts one of the transactions, whereas a lazy replication system lets both transactions commit and runs conflict resolution during resynchronization. The outcome of such a resolution may be determined by factors such as the timestamp of the transaction or the hierarchy of the origin nodes.
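
As a purely illustrative sketch of that last point, the rule below picks a winner by transaction timestamp and falls back to a node-priority table to break ties; both rules and all names here are assumptions made for the demo, not a standard policy:

# Illustrative sketch: resolving a write-write conflict discovered during
# resynchronization in a lazy (asynchronous) multi-master setup.

def resolve(version_a, version_b, node_priority):
    """Each version is (value, timestamp, origin_node). The latest timestamp
    wins; ties fall back to the more authoritative origin node."""
    key_a = (version_a[1], node_priority[version_a[2]])
    key_b = (version_b[1], node_priority[version_b[2]])
    return version_a if key_a >= key_b else version_b

priority = {"datacenter": 2, "branch": 1}
a = ("alice@new.example", 1700000100, "branch")
b = ("alice@old.example", 1700000050, "datacenter")
print(resolve(a, b, priority))   # the later write, from the branch node, wins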

Real-time or active storage replication is typically implemented by distributing updates of a block device to several physical hard disks. Because the file system code operates at a level above the block device driver layer, any file system supported by the operating system can be replicated this way without modification. The replication itself is implemented either in software, as a device driver, or in hardware, through a disk array controller.
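
A minimal sketch of the idea, with plain byte arrays standing in for physical disks and an invented ReplicatedBlockDevice class (real implementations live in a device driver or a disk array controller):

# Illustrative sketch: block-level replication below the file system,
# fanning every block write out to several backing stores.

BLOCK_SIZE = 512

class ReplicatedBlockDevice:
    def __init__(self, num_replicas, num_blocks):
        self.replicas = [bytearray(num_blocks * BLOCK_SIZE)
                         for _ in range(num_replicas)]

    def write_block(self, block_no, data):
        assert len(data) == BLOCK_SIZE
        offset = block_no * BLOCK_SIZE
        # The same block image is written to every replica, so any file
        # system layered on top is replicated without modification.
        for disk in self.replicas:
            disk[offset:offset + BLOCK_SIZE] = data

    def read_block(self, block_no, replica=0):
        offset = block_no * BLOCK_SIZE
        return bytes(self.replicas[replica][offset:offset + BLOCK_SIZE])

dev = ReplicatedBlockDevice(num_replicas=2, num_blocks=8)
dev.write_block(3, b"\x42" * BLOCK_SIZE)
assert dev.read_block(3, replica=0) == dev.read_block(3, replica=1)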

Database replication becomes more challenging as it scales up, typically along two dimensions: horizontal scale-up adds more data replicas, while vertical scale-up places replicas farther apart geographically. Problems raised by horizontal scale-up can be alleviated by a multi-layer, multi-view access protocol, and problems with vertical scale-up are occurring less frequently as internet performance and reliability improve.

For locally connected disks, the most basic method is disk mirroring. The storage industry narrows the definitions: mirroring refers to a short-distance or local operation, while replication is extendable across a computer network, so that the disks can be located at physically distant sites. This improves availability in the event of local failures or disasters and limits the resulting damage. Usually, the aforementioned master/slave replication model is applied in these circumstances.



