Planet MySQL

Planet MySQL - https://planet.mysql.com
  1. If you ever tried to back up MySQL 5.7 with the Percona XtraBackup that ships in the percona-xtrabackup package, you most likely got an error message like this:

        # innobackupex ./
        170623 05:58:43 innobackupex: Starting the backup operation
        IMPORTANT: Please check that the backup run completes successfully.
                   At the end of a successful backup run innobackupex
                   prints "completed OK!".
        170623 05:58:44 version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup' as 'dba' (using password: YES).
        170623 05:58:44 version_check Connected to MySQL server
        170623 05:58:44 version_check Executing a version check against the server...
        170623 05:58:44 version_check Done.
        170623 05:58:44 Connecting to MySQL server host: localhost, user: dba, password: set, port: not set, socket: not set
        Error: Unsupported server version: '5.7.18-15'. Please report a bug at https://bugs.launchpad.net/percona-xtrabackup

     This happens because percona-xtrabackup actually installs version 2.3, which does not support MySQL 5.7. To take copies of MySQL 5.7 you need Percona XtraBackup version 2.4 or higher. Starting with version 2.13.0, TwinDB Backup uses Percona XtraBackup 2.4 to back up MySQL 5.7, 5.6 and earlier versions.

     Configuring repositories

     For Debian-based systems run the following script:

        wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
        dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb
        curl -s https://packagecloud.io/install/repositories/twindb/main/script.deb.sh | sudo bash

     For RedHat-based systems run this:

        yum install http://www.percona.com/downloads/percona-release/redhat/0.1-4/percona-release-0.1-4.noarch.rpm
        curl -s https://packagecloud.io/install/repositories/twindb/main/script.rpm.sh | sudo bash

     Installing TwinDB Backup

     For TwinDB Backup just install the twindb-backup package:

        # On Debian/Ubuntu
        apt-get install twindb-backup
        # On CentOS/RedHat
        yum install twindb-backup

     The package percona-xtrabackup-24 conflicts with percona-xtrabackup, so if you get the conflict error just uninstall percona-xtrabackup.

     Configuring TwinDB Backup

     There are many ways to configure backups. See the documentation for a more detailed description of the options in the configuration file. Along with the MySQL database I prefer to take copies of /etc/ as well. As the destination, S3 is probably the obvious choice. So the configuration file /etc/twindb/twindb-backup.cfg should look something like this:

        [source]
        backup_dirs="/etc"
        backup_mysql=yes

        [destination]
        backup_destination=s3
        keep_local_path=/var/log/mysql/DBBACKUP/

        [s3]
        AWS_ACCESS_KEY_ID=**
        AWS_SECRET_ACCESS_KEY=***
        AWS_DEFAULT_REGION=us-west-2
        BUCKET=my_bucket_for_backups

        [mysql]
        mysql_defaults_file=/root/.my.cnf
        full_backup=daily

        [retention]
        hourly_copies=24
        daily_copies=7
        weekly_copies=4
        monthly_copies=12
        yearly_copies=3

        [retention_local]
        hourly_copies=1
        daily_copies=1
        weekly_copies=0
        monthly_copies=0
        yearly_copies=0

        [intervals]
        run_hourly=yes
        run_daily=yes
        run_weekly=yes
        run_monthly=yes
        run_yearly=yes

     Test run

     The TwinDB Backup package installs a cron job, so you could let it run and check the backups an hour later. Or, of course, you can run it manually to make sure everything is configured correctly:

        # Let's do a daily run. Cron will run exactly this command every day.
        twindb-backup backup daily

     If successful, the tool should not produce any output. If you hit any errors, drop me a message at support@twindb.com. A small wrapper sketch for catching failures follows.
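     Since cron runs the same command unattended, it can be worth wiring the result into whatever alerting you already use. The following is only a minimal sketch; it assumes twindb-backup follows the usual convention of exiting non-zero on failure, and the mail command and recipient address are placeholders for your own alerting:

        #!/bin/sh
        # Run the daily backup and send an alert if the exit code is non-zero (sketch).
        if ! twindb-backup backup daily; then
            echo "twindb-backup daily run failed on $(hostname)" \
                | mail -s "TwinDB Backup failure" dba-team@example.com
        fi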
     Backups verification

     You don't have backups if you don't verify them. Unfortunately, too many users have learned this lesson the hard way. So don't be that user: do verify your backups. TwinDB Backup is super friendly when it comes to restoring backup copies. It takes care of full or incremental backups, and of local or remote copies. The first step in the verification process is to check which backups we actually have:

        twindb-backup ls
        2017-06-23 07:23:27,830: INFO: ls.list_available_backups():22: Local copies:
        /var/backup/master1/daily/files/_etc-2017-06-23_06_58_11.tar.gz
        /var/backup/master1/daily/mysql/mysql-2017-06-23_06_58_22.xbstream.gz
        /var/backup/master1/hourly/files/_etc-2017-06-23_07_00_03.tar.gz
        /var/backup/master1/hourly/files/_home-2017-06-23_07_00_12.tar.gz
        /var/backup/master1/hourly/mysql/mysql-2017-06-23_07_00_14.xbstream.gz
        2017-06-23 07:23:27,835: INFO: ls.list_available_backups():33: hourly copies:
        /path/to/twindb-server-backups/master1/hourly/files/_etc-2017-06-23_07_00_03.tar.gz
        /path/to/twindb-server-backups/master1/hourly/mysql/mysql-2017-06-23_07_00_14.xbstream.gz
        2017-06-23 07:23:28,451: INFO: ls.list_available_backups():33: daily copies:
        /path/to/twindb-server-backups/master1/daily/files/_etc-2017-06-23_06_58_11.tar.gz
        /path/to/twindb-server-backups/master1/daily/mysql/mysql-2017-06-23_06_58_22.xbstream.gz
        2017-06-23 07:23:29,073: INFO: ls.list_available_backups():33: weekly copies:
        2017-06-23 07:23:29,704: INFO: ls.list_available_backups():33: monthly copies:
        2017-06-23 07:23:30,337: INFO: ls.list_available_backups():33: yearly copies:

     OK, we have some copies on local and remote storage. Let's grab and restore the last hourly copy:

        twindb-backup restore mysql /path/to/twindb-server-backups/master1/hourly/mysql/mysql-2017-06-23_07_00_14.xbstream.gz --dst /var/lib/mysql_restored
        2017-06-23 07:28:59,242: INFO: restore.restore_from_mysql():308: Restoring /path/to/twindb-server-backups/master1/hourly/mysql/mysql-2017-06-23_07_00_14.xbstream.gz in /var/lib/mysql_restored
        170623 07:29:06 innobackupex: Starting the apply-log operation
        IMPORTANT: Please check that the apply-log run completes successfully.
                   At the end of a successful apply-log run innobackupex
                   prints "completed OK!".
        innobackupex version 2.4.7 based on MySQL server 5.7.13 Linux (x86_64) (revision id: 05f1fcf)
        ...
        InnoDB: Shutdown completed; log sequence number 2542148
        170623 07:29:13 completed OK!
        2017-06-23 07:29:13,279: INFO: restore.restore_from_mysql():380: Successfully restored /path/to/twindb-server-backups/master1/hourly/mysql/mysql-2017-06-23_07_00_14.xbstream.gz in /var/lib/mysql_restored.
        2017-06-23 07:29:13,279: INFO: restore.restore_from_mysql():382: Now copy content of /var/lib/mysql_restored to MySQL datadir: cp -R /var/lib/mysql_restored/* /var/lib/mysql/
        2017-06-23 07:29:13,279: INFO: restore.restore_from_mysql():383: Fix permissions: chown -R mysql:mysql /var/lib/mysql/
        2017-06-23 07:29:13,279: INFO: restore.restore_from_mysql():385: Make sure innodb_log_file_size and innodb_log_files_in_group in /var/lib/mysql_restored/backup-my.cnf and in /etc/my.cnf are same.
        2017-06-23 07:29:13,279: INFO: restore.restore_from_mysql():388: Original my.cnf is restored in /var/lib/mysql_restored/_config.
        2017-06-23 07:29:13,279: INFO: restore.restore_from_mysql():390: Then you can start MySQL normally.

     So the backup copy restored successfully. To be absolutely sure the copy is usable, I'd recommend running a test instance on /var/lib/mysql_restored and running some queries to make sure the data is there (a rough sketch of this follows below).

     The post How to backup MySQL 5.7 appeared first on Backup and Data Recovery for MySQL.
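     A sketch of that last verification step, using the restored datadir from above. The socket, port and the mydb.mytable sanity query are placeholders, and depending on the backup you may also need to pass an innodb_log_file_size matching the one in backup-my.cnf:

        # Start a throwaway MySQL instance on the restored datadir.
        chown -R mysql:mysql /var/lib/mysql_restored
        mysqld --no-defaults \
            --datadir=/var/lib/mysql_restored \
            --socket=/tmp/mysql_restored.sock \
            --port=3307 \
            --user=mysql &

        # Run a few sanity queries against it.
        mysql --socket=/tmp/mysql_restored.sock -e "SHOW DATABASES; SELECT COUNT(*) FROM mydb.mytable"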
  2. In this blog post, we'll look at how ClickHouse performs in a general analytical workload, using the star schema benchmark test.

     We have mentioned ClickHouse in some recent posts (ClickHouse: New Open Source Columnar Database, Column Store Database Benchmarks: MariaDB ColumnStore vs. Clickhouse vs. Apache Spark), where it showed excellent results. ClickHouse by itself seems to be an event-oriented RDBMS, as its name suggests (clicks). Its primary use case, Yandex Metrica (a system similar to Google Analytics), also points to an event-based nature, and we can see that it expects date-stamped columns. It is possible, however, to use ClickHouse for a general analytical workload. This blog post shares my findings. For these tests, I used the Star Schema Benchmark, slightly modified so that it can handle ClickHouse specifics.

     First, let's talk about schemas. We need to adjust to ClickHouse data types. For example, the biggest fact table in SSB is "lineorder". Below is how it is defined for Amazon Redshift (as taken from https://docs.aws.amazon.com/redshift/latest/dg/tutorial-tuning-tables-create-test-data.html):

        CREATE TABLE lineorder (
          lo_orderkey INTEGER NOT NULL,
          lo_linenumber INTEGER NOT NULL,
          lo_custkey INTEGER NOT NULL,
          lo_partkey INTEGER NOT NULL,
          lo_suppkey INTEGER NOT NULL,
          lo_orderdate INTEGER NOT NULL,
          lo_orderpriority VARCHAR(15) NOT NULL,
          lo_shippriority VARCHAR(1) NOT NULL,
          lo_quantity INTEGER NOT NULL,
          lo_extendedprice INTEGER NOT NULL,
          lo_ordertotalprice INTEGER NOT NULL,
          lo_discount INTEGER NOT NULL,
          lo_revenue INTEGER NOT NULL,
          lo_supplycost INTEGER NOT NULL,
          lo_tax INTEGER NOT NULL,
          lo_commitdate INTEGER NOT NULL,
          lo_shipmode VARCHAR(10) NOT NULL
        );

     For ClickHouse, the table definition looks like this:

        CREATE TABLE lineorderfull (
          LO_ORDERKEY UInt32,
          LO_LINENUMBER UInt8,
          LO_CUSTKEY UInt32,
          LO_PARTKEY UInt32,
          LO_SUPPKEY UInt32,
          LO_ORDERDATE Date,
          LO_ORDERPRIORITY String,
          LO_SHIPPRIORITY UInt8,
          LO_QUANTITY UInt8,
          LO_EXTENDEDPRICE UInt32,
          LO_ORDTOTALPRICE UInt32,
          LO_DISCOUNT UInt8,
          LO_REVENUE UInt32,
          LO_SUPPLYCOST UInt32,
          LO_TAX UInt8,
          LO_COMMITDATE Date,
          LO_SHIPMODE String
        ) Engine=MergeTree(LO_ORDERDATE,(LO_ORDERKEY,LO_LINENUMBER),8192);

     From this we can see that we need to use datatypes like UInt8 and UInt32, which are somewhat unusual in the database world.

     The second table (Redshift definition):

        CREATE TABLE customer (
          c_custkey INTEGER NOT NULL,
          c_name VARCHAR(25) NOT NULL,
          c_address VARCHAR(25) NOT NULL,
          c_city VARCHAR(10) NOT NULL,
          c_nation VARCHAR(15) NOT NULL,
          c_region VARCHAR(12) NOT NULL,
          c_phone VARCHAR(15) NOT NULL,
          c_mktsegment VARCHAR(10) NOT NULL
        );

     For ClickHouse, I defined it as:

        CREATE TABLE customerfull (
          C_CUSTKEY UInt32,
          C_NAME String,
          C_ADDRESS String,
          C_CITY String,
          C_NATION String,
          C_REGION String,
          C_PHONE String,
          C_MKTSEGMENT String,
          C_FAKEDATE Date
        ) Engine=MergeTree(C_FAKEDATE,(C_CUSTKEY),8192);

     For reference, the full schema for the benchmark is here: https://github.com/vadimtk/ssb-clickhouse/blob/master/create.sql. For this table, we need to define an artificial C_FAKEDATE Date column in order to use ClickHouse's most advanced engine (MergeTree). I was told by the ClickHouse team that they plan to remove this limitation in the future.

     To generate data acceptable to ClickHouse, I made modifications to ssb-dbgen. You can find my version here: https://github.com/vadimtk/ssb-dbgen. The most notable change is that ClickHouse can't accept dates in CSV files formatted as "19971125"; they have to be "1997-11-25". This is something to keep in mind when loading data into ClickHouse. It is possible to do some preformatting on load, but I don't have experience with that. A common approach is to create a staging table with datatypes that match the loaded data, and then convert them using SQL functions when inserting into the main table, as in the sketch below.
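     This is only a rough sketch of that staging approach. The lineorder_staging and lineorder_final tables are hypothetical, the column list is trimmed for brevity, and LO_ORDERDATE in the staging table is assumed to have been loaded as a String in the raw "19971125" form:

        # Convert raw YYYYMMDD strings into proper Date values while moving rows
        # from the staging table into the final MergeTree table.
        clickhouse-client --query "
            INSERT INTO lineorder_final (LO_ORDERKEY, LO_ORDERDATE)
            SELECT
                LO_ORDERKEY,
                toDate(concat(substring(LO_ORDERDATE, 1, 4), '-',
                              substring(LO_ORDERDATE, 5, 2), '-',
                              substring(LO_ORDERDATE, 7, 2)))
            FROM lineorder_staging"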
     Hardware Setup

     One of the goals of this benchmark is to see how ClickHouse scales on multiple nodes. I used a setup of one node, and then compared it to a setup of three nodes. Each node has 24 cores of "Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz" CPUs, and the data is located on very fast PCIe flash storage.

     For the SSB benchmark I use a scale factor of 2500, which provides (in raw data):

     - Table lineorder: 15 bln rows, raw size 1.7TB
     - Table customer: 75 mln rows

     When loaded into ClickHouse, the table lineorder takes 464GB, which corresponds to a 3.7x compression ratio. We compare a one-node setup (table names lineorderfull, customerfull) vs. a three-node setup (table names lineorderd, customerd).

     Single Table Operations

     Query:

        SELECT
            toYear(LO_ORDERDATE) AS yod,
            sum(LO_REVENUE)
        FROM lineorderfull
        GROUP BY yod

     One node:

        7 rows in set. Elapsed: 9.741 sec. Processed 15.00 billion rows, 90.00 GB (1.54 billion rows/s., 9.24 GB/s.)

     Three nodes:

        7 rows in set. Elapsed: 3.258 sec. Processed 15.00 billion rows, 90.00 GB (4.60 billion rows/s., 27.63 GB/s.)

     We see a speedup of practically three times. Handling 4.6 billion rows/s is blazingly fast!

     One Table with Filtering

        SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
        FROM lineorderfull
        WHERE (toYear(LO_ORDERDATE) = 1993)
            AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3))
            AND (LO_QUANTITY < 25)

     One node:

        1 rows in set. Elapsed: 3.175 sec. Processed 2.28 billion rows, 18.20 GB (716.60 million rows/s., 5.73 GB/s.)

     Three nodes:

        1 rows in set. Elapsed: 1.295 sec. Processed 2.28 billion rows, 18.20 GB (1.76 billion rows/s., 14.06 GB/s.)

     It's worth mentioning that during the execution of this query, ClickHouse was able to use ALL 24 cores on each box. This confirms that ClickHouse is a massively parallel processing system.

     Two Tables (Independent Subquery)

     In this case, I want to show how ClickHouse handles independent subqueries:

        SELECT sum(LO_REVENUE)
        FROM lineorderfull
        WHERE LO_CUSTKEY IN (
            SELECT C_CUSTKEY AS LO_CUSTKEY
            FROM customerfull
            WHERE C_REGION = 'ASIA'
        )

     One node:

        1 rows in set. Elapsed: 28.934 sec. Processed 15.00 billion rows, 120.00 GB (518.43 million rows/s., 4.15 GB/s.)

     Three nodes:

        1 rows in set. Elapsed: 14.189 sec. Processed 15.12 billion rows, 121.67 GB (1.07 billion rows/s., 8.57 GB/s.)

     We do not see a close-to-3x speedup on three nodes here, however, because of the data transfer required to match LO_CUSTKEY with C_CUSTKEY.
     Two Tables JOIN

     With a subquery that uses columns to return results, or for GROUP BY, things get more complicated. In this case we want to GROUP BY a column from the second table. First, ClickHouse doesn't support traditional subquery syntax, so we need to use a JOIN. For JOINs, ClickHouse also strictly prescribes how they must be written (a limitation that will also get changed in the future). Our JOIN should look like:

        SELECT
            C_REGION,
            sum(LO_EXTENDEDPRICE * LO_DISCOUNT)
        FROM lineorderfull
        ANY INNER JOIN (
            SELECT
                C_REGION,
                C_CUSTKEY AS LO_CUSTKEY
            FROM customerfull
        ) USING (LO_CUSTKEY)
        WHERE (toYear(LO_ORDERDATE) = 1993)
            AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3))
            AND (LO_QUANTITY < 25)
        GROUP BY C_REGION

     One node:

        5 rows in set. Elapsed: 31.443 sec. Processed 2.35 billion rows, 28.79 GB (74.75 million rows/s., 915.65 MB/s.)

     Three nodes:

        5 rows in set. Elapsed: 25.160 sec. Processed 2.58 billion rows, 33.25 GB (102.36 million rows/s., 1.32 GB/s.)

     In this case the speedup is not even two times. This is a consequence of the random data distribution of the tables lineorderd and customerd. Both tables were defined as:

        CREATE TABLE lineorderd AS lineorder ENGINE = Distributed(3shards, default, lineorder, rand());
        CREATE TABLE customerd AS customer ENGINE = Distributed(3shards, default, customer, rand());

     where rand() specifies that records are distributed randomly across the three nodes. When we perform a JOIN on LO_CUSTKEY = C_CUSTKEY, matching records might be located on different nodes. One way to deal with this is to define data locality by sharding on the JOIN key. For example:

        CREATE TABLE lineorderLD AS lineorderL ENGINE = Distributed(3shards, default, lineorderL, LO_CUSTKEY);
        CREATE TABLE customerLD AS customerL ENGINE = Distributed(3shards, default, customerL, C_CUSTKEY);

     Three Tables JOIN

     This is where it becomes very complicated. Let's consider the query that you would normally write:

        SELECT sum(LO_REVENUE), P_MFGR, toYear(LO_ORDERDATE) yod
        FROM lineorderfull, customerfull, partfull
        WHERE C_REGION = 'ASIA'
            AND LO_CUSTKEY = C_CUSTKEY
            AND P_PARTKEY = LO_PARTKEY
        GROUP BY P_MFGR, yod
        ORDER BY P_MFGR, yod;

     With ClickHouse's limitations on JOIN syntax, the query becomes:

        SELECT
            sum(LO_REVENUE),
            P_MFGR,
            toYear(LO_ORDERDATE) AS yod
        FROM (
            SELECT
                LO_PARTKEY,
                LO_ORDERDATE,
                LO_REVENUE
            FROM lineorderfull
            ALL INNER JOIN (
                SELECT
                    C_REGION,
                    C_CUSTKEY AS LO_CUSTKEY
                FROM customerfull
            ) USING (LO_CUSTKEY)
            WHERE C_REGION = 'ASIA'
        )
        ALL INNER JOIN (
            SELECT
                P_MFGR,
                P_PARTKEY AS LO_PARTKEY
            FROM partfull
        ) USING (LO_PARTKEY)
        GROUP BY P_MFGR, yod
        ORDER BY P_MFGR ASC, yod ASC

     By writing queries this way, we force ClickHouse to use the prescribed JOIN order; at this moment there is no optimizer in ClickHouse, and it is totally unaware of data distribution. There is also not much speedup when we compare one node vs. three nodes.

     One node execution time:

        35 rows in set. Elapsed: 697.806 sec. Processed 15.08 billion rows, 211.53 GB (21.61 million rows/s., 303.14 MB/s.)

     Three nodes execution time:

        35 rows in set. Elapsed: 622.536 sec. Processed 15.12 billion rows, 211.71 GB (24.29 million rows/s., 340.08 MB/s.)

     There is a way to make the query faster for this 3-way JOIN, however. (Thanks to Alexander Zaytsev from https://www.altinity.com/ for the help!)

     Optimized query:

        SELECT
            sum(revenue),
            P_MFGR,
            yod
        FROM (
            SELECT
                LO_PARTKEY AS P_PARTKEY,
                toYear(LO_ORDERDATE) AS yod,
                SUM(LO_REVENUE) AS revenue
            FROM lineorderfull
            WHERE LO_CUSTKEY IN (
                SELECT C_CUSTKEY
                FROM customerfull
                WHERE C_REGION = 'ASIA'
            )
            GROUP BY P_PARTKEY, yod
        )
        ANY INNER JOIN partfull USING (P_PARTKEY)
        GROUP BY P_MFGR, yod
        ORDER BY P_MFGR ASC, yod ASC

     Optimized query time:

     One node:

        35 rows in set. Elapsed: 106.732 sec. Processed 15.00 billion rows, 210.05 GB (140.56 million rows/s., 1.97 GB/s.)

     Three nodes:

        35 rows in set. Elapsed: 75.854 sec. Processed 15.12 billion rows, 211.71 GB (199.36 million rows/s., 2.79 GB/s.)

     That's an improvement of about 6.5 times compared to the original query. This shows the importance of understanding data distribution, and of writing the optimal query to process the data.

     Another option for dealing with JOIN complexity, and to improve performance, is to use ClickHouse's dictionaries. These dictionaries are described here: https://www.altinity.com/blog/2017/4/12/dictionaries-explained. I will review dictionary performance in future posts, but a rough sketch of what a dictionary lookup looks like follows.
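     This is an illustration only, not a benchmarked query: the dictionary itself is not defined here (it would need an entry in the server's dictionary configuration), and the name customer_region_dict is hypothetical. Assuming a dictionary keyed by C_CUSTKEY with a C_REGION attribute, the customer subquery from the earlier example could become a lookup:

        # Sketch: filter the fact table through a dictionary lookup instead of a JOIN/IN subquery.
        clickhouse-client --query "
            SELECT sum(LO_REVENUE)
            FROM lineorderfull
            WHERE dictGetString('customer_region_dict', 'C_REGION', toUInt64(LO_CUSTKEY)) = 'ASIA'"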
     Another traditional way to deal with JOIN complexity in an analytics workload is denormalization: we can move some columns (for example, P_MFGR from the last query) into the fact table (lineorder). A rough sketch of the resulting query appears after the observations below.

     Observations

     - ClickHouse can handle general analytical queries (it requires special schema design and considerations, however).
     - Linear speedup is possible, but it depends on query design and requires advance planning; proper speedup depends on data locality.
     - ClickHouse is blazingly fast (beyond what I've seen before) because it can use all available CPU cores for a query, as shown above: 24 cores for a single server and 72 cores for three nodes.
     - Multi-table JOINs are cumbersome and require manual work to achieve better performance, so consider using dictionaries or denormalization.
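     As a sketch of the denormalization approach, assume a hypothetical fact table lineorder_denorm that carries C_REGION and P_MFGR, populated at load time; the three-table JOIN then collapses to a single-table aggregation:

        # Sketch: with dimension attributes denormalized into the fact table,
        # no JOIN is needed at query time.
        clickhouse-client --query "
            SELECT sum(LO_REVENUE), P_MFGR, toYear(LO_ORDERDATE) AS yod
            FROM lineorder_denorm
            WHERE C_REGION = 'ASIA'
            GROUP BY P_MFGR, yod
            ORDER BY P_MFGR, yod"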
  3. [Debian logo by Software in the Public Interest, Inc. (CC-BY-SA)]

     MySQL and Debian 9 "Stretch"

     Debian 9 ("Stretch") was released on June 17, and we congratulate the Debian community on another iteration of a central Linux distro. Now, the Debian release team some time ago decided not to ship MySQL as part of the distro, instead replacing it […]
  4. LAMP (Linux, Apache, MySQL, PHP) is a combination of open source software, typically installed on a server used for hosting dynamic websites and web applications. It includes the Linux operating system, the Apache web server, MySQL for data storage and management, and PHP for handling the dynamic content. In this tutorial, we will go through a complete LAMP installation on an Ubuntu 16.04 based VPS.

     Requirements

     SSH access with root privileges to your server, as well as an Ubuntu 16.04 VPS, is required for you to be able to complete all the steps in this tutorial.

     1. Update your system […]
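     The excerpt is truncated here. As a very rough sketch only (not the tutorial's exact steps; the package names are the stock Ubuntu 16.04 ones), a minimal LAMP install usually boils down to:

        # Refresh package lists and upgrade the system first.
        apt-get update && apt-get -y upgrade
        # Install Apache, MySQL and PHP 7.0 with the Apache and MySQL bindings.
        apt-get -y install apache2 mysql-server php libapache2-mod-php php-mysql
        # Harden the MySQL installation (root password, remove test database, etc.).
        mysql_secure_installation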
  5. Recently, I had to resize a VirtualBox VDI, as 10GB was not sufficient for a CentOS 6.7 VM. I would just like to mention the simple steps for how to do it.

     Shut down the VM and quit VirtualBox. Open a terminal and go to the path below (the default location). From here you can run the command to resize the VDI file.

        Nils-Mac:MacOS nilnandan$ pwd
        /Applications/VirtualBox.app/Contents/Resources/VirtualBoxVM.app/Contents/MacOS
        Nils-Mac:MacOS nilnandan$

     Start another terminal with Command+T and find the directory where the VDI file is actually stored. The default location would be:

        Nils-Mac:centos67-mysql nilnandan$ pwd
        /Users/nilnandan/VirtualBox VMs/centos67-mysql
        Nils-Mac:centos67-mysql nilnandan$ ls -alh
        total 21925920
        drwx------   7 nilnandan  staff   238B Aug 11 18:15 .
        drwx------  12 nilnandan  staff   408B Aug 11 17:50 ..
        drwx------   6 nilnandan  staff   204B Aug 11 17:50 Logs
        drwx------   2 nilnandan  staff    68B Aug 11 17:47 Snapshots
        -rw-------   1 nilnandan  staff    10G Aug 11 18:20 centos67-3.vdi
        -rw-------   1 nilnandan  staff   7.7K Aug 11 18:15 centos67-mysql.vbox
        -rw-------   1 nilnandan  staff   7.7K Aug 11 17:50 centos67-mysql.vbox-prev
        Nils-Mac:centos67-mysql nilnandan$

     Now, from the VirtualBox binaries directory (the first location above), run the command to resize the VDI, giving it the path you just found. I'm going to increase the size from 10GB to 25GB.

        Nils-Mac:MacOS nilnandan$ VBoxManage modifyhd --resize 25600 /Users/nilnandan/VirtualBox VMs/centos67-mysql/centos67-3.vdi
        Oracle VM VirtualBox Command Line Management Interface Version 5.0.12
        (C) 2005-2015 Oracle Corporation
        All rights reserved.

        Usage:
        VBoxManage modifymedium      [disk|dvd|floppy] <uuid|filename>
                                     [--type normal|writethrough|immutable|shareable|
                                             readonly|multiattach]
                                     [--autoreset on|off]
                                     [--property <name=[value]>]
                                     [--compact]
                                     [--resize <megabytes>|--resizebyte <bytes>]

        Syntax error: Invalid parameter 'VMs/centos67-mysql/centos67-3.vdi'
        Nils-Mac:MacOS nilnandan$

     Here I got an error because there is a space between "VirtualBox" and "VMs", so I have to use an escape character to work around it:

        Nils-Mac:MacOS nilnandan$ VBoxManage modifyhd --resize 25600 /Users/nilnandan/VirtualBox\ VMs/centos67-mysql/centos67-3.vdi
        0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
        Nils-Mac:MacOS nilnandan$

     Now, when you check the size, it will be:

        Nils-Mac:MacOS nilnandan$ VBoxManage showhdinfo /Users/nilnandan/VirtualBox\ VMs/centos67-mysql/centos67-3.vdi
        ...
        Location:       /Users/nilnandan/VirtualBox VMs/centos67-mysql/centos67-3.vdi
        Storage format: VDI
        Format variant: dynamic default
        Capacity:       25600 MBytes
        Size on disk:   10706 MBytes
        Encryption:     disabled
        ...
        Nils-Mac:MacOS nilnandan$

     It's done. Just open VirtualBox and start the VM. (A note on quoting the path, as an alternative to escaping, follows.)
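     As an aside, wrapping the path in double quotes works just as well as backslash-escaping the space, and avoids the syntax error shown above (same command, only the quoting differs):

        # Quote the whole path instead of escaping the space.
        VBoxManage modifyhd --resize 25600 "/Users/nilnandan/VirtualBox VMs/centos67-mysql/centos67-3.vdi"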
