Follow these steps to create a VPC endpoint service for your RDS instance. Repeat these steps if you have multiple instances that require endpoint services:
When it's done, share details such as the private DNS name, VPC service name, and availability zone.
The ClickPipes team will then initiate creation of VPC endpoints in the ClickPipes VPC. This will require you to accept the connection request on your side.
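For reference, a sketch of the underlying AWS CLI calls (one common pattern fronts the RDS instance with a Network Load Balancer; the ARN and IDs below are placeholders you must replace):

```bash
# Create a VPC endpoint service backed by the NLB in front of your RDS instance
aws ec2 create-vpc-endpoint-service-configuration \
  --network-load-balancer-arns <your-nlb-arn> \
  --acceptance-required

# After ClickPipes creates its endpoint, accept the connection request
aws ec2 accept-vpc-endpoint-connections \
  --service-id <your-vpce-service-id> \
  --vpc-endpoint-ids <clickpipes-endpoint-id>
```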
This is a step-by-step example of how to get started using Python with a ClickHouse Cloud service.
Note
Keep in mind that Python versions and library dependencies are constantly evolving. Also make sure to use the latest supported versions of both the driver and your Python environment when trying this.
At the time of writing, we're using clickhouse-connect driver version 0.5.23 and Python 3.11.2.
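The output below comes from a script along these lines (a minimal sketch: the hostname and password are placeholders, and the table schema and sample rows are assumptions chosen to match the sample output):

```python
import clickhouse_connect

# Placeholders: substitute your own service hostname and credentials
client = clickhouse_connect.get_client(
    host='HOSTNAME.clickhouse.cloud',
    port=8443,
    username='default',
    password='<PASSWORD>',
    secure=True)  # ClickHouse Cloud requires TLS
print('connected to HOSTNAME.clickhouse.cloud')

# Create a simple MergeTree table if it does not already exist
client.command(
    'CREATE TABLE IF NOT EXISTS new_table '
    '(key UInt32, value String, metric Float64) '
    'ENGINE MergeTree ORDER BY key')
print('table new_table created or exists already!')

# Insert two rows; avg(metric) over these values is -50.9035
rows = [[1000, 'String Value 1000', 5.233],
        [2000, 'String Value 2000', -107.04]]
client.insert('new_table', rows,
              column_names=['key', 'value', 'metric'])
print(f'written {len(rows)} rows to table new_table')

# Run an aggregation query and print the result rows
query = 'SELECT max(key), avg(metric) FROM new_table'
result = client.query(query)
print(f'query: [{query}] returns: {result.result_rows}')
```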
```
(venv) chpython$ venv/bin/python main.py
connected to HOSTNAME.clickhouse.cloud
table new_table created or exists already!
written 2 rows to table new_table
query: [SELECT max(key), avg(metric) FROM new_table] returns: [(2000, -50.9035)]
```
Tip
If you are using an older Python version (e.g. 3.9.6), you might get an ImportError related to the urllib3 library.
In that case, either upgrade your Python environment to a newer version or pin the urllib3 version to 1.26.15 in your requirements.txt file.
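For example, a requirements.txt pinning both versions mentioned above might look like:

```text
clickhouse-connect==0.5.23
urllib3==1.26.15
```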
You can substitute your own parameters, such as the service name and region, in the resources section above.
secret.tfvars is where you'll put all the API key information that you downloaded earlier. The idea behind this file is to keep all your secret credentials out of the main config file.
It would be something like (replace these parameters):
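A minimal sketch (the variable names here are illustrative and must match the variables declared in your main.tf):

```hcl
token_key       = "<your API key ID>"
token_secret    = "<your API key secret>"
organization_id = "<your organization ID>"
```

5). Run the terraform init command. Something like: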
```
Initializing the backend...

Initializing provider plugins...
- Finding clickhouse/clickhouse versions matching "0.0.2"...
- Installing clickhouse/clickhouse v0.0.2...
- Installed clickhouse/clickhouse v0.0.2 (self-signed, key ID D7089EE5C6A92ED1)

Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/cli/plugins/signing.html

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
```
6). Run the terraform apply -var-file=secret.tfvars command.
Something like:
```
➜ test terraform apply -var-file=secret.tfvars

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # clickhouse_service.service123 will be created
  + resource "clickhouse_service" "service123" {
      + cloud_provider = "aws"
      + endpoints      = (known after apply)
      + id             = (known after apply)
      + idle_scaling   = true
      + ip_access      = [
          + {
              + description = "Anywhere"
              + source      = "0.0.0.0/0"
            },
        ]
      + last_updated   = (known after apply)
      + name           = "jai-terraform"
      + password       = (sensitive value)
      + region         = "us-east-2"
      + tier           = "development"
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + CLICKHOUSE_HOST = (known after apply)

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
```
Type yes and hit enter
Side note: Notice it says password = (sensitive value) above.
This is because we set sensitive = true for the password in the main.tf file.
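For illustration, a sensitive variable declaration in Terraform looks like this (a sketch; the name service_password is an assumption, and your main.tf may declare it differently):

```hcl
variable "service_password" {
  type      = string
  sensitive = true # Terraform redacts this value in plan and apply output
}
```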
7). It will take a couple of minutes to create the service, but eventually it should come up like:
```
  Enter a value: yes

clickhouse_service.service123: Creating...
clickhouse_service.service123: Still creating... [10s elapsed]
clickhouse_service.service123: Still creating... [20s elapsed]
clickhouse_service.service123: Still creating... [30s elapsed]
clickhouse_service.service123: Still creating... [40s elapsed]
clickhouse_service.service123: Still creating... [50s elapsed]
clickhouse_service.service123: Still creating... [1m0s elapsed]
clickhouse_service.service123: Still creating... [1m10s elapsed]
clickhouse_service.service123: Still creating... [1m20s elapsed]
clickhouse_service.service123: Still creating... [1m30s elapsed]
clickhouse_service.service123: Still creating... [1m40s elapsed]
clickhouse_service.service123: Creation complete after 1m41s [id=aa8d8d63-1878-4600-8470-630715af38ed]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

CLICKHOUSE_HOST = "h3ljlaqez6.us-east-2.aws.clickhouse.cloud"
➜ test
```
8). Check the Cloud Console; you should be able to see the service created.
9). To clean up/destroy the service again, run terraform destroy -var-file=secret.tfvars
Something like:
```
Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  # clickhouse_service.service123 will be destroyed
  - resource "clickhouse_service" "service123" {
      - cloud_provider = "aws" -> null
      - ............

Plan: 0 to add, 0 to change, 1 to destroy.

Changes to Outputs:
  - CLICKHOUSE_HOST = "h3ljlaqez6.us-east-2.aws.clickhouse.cloud" -> null

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value:
```
Type yes and hit enter
10). The service will then be destroyed:
```
clickhouse_service.service123: Destroying... [id=aa8d8d63-1878-4600-8470-630715af38ed]
clickhouse_service.service123: Still destroying... [id=aa8d8d63-1878-4600-8470-630715af38ed, 10s elapsed]
clickhouse_service.service123: Still destroying... [id=aa8d8d63-1878-4600-8470-630715af38ed, 20s elapsed]
clickhouse_service.service123: Destruction complete after 27s

Destroy complete! Resources: 1 destroyed.
```
Sometimes you need to reingest all the data from one table to another.
For example, you might want to reingest data from a staging table to a production table. This article shows how to do this using the INSERT INTO statement.
```
┌─city──────┬─country───┬─continent──────┬─population─┐
│ New York  │ USA       │ North America  │          0 │
│ Tokyo     │ Japan     │ Asia           │          0 │
│ Berlin    │ Germany   │ Europe         │          0 │
└───────────┴───────────┴────────────────┴────────────┘
```
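Given a staging copy of such a table, the reingestion itself is a single statement (a sketch; the table names cities and cities_staging are illustrative):

```sql
-- Copy every row from the staging table into the target table
INSERT INTO cities
SELECT *
FROM cities_staging;
```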
Users may see cases where their query is slower than expected, believing they are ordering or filtering by a primary key. In this article we show how users can confirm the key is used, highlighting common reasons it is not.
Notice how the number of granules scanned, 8012, is a fraction of the total, 12209. The section highlighted below confirms the use of the primary key code.
```
PrimaryKey
  Keys:
    code
```
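This excerpt comes from EXPLAIN output with index analysis enabled. A minimal sketch of how to produce it yourself (the table name and filter value are assumptions):

```sql
EXPLAIN indexes = 1
SELECT count()
FROM my_table
WHERE code = 'US';
```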
Granules are the unit of data processing in ClickHouse, with each typically holding 8192 rows. For further details on granules and how they are filtered we recommend reading this guide.
Note
Filtering on keys that appear later in an ordering key will not be as efficient as filtering on those that appear earlier in the tuple. For the reasons why, see here
ClickHouse can also exploit ordering keys for efficient sorting. Specifically,
When the optimize_read_in_order setting is enabled (by default), the ClickHouse server uses the table index and reads the data in order of the ORDER BY key. This allows us to avoid reading all data in case of specified LIMIT. So, queries on big data with small limits are processed faster. See here and here for further details.
This, however, requires the keys used in the query to align with the table's ordering key.
The line MergeTreeSelect(pool: ReadPool, algorithm: Thread) here does not indicate the use of the optimization, but rather a standard read. This is caused by our table's ordering key using toUnixTimestamp(Timestamp), not Timestamp. Rectifying this mismatch addresses the issue.
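For example, a query that sorts by the same expression as the ordering key can use the optimization (a sketch; the table name is assumed):

```sql
-- Sorting by the exact expression in the table's ORDER BY key
-- allows ClickHouse to read the data in key order and stop early
SELECT *
FROM logs
ORDER BY toUnixTimestamp(Timestamp)
LIMIT 10;
```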
The following steps can be used to enable SSL for a single ClickHouse Server using Let's Encrypt, a free, automated, and open Certificate Authority (CA) designed to make it easy for anyone to secure their websites with HTTPS. By automating the certificate issuance and renewal process, Let's Encrypt ensures websites remain secure without requiring manual intervention.
Note
In the following guide, we assume ClickHouse has been installed at the standard package locations. We use the domain product-test-server.clickhouse-dev.com for all examples. Substitute your domain accordingly.
Verify you have a DNS A or AAAA record pointing to your server. This can be achieved using the Linux tool dig. For example, the response for product-test-server.clickhouse-dev.com if using the Cloudflare DNS server 1.1.1.1:
```
dig @1.1.1.1 product-test-server.clickhouse-dev.com

; <<>> DiG 9.18.28-0ubuntu0.22.04.1-Ubuntu <<>> @1.1.1.1 product-test-server.clickhouse-dev.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22315
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;product-test-server.clickhouse-dev.com. IN A

;; ANSWER SECTION:
product-test-server.clickhouse-dev.com. 300 IN A 34.248.59.9

;; Query time: 52 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Thu Dec 12 09:37:33 UTC 2024
;; MSG SIZE  rcvd: 83
```
Notice the section below confirming the presence of an A record.
```
;; ANSWER SECTION:
product-test-server.clickhouse-dev.com. 300 IN A 34.248.59.9
```
Open port 80 on your server. This port will be used for automatic certificate renewal using the ACME protocol with certbot. For AWS, this can be achieved by modifying the instance's associated Security Group.
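For example, using the AWS CLI (the Security Group ID below is a placeholder):

```bash
# Allow inbound traffic on port 80 for ACME HTTP-01 challenges
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 80 \
  --cidr 0.0.0.0/0
```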
```
sudo certbot certonly
Saving debug log to /var/log/letsencrypt/letsencrypt.log

How would you like to authenticate with the ACME CA?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1: Spin up a temporary webserver (standalone)
2: Place files in webroot directory (webroot)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Select the appropriate number [1-2] then [enter] (press 'c' to cancel): 1
Please enter the domain name(s) you would like on your certificate (comma and/or
space separated) (Enter 'c' to cancel): product-test-server.clickhouse-dev.com
Requesting a certificate for product-test-server.clickhouse-dev.com

Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/product-test-server.clickhouse-dev.com/fullchain.pem
Key is saved at:         /etc/letsencrypt/live/product-test-server.clickhouse-dev.com/privkey.pem
This certificate expires on 2025-03-12.
These files will be updated when the certificate renews.
Certbot has set up a scheduled task to automatically renew this certificate in the background.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If you like Certbot, please consider supporting our work by:
 * Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
 * Donating to EFF:                    https://eff.org/donate-le
```
Note
We don't have a web server running on our server, so we use option (1), allowing Certbot to spin up a standalone temporary web server.
Enter the full domain name of your server, e.g. product-test-server.clickhouse-dev.com, when requested.
Note
Let's Encrypt has a policy of not issuing certificates for certain types of domains, such as public cloud provider-generated domains (e.g., AWS *.compute.amazonaws.com domains). These domains are considered shared infrastructure and are blocked for security and abuse prevention reasons.
This command sets up a cron job to automate the management of Let's Encrypt SSL certificates for a ClickHouse server. It runs every minute as the root user, copying the .pem files from the Let's Encrypt directory to the ClickHouse server's configuration directory, but only if the files have been updated. After copying, the script adjusts the ownership of the files to the clickhouse user and group, ensuring the server has the required access. It also sets secure read-only permissions (chmod 400) on the copied files to maintain strict file security. This ensures that the ClickHouse server always has access to the latest SSL certificates without requiring manual intervention, maintaining security and minimizing operational overhead.
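A sketch of such a cron entry in /etc/cron.d style (the paths assume the standard package locations, and the exact script is an assumption; cp -u copies only when the source files are newer):

```bash
# Every minute, as root: copy renewed certificates into the ClickHouse
# configuration directory, fix ownership, and set read-only permissions
* * * * * root cp -u /etc/letsencrypt/live/product-test-server.clickhouse-dev.com/*.pem /etc/clickhouse-server/ && chown clickhouse:clickhouse /etc/clickhouse-server/*.pem && chmod 400 /etc/clickhouse-server/*.pem
```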
Configure the use of these certificates in clickhouse-server.
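A sketch of the relevant settings, e.g. in /etc/clickhouse-server/config.d/ssl.xml (the file name is illustrative; the certificate paths match the cron job above):

```xml
<clickhouse>
    <!-- HTTPS and secure native protocol ports -->
    <https_port>8443</https_port>
    <tcp_port_secure>9440</tcp_port_secure>
    <openSSL>
        <server>
            <!-- Certificates copied from Let's Encrypt by the cron job -->
            <certificateFile>/etc/clickhouse-server/fullchain.pem</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/privkey.pem</privateKeyFile>
        </server>
    </openSSL>
</clickhouse>
```

You will likely need to restart the server (e.g. sudo systemctl restart clickhouse-server) for the port changes to take effect.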
```
curl https://product-test-server.clickhouse-dev.com:8443/
Ok.
```
For this last step to work, you may need to ensure port 8443 is accessible, e.g. included in your Security Group in AWS. Alternatively, if you only want to access ClickHouse from the server itself, modify your hosts file:
echo"127.0.0.1 product-test-server.clickhouse-dev.com"|sudotee -a /etc/hosts
Danger
If you open connections from wildcard addresses, make sure that at least one of the following measures is applied:
server is protected by firewall and not accessible from untrusted networks;
all users are restricted to a subset of network addresses (see users.xml);
all users have strong passwords, only secure (TLS) interfaces are accessible, or connections are only made via TLS interfaces.
The following should also work if connecting from the local machine on which ClickHouse is running. To connect via product-test-server.clickhouse-dev.com from other machines, open port 9440 in your firewall or, on AWS, your Security Group.
The key ClickHouse appeared in both rows, so its values have been summed. The key ClickBench was only present in one row, so the "sum" is just that single value!
We can also use maxMap to find the maximum values per key:
Alternatively, we can compute the average price using avgMap.
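A sketch of such a query against the uk.uk_price_paid table used in the final example below (the grouping choice is an assumption):

```sql
SELECT county,
       avgMap(map(toYear(date), price)) AS avgPrices
FROM uk.uk_price_paid
GROUP BY county
LIMIT 5;
```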
Those values have a lot of decimal places, which we can clean up by using the mapApply function to call the floor function on each value in the map:
```sql
WITH year(toStartOfInterval(date, toIntervalYear(10))) AS year
SELECT county,
       medianMap(map(year, price)) AS medianPrices,
       mapApply((k, v) -> (k, floor(v)), avgMap(map(year, price))) AS avgPrices
FROM uk.uk_price_paid
WHERE year >= 2010
GROUP BY ALL
ORDER BY max(price) DESC
LIMIT 10;
```
The EXCHANGE command is useful when you need to swap a current table with another, temporary table in which primary keys or other settings have been updated.
Unlike a swap performed with the RENAME command, this happens atomically.
It is also useful when you have Materialized Views triggering on a source table and are trying to avoid rebuilding the view.
Below is a simple example of how it works and how to test it:
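A minimal sketch (the table and column names are illustrative; EXCHANGE requires the Atomic database engine, which is the default):

```sql
-- Create a replacement table with an updated ordering key
CREATE TABLE events_new
(
    user_id UInt64,
    ts      DateTime,
    message String
)
ENGINE = MergeTree
ORDER BY (user_id, ts);

-- Backfill it from the current table
INSERT INTO events_new
SELECT * FROM events;

-- Atomically swap the two tables: queries and materialized views
-- keep referring to the name `events`, now backed by the new table
EXCHANGE TABLES events AND events_new;

-- The old data is now under the name events_new; drop it when satisfied
DROP TABLE events_new;
```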