/data - articles
-
Vector similarity search using Redis Stack
Using Redis Stack to store vectors and do vector similarity search, for KNN and other ML tasks.
Published half a year ago in #data about #vector search, #redis, #knn, #hnsw and #rag -
Efficient vector similarity search with the Annoy library based on ANN
What is vector search? Performance issues with vector search on large amounts of data. The ANN strategy for fast vector search at scale.
Published a year ago in #data about #vector search, #ann, #annoy and #python -
Improving Sphinxsearch performance with attribute indexes
Sphinxsearch is a popular full-text database that also provides filtering based on attributes. Filtering queries can run with or without full-text search and can perform poorly on large document sets. Sphinx introduces attribute indexes to improve filtering query performance. Let's see how this works.
Published 2 years ago in #data about #sphinx -
Enabling data-at-rest encryption in MySQL
Data-at-rest encryption protects data from anyone with direct access to the original database files. Let's see how to enable and use data-at-rest encryption in MySQL, where it is supported for the InnoDB storage engine.
Published 2 years ago in #data about #mysql and #security -
Converting strings to numbers in ClickHouse
How to convert strings to integers and floats in ClickHouse. Controlling how invalid values are handled during conversion.
Published 2 years ago in #data about #clickhouse -
How to manage ingestion errors in ClickHouse
Managing errors when ingesting data into ClickHouse, including text data sources like CSV and TSV.
Published 2 years ago in #data about #clickhouse -
How to merge large tables in ClickHouse using join
How to merge multiple large tables into a single table based on a given column. A solution to the MEMORY_LIMIT_EXCEEDED problem when joining large tables.
Published 2 years ago in #data about #clickhouse -
How to use Regex to feed text data to ClickHouse
The Regexp input format can help load unformatted or broken text data into ClickHouse. How to use the Regexp format, with a practical example.
Published 2 years ago in #data about #clickhouse -
Using Sphinx to add full-text search to ClickHouse
How to configure Sphinx to index text data from ClickHouse. Which IDs to use for ClickHouse documents with Sphinx. How to build an index and resolve matched documents in ClickHouse.
Published 2 years ago in #data about #clickhouse and #sphinx -
How to use multiple disks in ClickHouse
How to configure multiple disks as storage in ClickHouse, and how to use different disks for different tables.
Published 2 years ago in #data about #clickhouse -
Welcome to DataChild - a place to learn data programming and ML
A welcome post about the idea behind this place, its basic approaches, target audience, and goals.
Published 2 years ago in #data