-
Improving Sphinxsearch performance with attribute indexes
Sphinxsearch is a popular full-text search engine that also supports filtering by attributes. Filtering queries can run with or without full-text search and may perform poorly on large document sets. Sphinx introduces attribute indexes to speed up filtering queries. Let's see how this works.
Published half a year ago in #data about #sphinx -
Enabling data-at-rest encryption in MySQL
Data-at-rest encryption ensures that data is protected from direct access to the original database files. Let's see how to enable and use data-at-rest encryption in MySQL, where it is supported for the InnoDB storage engine.
Published half a year ago in #data about #mysql and #security -
Converting strings to numbers in ClickHouse
How to convert strings to integers and floats in ClickHouse, and how to control the handling of invalid values during conversion.
Published half a year ago in #data about #clickhouse -
How to manage ingestion errors in ClickHouse
Managing errors when ingesting data into ClickHouse, including text data sources like CSV and TSV.
Published half a year ago in #data about #clickhouse -
How to merge large tables in ClickHouse using join
How to merge multiple large tables into a single table based on a given column. A solution to the MEMORY_LIMIT_EXCEEDED problem when joining large tables.
Published half a year ago in #data about #clickhouse -
How to use Regex to feed text data to ClickHouse
The Regexp input format can help load unformatted or broken text data into ClickHouse. A practical example of using the Regexp format for this.
Published a year ago in #data about #clickhouse -
Formatting unstructured data using OpenAI API and Python
How to use OpenAI to format unstructured text data, e.g. CSV. Setting additional formatting requirements to format specific values in the resulting CSV.
Published a year ago in #machinelearning about #python and #openai -
Quick start OpenAI API example using Python
How to start using OpenAI API with Python. A simple example of a Python script that generates data based on the OpenAI language model.
Published a year ago in #machinelearning about #python and #openai -
Using Sphinx to add full-text search to ClickHouse
How to configure Sphinx to index text data from ClickHouse. Which IDs to use for ClickHouse documents in Sphinx. How to build an index and resolve found documents in ClickHouse.
Published a year ago in #data about #clickhouse and #sphinx -
How to use multiple disks in ClickHouse
How to configure multiple disks as storage in ClickHouse, and how to use different disks for different tables.
Published a year ago in #data about #clickhouse -
What is a function derivative and how to optimize functions
The article explains what a function derivative is at a very basic level. Starting from the concept of a function, we move on to how functions change and finally look at a Python example of optimizing a function based on its derivative.
Published a year ago in #machinelearning about #math, #derivative and #python -
Matrices and vectors math for AI with Python examples
The article provides an introduction to vectors and matrices, two fundamental concepts in linear algebra that are widely used in artificial intelligence. It explains what vectors and matrices are and how they are defined in math, and covers basic operations with them in Python, including adding, multiplying, and transposing matrices.
Published a year ago in #machinelearning about #math, #matrix and #vector -
Creating a bigram language model for text generation with Python
Understanding bigram language models, which are statistical models that predict the likelihood of a word given its preceding word. Includes an example of a simple bigram language model in Python.
Published a year ago in #machinelearning about #nlp, #language-models and #python -
What is a language model and how it works
The basics of language models, which are algorithms that enable computers to analyze and understand human language. The article explains how language models work and how they are trained, using a simple example of a program that can understand and respond to simple questions.
Published a year ago in #machinelearning about #nlp and #language-models -
What is Machine Learning and how it works
Machine Learning basics, the math behind machine learning, predictions, prediction errors, training dataset, validation dataset.
Published a year ago in #machinelearning -
Using csvkit to format, clean, and fix CSV files
Formatting CSV, TSV, and other files, converting CSV delimiters, converting CSV quoting symbols, fixing invalid CSV files, and working with compressed CSV files.
Published a year ago in #programming about #python and #csv -
Reading CSV, TSV, and invalid CSV files with Golang
Reading CSV with Golang line by line or entirely, reading CSV with custom delimiters (including TSV) and escaping rules, and reading broken CSV files.
Published a year ago in #programming about #golang and #csv -
Welcome to DataChild - a place to learn data programming and ML
This is a welcome post about the idea behind this place, its basic approach, target audience, and goals.
Published a year ago in #data