-
How to install Stable Diffusion on Ubuntu and use it in CLI
What is Stable Diffusion? How to install Stable Diffusion on Ubuntu? How to use Stable Diffusion in CLI? Tuning params to get better image quality.
Published half a year ago in #machinelearning about #stable diffusion, #ubuntu and #images -
Searching images based on text using CLIP model
What is CLIP and common image/text vector space? How to install and run CLIP? How to compare image and text embeddings to find corresponding images based on text query?
Published half a year ago in #machinelearning about #clip, #embeddings, #vector search and #python -
What is a text embedding and how to use it for text search
What is a text embedding? How to get embeddings from a lot of text data? How to search within text data using text embeddings? What is the difference between vector-based text search and full-text search?
Published half a year ago in #machinelearning about #embeddings, #vector search, #python and #openai -
Image similarity search based on embeddings and sentence_transformers
How to get image embeddings using sentence_transformers models. How to store vectors in the database. How to find similar images to the given query image.
Published half a year ago in #machinelearning about #embeddings, #sentence_transformers, #clip, #vector search, #python and #clickhouse -
What is actually a neural network?
A very basic and simple explanation of what a neural network is and why a lot of modern articles and videos explain it wrong.
Published half a year ago in #machinelearning about #neural network -
Vector similarity search using Redis Stack
Using Redis Stack to store vectors and do vector similarity search, for KNN and other ML tasks.
Published half a year ago in #data about #vector search, #redis, #knn, #hnsw and #rag -
Efficient vector similarity search with Annoy library based on ANN
What is vector search? Performance issues with vector search on large amounts of data. ANN strategy to get fast vector search at scale.
Published half a year ago in #data about #vector search, #ann, #annoy and #python -
Improving Sphinxsearch performance with attribute indexes
Sphinxsearch is a popular full-text database that also provides filtering based on attributes. Filtering queries can run with or without full-text search and may perform poorly on large document sets. Sphinx introduces attribute indexes to improve filtering query performance. Let's see how this works.
Published 2 years ago in #data about #sphinx -
Enabling data-at-rest encryption in MySQL
Data-at-rest encryption protects data from direct access to the original database files. Let's see how to enable and use data-at-rest encryption in MySQL, which is supported for the InnoDB storage engine.
Published 2 years ago in #data about #mysql and #security -
Converting strings to numbers in ClickHouse
How to convert strings to integers and floats in ClickHouse. Controlling how invalid values are handled on conversion.
Published 2 years ago in #data about #clickhouse -
How to manage ingestion errors in ClickHouse
Managing errors when ingesting data into ClickHouse, including text data sources like CSV and TSV.
Published 2 years ago in #data about #clickhouse -
How to merge large tables in ClickHouse using join
How to merge multiple large tables into a single table based on a given column. A solution to the MEMORY_LIMIT_EXCEEDED problem when joining large tables.
Published 2 years ago in #data about #clickhouse -
How to use Regex to feed text data to ClickHouse
The Regexp input format can help load unformatted or broken text data into ClickHouse. How to use the Regexp format for that, with a practical example.
Published 2 years ago in #data about #clickhouse -
Formatting unstructured data using OpenAI API and Python
How to use OpenAI to format unstructured text data, e.g. into CSV. Setting additional formatting requirements to format specific values in the resulting CSV.
Published 2 years ago in #machinelearning about #python and #openai -
Quick start OpenAI API example using Python
How to start using OpenAI API with Python. A simple example of a Python script that generates data based on the OpenAI language model.
Published 2 years ago in #machinelearning about #python and #openai -
Using Sphinx to add full-text search to ClickHouse
How to configure Sphinx to index text data from ClickHouse. What IDs to use for ClickHouse documents with Sphinx. How to build an index and resolve found documents in ClickHouse.
Published 2 years ago in #data about #clickhouse and #sphinx -
How to use multiple disks in ClickHouse
How to configure multiple disks as storage in ClickHouse, and how to use different disks for different tables.
Published 2 years ago in #data about #clickhouse -
What is a function derivative and how to optimize functions
The article explains what a function derivative is at a very basic level. Starting from the concept of a function, we move on to how a function changes and, finally, look at a Python example of optimizing a function based on its derivative.
Published 2 years ago in #machinelearning about #math, #derivative and #python -
Matrices and vectors math for AI with Python examples
The article provides an introduction to vectors and matrices, two fundamental concepts in linear algebra that are widely used in artificial intelligence. It explains what vectors and matrices are and how they are defined in math, and covers basic operations with vectors and matrices using Python, including adding, multiplying, and transposing matrices.
Published 2 years ago in #machinelearning about #math, #matrix and #vector -
Creating a bigram language model for text generation with Python
Understanding bigram language models, which are statistical models that predict the likelihood of a word given its preceding word. Includes an example of a simple bigram language model in Python.
Published 2 years ago in #machinelearning about #nlp, #language-models and #python -
What is a language model and how it works
The basics of language models, which are algorithms that enable computers to analyze and understand human language. The article explains how language models work and how they are trained, using a simple example of a program that can understand and respond to simple questions.
Published 2 years ago in #machinelearning about #nlp and #language-models -
What is Machine Learning and how it works
Machine Learning basics, the math behind machine learning, predictions, prediction errors, training dataset, validation dataset.
Published 2 years ago in #machinelearning -
Using csvkit to format, clean, and fix CSV files
Formatting CSV, TSV, and other files, converting CSV delimiters, converting CSV quoting symbols, fixing invalid CSV files, and working with compressed CSV files.
Published 2 years ago in #programming about #python and #csv -
Reading CSV, TSV, and invalid CSV files with Golang
Reading CSV with Golang line by line or entirely, reading CSV with custom delimiters (including TSV) and escaping rules, and reading broken CSV files.
Published 2 years ago in #programming about #golang and #csv -
Welcome to DataChild - a place to learn data programming and ML
A welcome post about the idea behind this place, its basic approaches, target audience, and goals.
Published 2 years ago in #data