snorkel-team / snorkel

A system for quickly generating training data with weak supervision

python data-science machine-learning ai weak-supervision snorkel labeling data-augmentation training-data data-slicing

Updated May 2, 2024
Python

NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

python machine-learning deep-learning neural-network mxnet gpu image-processing pytorch gpu-tensorflow data-processing data-augmentation audio-processing paddle image-augmentation fast-data-pipeline

Updated Jan 24, 2025
C++

ZhaoJ9014 / face.evoLVe

🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥

Updated Dec 23, 2022
Python

QData / TextAttack

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/

nlp security machine-learning natural-language-processing data-augmentation adversarial-machine-learning adversarial-examples adversarial-attacks

Updated Jul 25, 2024
Python

webdataset / webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

deep-learning pytorch data-augmentation webdataset webdataset-format

Updated Dec 11, 2024
Python

fepegar / torchio

Medical imaging toolkit for deep learning

python machine-learning deep-learning pytorch medical-image-computing medical-images data-augmentation augmentation medical-image-processing medical-image-analysis medical-imaging-datasets medical-imaging-with-deep-learning

Updated Jan 20, 2025
Python

iver56 / audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

audio python music machine-learning deep-learning dsp sound sound-processing data-augmentation augmentation audio-effects audio-data-augmentation

Updated Dec 9, 2024
Python

425776024 / nlpcda

One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda

nlp data-augmentation chinese-data-augmentation nlpcda chinese-eda

Updated Apr 15, 2024
Python

visual-layer / fastdup

fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.

visualization python machine-learning image deep-learning image-processing dataset image-classification outlier-detection object-detection image-analysis visual-search data-augmentation data-curation visualization-tools image-similarity image-duplicate-detection novelty-detection image-classfication

Updated Jan 16, 2025
Python

AgaMiko / data-augmentation-review

List of useful data augmentation resources. Here you will find some less common techniques, libraries, links to GitHub repos, papers, and more.

review machine-learning survey generative-adversarial-network style-transfer data-generation data-augmentation image-augmentation data-synthesis autoaugment audio-augmentation data-augmentations augmentation-policies nlp-augmentation graph-data-augmentation

Updated Aug 14, 2024

jasonwei20 / eda_nlp

Data augmentation for NLP, presented at EMNLP 2019

nlp text-classification position cnn embeddings synonyms swap classification rnn sentence data-augmentation

Updated Mar 19, 2023
Python

yongzhuo / nlp_xiaojiang

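Natural language processing (NLP): the Xiaojiang chatbot (retrieval-based chit-chat chatbot), BERT sentence vectors and similarity (Sentence Similarity), XLNet sentence vectors and similarity (text xlnet embedding), text classification, entity extraction (NER, bert+bilstm+crf), data augmentation (text augment, data enhance), synonymous sentence and synonym generation, sentence main-part extraction (mainpart), Chinese short-text similarity, text feature engineering, and keras-http-service calls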

nlp text-classification distance chatbot chinese feature bert data-augmentation enhance text-augment xlnet

Updated Sep 23, 2021
Python

LirongWu / awesome-graph-self-supervised-learning

Code for TKDE paper "Self-supervised learning on graphs: Contrastive, generative, or predictive"

machine-learning deep-learning transfer-learning representation-learning unsupervised-learning data-augmentation graph-neural-networks self-supervised-learning pre-training pretext-task

Updated Aug 15, 2024

zhanlaoban / EDA_NLP_for_Chinese

An implementation of the EDA paper for Chinese corpora. An EDA data augmentation tool for Chinese text. NLP data augmentation. Paper reading notes.

text-classification eda chinese data-augmentation chinese-data-augmentation easy-data-augmentation

Updated May 31, 2022
Python

Paperspace / DataAugmentationForObjectDetection

Data Augmentation For Object Detection

opencv deep-learning object-detection data-augmentation bounding-box imagine-augmentation

Updated Apr 14, 2020
Jupyter Notebook

asteroid-team / torch-audiomentations

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.

audio python music machine-learning deep-learning dsp waveform sound pytorch sound-processing data-augmentation augmentation audio-effects differentiable-data-augmentation audio-data-augmentation

Updated Jan 15, 2025
Python

quqxui / Awesome-LLM4IE-Papers

Awesome papers about generative Information Extraction (IE) using Large Language Models (LLMs)

information-extraction named-entity-recognition event-detection event-extraction data-augmentation relation-extraction zero-shot-learning few-shot-learning knowledge-graph-construction event-arguments cross-domain-learning in-context-learning large-language-models

Updated Nov 18, 2024

styfeng / DataAug4NLP

Collection of papers and resources for data augmentation for NLP.

machine-learning natural-language-processing deep-learning text-classification transformers artificial-intelligence survey data-augmentation survey-paper acl2021

Updated Aug 12, 2022

goru001 / inltk

Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for various NLP tasks that an application developer might need.

nlp deep-learning word-embeddings pytorch data-augmentation indic-languages sentence-similarity sentence-embeddings sentence-encoding

Updated Jan 20, 2024
Python

Tebmer / Awesome-Knowledge-Distillation-of-LLMs

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.

compression feedback survey alignment self-training multi-modal knowledge-distillation data-augmentation kd data-synthesis self-distillation instruction-following llm large-language-model supervised-finetuning

Updated Oct 22, 2024

data-augmentation

Here are 1,132 public repositories matching this topic...

snorkel-team / snorkel

NVIDIA / DALI

ZhaoJ9014 / face.evoLVe

QData / TextAttack

webdataset / webdataset

fepegar / torchio

iver56 / audiomentations

425776024 / nlpcda

visual-layer / fastdup

AgaMiko / data-augmentation-review

jasonwei20 / eda_nlp

yongzhuo / nlp_xiaojiang

LirongWu / awesome-graph-self-supervised-learning

zhanlaoban / EDA_NLP_for_Chinese

Paperspace / DataAugmentationForObjectDetection

asteroid-team / torch-audiomentations

quqxui / Awesome-LLM4IE-Papers

styfeng / DataAug4NLP

goru001 / inltk

Tebmer / Awesome-Knowledge-Distillation-of-LLMs
