
a popular application of AI in medicine, so it is
used more widely with different kinds of medical
image data [10, 11, 12]. Polyp segmentation is one
of the popular segmentation tasks that use ML
techniques to detect and segment polyps in images/videos
collected from gastrointestinal (GI) tract screenings.
Early identification of polyps in the GI tract is
critical to prevent colorectal cancers [13]. Therefore,
many ML models have been investigated to segment
polyps automatically in GI tract videos recorded
from endoscopy [14, 15, 16] or PillCam
examinations [17, 18, 19] to augment the performance of
doctors by detecting polyps missed by experts, thereby
decreasing both miss rates and observer
variation.
Most polyp segmentation models are based
on convolutional neural networks (CNNs) and are
trained using publicly available polyp segmentation
datasets [20, 21, 22, 23, 24]. However, these datasets
have a limited number of images with corresponding
expert-annotated masks. For example, the
CVC-VideoClinicDB [21] dataset has 11,954 images from
10 polyp videos and 10 non-polyp videos, the
PICCOLO dataset [24] has 3,433 manually annotated
images (2,131 white-light images and 1,302
narrow-band images), and the Hyper-Kvasir [20] dataset has
only 1,000 segmented images, but also contains
100,000 unlabeled images.
We identified two main reasons for having small
datasets in the medical domain compared to other
domains. The first reason is the privacy concerns attached
to medical data, and the second is the costly and
time-consuming medical data annotation process
that medical domain experts must perform.
The privacy concerns can vary from country to
country and region to region according to the data
protection regulations introduced in the specific
areas. For example, Norway should follow the rules
given by the Norwegian Data Protection Authority
(NDPA) [25] and enforce the Personal Data Act [26]
in addition to following the General Data
Protection Regulation (GDPR) [27] guidelines, which are the
same for all European countries. While the US has
no central privacy protection guideline comparable
to the GDPR in Europe, rules and regulations
are enforced through other US privacy laws, such as the
Health Insurance Portability and Accountability Act
(HIPAA) [28] and the California Consumer Privacy Act
(CCPA) [29]. Asian countries follow their
own sets of rules, such as Japan’s Act on Protection of
Personal Information [30], the South Korean Personal
Information Protection Commission [31], and the
Personal Data Protection Bill in India [32].
When research is performed under such privacy
restrictions, the published papers often present theoretical
methods only. According to the medical image
segmentation studies analyzed in [33], 30% used
private datasets. As a result, the studies are not
reproducible. Researchers must keep datasets private due
to medical data sharing restrictions. Furthermore,
universities and research institutes that use medical
domain data for teaching purposes use the same med-
ical datasets for years, which affects the quality of
education. In addition to the privacy concerns, the
costly and time-consuming medical data labeling and
annotation process [34] is an obstacle to producing
large datasets for AI algorithms. Compared to other
already time-consuming medical data labeling
processes, pixel-wise data annotation is far more
demanding on the valuable time of medical experts.
Experts in the medical domain can produce
annotations that are fully trustworthy in terms of correctness. If
annotation by experts is not possible, the
experts should at least review the annotations to ensure
they are correct before they are used in AI
algorithms. The importance of having accurate
annotations from experts for medical data is, for example,
discussed by Yu et al. [35] using a mandible
segmentation dataset of CT images. In this regard, researching
a way to produce synthetic segmentation datasets is
important to overcome the time-consuming and costly medical
data annotation process. Therefore, researching an
alternative way of sharing medical data that bypasses
both the privacy and the time-consuming dataset
generation challenges is the main objective of this study.
In this regard, the contributions of this paper are
as follows.
• This study introduces the novel SynGAN-Seg
pipeline to generate a synthetic medical image and
its corresponding segmentation mask using a
modified version of the state-of-the-art SinGAN