56 Aleksandra Marcinów, Małgorzata Biegańska, Bianka Kowalska, Hubert Baran, Daniil Hardzetski, Halina Kwaśnicka
can facilitate the design of more effective, functional, and
user-friendly buildings. Historical, demographic, and social
data can also help architects understand the cultural and
social context of the area under consideration, so that they
can take local needs and community preferences into account
and produce more appropriate designs. Data on space use
patterns, such as pedestrian traffic, the use of public
spaces, and local customs, are increasingly collected. The
analysis of these data can lead to the optimization of urban
and architectural designs.
Datasets containing information about trends in architectural
design, interior design, and space use can be used to
identify new inspirations, styles, and directions for
development in architecture. Automating image classification
in cultural heritage through deep learning, particularly
convolutional neural networks, improves accuracy but requires
large, well-prepared datasets. Creating and sharing new
datasets is crucial for maximizing AI’s benefits in this
field (Llamas et al. 2017). Segmentation is vital in urban
planning, especially for analysing building façades. City
digital twins (CDTs) allow the creation of high-quality
synthetic datasets, improving segmentation efficiency
compared to virtual data (Zhang et al. 2022). Parametric
BIM aids in generating training data for AI to recognize
building objects in images, demonstrating that AI trained
on synthetic data can effectively solve real-world
architectural problems (Alawadhi, Yan 2023). Generative AI
significantly enhances innovation and efficiency in
architectural design, from creating 2D images and 3D models to
influencing all design stages (Li et al. 2024).
Integrating data and artificial intelligence technologies
in architectural design represents a game-changing shift
in the field. It paves the way for innovative designs that
are more community-centric, efficient, and sustainable.
The availability of comprehensive datasets has become
a crucial tool for architects, empowering them to create
future-proof construction and urban planning solutions.
Constructing datasets has many vital purposes and benefits
in the field of machine learning and data science, as
datasets are the foundation for training machine learning
models. The larger and more representative the dataset, the
better the results that can be achieved. Datasets allow the
collection of information about a given area, which can lead
to the detection of essential patterns, trends, or other
relationships in the data. It is essential that datasets are
tailored to specific problems and contexts, as this allows
the creation of more effective and accurate models for
solving particular tasks (Bialek et al. 2016; Vaccari et al. 2020).
The use of publicly available datasets, commonly referred
to as benchmark datasets, is of utmost significance in the
evaluation of model performance. Trusted procedures and
datasets make the evaluation process more reliable and allow
relatively objective comparisons, which in turn enhances the
credibility of our evaluations. Thanks to this, we can
improve existing solutions and create new, more advanced
technologies with confidence.
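One practical condition for such comparisons is that every model is evaluated on the same held-out images. The following is a minimal sketch of a reproducible train/test assignment, not a procedure described by the authors: the function name `assign_split`, the 20% test fraction, and the file name are hypothetical and chosen only for illustration.

```python
import hashlib


def assign_split(image_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign an image to 'train' or 'test'.

    The assignment depends only on the image identifier, so it stays
    the same across runs and machines (an illustrative assumption,
    not the paper's procedure).
    """
    digest = hashlib.sha256(image_id.encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < test_fraction else "train"


# Hypothetical file name; the same name always yields the same split.
print(assign_split("facade_0001.jpg"))
```

Because the split is derived from the identifier rather than from a random shuffle, adding new photographs later does not move existing images between the training and test sets.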
Overall, constructing datasets is a crucial step in machine
learning and data analysis that enables the development of
new models, techniques and tools and leads to the discovery
of new patterns, relationships, and trends in data. The
increasing prevalence of artificial intelligence in the field
of architecture necessitates the construction of novel,
expansive, and structured datasets. AI models that require
knowledge of specific architectural styles are dependent
upon the availability of uniform datasets.
This article introduces a new dataset, collected and created
by the authors. The collection is a historical dataset
containing façades of tenements from the 19th and early 20th
centuries from Wrocław and, in due course, from other cities
with similar architectural styles (for example Berlin and
Szczecin). The paper is structured as follows: the following
sections describe the methodology for creating the dataset
and the verification of annotations, and present an example
of an annotated façade from the collection. The summary
concludes the paper.
Methods
Datasets represent the fundamental building blocks of
machine learning development. Standardized benchmark
datasets have come to be accepted as a reliable tool for the
comparison and evaluation of various models (Kistowski
et al. 2015). The quality of datasets can be described by
five attributes: relevance, reproducibility, fairness,
verifiability and usability.
In consideration of the aforementioned key character-
istics of the dataset, the authors have developed a meth-
odology for the creation of the collection (Kistowski et al.
2015). Our methodology consists of eight steps divided
into three stages:
– Stage I. Preparation of the data acquisition procedure
(formulation of objectives, determination of use case
requirements, planning the procedure for gathering images,
collecting photos taken by a group of collaborating students).
– Stage II. Data processing: creation of a dataset that
meets the requirements (review of the photographs with regard
to their quality, annotation of the photos, verification of
annotations).
– Stage III. Summary of the dataset (reconciliation of
descriptions, the description of the generated dataset, and
its transfer for use in the machine learning models).
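The Stage II steps (quality review, annotation, verification) can be pictured as a sequence of states that each photograph passes through. The sketch below is illustrative only and is not the authors' tooling: the names `PhotoRecord`, `Status`, `review_quality`, `annotate`, `verify`, and `verified_only` are hypothetical, and sending a photo back for re-annotation after a failed verification is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    COLLECTED = "collected"   # photo received from Stage I
    REVIEWED = "reviewed"     # passed the quality review
    ANNOTATED = "annotated"   # annotations added, not yet checked
    VERIFIED = "verified"     # annotations checked and accepted
    REJECTED = "rejected"     # failed the quality review


@dataclass
class PhotoRecord:
    filename: str
    status: Status = Status.COLLECTED
    annotations: list = field(default_factory=list)


def review_quality(record: PhotoRecord, acceptable: bool) -> PhotoRecord:
    """Quality review: only collected photos can be reviewed."""
    if record.status is not Status.COLLECTED:
        raise ValueError("only collected photos can be reviewed")
    record.status = Status.REVIEWED if acceptable else Status.REJECTED
    return record


def annotate(record: PhotoRecord, labels: list) -> PhotoRecord:
    """Annotation: only photos that passed the review are annotated."""
    if record.status is not Status.REVIEWED:
        raise ValueError("only reviewed photos can be annotated")
    record.annotations = list(labels)
    record.status = Status.ANNOTATED
    return record


def verify(record: PhotoRecord, approved: bool) -> PhotoRecord:
    """Verification: accept the annotations or send the photo back."""
    if record.status is not Status.ANNOTATED:
        raise ValueError("only annotated photos can be verified")
    if approved:
        record.status = Status.VERIFIED
    else:
        # Assumption: a failed check returns the photo for re-annotation.
        record.annotations = []
        record.status = Status.REVIEWED
    return record


def verified_only(records: list) -> list:
    """Keep only the photos whose annotations were verified."""
    return [r for r in records if r.status is Status.VERIFIED]


# Hypothetical file names and labels, for illustration only.
records = [PhotoRecord("facade_001.jpg"), PhotoRecord("facade_002.jpg")]
review_quality(records[0], acceptable=True)
annotate(records[0], ["window", "cornice"])
verify(records[0], approved=True)
review_quality(records[1], acceptable=False)
print([r.filename for r in verified_only(records)])  # ['facade_001.jpg']
```

Making the order of steps explicit in this way means a photo cannot reach the final dataset without passing both the quality review and the verification of its annotations.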
Stage I. Preparation of the data acquisition procedure
Providing large datasets to train and test models is
necessary for success in AI learning. However, the initial
sets cannot be too diverse, because there would then be too
little data for a given style or type of element: the model
cannot learn its features and may even treat it as “noise”.
At the same time, the dataset must be varied, because
otherwise the model will overfit and be unable to cope with
new examples. That is why we chose one city to start with,
so that the set of tenement houses would be manageable.
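The concern about having too little data for a given style or element type can be checked with a simple count per label. The snippet below is a minimal sketch under stated assumptions: the function name `underrepresented_labels`, the style labels, and the threshold of 10 examples are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def underrepresented_labels(labels, min_count):
    """Return the labels that occur fewer than min_count times, with counts."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < min_count}


# Hypothetical style labels, one per photographed façade.
labels = ["neo-renaissance"] * 40 + ["art-nouveau"] * 35 + ["neo-gothic"] * 3
print(underrepresented_labels(labels, min_count=10))  # {'neo-gothic': 3}
```

Labels flagged in this way are candidates for collecting more photographs before training, since a model is unlikely to learn their features from a handful of examples.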
Objectives formulation
The objective is to create a collection of photographs
depicting the façades of historic townhouses in Wrocław
dating from the 19th and early 20th centuries. These buildings serve