Please use this identifier to cite or link to this item: https://open.uns.ac.rs/handle/123456789/32256
Title: Konstrukcija i analiza klaster algoritma sa primenom u definisanju bihejvioralnih faktora rizika u populaciji odraslog stanovništva Srbije
Other Titles: Construction and analysis of cluster algorithm with application in defining behavioural risk factors in Serbian adult population
Authors: Dragnić Nataša 
Keywords: Cluster Analysis; Algorithms; Nonsmooth optimization; Complexity; Categorical data; Behavioural research; Risk factors; Adult; Socioeconomic factors; Demography; Klaster analiza; algoritmi; neglatka optimizacija; složenost; kategorijalni podaci; bihejvioralna istraživanja; faktori rizika; odrasli; socioekonomski faktori; demografija
Issue Date: 23-Jun-2016
Publisher: Univerzitet u Novom Sadu, Doktorske disertacije iz interdisciplinarne odnosno multidisciplinarne oblasti na Univerzitetu u Novom Sadu
University of Novi Sad, Doctoral dissertations in the interdisciplinary or multidisciplinary field
Abstract: <p>Cluster analysis has a long history, and although it is applied in many fields, significant challenges remain. The dissertation presents an introduction to the nonsmooth optimization approach to clustering, with attention to the problem of clustering large datasets. However, these optimization algorithms perform better on continuous data. One of the main challenges in cluster analysis is working with large datasets that contain categorical and mixed (numerical and categorical) variable types. A large number of instances (objects) and a large number of dimensions (variables) can be a problem in cluster analysis because of time complexity. One way to address this problem is to reduce the number of instances without losing information.</p>
<p>The first aim of the dissertation was to compare clustering results on the whole dataset and on simple random samples with categorical and mixed data, for different sample sizes and different numbers of clusters. No significant difference (p > 0.05) was found between the clustering results on samples of size 0.03m, 0.05m, 0.1m and 0.3m (where m is the size of the dataset under study) and those on the whole dataset.</p>
<p>The second aim of the dissertation was to construct an efficient procedure for clustering large datasets with categorical and mixed variable types. The proposed procedure consists of the following steps: 1. clustering on simple random samples of a given cardinality; 2. determining the best cluster solution on a sample by applying an appropriate validity criterion; 3. using the cluster centers obtained from this sample to cluster the rest of the dataset.</p>
<p>The third aim of the dissertation was to apply cluster analysis to define clusters of behavioural risk factors in the adult population of Serbia and to analyse the socio-demographic characteristics of the resulting clusters. Cluster analysis was applied to a large representative sample of the adult population of Serbia aged 20 and over. Five clearly separated clusters with characteristic combinations of behavioural risk factors were identified: No risk factors, Harmful alcohol use and other risk habits, Unhealthy diet and other risk habits, Insufficient physical activity, Smoking. The results of the multinomial logistic regression model indicate that respondents who are unmarried, of poorer material status, of lower education and living in Vojvodina have higher odds of multiple behavioural risk factors being present.</p>
<p>Cluster analysis has a long history, and a large number of clustering techniques have been developed in many areas; however, significant challenges still remain. In this thesis we provide an introduction to the nonsmooth optimization approach to clustering, with reference to clustering large datasets. Nevertheless, these optimization clustering algorithms work much better when a dataset contains only vectors with continuous features. One of the main challenges is clustering large datasets with categorical and mixed (numerical and categorical) data. Clustering a large number of instances (objects) with a large number of dimensions (variables) can be problematic because of time complexity. One way to solve this problem is to reduce the number of instances without loss of information.</p>
<p>The first aim of this thesis was to compare the results of cluster algorithms on the whole dataset and on simple random samples with categorical and mixed data, in terms of validity, for different numbers of clusters and for different sample sizes. There were no significant differences (p > 0.05) between the results obtained on samples of size 0.03m, 0.05m, 0.1m and 0.3m (where m is the size of the dataset) and on the whole dataset.</p>
<p>The second aim of this thesis was to develop an efficient clustering procedure for large datasets with categorical and mixed (numeric and categorical) values. The proposed procedure consists of the following steps: 1. clustering on simple random samples of a given cardinality; 2. finding the best cluster solution on a sample (by an appropriate validity measure); 3. using the cluster centers from this sample to cluster the remaining data.</p>
<p>The third aim of this thesis was to examine the clustering of four lifestyle risk factors and the variation across different socio-demographic groups in a Serbian adult population. Cluster analysis was carried out on a large representative sample of Serbian adults aged 20 and over. We identified five homogeneous health behaviour clusters with specific combinations of risk factors: 'No Risk Behaviours', 'Drinkers with Risk Behaviours', 'Unhealthy diet with Risk Behaviours', 'Insufficient Physical Activity', 'Smoking'. Results of multinomial logistic regression indicated that single adults, the less educated, those with low socio-economic status and those living in the region of Vojvodina are most likely to be part of the clusters with a high-risk profile.</p>
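The three-step procedure named in the abstract can be illustrated with a small sketch. The abstract does not say which clustering algorithm or validity measure the thesis uses, so everything concrete here is an assumption: a basic k-modes loop with Hamming (mismatch-count) dissimilarity stands in for the clustering step, the mean within-cluster mismatch stands in for the validity measure, and the names `k_modes`, `cluster_via_sample`, `fraction` and `n_samples` are hypothetical. It handles integer-coded categorical data only, not mixed data.

```python
import numpy as np


def hamming_to_modes(X, modes):
    """Return an (n, k) matrix of attribute mismatches between rows and modes."""
    return (X[:, None, :] != modes[None, :, :]).sum(axis=2)


def column_modes(rows):
    """Most frequent category per column (categories are non-negative ints)."""
    return np.array([np.bincount(col).argmax() for col in rows.T])


def k_modes(X, k, rng, n_iter=20):
    """Basic k-modes (assumed stand-in for the thesis's clustering step)."""
    uniq = np.unique(X, axis=0)
    # Start from distinct rows when possible; fall back to repeats otherwise.
    start = rng.choice(len(uniq), size=k, replace=len(uniq) < k)
    modes = uniq[start].copy()
    for _ in range(n_iter):
        labels = hamming_to_modes(X, modes).argmin(axis=1)
        new_modes = modes.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous mode
                new_modes[j] = column_modes(members)
        if np.array_equal(new_modes, modes):
            break
        modes = new_modes
    cost = hamming_to_modes(X, modes).min(axis=1).sum()
    return modes, cost


def cluster_via_sample(X, k, fraction=0.1, n_samples=5, seed=0):
    """Cluster several random samples, keep the best, label the full dataset."""
    rng = np.random.default_rng(seed)
    size = max(k, int(round(fraction * len(X))))
    best_modes, best_score = None, np.inf
    for _ in range(n_samples):
        # Step 1: cluster a simple random sample of the chosen cardinality.
        idx = rng.choice(len(X), size=size, replace=False)
        modes, cost = k_modes(X[idx], k, rng)
        # Step 2: keep the solution with the best (assumed) validity score.
        score = cost / size
        if score < best_score:
            best_modes, best_score = modes, score
    # Step 3: assign every row of the dataset to its nearest sample center.
    labels = hamming_to_modes(X, best_modes).argmin(axis=1)
    return best_modes, labels


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    X = data_rng.integers(0, 3, size=(1000, 6))  # synthetic categorical data
    modes, labels = cluster_via_sample(X, k=4, fraction=0.05)
    print(modes.shape, np.bincount(labels, minlength=4))
```

Only the sample is clustered iteratively, so that cost scales with the sample size rather than with m. The full dataset is touched once, in the final nearest-center assignment.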
URI: https://open.uns.ac.rs/handle/123456789/32256
Appears in Collections: UNS Teze/Theses


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.