Please use this identifier to cite or link to this item: https://ruomo.lib.uom.gr/handle/7000/1554
Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Karamanou, Areti | - |
| dc.contributor.author | Brimos, Petros | - |
| dc.contributor.author | Kalampokis, Evangelos | - |
| dc.contributor.author | Tarabanis, Konstantinos | - |
| dc.date.accessioned | 2022-12-16T10:59:32Z | - |
| dc.date.available | 2022-12-16T10:59:32Z | - |
| dc.date.issued | 2022 | - |
| dc.identifier | 10.3390/s22249684 | en_US |
| dc.identifier.issn | 1424-8220 | en_US |
| dc.identifier.uri | https://doi.org/10.3390/s22249684 | en_US |
| dc.identifier.uri | https://ruomo.lib.uom.gr/handle/7000/1554 | - |
| dc.description.abstract | Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD. | en_US |
| dc.language.iso | en | en_US |
| dc.rights | Attribution-NonCommercial-ShareAlike 4.0 International | * |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | * |
| dc.source | Sensors | en_US |
| dc.subject | FRASCATI::Natural sciences::Computer and information sciences | en_US |
| dc.subject.other | open government data | en_US |
| dc.subject.other | dynamic government data | en_US |
| dc.subject.other | high-valuable data | en_US |
| dc.subject.other | real-time data | en_US |
| dc.subject.other | traffic data | en_US |
| dc.subject.other | data quality | en_US |
| dc.subject.other | isolation forest | en_US |
| dc.subject.other | eXplainable artificial intelligence | en_US |
| dc.title | Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods | en_US |
| dc.type | Article | en_US |
| dc.contributor.department | Department of Business Administration | en_US |
| local.identifier.volume | 22 | en_US |
| local.identifier.issue | 24 | en_US |
| local.identifier.firstpage | 9684 | en_US |
Appears in Collections:Department of Business Administration

Files in This Item:
| File | Description | Size | Format |
| --- | --- | --- | --- |
| sensors-22-09684-v2.pdf | | 2.01 MB | Adobe PDF |


This item is licensed under a Creative Commons License.