CTM - Indexed Articles in Journals

Recent Submissions

Now showing 1 - 5 of 222
  • Item
    Unveiling the performance of video anomaly detection models - A benchmark-based review
    (2023) Pedro Miguel Carvalho; Jaime Cardoso
    Deep learning has recently gained popularity in the field of video anomaly detection, with the development of various methods for identifying abnormal events in visual data. The growing need for automated systems that monitor video streams for anomalies, such as security breaches and violent behaviour in public areas, requires robust and reliable methods. As a result, tools are needed to objectively evaluate and compare the real-world performance of different deep learning methods and to identify the most effective approach for video anomaly detection. Current state-of-the-art metrics favour weakly-supervised strategies, presenting them as the best-performing approaches for the task. However, the area under the ROC curve, used to justify this claim, has been shown to be an unreliable metric for highly unbalanced data distributions, which is the case for anomaly detection datasets. This paper provides a new perspective and new insights on the performance of video anomaly detection methods. It reports the results of a benchmark study of state-of-the-art methods using a newly proposed framework for evaluating and comparing the different models. The results of this benchmark demonstrate that the currently employed set of reference metrics has led to the misconception that weakly-supervised methods consistently outperform semi-supervised ones. (A minimal sketch of this metric pitfall follows the list below.)
  • Item
    Synthesizing Human Activity for Data Generation
    (2023) Américo José Pereira; Pedro Miguel Carvalho; Luís Corte Real
    Gathering sufficiently representative data, such as data about human actions, shapes, and facial expressions, is costly and time-consuming, yet such data is required to train robust models. This has led to techniques such as transfer learning and data augmentation, which are often insufficient. To address this, we propose a semi-automated mechanism for generating and editing visual scenes in which synthetic humans perform various actions, with features such as background modification and manual adjustment of the 3D avatars, allowing users to create data with greater variability. We also propose a two-fold evaluation methodology for the results obtained with our method: (i) running an action classifier on the output data produced by the mechanism, and (ii) generating masks of the avatars and the actors and comparing them through segmentation. The avatars were robust to occlusion, and their actions were recognizable and faithful to their respective input actors. The results also showed that even though the action classifier concentrates on the pose and movement of the synthetic humans, it strongly depends on contextual information to recognize actions precisely. Generating avatars for complex activities also proved problematic, both for action recognition and for the clean, precise formation of the masks. (A sketch of the mask comparison step appears after this list.)
  • Item
    From a Visual Scene to a Virtual Representation: A Cross-Domain Review
    (2023) Pedro Miguel Carvalho; Paula Viana; Nuno Alexandre Pereira; Américo José Pereira; Luís Corte Real
    The widespread use of smartphones and other low-cost recording devices, the massive growth in bandwidth, and the ever-growing demand for new applications with enhanced capabilities have made visual data essential in several scenarios, including surveillance, sports, retail, entertainment, and intelligent vehicles. Despite significant advances in analyzing and extracting data from images and video, there is a lack of solutions able to analyze and semantically describe the information in a visual scene so that it can be efficiently used and repurposed. Scientific contributions have focused on individual aspects or on specific problems and application areas, and no cross-domain solution is available to implement a complete system that enables information passing between cross-cutting algorithms. This paper analyses the problem from an end-to-end perspective, i.e., from visual scene analysis to the representation of information in a virtual environment, including how the extracted data can be described and stored. A simple processing pipeline is introduced as a structure for discussing the challenges and opportunities at the different steps of the entire process, making it possible to identify current gaps in the literature. The work reviews various technologies specifically from the perspective of their applicability to an end-to-end pipeline for scene analysis and synthesis, along with an extensive analysis of datasets for the relevant tasks. (A skeletal sketch of such a pipeline appears after this list.)
  • Item
    Cognition inspired format for the expression of computer vision metadata
    (2016) Pedro Miguel Carvalho; Hélder Fernandes Castro; João Pedro Monteiro; Américo José Pereira
    Over the last decade, noticeable progress has been made in the automated computer interpretation of visual information. Computers running artificial intelligence algorithms are increasingly capable of extracting perceptual and semantic information from images and registering it as metadata. There is also a growing body of manually produced image annotation data. All of this data is of great importance for scientific purposes as well as for commercial applications. Optimizing the usefulness of this manually or automatically produced information implies its precise and adequate expression at its different logical levels, making it easily accessible, manipulable, and shareable; it also implies the development of the associated manipulation tools. However, the expression and manipulation of computer vision results has received less attention than the extraction of those results, and has therefore advanced less. Existing metadata tools are poorly structured in logical terms, as they intermix the declaration of visual detections with that of the observed entities, events, and surrounding context. This poor structuring renders such tools rigid, limited, and cumbersome to use. Moreover, they are unprepared for more advanced situations, such as the coherent expression of information extracted from, or annotated onto, multi-view video resources. The work presented here comprises the specification of an advanced XML-based syntax for the expression and processing of computer-vision-relevant metadata. The proposal takes inspiration from the natural cognition process for the adequate expression of the information, with a particular focus on scenarios with varying numbers of sensory devices, notably multi-view video. (A hypothetical fragment illustrating the layered structure appears after this list.)
  • Item
    Efficient CIEDE2000-Based Color Similarity Decision for Computer Vision
    (2020) Américo José Pereira; Pedro Miguel Carvalho; Luís Corte Real
    Color and color differences are critical in many image processing and computer vision applications. A paradigmatic example is object segmentation, where color distances can greatly influence the performance of the algorithms. Metrics for color difference have been proposed in the literature, including standards such as CIEDE2000, which quantifies the perceptual difference between two given colors. This standard has been recommended for industrial computer vision applications, but the benefits of its adoption have been impaired by the complexity of the formula. This paper proposes a new strategy that improves the usability of the CIEDE2000 metric when a maximum acceptable distance can be imposed. We argue that, for applications in which a maximum value can be established above which colors are considered different, it is possible to reduce the number of evaluations of the metric by preemptively analyzing the color features. This methodology retains the benefits of the metric while overcoming its computational limitations, thus broadening the range of computer vision applications in which CIEDE2000 can be adopted within practical computational resource requirements. (A simplified sketch of this early-rejection strategy follows this list.)