CTM - Indexed Articles in Journals
Browsing CTM - Indexed Articles in Journals by Author "4358"
-
Item: Automatic TV Logo Identification for Advertisement Detection without Prior Data (2021)
Authors: Pedro Miguel Carvalho; Américo José Pereira; Paula Viana; 1107; 4358; 6078
Advertisements are often inserted in multimedia content, and this is particularly relevant in TV broadcasting, where they play a key financial role. In this context, flexible and efficient processing of TV content to identify advertisement segments is highly desirable, as it can benefit different actors, including the broadcaster, the contracting company, and the end user. The presence of the channel logo has been regarded in the state of the art as a good indicator, since the logo is typically absent during advertising breaks. However, this challenging detection becomes harder as less prior data is available to reduce uncertainty; as a result, the literature proposals that achieve the best results typically rely on prior knowledge or pre-existing databases. This paper proposes a flexible method for processing TV broadcasting content that detects channel logos, and consequently advertising segments, without using prior data about the channel or content. The final goal is to enable stream segmentation that identifies advertisement slices. The proposed method was assessed on available state-of-the-art datasets as well as on additional, more challenging stream captures. Results show that the proposed method surpasses the state of the art.
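The core intuition behind logo-based segmentation, that a static logo overlay exhibits near-zero temporal variance while the surrounding broadcast keeps changing, can be sketched as follows. This is an illustrative reconstruction, not the paper's actual algorithm, and the threshold values are assumptions:

```python
import numpy as np

def logo_mask_from_frames(frames, var_thresh=15.0):
    """Estimate a static-logo mask from a stack of grayscale frames.

    A channel logo is an (almost) static overlay, so its pixels show
    near-zero temporal variance while the rest of the broadcast keeps
    changing.  frames: array of shape (T, H, W).
    """
    variance = frames.astype(np.float32).var(axis=0)
    return variance < var_thresh

def logo_present(window, logo_mask, var_thresh=15.0, overlap_thresh=0.7):
    """Decide whether the logo is still on screen in a new frame window.

    If most of the previously detected logo pixels are still static,
    the logo is assumed present (regular programming rather than an ad).
    """
    static_now = window.astype(np.float32).var(axis=0) < var_thresh
    overlap = np.logical_and(static_now, logo_mask).sum()
    return overlap / max(int(logo_mask.sum()), 1) > overlap_thresh
```

In a full pipeline the mask would be re-estimated over sliding windows and restricted to the corner regions where logos usually appear.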
-
Item: BMOG: boosted Gaussian Mixture Model with controlled complexity for background subtraction (2018)
Authors: Alba Castro, JL; Pedro Miguel Carvalho; Martins, I; Luís Corte Real; 4358; 243
-
Item: Boosting color similarity decisions using the CIEDE2000_PF Metric (2022)
Authors: Américo José Pereira; Pedro Miguel Carvalho; Luís Corte Real; 243; 4358; 6078
-
Item: Cognition inspired format for the expression of computer vision metadata (2016)
Authors: Pedro Miguel Carvalho; Hélder Fernandes Castro; João Pedro Monteiro; Américo José Pereira; 4358; 4487; 5568; 6078
Over the last decade, noticeable progress has occurred in the automated computer interpretation of visual information. Computers running artificial intelligence algorithms are increasingly capable of extracting perceptual and semantic information from images and registering it as metadata, and there is also a growing body of manually produced image annotation data. All of this data is of great importance for scientific purposes as well as for commercial applications. Optimizing the usefulness of this manually or automatically produced information implies its precise and adequate expression at its different logical levels, making it easily accessible, manipulable, and shareable, and implies the development of associated manipulation tools. However, the expression and manipulation of computer vision results has received less attention than the actual extraction of such results, and has therefore advanced less. Existing metadata tools are poorly structured in logical terms, as they intermix the declaration of visual detections with that of the observed entities, events, and surrounding context. This poor structuring renders such tools rigid, limited, and cumbersome to use; moreover, they are unprepared to deal with more advanced situations, such as the coherent expression of the information extracted from, or annotated onto, multi-view video resources. The work presented here specifies an advanced XML-based syntax for the expression and processing of metadata relevant to computer vision. The proposal takes inspiration from the natural cognition process for the adequate expression of the information, with a particular focus on scenarios with varying numbers of sensory devices, notably multi-view video.
-
Item: Deep Anomaly Detection for In-Vehicle Monitoring—An Application-Oriented Review (2022)
Authors: Caetano, F; Pedro Miguel Carvalho; Jaime Cardoso; 3889; 4358
Anomaly detection has been an active research area for decades, with high application potential. Recent work has explored deep learning approaches to the detection of abnormal behaviour and abandoned objects in outdoor video surveillance scenarios. The extension of this work to in-vehicle monitoring using solely visual data represents a relevant research opportunity that has been overlooked in the accessible literature. With the increasing importance of public and shared transportation for urban mobility, it becomes imperative to provide autonomous intelligent systems capable of detecting abnormal behaviour that threatens passenger safety. To investigate the applicability of current works to this scenario, a recapitulation of relevant state-of-the-art techniques and resources is presented, including the datasets available for training and benchmarking. The lack of public datasets dedicated to in-vehicle monitoring is addressed alongside other issues not considered in previous works, such as moving backgrounds and frequent illumination changes. Despite its relevance, similar surveys and reviews have disregarded this scenario and its specificities. This work initiates an important discussion on application-oriented issues, proposing solutions to be followed in future works, particularly synthetic data augmentation to obtain representative instances despite the low number of available sequences.
-
Item: Efficient CIEDE2000-based Color Similarity Decision for Computer Vision (2019)
Authors: Luís Corte Real; Américo José Pereira; Pedro Miguel Carvalho; Coelho, G; 6078; 243; 4358
-
Item: Efficient CIEDE2000-Based Color Similarity Decision for Computer Vision (2020)
Authors: Américo José Pereira; Pedro Miguel Carvalho; Luís Corte Real; 6078; 4358; 243
Color and color differences are critical aspects in many image processing and computer vision applications. A paradigmatic example is object segmentation, where color distances can greatly influence the performance of the algorithms. Metrics for color difference have been proposed in the literature, including standards such as CIEDE2000, which quantifies the change in visual perception between two given colors. This standard has been recommended for industrial computer vision applications, but the benefits of its adoption have been impaired by the complexity of the formula. This paper proposes a new strategy that improves the usability of the CIEDE2000 metric when a maximum acceptable distance can be imposed. We argue that, for applications where a maximum value above which colors are considered different can be established, it is possible to reduce the number of metric computations by preemptively analyzing the color features. This methodology retains the benefits of the metric while overcoming its computational limitations, broadening the range of computer vision applications in which CIEDE2000 is practical and lowering its computational resource requirements.
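The early-exit strategy can be illustrated with a generic sketch. It relies on the fact that the CIEDE2000 lightness term is bounded: Delta-E00 is at least |dL*| / S_L (for kL = 1), and S_L stays below about 1.75 for L* in [0, 100], so a large enough lightness gap alone guarantees the colors differ. The exact pre-checks used by the paper may differ, and the full CIEDE2000 implementation is left as a stand-in parameter:

```python
SL_MAX = 1.75  # upper bound of the CIEDE2000 S_L weight for L* in [0, 100]

def certainly_different(lab1, lab2, max_dist):
    """Cheap necessary-condition check: if the lightness term alone already
    forces Delta-E00 above max_dist, the full formula can be skipped.
    (Delta-E00 >= |dL*| / S_L and S_L <= SL_MAX, assuming kL = 1.)"""
    return abs(lab1[0] - lab2[0]) / SL_MAX > max_dist

def similar(lab1, lab2, max_dist, full_ciede2000):
    """Similarity decision with early rejection before the costly metric."""
    if certainly_different(lab1, lab2, max_dist):
        return False  # early exit: colors are certainly different
    return full_ciede2000(lab1, lab2) <= max_dist
```

The guard only ever skips the expensive computation when the answer is already determined, so the decision is identical to always evaluating the full metric.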
-
Item: From a Visual Scene to a Virtual Representation: A Cross-Domain Review (2023)
Authors: Pedro Miguel Carvalho; Paula Viana; Nuno Alexandre Pereira; Américo José Pereira; Luís Corte Real; 4358; 1107; 7023; 6078; 243
The widespread use of smartphones and other low-cost equipment as recording devices, the massive growth in bandwidth, and the ever-growing demand for new applications with enhanced capabilities have made visual data a must in several scenarios, including surveillance, sports, retail, entertainment, and intelligent vehicles. Despite significant advances in analyzing and extracting data from images and video, there is a lack of solutions able to analyze and semantically describe the information in the visual scene so that it can be efficiently used and repurposed. Scientific contributions have focused on individual aspects or on specific problems and application areas, and no cross-domain solution is available to implement a complete system that enables information passing between cross-cutting algorithms. This paper analyses the problem from an end-to-end perspective, i.e., from the visual scene analysis to the representation of the information in a virtual environment, including how the extracted data can be described and stored. A simple processing pipeline is introduced to set up a structure for discussing challenges and opportunities in the different steps of the entire process, allowing current gaps in the literature to be identified. The work reviews various technologies specifically from the perspective of their applicability to an end-to-end pipeline for scene analysis and synthesis, along with an extensive analysis of datasets for relevant tasks.
-
Item: Photo2Video: Semantic-Aware Deep Learning-Based Video Generation from Still Content (2022)
Authors: Paula Viana; Maria Teresa Andrade; Pedro Miguel Carvalho; Luís Miguel Salgado; Inês Filipa Teixeira; Tiago André Costa; Jonker, P; 400; 1107; 4358; 5363; 7420; 7514
Applying machine learning (ML), and especially deep learning, to understand visual content is becoming common practice in many application areas. However, little attention has been given to its use within the multimedia creative domain. ML is already popular for content creation, but the progress achieved so far addresses essentially textual content or the identification and selection of specific types of content. A wealth of possibilities is yet to be explored by bringing ML into the multimedia creative process, allowing the knowledge it infers to automatically influence how new multimedia content is created. The work presented in this article contributes towards this goal in three distinct ways: firstly, it proposes a methodology to re-train popular neural network models to identify new thematic concepts in static visual content and attach meaningful annotations to the detected regions of interest; secondly, it presents varied visual digital effects and corresponding tools that can be automatically invoked to apply those effects to a previously analyzed photo; thirdly, it defines a complete automated creative workflow, from the acquisition of a photograph and corresponding contextual data, through the ML region-based annotation, to the automatic application of digital effects and the generation of a semantically aware multimedia story driven by the previously derived situational and visual contextual data. Additionally, it presents a variant of this workflow that offers the user the possibility of manipulating the automatic annotations in an assisted manner. The final aim is to transform a static digital photo into a short video clip that takes the acquired information into account. The result contrasts strongly with current standard approaches that create random movements, by implementing intelligent content- and context-aware video.
-
Item: A Review of Recent Advances and Challenges in Grocery Label Detection and Recognition (2023)
Authors: Guimaraes, V; Nascimento, J; Viana, P; Pedro Miguel Carvalho; 4358
Compared with traditional local shops, where customers receive a personalised service, in large retail departments the client has to make purchase decisions independently, mostly supported by the information available on the package. Additionally, people are becoming more aware of the importance of food ingredients and more demanding about the type of products they buy and the information provided on the package, despite it often being hard to interpret. Big shops such as supermarkets have also introduced important challenges for the retailer due to the large number of different products in the store, heterogeneous affluence, and the daily need for item repositioning. In this scenario, the automatic detection and recognition of products on or off the shelves has gained increased interest, as these technologies may improve the shopping experience through self-assisted shopping apps and autonomous shopping, or benefit stock management with real-time inventory, automatic shelf monitoring, and product tracking. These solutions can also have an important impact for customers with visual impairments. Despite recent developments in computer vision, automatic grocery product recognition remains very challenging, with most works focusing on the detection or recognition of a small number of products, often under controlled conditions. This paper discusses the challenges related to this problem and presents a review of proposed methods for retail product label processing, with a special focus on assisted analysis for customer support, including for the visually impaired. Moreover, it details the public datasets used in this topic, identifies their limitations, and discusses future research directions in related fields.
-
Item: Stereo vision system for human motion analysis in a rehabilitation context (2019)
Authors: Matos, AC; Teresa Cristina Terroso; Luís Corte Real; Pedro Miguel Carvalho; 6217; 4358; 243
Present demographic trends point to an increase in the aged population and in chronic diseases whose symptoms can be alleviated through rehabilitation. The applicability of passive 3D reconstruction to motion tracking in a rehabilitation context was explored using a stereo camera. The camera was used to acquire depth and color information, from which the 3D positions of predefined joints were recovered based on kinematic relationships, anthropometrically feasible lengths, and temporal consistency. Finally, a set of quantitative measures was extracted to evaluate the performed rehabilitation exercises. A validation study using data from a marker-based system as ground truth revealed that our proposal achieved errors within the range of state-of-the-art active markerless systems and of visual evaluations done by physical therapists. The obtained results are promising and demonstrate that the developed methodology allows the analysis of human motion for rehabilitation purposes. © 2018, © 2018 Informa UK Limited, trading as Taylor & Francis Group.
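The geometric core of passive stereo depth recovery is standard triangulation on rectified image pairs. A minimal sketch (not the paper's full joint-tracking pipeline; the camera parameters in the example are hypothetical):

```python
def backproject(u, v, disparity, f, baseline, cx, cy):
    """Back-project pixel (u, v) with stereo disparity into camera space.

    Rectified-pair triangulation: Z = f * B / d, then X and Y follow
    from the pinhole model.  f and (cx, cy) in pixels, baseline in metres.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = f * baseline / disparity
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return x, y, z
```

Joint positions recovered this way can then be filtered with the kinematic, anthropometric, and temporal-consistency constraints the abstract describes.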
-
Item: Synthesizing Human Activity for Data Generation (2023)
Authors: Américo José Pereira; Pedro Miguel Carvalho; Luís Corte Real; 6078; 4358; 243
Gathering sufficiently representative data, such as data about human actions, shapes, and facial expressions, is costly and time-consuming, yet such data is required to train robust models. This has led to techniques such as transfer learning and data augmentation, which are often insufficient. To address this, we propose a semi-automated mechanism for generating and editing visual scenes with synthetic humans performing various actions, with features such as background modification and manual adjustment of the 3D avatars, allowing users to create data with greater variability. We also propose a two-fold methodology for evaluating the results obtained with our method: (i) running an action classifier on the output data and (ii) generating masks of the avatars and the actors and comparing them through segmentation. The avatars were robust to occlusion, and their actions were recognizable and faithful to their respective input actors. The results also showed that even though the action classifier concentrates on the pose and movement of the synthetic humans, it strongly depends on contextual information to precisely recognize the actions. Generating avatars for complex activities also proved problematic for action recognition and for the clean and precise formation of the masks.
-
Item: Texture collinearity foreground segmentation for night videos (2020)
Authors: Martins, I; Pedro Miguel Carvalho; Luís Corte Real; Luis Alba Castro, JL; 243; 4358
One of the most difficult scenarios for the unsupervised segmentation of moving objects is nighttime video, where the main challenges are poor illumination conditions resulting in low visibility of objects, very strong lights, surface-reflected light, great variance in light intensity, sudden illumination changes, hard shadows, camouflaged objects, and noise. This paper proposes a novel method, coined COLBMOG (COLlinearity Boosted MOG), devised specifically for foreground segmentation in nighttime videos, which overcomes some of the limitations of state-of-the-art methods while still performing well in daytime scenarios. It is a texture-based classification method using local texture modeling, complemented by a color-based classification method. The local texture at the pixel neighborhood is modeled as an N-dimensional vector. For a given pixel, the classification is based on the collinearity between this feature in the input frame and in the reference background frame. For this purpose, a multimodal temporal model of the collinearity between texture vectors of background pixels is maintained. COLBMOG was objectively evaluated on the Night Videos category of the ChangeDetection.net (CDnet) 2014 benchmark, where it ranks first among all unsupervised methods. A detailed analysis of the results revealed the superior performance of the proposed method compared to the best-performing state-of-the-art methods in this category, particularly evident in the most complex situations, where all algorithms tend to fail. © 2020 Elsevier Inc.
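The collinearity test at the heart of the method can be illustrated with cosine similarity between texture vectors. A simplified sketch only: the actual COLBMOG maintains a multimodal temporal model of collinearity rather than a single reference vector, and the threshold here is an assumption:

```python
import numpy as np

def collinearity(t_input, t_background):
    """Absolute cosine similarity between two N-dimensional texture vectors."""
    denom = float(np.linalg.norm(t_input) * np.linalg.norm(t_background))
    if denom == 0.0:
        return 0.0
    return abs(float(np.dot(t_input, t_background))) / denom

def classify_pixel(t_input, t_background, thresh=0.98):
    # Illumination changes roughly scale a surface's texture vector, so it
    # stays collinear with the background reference; a foreground object
    # changes the vector's direction, breaking collinearity.
    return "background" if collinearity(t_input, t_background) >= thresh else "foreground"
```

This is what makes the feature attractive for night scenes: strong brightness swings change vector magnitude, not direction, so background pixels keep a high collinearity score.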
-
Item: Towards vehicle occupant-invariant models for activity characterisation (2022)
Authors: Leonardo Gomes Capozzi; Barbosa, V; Pinto, C; João Tiago Pinto; Américo José Pereira; Pedro Miguel Carvalho; Jaime Cardoso; 3889; 4358; 6078; 7250; 8288
-
Item: Unveiling the performance of video anomaly detection models - A benchmark-based review (2023)
Authors: Pedro Miguel Carvalho; Jaime Cardoso; 4358; 3889
Deep learning has recently gained popularity in the field of video anomaly detection, with the development of various methods for identifying abnormal events in visual data. The growing need for automated systems that monitor video streams for anomalies, such as security breaches and violent behaviour in public areas, requires robust and reliable methods. As a result, tools are needed to objectively evaluate and compare the real-world performance of different deep learning methods and identify the most effective approach for video anomaly detection. Current state-of-the-art metrics favour weakly-supervised strategies, presenting them as the best-performing approaches for the task. However, the area under the ROC curve, used to justify this claim, has been shown to be an unreliable metric for highly unbalanced data distributions, as is the case with anomaly detection datasets. This paper provides a new perspective and insights on the performance of video anomaly detection methods. It reports the results of a benchmark study of state-of-the-art methods using a novel framework for evaluating and comparing the different models. The results of this benchmark demonstrate that the currently employed set of reference metrics has led to the misconception that weakly-supervised methods consistently outperform semi-supervised ones. © 2023 The Authors
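The unreliability of ROC-AUC under heavy class imbalance is easy to demonstrate: a detector can rank most anomalies above most normal frames (high AUC) while almost every raised alarm is still a false positive. A toy illustration with synthetic scores, not drawn from any of the benchmarked models:

```python
def roc_auc(pos_scores, neg_scores):
    """ROC-AUC via the Mann-Whitney statistic: the probability that a
    random positive (anomalous) score outranks a random negative one."""
    wins = ties = 0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))

# 1000 normal frames with evenly spread scores, 10 rare anomalies at 0.95
neg = [i / 1000 for i in range(1000)]
pos = [0.95] * 10

auc = roc_auc(pos, neg)                 # ~0.95: looks excellent
threshold = 0.9
tp = sum(s > threshold for s in pos)    # all 10 anomalies caught...
fp = sum(s > threshold for s in neg)    # ...alongside 99 false alarms
precision = tp / (tp + fp)              # below 0.1
```

Because AUC averages over all thresholds and all negative examples, the 99 false alarms barely dent it, while at any usable operating point fewer than one alarm in ten is real; this is the failure mode that motivates the paper's alternative evaluation framework.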