Large-scale datasets have repeatedly proven their fundamental importance in several research fields, especially for early progress in emerging topics. In this paper, we focus on the problem of visual speech recognition, also known as lip-reading, which has received increasing interest in recent years. We present a naturally-distributed large-scale benchmark for lip-reading in the wild, named LRW-1000, which contains 1,000 classes with 718,018 samples from more than 2,000 individual speakers. Each class corresponds to the syllables of a Mandarin word composed of one or several Chinese characters. To the best of our knowledge, it is currently the largest word-level lip-reading dataset and also the only public large-scale Mandarin lip-reading dataset. This dataset aims at covering a "natural" variability over different speech modes and imaging conditions to incorporate challenges encountered in practical applications. The benchmark shows large variation in several aspects, including the number of samples in each class, video resolution, lighting conditions, and speakers' attributes such as pose, age, gender, and make-up. Besides providing a detailed description of the dataset and its collection pipeline, we evaluate several typical lip-reading methods and perform a thorough analysis of the results from several aspects. The results demonstrate the consistency and challenges of our dataset, which may open up new promising directions for future work.
In this paper, we present CogNet, a knowledge base (KB) dedicated to integrating three types of knowledge: (1) linguistic knowledge from FrameNet, which schematically describes situations, objects, and events; (2) world knowledge from YAGO, Freebase, DBpedia, and Wikidata, which provides explicit knowledge about specific instances; and (3) commonsense knowledge from ConceptNet, which describes implicit general facts. To model these different types of knowledge consistently, we introduce a three-level unified frame-styled representation architecture. To integrate free-form commonsense knowledge with other structured knowledge, we propose a strategy that combines automated labeling and crowdsourced annotation. At present, CogNet integrates 1,000+ semantic frames from linguistic KBs, 20,000,000+ frame instances from world KBs, and 90,000+ commonsense assertions from commonsense KBs. All these data can be easily queried and explored on our online platform, and are freely available for download in RDF format under a CC BY-SA 4.0 license. The demo and data are available at http://cognet.top/.
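As a concrete illustration of how such an RDF dump might be consumed, here is a minimal sketch using rdflib. The file name, the SPARQL pattern, and the assumption that frame links can be found by matching predicate IRIs are all illustrative placeholders; the actual CogNet vocabulary should be taken from the dump itself.

```python
# A minimal sketch of querying a CogNet RDF dump with rdflib. The file name
# and the predicate-matching heuristic below are illustrative placeholders;
# consult the actual dump from http://cognet.top/ for the real schema.
from rdflib import Graph

g = Graph()
g.parse("cognet.rdf")  # hypothetical local copy of the RDF dump

# List triples that attach an instance to a semantic frame. Matching the
# predicate IRI against the string "frame" is an assumption, not CogNet's
# actual vocabulary.
q = """
SELECT ?instance ?frame WHERE {
    ?instance ?p ?frame .
    FILTER(CONTAINS(STR(?p), "frame"))
} LIMIT 10
"""
for row in g.query(q):
    print(row.instance, row.frame)
```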
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. This approach capitalizes on the inherent superposition property of wireless channels, facilitating fast and scalable parameter aggregation. Meanwhile, it enhances the robustness of the model training process by dynamically adjusting the stepsize in accordance with the global gradient update. We derive the convergence rate of the training algorithms, encompassing the effects of channel fading and interference, for a broad spectrum of nonconvex loss functions. Our analysis shows that the AdaGrad-based algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T)/T^{1-\frac{1}{\alpha}})$, where $\alpha$ represents the tail index of the electromagnetic interference. This result indicates that the level of heavy-tailedness in the interference distribution plays a crucial role in training efficiency: the heavier the tail, the slower the algorithm converges. In contrast, an Adam-like algorithm converges at the $\mathcal{O}(1/T)$ rate, demonstrating its advantage in expediting the model training process. We conduct extensive experiments that corroborate our theoretical findings and affirm the practical efficacy of our proposed federated adaptive gradient methods.
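To make the setup concrete, the following is a minimal numpy/scipy sketch of the AdaGrad-based variant on a toy quadratic objective: local gradients are scaled by random fading coefficients, superposed as an over-the-air sum, and corrupted by alpha-stable interference whose tail index $\alpha$ matches the analysis above. All dimensions, stepsizes, and noise scales are illustrative assumptions, not the paper's experimental settings.

```python
# A minimal sketch of federated AdaGrad with simulated over-the-air
# aggregation on a toy quadratic objective; all names are illustrative.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)
K, d, T = 10, 5, 500          # clients, model dimension, rounds
alpha = 1.8                   # tail index of the alpha-stable interference
eta, eps = 0.5, 1e-8          # stepsize, numerical-stability constant

# Each client k holds a local quadratic loss f_k(w) = ||w - c_k||^2 / 2.
C = rng.normal(size=(K, d))
w = np.zeros(d)
G = np.zeros(d)               # AdaGrad accumulator

for t in range(T):
    local_grads = w - C                      # gradient of each local loss
    h = np.abs(rng.normal(size=(K, 1)))      # channel fading coefficients
    # Over-the-air superposition: faded gradients add up in the channel,
    # corrupted by heavy-tailed electromagnetic interference.
    noise = levy_stable.rvs(alpha, 0, scale=0.01, size=d, random_state=rng)
    g = (h * local_grads).sum(axis=0) / K + noise
    # AdaGrad server update using the noisy aggregated gradient.
    G += g * g
    w -= eta * g / (np.sqrt(G) + eps)

print("distance to optimum:", np.linalg.norm(w - C.mean(axis=0)))
```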
Federated learning (FL) provides a privacy-preserving approach to realizing networked intelligence. However, the performance of FL is often constrained by limited communication resources, especially in wireless systems. To tackle this communication bottleneck, recent studies propose an analog over-the-air (A-OTA) FL paradigm, which employs A-OTA computation in the model-aggregation step and thereby significantly enhances scalability. Existing architectures mainly conduct model training via (stochastic) gradient descent, while adaptive optimization methods, which have achieved notable success in deep learning, remain unexplored in this setting. In this paper, we establish a distributed training paradigm that incorporates adaptive gradient methods into the A-OTA FL framework, aiming to enhance the system's convergence performance. We derive an analytical expression for the convergence rate that captures the effects of various system parameters on the convergence performance of the proposed method. We also perform several experiments to validate the efficacy of the proposed method.
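For illustration, the sketch below applies an Adam-style server update to A-OTA aggregated gradients on a toy logistic-regression problem. The fading model, hyperparameters, and data are assumptions chosen only to show the structure of the update, not the paper's actual system.

```python
# A minimal sketch of an Adam-style server update on analog over-the-air
# (A-OTA) aggregated gradients; the logistic-regression setup and all
# hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
K, d, T = 20, 8, 300
eta, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8

X = rng.normal(size=(K, 50, d))                    # per-client features
y = (X @ rng.normal(size=d) > 0).astype(float)     # per-client labels
w, m, v = np.zeros(d), np.zeros(d), np.zeros(d)

def local_grad(Xk, yk, w):
    p = 1.0 / (1.0 + np.exp(-Xk @ w))              # logistic regression
    return Xk.T @ (p - yk) / len(yk)

for t in range(1, T + 1):
    grads = np.stack([local_grad(X[k], y[k], w) for k in range(K)])
    h = np.abs(rng.normal(size=(K, 1)))            # fading coefficients
    z = rng.normal(scale=0.01, size=d)             # additive channel noise
    g = (h * grads).sum(axis=0) / K + z            # A-OTA superposition
    m = b1 * m + (1 - b1) * g                      # first-moment estimate
    v = b2 * v + (1 - b2) * g * g                  # second-moment estimate
    # Bias-corrected Adam step on the aggregated gradient.
    w -= eta * (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps)
```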
Universal adhesion of hydrogels to diverse materials is essential to their extensive applications. Unfortunately, tough adhesion to wet surfaces remains an urgent challenge, as it requires robust cohesion strength for effective stress dissipation. In this work, a dual-network polyethylenimine–poly(acrylic acid)/alginate (PEI–PAA/Alg) hydrogel with excellent mechanical strength is realized via PEI–PAA complexation and calcium–alginate coordination, achieving universal adhesion through the synergistic effect of topological entanglement and catechol chemistry. The dual networks of PEI–PAA/Alg provide mechanically reinforced cohesion strength, which is sufficient for energy dissipation during adhesion to diverse materials. After the integration of mussel-inspired dopamine into PAA or Alg, the adhesive demonstrates further improved adhesion performance on solid adherends and the capability to bond cancellous bone. Notably, the dopamine-modified adhesive exhibits better instant adhesion and reversibility on wet surfaces than commercial fibrin. The adhesion interfaces are investigated by SEM and micro-FTIR to verify the effectiveness of the topological-entanglement strategy. Furthermore, the adhesive also possesses great injectability, stability, tissue adhesion, and biocompatibility. In vivo wound healing and histological analysis indicate that the hydrogel can promote wound closure, epidermis regeneration, and tissue refunctionalization, implying its potential application as a bioadhesive and wound dressing.
Lip-reading aims to recognize speech content from videos via visual analysis of speakers' lip movements. This is a challenging task due to the existence of homophemes (words that involve identical or highly similar lip movements), as well as diverse lip appearances and motion patterns among speakers. To address these challenges, we propose a novel lip-reading model which captures not only the nuances between words but also the styles of different speakers, through multi-grained spatio-temporal modeling of the speaking process. Specifically, we first extract both frame-level fine-grained features and short-term medium-grained features with the visual front-end, which are then combined to obtain discriminative representations for words with similar phonemes. Next, a bidirectional ConvLSTM augmented with temporal attention aggregates spatio-temporal information over the entire input sequence; it is expected to capture the coarse-grained patterns of each word and to be robust to variations in speaker identity, lighting conditions, and so on. By making full use of the information from different levels in a unified framework, the model is not only able to distinguish words with similar pronunciations, but also becomes robust to appearance changes. We evaluate our method on two challenging word-level lip-reading benchmarks and show the effectiveness of the proposed method, which also supports the above claims.
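The following PyTorch sketch illustrates the multi-grained structure described above: per-frame (fine-grained) features, short-term (medium-grained) features, a bidirectional recurrent back-end, and temporal attention. A plain bidirectional LSTM stands in for the paper's ConvLSTM, and all layer sizes are illustrative assumptions.

```python
# A minimal sketch of the multi-grained idea: frame-level features,
# short-term features, a bidirectional recurrent back-end, and temporal
# attention. A plain LSTM stands in for the paper's ConvLSTM for brevity.
import torch
import torch.nn as nn

class MultiGrainedLipReader(nn.Module):
    def __init__(self, n_classes=1000, dim=128):
        super().__init__()
        self.frame_cnn = nn.Sequential(            # fine-grained, per frame
            nn.Conv2d(1, dim, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.short_term = nn.Conv3d(                # medium-grained, short-term
            1, dim, (5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3))
        self.rnn = nn.LSTM(2 * dim, dim, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * dim, 1)           # temporal attention scores
        self.fc = nn.Linear(2 * dim, n_classes)

    def forward(self, x):                           # x: (B, 1, T, H, W)
        B, _, T, H, W = x.shape
        f = self.frame_cnn(x.transpose(1, 2).reshape(B * T, 1, H, W))
        f = f.reshape(B, T, -1)                     # per-frame features
        s = self.short_term(x).mean(dim=(3, 4)).transpose(1, 2)  # (B, T, dim)
        h, _ = self.rnn(torch.cat([f, s], dim=-1))  # fuse the two granularities
        a = torch.softmax(self.attn(h), dim=1)      # attention weights over time
        return self.fc((a * h).sum(dim=1))          # attended temporal pooling

logits = MultiGrainedLipReader()(torch.randn(2, 1, 16, 88, 88))
```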
Event ontologies provide a shared and formal specification of what happens in the real world and can benefit many natural language understanding tasks. However, the independent development of event ontologies often results in heterogeneous representations, which raises the need for establishing alignments between semantically related events. A series of works address ontology alignment (OA), but they focus only on entity-based OA and neglect event-based OA. To fill this gap, we construct an Event Ontology Alignment (EventOA) dataset based on FrameNet and Wikidata, which consists of 900+ event type alignments and 8,000+ event argument alignments. Furthermore, we propose a multi-view event ontology alignment (MEOA) method, which utilizes description information (i.e., name, alias, and definition) and neighbor information (i.e., subclasses and superclasses) to obtain richer representations of the event ontologies. Extensive experiments show that MEOA outperforms existing entity-based OA methods and can serve as a strong baseline for EventOA research.
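As a toy illustration of the multi-view idea, the sketch below fuses a description view (name, alias, definition) with a neighbor view (subclass and superclass names) and scores a candidate alignment by the inner product of the fused vectors. The hash-based bag-of-words encoder is a deliberately crude stand-in for MEOA's learned encoders, and the example events are illustrative only.

```python
# A toy sketch of multi-view alignment scoring: fuse a description view and
# a neighbor view per event, then compare events by cosine-style similarity.
import numpy as np

def embed(text, dim=256):
    # Crude hash-based bag-of-words encoder, standing in for a learned one.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

def event_vec(ev, w=0.5):
    desc = embed(" ".join([ev["name"], ev["alias"], ev["definition"]]))
    nbr = embed(" ".join(ev["subclasses"] + ev["superclasses"]))
    return w * desc + (1 - w) * nbr               # fuse the two views

fn_event = {"name": "Arrest", "alias": "apprehend",
            "definition": "authorities take a suspect into custody",
            "subclasses": [], "superclasses": ["Intentionally_act"]}
wd_event = {"name": "arrest", "alias": "detention",
            "definition": "act of apprehending a person by legal authority",
            "subclasses": ["house arrest"], "superclasses": ["legal action"]}

print(f"alignment score: {float(event_vec(fn_event) @ event_vec(wd_event)):.3f}")
```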
Commonsense knowledge graphs (CKGs) are increasingly applied in various natural language processing tasks. However, most existing CKGs are limited to English, which hinders related research in non-English languages. Meanwhile, directly generating commonsense knowledge from pretrained language models (PLMs) has recently received attention, yet it has not been explored in non-English languages. In this paper, we propose a large-scale Chinese CKG generated from multilingual PLMs, named **CN-AutoMIC**, aiming to fill the research gap of non-English CKGs. To improve efficiency, we propose a generate-by-category strategy that reduces invalid generation. To ensure quality, we develop cascaded filters that discard low-quality results. To further increase diversity and density, we introduce a bootstrapping iteration process that reuses generated results. Finally, we conduct detailed analyses of CN-AutoMIC from different aspects. Empirical results show that the proposed CKG has high quality and diversity, surpassing directly translated versions of similar English CKGs. We also find some interesting deficiency patterns and differences between relations, which reveal open problems in commonsense knowledge generation. We share the resources and related models for further study.
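The skeleton below sketches how the three components might fit together: a generator prompted per category, a cascade of filters, and a bootstrapping loop that feeds accepted tails back in as new heads. The stub generator, filter heuristics, and seed are placeholders, not CN-AutoMIC's actual prompts or thresholds.

```python
# A schematic sketch of the generate-filter-bootstrap loop described above.
# `generate_tails` is a stub standing in for a prompted multilingual PLM;
# the category prompts and filter heuristics are assumptions.
def generate_tails(head, relation, category):
    # A real pipeline would prompt a PLM here, with the prompt template
    # chosen according to `category` (the generate-by-category strategy).
    return [f"{head}-{relation}-candidate"]

def cascaded_filters(triple):
    head, rel, tail = triple
    # Cascade of cheap-to-expensive checks; each stage discards bad results.
    if not tail or tail == head:          # stage 1: trivial/degenerate tails
        return False
    if len(tail) > 60:                    # stage 2: length/format heuristic
        return False
    return True                           # stage 3 would be a learned scorer

seeds = [("喝水", "xWant")]               # (head event, relation) seed
kg, frontier = set(), list(seeds)
for _round in range(3):                   # bootstrapping iterations
    next_frontier = []
    for head, rel in frontier:
        for tail in generate_tails(head, rel, category="event"):
            if cascaded_filters((head, rel, tail)):
                kg.add((head, rel, tail))
                next_frontier.append((tail, rel))  # reuse generated results
    frontier = next_frontier
print(len(kg), "triples")
```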
Background: The development of cloud-based, service-focused, and intelligent networks has increased the demand for highly reliable, error-tolerant, and computationally efficient means of reducing the costs associated with network operation, maintenance, testing, and innovation. Methods: We present a fault self-recovery method for fifth-generation core (5GC) networks. Data models are built according to a data-governance approach to include the equipment, links, and services of the physical network in the digital twin. Visual topology technology is used to provide knowledge-as-a-service (KaaS) capabilities such as call-quality tests, fault-propagation chain reasoning, and disaster-recovery analysis. Results: The proposed method realizes closed-loop 5GC self-recovery through four processes: perception, analysis, decision-making, and execution. In tests, it achieved 5GC network fault detection in 1 min, delimitation in 20 min, and recovery in 5 min. Conclusions: Through network digital twin technology, based on model and state data, twin capabilities such as simulation and event topology can be used to realize network anomaly perception, rapid fault confinement, and service-survival decisions, thus effectively improving fault-processing efficiency and reducing fault impact.
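The stub below sketches the four-stage closed loop named in the Results; every handler is an illustrative placeholder for logic that, in the described system, would be backed by the digital twin's models and live telemetry.

```python
# A schematic sketch of the four-stage closed loop (perception, analysis,
# decision-making, execution). All handlers are illustrative stubs.
def perceive():
    return {"alarm": "call-quality-degraded", "node": "AMF-3"}   # stub event

def analyze(event):
    # Fault-propagation-chain reasoning over the twin's topology (stubbed).
    return {"root_cause": event["node"], "chain": [event["node"], "SMF-1"]}

def decide(diagnosis):
    # Disaster-recovery analysis picks a survivable reconfiguration (stubbed).
    return {"action": "failover", "target": diagnosis["root_cause"]}

def execute(plan):
    print(f"executing {plan['action']} on {plan['target']}")

event = perceive()                 # 1. anomaly perception
diagnosis = analyze(event)         # 2. fault delimitation
plan = decide(diagnosis)           # 3. service-survival decision
execute(plan)                      # 4. closed-loop recovery
```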