Augmented Reality (AR) technology is increasingly applied in various fields, where it can bring an unprecedented immersive experience and rich interaction. However, complex interactions and information-dense interfaces impose a long learning curve and a burden on users. Making the AR experience more intelligent, so that redundant operations are reduced, is one way to enhance the user experience. One potential research direction is to combine the fields of machine learning and AR seamlessly. This paper proposes using semantic segmentation to assist automatic information placement in AR, with precision agriculture as a case study. The precise location of the crop area in the user's view is determined by semantic segmentation, which allows information to be placed in the AR environment automatically. The dataset used to build the machine learning models consists of 242 farmland images. Four semantic segmentation techniques are proposed and benchmarked against each other. The results show that the Attention U-Net deep neural network has the highest recognition accuracy, reaching 91.9%. A prototype for automatic AR information placement using Attention U-Net has been developed to run on tablets using a micro-service approach. This work demonstrates how AR user interfaces can be placed correctly within the real world, which has traditionally been an understudied area of AR research and is essential for future AR games and enterprise applications. As such, this solution has potential usage in all areas of AR application.
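As a rough illustration of how a segmentation mask could drive information placement, the sketch below computes the centroid of the pixels labelled as crop and treats it as the anchor for an AR label. The function name `label_anchor`, the binary mask layout, and the centroid rule are all assumptions made for this example; the abstract does not state how the prototype chooses the placement point.

```python
from typing import List, Optional, Tuple


def label_anchor(mask: List[List[int]]) -> Optional[Tuple[float, float]]:
    """Return the (row, col) centroid of all pixels marked 1, or None if there are none.

    Hypothetical helper: a centroid is only one possible placement rule.
    """
    total = 0
    row_sum = 0
    col_sum = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                total += 1
                row_sum += r
                col_sum += c
    if total == 0:
        return None  # no crop pixels detected, so nothing to anchor to
    return (row_sum / total, col_sum / total)


# Toy 4x5 mask (made up for illustration): 1 marks pixels labelled as crop.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
anchor = label_anchor(toy_mask)
print(anchor)
```

A centroid can fall outside a concave or fragmented region, so a real system would likely need a more robust rule (for example, the centre of the largest connected component).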
In the document-level event extraction (DEE) task, event arguments are often scattered across sentences (the across-sentence issue), and multiple events may occur in one document (the multi-event issue). In this paper, we argue that the relation information of event arguments is of great significance for addressing these two issues, and we propose a new DEE framework that models relation dependencies, called Relation-augmented Document-level Event Extraction (ReDEE). More specifically, this framework features a novel, tailored transformer named the Relation-augmented Attention Transformer (RAAT). RAAT scales to capture multi-scale and multi-amount argument relations. To further leverage relation information, we introduce a separate event relation prediction task and adopt a multi-task learning method to explicitly enhance event extraction performance. Extensive experiments demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on two public datasets. Our code is available at https://github.com/TencentYoutuResearch/RAAT.
3D teeth reconstruction from X-ray is important for dental diagnosis and many clinical operations. However, no existing work has explored the reconstruction of teeth for a whole cavity from a single panoramic radiograph. Unlike single-object reconstruction from photos, this task has the unique challenge of reconstructing multiple objects at high resolution. To tackle this task, we develop a novel ConvNet, X2Teeth, that decomposes the task into teeth localization and single-shape estimation. We also introduce a patch-based training strategy, such that X2Teeth can be trained end-to-end for optimal performance. Extensive experiments show that our method can successfully estimate the 3D structure of the cavity and reflect the details of each tooth. Moreover, X2Teeth achieves a reconstruction IoU of 0.681, which significantly outperforms the encoder-decoder method by 1.71× and the retrieval-based method by 1.52×. Our method is also promising for other multi-anatomy 3D reconstruction tasks.
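For readers unfamiliar with the reconstruction IoU metric quoted above, the following minimal sketch computes intersection-over-union between two sets of occupied voxels. The voxel-set representation and the name `voxel_iou` are assumptions for illustration; the abstract does not specify how the authors rasterize shapes or aggregate IoU across teeth.

```python
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]


def voxel_iou(pred: Iterable[Voxel], target: Iterable[Voxel]) -> float:
    """Intersection-over-union of two sets of occupied voxel coordinates."""
    p, t = set(pred), set(target)
    union = p | t
    if not union:
        return 1.0  # convention chosen here: two empty shapes match perfectly
    return len(p & t) / len(union)


# Toy shapes (made up): a 2x2x2 cube and the same cube shifted by one voxel along x.
cube = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
shifted = [(x + 1, y, z) for (x, y, z) in cube]

iou = voxel_iou(cube, shifted)
print(round(iou, 3))
```

Here the two cubes share 4 of 12 distinct voxels, so the IoU is 1/3.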
For the first time, a room-temperature (RT) stress-induced order-disorder transition was observed in a metallic in-situ TEM tensile sample. This uncommon phase transition from a long-range-ordered (LRO) C11b precipitate to a γ phase was discovered near the hierarchical fracture of a Ni-Cr-Mo microsample with an orientation of (140°, 30°, -141°). Under this targeted orientation, dislocation slip, which should have been suppressed, was sufficiently activated instead of the desired twinning. Consequently, the C11b domains collapse as the activated perfect dislocations continuously shear the nanoscale C11b superstructure, ultimately causing an order-disorder transition. The current findings indicate a good chance that researchers will discover a novel plasticizing route as well as a fascinating phase transition mechanism in ordering alloys.
The ancient city of Xi’an has more than 7,000 years of civilization, more than 3,100 years of urban development, and 1,100 years as a capital. As the urban area has gradually expanded, the number of imperial tombs in Xi’an has reached 72. These mausoleums are large in scale and highly valuable, yet they are affected by the rapid pace of present-day urbanization. The conflicts between the development of the city and that of the mausoleum spaces are becoming increasingly prominent and need to be addressed urgently. Using space-time as the base scale, this article applies GIS spatio-temporal analysis, field research (including unmanned aerial vehicle (UAV) aerial photographs taken over the eight years 2015–2022), and Pearson analysis to explore the law of temporal and spatial evolution between Xi’an’s urban space and the 55 mausoleums dominated by emperors of various dynasties. The analysis concludes that the nuclear density area distance layout of Xi’an and its mausoleums is closely related to time and space. Since ancient times, there has always been a relationship between the spatio-temporal development of Xi’an and that of its mausoleums, and the nuclear density area distance layout of the mausoleums is intimately connected to the status and nature of the city. Currently, the mausoleums fall under site protection. However, because the mausoleums occupy large areas, a conflict arises between the protection and utilization of mausoleum sites and the development of urban space. This paper aims to provide urban planners and site conservators with ideas and data support for the spatio-temporal development of cities and mausoleums, and to help integrate the protection and renewal of mausoleum sites into urban design and planning.
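The Pearson analysis mentioned above measures the strength of a linear relationship between two variables. The sketch below shows the standard sample Pearson correlation coefficient; the function name `pearson_r` and the toy input values are made up for illustration and are not data from the study, which does not state which variables were correlated.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy inputs (hypothetical, purely to exercise the function).
xs = [1.0, 2.0, 3.0, 4.0]
r_up = pearson_r(xs, [2.0, 4.0, 6.0, 8.0])    # perfectly increasing linear relation
r_down = pearson_r(xs, [8.0, 6.0, 4.0, 2.0])  # perfectly decreasing linear relation
print(r_up, r_down)
```

Values near +1 or -1 indicate a strong linear relationship, while values near 0 indicate little linear association.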
Visual Simultaneous Localization and Mapping (SLAM) in dynamic scenes is a prerequisite for robot-related applications. Most existing SLAM algorithms focus mainly on dynamic object rejection, which discards valuable information and makes them prone to failure in complex environments. This paper proposes a semantic visual SLAM system that incorporates rigid object tracking. A robust scene perception framework is designed, which gives autonomous robots the ability to perceive scenes in a way similar to human cognition. Specifically, we propose a two-stage mask revision method to generate a fine mask of each object. Based on the revised mask, we propose a semantic and geometric constraint (SAG) strategy, which provides a fast and robust way to perceive dynamic rigid objects. Then, the motion tracking of rigid objects is integrated into the SLAM pipeline, and a novel bundle adjustment is constructed to optimize camera localization and the objects' 6-DoF poses. Finally, the proposed algorithm is evaluated on the publicly available KITTI dataset, the Oxford Multimotion Dataset, and real-world scenarios. The proposed algorithm achieves an RPE_t of less than 0.07 m per frame and an RPE_R of about 0.03° per frame on the KITTI dataset. The experimental results show that the proposed algorithm achieves more accurate localization and more robust tracking than state-of-the-art SLAM algorithms in challenging dynamic scenarios.
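The per-frame relative pose error (RPE) figures above compare the motion between consecutive frames in the estimated trajectory with the same motion in the ground truth. The sketch below illustrates the idea under a strong simplifying assumption: poses are planar (x, y, heading) rather than the full 6-DoF poses used on KITTI, and errors are averaged as a plain mean. The names `relative_motion` and `rpe_per_frame` are hypothetical, and the trajectories are toy values, not results from the paper.

```python
import math
from typing import List, Sequence, Tuple

Pose = Tuple[float, float, float]  # (x in metres, y in metres, heading in radians)


def relative_motion(a: Pose, b: Pose) -> Pose:
    """Motion from pose a to pose b, expressed in the local frame of a."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    c, s = math.cos(-a[2]), math.sin(-a[2])
    return (c * dx - s * dy, s * dx + c * dy, b[2] - a[2])


def wrap_angle(theta: float) -> float:
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))


def rpe_per_frame(gt: Sequence[Pose], est: Sequence[Pose]) -> Tuple[float, float]:
    """Mean translational (metres) and rotational (degrees) RPE per frame."""
    if len(gt) != len(est) or len(gt) < 2:
        raise ValueError("need two trajectories of equal length >= 2")
    trans_errors: List[float] = []
    rot_errors: List[float] = []
    for i in range(len(gt) - 1):
        g = relative_motion(gt[i], gt[i + 1])
        e = relative_motion(est[i], est[i + 1])
        trans_errors.append(math.hypot(e[0] - g[0], e[1] - g[1]))
        rot_errors.append(abs(wrap_angle(e[2] - g[2])))
    n = len(trans_errors)
    return sum(trans_errors) / n, math.degrees(sum(rot_errors) / n)


# Toy trajectories (made up): the estimate overshoots the first step by 5 cm.
gt_traj = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est_traj = [(0.0, 0.0, 0.0), (1.05, 0.0, 0.0), (2.05, 0.0, 0.0)]
rpe_t, rpe_r = rpe_per_frame(gt_traj, est_traj)
print(rpe_t, rpe_r)
```

Because RPE compares frame-to-frame motion, the second step in this toy example contributes no error even though the absolute position is still off by 5 cm.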
Building a socially intelligent agent involves many challenges, one of which is to track the agent's mental state transitions and teach the agent to make rational decisions guided by its utility, as a human would. To this end, we propose incorporating a mental state parser and a utility model into dialogue agents. The hybrid mental state parser extracts information from both dialogue and event observations and maintains a graphical representation of the agent's mind; meanwhile, the utility model is a ranking model that learns human preferences from a crowd-sourced social commonsense dataset, Social IQA. Empirical results show that the proposed model attains state-of-the-art performance on the dialogue/action/emotion prediction task in the fantasy text-adventure game dataset, LIGHT. We also show example cases to demonstrate (i) how the proposed mental state parser can assist the agent's decisions by grounding them in context such as locations and objects, and (ii) how the utility model can help the agent make reasonable decisions in a dilemma. To the best of our knowledge, this is the first work that builds a socially intelligent agent by incorporating a hybrid mental state parser, covering both discrete event and continuous dialogue parsing, together with human-like utility modeling.
This paper proposes a new satellite anti-interception communication system based on the weighted-type fractional Fourier transform (WFRFT) and Multiple Input Multiple Output (MIMO) technology to improve the security and interception resistance of satellite communication systems. The system uses multiple-layer WFRFT to modulate the signal. The number of WFRFT layers equals the number of transmit antennas, and each WFRFT layer uses a different modulation parameter. After WFRFT modulation, the signal has characteristics of both the time and frequency domains, which effectively resists parameter scanning. MIMO effectively improves spectrum utilization and system capacity. Theoretical analysis shows that the original signal can be restored efficiently in the WFRFT-MIMO communication system, while the eavesdropper cannot intercept the signal. Taking two transmit antennas and one receive antenna as an example, we simulate the bit error rate (BER) performance of the legal receiver compared with the eavesdropper, and the relationship between receiving performance and WFRFT order deviation. When the modulation order error of the eavesdropper is 0.1 and the signal-to-noise ratio (SNR) is 20 dB, the BER of the eavesdropper is 10⁻³, which is 4 dB worse than that of the legal receiver. The performance of different transmit and receive antenna groups is also simulated: when the SNR is 10 dB, the three-transmit, four-receive antenna configuration outperforms the two-transmit, one-receive configuration by 4.5 dB.
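To make the role of the WFRFT order concrete, the sketch below implements a single-layer, four-term weighted fractional Fourier transform and shows that a receiver using the matched inverse order recovers the symbols, while a receiver whose order is off by 0.1 does not. This is a minimal sketch under stated assumptions: it uses one common weight definition, w_l(α) = (1/4) Σ_{k=0..3} exp(-iπk(α - l)/2), paired with a unitary DFT, and it omits the multi-layer structure, MIMO channel, and noise described in the abstract. The function names and test symbols are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def wfrft_weights(alpha: float) -> List[complex]:
    """Four weighting coefficients w_0..w_3 for order alpha (assumed convention)."""
    return [
        sum(cmath.exp(-0.5j * math.pi * k * (alpha - l)) for k in range(4)) / 4
        for l in range(4)
    ]


def unitary_dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(N^2) DFT with 1/sqrt(N) scaling, adequate for short blocks."""
    n = len(x)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]


def reverse_mod(x: Sequence[complex]) -> List[complex]:
    """Index reversal x[(-i) mod N], i.e. the unitary DFT applied twice."""
    n = len(x)
    return [x[(-i) % n] for i in range(n)]


def wfrft(x: Sequence[complex], alpha: float) -> List[complex]:
    """Weighted sum of x, F x, F^2 x and F^3 x with order-dependent weights."""
    w = wfrft_weights(alpha)
    spectrum = unitary_dft(x)
    terms = (list(x), spectrum, reverse_mod(x), reverse_mod(spectrum))
    return [sum(w[l] * terms[l][i] for l in range(4)) for i in range(len(x))]


def max_error(a: Sequence[complex], b: Sequence[complex]) -> float:
    return max(abs(u - v) for u, v in zip(a, b))


# Toy QPSK block (made up) modulated with order 0.4.
s = 1 / math.sqrt(2)
symbols = [s * v for v in (1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j)]
tx = wfrft(symbols, 0.4)

legal_err = max_error(wfrft(tx, -0.4), symbols)  # matched inverse order
eve_err = max_error(wfrft(tx, -0.3), symbols)    # order error of 0.1
print(legal_err, eve_err)
```

With these weights the transform is additive in its order, so applying order -0.4 after order 0.4 returns the original block up to rounding, whereas the mismatched order leaves a residual distortion on the symbols.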