The rapid advancement of AI-generated content (AIGC) promises to significantly transform many aspects of human life. This work focuses on the potential of AIGC to revolutionize image creation, such as photography and self-expression. We introduce ContextCam, a novel human-AI image co-creation system that integrates context awareness with mainstream AIGC technologies such as Stable Diffusion. ContextCam extracts relevant contextual data to inspire the user's image creation process, and leverages Large Language Model (LLM)-based multi-agents to co-create images with the user. A study with 16 participants covering 136 scenarios revealed that ContextCam was well received, producing personalized and diverse outputs and surfacing interesting user behavior patterns. Participants reported engagement and enjoyment when using ContextCam, and acknowledged its ability to inspire creativity.
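To make the described pipeline concrete, below is a minimal Python sketch of the context-to-prompt flow: contextual signals are gathered, folded into a text-to-image prompt, and handed off to a generator. All function and field names here (gather_context, build_prompt, the Context fields) are illustrative assumptions, not ContextCam's actual API.

```python
# A minimal sketch of a context-aware prompt pipeline; names are
# illustrative stand-ins, not ContextCam's real interface.
from dataclasses import dataclass

@dataclass
class Context:
    time_of_day: str
    location: str
    weather: str

def gather_context() -> Context:
    # In a real system these would come from device sensors / web APIs.
    return Context(time_of_day="dusk", location="lakeside park", weather="light rain")

def build_prompt(ctx: Context, user_idea: str) -> str:
    # An LLM "ideation agent" could expand this template into a richer
    # Stable Diffusion prompt; here we simply interpolate the raw context.
    return (f"{user_idea}, set at a {ctx.location} during {ctx.time_of_day}, "
            f"{ctx.weather}, cinematic lighting")

if __name__ == "__main__":
    prompt = build_prompt(gather_context(), "a quiet self-portrait")
    print(prompt)  # this string would be passed to a text-to-image model
```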
Traditional methods for geometric shape analysis of plants are mostly based on 2D images, but this approach is constrained by camera position, sensor cost, plant density (e.g., overlapping leaves), and similar factors. With the introduction of RGB-D sensors, a plant's 3D model can be built at little cost in time or money. The sensor used here, the Microsoft Kinect V2, is inexpensive, making it accessible and efficient for real-time plant monitoring in greenhouses. Equipped with a ToF (Time of Flight) camera, it captures distance, intensity, and amplitude data in a single shot. Using the Kinect V2, this experiment captured RGB and depth images simultaneously from an overhead position above the plants inside the greenhouse. We then reconstructed 3D models and estimated geometric traits of the plants, such as biomass, leaf cover area, and height. After modeling, a destructive manual measurement was performed to obtain the true geometric values for comparison. Analysis of the accuracy of the vegetable parameters obtained from the 3D models shows that these parameters can be estimated within an acceptable range. As this accuracy improves, automated operations in greenhouses can also be advanced.
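As a rough illustration of how such geometric traits can be read off an overhead depth frame, the sketch below estimates plant height and projected leaf cover area from a single depth image. The intrinsics and soil-plane distance are assumed placeholder values, and the per-pixel pinhole-projection area estimate is a simplification of full 3D reconstruction.

```python
# A rough sketch (not the paper's code) of estimating plant height and
# leaf cover area from one overhead Kinect V2 depth frame.
import numpy as np

FX, FY = 365.0, 365.0          # Kinect V2 depth intrinsics (approximate)
SOIL_DEPTH_M = 1.20            # sensor-to-soil distance (assumed, measured once)

def plant_metrics(depth_m: np.ndarray, canopy_mask: np.ndarray):
    """depth_m: HxW depth in meters; canopy_mask: boolean mask of plant pixels."""
    d = depth_m[canopy_mask]
    height = SOIL_DEPTH_M - d.min()          # tallest canopy point above soil
    # Each pixel's ground footprint scales with depth: (d/fx) * (d/fy).
    pixel_area = (d / FX) * (d / FY)
    cover_area = pixel_area.sum()            # projected leaf cover area, m^2
    return height, cover_area

if __name__ == "__main__":
    depth = np.full((424, 512), SOIL_DEPTH_M)
    depth[200:260, 200:280] = 0.95           # a synthetic 25 cm tall canopy
    mask = depth < SOIL_DEPTH_M - 0.05
    h, a = plant_metrics(depth, mask)
    print(f"height = {h:.2f} m, cover area = {a:.3f} m^2")
```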
Emerging terminals, such as smartwatches, true wireless earphones, and in-vehicle computers, are complementing our portals to ubiquitous information services. However, the current ecology of information services, encapsulated in millions of mobile apps, is largely restricted to smartphones; adapting these apps to new devices requires tremendous, almost unbearable engineering effort. Interaction Proxy, first proposed as an accessibility technique, is a potential solution to this problem. Rather than re-building an entire application, an Interaction Proxy constructs an alternative user interface that intercepts and translates interaction events and states between users and the original app's interface. In such a system, however, one key challenge is how to robustly and efficiently "communicate" with the original interface given the instability and dynamicity of mobile apps (e.g., dynamic application status and unstable layout). To handle this, we first define the UI-Independent Application Description (UIAD), a reverse-engineered semantic model of mobile services, and then propose the Interaction Proxy Manager (IPManager), which is responsible for synchronizing and managing the original apps' interfaces and for providing a concise programming interface that exposes the information and method entries of the concerned mobile services. In this way, developers can build alternative interfaces without dealing with the complexity of communicating with the original app's interface. In this paper, we elaborate on the design and implementation of IPManager, and demonstrate its effectiveness by developing three typical proxies: mobile-smartwatch, mobile-vehicle, and mobile-voice. We conclude by discussing the value of our approach in promoting ubiquitous computing, as well as its limitations.
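To illustrate what a "concise programming interface" over a proxied app might feel like to a developer, here is a hypothetical Python sketch; the class, its entries, and the callback mechanism are invented for illustration and do not reproduce IPManager's real API.

```python
# A hypothetical sketch of an IPManager-style service interface; all
# names here are illustrative, not the actual IPManager API.
from typing import Callable

class MusicServiceProxyAPI:
    """Exposes a mobile music app's state and actions as semantic entries,
    hiding UI traversal, layout changes, and event injection."""

    def __init__(self) -> None:
        self._listeners: list = []
        self._now_playing = "-"

    # Information entry: current state, kept in sync by the manager.
    def now_playing(self) -> str:
        return self._now_playing

    # Method entry: would be translated into taps/swipes on the original UI.
    def play_song(self, title: str) -> None:
        self._now_playing = title
        for cb in self._listeners:
            cb(title)

    def on_track_change(self, cb: Callable[[str], None]) -> None:
        self._listeners.append(cb)

# A smartwatch proxy would then bind its own tiny UI to these entries:
api = MusicServiceProxyAPI()
api.on_track_change(lambda t: print(f"[watch UI] now playing: {t}"))
api.play_song("Clair de Lune")
```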
A dynamic PWM (Pulse Width Modulation) variable-rate spray experiment platform was built to study the spray distribution characteristics of the PWM spraying process. The uniformity of single-nozzle dynamic spraying was evaluated on this platform under different experimental conditions. A carmine-solution concentration-absorbance technique was adopted to obtain the CV (Coefficient of Variation) of percent-area spray coverage. RSM (Response Surface Methodology) with three variables, the frequency and duty cycle of the PWM control signal and the forward velocity of the sprayer, was employed to study the effects of these three factors on dynamic spray distribution uniformity. Compared with previous studies of static spraying, the research in this paper evaluates the actual spray distribution uniformity of dynamic spraying more accurately. Moreover, the platform and the research method can provide references for setting the operating parameters of PWM variable-rate spraying in practical plant protection work.
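The uniformity metric itself is simple to state: the CV is the standard deviation of deposition across sampling points divided by the mean. A minimal sketch follows, assuming a linear absorbance-to-deposition calibration (the slope value is a placeholder, not the paper's calibration).

```python
# Coefficient of variation (CV) of spray deposition across sampling
# papers; deposition is inferred from carmine-solution absorbance.
import statistics

CAL_SLOPE = 2.5  # assumed linear calibration: deposition = slope * absorbance

def spray_cv(absorbances: list) -> float:
    deposits = [CAL_SLOPE * a for a in absorbances]
    mean = statistics.mean(deposits)
    return statistics.stdev(deposits) / mean * 100.0  # CV in percent

# Example: a lower CV means a more uniform dynamic spray distribution.
print(f"CV = {spray_cv([0.41, 0.39, 0.46, 0.35, 0.43]):.1f}%")
```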
Spatial perception, particularly at small and medium scales, is an essential human sense but poses a significant challenge for the blind and visually impaired (BVI). Traditional learning methods for BVI individuals are often constrained by the limited availability of suitable learning environments and high associated costs. To tackle these barriers, we conducted comprehensive studies into the real-world challenges faced by the BVI community. We identified several key factors hindering their spatial perception: the high social cost of seeking assistance, inefficient methods of information intake, cognitive and behavioral disconnects, and a lack of opportunities for hands-on exploration. In response, we developed AngleSizer, an innovative teaching assistant that leverages smartphone technology. AngleSizer is designed to let BVI individuals use natural interaction gestures to try, feel, understand, and learn sizes and angles effectively. The tool incorporates dual vibration-audio feedback, carefully crafted teaching processes, and specialized learning modules to enhance the learning experience. Extensive user experiments validated its efficacy and applicability for users with diverse abilities and visual conditions. Ultimately, our research not only expands the understanding of BVI behavioral patterns but also greatly improves their spatial perception capabilities, in a way that is cost-effective and enables independent learning.
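As a simplified illustration of the dual vibration-audio feedback idea (not AngleSizer's actual tuning), the sketch below maps the error between a target angle and the user's current gesture angle to a haptic pulse interval plus a spoken cue; all thresholds and mappings are invented for illustration.

```python
# Dual-channel feedback sketch: denser vibration pulses as the gesture
# angle approaches the target, plus coarse spoken guidance.
def feedback(target_deg: float, gesture_deg: float):
    error = abs(target_deg - gesture_deg)
    # Haptic channel: pulse interval shrinks as the error shrinks (min 50 ms).
    pulse_ms = max(50.0, 10.0 * error)
    # Audio channel: spoken cue reporting direction and remaining error.
    if error < 3:
        speech = "correct, this is the angle"
    else:
        direction = "open wider" if gesture_deg < target_deg else "close a little"
        speech = f"{direction}, {error:.0f} degrees to go"
    return pulse_ms, speech

print(feedback(90.0, 72.0))  # -> (180.0, 'open wider, 18 degrees to go')
```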
Robotic Process Automation (RPA) offers a valuable solution for efficiently automating tasks on the graphical user interface (GUI) by emulating human interactions, without modifying existing code. However, its broader adoption is constrained by the need for expertise in both scripting languages and workflow design. To address this challenge, we present PromptRPA, a system designed to comprehend various task-related textual prompts (e.g., goals, procedures) and thereby generate and perform the corresponding RPA tasks. PromptRPA incorporates a suite of intelligent agents that mimic human cognitive functions, specializing in interpreting user intent, managing external information for RPA generation, and executing operations on smartphones. The agents learn from user feedback and continuously improve their performance based on the accumulated knowledge. Experimental results indicated a performance jump from a 22.28% success rate for the baseline to 95.21% with PromptRPA, with an average of 1.66 user interventions required for each new task. PromptRPA presents promising applications in fields such as tutorial creation, smart assistance, and customer service.
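As a toy illustration of the agent loop the abstract outlines (interpret the prompt into steps, execute, and fold user corrections back into accumulated knowledge), consider the Python sketch below; every name and the "execution" logic are stand-ins, not PromptRPA's implementation.

```python
# Illustrative agent loop: intent interpretation, execution, and
# knowledge accumulation from user interventions.
def interpret_intent(prompt: str) -> list:
    # An LLM would decompose the textual prompt into UI-level steps.
    return [f"step: {s.strip()}" for s in prompt.split(",")]

def execute_step(step: str, knowledge: dict) -> bool:
    # On a phone this would locate a widget and inject a tap/swipe;
    # here we "succeed" only if accumulated knowledge covers the step.
    return step in knowledge

def run_task(prompt: str, knowledge: dict) -> int:
    interventions = 0
    for step in interpret_intent(prompt):
        if not execute_step(step, knowledge):
            interventions += 1           # ask the user, then remember the fix
            knowledge[step] = "user-provided demonstration"
    return interventions

kb: dict = {}
print(run_task("open settings, enable dark mode", kb))  # 2 interventions
print(run_task("open settings, enable dark mode", kb))  # 0: learned from feedback
```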
This paper presents LangAware, a collaborative approach for constructing personalized context for context-aware applications. The need for personalization arises from significant variations in context between individuals depending on scenarios, devices, and preferences. However, there is often a notable gap between humans and machines in understanding how contexts are constructed, as observed in trigger-action programming studies such as those of IFTTT. LangAware enables end-users to participate in establishing contextual rules in-situ using natural language. The system leverages large language models (LLMs) to semantically connect low-level sensor detectors to high-level contexts and to provide understandable natural language feedback for effective user involvement. We conducted a user study with 16 participants in real-life settings, which revealed an average success rate of 87.50% for defining contextual rules across 12 varied campus scenarios, typically accomplished within just two modifications. Furthermore, users reported a better understanding of the machine's capabilities after interacting with LangAware.
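The core grounding step, from a natural-language context to a rule over low-level detectors, can be sketched as follows; the detector vocabulary and rule grammar are assumptions, and the hard-coded mapping stands in for the LLM call that performs this grounding in the real system.

```python
# Toy grounding of a natural-language context into a detector rule.
# Detector names and the rule grammar are illustrative assumptions.
DETECTORS = {"indoor_wifi", "ambient_noise_db", "step_rate", "hour_of_day"}

def ground_context(utterance: str) -> str:
    # Stand-in for the LLM call: map one example phrase to a detector rule.
    if "studying in the library" in utterance:
        return "indoor_wifi == 'library' AND ambient_noise_db < 40 AND step_rate == 0"
    raise ValueError("context not understood; ask the user to rephrase")

rule = ground_context("when I am studying in the library, mute my phone")
print("IF", rule, "THEN mute_phone")
# The user can then correct the rule in-situ, e.g. relax the noise threshold.
```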
Test sets play an essential role in evaluating text entry techniques. In this paper, we argue that in addition to the widely adopted metrics of bigram representativeness and memorability, word clarity should also be considered when creating test sets from a target dataset. Word clarity quantifies the extent to which a word is likely to be confused with other words on a keyboard. We formally define word clarity, derive equations for calculating it, and show both theoretically and empirically that word clarity has a significant effect on text entry performance: it can yield up to a 26.4% difference in error rate and a 25% difference in input speed. We then propose a Pareto optimization method for sampling test sets of different sizes, which jointly optimizes the word clarity, bigram representativeness, and memorability of the test set. The obtained test sets are published on the Internet.
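The paper's exact equations are not reproduced here, but the intuition can be sketched: under a Gaussian touch model, a word is unclear when taps at its own key centers still assign substantial likelihood to rival words. The lexicon, noise width, and staggered-layout coordinates below are assumptions, not the paper's formulation.

```python
# Hedged sketch of a word-clarity-style measure under a Gaussian touch model.
import math

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEYS = {c: (x + 0.5 * y, y) for y, row in enumerate(ROWS) for x, c in enumerate(row)}
SIGMA = 0.45  # touch-noise std in key widths (assumed)

def tap_likelihood(word: str, taps: str) -> float:
    # P(observed taps | intended word), one independent Gaussian per character.
    p = 1.0
    for w, t in zip(word, taps):
        (x1, y1), (x2, y2) = KEYS[w], KEYS[t]
        p *= math.exp(-((x1 - x2) ** 2 + (y1 - y2) ** 2) / (2 * SIGMA ** 2))
    return p

def clarity(word: str, lexicon: list) -> float:
    # Posterior mass kept by the word itself when taps land on its key centers.
    rivals = [w for w in lexicon if len(w) == len(word)]
    total = sum(tap_likelihood(w, word) for w in rivals)
    return tap_likelihood(word, word) / total

lex = ["not", "nit", "mot", "hot", "for"]
print(f"clarity('not') = {clarity('not', lex):.2f}")  # < 1: confusable rivals exist
```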
Modern touchscreen keyboards all rely on word-level auto-correction to handle input errors. Unfortunately, visually impaired users are deprived of this benefit because screen-reader keyboards offer only character-level input and provide no correction ability. In this paper, we present VIPBoard, a smart keyboard for visually impaired people that aims to improve the underlying keyboard algorithm without altering the current input interaction. Upon each tap, VIPBoard predicts the probability of each key by considering both the touch location and a language model, and reads the most likely key aloud, which saves calibration time when the touchdown point misses the target key. Meanwhile, the keyboard layout automatically scales according to the user's touch location, enabling easy selection of other keys. A user study shows that, compared with the current keyboard technique, VIPBoard can reduce touch error rate by 63.0% and increase text entry speed by 12.6%.
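The underlying prediction step is classic Bayesian decoding: P(key | touch) is proportional to P(touch | key) times the language-model prior P_LM(key). Below is a small self-contained sketch of that combination; the Gaussian noise width, key coordinates, and toy prior are assumptions, not VIPBoard's trained parameters.

```python
# Bayesian key prediction sketch: spatial touch model x language-model prior.
import math

KEY_CENTERS = {"q": (0.0, 0.0), "w": (1.0, 0.0), "e": (2.0, 0.0),
               "a": (0.5, 1.0), "s": (1.5, 1.0)}
SIGMA = 0.5  # touch-noise standard deviation, in key widths (assumed)

def predict_key(touch_xy, lm_prior):
    scores = {}
    for key, (cx, cy) in KEY_CENTERS.items():
        d2 = (touch_xy[0] - cx) ** 2 + (touch_xy[1] - cy) ** 2
        spatial = math.exp(-d2 / (2 * SIGMA ** 2))       # P(touch | key)
        scores[key] = spatial * lm_prior.get(key, 1e-6)  # times P_LM(key)
    return max(scores, key=scores.get)  # this key would be read aloud

# After typing "w", the language model strongly expects "e" (e.g., "we"):
prior = {"q": 0.01, "w": 0.04, "e": 0.80, "a": 0.10, "s": 0.05}
best = predict_key((1.4, 0.3), prior)  # tap lands between "w" and "e"
print(best)  # "e": the prior rescues a touch that nearly missed the key
```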