Image captioning using an Encoder-Decoder approach, where a CNN serves as the Encoder and a sequence generator such as an RNN serves as the Decoder, has proven to be very effective. However, this method has a drawback: the sequence must be processed in order. To overcome this drawback, some researchers have utilized the Transformer model to generate captions from images using English datasets. However, none of them generated captions in Bengali using the Transformer model. As a result, we utilized three different Bengali datasets to generate Bengali captions from images using the Transformer model. Additionally, we compared the performance of the Transformer-based model with a visual attention-based Encoder-Decoder approach. Finally, we compared the results of the Transformer-based model with those of other models that employed different Bengali image captioning datasets.
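A minimal sketch of the Transformer-based captioning idea follows, assuming a generic torchvision backbone as a stand-in CNN encoder; the class name, vocabulary size, and all layer dimensions are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch: CNN feature grid as Transformer "memory", Transformer decoder
# generates caption tokens. Positional encodings are omitted for brevity.
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionTransformer(nn.Module):  # hypothetical name
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=4):
        super().__init__()
        backbone = models.resnet50(weights=None)  # stand-in CNN encoder
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, 7, 7)
        self.proj = nn.Linear(2048, d_model)
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, tokens):
        feats = self.cnn(images)                  # (B, 2048, H, W)
        memory = self.proj(feats.flatten(2).transpose(1, 2))  # (B, H*W, d_model)
        tgt = self.embed(tokens)                  # (B, T, d_model)
        # Causal mask: each position attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.out(self.decoder(tgt, memory, tgt_mask=mask))

model = CaptionTransformer(vocab_size=8000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 8000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 8000])
```

Because self-attention sees all earlier tokens at once, training does not require the step-by-step recurrence of an RNN decoder, which is the drawback the abstract refers to.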
An omnipresent and challenging research topic in computer vision is the generation of captions from an input image. Numerous experiments have previously been conducted on image captioning in English, but caption generation from images in Bengali is still sparse and in need of refinement; only a few papers to date have addressed image captioning in Bengali. Hence, we proffer a standard strategy for Bengali image caption generation on two different sizes of the Flickr8k dataset and on the BanglaLekha dataset, the only publicly available Bengali dataset for image captioning. The Bengali captions produced by our model were then compared with Bengali captions generated by other researchers using different architectures. Additionally, we employed a hybrid approach based on InceptionResNetV2 or Xception as the Convolutional Neural Network and a Bidirectional Long Short-Term Memory or Bidirectional Gated Recurrent Unit network on the two Bengali datasets, and different combinations of word embeddings were also adopted. Lastly, performance was evaluated using the Bilingual Evaluation Understudy (BLEU) metric, which showed that the proposed model indeed performed better on the Bengali dataset consisting of 4000 images and on the BanglaLekha dataset.
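The hybrid approach can be illustrated with a minimal "merge"-style sketch, in which a pooled CNN feature vector is combined with a bidirectional GRU encoding of the caption prefix to score the next word. The class name, feature dimension (1536, the pooled output size of InceptionResNetV2), and all other sizes are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch: global image feature + BiGRU text encoding -> next-word scores.
import torch
import torch.nn as nn

class MergeCaptioner(nn.Module):  # hypothetical name
    def __init__(self, vocab_size, feat_dim=1536, emb=256, hidden=256):
        super().__init__()
        self.img_fc = nn.Linear(feat_dim, hidden)
        self.embed = nn.Embedding(vocab_size, emb)
        self.bigru = nn.GRU(emb, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Sequential(
            nn.Linear(hidden * 3, hidden), nn.ReLU(),
            nn.Linear(hidden, vocab_size))

    def forward(self, img_feat, tokens):
        # img_feat: (B, feat_dim) pooled CNN vector; tokens: (B, T) prefix
        img = torch.relu(self.img_fc(img_feat))         # (B, hidden)
        _, h = self.bigru(self.embed(tokens))           # h: (2, B, hidden)
        txt = torch.cat([h[0], h[1]], dim=1)            # forward + backward states
        return self.out(torch.cat([img, txt], dim=1))   # (B, vocab_size)

model = MergeCaptioner(vocab_size=8000)
scores = model(torch.randn(4, 1536), torch.randint(0, 8000, (4, 10)))
print(scores.shape)  # torch.Size([4, 8000])
```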
Attention-based approaches have been shown to be effective in image captioning. Attention can be applied to text, which is called semantic attention, or to the image, which is known as spatial attention. We chose to implement the latter, as the main problem in image captioning is the failure to properly detect objects in the image. In this work, we develop an approach that extracts features from images using two different convolutional neural networks and combines these features with an attention model to generate captions with an RNN. We adopted Xception and InceptionV3 as our CNNs and GRU as our RNN. Moreover, we evaluated our proposed model on the Flickr8k dataset translated into Bengali, so that captions can be generated in Bengali using visual attention.
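A minimal sketch of one spatial-attention decoding step follows, using a Bahdanau-style attention over a flattened CNN feature grid feeding a GRU cell; the class name and all dimensions are illustrative assumptions (the paper used Xception/InceptionV3 features).

```python
# Sketch: attend over the spatial feature grid, then update the GRU state.
import torch
import torch.nn as nn

class AttentionGRUDecoder(nn.Module):  # hypothetical name
    def __init__(self, vocab_size, feat_dim=2048, hidden=512, attn=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.W_feat = nn.Linear(feat_dim, attn)   # scores image regions...
        self.W_hid = nn.Linear(hidden, attn)      # ...against the decoder state
        self.v = nn.Linear(attn, 1)
        self.gru = nn.GRUCell(hidden + feat_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, feats, token, h):
        # feats: (B, N, feat_dim) flattened grid; token: (B,); h: (B, hidden)
        scores = self.v(torch.tanh(self.W_feat(feats) + self.W_hid(h).unsqueeze(1)))
        alpha = torch.softmax(scores, dim=1)       # (B, N, 1) attention weights
        context = (alpha * feats).sum(dim=1)       # weighted image context
        h = self.gru(torch.cat([self.embed(token), context], dim=1), h)
        return self.out(h), h, alpha.squeeze(-1)

dec = AttentionGRUDecoder(vocab_size=8000)
feats = torch.randn(2, 49, 2048)                   # e.g. a 7x7 feature grid
logits, h, alpha = dec(feats, torch.tensor([1, 2]), torch.zeros(2, 512))
print(logits.shape, alpha.shape)  # torch.Size([2, 8000]) torch.Size([2, 49])
```

The returned attention weights `alpha` indicate which image regions influenced each generated word, which is what lets spatial attention ground captions in detected objects.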
The outbreak of the COVID-19 pandemic caused the death of a large number of people. Millions of people are infected by this virus and are still getting infected day by day. Owing to the cost and time required by conventional RT-PCR tests to detect COVID-19, researchers are trying to use medical images such as X-Ray and Computed Tomography (CT) images to detect it with the help of Artificial Intelligence (AI) based systems. In this paper, we reviewed some of these newly emerging AI-based models that can detect COVID-19 from medical images using X-Ray or CT of lung images. We collected information about available research resources and inspected a total of 80 papers from the period of February 21, 2020 to June 20, 2020. We explored and analyzed datasets, preprocessing techniques, segmentation, feature extraction, classification, and experimental results, which can be helpful in finding future research directions in the domain of automatic diagnosis of COVID-19 disease using AI-based frameworks.
Establishing patient-specific finite element analysis (FEA) models for computational fluid dynamics (CFD) of double stenosed artery models involves time and effort, restricting physicians' ability to respond quickly in time-critical medical applications. Such issues might be addressed by training deep learning (DL) models to learn and predict blood flow characteristics using a dataset generated by CFD simulations of simplified double stenosed artery models with different configurations. Comparison of blood flow patterns through an actual double stenosed artery model, derived from IVUS imaging, reveals that the sinusoidal approximation of stenosed neck geometry, which has been widely used in previous research works, fails to effectively represent the effects of a real constriction. As a result, a novel geometric representation of the constricted neck is proposed which, in terms of a generalized simplified model, outperforms the former assumption. The sequential change in artery lumen diameter and flow parameters along the length of the vessel presented opportunities for the use of LSTM and GRU DL models. However, with the small dataset of short lengths of doubly constricted blood arteries, the basic neural network model outperforms the specialized RNNs for most flow properties. LSTM, on the other hand, performs better for predicting flow properties with large fluctuations, such as varying blood pressure over the length of the vessels. Despite having good overall accuracies in training and testing across all the properties for the vessels in the dataset, the GRU model underperforms for individual vessel flow prediction in all cases. The results also point to the need for individually optimized hyperparameters for each property in any model rather than aiming to achieve overall good performance across all outputs with a single set of hyperparameters.
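The sequence-regression idea can be sketched minimally as below: an LSTM maps the lumen-diameter profile along the vessel to a flow property (e.g., pressure) at each axial position. The synthetic data, layer sizes, and training loop are illustrative assumptions, not the paper's actual CFD dataset or hyperparameters.

```python
# Sketch: per-position flow-property regression from a diameter sequence.
import torch
import torch.nn as nn

class FlowLSTM(nn.Module):  # hypothetical name
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # one flow property per axial station

    def forward(self, diameters):          # (B, L, 1)
        out, _ = self.lstm(diameters)      # (B, L, hidden)
        return self.head(out)              # (B, L, 1) predicted property

model = FlowLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Synthetic stand-in for CFD samples: diameter profiles -> pressure profiles.
x = torch.rand(16, 50, 1)            # 16 vessels, 50 axial stations
y = torch.cumsum(-x, dim=1)          # toy monotone "pressure drop" target
for _ in range(5):                    # a few illustrative training steps
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print(loss.item())
```

A GRU variant would only swap `nn.LSTM` for `nn.GRU`; the abstract's point is that with such short sequences a plain feed-forward network often suffices, with the LSTM helping mainly for strongly fluctuating properties like pressure.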
AHMERS, an active health monitoring and emergency response system, is a mobile application-based system that wirelessly connects to a smartwatch to constantly monitor the human body and respond to sudden changes in vital data in case of emergency. The app monitors heart rate, blood oxygen saturation, and body temperature, and compares them with pre-set normal values. If the data deviates or the user presses one of the emergency switches, the app immediately asks the person whether he/she is ok. If the person fails to respond to the "Are you ok?" message within a few seconds, the app sends distress signals to pre-set phone numbers and a server, along with the person's altitude, latitude, longitude, and current location on the map, so that help can be sent quickly. It can thus indicate whether a person is suffering from a health-related problem such as heart failure, coronavirus infection, or hypothermia.
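A minimal sketch of the described monitoring-and-escalation logic follows; the threshold values, function names, and alert channel are illustrative assumptions, not the actual AHMERS implementation.

```python
# Sketch: compare vitals with pre-set normal ranges, prompt the user,
# and escalate to a distress alert when there is no response.
NORMALS = {  # assumed pre-set normal ranges
    "heart_rate": (50, 120),      # beats per minute
    "spo2": (92, 100),            # blood oxygen saturation, %
    "temperature": (35.0, 38.0),  # body temperature, Celsius
}

def out_of_range(vitals):
    """Return the names of vital signs outside their normal ranges."""
    return [k for k, (lo, hi) in NORMALS.items()
            if not lo <= vitals[k] <= hi]

def check_and_alert(vitals, user_responded, send_distress):
    """If vitals deviate, ask the user; escalate when there is no response."""
    abnormal = out_of_range(vitals)
    if not abnormal:
        return "ok"
    if user_responded(timeout_seconds=30):  # the "Are you ok?" prompt
        return "user confirmed ok"
    send_distress(abnormal)                  # e.g. SMS/server with GPS location
    return "distress sent"

# Example run with stubbed I/O callbacks:
vitals = {"heart_rate": 140, "spo2": 88, "temperature": 36.5}
print(check_and_alert(
    vitals,
    user_responded=lambda timeout_seconds: False,
    send_distress=lambda issues: print("ALERT:", issues)))
```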