Reinforcement Learning-based Frame-level Bit Allocation for VVC
Abstract:
As frame-level bit allocation is a dependent sequential decision-making problem, it can be modeled as a Markov Decision Process (MDP) and solved by reinforcement learning (RL). Existing reports using RL for coding mainly have two problems: unrepresentative handcrafted features and limited interaction efficiency. In this paper, to address these two problems, we first propose to use a deep neural network to extract features from the state, which is designed as a combination of raw pixel-level information and handcrafted features. The network is able to extract information necessary for the following decision-making neural network automatically. We then propose an efficient training scheme. By designing a parallel interaction algorithm and using a simplified video encoder, our proposed scheme can train on a sophisticated encoder such as VTM. To the best of our knowledge, this is the first RL algorithm implemented for bit allocation in VTM. The experimental results show that our scheme can learn interpretable frame-level bit allocation and achieves a better rate-distortion (R-D) performance compared with VTM.
Keywords:
Bit (key)
Citations (0)
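The MDP framing in the abstract above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the class `ToyBitAllocationEnv`, the hyperbolic rate-distortion model D = c / R, the state tuple, and both policies are hypothetical. It is not the paper's feature-extraction network, its simplified encoder, or VTM. It only shows the sequential structure: at each frame the agent sees the remaining budget, chooses how many bits to spend, and receives the negative distortion as reward.

```python
from dataclasses import dataclass
import math


@dataclass
class ToyBitAllocationEnv:
    """Hypothetical stand-in for an encoder (assumption; not VTM)."""
    complexities: list   # assumed per-frame complexity c_t
    total_bits: float    # bit budget for the whole sequence

    def reset(self):
        self.t = 0
        self.remaining = self.total_bits
        return self._state()

    def _state(self):
        frames_left = len(self.complexities) - self.t
        return (self.t, self.remaining, frames_left)

    def step(self, fraction):
        """Spend `fraction` of the remaining budget on the current frame."""
        frames_left = len(self.complexities) - self.t
        if frames_left == 1:
            fraction = 1.0  # the last frame takes whatever is left
        else:
            # keep some budget for later frames so bits never reach zero
            fraction = min(max(fraction, 0.01), 0.99)
        bits = fraction * self.remaining
        # assumed R-D model: distortion falls as c / R
        distortion = self.complexities[self.t] / bits
        self.remaining -= bits
        self.t += 1
        done = self.t == len(self.complexities)
        return (None if done else self._state()), -distortion, done


def equal_policy(state):
    """Baseline: split the remaining budget evenly over remaining frames."""
    _, _, frames_left = state
    return 1.0 / frames_left


def make_sqrt_policy(complexities):
    """Allocate in proportion to sqrt(c_t), optimal for the assumed D = c / R."""
    weights = [math.sqrt(c) for c in complexities]

    def policy(state):
        t, _, _ = state
        return weights[t] / sum(weights[t:])

    return policy


def run_episode(env, policy):
    """Roll out one episode and return the sum of rewards."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


env = ToyBitAllocationEnv(complexities=[1.0, 4.0, 9.0], total_bits=600.0)
print(run_episode(env, equal_policy))
print(run_episode(env, make_sqrt_policy(env.complexities)))
```

Under this assumed model the complexity-aware policy reaches a lower total distortion than the even split, which is the kind of dependency between frames that an RL agent would have to learn from interaction rather than from a closed-form rule.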
Recent research findings suggest that the initial reductive effects of noncontingent reinforcement (NCR) schedules on destructive behavior result from the establishing effects of an antecedent stimulus (i.e., the availability of "free" reinforcement) rather than extinction. A number of authors have suggested that these antecedent effects result primarily from reinforcer satiation, but an alternative hypothesis is that the individual attempts to access contingent reinforcement primarily when noncontingent reinforcement is unavailable, but chooses not to access contingent reinforcement when noncontingent reinforcement is available. If the satiation hypothesis is more accurate, then the reductive effects of NCR should increase over the course of a session, especially for denser schedules of NCR, and should occur during both NCR delivery and the NCR inter-reinforcement interval (NCR IRI). If the choice hypothesis is more accurate, then the reductive effects of NCR should be relatively constant over the course of a session for both denser and leaner schedules of NCR and should occur almost exclusively during the NCR interval (rather than the NCR IRI). To evaluate these hypotheses, we examined within-session trends of destructive behavior with denser and leaner schedules of NCR (without extinction), and also measured responding in the NCR interval separate from responding in the NCR IRI. Reductions in destructive behavior were mostly due to the participants choosing not to access contingent reinforcement when NCR was being delivered and only minimally due to reinforcer satiation.
Extinction (optical mineralogy)
Stimulus (psychology)
Citations (30)
An attempt was made to modify a socially desirable response of mental patients. It was found that instructions to the patients had no enduring effect unless accompanied by reinforcement. Also, it was found that reinforcement was not effective unless the reinforcement procedure was accompanied by instructions that specified the basis for the reinforcement. Maximum change in behavior was obtained when the reinforcement procedure took advantage of the existing verbal repertoire of the patients. A significant methodological finding was that substantial modification of the behavior of psychotics could be achieved by briefly delaying, rather than withholding, food reinforcement.
Citations (208)
Rats were exposed to a random sequence of reinforcement on two levers, such that there was no way to predict from the previous reinforcement which lever would deliver reinforcement next. The rats showed a tendency to repeat the choice that had just produced reinforcement, despite the absence of an overall contingency that differentially reinforced such repetition. However, this tendency decreased with continued exposure to the schedule. Runs of successive reinforcements on a lever increased the probability of pressing that lever, but only slightly, and only in the earlier phases of training. The more quickly a press was made after reinforcement the more likely it was to be on the lever that had delivered that reinforcement. Repetition of choice followed by reinforcement should be viewed as a naturally occurring behavior in the rat, but not necessarily as a behavior that will continue without differential reinforcement of repetition.
Lever
Repetition (rhetorical device)
Contingency
Differential reinforcement
Citations (36)
Three experiments were conducted to investigate the theoretical reduction of rate and duration of reinforcement to their product, rate of reinforcement‐time, under concurrent chain schedules. In Exp. I, rate of reinforcement‐time was varied by varying rate of reinforcement delivery, holding duration of reinforcement availability constant; in Exp. II, rate of reinforcement‐time was varied by holding rate of reinforcement delivery constant and varying duration of reinforcement availability; in Exp. III, rate of reinforcement‐time was held constant by varying both rate and duration of reinforcement simultaneously and inversely. For all three experiments, both relative rate of responding and relative time spent in the initial link were found to match approximately the relative rate of reinforcement‐time arranged in the terminal link. These data were interpreted as support for the notion that rate and duration of reinforcement may be functionally equivalent and reducible to a single variable, rate of reinforcement‐time.
Operant conditioning
Constant (computer programming)
Citations (37)
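The reduction described in the abstract above can be written out as simple arithmetic. The sketch below uses hypothetical numbers and helper names (`reinforcement_time`, `relative_share`) that do not come from the study. It only shows how rate and duration combine into one product, and what relative responding would be if it matched that product exactly.

```python
def reinforcement_time(rate_per_min, duration_s):
    """Product of reinforcement rate and duration of availability."""
    return rate_per_min * duration_s


def relative_share(left, right):
    """Share of the left alternative in the left + right total."""
    return left / (left + right)


# Hypothetical terminal links: rate and duration vary inversely,
# so the product is the same on both sides (as in Exp. III).
left = reinforcement_time(rate_per_min=2.0, duration_s=3.0)
right = reinforcement_time(rate_per_min=1.0, duration_s=6.0)
print(relative_share(left, right))

# Hypothetical case where only the rate differs (as in Exp. I).
left = reinforcement_time(rate_per_min=4.0, duration_s=3.0)
right = reinforcement_time(rate_per_min=1.0, duration_s=3.0)
print(relative_share(left, right))
```

If responding matched the product, the first case would predict indifference between the two links and the second a preference for the left link in proportion to its higher rate.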
Lever
Stimulus (psychology)
Citations (1)
Positive reinforcement was more effective than negative reinforcement in promoting compliance and reducing escape‐maintained problem behavior for a child with autism. Escape extinction was then added while the child was given a choice between positive or negative reinforcement for compliance and the reinforcement schedule was thinned. When the reinforcement requirement reached 10 consecutive tasks, the treatment effects became inconsistent and reinforcer selection shifted from a strong preference for positive reinforcement to an unstable selection pattern.
Extinction (optical mineralogy)
Citations (77)
This study examined the effects of reinforcement and reinforcement plus information on both appropriate and inappropriate behavior in subjects provided with direct reinforcement and those seated adjacent to them. Four female kindergarten subjects who were of average intelligence were chosen on the basis of engaging in a relatively high percentage of inappropriate behavior. The subjects were randomly assigned to one of two pairs and within each pair, one subject was randomly designated as the one to be administered direct reinforcement (target subject). The remaining subject in each pair (non-target subject) received no direct reinforcement but was seated adjacent to the target subject. Each pair of subjects was then exposed to seven experimental conditions: baseline, reinforcement for appropriate behavior, reversal, reinforcement for inappropriate behavior, reinforcement for appropriate behavior with information about the contingencies, reinforcement for inappropriate behavior with information about the contingencies, reinforcement for appropriate behavior with information about the contingencies. Changes in the non-target subjects were observed as a function of witnessing a target subject receive reinforcement for appropriate behavior. When inappropriate behavior was reinforced in the target subjects, only slight changes were observed in the non-target subjects. Information about the contingencies increased the effectiveness of reinforcement in all subjects. This was particularly relevant to inappropriate behavior. The results are discussed with regard to the vicarious reinforcement literature and with regard to the efficacy of providing information along with reinforcement in order to augment it.
Citations (0)
A differential-reinforcement-of-other-behavior (DRO) schedule with trials and delayed reinforcement was investigated. Periodically a wheel was briefly available to rats, followed six seconds later by brief availability of a bar. Variable-ratio food reinforcement of wheel turns was adjusted to give 95% turns. After variable-ratio-five reinforcement of bar presses produced 100% pressing, then separate ratio schedules were used for presses following turns (turn presses) and presses following nonturns (nonturn presses). Increasing nonturn-press reinforcements decreased turns, even though total reinforcements increased. Reversal by decreasing nonturn-press reinforcements raised turns, though with hysteresis. Thus food reinforcement increased nonturns even though delayed six to ten seconds after nonturns, a delay that greatly reduces response reinforcement. Those and other results indicate that the turn decrease was not due to reinforcement of competing responses. Evidence against other alternatives, and the reduction of responding by increased reinforcement, indicate that the term inhibition is appropriate for the phenomenon reinforced. Response-specific inhibition appears appropriate for this particular kind, since its effects are more specific to particular responses than Pavlovian conditioned-inhibition. Response-specific inhibition seems best considered a behavioral output comparable to responses (e.g., both reinforcible) but with important properties different from responses (e.g., different reinforcement-delay gradients).
Differential reinforcement
Bar (unit)
Citations (9)
To understand the bonding properties between reinforcement and concrete and to make full use of the material properties, a lot of research has been carried out on reinforcement-concrete. Existing reinforcement-concrete studies mainly cover reinforcement-concrete bonds, reinforcement lap, and anchorage of reinforcement. The reinforcement-concrete bond test mainly measures the bond-slip curve between the two to determine the bond strength between reinforcement and concrete. The reinforcement lap test is mainly used to study the performance of the anchorage length of reinforcement in concrete; depending on whether the lap bars are in contact with each other, laps can be divided into two forms: contact lap and indirect lap. The anchorage test of reinforcement is conducted to study the reduction of the connection length between reinforcement and concrete while meeting the force requirements. According to a large number of tests, the bond strength of the reinforcement is affected by the shape of the mixed reinforcement, the thickness of the protective layer of the diameter concrete, the spacing of the reinforcement, the transverse reinforcement restraint, and the material properties of the reinforcement and concrete. This paper discusses the test methods, influencing factors, and the gaps in existing research on the performance of reinforcement-concrete bonding, and on lap and anchorage properties.
Citations (1)