Variational Recurrent Models for Solving Partially Observable Control Tasks

Han, Dongqi; Doya, Kenji; Tani, Jun

Related sites

ICLR 2020: https://iclr.cc/Conferences/2020
https://iclr.cc/virtual/poster_r1lL4a4tDB.html

Author version flag

VoR (http://purl.org/coar/version/c_970fb48d4fbd8a85)

File

https://oist.repo.nii.ac.jp/record/1736/files/variational_recurrent_models_for_solving_partially_observable_control_tasks.pdf

https://oist.repo.nii.ac.jp/records/1736

Name / File	License	Action
variational_recurrent_models_for_solving_partially_observable_control_tasks (9.8 MB)

Item type

Conference Paper(1)

Publication date

2020-10-26

Title

Variational Recurrent Models for Solving Partially Observable Control Tasks

Language

eng

Keywords

Subject scheme: Other

Reinforcement Learning
Deep Learning
Variational Inference
Recurrent Neural Network
Partially Observable
Robotic Control
Continuous Control

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Authors

Han, Dongqi
Doya, Kenji
Tani, Jun

Bibliographic information

Issue date: 2019-09-26

Abstract

In partially observable (PO) environments, deep reinforcement learning (RL) agents often perform poorly because two problems must be tackled together: how to extract information from the raw observations to solve the task, and how to improve the policy. In this study, we propose an RL algorithm for solving PO tasks. Our method comprises two parts: a variational recurrent model (VRM) for modeling the environment, and an RL controller that has access to both the environment and the VRM. The proposed algorithm was tested in two types of PO robotic control tasks: those in which either coordinates or velocities were not observable, and those that require long-term memorization. Our experiments show that the proposed algorithm achieved better data efficiency and/or learned a better policy than alternative approaches in tasks in which unobserved states cannot be inferred from raw observations in a simple manner.

Publisher

ICLR 2020

Rights

Rights information

© 2020 The Author(s).

Versions

Ver.1

2023-06-26 11:43:35.296127
