WEKO3
Item
Model-Free Deep Inverse Reinforcement Learning by Logistic Regression
https://oist.repo.nii.ac.jp/records/438
Name / File | License | Action
---|---|---
Uchibe2018_Article_Model-FreeDeepInverseReinforce (1.1 MB) | Creative Commons Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/) |
Item type | 学術雑誌論文 / Journal Article
---|---
Publication date | 2018-07-23
Title | Model-Free Deep Inverse Reinforcement Learning by Logistic Regression (en)
Language | eng
Resource type | journal article (http://purl.org/coar/resource_type/c_6501)
Author | Uchibe, Eiji
Bibliographic information | Neural Processing Letters, Vol. 47, No. 3, pp. 891-905, issued 2017-09-08 (en)
Abstract (Other) | This paper proposes model-free deep inverse reinforcement learning to find nonlinear reward function structures. We formulate inverse reinforcement learning as a problem of density ratio estimation, and show that the log of the ratio between an optimal state transition and a baseline one is given by part of the reward and the difference of the value functions under the framework of linearly solvable Markov decision processes. The logarithm of the density ratio is efficiently estimated by binomial logistic regression, whose classifier is constructed from the reward and state value function. The classifier tries to discriminate between samples drawn from the optimal state transition probability and those from the baseline one. Then, the estimated state value function is used to initialize part of the deep neural network for forward reinforcement learning. The proposed deep forward and inverse reinforcement learning is applied to two benchmark games: Atari 2600 and Reversi. Simulation results show that our method reaches the best performance substantially faster than the standard combination of forward and inverse reinforcement learning, as well as behavior cloning.
Publisher | Springer US
ISSN | 1370-4621
ISSN | 1573-773X
DOI | info:doi/10.1007/s11063-017-9702-7 (relation: isIdenticalTo)
Rights | © 2017 The Author(s).
Related site | https://link.springer.com/article/10.1007%2Fs11063-017-9702-7 (URI)
Author's version flag | VoR (http://purl.org/coar/version/c_970fb48d4fbd8a85)
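The abstract's core idea, that the logit of a binomial logistic regression trained to discriminate "optimal" from "baseline" samples estimates the log density ratio between the two distributions, can be sketched in a minimal form. The 1-D Gaussian data, learning rate, and iteration count below are illustrative assumptions, not details taken from the paper; in the paper the logit is parameterised by the reward plus a value-function difference rather than a linear function.

```python
import numpy as np

# Stand-ins for samples from the optimal (y = 1) and baseline (y = 0)
# state transition distributions: two unit-variance Gaussians.
rng = np.random.default_rng(0)
x_opt = rng.normal(loc=1.0, scale=1.0, size=500)
x_base = rng.normal(loc=-1.0, scale=1.0, size=500)

x = np.concatenate([x_opt, x_base])
y = np.concatenate([np.ones(500), np.zeros(500)])

# Binomial logistic regression p(y=1|x) = sigmoid(w*x + b),
# fit by full-batch gradient ascent on the mean log-likelihood.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w += lr * np.mean((y - p) * x)
    b += lr * np.mean(y - p)

def log_ratio(q):
    # With equal class priors, the classifier's logit is the estimated
    # log density ratio log p_opt(q)/p_base(q). For these two Gaussians
    # the true log ratio is 2*q, so w should approach 2 and b approach 0.
    return w * q + b
```

This is only the density-ratio estimation step; the paper then reuses the estimated state value function to initialize the forward reinforcement learning network.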