Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer

Wang, Jiexin; Uchibe, Eiji; Doya, Kenji

doi:10.3389/fnbot.2017.00001


PubMed ID: 28167910

Related site: http://journal.frontiersin.org/article/10.3389/fnbot.2017.00001/full

Rights: ©2017 Wang, Uchibe, and Doya.

Version type: VoR (http://purl.org/coar/version/c_970fb48d4fbd8a85)


https://oist.repo.nii.ac.jp/records/210

Name / File	License
fnbot-11-00001 (16.2 MB)	Creative Commons Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)

Item type

Journal Article(1)

Release date

2017-12-28

Title

Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer

Language

eng

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Authors (English)

Wang, Jiexin
Uchibe, Eiji
Doya, Kenji

Bibliographic information

en : Frontiers in Neurorobotics

Vol. 11, No. 1, pp. 1-15, issued 2017-01-23

Abstract

Description type

Other

Description

EM-based policy search methods estimate a lower bound of the expected return from the histories of episodes and iteratively update the policy parameters by maximizing that lower bound, which makes gradient calculation and learning rate tuning unnecessary. Previous algorithms such as Policy learning by Weighting Exploration with the Returns, Fitness Expectation Maximization, and EM-based Policy Hyperparameter Exploration implemented mechanisms to discard useless low-return episodes either implicitly or using a fixed baseline determined by the experimenter. In this paper, we propose an adaptive baseline method to discard worse samples from the reward history and examine different baselines, including the mean and multiples of the standard deviation (SD) from the mean. Simulation results on the benchmark tasks of pendulum swing-up and cart-pole balancing, and on standing up and balancing of a two-wheeled smartphone robot, showed improved performance. We further implemented the adaptive baseline with the mean on our two-wheeled smartphone robot hardware to test its performance in the standing-up and balancing task and in a view-based approaching task. Our results showed that, with the adaptive baseline, the method outperformed the previous algorithms and achieved faster and more precise behaviors at a higher success rate.
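The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a diagonal Gaussian over policy parameters, a baseline of mean + k × SD computed from the current batch of returns, and a weight of (return − baseline) for each retained episode. The names `adaptive_baseline`, `em_update`, `toy_return`, and `run`, the parameter `k`, and the quadratic toy objective are all hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def adaptive_baseline(returns, k=0.0):
    """Baseline = mean + k * SD of the current batch of returns (k=0 gives the mean)."""
    return statistics.fmean(returns) + k * statistics.pstdev(returns)


def em_update(mu, sigma, samples, returns, k=0.0, min_sigma=1e-3):
    """One reward-weighted, gradient-free update of a diagonal Gaussian.

    Episodes at or below the adaptive baseline are discarded. Weighting the
    rest by (return - baseline) is an assumption of this sketch.
    """
    b = adaptive_baseline(returns, k)
    kept = [(theta, r - b) for theta, r in zip(samples, returns) if r > b]
    total = sum(w for _, w in kept)
    if not kept or total <= 0.0:
        return mu, sigma  # no episode beat the baseline; leave the policy unchanged
    dims = range(len(mu))
    new_mu = [sum(w * th[d] for th, w in kept) / total for d in dims]
    # Spread is measured around the previous mean and floored at min_sigma.
    new_sigma = [
        max(min_sigma,
            math.sqrt(sum(w * (th[d] - mu[d]) ** 2 for th, w in kept) / total))
        for d in dims
    ]
    return new_mu, new_sigma


def toy_return(theta, target=(1.0, -2.0)):
    """Hypothetical stand-in for an episode return; its maximum is 0 at `target`."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def run(iterations=30, batch=20, k=0.0, seed=0):
    """Sample parameters, evaluate them, and apply the update repeatedly."""
    rng = random.Random(seed)
    mu, sigma = [0.0, 0.0], [1.0, 1.0]
    for _ in range(iterations):
        samples = [[rng.gauss(m, s) for m, s in zip(mu, sigma)] for _ in range(batch)]
        returns = [toy_return(th) for th in samples]
        mu, sigma = em_update(mu, sigma, samples, returns, k)
    return mu, sigma
```

Calling `run(k=1.0)` instead of `run()` raises the baseline to one SD above the batch mean, so fewer episodes survive each update. No learning rate appears anywhere in the update, which is the property the abstract attributes to EM-based methods.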

ISSN

Source identifier type

ISSN

Source identifier

1662-5218

