【Unity2018】Using Google Cloud Vision

Introduction
What is Google Cloud Vision?
Signing up for the Google Cloud Vision free trial
- Visit the official Google Cloud Vision site
- Create a free account
Issuing an API key
Recognizing png/jpg images in Unity
Recognizing webcam video in Unity
References

Introduction

I want to do image recognition in Unity!
I want to do text recognition in Unity!

The Google Cloud Vision API can grant exactly those wishes.
In this article, we'll look at how to use the Cloud Vision API from Unity.

◆What you need
* Unity (this article uses Unity 2018)
* A Google account
* A credit card (a hassle, but required to register for the free account)

◆What we'll build
* A program that retrieves data about what appears in the webcam image.

~~While researching, I also found this...~~ assetstore.unity.com

What is Google Cloud Vision?

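Google Cloud Vision is an image recognition API provided by Google.
It can extract a wide range of information from an image, from recognizing handwritten text to identifying what the image shows.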

What the Cloud Vision API can do

Specifically, the Cloud Vision API can do the following.

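Feature	Description
Label detection	Detects the various categories of things that appear in an image
Web detection	Searches the internet for similar images
Optical character recognition (OCR)	Detects and extracts text in an image (see the official documentation for supported languages)
Handwriting recognition	Recognizes handwritten input
Logo detection	Detects common product logos in an image
Object Localizer	Finds the position and number of objects in an image
REST API integration	Can be called through a REST API (the approach used in this article)
Landmark detection	Detects natural landmarks and man-made structures
Face detection	Detects faces in an image, along with attributes such as emotion and whether a hat is worn. Note: it does not support facial recognition that identifies individuals
Content moderation	Detects inappropriate content in an image, such as adult or violent content
ML Kit integration	Integration with ML Kit, a mobile SDK
Product search	Recognizes products listed in a catalog
Image attributes	Detects general image attributes such as dominant colors and crop hints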

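Pricing

The first 1,000 units per month are free.
Beyond 1,000 units, you are charged $1.50 per 1,000 units (each feature applied to one image counts as one unit).
Registering also gives you $300 worth of credit, so trying it out on your own should not be a problem.

For full pricing details, see the official pricing page.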
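To get a feel for these numbers, here is a rough estimate in C#. It's a minimal sketch that assumes the figures quoted above (1,000 free units per month, then $1.50 per 1,000 units, charged proportionally) and ignores the $300 trial credit and any volume tiers. `VisionPricing` and `EstimateCostUsd` are hypothetical names, so check the official pricing page before relying on the result.

```csharp
using System;

// Hypothetical helper: rough monthly cost estimate using the figures quoted in this article.
// Assumption: units beyond the free tier are billed proportionally at $1.50 per 1,000.
public static class VisionPricing
{
    public const int FreeUnitsPerMonth = 1000;
    public const decimal UsdPer1000Units = 1.50m;

    public static decimal EstimateCostUsd(int unitsPerMonth)
    {
        if (unitsPerMonth < 0) throw new ArgumentOutOfRangeException(nameof(unitsPerMonth));

        // Only the units above the free tier are billable.
        int billableUnits = Math.Max(0, unitsPerMonth - FreeUnitsPerMonth);
        return billableUnits * UsdPer1000Units / 1000m;
    }
}
```

For example, sending one image with one feature every 5 seconds for 8 hours is 5,760 units, which this sketch puts at about $7.14 before credits.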

Trying out the Cloud Vision API

First things first, let's try it out.
Visit the following page.
cloud.google.com

Drag and drop any image you like into the dotted rectangle to the right of "Try The API", near the middle of the page.
f:id:gunjyousky:20190125115648p:plain:w800

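I'll drag and drop a photo of needle ice (frost columns) that I took recently, so there are no copyright concerns.
After checking "I'm not a robot", the detection results*1 are displayed.
f:id:gunjyousky:20190125115944p:plain:w800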

Clicking the tabs at the top shows the other detection results.
The dominant colors under the "Properties" tab look like fun, don't they?
f:id:gunjyousky:20190125121138p:plain:w800

Signing up for the Google Cloud Vision free trial

To use the Google Cloud Vision API, you need to sign up for the free trial.
As mentioned above, a credit card is required, so have one ready.

Visit the official Google Cloud Vision site

cloud.google.com

Create a free account

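Click the "Free trial" button
Step 1/2
- Select your country
- Read and accept the terms of service
- Choose whether to receive email updates
- Click "AGREE AND CONTINUE"
Step 2/2
- Set up your payments profile
- Enter your customer information
- Select the payment type
- Select the payment method
- Click "START MY FREE TRIAL"
Registration complete
- Click "OK" and you're registered (yay!)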

Issuing an API key

To use the Google Cloud Vision API from Unity, you need to issue an API key.
Google Cloud Vision becomes usable only once the issued API key is configured on the Unity side.

Open the "Authenticating to Cloud API Services" page below.

cloud.google.com

Click "Setting up an API key" near the middle of the page that opens.
f:id:gunjyousky:20190125151817p:plain:w800

In the middle of the linked page, click "Use [API Manager] → [Credentials]".
f:id:gunjyousky:20190125152108p:plain:w800

At the top of the next page, click "[API Manager] → [Credentials]".
f:id:gunjyousky:20190125152438p:plain:w800

On the APIs & Services page that opens, go to the "Credentials" tab, click the "Create credentials" button, and select "API key".
f:id:gunjyousky:20190125152810p:plain:w800

If the "API key created" dialog appears, you've succeeded.
Close it with the "Close" button.
f:id:gunjyousky:20190125154344p:plain:w800

You'll use this API key from here on, so make sure you can find this page again.
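The scripts in this article append the key to the `images:annotate` URL as the `key` query parameter. If you'd rather not type the key straight into the Inspector, a small helper can build that URL from an environment variable instead. This is only a sketch: `VisionEndpoint`, `BuildUrl`, and the variable name `VISION_API_KEY` are hypothetical choices, not part of the original project.

```csharp
using System;

// Hypothetical helper for building the images:annotate request URL.
public static class VisionEndpoint
{
    // REST endpoint used in this article; the API key goes in the "key" query parameter.
    public const string AnnotateUrl = "https://vision.googleapis.com/v1/images:annotate";

    public static string BuildUrl(string apiKey)
    {
        if (string.IsNullOrWhiteSpace(apiKey))
            throw new ArgumentException("API key is empty.", nameof(apiKey));

        // Escape the key so unexpected characters cannot break the query string.
        return AnnotateUrl + "?key=" + Uri.EscapeDataString(apiKey.Trim());
    }

    // Assumption: the key is stored in an environment variable (name is hypothetical).
    // Throws ArgumentException if the variable is not set.
    public static string BuildUrlFromEnvironment(string variableName = "VISION_API_KEY")
    {
        return BuildUrl(Environment.GetEnvironmentVariable(variableName));
    }
}
```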

Recognizing png/jpg images in Unity

Now let's try using Google Cloud Vision in Unity.

Creating CloudVisionAPI.cs

After starting Unity, first create a new folder structure under Assets and save the scene.
Here, the scene is named CloudVisionText and saved in the Scenes folder.

f:id:gunjyousky:20190225161214p:plain

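Next, create a new C# script in the Scripts folder. Name it CloudVisionAPI.cs (Unity requires the file name to match the class name CloudVisionAPI), paste in the source code below, and save it.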

using UnityEngine;
using System.Collections;
using System.Collections.Generic;
using UnityEngine.Networking;
using System;
using System.Text;

public class CloudVisionAPI : MonoBehaviour
{
    public string url = "https://vision.googleapis.com/v1/images:annotate?key=";
    public string apiKey = "";
    public FeatureType featureType;
    public Texture2D texture2D;


    [System.Serializable]
    public class requestBody
    {
        public List<AnnotateImageRequest> requests;
    }

    [System.Serializable]
    public class AnnotateImageRequest
    {
        public Image image;
        public List<Feature> features;
        //public string imageContext;
    }

    [System.Serializable]
    public class Image
    {
        public string content;
        //public ImageSource source;
    }

    [System.Serializable]
    public class ImageSource
    {
        public string gcsImageUri;
    }

    [System.Serializable]
    public class Feature
    {
        public string type;
        public int maxResults;
    }

    public enum FeatureType
    {
        TYPE_UNSPECIFIED,
        FACE_DETECTION,
        LANDMARK_DETECTION,
        LOGO_DETECTION,
        LABEL_DETECTION,
        TEXT_DETECTION,
        SAFE_SEARCH_DETECTION,
        IMAGE_PROPERTIES
    }

    [System.Serializable]
    public class ImageContext
    {
        public LatLongRect latLongRect;
        public List<string> languageHints;  // the API expects a list of language codes
    }

    [System.Serializable]
    public class LatLongRect
    {
        public LatLng minLatLng;
        public LatLng maxLatLng;
    }

    [System.Serializable]
    public class LatLng
    {
        public float latitude;
        public float longitude;
    }

    [System.Serializable]
    public class responseBody
    {
        public List<AnnotateImageResponse> responses;
    }

    [System.Serializable]
    public class AnnotateImageResponse
    {
        public List<EntityAnnotation> labelAnnotations;
    }

    [System.Serializable]
    public class EntityAnnotation
    {
        public string mid;
        public string locale;
        public string description;
        public float score;
        public float confidence;
        public float topicality;
        public BoundingPoly boundingPoly;
        public List<LocationInfo> locations;
        public List<Property> properties;
    }

    [System.Serializable]
    public class BoundingPoly
    {
        public List<Vertex> vertices;
    }

    [System.Serializable]
    public class Vertex
    {
        public float x;
        public float y;
    }

    [System.Serializable]
    public class LocationInfo
    {
        public LatLng latLng;  // JsonUtility only fills public (or [SerializeField]) fields
    }

    [System.Serializable]
    public class Property
    {
        public string name;
        public string value;
    }

    // Use this for initialization
    void Start()
    {
        if (string.IsNullOrEmpty(apiKey))
        {
            Debug.LogError("No API key.");
            return;
        }

        // Send the image and analyze it
        StartCoroutine(RequestVisionAPI());
    }

    /*-----------------------------------------------------------*
     * ◆ Connect to Google Cloud Vision, send the image, and receive the JSON response
     *-----------------------------------------------------------*/
    private IEnumerator RequestVisionAPI()
    {
        // Stop here if there is no API key or no image to send
        if (string.IsNullOrEmpty(apiKey)) yield break;
        if (texture2D == null)
        {
            Debug.LogError("No Texture2D assigned.");
            yield break;
        }

        // Encode the image as JPEG, then Base64, because the JSON body carries it as text
        byte[] jpg = texture2D.EncodeToJPG();
        string base64Image = System.Convert.ToBase64String(jpg);

        // Build the requestBody
        var requests = new requestBody();
        requests.requests = new List<AnnotateImageRequest>();

        var request = new AnnotateImageRequest();
        request.image = new Image();
        request.image.content = base64Image;

        request.features = new List<Feature>();
        var feature = new Feature();
        feature.type = featureType.ToString();
        feature.maxResults = 10;
        request.features.Add(feature);

        requests.requests.Add(request);

        // Convert to JSON
        string jsonRequestBody = JsonUtility.ToJson(requests);

        // POST the body with the Content-Type header set to "application/json";
        // "using" disposes the request once the coroutine is done with it
        using (var webRequest = new UnityWebRequest(url + apiKey, "POST"))
        {
            byte[] postData = Encoding.UTF8.GetBytes(jsonRequestBody);
            webRequest.uploadHandler = new UploadHandlerRaw(postData);
            webRequest.downloadHandler = new DownloadHandlerBuffer();
            webRequest.SetRequestHeader("Content-Type", "application/json");

            yield return webRequest.SendWebRequest();

            // isHttpError also catches API errors such as 403 PERMISSION_DENIED
            if (webRequest.isNetworkError || webRequest.isHttpError)
            {
                // Error handling: log the status code and the error body
                Debug.LogError("Error " + webRequest.responseCode + ": " + webRequest.downloadHandler.text);
            }
            else
            {
                // Success: log the raw JSON
                Debug.Log(webRequest.downloadHandler.text);

                // JsonUtility tolerates whitespace, so parse the text as-is
                // (removing spaces would also merge multi-word labels)
                responseBody responses = JsonUtility.FromJson<responseBody>(webRequest.downloadHandler.text);

                // Guard against a missing labelAnnotations array (e.g. nothing detected)
                var labels = responses.responses[0].labelAnnotations;
                if (labels != null)
                {
                    foreach (var label in labels)
                    {
                        Debug.Log("\ndescription : " + label.description);
                    }
                }
            }
        }
    }

}

This script is the main script that performs the image recognition.
Right-click in the Hierarchy view and select Create Empty.
Name the new empty GameObject APIDirector and attach CloudVisionAPI.cs to it.

f:id:gunjyousky:20190225171403p:plain
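For reference, `JsonUtility.ToJson(requests)` in the script above produces a body shaped like `{"requests":[{"image":{"content":"<Base64>"},"features":[{"type":"LABEL_DETECTION","maxResults":10}]}]}`. The sketch below builds the same shape outside Unity with `System.Text.Json` (assumed: .NET Core 3.0 or later, not Unity 2018's scripting runtime), which can be handy for checking a payload by hand. Since `features` is a list, it also accepts several feature types at once. `VisionRequestBuilder` is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical helper: builds the same images:annotate request body as RequestVisionAPI, outside Unity.
public static class VisionRequestBuilder
{
    public static string BuildJson(byte[] imageBytes, int maxResults, params string[] featureTypes)
    {
        // The image travels as Base64 text inside the JSON body.
        string base64Image = Convert.ToBase64String(imageBytes);

        // Anonymous types keep the property names identical to the API's field names.
        var body = new
        {
            requests = new[]
            {
                new
                {
                    image = new { content = base64Image },
                    // One entry per requested feature type.
                    features = featureTypes.Select(t => new { type = t, maxResults }).ToArray()
                }
            }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

Keep in mind that each feature in the list is billed as a separate unit.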

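Setting the values of the CloudVisionAPI component

Set the values of the CloudVisionAPI.cs component as follows.

Parameter	Meaning	Value
Url	URL of the endpoint the request is sent to	https://vision.googleapis.com/v1/images:annotate?key=
Api Key	Paste the API key you issued	The key from "Issuing an API key" above
Feature Type	Selects the recognition method. For anything other than label detection, the response classes at the top of CloudVisionAPI.cs are likely incomplete, so add the ones you need	LABEL_DETECTION
Texture 2D	To run recognition on a texture, drag and drop it here	The image to recognize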

f:id:gunjyousky:20190228183518p:plain
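The Feature Type note comes down to this: each feature type fills a different field of the response, and the sample's `AnnotateImageResponse` only declares `labelAnnotations`. The lookup below lists the field to add for each value of the script's `FeatureType` enum, based on the `AnnotateImageResponse` fields in the Vision API v1 reference (double-check the reference for the nested types). `VisionResponseFields` is a hypothetical name.

```csharp
using System.Collections.Generic;

// Hypothetical lookup: which AnnotateImageResponse field each feature type populates.
public static class VisionResponseFields
{
    static readonly Dictionary<string, string> FieldByFeature = new Dictionary<string, string>
    {
        { "FACE_DETECTION", "faceAnnotations" },              // list of FaceAnnotation
        { "LANDMARK_DETECTION", "landmarkAnnotations" },      // list of EntityAnnotation
        { "LOGO_DETECTION", "logoAnnotations" },              // list of EntityAnnotation
        { "LABEL_DETECTION", "labelAnnotations" },            // list of EntityAnnotation
        { "TEXT_DETECTION", "textAnnotations" },              // list of EntityAnnotation (plus fullTextAnnotation)
        { "SAFE_SEARCH_DETECTION", "safeSearchAnnotation" },  // single SafeSearchAnnotation object
        { "IMAGE_PROPERTIES", "imagePropertiesAnnotation" },  // single ImageProperties object
    };

    // Returns null for TYPE_UNSPECIFIED or unknown values.
    public static string FieldFor(string featureType)
    {
        return FieldByFeature.TryGetValue(featureType, out string field) ? field : null;
    }
}
```

For instance, with TEXT_DETECTION you would add `public List<EntityAnnotation> textAnnotations;` to `AnnotateImageResponse` and loop over it the same way; the first entry's `description` typically holds the full detected text.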

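Settings for the image used as Texture 2D

The image formats usable as Texture 2D are jpg and png.
~~(I haven't tried other formats, sorry)~~
Note that the following two settings are required. Read/Write Enabled makes the texture's pixel data accessible from scripts, which EncodeToJPG() needs.

Parameter	Value
Texture Type	Default
Advanced > Read/Write Enabled	☑️ (checked)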

f:id:gunjyousky:20190225174338p:plain

Error hints

All that's left now is to run it!
The recognition results are printed to the Console, so let's run it right away!

Oops, an error message appeared.

{
  "error": {
    "code": 403,
    "message": "Cloud Vision API has not been used in project 813196382142 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/vision.googleapis.com/overview?project=813196382142 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.",
    "status": "PERMISSION_DENIED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.Help",
        "links": [
          {
            "description": "Google developers console API activation",
            "url": "https://console.developers.google.com/apis/api/vision.googleapis.com/overview?project=813196382142"
          }
        ]
      }
    ]
  }
}

UnityEngine.Debug:Log(Object)
<RequestVisionAPI>c__Iterator0:MoveNext() (at Assets/WebCamTextureCloudVision2.cs:191)
UnityEngine.SetupCoroutine:InvokeMoveNext(IEnumerator, IntPtr)

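The message says that the Cloud Vision API has not been enabled yet for the project the API key belongs to. Open the URL in the error message and enable the API.
Once it is enabled, run Unity again and it should work. As the message notes, the change may take a few minutes to take effect.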
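This 403 arrives as an ordinary HTTP response whose body is error JSON rather than `responses`, so it can help to recognize it explicitly. A minimal sketch (outside Unity, using `System.Text.Json`, .NET Core 3.0 or later assumed) that reads `code`, `status`, and `message` from the `error` object shown above; `VisionError` and `TryParse` are hypothetical names.

```csharp
using System.Text.Json;

// Hypothetical helper: reads the "error" object of a Vision API error response like the one above.
public sealed class VisionError
{
    public int Code { get; private set; }
    public string Status { get; private set; }
    public string Message { get; private set; }

    // Returns true and fills `error` when the JSON root is an object with an "error" property.
    public static bool TryParse(string json, out VisionError error)
    {
        error = null;
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object || !root.TryGetProperty("error", out JsonElement e))
                return false;

            // Missing fields fall back to 0 / null.
            int code = e.TryGetProperty("code", out JsonElement c) ? c.GetInt32() : 0;
            string status = e.TryGetProperty("status", out JsonElement s) ? s.GetString() : null;
            string message = e.TryGetProperty("message", out JsonElement m) ? m.GetString() : null;

            error = new VisionError { Code = code, Status = status, Message = message };
            return true;
        }
    }
}
```

A status of `PERMISSION_DENIED` then points you straight at the "enable the API" step above.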

Results

f:id:gunjyousky:20190225175928j:plain

Now that the environment is finally ready, let's have Unity recognize the image above.
Here are the results:

f:id:gunjyousky:20190225180222p:plain

Freezing, Water, Hand, Ice, Winter

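Judging from the results, the program is working as expected.
The entry at the top of the Console is the raw recognition data.
The lines after it show only the extracted labels.

It's a little disappointing that no label for the frost columns (needle ice) came up, but this is enough to see what the API can do.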
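The label list above is just the `description` of each entry in `labelAnnotations`. If you only want confident labels, a small filter on `score` does the job. This sketch parses the raw response with `System.Text.Json` (.NET Core 3.0 or later assumed, outside Unity); the `minScore` cutoff and the `LabelExtractor` name are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: returns label descriptions whose score is at least minScore.
public static class LabelExtractor
{
    public static List<string> Extract(string responseJson, double minScore = 0.0)
    {
        var labels = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(responseJson))
        {
            JsonElement root = doc.RootElement;

            // One entry in "responses" per image sent; this article sends exactly one.
            if (root.ValueKind != JsonValueKind.Object
                || !root.TryGetProperty("responses", out JsonElement responses)
                || responses.ValueKind != JsonValueKind.Array
                || responses.GetArrayLength() == 0)
                return labels;

            // labelAnnotations is absent when no label was returned.
            if (!responses[0].TryGetProperty("labelAnnotations", out JsonElement annotations))
                return labels;

            foreach (JsonElement label in annotations.EnumerateArray())
            {
                double score = label.TryGetProperty("score", out JsonElement s) ? s.GetDouble() : 0.0;
                if (score >= minScore && label.TryGetProperty("description", out JsonElement d))
                    labels.Add(d.GetString());
            }
        }
        return labels;
    }
}
```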

Recognizing webcam video in Unity

Finally, let's run image recognition on webcam video.

Displaying webcam video on a Plane

As preparation, let's display the webcam video on a Plane placed in the Unity scene.
For details, see the article below.

linemarker.hatenablog.com

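Passing the webcam image

Were you able to display the webcam video?
Next, we'll make it possible to pass the webcam image to CloudVisionAPI.cs.
Add a function to WebCamController.cs that hands over the current webcam image.
A Texture2D is the most convenient type to pass, so the function converts the raw WebCamTexture data into a Texture2D before returning it.

Here is the completed source code: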

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class WebCamController : MonoBehaviour {
    public int width = 640;
    public int height = 480;
    public int fps = 30;
    WebCamTexture webCamTexture;
    Texture2D texture2D;


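    // Copy the current webcam frame into a Texture2D and return it (null if no frame is available)
    public Texture2D GetWebCamTexture2D()
    {
        if (webCamTexture == null)
        {
            Debug.Log("webCam Error: no camera");
            return null;
        }

        Color[] pixels = webCamTexture.GetPixels();
        if (pixels.Length == 0)
        {
            Debug.Log("webCam Error: no pixels");
            return null;
        }

        // Recreate the Texture2D when its size differs from the camera's (the real size is known only after Play())
        if (texture2D == null || webCamTexture.width != texture2D.width || webCamTexture.height != texture2D.height)
        {
            texture2D = new Texture2D(webCamTexture.width, webCamTexture.height, TextureFormat.RGBA32, false);
        }
        texture2D.SetPixels(pixels);
        // Upload the new pixels to the GPU copy as well, in case the texture is also displayed
        texture2D.Apply();
        return texture2D;
    }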

    // Use this for initialization
    void Awake()
    {
        WebCamDevice[] devices = WebCamTexture.devices;
        for (var i = 0; i < devices.Length; i++)
        {
            Debug.Log(devices[i].name);
        }
        if (devices.Length > 0)
        {
            webCamTexture = new WebCamTexture(devices[0].name, this.width, this.height, this.fps);
            GetComponent<Renderer>().material.mainTexture = webCamTexture;
            texture2D = new Texture2D(webCamTexture.width, webCamTexture.height, TextureFormat.RGBA32, false);
            webCamTexture.Play();
        }
    }

}

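The GetWebCamTexture2D function is the core of the hand-off.
Also note that the function that starts the webcam has been moved from Start to Awake. Unity calls Awake on every object in the scene before any Start, so the camera is set up before CloudVisionAPI starts requesting images.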

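Receiving the webcam image

With the hand-off in place, let's write the receiving side.
All it takes is assigning the result of GetWebCamTexture2D() to the Texture2D that CloudVisionAPI.cs sends for recognition. We'll also change the script so that the webcam image is refreshed at a set interval.

Here is the finished source code: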

using UnityEngine;
using System.Collections;
using System.Collections.Generic;
using UnityEngine.Networking;
using System;
using System.Text;

public class CloudVisionAPI : MonoBehaviour
{
    public string url = "https://vision.googleapis.com/v1/images:annotate?key=";
    public string apiKey = "";
    public FeatureType featureType;
    public Texture2D texture2D;
    public WebCamController webCamController;
    public float captureIntervalSeconds = 5.0f;


    [System.Serializable]
    public class requestBody
    {
        public List<AnnotateImageRequest> requests;
    }

    [System.Serializable]
    public class AnnotateImageRequest
    {
        public Image image;
        public List<Feature> features;
        //public string imageContext;
    }

    [System.Serializable]
    public class Image
    {
        public string content;
        //public ImageSource source;
    }

    [System.Serializable]
    public class ImageSource
    {
        public string gcsImageUri;
    }

    [System.Serializable]
    public class Feature
    {
        public string type;
        public int maxResults;
    }

    public enum FeatureType
    {
        TYPE_UNSPECIFIED,
        FACE_DETECTION,
        LANDMARK_DETECTION,
        LOGO_DETECTION,
        LABEL_DETECTION,
        TEXT_DETECTION,
        SAFE_SEARCH_DETECTION,
        IMAGE_PROPERTIES
    }

    [System.Serializable]
    public class ImageContext
    {
        public LatLongRect latLongRect;
        public List<string> languageHints;  // the API expects a list of language codes
    }

    [System.Serializable]
    public class LatLongRect
    {
        public LatLng minLatLng;
        public LatLng maxLatLng;
    }

    [System.Serializable]
    public class LatLng
    {
        public float latitude;
        public float longitude;
    }

    [System.Serializable]
    public class responseBody
    {
        public List<AnnotateImageResponse> responses;
    }

    [System.Serializable]
    public class AnnotateImageResponse
    {
        public List<EntityAnnotation> labelAnnotations;
    }

    [System.Serializable]
    public class EntityAnnotation
    {
        public string mid;
        public string locale;
        public string description;
        public float score;
        public float confidence;
        public float topicality;
        public BoundingPoly boundingPoly;
        public List<LocationInfo> locations;
        public List<Property> properties;
    }

    [System.Serializable]
    public class BoundingPoly
    {
        public List<Vertex> vertices;
    }

    [System.Serializable]
    public class Vertex
    {
        public float x;
        public float y;
    }

    [System.Serializable]
    public class LocationInfo
    {
        public LatLng latLng;  // JsonUtility only fills public (or [SerializeField]) fields
    }

    [System.Serializable]
    public class Property
    {
        public string name;
        public string value;
    }

    // Use this for initialization
    void Start()
    {
        if (string.IsNullOrEmpty(apiKey))
        {
            Debug.LogError("No API key.");
            return;
        }

        // Send images and receive the analysis results
        StartCoroutine(RequestVisionAPI());
    }

    /*-----------------------------------------------------------*
     * ◆ Connect to Google Cloud Vision, send the image, and receive the JSON response
     *   (repeats every captureIntervalSeconds while webCamController is set)
     *-----------------------------------------------------------*/
    private IEnumerator RequestVisionAPI()
    {
        do
        {
            // Stop the coroutine if there is no API key
            if (string.IsNullOrEmpty(apiKey)) yield break;

            if (webCamController != null)
            {
                // Wait before each capture so requests go out at a fixed interval
                yield return new WaitForSeconds(captureIntervalSeconds);
                texture2D = webCamController.GetWebCamTexture2D();
            }

            // Skip this round if there is no image yet; without a webcam,
            // the loop condition is false, so the coroutine ends here
            if (texture2D == null)
            {
                Debug.LogWarning("No image to send.");
                continue;
            }

            // Encode the image as JPEG, then Base64, because the JSON body carries it as text
            byte[] jpg = texture2D.EncodeToJPG();
            string base64Image = System.Convert.ToBase64String(jpg);

            // Build the requestBody
            var requests = new requestBody();
            requests.requests = new List<AnnotateImageRequest>();

            var request = new AnnotateImageRequest();
            request.image = new Image();
            request.image.content = base64Image;

            request.features = new List<Feature>();
            var feature = new Feature();
            feature.type = featureType.ToString();
            feature.maxResults = 10;
            request.features.Add(feature);

            requests.requests.Add(request);

            // Convert to JSON
            string jsonRequestBody = JsonUtility.ToJson(requests);

            // POST the body with the Content-Type header set to "application/json"
            using (var webRequest = new UnityWebRequest(url + apiKey, "POST"))
            {
                byte[] postData = Encoding.UTF8.GetBytes(jsonRequestBody);
                webRequest.uploadHandler = new UploadHandlerRaw(postData);
                webRequest.downloadHandler = new DownloadHandlerBuffer();
                webRequest.SetRequestHeader("Content-Type", "application/json");

                yield return webRequest.SendWebRequest();

                // isHttpError also catches API errors such as 403 PERMISSION_DENIED
                if (webRequest.isNetworkError || webRequest.isHttpError)
                {
                    // Error handling: log the status code and the error body
                    Debug.LogError("Error " + webRequest.responseCode + ": " + webRequest.downloadHandler.text);
                }
                else
                {
                    // Success: log the raw JSON
                    Debug.Log(webRequest.downloadHandler.text);

                    // JsonUtility tolerates whitespace, so parse the text as-is
                    // (removing spaces would also merge multi-word labels)
                    responseBody responses = JsonUtility.FromJson<responseBody>(webRequest.downloadHandler.text);

                    // Guard against a missing labelAnnotations array (e.g. nothing detected)
                    var labels = responses.responses[0].labelAnnotations;
                    if (labels != null)
                    {
                        foreach (var label in labels)
                        {
                            Debug.Log("\ndescription : " + label.description);
                        }
                    }
                }
            }
        } while (webCamController != null);
    }

}

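Back in Unity's Inspector view, set the two new variables on the CloudVisionAPI.cs component.
For Web Cam Controller, drag and drop the Plane object that has WebCamController.cs attached.
Capture Interval Seconds is the interval at which the webcam image is refreshed; set it to 5 seconds for now.
Note: if both Texture 2D and Web Cam Controller are set, the program uses the webcam.

Parameter	Meaning	Value
Web Cam Controller	To recognize the webcam image, drag and drop the object that has WebCamController.cs	The Plane with WebCamController.cs
Capture Interval Seconds	Interval in seconds between webcam recognitions	5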
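Because each capture sends one request with one feature, the interval directly controls how fast the free units are used up. A quick sketch of that arithmetic, assuming one unit per request and the 1,000 free units quoted in the pricing section; the helper names are hypothetical. The real period is a little longer than the interval, since the coroutine also waits for each response, so these figures are on the pessimistic side.

```csharp
using System;

// Hypothetical helper: how the capture interval translates into billable units.
public static class CaptureBudget
{
    // Requests (= units, assuming one feature per request) sent per hour at a given interval.
    public static double UnitsPerHour(double captureIntervalSeconds)
    {
        if (captureIntervalSeconds <= 0)
            throw new ArgumentOutOfRangeException(nameof(captureIntervalSeconds));
        return 3600.0 / captureIntervalSeconds;
    }

    // Minutes of continuous capture before the free units are used up.
    public static double MinutesUntilFreeUnitsUsed(double captureIntervalSeconds, int freeUnits = 1000)
    {
        return freeUnits / UnitsPerHour(captureIntervalSeconds) * 60.0;
    }
}
```

At the suggested 5 seconds, that is 720 units per hour, so the free units last roughly 83 minutes of continuous capture.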