example_5.md 30 KB

使用过程提示

Tips:

  • 您可点击右下角“下一步”,查看后续的建模讲解和源代码。
  • 本教程文档是 .md 格式,此讲解窗口采用阿里云 CloudShell 的teachme指令打开,如命令行输入 teachme doc/fileName.md 命令打开。
  • CloudShell 默认模式是登录失效后线上机器内信息删除,建议您在本地机器上编辑和保存自己的代码,然后通过 CloudShell 左上角上传文件的方式上传,也可进行下载。CloudShell 窗口关闭后原登录信息会清除,建议同时开两个窗口,防止误操作关闭;同时不建议离开窗口太久导致登录失效。
  • 如果想体验运行MindOpt求解器代码,请①在天池-决策优化MindOpt网址申请免费的授权Token,并替换掉示例代码里的Token;②并安装MindOpt优化求解器的SDK:安装指导。如已安装,可跳过。

^___^ 还没体验过安装和运行求解器的同学,可以直接点击链接:一键安装MindOpt求解器,并一键用令行求解.mps或.lp文件~~ ​

案例问题背景

回归是一种预测技术,目的是建立 自变量x(向量)和 相关变量y(标量)之间的关系。比如,学生的行为习惯和测试成绩之间的关系。线性回归使用线性函数 y = <a,x> + b 来描述这种关系(更准确地说法是仿射函数)。向量a表示“梯度”,标量b表示“截距”。通过 m 次观测 (x_1, y_1) ... (x_m, y_m) ,我们可以估计出 a 和 b 的值。

普通的最小二乘回归(least-squares regression)对“离群观测”很敏感。仅一个离群观测就会影响对 a 和 b 的估计。而鲁棒回归则不受少数离群观测的影响。

假设“离群观测”的数量相对 m 而言很少,其他正常观测又很准确。则我们应该使用鲁棒线性回归:

去恢复真实的 a 和 b。相比最小二乘回归,鲁棒回归可以更好地抵御(但不也能完全消除)离群观测的影响。

通过变换和引入变量 c,我们可以把这个问题转化为线性规划: ​

为了编程方便,我们将变量放在不等式一侧,非变量参数放在另一侧: ​

等价的矩阵形式

x_1, ..., x_m 作为行放入矩阵 X,将 y_1, ..., y_m 作为元素放在列向量 y 中。令 1 为元素为1的向量。原始鲁棒线性回归问题可以简写为 ​

. ​

通过变换,我们可以把这个问题转化为线性规划:

产生数据

我们通过生成随机数据,验证鲁棒线性回归的有效性。现在随机产生真正的 a 和 b :

再使用它们产生 0.9*m 次正确观测:

外加 0.1*m 次离群观测: ​

因为观测的次序对鲁棒线性回归没有任何影响,所以不失一般性,我们将离群观测放在最后。

Tips:案例内容可能参考自互联网,如有侵权,请联系我们删除。

代码演示-Python示范

本代码示例里面需要numpy,请运行前先安装numpy

pip install numpy==1.19.3

创建py文件和运行

创建一个.py文件如下(文件夹已创建可删除前几句)。(除示例nano指令外,编辑文件可使用vi、vim、nano、emacs等不同编辑器指令,或touch创建文件,也可采用CloudShell右上角的“笔”的编辑图标来选择对应的文档进行新增和修改。)

mkdir example5
cd example5
nano mindopt_example5.py

然后复制后文的代码拷入。根据提示来修改token和保存文件,如ctrl+o保存,然后看到对应名字OK后摁enter即可,如ctrl+x退出,如有修改保存按Y,再看到对应名字OK后摁enter即可退出。

也可采用CloudShell右上角的“笔”的编辑图标来选择对应的文档进行修改。

运行程序可输入以下指令:

python mindopt_example5.py

注:问题提交远端计算Server后,会需要求解时间,请耐心等待。由于计算资源限制,同一Token同时间只能求解一个问题,请勿同时段提交太多次。如果代码运行失败,可能是mindopt环境变量链接失败,可重新走一遍安装步骤(安装指导)。

python 代码

Tips:  如果想体验运行MindOpt求解器代码,请先在天池-决策优化MindOpt-免费使用上申请授权Token,并替换掉示例代码里的Token。

请修改代码里面的 line 12的Token为你自己的Token。

# mindopt_example5.py

import numpy as np
import random
import time

from mindoptpy import *

if __name__ == "__main__":

    # set constants
    token = "xxxxxxxYOUR TOKENxxxxxxxxxxx"
    server_ip = "mindopt.tianchi.aliyun.com"
    description = "tianchi example5 RobustRegression"
    temporary_path = "."   # folder to hold serialized data

    #--- MindOpt Step 1. Initialize an empty optimization model ---
    model = MdoModel()

    model.set_str_param("Remote/Server", server_ip)
    model.set_str_param("Remote/Token", token)
    model.set_str_param("Remote/Desc", description)
    model.set_str_param("Remote/File/Path", temporary_path)

    # --- Create simulation data ---
    # use time to seed the random number generator
    random.seed(time.time())

    # randomly generate vector a and scalar b
    d = 4    # dimension
    true_a = np.random.randn(d)*np.sqrt([d])
    true_b = np.random.randn(1)[0]

    # randomly generate observations (x[i],y[i]), i = 1,...,m with 10% outliers
    m = 15
    X = np.zeros((m,d))
    y = np.zeros((m,1))
    for i in range(m):
        X[i,] = np.random.randn(d)
        if i < np.floor(0.9*m):
            y[i] = np.inner(true_a,X[i,]) + true_b
        else:
            y[i] = np.random.randn(1)

    try:
        #--- MindOpt Step 2. Set model inputs ---
        # Change to minimization problem.
        model.set_int_attr("MinSense", 1)

        # add variables, specify their low bounds, upper bounds, and cost coefficients
        # set objective: minimize_{a,b,c} 0'a + 0b + 1'c
        INF = MdoModel.get_infinity()

        var_a = []
        for j in range(d):
            var_a.append(model.add_var(-INF, INF, 0.0)) # low bnd, upr bnd, 0 cost coeff

        var_b = model.add_var(-INF, INF, 0) # low bnd, upr bnd, 0 cost coeff

        var_c = []
        for i in range(m):
            var_c.append(model.add_var(-INF, INF, 1.0)) # low bnd, upr bnd, cost coeff = 1.0

        # Add constraints row by row
        for i in range(m):
            # compute a'x[i,]
            expr = MdoExprLinear()
            for j in range(d):
                expr += X[i,j]*var_a[j]

            # add: y[i] <= a'x[i,] + b + c[i]
            model.add_cons(y[i], INF, expr + var_b + var_c[i])

            # add: a'x[i,] + b - c[i] <= y[i]
            model.add_cons(-INF, y[i], expr + var_b - var_c[i])

        #--- MindOpt Step 3. Set solver parameters ---
        model.set_int_param("Method", 2)  # as an example, set to run the interior-point method
        model.set_int_param("NumThreads", 2)  # as an example, set to run 2 threads

        #--- MindOpt Step 4. Upload serialize model and parameter to server for optimization ---
        job_id = model.submit_task()
        if job_id == "":
            print("ERROR: Empty job ID.")
            raise MdoError(-1)
        else:
            print("Job was submitted to server successfully.")
            status = 'Submitted'

        #--- MindOpt Step 5. Wait and try to retrieve job ---
        while status == 'Submitted' or status == 'Solving':
            status, model_status, result, has_soln = model.retrieve_task(job_id)
            time.sleep(2)   # Sleep for 2 seconds between inquiries

        #--- MindOpt Step 7. Display results ---
        model_status_details = model.explain_status(model_status)
        result_details = model.explain_result(result)
        print(" - Job status : {}".format(status))
        print(" - Model status          : {0} ({1})".format(model_status_details, model_status))
        print(" - Optimization status   : {0} ({1})".format(result_details, result))
        print(" - Solution availability : {0}\n".format("available" if has_soln else "not available"))

        # Compare var_a to true_a, and var_b to true_b; Display residuals
        if has_soln:
            model.display_results()
            print(" - Primal objective value : {}".format(model.get_real_attr("PrimalObjVal")))
            print(" - Dual objective value : {}".format(model.get_real_attr("DualObjVal")))
            print(" - Solution time : {} sec.".format(model.get_real_attr("SolutionTime")))

            # Display solutions a[0],...,a[d-1]
            print('\n{0:<5} {1:>9} {2:>9}'.format('Entry','True','Soln'))
            for j in range(d):
                print('{0:>5} {1:>9f} {2:>9f}'.format('a['+'%s'%j+']', true_a[j],
                var_a[j].get_real_attr('PrimalSoln')))

            # Display solution b
            print('\n{0:<5} {1:>9} {2:>9}'.format(' ','True','Soln'))
            print('{0:>5} {1:>9f} {2:>9f}'.format('b', true_b, var_b.get_real_attr('PrimalSoln')))

            # Display solutions c[0],...,c[m]
            print('\n{0:>11} {1:>9}'.format('Observation','Residual'))
            for i in range(m):
                print('{0:>11d} {1:>9f}'.format(i, var_c[i].get_real_attr('PrimalSoln')))

    except MdoError as e:
        print("Received Mindopt exception.")
        print(" - Code : {}".format(e.code))
        print(" - Reason : {}".format(e.message))
    except Exception as e:
        print("Received exception.")
        print(" - Reason : {}".format(e))
    finally:
        # MindOpt Step 5. Free the model.
        model.free_mdl()

C++ MakeFile 文件

step 1:建立一个makefile文件

如下,类似案例2中的C++的编译文件。除示例nano指令外,编辑文件可使用vi、vim、nano、emacs等不同编辑器指令,或touch创建文件,也可采用CloudShell右上角的“笔”的编辑图标来选择对应的文档进行新增和修改。 ​

mkdir example5
cd example5
mkdir tmp
nano Makefile_5

然后复制下面的makefile的代码(注意根据软件版本或安装地址来调整):

注意:mindoptcpp 动态库将在运行软件包中的示例时自动生成。换言之,用户在使用 C++ API 之前必须至少运行一次例子软件包中的例子,因此下面的makefile代码中有mindoptcpp的编译部分。

.RECIPEPREFIX = >
INCPATHS=-I$${HOME}/mindopt/linux64-x86/include -I.
LIBPATHS=-L$${HOME}/mindopt/linux64-x86/lib
MINDOPTLIB=-lmindopt -lmindoptcpp

CCOPT=-std=c++11
LDOPT=-Wl,-rpath-link,$${HOME}/mindopt/linux64-x86/lib -Wl,-rpath,'$$ORIGIN/../../linux64-x86/lib' -pthread -lm -lstdc++ -Wl,--no-as-needed -ldl
CC=g++ -m64
LD=g++ -m64

MINDOPTCPP=mindoptcpp
EXAMPLES=example_5

mindoptcpp:
>make install -C $${HOME}/mindopt/linux64-x86/src

example_5: example_5.cpp
>$(CC) -c $(INCPATHS) $(CCOPT) -o example_5.o example_5.cpp
>$(LD) $(LIBPATHS) example_5.o  $(MINDOPTLIB) $(LDOPT) -o example_5


ALL-TESTS := @echo "Started testing MindOpt package."

#########################################################################
# Clean up and testing                                                  #
#########################################################################
.PHONY: all clean cleanall test

all: $(EXAMPLES)

clean:
>rm -f *.o $(EXAMPLES)

mindoptcpp: $(MINDOPTCPP)

test: $(EXAMPLES)
>@echo "###################################"
>@echo "# Started testing MindOpt package.#"
>@echo "###################################"
>@echo ""
>@echo "###################################"
>@echo "# TEST: MdoMps                    #"
>@echo "###################################"
>./example_5

根据提示来修改和保存文件,如ctrl+o保存,然后看到对应名字OK后摁enter即可,如ctrl+x退出,如有修改保存按Y,再看到对应名字OK后摁enter即可退出。

也可采用CloudShell右上角的“笔”的编辑图标来选择对应的文档进行修改。

step 2: 创建一个cpp文件

此makefile编译的文件名默认为 example_5.cpp。请先将C++源码保存为example_5.cpp,再将makefile保存在同一目录下(该目录中应有example_5.cpp和makefile两个文件)。

可以通过以下指令新建文件,然后复制对应的代码保存。(除示例nano指令外,编辑文件可使用vi、vim、nano、emacs等不同编辑器指令,或touch创建文件,也可采用CloudShell右上角的“笔”的编辑图标来选择对应的文档进行新增和修改。)

nano example_5.cpp

复制下一页中的C++代码粘贴。然后根据提示来修改和保存文件,如ctrl+o保存,然后看到对应名字OK后摁enter即可,如ctrl+x退出,如有修改保存按Y,再看到对应名字OK后摁enter即可退出。

也可采用CloudShell右上角的“笔”的编辑图标来选择对应的文档进行修改。

此时得到的文件如下示意:​

 ls

Makefile_5  example_5.cpp tmp

step 3:编译和运行

make mindoptcpp的编译代码方法(编译一次即可,以后可跳过):

make -f Makefile_5 mindoptcpp

make 编译示例代码和运行的方法:

make -f Makefile_5 test

注:问题提交远端计算Server后,会需要求解时间,请耐心等待。由于计算资源限制,同一Token同时间只能求解一个问题,请勿同时段提交太多次。如果代码运行失败,可能是mindopt环境变量链接失败,可重新走一遍安装步骤(安装指导)。

代码演示-C++ 示范

Tips:  如果想体验运行MindOpt求解器代码,请先在天池-决策优化MindOpt-免费使用上申请授权Token,并替换掉示例代码里的Token。

make编译和运行此文件的方法见前页。

// mindopt_example5.cpp

#include <stdio.h>
#if defined(__APPLE__) || defined(__linux__)
#   include <unistd.h>
#   define SLEEP_10_SEC sleep(10)
#else
#   include <windows.h>
#   define SLEEP_10_SEC Sleep(10000)
#endif

#include <functional>


#include <iostream>
#include <iomanip>
#include <vector>
#include <random>
#include <numeric>
#include <math.h>
#include "MindoptCpp.h"

using namespace mindopt;
using std::cout;
using std::endl;
using std::setw;
using std::setprecision;
using std::vector;

// Main function
int main()
{

    /* Remote computing related information. */
    const char * filepath = "./tmp/";
    const char * token = "xxxxxxxxxtokenxxxxxxxx";
    const char * server = "mindopt.tianchi.aliyun.com";
    const char * desc = "tianchi example5 RobustRegression cpp";

    /* Variables. */
    MdoStatus    model_status = MDO_UNKNOWN;    
    std::string  model_status_details;    
    MdoResult    result = MDO_OKAY;    
    std::string  result_details;    
    MdoBool      has_soln = MDO_NO;        
    std::string  job_id;
    std::string  status;    

    //   generate data
    /* initialize RNG seed by time and standard Normal distribution */
    std::default_random_engine generator{static_cast<unsigned int>(time(0))};
    std::normal_distribution<double> randn(0.0, 1.0);

    /* randomly generate vector a */    
    int d = 4;  // dimension of a
    int j;      // iterator
    vector<double> true_a(d, 0.0);
    for (j = 0; j < d; j++)
        true_a[j] = randn(generator)*sqrt(d);

    /* randomly scalar b */    
    double true_b = randn(generator);

    /* randomly generate observations */
    int m = 15; // number of observations
    int i;      // iterator
    vector<vector<double>> X(m, vector<double>(d, 0.0));
    vector<double> y(m, 0.0);    
    for(i = 0; i < m; i++) {
        vector<double> row_x(d, 0.0);
         for(j = 0; j < d; j++)
            row_x[j] = randn(generator);
        X[i] = row_x;

        /* the last 10% of observations are outliers */
        if (i+1 <= floor(0.9*double(m)))
            y[i] = std::inner_product(true_a.begin(), true_a.end(), row_x.begin(), true_b);
        else
            y[i] = 100*randn(generator);
    }

    /* MindOpt Step 1. Initialize an optimization model */
    MdoModel model;
    try {
        /* MindOpt Step 2. Set model inputs */

        /* Set "minimize" */
        model.setIntAttr("MinSense", MDO_YES);

        /* Add variables a, b, and c*/
        vector<MdoVar> var_a;
        var_a.reserve(d);
        for (j = 0; j < d; j++)  
            /* a[j] is free, has 0 objective coefficient */
            var_a.push_back(model.addVar(-MDO_INFINITY, MDO_INFINITY, 0.0));

        /* b is free, has 0 objective coefficient */
        MdoVar var_b = model.addVar(-MDO_INFINITY, MDO_INFINITY, 0.0);

        vector<MdoVar> var_c;
        var_c.reserve(m);
        for (i = 0; i < m; i++)
            /* c[i] is free, has an objective coefficient = 1.0 */
            var_c.push_back(model.addVar(-MDO_INFINITY, MDO_INFINITY, 1.0));

        /* Add constraints */
        for (i = 0; i < m; i++) {
            /* build inner-product a'x[i,] */
            MdoExprLinear expr;
            for (j = 0; j < d; j++)
                expr += X[i][j]*var_a[j];            
            /* add constriant: y[i] <= a'x[i,] + b + c[i] */
            model.addCons(y[i],  MDO_INFINITY, expr + var_b + var_c[i]);
            /* add constraint: a'x[i,] + b - c[i] <= y[i] */
            model.addCons(-MDO_INFINITY, y[i], expr + var_b - var_c[i]);
        }

// -------------------st



        /*------------------------------------------------------------------*/
        /* Step 3. Input parameters related to the remote computing server. */
        /*------------------------------------------------------------------*/
        /* Input parameters related to remote computing. */
        model.setStrParam("Remote/Token", token);
        model.setStrParam("Remote/Desc", desc);
        model.setStrParam("Remote/Server", server); 
        model.setStrParam("Remote/File/Path", filepath);

        /*------------------------------------------------------------------*/
        /* Step 4. Upload the serialized model and parameters to server, and*/
        /*         then optimize the model.                                 */
        /*------------------------------------------------------------------*/
        job_id = model.submitTask();
        if (job_id.empty())
        {
            std::cerr << "ERROR: Invalid job ID: " << job_id << std::endl;
            return static_cast<int>(MDO_ERROR);
        }
        else
        {
            std::cout << "Job was submitted to server successfully." << std::endl;
            std::cout << "User may query the optimization result with the following job ID: " << job_id << std::endl;
        }

        /*------------------------------------------------------------------*/
        /* Step 5. Check the solution status periodically, and              */
        /*         download the its upon availability.                      */
        /*------------------------------------------------------------------*/
        do 
        {        
            status = model.retrieveTask(job_id, model_status, result, has_soln);

            /* Sleep for 10 seconds. */
            SLEEP_10_SEC;
        }
        while (status[0] == 'S'); /* Continue the loop if the status is in either Submitted status or Solving status. */

        /*------------------------------------------------------------------*/
        /* Step 6. Retrieve the solution and then populate the result.      */
        /*------------------------------------------------------------------*/  
        std::cout << std::endl << "Populating optimization results." << std::endl;   
        model_status_details = model.explainStatus(model_status);
        result_details = model.explainResult(result);

        std::cout << " - Job status             : " << status << std::endl;
        std::cout << " - Model status           : " << model_status_details << " (" << model_status << ")" << std::endl;
        std::cout << " - Optimization status    : " << result_details << " (" << result << ")" << std::endl;
        std::cout << " - Solution availability  : " << std::string(has_soln ? "available" : "not available") << std::endl;


        /* Initialize vector to store solutions */
        vector<double> sol_a(d, 0.0);
        double sol_b;
        vector<double> sol_c(m, 0.0);

        if (has_soln)
        {
            std::cout << "Populating solution." << std::endl;    
            model.displayResults();

            switch (model.getStatus())
            {
            case MDO_UNKNOWN:
                std::cout << "Optimizer terminated with an UNKNOWN status." << std::endl;
                break;
            case MDO_OPTIMAL:
                cout << "Optimizer terminated with an OPTIMAL status." << endl << endl;
                /* Display solutions */
                cout << std::setiosflags(std::ios::right);        
                cout << setw(5) << "Entry"  << "  " \
                    << setw(9) << "Ture a" << "  " \
                    << setw(9) << "Soln a" << endl;
                for (j = 0; j < d; j++) {
                    /* retrieve the value of a[j] */
                    sol_a[j] = var_a[j].getRealAttr("PrimalSoln");
                    cout << setw(5) << j << "  " \
                        << setw(9) << setprecision(5) << true_a[j] << "  " \
                        << setw(9) << setprecision(5) << sol_a[j] << endl;
                }
                cout << endl << setw(9) << "Ture b" << "  " << setw(9) << "Sol b" << endl;
                /* retrieve the value of b */
                sol_b = var_b.getRealAttr("PrimalSoln");
                cout << setw(9) << setprecision(5) << true_b << "  " \
                    << setw(9) << setprecision(5) << sol_b << endl << endl;                              
                cout << "Observation" << "  " << setw(9) << "Residual" << endl;
                for (i = 0; i < m; i++) {
                    /* retrieve the value of c[i] */
                    sol_c[i] = var_c[i].getRealAttr("PrimalSoln");
                    cout << setw(11) << i << "  " \
                        << setw(9) << setprecision(5) << sol_c[i] << endl;
                }
                break;
            case MDO_INFEASIBLE:
                std::cout << "Optimizer terminated with an INFEASIBLE status." << std::endl;
                break;
            case MDO_UNBOUNDED:
                std::cout << "Optimizer terminated with an UNBOUNDED status." << std::endl;
                break;
            case MDO_INF_OR_UBD:
                std::cout << "Optimizer terminated with an INFEASIBLE or UNBOUNDED status." << std::endl;
                break;
            }
        }

    }
    catch (MdoException & e) {
        std::cerr << "===================================" << endl;
        std::cerr << "Error   : code <" << e.getResult() << ">" << endl;
        std::cerr << "Reason  : " << model.explainResult(e.getResult()) << endl;
        std::cerr << "===================================" << endl;
        return static_cast<int>(e.getResult());
    }
    return 0;
}

运行结果

以下为C++语言范例的运行结果.此范例中,我们采用了单纯型法(simplex method)求解问题。

MindOpt Version 0.12.0 (Built: 20201231)
Copyright (c) 2020 Alibaba Cloud.

Returned job ID: 92.
Job was submitted to server successfully.
User may query the optimization result with the following job ID: 92
Reader started. File  : ./tmp/m92.mdo
Reader terminated. Time : 0.000s
Reader started. File  : ./tmp/p92.mdo
Reader terminated. Time : 0.000s
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     7  100     7    0     0    875      0 --:--:-- --:--:-- --:--:--   875
Reader started. File  : ./tmp/m92.mdo
Reader terminated. Time : 0.000s
Reader started. File  : ./tmp/p92.mdo
Reader terminated. Time : 0.000s
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     7  100     7    0     0   1166      0 --:--:-- --:--:-- --:--:--  1166
100   169  100   169    0     0  33800      0 --:--:-- --:--:-- --:--:-- 33800
100  1191  100  1191    0     0   193k      0 --:--:-- --:--:-- --:--:--  193k
Reader started. File  : ./tmp/s92.mdo
Reader terminated. Time : 0.000s

Populating optimization results.
 - Job status             : Finished
 - Model status           : OPTIMAL (1)
 - Optimization status    : Nothing wrong. (0)
 - Solution availability  : available
Populating solution.
Optimizer summary.
 - Optimizer used     : Simplex method
 - Optimizer status   : OPTIMAL
 - Total time         : 0.010s

Solution summary.       Primal solution               Dual solution
 - Objective          : 1.7240525156e+02              1.7240525156e+02
 - Norm               : 1.54e+02                      1.00e+00
 - Var. viol.         : 1.11e-15                      0.00e+00
 - Cons. viol.        : 8.88e-16                      8.88e-16
Optimizer terminated with an OPTIMAL status.

Entry     Ture a     Soln a
    0   -0.55396   -0.55396
    1   -0.79948   -0.79948
    2     1.2829     1.2829
    3   -0.72038   -0.72038

   Ture b      Sol b
 0.066423   0.066423

Observation   Residual
          0          0
          1          0
          2          0
          3          0
          4          0
          5          0
          6          0
          7          0
          8          0
          9          0
         10          0
         11          0
         12          0
         13     154.28
         14     18.123

同时,通过 model.write_soln("./model_soln.sol") 语句(以Python为例),文件夹中将生成 model_soln.sol文件,里面标了变量的求解值、和目标值。 image.png

注:由于在程序中的 a 和 b 是随机产生的,因此每一次运行之后得到的计算结果并不一样。