Lists: | pljava-dev |
---|
From: | Erik Mata <erikmata(at)gmail(dot)com> |
---|---|
To: | pljava-dev(at)lists(dot)postgresql(dot)org |
Subject: | Fail to create distributed table in Citus |
Date: | 2020-11-19 15:23:26 |
Message-ID: | CAB0C2fn42hT7RQGK+MwTG-rnxj6eKEhnBr3vyjvZsVOu0gv7qQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pljava-dev |
I have set up a local docker environment with a total of 4 Postgres server
instances, all including Citus (9.5.0) and PL/JAVA (1.6.1) extensions:
- 1 as Citus master
- 3 as Citus workers
I have set up a simple routine in PL/JAVA which creates a table and
requests that the table be distributed in the cluster, as:
CREATE TABLE test1 (a BIGINT PRIMARY KEY);
SELECT create_distributed_table('test1', 'a', colocate_with=>'none');
I connect to the Citus master server via a JDBC-connection to localhost.
I then call the PL/JAVA routine, as:
SELECT test_create_distributed_table();
This results in a local table being created on the Citus master node, but
no tables are created on the workers and the Citus utility functions tell
me that the table is NOT distributed.
If I use the same statements that I have included in the PL/JAVA routine
(see above), but I execute them from within e.g. DataGrip, connected to the
Citus master node, the table is created and distributed correctly. I can
see that the correct tables are created on the workers and the Citus
utility functions tell me that the table is indeed distributed.
So, it seems that there is nothing wrong with my setup (cluster).
The problem, as far as I can tell, should be somewhere in how PL/JAVA
connects to the Citus master node. The PL/JAVA routine runs in a JVM that
is spawned by the Postgres server process on the Citus master node. The
PL/JAVA routine acquires a connection to the local database via a somewhat
fake JDBC-connection (jdbc:default:connection) as explained in the PL/JAVA
documentation.
I suspect that this fake JDBC-connection is the reason that the Citus
master node is unable to distribute the table to the rest of the cluster
nodes.
As a side-note: From within PL/JAVA routines, I am able to execute other
types of statements that interact with distributed tables created outside
of the PL/JAVA routine. I am able to DROP, TRUNCATE, INSERT, UPDATE, DELETE
etc. Just not SELECT create_distributed_table.
Any help is appreciated!
From: | Chapman Flack <chap(at)anastigmatix(dot)net> |
---|---|
To: | Erik Mata <erikmata(at)gmail(dot)com>, pljava-dev(at)lists(dot)postgresql(dot)org |
Subject: | Re: Fail to create distributed table in Citus |
Date: | 2020-11-19 17:28:21 |
Message-ID: | 5FB6AB35.5060101@anastigmatix.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pljava-dev |
On 11/19/20 10:23, Erik Mata wrote:
> I then call the PL/JAVA routine, as:
>
> SELECT test_create_distributed_table();
>
> This results in a local table being created on the Citus master node, but
> no tables are created on the workers and the Citus utility functions tell
> me that the table is NOT distributed.
>
> If I use the same statements that I have included in the PL/JAVA routine
> (see above), but I execute them from within e.g. DataGrip, connected to the
> Citus master node, the table is created and distributed correctly.
That's kind of fascinating. Have you also raised this in a Citus-specific
list? They might hold some pieces of the puzzle. :)
> The problem, as far as I can tell, should be somewhere in how PL/JAVA
> connects to the Citus master node. The PL/JAVA routine runs in a JVM that
> is spawned by the Postgres server process on the Citus master node. The
> PL/JAVA routine acquires a connection to the local database via a somewhat
> fake JDBC-connection (jdbc:default:connection) as explained in the PL/JAVA
> documentation.
>
> I suspect that this fake JDBC-connection is the reason ...
The jdbc:default:connection in PL/Java really isn't a new connection of
any kind; the JVM is running inside the PostgreSQL backend process that is
handling your existing connection, and the jdbc:default:connection is
no more than a JDBC-like wrapper over SPI. It is largely the same as if
you wrote your test_create_distributed_table() function in C, and it
contained
Datum test_create_distributed_table(PG_FUNCTION_ARGS) {
...
SPI_execute("CREATE TABLE test1 (a BIGINT PRIMARY KEY)", false, 0);
SPI_execute("SELECT create_distributed_table("
"'test1', 'a', colocate_with=>'none')", false, 0);
...
}
(It might even be interesting to write a C version like that, and see
what it does.)
There seems to be something about create_distributed_table() that doesn't
work when it is invoked in that context, and even more oddly, seems to be
failing silently, rather than reporting an error of some kind.
Are you able to share your Java code? I'm sure it is straightforward,
but maybe it would help to pin the details down.
I think the Citus create_distributed_table source is here
but in a quick skim I haven't had anything jump out at me as where it
might be failing. Yet.
Regards,
-Chap
From: | Erik Mata <erikmata(at)gmail(dot)com> |
---|---|
To: | Chapman Flack <chap(at)anastigmatix(dot)net> |
Cc: | pljava-dev(at)lists(dot)postgresql(dot)org |
Subject: | Re: Fail to create distributed table in Citus |
Date: | 2020-11-19 18:20:09 |
Message-ID: | CAB0C2fnf6yw6vgtrCkmD6Y8tTV-o3Vh1YQAvjY7DtqTQGTMqbw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pljava-dev |
Hello Chapman!
Thanks for your prompt reply. I posted the same question in the official
citus forum, and am waiting for a response.
I'm pasting the code of my PL/JAVA routine here:
@Function(
effects = VOLATILE,
onNullInput = CALLED,
name = "test_create_distributed_table"
)
public static void testCreateDistributedTable() throws SQLException {
Connection connection =
DriverManager.getConnection("jdbc:default:connection");
Statement stmt = connection.createStatement();
stmt.executeQuery("CREATE TABLE test1 (a BIGINT PRIMARY KEY)");
stmt.executeQuery("SELECT create_distributed_table('test1', 'a',
colocate_with => 'none')");
}
As you mentioned, it is very simple and a slimmed down version of my
original production code. I didn't notice the problem immediately, since
(as you mentioned) there is no error but silently seems to fail.
I was wondering if it could be something with the single quotes surrounding
the parameters, but I've tried to use double single quotes and also to
escape the single quotes, but then it really fails with an error. Looking
through PL/JAVA implementation of SPIDatabaseMetaData I saw that it adds a
single quote to every single quote, unless it is escaped. Don't know if
that affects somehow. But in any case, it should lead to an error in Citus
if the parameters are incorrect.
Well, I'll continue looking for the cause of the problem. I'm really
grateful for any additional pointers that you could give me.
Regards,
Erik
Den tors 19 nov. 2020 kl 18:28 skrev Chapman Flack <chap(at)anastigmatix(dot)net>:
> On 11/19/20 10:23, Erik Mata wrote:
> > I then call the PL/JAVA routine, as:
> >
> > SELECT test_create_distributed_table();
> >
> > This results in a local table being created on the Citus master node, but
> > no tables are created on the workers and the Citus utility functions tell
> > me that the table is NOT distributed.
> >
> > If I use the same statements that I have included in the PL/JAVA routine
> > (see above), but I execute them from within e.g. DataGrip, connected to
> the
> > Citus master node, the table is created and distributed correctly.
>
> That's kind of fascinating. Have you also raised this in a Citus-specific
> list? They might hold some pieces of the puzzle. :)
>
> > The problem, as far as I can tell, should be somewhere in how PL/JAVA
> > connects to the Citus master node. The PL/JAVA routine runs in a JVM that
> > is spawned by the Postgres server process on the Citus master node. The
> > PL/JAVA routine acquires a connection to the local database via a
> somewhat
> > fake JDBC-connection (jdbc:default:connection) as explained in the
> PL/JAVA
> > documentation.
> >
> > I suspect that this fake JDBC-connection is the reason ...
>
> The jdbc:default:connection in PL/Java really isn't a new connection of
> any kind; the JVM is running inside the PostgreSQL backend process that is
> handling your existing connection, and the jdbc:default:connection is
> no more than a JDBC-like wrapper over SPI. It is largely the same as if
> you wrote your test_create_distributed_table() function in C, and it
> contained
>
> Datum test_create_distributed_table(PG_FUNCTION_ARGS) {
> ...
> SPI_execute("CREATE TABLE test1 (a BIGINT PRIMARY KEY)", false, 0);
> SPI_execute("SELECT create_distributed_table("
> "'test1', 'a', colocate_with=>'none')", false, 0);
> ...
> }
>
> (It might even be interesting to write a C version like that, and see
> what it does.)
>
> There seems to be something about create_distributed_table() that doesn't
> work when it is invoked in that context, and even more oddly, seems to be
> failing silently, rather than reporting an error of some kind.
>
> Are you able to share your Java code? I'm sure it is straightforward,
> but maybe it would help to pin the details down.
>
> I think the Citus create_distributed_table source is here
>
>
> https://github.com/citusdata/citus/blob/v9.5.0/src/backend/distributed/commands/create_distributed_table.c
>
> but in a quick skim I haven't had anything jump out at me as where it
> might be failing. Yet.
>
> Regards,
> -Chap
>
From: | Chapman Flack <chap(at)anastigmatix(dot)net> |
---|---|
To: | pljava-dev(at)lists(dot)postgresql(dot)org |
Subject: | Re: Fail to create distributed table in Citus |
Date: | 2020-11-19 18:54:04 |
Message-ID: | 5FB6BF4C.3060209@anastigmatix.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | PostgreSQL : PostgreSQL 메일 링리스트 : 2020-11-19 18:54 이후 토토java-Dev. |
On 11/19/20 13:20, Erik Mata wrote:
> ...
> stmt.executeQuery("SELECT create_distributed_table('test1', 'a',
> colocate_with => 'none')");
> }
> ...
> I was wondering if it could be something with the single quotes surrounding
> the parameters, but I've tried to use double single quotes and also to
> escape the single quotes, but then it really fails with an error. Looking
No, that's fine the way you've written it, as long as those values are
constant strings as you've shown.
The times you want to be careful are when the parameters are being passed
to you and you don't know what characters they might contain. In those cases
you would typically use
ps = connection.prepareStatement(
"SELECT create_distributed_table(?, ?, colocate_with => ?)");
ps.setString(1, tableName);
ps.setString(2, columnName);
ps.setString(3, colocated);
ps.execute();
so the driver takes care of how to pass the parameters correctly.
With data-definition language like CREATE TABLE, it's a little different,
because it can't be prepared and parameterized like a DML statement.
Those are the cases where, if you were being passed the table name or
column name, you would have to be careful about constructing the
CREATE TABLE and using the right quoting rules, getting familiar with
Statement.enquoteIdentifier() or, even better,
org.postgresql.pljava.sqlgen.Lexicals.Identifier.deparse(). And there are
more tricks to getting that right than I should go into here, because
they're probably off-topic for this particular issue.
Regards,
-Chap